today about my project called Edge Elixir, which is distributed graph processing in Elixir. I first touched Elixir code maybe six months ago on an unrelated Phoenix project and now I'm standing here, so I guess I'm all in on Elixir. With this project I wanted to really dive in and learn more about Erlang OTP, the virtual machine, and some of Elixir's unique features. I worked with graph processing systems for a few years, and it's something I thought would translate really well into Elixir. For this talk, though, I won't assume you have any background knowledge of graphs, distributed systems, or particular Elixir features; I'll explain everything as I go along. The next goal I had for this talk was to start a conversation about using Elixir for distributed data processing. I come from the JVM, Java, Hadoop, big-data world, and there we never talk about Erlang. Maybe it's time to start talking about Erlang for big data, via Elixir. And finally, of course, this is a graph talk, so maybe it'll inspire you to start using graphs in your own work. I love graphs, and I'm going to try to convince you that you should love graphs too. Then I'll talk about distributed graph processing: once the graphs in our workload get big enough, how we can scale up our processing to match. That's where the Pregel model comes in; there are a few other models, and existing frameworks built on top of this one, again mostly in the JVM world. But I think Elixir makes a lot of sense for this, which is why I started the Edge Elixir project. I'll show you how to use it, dive into how I implemented it, and then take your feedback. I know there are people in this room who will instantly find more Elixir-y ways to do some of what I've done, so I'm definitely open to hearing about that.
Just so we all start off on the same page, I think it's kind of obvious to everybody, but I'm talking about the kind of graph where you have a representation of some objects and their relationships to each other, not a graph where you plot something over time. A social graph is what comes to mind for most people. In a social graph you have vertices, where a vertex represents a person, and then edges, sometimes also called arcs (I'll just say edge), which represent the relationships those people have, for example friendship in a social network. And to be clear, a vertex is sometimes called a node in a graph, but in this talk, since we're talking about distributed systems, a node refers only to a computer running the Erlang VM. Graphs can also have weights applied to their vertices and edges, and they can be directed or undirected, so following an edge could go one way or both ways; that's important for certain types of graph algorithms. Graphs are awesome because we can model a lot of different problems as a graph. A social network obviously pops to mind immediately; there's the web graph; a lot of science uses graph models; you can solve constraint satisfaction problems using graphs. The list just goes on and on. And once we model our problem as a graph, we can apply all kinds of well-understood machinery to it. You can look at the topology of the graph, check distances within the graph, color the graph to solve different kinds of problems, or measure centrality, for example how many edges go through one person, to find out whether that person is important within the network. You can cluster the graph to pull out different communities of interest. Once your problem is modeled as a graph and you have all this machinery, solving problems becomes really intuitive and, I think, really satisfying.
But our graphs can easily become too big for one computer: too big for the RAM, maybe too big for the disk, or processing is just too slow because you're doing something CPU-bound and you want it to go faster. You want to scale it up, so of course the first step is multi-core, but then you need to think about distributed computing clusters. As our problem size grows, we can throw more computing power at it. In Elixir I find everybody's talking about distributed chat apps, and that's cool, but it's kind of easy. Some classes of problems are really easy to parallelize or distribute: if a node doesn't need to know anything about any other node to do its job, that's the best case. With a graph, it's not immediately obvious how you can split it across a computer cluster while keeping communication overhead low, and traveling over the network is orders of magnitude slower than accessing RAM, so you want to avoid it. One of the big models for distributed graph processing (there are others) came out of Google years ago. It's called Pregel, and it handles graph distribution by message passing. It had a few goals. The biggest one, I think, is that it should be one system for just about any graph algorithm: in an organization as big as Google, everybody's writing a different graph algorithm, and you can't rewrite the entire stack every time. So they came up with a programming model known as "think like a vertex," where you implement one function, written from the point of view of a single vertex, and that function is applied to every vertex in the graph; I'll go over that. The next goal, of course, is that it should be fault tolerant: in a big cluster, nodes are going to drop out, and you need to be able to recover from that. And it should be scalable, of course. I ripped this diagram off from the docs of Giraph, which is a framework for graph processing, but I think it does a good job of explaining how Pregel works.
So you have this compute method, and it gets applied to every vertex in your graph. Here we have a really simple graph, just three vertices. You can pretend it's split over the network across a cluster somehow, maybe three different computers. And you want to find out, based on the weights applied to those edges, one and three, what the weight is to go from the first vertex to the last vertex. So it's single-source shortest path; say that four times fast. At the first step, your function sees a vertex and its neighbors, and it says, okay, the weight to my neighbor is one. Every vertex does that, and the work is broken up by synchronization barriers into what they call supersteps. Across the entire graph this is done for every vertex, and each vertex accumulates messages it wants to send to other vertices. But those messages don't get sent until every vertex has been computed; then you hit a synchronization barrier, and all the messages go out. So at the next superstep, a vertex gets an incoming message saying its neighbor's weight was one, and you keep doing this until some defined endpoint, and you get that the weight from your source vertex to your destination vertex is four. It takes a little while to wrap your head around what's going on here, and this is just one algorithm: you can do basically any graph algorithm using this model, and a lot have been implemented already for you to look at if you want to start doing this. There are a few open-source frameworks that use this model. The most popular are Apache Giraph and GraphX, which is Apache as well. They're again JVM-based, Java or Scala, and they're built on top of Hadoop, either Hadoop MapReduce or YARN, or on Apache Spark, which are basically distributed-systems frameworks for the JVM. I've used Giraph more than GraphX, but they're both mature projects.
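To make that superstep walkthrough concrete, here is a minimal sketch of a "think like a vertex" compute function for single-source shortest paths in Elixir. The vertex shape and the `{vertex, outgoing_messages}` return convention are my own assumptions for illustration, not Edge Elixir's actual API.

```elixir
# Sketch of a Pregel-style compute function for single-source shortest
# paths. The vertex map shape and the {vertex, outgoing} return value
# are illustrative assumptions, not a fixed API.
defmodule SSSP do
  @infinity :infinity

  # vertex: %{id: ..., value: best_known_distance, edges: [{neighbor_id, weight}]}
  # messages: candidate distances received from neighbors last superstep
  def compute(vertex, messages, _opts) do
    min_incoming = Enum.min(messages, fn -> @infinity end)

    if less_than(min_incoming, vertex.value) do
      # A shorter path arrived: adopt it and tell every neighbor the
      # distance through us.
      vertex = %{vertex | value: min_incoming}
      outgoing = for {neighbor, weight} <- vertex.edges, do: {neighbor, min_incoming + weight}
      {vertex, outgoing}
    else
      # Nothing improved this superstep: send no messages (vote to halt).
      {vertex, []}
    end
  end

  defp less_than(@infinity, _b), do: false
  defp less_than(_a, @infinity), do: true
  defp less_than(a, b), do: a < b
end
```

At superstep zero, the source vertex would be seeded with a message of 0, so it adopts distance 0 and propagates its edge weights; after two more supersteps on the 1-then-3 example graph, the last vertex holds the answer, 4.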
Full featured, deployed at scale in huge organizations for huge graphs, but they're also terrifying. Giraph itself, just the core, is like 50,000 to 60,000 lines of code, and they're complicated to deploy and configure; I've spent more time looking at XML configuration than I care to think about. It's a sort of theme in big-data communities: we're all talking about building on top of Hadoop and Spark and all of this, and we almost never talk about Erlang and Erlang OTP and all the experience they embody in building distributed systems. And I'm not going to deride these projects; they're well engineered and they're incredible projects. But when I first learned about Elixir and Erlang OTP, well, this is the best Disney reaction GIF I could find. I know that either Disney or my Hadoop friends are going to yell at me for that. So yeah, I think Elixir just works really well for building this kind of thing. The Erlang VM and the Elixir language and standard libraries have just about everything we need to implement this really concisely, out of the box; we don't need to build much at all. And Erlang OTP has decades of distributed-computing knowledge baked into it, so let's use that. The toolbox we're going to use to build this project: first of all, we have :digraph, which is in the Erlang standard library. It's a graph representation that's already built in, backed by ETS, an in-memory concurrent data store, so you can build a mutable graph right away; you don't need to think about how to store a graph in memory, it's already there. Next we're going to use :global from Erlang, which is global name registration across the entire cluster: when you have a fully connected Erlang cluster, you can call globally registered processes from any node. This has some scalability issues for larger clusters that I've just become aware of, but as I understand it, the OTP team is working on them. And GenServer, for message passing and holding state.
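As a quick illustration of what the standard library already gives you, here is Erlang's :digraph used directly from Elixir on a toy three-vertex graph (the vertex names and edge labels are just examples):

```elixir
# Erlang's :digraph module, called from Elixir. The graph lives in ETS
# tables, so it is mutable and sits in memory, ready to use.
g = :digraph.new()

# Add three vertices and two directed edges with weight labels.
for v <- [:a, :b, :c], do: :digraph.add_vertex(g, v)
:digraph.add_edge(g, :a, :b, weight: 1)
:digraph.add_edge(g, :b, :c, weight: 3)

:digraph.no_vertices(g)            # => 3
:digraph.out_neighbours(g, :a)     # => [:b]
:digraph.get_short_path(g, :a, :c) # => [:a, :b, :c]
```

Note that :digraph's own `get_short_path/3` counts hops and ignores edge labels, which is one reason weighted algorithms like the shortest-path example still need their own compute logic on top.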
Then on the Elixir side of things, we're going to use behaviours and protocols to build extensible graph input and output. So this is roughly what the OTP application looks like. On every node you have this Edge Elixir app running, and connected to it is the graph persistence, on ETS and :digraph. Underneath that we have a GenServer, which I just called the superstep supervisor; it manages the synchronization barrier, and it also spins up the compute function for every vertex it's responsible for. I may have forgotten to mention that the graph is split according to a partitioning scheme you can choose, so you can spread the graph across your cluster. To build an Edge Elixir based app, first of all, in mix we pull in the app and start it up. There's some configuration to do; all of these have default values, but I'll show them here. The partition scheme and the graph format are how you choose how to split the graph up over the cluster. These are based on a behaviour, and there are some default implementations included in the library, just really simple ones. The default partition scheme is called edge cut; I'll show you the algorithm later, but it just cuts the edges and distributes vertices across the cluster equally. Then the graph format: there are a million graph format specifications, and the default one is just comma-separated source vertex, destination vertex, and that pair forms an edge. Just a CSV, basically. And then we have a max supersteps setting, which some algorithms might need when you want to finish the job after a certain number of iterations. Here's an example of computing PageRank on a graph in Edge Elixir. Edge Elixir is a behaviour: you use it, and it expects as an option, when you pull it in, a graph input. I implemented that as a protocol. There's graph output as well, but I don't show it here; it just uses the default, which prints to standard out. So this is a protocol.
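The configuration just described might look something like this in config.exs; the option names and module names here echo the talk but are my guesses, not necessarily the library's exact keys:

```elixir
# config/config.exs -- a sketch of the configuration described in the
# talk. Key names and module names are assumptions for illustration.
import Config

config :edge_elixir,
  # How vertices are assigned to nodes; a behaviour implementation.
  # The default cuts edges and spreads vertices equally.
  partition_scheme: EdgeElixir.Partition.EdgeCut,
  # How the on-disk graph is parsed; the default reads
  # "source_vertex,destination_vertex" CSV lines, one edge per line.
  graph_format: EdgeElixir.Format.CSV,
  # Stop the job after this many supersteps, for algorithms that need it.
  max_supersteps: 30
```

Since partition scheme and graph format are behaviours, swapping in your own module here is all it takes to change how the graph is split or parsed.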
There are a few built-in implementations. There's file, so you can load a graph that's just sitting on the disk. There's also URL, so you could load a graph that's sitting on a distributed file system like HDFS, for example. So why did I make this a protocol when I did the rest of the configuration as behaviours? Well, you can imagine you have a string or a binary somewhere else in your Elixir code that you could transform into a graph. You just pass it directly as the graph input and implement a protocol implementation for it, and it dispatches to that implementation without any other work. I think that's kind of nice. Then the Edge Elixir behaviour expects you to implement a compute function. It takes three parameters: the vertex you're computing on, any incoming messages to that vertex, and some options, like the graph size, that you might need in your algorithm. And to compute PageRank, this is as simple as it gets; it's pulled directly from the paper. If you don't know what PageRank is, it ranks your importance in the graph based on how important the people you're connected to are. You sum up all of your incoming messages, apply some coefficients to that incoming importance, the incoming PageRanks, and then update the vertex's value using update_vertex, which is part of the public API for this library. Then we send a message out to all of our neighbors with our new PageRank divided by how many neighbors we have. And boom, you run this, scale it up to however big your graph is, and you have distributed PageRank. So, how to run it. There are other ways to do this, but the simplest is just to use IEx and pass in an Erlang configuration file in which you can say which nodes should be in your cluster. You run this on every node, give each a name, and Erlang will wait until every node has started before it runs your program.
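Since the slide itself isn't reproduced in the transcript, here is a hedged sketch of what that PageRank compute callback does. The vertex shape, option key, and `{vertex, outgoing}` return convention are my assumptions; a real Edge Elixir callback would call update_vertex and the messaging API rather than returning values.

```elixir
# Sketch of the PageRank compute step described in the talk: sum the
# incoming ranks, apply the damping coefficients from the PageRank
# paper, then send each neighbor an equal share of the new rank.
# All names here are illustrative, not the library's exact API.
defmodule PageRank do
  @damping 0.85

  # vertex: %{id: ..., value: current_rank, out_neighbors: [ids]}
  # messages: rank contributions received from in-neighbors
  # opts: %{num_vertices: n} -- total graph size
  def compute(vertex, messages, %{num_vertices: n}) do
    # The standard formula: (1 - d)/N + d * sum(incoming contributions)
    rank = (1 - @damping) / n + @damping * Enum.sum(messages)
    vertex = %{vertex | value: rank}

    # Each out-neighbor gets an equal share of our new rank.
    share = rank / max(length(vertex.out_neighbors), 1)
    outgoing = for neighbor <- vertex.out_neighbors, do: {neighbor, share}
    {vertex, outgoing}
  end
end
```

Run this for every vertex, superstep after superstep, and the values converge toward the stationary PageRank distribution.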
So it's completely distributed. In other packages you usually have a head node that has to be live to spin up the other nodes, but this is completely distributed, which is a nice advantage, and it's really simple, just a couple of lines of configuration for the Erlang VM. So how does the library work? As I mentioned earlier, there's graph input and output. First of all, before we do anything else, we read the graph in. Your graph could be stored anywhere, and it could be in any format, so we have graph source and graph format, and they don't need to know anything about each other. Say you have some graph stored on some crazy file system in some crazy format: you can implement these behaviours for it. Then you partition the graph across your cluster. The default scheme included is just edge cut, which is vertex ID mod number of nodes, so it spreads your graph equally across the cluster. Then, if a vertex belongs on the node currently running the graph-reading code, it gets stored into the digraph. The hardest part of this is the distributed sync barrier. OTP doesn't really have anything exactly like this built in; it usually assumes a node can do whatever it wants at any time, without worrying about keeping in sync with other nodes. But for this you need to wait, for example, for every single node to finish loading the graph before you can do the next thing, or wait for every node to finish the current superstep before you can move on to the next one. So you need a distributed synchronization barrier.
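One way such a barrier can be sketched in Elixir is a GenServer running on every node, where barrier/0 issues a synchronous GenServer.multi_call to all of them and no caller proceeds until every node has arrived. This deviates in details from the talk's version (each node registers a local name so multi_call can reach it, and replies are held back with GenServer.reply so early arrivals actually block), so treat it as an assumption-laden sketch, not the library's implementation.

```elixir
# Sketch of a distributed superstep barrier. One Barrier per node,
# registered under a local name; barrier/0 multi-calls all of them.
defmodule Barrier do
  use GenServer

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, {0, []}, name: __MODULE__)
  end

  # Called by each node when it finishes a superstep. Because multi_call
  # is synchronous, this returns only once every node's Barrier has
  # replied, i.e. once the whole cluster has arrived.
  def barrier do
    GenServer.multi_call([node() | Node.list()], __MODULE__, :arrived)
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:arrived, from, {count, waiting}) do
    cluster_size = length([node() | Node.list()])
    count = count + 1

    if count == cluster_size do
      # Every node has hit the barrier: release the held callers and reset.
      for caller <- waiting, do: GenServer.reply(caller, :go)
      {:reply, :go, {0, []}}
    else
      # Not everyone is here yet: hold the reply until the last arrival.
      {:noreply, {count, [from | waiting]}}
    end
  end
end
```

On a single node the call releases immediately; in a cluster, each node's multi_call blocks until every Barrier has counted an arrival from every node, which is the lockstep behavior a superstep needs.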
:global has a lock system that could have worked. It's not exactly what we want, because it's meant for locking data so you don't get race conditions when modifying something, but we could probably have corralled it into doing this. Instead, I thought a really simple GenServer could do the trick, under the assumption that no nodes come online after you start your processing. Another assumption, of course, is that the graph is static; it's not going to change during the processing. When we start up this GenServer, the name there is a global name, so it's registered across the entire cluster, and we start it with a barrier count of zero. In our code we call the barrier function, which does a GenServer multi_call, a synchronous call to the barrier callback on every node. In the callback we increment the barrier count, and if it equals the number of nodes in our cluster, we know every node has hit the barrier and we can do something, like start a new superstep, or begin computing once the graph has been loaded; if it hasn't, we just return. This works because multi_call is synchronous: it waits for every single node to respond, so right here we can say, okay, if we've hit the barrier from every node in our cluster, then every node can do the next thing together, in lockstep. The next part of this is message passing, and of course that's basically just what GenServer is. We have a public API you can call from the compute function, message_neighbors, to send messages to all of that vertex's neighbors, or you can send a message to a specific vertex if you want. When the GenServer receives a message, it checks which partition, that is, which node, the destination vertex resides on, because we have access to the partitioning function, and it sends the message to the correct node in your cluster, so we're trying to reduce network overhead
there, and then the messages are just stored in the GenServer state until we hit the next superstep; when we run compute for every vertex in that superstep, we pass the messages in as an argument to the compute function. And so finally, the last step is to actually run compute on every vertex in your graph, and for this we just spawn the compute function locally. Even if we have a huge graph, we throw all these functions out to spawn and let the VM handle the scheduling; it'll automatically spread everything across your multi-core CPU. And since no compute function talks to another compute function, each just sends messages back up to its supervisor, there's no worrying about when a computation is going to run: you just unleash them all and let the VM handle it. Then, when we hit the sync barrier, the messages get passed to the next round of compute functions. This slide is really short just because this part of the job, which is actually one of the harder parts to implement in a lot of other systems, is really simple here. I never had a chance to run any benchmarks, but a lot has been written about how fast you can spawn a process in Elixir, and it's a lot faster than booting up a JVM and your entire app stack. There are a few things I didn't get a chance to finish in this project. One of the things Pregel implements is aggregation, so you can keep track of some global state during your computation; a lot of graph algorithms use that, but it would be really easy to add, since that's exactly what GenServer or Agent is for. There are a few other functions you could implement in the behaviour. One is for when you hit a vertex that has only dangling edges, where the destination vertices are all on another machine; some algorithms need a function to do something with that vertex, like delete it. And then the biggest one is
really better fault tolerance. When you boot it up using the run method I showed earlier, it waits for every node to come online and then starts your app, but we can improve on that. If a node drops out during one of the supersteps, for example, then with some more advanced VM configuration you could restart the entire thing, but you'd be starting from scratch. What we want is, at every superstep, to flush the digraph state out to disk or something similar, just so you can start again from that superstep if something bad happens. That's what Pregel does for its fault tolerance, but I haven't implemented it yet. So that's my talk. I put everything up on this website, edgelixer.net; you can grab the code. I haven't tagged a release yet, and I think the build is broken, but very soon you'll be able to use it. And I'm definitely open to questions and feedback and pull requests and anything else.

Hi Nathan, can we see what kind of graphs this code produces? Well, it doesn't produce... what's the output of this?

The output, yeah. So the output right now is just a digraph; you don't get a visualization of the graph, you'd have to do that in another system. When you configure it, one of the configuration options, which I didn't show here, is graph output, and right now the only thing I have implemented is standard out, so it just spits the graph out to the command line. What you get during the computation process is basically updated values attached to each vertex and each edge. For the PageRank example, each value would start at zero, or some other value, and at the end of the process each value would be the PageRank for that vertex. If you wanted to, you could implement the output protocol and write the graph back to disk, or pass it on to some other program to visualize it, something like that. Does that answer your question?

So I have two questions. The first one is: so, a couple of
years ago, three or four, I did some work doing simulations of distributed graph algorithms, and one reason I didn't go with Erlang to run the simulations was the difficulty of generating quality graphs, just random graphs; I wanted, say, some random small-world graph, and I was getting all of those out of igraph. So what's the solution, from your point of view, for generating some random graphs, some good graphs, as input to your system? Is there something in Erlang that does that?

I haven't looked at that. All the graphs I tested on are public networks, like the karate club network; you can grab a bunch from the internet for different social networks. So yeah, I haven't looked at generating graphs. I know that's something that's been done a lot in R with ERGM, and in igraph of course. I think it's almost a totally different problem, so I haven't really looked at it. (But it's necessary to run the...) Yeah, definitely; I don't assume where you're getting these graphs from.

And the second question: one of the things I was looking at for building the distributed algorithms was using an internal finite state machine per node, just to make sure the node kept in sync with its neighbors. Since every node is then in sync, in a message-passing sense, with each of its neighbors, it kind of removes the need for any central lock. Did you look at that, or did you find difficulty with that kind of solution?

Yeah, so I know that besides the Pregel model there are other ones, like GraphLab, and what you suggest sounds interesting, but this is just what I was familiar with. I'm sure there are others, and the field is moving quickly; I'm sure if you read new papers on it you'd find some better way of doing this. This is really just a learning exercise for me more than anything, not necessarily the best way to do distributed graph
processing. Anyone else?

Where in the world would we see this graph processing used? Who, basically a client, would pay for that? That's a straight question.

Well, I mean, this stuff came out of Google, Facebook, and that sort of thing. But like I said, there are so many different fields, so many different areas, where you can use graphs to solve problems. I don't use it in my own day job, but even if you're not Facebook, or anywhere near that scale, you can probably find something to do with it if you work with a lot of complicated models of something, science maybe, I'm not sure. Yeah, probably things like constraint systems. So yeah, anybody from an engineering background can suggest some complicated physics; I always approach this from an academic point of view, so I don't know.

Also, wireless sensors, which can be low power: you want to make sure you don't overuse any one individual sensor, so you can direct them to send information through less-used sensors.

Anyone else? Okay, thank you. All right, thanks a lot.