Welcome. How many of us are already doing functional programming, or have played with functional programming? Awesome. Any object-oriented practitioners who are looking to switch to functional programming? Cool. Awesome, excellent.

So today we'll talk about Loom and functional graphs, specifically how we implement them in Clojure. Even though I'll focus on the Clojure implementation of functional graphs, a lot of these ideas are easily translatable to other functional languages.

A little about me: I work at Google on a distributed build system, so in my day job I write C++. Why am I here at a functional programming conference? Well, I write Clojure on weekends and weeknights, and I'm the maintainer of a library called Loom, which is a graph algorithms and visualization library. I got into Clojure about two years ago, first starting out with some little hobby projects and exercises. Then I wanted to work on graph algorithms, because some of my background is in compiler research, and in compilers you represent programs as graphs. So I wanted to do more graph manipulation. I started looking around for existing graph algorithms libraries, because it's always better to pick up something that is already written well, and I came across Loom, which was initially written by Justin Kramer. Then Justin moved on to other projects. I really liked the way it was designed, and I liked a lot of the things it already had in it, so I decided to pick it up, add more algorithms to it, and make it as usable as possible for other people. And if you're wondering: yes, I did make that logo. I'm very proud of it.

Okay, so this talk will be in four parts.
First, we'll talk about graph theory in general. How many of us have taken a graph theory class or have dealt with graphs in general? Okay, a few of us. So that will be a good point for us to get on the same page. Then we'll talk about graph API design, which in my opinion is one of the most fascinating things to think about: how do you represent graphs such that they're flexible but also efficient? Then we'll talk about graph algorithms and how we would implement them functionally. Specifically, we'll look at the object-oriented implementation and compare it side by side with the functional implementation. And last, we'll do an overview of Loom and what Loom does for you, in case you're interested in how you can jump into using a graph algorithms library.

So let's take a moment for graph theory first. Graphs are collections of nodes and edges, and the nodes are connected by edges. On this graph we see nodes, which are connected by some edges. Graphs are all over real life. We have subway maps or transportation maps, which tell us how to get from point A to point B; we can calculate distances, we can calculate the cost. There are social network graphs, so you can figure out how information gets propagated from one person to another, what types of cliques get formed, how people get to know each other, and so on. And since I work on build systems, dependency graphs are something that I deal with in day-to-day life.
When you have source code and you specify in, say, your makefile how to pull in other dependencies and how to link libraries, you'll create something like this: at the bottom you start out with all the different files, the header files and the C files, and then you compile them, link them up, and so on. There are also network graphs; programs that are represented as graphs, as I mentioned, in compilers; and tree data structures, which probably all of you deal with in your everyday jobs, are also structured as graphs. They're just a more specific example of graphs.

So what kinds of graphs are there? Well, there are simple graphs, undirected graphs, where nodes are connected by edges. There are also weighted graphs, where each edge has some weight associated with it. There are directed graphs, also known as digraphs, where the edge has a specific direction. In this case there is a directed edge from A to B, so there's a way to get from A to B, and you can also assign weights to those edges.

More exotic types of graphs, but still very commonly used, are multigraphs, where you can have multiple edges between two nodes. Suppose you want to figure out whether you should drive from Boulder to New York or fly from Boulder to New York: a multigraph will tell you how many hours it will take you to drive or how many hours it will take you to fly, and you can associate a cost with traversing that distance, and so on. Another cool type of graph is mixed graphs, where you have undirected edges and directed edges in the same graph. Those are commonly used to represent job shop scheduling problems.
Those are the problems where you have a big job that you can separate into multiple tasks. The directed edge specifies the precedence of tasks: in this case, task B has to be executed after task A. The undirected edge specifies which tasks cannot be executed concurrently: in this case, C and D for some reason cannot be executed concurrently. For instance, we may know that they use a lot of resources and our system may not be able to support that amount of resources.

Okay, so now that we've looked a little bit at graph theory, let's talk about graph API design. In object-oriented languages we'll have something like a class Node, which has some data associated with it; a class Edge, which has the two nodes that it connects; and a class Graph, which has collections of nodes and edges. And of course you can also have some attributes associated with them, and so on.

But let's first talk more abstractly about how we can represent graphs. It's the same graph we saw earlier: we have five nodes, A, B, C, D and E, and we'll label the edges. There are four general types of graph representations. You can represent graphs using adjacency lists, where each node is mapped to a list of its adjacent successor nodes.
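As a rough illustration of the adjacency-list idea, here is a minimal Clojure sketch using a plain map from node to a set of successors. The slide's graph is not in the transcript, so apart from the A-to-B and B-to-C/D edges the talk mentions, the edge set below is made up, and the helper names `successors` and `edges` are chosen for this sketch only.

```clojure
;; Hypothetical five-node digraph stored as an adjacency list:
;; each node maps to the set of nodes it has an edge to.
(def adjacency
  {:a #{:b}
   :b #{:c :d}
   :c #{:e}
   :d #{:e}
   :e #{}})

(defn successors
  "Nodes directly reachable from node n."
  [adj n]
  (get adj n #{}))

(defn edges
  "All directed edges as [from to] pairs."
  [adj]
  (for [[from tos] adj, to tos] [from to]))

(successors adjacency :b)   ; the set containing :c and :d
(count (edges adjacency))   ; => 5
```

Only the edges that exist take up space here, which is why this layout suits sparse graphs.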
In this case A is mapped to B, because there's an edge between A and B, and B actually has two edges, to nodes C and D, and so on. You could use incidence lists, in which each vertex stores its incident edges and each edge stores its incident vertices. You can also use an adjacency matrix, where the rows are the source nodes and the columns are the destination nodes. In the cell at the intersection you put zero if it's from the node to itself; since we have an unweighted graph, we'll say the cost is one if there exists an edge, and the cost is infinity if there's no edge between the two nodes. And we can also use incidence matrices, where rows are still nodes and columns are edges, and at the intersection we put whether the node is the source of the edge, the destination, or not part of the edge at all.

Adjacency matrices are most efficient at representing dense graphs, and adjacency lists are most efficient at representing sparse graphs. My experience has been that most graphs we deal with in day-to-day life are sparse, so Loom uses adjacency lists to represent graphs internally.

So now let's talk about how we represent graphs... Yes? Sorry, so dense graphs: suppose we have n nodes and m edges. The maximum number of edges we can have grows as n times n, so a dense graph is one where m approaches n times n. The more edges between the nodes, the denser it is, and the fewer edges there are, the more sparse it is.

So what about graphs in the functional programming world? What we have is more abstract. We have structs, which represent data; they're your classes that store some data. Then we have interfaces: what kinds of functionality does the graph provide?
What kinds of functions can we call on them? And then we have implementations, which connect the two: they specify how you access this data, this struct, with this API. This allows for a more flexible representation, where the library doesn't assume anything about your graph. As the author of the graph, you have all the power to specify what kind of graph you're dealing with. And graphs are generally immutable, so any time you do a modification, you add nodes or remove edges, you get yet a new graph. A lot of functional programming languages, Clojure specifically, use cool algorithms to let you keep references to every version that you need, so you can backtrack easily. You can do time travel to a previous version of your graph, which is pretty cool.

So now we're going to see graphs implemented in Clojure. How many of you have written Clojure? Okay, wow, quite a fair number. Awesome. For those of you not familiar, don't worry about it: I'll walk you through and point out the major features of the language so you can follow along. The syntax is pretty minimal, because we're dealing with a Lisp.

We define a protocol, which is like an interface in Java. I'm not that familiar with Haskell, but I think it's more similar to type classes; I'll let the Haskell experts clarify that part. So we define a protocol Graph and we specify which functions you can call on a graph. You can call the nodes function, to which you just provide the graph and it retrieves the nodes. You can retrieve edges. You can ask whether a node or an edge exists in the graph. You can retrieve the successors of a node, you can calculate the number of edges coming out of a node, and you can enumerate all the edges coming out of a node.
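The slide with the protocol is not part of the transcript, so the following is only a trimmed-down sketch of what such a protocol could look like. The function names follow the talk's description rather than Loom's source, and extending the protocol to a plain persistent map is an assumption made here to show the data, the API and the implementation living in three separate places.

```clojure
;; Hypothetical, shortened protocol: the API, with no data attached.
(defprotocol Graph
  (nodes [g] "All nodes of g")
  (edges [g] "All edges of g as [from to] pairs")
  (has-node? [g n] "Is n a node of g?")
  (successors [g n] "Nodes that n has an edge to")
  (out-degree [g n] "Number of edges leaving n"))

;; The implementation connects the API to the data, here an ordinary
;; adjacency map of node -> set of successors.
(extend-protocol Graph
  clojure.lang.IPersistentMap
  (nodes [g] (set (keys g)))
  (edges [g] (for [[from tos] g, to tos] [from to]))
  (has-node? [g n] (contains? g n))
  (successors [g n] (get g n #{}))
  (out-degree [g n] (count (get g n #{}))))

;; The data: nothing but a map.
(def g {:a #{:b}, :b #{:c :d}, :c #{}, :d #{}})

(out-degree g :b)   ; => 2
(has-node? g :z)    ; => false
```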
Enumerating the out-edges is there to support multigraphs. Now we have a protocol for directed graphs, also known as Digraph. We have a way to retrieve predecessors given a node in the graph. We have a way to retrieve the in-degree, so how many edges are going into the node, and which edges are going into the given node. And we also have the function transpose, which returns the graph with all the edges reversed.

We also have a protocol WeightedGraph, which defines just one function, a custom weight function, which allows the weight to be derived from other attributes. So if your weight depends on the airplane distance, and maybe on the cost, and so on, you can specify how to retrieve that data and how to compute that weight. And there's also EditableGraph, which allows you to edit graphs as persistent data structures: you're able to add nodes, add edges, remove them, and remove everything from the graph. And remember that each operation returns a new graph to you.

So suppose we wanted to create a more complex graph, for instance a basic editable digraph. How would we go about it? In Clojure, defrecord defines a type, BasicEditableDigraph, and we'll provide three things to it: nodes, successors and predecessors. What you see here is extend BasicEditableDigraph. This is not a syntax for inheritance like you would have in Java; it's just a pattern for specifying implementations for those protocols. It implements three protocols: Graph, Digraph and EditableGraph. Remember, Graph is all these functions, and this is just a protocol definition; that's not actually valid code. So now, if we want to implement the protocol functions, the API, what we'll do is actually provide a map.
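To make the record-plus-`extend` pattern concrete, here is a small self-contained sketch. The record name `SketchDigraph`, its fields and the two shortened protocols are hypothetical stand-ins, not Loom's definitions; what the sketch does show accurately is how Clojure's `extend` takes a type followed by protocol and method-map pairs.

```clojure
(defprotocol Graph
  (nodes [g])
  (successors [g n]))

(defprotocol Digraph
  (predecessors [g n])
  (transpose [g]))

;; Hypothetical record: a node set plus successor and predecessor maps.
(defrecord SketchDigraph [nodeset succs preds])

;; Method maps: keyword named after the protocol function -> implementing fn.
(def graph-impl
  {:nodes      (fn [g] (:nodeset g))
   :successors (fn [g n] (get-in g [:succs n] #{}))})

(def digraph-impl
  {:predecessors (fn [g n] (get-in g [:preds n] #{}))
   ;; Reversing every edge amounts to swapping the two maps.
   :transpose    (fn [g] (assoc g :succs (:preds g) :preds (:succs g)))})

;; Not inheritance: this only attaches implementations to the type.
(extend SketchDigraph
  Graph   graph-impl
  Digraph digraph-impl)

(def g (->SketchDigraph #{:a :b :c}
                        {:a #{:b}, :b #{:c}}
                        {:b #{:a}, :c #{:b}}))

(successors g :a)               ; the set containing :b
(successors (transpose g) :b)   ; the set containing :a
```

Because method maps are ordinary maps, they can be merged or reused across several record types, which is one reason to prefer this over inline implementations.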
The map we provide is a method map: it's the map of the implementation, where keys are function names and values are the functions. Let's see how it looks. We have a key, which is the edges function, which says that this is where we implement edges, and then the value is the implementation, the function itself: for each node in the graph it pulls out all the edges coming out of that node and returns them. We'll do similar things to extract the nodes, and so on.

For the directed graph we provide a different method map, the default digraph implementation, which is a map where the key is the transpose function. It basically takes the successors and predecessors that we provided, swaps them, and returns a new graph for us. The other functions are much more trivial to implement, so we won't look at those, and then we similarly implement the EditableGraph protocol. You get the idea; the rest is implementation details. This can all be found in the loom.graph namespace if you're curious to look through the implementation details and try to understand all of it.

Okay, so how is representing graphs in the functional programming world different from object-oriented? We have our data, our API and our implementation separately, and our implementation specifies how the API can be called on our data. We have a flexible representation: in this case, I as the author of the graph decided that I have a basic editable digraph, so I was able to specify exactly which protocols it needs to implement and provide the implementations for them. And finally, we have immutable graphs: any time we add a node or add an edge, or mutate in any way, it produces yet a new graph for us.

Now let's talk about how to implement graph algorithms functionally. Specifically, we'll look at the example of the Bellman-Ford algorithm.
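Before moving on to the algorithm, the immutability point above can be seen with nothing more than Clojure's persistent maps. This is a sketch on a plain adjacency map; `add-edge` is a hypothetical helper written for this example, not a Loom function.

```clojure
(defn add-edge
  "Returns a new adjacency map with a directed edge from -> to.
  The map passed in is left untouched."
  [g from to]
  (-> g
      (update from (fnil conj #{}) to)   ; add `to` to from's successors
      (update to #(or % #{}))))          ; make sure `to` is present as a node

(def g0 {})
(def g1 (add-edge g0 :a :b))
(def g2 (add-edge g1 :b :c))

;; Every version stays valid, so "time travel" is just keeping old values.
(def history [g0 g1 g2])

g1   ; {:a #{:b}, :b #{}}
g2   ; {:a #{:b}, :b #{:c}, :c #{}}
```

The versions share structure internally, so keeping all of them around is much cheaper than copying the graph on every change.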
Is anyone here familiar with Bellman-Ford? It's similar to Dijkstra's algorithm in that it finds shortest paths from a single source to all the other vertices in the graph, with one caveat: it's actually able to detect a negative-weight cycle. A negative-weight cycle is when the sum of the weights of all the edges in the cycle is negative. Why does this matter? Well, here there's a negative-weight cycle between B, D and E; its total weight is negative one. So when we traverse the edges in our algorithm, every time we go around the negative-weight cycle we'll think we've found yet a shorter path, and yet a shorter path. We'll never converge on a fixed point, and we'll run infinitely. So we'd like to detect this as early as possible, quit, and say we can't possibly find the shortest paths from the source to all the other nodes because you have a negative-weight cycle.

This is the description of the Bellman-Ford algorithm in the Introduction to Algorithms book, also known as CLRS. It's common for algorithms to be described assuming procedural or object-oriented languages.
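The slide is not part of this transcript. The line numbers mentioned in the walkthrough appear to refer to the textbook pseudocode, which in CLRS reads roughly as follows (reproduced from memory, so treat the exact layout as an approximation):

```
BELLMAN-FORD(G, w, s)
1  INITIALIZE-SINGLE-SOURCE(G, s)
2  for i = 1 to |G.V| - 1
3      for each edge (u, v) ∈ G.E
4          RELAX(u, v, w)
5  for each edge (u, v) ∈ G.E
6      if v.d > u.d + w(u, v)
7          return FALSE
8  return TRUE

INITIALIZE-SINGLE-SOURCE(G, s)
1  for each vertex v ∈ G.V
2      v.d = ∞
3      v.π = NIL
4  s.d = 0

RELAX(u, v, w)
1  if v.d > u.d + w(u, v)
2      v.d = u.d + w(u, v)
3      v.π = u
```

Here `v.d` is the current cost estimate for reaching `v` from the source and `v.π` is its parent pointer.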
So this is a perfect example for us to see how to map that to the functional world. Let's take it one by one, because there are a couple of function definitions that we don't actually see here. Initialize-single-source tells us that for each vertex we want to say the distance to it is infinity for now, because we haven't computed anything, and the parent pointer is nil. The parent pointer is the vertex that comes before the given vertex on the path you're finding. And for our start node we initialize the distance to zero, because it's just the distance to itself.

The way we implement this is with a function called init-estimates. The defn- (defn-minus) that you see here is just Clojure's way to say this is a private function, not accessible publicly, because we're only using it as a helper. It takes a graph and a start node. We define nodes, which is all the nodes except the start node, and we use disj on the set to get all the nodes without the start node. Then we initialize costs, which is our path cost from the start vertex to all the other vertices, and what interleave does is it basically gives us the
mapping from the nodes to infinity. It turns out that the word infinity itself is too long to put on the slide at this font size, so it's replaced with the infinity symbol that you see here. We also define paths, because we want to store the parent pointers, so we provide the mapping from each node to nil, because we haven't computed its parent pointer yet. And then we initialize our costs with the start node mapped to zero, and the start node will never have a parent pointer, so we just map it to nil. Okay, so far so good.

Now, on line 4 we have a function called relax. The way you can think about the function without bothering with the implementation is: given two vertices u and v and the cost of the edge between them, which is w, if we have a current path to node v from the start node, and we have a path to node u, and there is an edge between u and v, is there a shorter path to node v from the start node going through u? In this case, on the right-hand side, from the start node to node v the cost we've computed so far is four, but getting to node u costs us two, and getting to node v through node u has a cost of three. So we'd like to relax the edge, because we'll find a shorter path through that node. Now notice line 6, and line 1 of the relax function.
They have the exact same condition, so let's pull that out into its own function. We define a can-relax-edge? function with three arguments: the edge, whose elements can also be referred to as u and v, then the edge cost, and the costs map, the helper data structure that we created at the beginning. We pull out the distance to v by getting it from the map of costs we've computed so far, and the distance to u, and we compute the sum, which is the cost of the path going through node u to node v. And then we just return the result of comparing them.

Now, to implement the whole relax function, again we take the edge, whose vertices we can also refer to as u and v. This is called destructuring in Clojure, which is a pretty cool feature. We take the u-v cost, which is basically our w, and we take our internal data structure that we've been keeping around, estimates, which encapsulates costs and paths. We get the cost to vertex u and compute the sum, and if we can relax the edge, then we update our costs and parent pointers: we update the costs map and the paths map. Otherwise we just return estimates, the helper data structure, as is, without modifying it.

Okay, so now let's do this for each edge in the graph: we'll relax it. We define a function relax-edges, which takes a graph, a start node and our estimates helper data structure, and we use the reduce function for iterating over all the elements. The reduce, which we see at the bottom, goes over each edge: starting from the initialized estimates, it tries to relax each edge and updates the estimates data structure as we go along.

Okay, so now we're actually very close to being done, tying this all together. You probably can't see it, so don't worry.
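Since the slides are not in the transcript, here is a hedged reconstruction of the helper functions described so far. The function names follow the talk; the argument lists are simplified assumptions (a collection of nodes and a collection of weighted edges instead of a Loom graph), and `zipmap` stands in for the `interleave` construction mentioned in the talk.

```clojure
(def inf Double/POSITIVE_INFINITY)

(defn- init-estimates
  "Returns [costs paths]: every node costs infinity and has no parent,
  except start, which costs 0."
  [nodes start]
  (let [others (disj (set nodes) start)]
    [(into {start 0} (zipmap others (repeat inf)))
     (into {start nil} (zipmap others (repeat nil)))]))

(defn- can-relax-edge?
  "True when going through u gives a cheaper path to v than the known one."
  [[u v] uv-cost costs]
  (< (+ (get costs u) uv-cost) (get costs v)))

(defn- relax-edge
  "Updates the cost and parent of v if edge u->v can be relaxed;
  otherwise returns estimates unchanged."
  [[u v :as edge] uv-cost [costs paths :as estimates]]
  (if (can-relax-edge? edge uv-cost costs)
    [(assoc costs v (+ (get costs u) uv-cost))
     (assoc paths v u)]
    estimates))

(defn- relax-edges
  "One pass over all edges. weighted-edges is a collection of
  [[u v] cost] entries; a map of edge -> cost also works."
  [weighted-edges estimates]
  (reduce (fn [est [edge cost]] (relax-edge edge cost est))
          estimates
          weighted-edges))

;; The situation from the talk: v currently costs 4, u costs 2, edge u->v costs 1.
(relax-edge [:u :v] 1 [{:s 0 :u 2 :v 4} {:s nil :u :s :v :s}])
;; => [{:s 0, :u 2, :v 3} {:s nil, :u :s, :v :u}]
```

Note that nothing is mutated: each call hands back a fresh pair of maps, or the same pair when no relaxation is possible.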
We'll actually take a look step by step. But before we do that, let's take a short break to reflect on what we know so far. We used function composition. We defined functions: we pulled out a separate function for the condition, and we used that inside relax-edge. Then we defined a different function, relax-edges, which internally calls relax-edge, and so on. Our functions operated on helper data structures: we actually had two, costs and paths, which we referred to together as estimates. And we used higher-order functions such as reduce to help us iterate over elements. Reduce is actually a very interesting function, but I'm not going to talk about all its algebraic properties. It probably deserves a talk of its own, but if you're interested, I highly recommend you look into how it works.

So now, going back to the Bellman-Ford function and tying it all together. We define a bellman-ford function. We take the graph and the start node, and we initialize the estimates by calling the function init-estimates. And now we relax edges.
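For orientation, this is a compact, self-contained sketch of where the walkthrough ends up. It is not Loom's implementation: the signature (a node collection plus a map from edge to cost) and the condensed `relax` helper are assumptions made to keep the example runnable on its own. The two idioms to look for are the outer `reduce` over a range, which repeats the edge pass |V| - 1 times, and `some`, which looks for an edge that can still be relaxed.

```clojure
(def inf Double/POSITIVE_INFINITY)

(defn- relax
  "Condensed relax step: returns updated [costs paths] when going
  through u shortens the known path to v, else the estimates as is."
  [[costs paths :as est] [[u v] w]]
  (let [via-u (+ (costs u) w)]
    (if (< via-u (costs v))
      [(assoc costs v via-u) (assoc paths v u)]
      est)))

(defn bellman-ford
  "nodes: collection of nodes; weights: map of [u v] -> cost; start: source.
  Returns [costs paths], or false when a negative-weight cycle is reachable."
  [nodes weights start]
  (let [init      [(assoc (zipmap nodes (repeat inf)) start 0)
                   (zipmap nodes (repeat nil))]
        ;; one full pass over the edges; the range element is ignored
        relax-all (fn [est _] (reduce relax est weights))
        ;; |V| - 1 passes
        est       (reduce relax-all init (range 1 (count nodes)))
        [costs _] est]
    (if (some (fn [[[u v] w]] (< (+ (costs u) w) (costs v))) weights)
      false
      est)))

(bellman-ford [:s :a :b] {[:s :a] 4, [:s :b] 5, [:b :a] -3} :s)
;; => [{:s 0, :a 2, :b 5} {:s nil, :a :b, :b :s}]

(bellman-ford [:s :a :b] {[:s :a] 1, [:a :b] -2, [:b :a] 1} :s)
;; => false, because the cycle a -> b -> a has total weight -1
```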
On line 2 you see we do this V minus one times. So what we do is call reduce on relax-edges, and this is the trick for doing it V minus one times. If this doesn't quite make sense to you, I'll post the slides online and you can verify for yourself that it actually works. Trust me, it does. The cool thing is that you take the reduce function, you specify how to operate over the edges, and you specify how many times to do it: you count the number of nodes, so that's V, you decrement that, so that's V minus one, and then you do that over the range from one to V minus one.

Okay, so now for the part that actually does the detection of a negative-weight cycle. So far we've computed the paths from the start node to all the other vertices. Now we want to know if there is a negative-weight cycle, which means that if we keep iterating we're going to keep finding shorter and shorter paths. The way we do this is with some. The some function in Clojure returns the first truthy value the predicate produces, or it returns nil, which evaluates to false. So here we want to see: is there an edge that we can still relax? If we find such an edge, one that tells us we could relax it, then we exit immediately and return false. Otherwise, the algorithm as described says to return true, but that's not really helpful, so let's return the parent pointers themselves. The construction of the parent pointers is just more mangling of internal data structures and calling more functions on them.

Okay, so what we've seen so far when implementing graph algorithms functionally is that everything was a function. Functions operate on lots of different internal data structures, which are themselves just maps, lists and infinite sequences, and we pass them back and forth. And every function returns a new representation to us. So the estimates that we computed
initially, we then passed to functions, and there were actually two data structures, costs and paths. We updated them as we went, and we got a new copy each time.

Okay, so now let's talk about what Loom does. As I mentioned, Loom is a graph algorithms and visualization library written in Clojure. It's available on GitHub and supports a variety of graphs: undirected graphs, directed graphs, weighted graphs, and multigraphs, which we mentioned earlier, where you can have many edges between two nodes. It also supports fly graphs, which are read-only, ad hoc graphs. Given a few functions, they're able to infer the other ones they need. For instance, if you give a fly graph nodes and successors functions, it will figure out a way to retrieve edges. And if you give it successors and a start node, it will figure out how to retrieve nodes and edges.

It also supports a variety of algorithms. It supports depth-first search and breadth-first search. On the left-hand side you see a walk of the tree using depth-first search, where it first tries to go as deep as it can and then backtracks and explores the rest of the tree, whereas breadth-first search explores the tree level by level. It also has a bidirectional breadth-first search implemented. In addition, it has topological sort, where every successor of a vertex is guaranteed to come after it in the ordering.
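The fly-graph idea and the two traversal orders described above can both be illustrated with one small sketch: given only a successors function and a start node, a traversal discovers every reachable node. The functions `dfs-nodes` and `bfs-nodes` and the example tree are hypothetical and written for this illustration; they are not Loom's API.

```clojure
(defn dfs-nodes
  "Depth-first, pre-order visit of everything reachable from start,
  using only a successors function. The frontier is a stack."
  [successors start]
  (loop [stack [start], seen #{}, order []]
    (if (empty? stack)
      order
      (let [n (peek stack), stack (pop stack)]
        (if (seen n)
          (recur stack seen order)
          ;; reverse so the first successor ends up on top of the stack
          (recur (into stack (reverse (successors n)))
                 (conj seen n)
                 (conj order n)))))))

(defn bfs-nodes
  "Breadth-first visit of everything reachable from start.
  The frontier is a FIFO queue, so nodes come out level by level."
  [successors start]
  (loop [queue (conj clojure.lang.PersistentQueue/EMPTY start)
         seen  #{start}
         order []]
    (if (empty? queue)
      order
      (let [n     (peek queue)
            fresh (distinct (remove seen (successors n)))]
        (recur (into (pop queue) fresh)
               (into seen fresh)
               (conj order n))))))

;; A small tree, exposed only through a successors function.
(def tree {:a [:b :c], :b [:d :e], :c [:f]})
(defn succ [n] (get tree n []))

(dfs-nodes succ :a)   ; => [:a :b :d :e :c :f]
(bfs-nodes succ :a)   ; => [:a :b :c :d :e :f]
```

The node set of the graph is simply the set of visited nodes, which is how a graph can be recovered from successors and a start node alone.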
Loom computes a variety of shortest-path algorithms, including Dijkstra; Bellman-Ford, which we looked at earlier; A*, which is commonly used in AI for pathfinding; and Johnson's algorithm, which instead of computing paths from a single source to all the vertices in the graph computes all-pairs shortest paths, so from all the vertices to all the other vertices, and it specifically does that on sparse weighted directed graphs.

There are also algorithms for cliques, using Bron-Kerbosch for finding maximal cliques in an undirected graph. What a clique guarantees is that every vertex is adjacent to every other, so there's a direct edge to all the other vertices in the clique. Then SCC is strongly connected components, which slightly relaxes this notion: in a strongly connected component, every vertex is reachable from every other vertex. It doesn't have to have a direct edge, but it needs to have a path. We compute strongly connected components using Kosaraju's algorithm.

There's also a density function, which computes the ratio of edges to nodes, so you're able to figure out how dense or how sparse your graph is. There's also a way to compute loner nodes: on the graph we see here, G is a loner node, because it's not connected to any of the other nodes in the graph. There are also greedy coloring and two-coloring algorithms. With greedy coloring, as soon as it can't find a color in the currently used set of colors, it just assigns a new color. Two-coloring is used to figure out whether the graph is bipartite: if you're able to do a two-coloring on a graph, that means your graph is bipartite. And there's also a max-flow algorithm using Edmonds-Karp, which finds a feasible flow through a single-source, single-sink flow network that is maximum.
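As one worked example from this list, here is a sketch of the two-coloring check for bipartiteness mentioned above. It is written from the general technique, not taken from Loom; the function name `two-coloring` and the map-of-neighbor-sets input format are assumptions for this example.

```clojure
(defn two-coloring
  "Tries to color an undirected graph (map of node -> set of neighbors)
  with colors 0 and 1 so that neighbors always differ. Returns the
  coloring map, or nil if the graph is not bipartite."
  [adj]
  (loop [coloring {}, frontier [], unvisited (set (keys adj))]
    (cond
      ;; process a node that already has a color
      (seq frontier)
      (let [n    (peek frontier)
            c    (coloring n)
            nbrs (adj n)]
        (if (some #(= c (coloring %)) nbrs)
          nil                                  ; a neighbor shares n's color
          (let [fresh (remove coloring nbrs)]  ; neighbors without a color yet
            (recur (into coloring (map (fn [m] [m (- 1 c)]) fresh))
                   (into (pop frontier) fresh)
                   (reduce disj unvisited fresh)))))

      ;; start a new connected component
      (seq unvisited)
      (let [n (first unvisited)]
        (recur (assoc coloring n 0) [n] (disj unvisited n)))

      :else coloring)))

(def square   {:a #{:b :d}, :b #{:a :c}, :c #{:b :d}, :d #{:a :c}})
(def triangle {:a #{:b :c}, :b #{:a :c}, :c #{:a :b}})

(two-coloring square)     ; a valid coloring, so the square is bipartite
(two-coloring triangle)   ; => nil, an odd cycle cannot be two-colored
```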
I couldn't put it on the slide, but there's also Prim's minimum spanning tree. Remember I mentioned that trees are also graphs. What Prim's algorithm does is compute a minimum spanning tree for a connected, weighted, undirected graph. As it traverses and builds up the tree, it picks edges that have minimal weight, so it's guaranteed that every edge in the minimum spanning tree is minimal among the edges it could have picked.

The algorithms I've mentioned so far are in the loom.alg namespace. There's also the loom.alg-generic namespace, which lets you run graph algorithms without any knowledge of the graph representation, so you don't have to implement Loom's graph API to use alg-generic. It doesn't know anything about the underlying representation of the graph: it requires only a successors function and a start node for depth-first search and breadth-first search, because you have to start somewhere, and the same for topological sort and Dijkstra's algorithm. If you want to find a breadth-first search path between two different nodes, then it requires an end node as well.

And as I mentioned, Loom is a graph visualization library. It uses Graphviz, which is a commonly used graph visualization library. By default it uses the dot layout algorithm, for those of you familiar with it, but you can also specify other layout algorithms pretty easily if you want to.

What I wanted to find out for myself is whether Loom is actually as flexible and as unopinionated about graphs as I claimed it was. So I took three different graphs from three different problem domains and represented them in Loom. For one of them I took the static single assignment form function from core.async: basically, it takes the go block and represents the
program in static single assignment form, which is a common representation that compilers generate for a program. I also used Titanium, which is a Clojure wrapper around Titan DB: I stored some data in the in-memory Titan DB, and then I specified a way to have that data represented in Loom and run graph algorithms on the data in the Titanium database. And I also took the dependency graph of Clojure GitHub repositories and tried to figure out what the most commonly used libraries are, besides the Clojure core libraries.

There's actually an infinite number of graphs that can be represented by Loom. I know that some people use it to represent their workflows: if you want to specify a way to build your code and roll it out into a staging environment and production, you could use Loom. There is, I think, a research group in Berkeley that has used it for some time for the program representation in their compiler. And I'm sure there are many more that I don't know of yet. If you're playing with it, let me know what you're doing with your Loom integration.

Okay, so let's wrap up. Functional graphs are a very interesting and somewhat new field for a lot of people to think about, because they have to be represented differently. If you want to learn more about how to represent data structures in a functional world, there's a fantastic book by Chris Okasaki called Purely Functional Data Structures. You may have to learn Standard ML, which is the language the examples are in, but many people have found it pretty easy to follow without actually understanding Standard ML. So what are some takeaways?
Well, when we represent graphs functionally, our data and our functions are separate. In the object-oriented world we had nodes, and the Node object specified what kinds of operations you can do on nodes, and the Edge object specified what kinds of operations you can do on edges. In the FP world those are separate: your data lives somewhere, like mine lived in my Titanium graph database, and the functionality, which was Loom, had some API, and then I specified how exactly to retrieve the data from the database.

Functional graphs have a much more flexible representation, because they don't make any assumptions about your graph. It's all up to the graph author to figure out what kind of graph they want. You may have a directed weighted graph, and for some problems you may want to treat it as unweighted, and for other problems you want to treat it as a weighted graph because you want to compute some properties on it, and you're able to do that.

The third takeaway is that graphs are immutable in the FP world. Any time you make a change to the graph, it's yet a different graph, unlike in the object-oriented world, where you just keep changing things and it's never quite the same graph: you pass it to your user, maybe they do something else to it and return it, and you don't really know what happened to it.

And takeaway number four is: use Loom if you have graph algorithm problems, if you have any need for graphs. Cool, awesome. Well, thanks so much for coming.