 So Guillaume is a developer at the Sciences Po médialab in Paris and I am at [unclear], a web-based firm. Also, I developed the first version of Sigma.js and, just to correct you, you can see it right here: Guillaume is actually the main maintainer of Sigma today. And our situation is that we have no standard graph library, which is kind of a huge problem, because we have more and more use cases where people want to do graph analysis on the web. Use cases like social network analysis, [unclear], PageRank, et cetera. And sometimes we'd even like to have, in Node.js, which is more and more used, a tool like NetworkX for Python or Boost, but there's nothing today. All we have today is some graph rendering engines which have their own graph models and algorithm implementations, like in [unclear]. There are two ports of NetworkX in Node.js. The names "graph" and "Graph" are actually both squatted on npm, so we cannot create anything that's called graph, with a lowercase or an uppercase G. So the problem with the state of the art is that, yes, we have some things like the ForceAtlas2 layout implemented in Sigma, but it's absolutely not usable anywhere else than in a web page. And it's a problem for [unclear], I guess. It's only for the web, and it cannot be used as a standard SNA library. And yeah, for the SNA algorithms, to go further into the state of the art, I take the quite standard Gephi workflow. When I have a social network, what do I want to do to create a network map? First stage, we want to be able to compute some metrics like PageRank, or HITS, or any centrality score for my nodes, and map the results to node sizes. Then we search for communities, and we map colors to the communities. And then we run some layout algorithm. And Sigma kind of covers part of this, but only if you run it with a renderer. It's not possible to use this in Node.js.
There are some local solutions: I know there are some PageRank modules on npm, but none of them are bound to a graph model, and it's the same for HITS on npm. About the communities, there are like two Louvain implementations on npm, but none of them are quite there, and some graph rendering engines have their own. So if you go to the site, you have the links. And for the layouts, it's quite worse, because it's really only tied to graph rendering: Sigma has ForceAtlas2, which was also developed for some other rendering libraries, I guess. There are a lot of spring-layout implementations for D3, Cytoscape, et cetera, and I have seen some standalone modules, again, really different. So if I want to use some of these implementations that are proven, I have to switch between their original APIs all the time, and it's quite painful. So basically, the whole SNA process is really painful to do for any industrialization. So are we doomed? Okay, so I hope it's fine. So we certainly hope not, and so what we propose to you today is something which we called Graphology, because we had to find a name, because npm is a wild place, and when someone takes a name, you can't take it back. So we just used a lame pun, and this is it, Graphology. So Graphology, what is it? It is a specification for a robust and multipurpose graph object in JavaScript. So what does it mean, really? So yeah, this is it. So this is a specification, not a library, but you have a reference library, which is the reference implementation of this specification, which you can download and use. And you also have a standard library, which comes attached to it, which contains a lot of classical algorithms. So you've got layouts, you've got generators, you've got HITS, et cetera, et cetera. So this specification is meant to be multipurpose. So we are not targeting one specific kind of graph.
We are targeting a lot of graphs: the graph can be directed, can be undirected, or it can be mixed, which is probably some perverse thing where you want to have both directed and undirected edges. And the graph can be simple, so there is only one edge from A to B, or you can have a multigraph, which is A to B plus A to B plus A to B, we don't care. And the graph will or will not accept self-loops. We don't have an opinion on this, we let the user decide. So we have a lot of use cases for this kind of library. We have graph analysis, if you want to compute metrics or other indices about your graph. We have graph handling, like if you want to build specific graphs, like a bipartite graph, from data, or modify something which already exists, et cetera, et cetera, or maybe interface with some databases like Neo4j or Titan and so on. And you also have a data model for rendering, which means that graphology is actually suitable for rendering libraries as a graph model, because it has a lot of events and you can listen to mutations on the graph and so on. We will see that later. So what we won't do is handle graph data that does not fit in RAM. We are doing, like, the weak man's thing. So we just take graphs, hold them in RAM and that's all. If your graph is like 14 gigabytes, it's not going to work. And so I want to stress that what we are building here is a specification; it's not a library. So this actually means the following: I'm sure you will agree that there is not a perfect way to implement a graph data structure. So you can't really do it. So there are implementations that will cover more use cases than others. And you can aggressively optimize an implementation to work on some really specific use cases.
So with a specification, what we have is that anybody can implement the specification however they see fit, and they will use their own implementation, but they will still have the benefit of the standard library and be able to use all the ecosystem of libraries which use this specification, without having to recode them again and again and again. So for instance here, you've got an example. You've got a graph, which is actually an implementation of graphology, which is my custom graphology implementation, which is, why not, built upon, say, C++ for Node.js. And here you've got something which is a library from graphology, which is a function which is able to extract the connected components. And so this function will still work on your particular implementation of the graph because, yeah, it's the same API, so no biggie here. So that's why we have a specification and not a library. So what are the main concepts of this specification? It's really simple: you've got nodes, and nodes are represented by keys, and those nodes may be described by several attributes, which are like key value, key value and so on and so on. And an edge is represented by a key that may be provided or not, because the graph is able to generate the IDs, and you'll see why later. And an edge can also be described by attributes. So you've got nodes: key, attributes; edges: key, source, target, attributes. So that's a graph, I guess. And that's all. So for instance here, you've got an example of the code if you load the reference implementation of the specification. So you are going to build the graph, and you add nodes to it, and then you add one really interesting edge, and then you add a really interesting attribute to the Susie node. And you've got miscellaneous information like the order of the graph, so two obviously. And you can iterate on the nodes, you can iterate on one node's neighbors, and so on and so on.
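To make the example described above concrete, here is a minimal, self-contained sketch in plain JavaScript. It is not the reference implementation: the class name `TinyGraph`, the `age` attribute, the `Lonely` node and the placeholder edge keys are all assumptions made for illustration. It shows the node/edge/attribute model, the order of the graph, and a connected-components function written only against two methods (`forEachNode` and `neighbors`), which is the reason such a function could run on any implementation exposing the same API.

```javascript
// Hypothetical stand-in for a graph implementation; NOT the reference library.
class TinyGraph {
  constructor() {
    this.nodes = new Map(); // node key -> attributes object
    this.edges = new Map(); // edge key -> { source, target, attributes }
  }

  // Order is the number of nodes, size the number of edges.
  get order() { return this.nodes.size; }
  get size() { return this.edges.size; }

  addNode(key, attributes = {}) {
    this.nodes.set(String(key), { ...attributes });
  }

  addEdge(source, target, attributes = {}) {
    const key = `e${this.edges.size}`; // placeholder key generation for the sketch
    this.edges.set(key, {
      source: String(source),
      target: String(target),
      attributes: { ...attributes },
    });
    return key;
  }

  setNodeAttribute(node, name, value) {
    this.nodes.get(String(node))[name] = value;
  }

  getNodeAttribute(node, name) {
    return this.nodes.get(String(node))[name];
  }

  // Iteration only hands out keys.
  forEachNode(callback) {
    for (const key of this.nodes.keys()) callback(key);
  }

  // Linear scan for brevity; a real implementation would keep an index.
  neighbors(node) {
    node = String(node);
    const found = new Set();
    for (const { source, target } of this.edges.values()) {
      if (source === node) found.add(target);
      if (target === node) found.add(source);
    }
    return [...found];
  }
}

// Uses only forEachNode() and neighbors(), so it does not care how the
// graph stores its data internally.
function connectedComponents(graph) {
  const seen = new Set();
  const components = [];
  graph.forEachNode((start) => {
    if (seen.has(start)) return;
    const component = [];
    const stack = [start];
    seen.add(start);
    while (stack.length > 0) {
      const node = stack.pop();
      component.push(node);
      for (const neighbor of graph.neighbors(node)) {
        if (!seen.has(neighbor)) {
          seen.add(neighbor);
          stack.push(neighbor);
        }
      }
    }
    components.push(component);
  });
  return components;
}

const graph = new TinyGraph();
graph.addNode('John');
graph.addNode('Susie');
graph.addEdge('John', 'Susie');
graph.setNodeAttribute('Susie', 'age', 24);
console.log('order:', graph.order); // two nodes so far

graph.addNode('Lonely'); // an isolated node forms its own component
console.log(connectedComponents(graph));
```

The point of the design is that `connectedComponents` never touches `graph.nodes` or `graph.edges` directly, so swapping `TinyGraph` for another implementation with the same two methods would not require changing it.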
It's pretty boring actually, it's just a CRUD API basically. And so this is the current state of the standard library, for instance. So this will grow. You've got assertions, centrality metrics, connected components detection, generators, like if you want to generate a random graph or an Erdős-Rényi graph. You've got HITS, you've got layouts, you've got the ForceAtlas2 layout, you've got operators and utilities and so on and so on. So you see the thing. So what we want to speak about here is more about API design. Because when we set out to write the specification, we were a bit befuddled by the whole diversity of graph libraries that exist, and by how we were going to implement it: what would be the semantics of the library, how are we going to add a node, add an edge, et cetera, et cetera. And there are a lot of ways to do so. So we want to share with you what choices we made and how we came upon the API we've got today. And also what kind of issues we had and how we solved them, or I hope we solved them. So first of all, we're in JavaScript, and JavaScript is not Java. I hope you all know that, otherwise I'd have to kill you. So obviously we don't have classes for nodes and edges. So this means that there is only the graph which is actually an instance of a class. And so you don't have things like const node equals new Node. No, you don't get node instances from the graph by adding a node or something. So basically this means that the node is just a key and some attributes and that's it. So if you want to ask questions about the graph or about the structure, you will use keys, both for nodes and for edges, that's all. So the other issues we had were concerning the default graph type. Because a graph can be a lot of things, can be mixed, can be directed, simple, multi, we had to make a choice and tell you what the default graph is.
And so we made a choice and we decided that the graph is mixed by default, because most developers actually don't want to make the choice, because this question doesn't interest them at all. So by default, the graph is mixed. So you can add directed edges and undirected edges also and that's not a problem. But by default, the graph is simple. So you can't add multi-edges. So you would just create a graph, and this is actually the same thing as saying that my graph is mixed and my graph is not multi. Okay, so this means some things. Like, usually when you build a graph implementation, it's kind of useful to know whether the graph is directed or not, because you can optimize the implementation: you won't implement an undirected graph the same way you would a multi, directed and undirected graph, and so on. So if you don't want to choose, yeah, don't choose, the implementation will work. But if you want to choose because you know it, you can use typed constructors to do so, and the implementation will be able to optimize itself because it basically has the information now. So if you don't want to choose, you don't choose, but if you know, just state it and the optimization will be possible. So here for instance, we are going to instantiate a multi directed graph, which has exactly the same API as any other graph of the library. So there is no difference in the semantics of the API. For instance, in NetworkX, if you build a directed graph, you won't have the exact same API as you would with an undirected graph. So here you have the exact same methods everywhere. So this means that we have to rely on some useful error messages and hints. So if the user is going to do something which is completely stupid, for instance, adding a second A-B edge where there is already an A-B edge, the graph will tell him really politely that he's a moron. So here you have it. So we add some nodes, we add an edge, we add the same edge.
The graph will tell you that you can't really do that. You know, it's a simple graph. So if you want to have a multigraph, maybe you should use a multigraph. So this way we both have the benefit of having a standardized API for all types and of being able to explain really gently to the user that he is making some mistakes. Okay, so the other problem we had was concerning the edge keys. So it might not seem a problem, because usually you don't use keys, because the key is just the source to the target. But when you have multi-edges or multigraphs, you have the problem of being able to target a really specific edge, and you can't do it if you just have the source and the target, because you have to loop on the edges to find the one you want, and this is a bit silly. So here we told ourselves we are going to use keys for edges, because this is quite useful and you can target really specific edges. But the thing is, and we learned this the hard way on Sigma, that it's really boring as shit to ask your user to invent keys for things which don't specifically have a key. Like in a simple graph, you don't want to tell your user, okay, take your counter and increment it each time you add an edge so you will have a unique ID. This is really a pain. So at the beginning, we said, okay, we are going to have a method which is add edge, with a key, a source and a target and some attributes. And we are going to have a method which would be add edge without a key, with just source and target and attributes, but this was really boring. So we went the other way and told ourselves: if you add an edge, the graph will generate the key for you. And if you want to explicitly provide a key for the edge, just do it with this one, which is addEdgeWithKey. So the really common case is eased and the hard case is, like, swept away.
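The choices described above (mixed and simple by default, typed constructors sharing one API, a helpful error on a duplicate edge, and keys generated unless you ask for `addEdgeWithKey`) can be sketched as follows. This is only an illustration under stated assumptions: the class names `SketchGraph` and `MultiDirectedSketchGraph`, the option names `type` and `multi`, the error wording and the `generated-N` placeholder keys are all hypothetical, and edges are treated as directed to keep the sketch short.

```javascript
// Hypothetical sketch; names, options and messages are assumptions.
class SketchGraph {
  constructor({ type = 'mixed', multi = false } = {}) {
    this.type = type;   // stored only; the sketch does not optimize on it
    this.multi = multi; // false means "simple": at most one edge per source/target pair
    this.nodes = new Set();
    this.edges = new Map(); // edge key -> { source, target, attributes }
    this.generatedKeys = 0;
  }

  addNode(key) {
    this.nodes.add(String(key));
  }

  // The explicit case: the caller provides the edge key.
  addEdgeWithKey(key, source, target, attributes = {}) {
    key = String(key);
    source = String(source);
    target = String(target);
    if (!this.nodes.has(source) || !this.nodes.has(target)) {
      throw new Error(`addEdge: both "${source}" and "${target}" must be added as nodes first.`);
    }
    if (this.edges.has(key)) {
      throw new Error(`addEdge: the edge key "${key}" already exists.`);
    }
    if (!this.multi) {
      // Linear scan for brevity; an index would make this constant time.
      for (const edge of this.edges.values()) {
        if (edge.source === source && edge.target === target) {
          throw new Error(
            `addEdge: an edge from "${source}" to "${target}" already exists. ` +
            'This is a simple graph; if you need parallel edges, use a multi graph.'
          );
        }
      }
    }
    this.edges.set(key, { source, target, attributes: { ...attributes } });
    return key;
  }

  // The common case: the graph generates the key for you.
  addEdge(source, target, attributes = {}) {
    let key;
    do {
      key = `generated-${this.generatedKeys++}`; // placeholder generator
    } while (this.edges.has(key));
    return this.addEdgeWithKey(key, source, target, attributes);
  }
}

// A "typed constructor": same methods, but the type is stated up front.
class MultiDirectedSketchGraph extends SketchGraph {
  constructor() {
    super({ type: 'directed', multi: true });
  }
}

const simple = new SketchGraph(); // mixed and simple by default
simple.addNode('A');
simple.addNode('B');
simple.addEdge('A', 'B');
try {
  simple.addEdge('A', 'B'); // same edge again on a simple graph
} catch (error) {
  console.log(error.message);
}

const multi = new MultiDirectedSketchGraph();
multi.addNode('A');
multi.addNode('B');
multi.addEdge('A', 'B');
multi.addEdge('A', 'B'); // fine on a multi graph
multi.addEdgeWithKey('friendship', 'A', 'B'); // explicit key when you need to target this edge
console.log(multi.edges.size);
```

Because both classes expose the same methods, code written against one works against the other; only the error behaviour differs, which is why the messages have to carry the hint.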
But this means that you have to generate a key, and actually it's not a really easy task, because incremental IDs are really shitty: if you merge two graphs, you will have some issues like, I already have the edge one and you also have the edge one, but they are not the same, so what do you do? So you have to find a clever way to do so. So we went with the easy way, which is UUID. And in the current implementation, the edge keys are actually generated like this: we generate a UUID v4 and then we compress it in base 62 to reduce the amount of RAM used. And the really neat trick with base 62 is that you can double click on the thing and copy it really easily. Whereas in base 64, you have hyphens and underscores and shitty things like that. Yeah, you can't double click and it's a pain. So this is the most crucial point of the implementation. So what about adding and merging nodes, for instance? It could be edges also, but I'm going to speak about nodes in this example. So what does it mean exactly? When you add a node, which is John, and you add the same node again, what should happen? For instance, in NetworkX, if you do so, it works like a set. So if you do so, the graph will just tell you, okay, I won't do anything, no biggie, I already have the node. But here we've got a little bit of a problem, which is that we also have attributes. So if you do this twice, what the hell should we do? So NetworkX actually applies some magic there and will say, okay, I'm not going to add the node, but I'm going to merge the attributes. So this was a bit magical, too magical for us. So we decided upon something which is a bit different, which is: you're going to add the node.
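The key-generation idea mentioned above (a UUID v4 re-encoded in base 62) can be sketched like this. It is a minimal illustration, not the reference implementation's code: the function names `uuidToBase62` and `generateEdgeKey` and the ordering of the alphabet are assumptions. It relies on Node.js's built-in `crypto.randomUUID()` and on `BigInt`.

```javascript
// Sketch only: function names and alphabet ordering are assumptions.
const { randomUUID } = require('crypto');

// 62 symbols: digits, uppercase and lowercase letters. No hyphen or
// underscore, so a double click selects the whole key.
const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Re-encode the 128 bits of a UUID string as a base 62 string.
function uuidToBase62(uuid) {
  let value = BigInt('0x' + uuid.replace(/-/g, ''));
  if (value === 0n) return ALPHABET[0];
  let encoded = '';
  while (value > 0n) {
    encoded = ALPHABET[Number(value % 62n)] + encoded;
    value /= 62n; // BigInt division truncates
  }
  return encoded;
}

function generateEdgeKey() {
  return uuidToBase62(randomUUID());
}

console.log(generateEdgeKey());
```

A 128-bit value needs at most 22 base 62 characters, compared with 36 characters for the usual hyphenated hexadecimal form, which is where the memory saving comes from.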
If you add the node twice, we're going to yell at you by telling you that you are doing something which is really silly, and this is actually quite useful because, for instance, when you deserialize data and you process this data and you notice that you have the same node twice whereas you should not have it, it's quite useful to have an error rather than just silently doing nothing. And so if you want to explicitly merge something, you are going to use merging methods, like the merge node method, which actually will add the node if it does not exist, will not yell if the node already exists, and will merge the attributes as you would expect the method to do. So instead of doing some magic, we chose to yell at the user and tell him to do some things explicitly rather than implicitly. So what is a key in the graph? This one was a bit long to solve, but I'm going to go a bit faster because we're running out of time. So what should be a key? In NetworkX for instance, you can have anything as a key. So a key can be a reference, a key can be a string, et cetera, et cetera. So here we chose to go more the JavaScript way, for multiple reasons. So only a string can be a key. So if you pass an integer, it will be coerced and cast into a string, like an object key would be. Why do we do that? Because it's more JavaScripty. It's really easy to serialize, because serializing an object is a pain. And that's mostly it. And, yeah, that's it. So the other problem we had was that we needed events, because we might use this library in a rendering context and we might be interested in some really focused information, for instance: this node was updated and the color of this node was updated. This is important for two reasons: rendering, and keeping indices synchronized. So we have events. And so because we have events, we obviously need to have some setters and getters, because otherwise we are not able to know that something was mutated.
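The three decisions above (adding a node twice is an error, merging is explicit, keys are coerced to strings) together with the reason for setters (mutations must go through a method so that an event can be emitted) can be sketched as below. This is a hedged illustration, not the specification: the class name `EventedGraph`, the event names `nodeAdded` and `nodeAttributesUpdated`, the payload shapes and the error wording are assumptions chosen for the sketch. It uses Node.js's built-in `EventEmitter`.

```javascript
// Sketch only: class name, event names and payloads are assumptions.
const { EventEmitter } = require('events');

class EventedGraph extends EventEmitter {
  constructor() {
    super();
    this.nodes = new Map(); // node key (always a string) -> attributes
  }

  // Strict: adding an existing node is an error, never a silent no-op.
  addNode(key, attributes = {}) {
    key = String(key); // 1 and '1' are the same key, as with object keys
    if (this.nodes.has(key)) {
      throw new Error(`addNode: the node "${key}" already exists. Use mergeNode to merge attributes explicitly.`);
    }
    this.nodes.set(key, { ...attributes });
    this.emit('nodeAdded', { key });
    return key;
  }

  // Explicit merge: adds the node if missing, otherwise merges attributes.
  mergeNode(key, attributes = {}) {
    key = String(key);
    if (!this.nodes.has(key)) return this.addNode(key, attributes);
    Object.assign(this.nodes.get(key), attributes);
    this.emit('nodeAttributesUpdated', { key });
    return key;
  }

  getNodeAttribute(key, name) {
    return this.nodes.get(String(key))[name];
  }

  // Going through a setter is what lets the graph notify listeners.
  setNodeAttribute(key, name, value) {
    key = String(key);
    if (!this.nodes.has(key)) {
      throw new Error(`setNodeAttribute: the node "${key}" does not exist.`);
    }
    this.nodes.get(key)[name] = value;
    this.emit('nodeAttributesUpdated', { key, name });
  }
}

const graph = new EventedGraph();
graph.on('nodeAttributesUpdated', ({ key, name }) => {
  // A renderer or an external index would react here.
  console.log('updated:', key, name);
});

graph.addNode('John', { age: 34 });
graph.mergeNode('John', { color: 'red' }); // explicit merge, no error
graph.setNodeAttribute('John', 'color', 'blue');

graph.addNode(1);
console.log(graph.nodes.has('1')); // the integer key was stored as a string
```

If attributes were mutated directly on a plain object, the graph would have no way of knowing, which is the trade-off behind accepting a slightly more verbose getter/setter API.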
So it's not Java, but it's a little bit Java anyway. So you want an attribute: get node attribute. You pass the node key, the name of the attribute, you get it. You want all the attributes, just do that. You want to set an attribute, the same thing, et cetera, et cetera. So basically quite simple, but this doesn't mean that we have to be stupid about Java. So basically nobody should have to write this kind of thing, because to increment a counter you would have to say, like, graph, set the fucking node attribute for this key, for this counter, equals graph, get the fucking node attribute. It's silly. So we went with an OOFP approach, object-oriented functional programming, which is: update the node attribute and just pass a function to apply the change. And this actually has a cool side effect, simpler iteration semantics, because at the beginning we were questioning ourselves about, like, if you iterate on the nodes, what should we provide to users? The node key plus the attributes plus some other information? But it was a bit strange and hard to grasp. So we just said, okay, you'll just have the keys, and then you'll ask the graph for other information if you need it. So you save up some memory, you save up some semantics, and that's a bit better, I guess. And so the last point we tackled during the design of the API was another piece of information, which is: what should we do with labels, weights and so on, because graph theory is full of those kinds of special attributes. And so because graph theory fosters an environment where you have a lot of variation around a common terminology, like you would say a node or a vertex or something, we just said, okay, they are attributes like any other and that's all you have. And so all the library is made while thinking about this, and if you want to call your weights with a shitty name, you can just provide a configuration and you should be okay. So this leads us to how we implemented the reference library.
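The contrast described above, between the verbose get-then-set increment and passing an updater function, can be sketched as follows. This is only an illustration: the class name `CounterGraph` and the `visits` attribute are hypothetical, and the method names simply mirror the spoken "get node attribute", "set node attribute" and "update the node attribute".

```javascript
// Sketch only: class name and attribute name are assumptions.
class CounterGraph {
  constructor() {
    this.nodes = new Map(); // node key -> attributes
  }

  addNode(key, attributes = {}) {
    this.nodes.set(String(key), { ...attributes });
  }

  getNodeAttribute(key, name) {
    return this.nodes.get(String(key))[name];
  }

  setNodeAttribute(key, name, value) {
    this.nodes.get(String(key))[name] = value;
  }

  // The updater receives the current value and returns the new one.
  updateNodeAttribute(key, name, updater) {
    const attributes = this.nodes.get(String(key));
    attributes[name] = updater(attributes[name]);
  }

  // Iteration only yields keys; ask the graph for anything else.
  forEachNode(callback) {
    for (const key of this.nodes.keys()) callback(key);
  }
}

const graph = new CounterGraph();
graph.addNode('John', { visits: 0 });
graph.addNode('Susie'); // no counter yet

// The verbose way: read, compute, write back.
graph.setNodeAttribute('John', 'visits', graph.getNodeAttribute('John', 'visits') + 1);

// The updater way; the default parameter covers a missing attribute.
graph.forEachNode((key) => {
  graph.updateNodeAttribute(key, 'visits', (count = 0) => count + 1);
});

graph.forEachNode((key) => {
  console.log(key, graph.getNodeAttribute(key, 'visits'));
});
```

Handing the updater the current value keeps the read and the write in one call, so the graph sees a single mutation per update.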
So how did we implement the specification in JavaScript, for anyone to use? This is the reference implementation that you can download and use by doing npm install graphology. And so the main issue we had was concerning constant time versus memory, because we don't really know what the users are going to do with the graph. We can't really aggressively optimize for some use case. So we decided we are going to support everything in constant time. So this means: if you add a node, constant time. Delete a node, constant time. You want to find out if an edge exists, constant time. So this means we have some memory overhead to be able to provide this kind of fast operation. And so the actual data structure is like this. We have two maps, which were introduced in ES6, which are going to store a node key-value index and an edge key-value index. And we have a lazy indexation of neighbors, which means that we won't build the information about the neighbors if we don't need it. So the node map will store this kind of thing: degrees, attribute data and a lazy neighborhood index. So this means that if you have to check, like, what are my neighbors, it will compute the index and then you'll have it, but not before the first time. And so the index of neighbors looks like this. So it's actually a sparse matrix. It's a really simple sparse matrix. So you've got, like, the node A has got out-neighbors which are B and C and so on. And you've got a set of the edges which are related to the path A to B, et cetera. And the only optimization we run is that this set is actually the same reference as this set. So it means that A out B is actually the same as B in A. And so the edge map is quite similar. You've got the source, the target, the directedness and some attribute data. And so I'm sure that someone can find something better. So please bash me and help me; you'll find something a bit interesting.
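The data structure described above (two ES6 Maps, a neighbor index that is only built on first use, and one shared Set for "A out B" and "B in A") can be sketched as follows. It is a simplified illustration rather than the reference implementation's actual code: the class name `IndexedGraph`, the field names and the placeholder edge keys are assumptions, only directed edges are handled, and arguments are not validated.

```javascript
// Sketch only: names and layout are assumptions; directed edges only.
class IndexedGraph {
  constructor() {
    this.nodes = new Map(); // node key -> { attributes, inDegree, outDegree, in, out }
    this.edges = new Map(); // edge key -> { source, target, attributes }
    this.neighborsIndexed = false; // the neighbor index is built lazily
    this.nextKey = 0;
  }

  addNode(key) {
    this.nodes.set(String(key), {
      attributes: {},
      inDegree: 0,
      outDegree: 0,
      in: new Map(),  // neighbor key -> Set of edge keys
      out: new Map(), // neighbor key -> Set of edge keys
    });
  }

  addEdge(source, target) {
    source = String(source);
    target = String(target);
    const key = `e${this.nextKey++}`; // placeholder key generation
    const edge = { source, target, attributes: {} };
    this.edges.set(key, edge);
    this.nodes.get(source).outDegree++;
    this.nodes.get(target).inDegree++;
    // Keep the index in sync only if somebody already asked for it.
    if (this.neighborsIndexed) this.indexEdge(key, edge);
    return key;
  }

  indexEdge(key, { source, target }) {
    const sourceData = this.nodes.get(source);
    const targetData = this.nodes.get(target);
    let edgeSet = sourceData.out.get(target);
    if (!edgeSet) {
      edgeSet = new Set();
      sourceData.out.set(target, edgeSet);
      targetData.in.set(source, edgeSet); // same Set object, stored once
    }
    edgeSet.add(key);
  }

  // Build the whole index the first time a neighborhood question is asked.
  ensureIndex() {
    if (this.neighborsIndexed) return;
    for (const [key, edge] of this.edges) this.indexEdge(key, edge);
    this.neighborsIndexed = true;
  }

  // Two Map lookups once the index exists.
  hasEdge(source, target) {
    this.ensureIndex();
    const data = this.nodes.get(String(source));
    return data !== undefined && data.out.has(String(target));
  }

  outNeighbors(node) {
    this.ensureIndex();
    return [...this.nodes.get(String(node)).out.keys()];
  }
}

const graph = new IndexedGraph();
['A', 'B', 'C'].forEach((key) => graph.addNode(key));
graph.addEdge('A', 'B');
graph.addEdge('A', 'C');

console.log(graph.neighborsIndexed); // nothing asked for neighbors yet
console.log(graph.outNeighbors('A'));
// "A out B" and "B in A" point at the very same Set:
console.log(graph.nodes.get('A').out.get('B') === graph.nodes.get('B').in.get('A'));
```

The trade-off is the one stated in the talk: the nested Maps and Sets cost memory, in exchange for lookups that do not require scanning the edge list.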
And so the last issue we will tackle is the case of undirected edges. How do you store them? So in memory, we just keep an implicit direction. So if you say add edge A, B, we will store, like, undirected in A, undirected out B. And if you say B, A, it will change the internal representation of the edge in memory. So two equivalent graphs may have different memory representations. And for some people, it seems to be an issue. So I want to have feedback on that. And so should we sort the source and target keys? Should we hash them? I don't know. I really don't know. So please bash me again. And if you want more details on the implementation, please do read the code. Everything is open source and everything is accessible and you can contribute. Yeah, so the future roadmap now, really quickly. Yeah, Sigma: we will redevelop a full Sigma without a graph model, which is kind of good news for us, and with a much smaller, specific functional scope, which is just doing rendering management and interaction management, which is really nicer for me. So no more "yeah, I need to implement PageRank to render graphs", which is kind of nonsense; we will have graphology to end all that. And just some quick community notes. Since Guillaume is actually maintaining it more than I do, we'll try to adopt a more community-oriented workflow with an actual roadmap, put this in a GitHub organization and try to have more frequent updates, actually. And so for future ideas: should we try to support hypergraphs? Does someone need it? Does someone need an immutable version? Like, it's easy to write using Immutable.js or Mori. We would like to have TypeScript definitions and so on, because it's all the rage right now because, like, static types, yeah. And thank you. So this is all a work in progress. So please bash us and help us achieve a more useful tool. Yeah, thank you. Yeah. What was the biggest graph you tried to build?
Sorry, I can't hear you. What was the biggest graph that you used to test it? Yeah, last time I checked I had a graph with like one billion nodes, but it was not really representative because it did not have a lot of attribute data. So it will store a lot of data, but not more than you can store in a Map in Node.js, for instance. So it will run out of memory quite easily because of Node and things. Actually, it could be the object of a custom implementation, because it could be quite easy to do, but not for the reference implementation. But if you can't implement the specification while being able to implement your own compression, then we failed. And so maybe we failed, but I think you can do it. What? Yeah. Can you speak louder? What about the layouts of the graphs? If I want to render with ForceAtlas, the output of the [unclear] layout. Yeah, so there is a stub of a library which does those kinds of layouts. So the rendering is a different question, which will be handled with other libraries like Sigma.js and so on, but for the layouts, for now, [unclear].