So hello everyone. First thing: who knows about data structures in the room? Can you show your hand? Okay, that's fine. So this talk is called "Developing data structures for JavaScript". I'm really glad to be here, back again in the JavaScript devroom, and I hope I will be able to entertain you during the next 20 minutes.

So who am I? I'm messing up. Ah, okay. So the subject of this talk is why and how to implement data structures for JavaScript, and why that is a thing.

So who am I? I'm Guillaume. I go on the internet by the infamous name of Yomguithereal, which is a shitty name, but that's life. I'm a research engineer working for a research lab in Paris, which is called the médialab.

So what's a data structure, just to be sure we stand on the same ground? A data structure is a way to organize and move data around in the computer so that we are able to query and update it efficiently. Basically, when you deal with computers, you can sometimes trade some memory space to be able to do computation faster, and this is what data structures are about. For instance, in JavaScript you have a lot of data structures: you have arrays, you've got objects, sets, maps and so on.

We will start with some quotes from the internet. "Web development is not real development and is henceforth easier." So this is bullshit. "Web development is trivial, and web developers don't need fancy data structures or any solid knowledge in algorithms." So someone is obviously still wrong. The point of my talk is to show you that we can use data structures in JavaScript, that it's a good thing, and that if you all do that you will lead a happier life.

So the question is: don't we already have data structures in JavaScript? We have arrays, we have objects, and now with ES6 we've got maps, we've got sets. So why the hell do we bother with custom data structures? Isn't this really enough?
So why do we want other data structures? The first point is that it's convenient. Like any other kind of abstraction, having data structures is convenient, and it makes it easy for you to do some heavy bookkeeping using those custom data structures. I will use an example to clarify this point.

For instance, you can implement something which is called a multiset. A multiset is a set in which you can store an item more than once. If you do some basic JavaScript and you need to count the number of occurrences of each item in a sequence, you can write this nasty code: you iterate over the list, you check whether your item has already been seen, if not you set its count to zero, then you increment it, and so on. This is bookkeeping: you have to track a lot of things. Whereas if you have a multiset, you can just do that: you have abstracted this complexity away, and you are able to delegate the bookkeeping to the custom data structure. So this is a first example.

Another example would be more complex data structures, because the multiset is really, really easy and straightforward. For instance, say you want to implement a graph. A graph is nodes connected by edges. This is quite doable if you only use arrays and objects, but you have to keep a lot of indexing running correctly and smoothly. For instance, if you want to check which nodes are the neighbors of a node, you have to index that. And if you do it by hand, and you can do it, I won't prevent you from doing so, it's a bit messy.
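Going back to the counting example from a moment ago, here is a minimal sketch of the difference between doing the bookkeeping by hand and delegating it. The `MultiSet` class below is a hypothetical, stripped-down illustration written for this transcript (the names `add` and `multiplicity` are assumptions), not the API of any particular library.

```javascript
// Manual bookkeeping with a plain object: check, initialize, increment.
function countWithObject(items) {
  const counts = {};
  for (const item of items) {
    if (counts[item] === undefined) counts[item] = 0;
    counts[item]++;
  }
  return counts;
}

// Hypothetical minimal multiset: the same bookkeeping, hidden behind an interface.
class MultiSet {
  constructor() {
    this.items = new Map(); // item -> number of occurrences
    this.size = 0;          // total number of stored occurrences
  }

  add(item) {
    this.items.set(item, (this.items.get(item) || 0) + 1);
    this.size++;
    return this;
  }

  multiplicity(item) {
    return this.items.get(item) || 0;
  }
}

const words = ['apple', 'pear', 'apple', 'kiwi', 'apple'];

const bag = new MultiSet();
for (const word of words) bag.add(word);

console.log(countWithObject(words).apple, bag.multiplicity('apple'));
```

The calling code only says "add this item"; whether the counts live in a `Map`, an object or something else becomes an implementation detail.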
It's a bit hard to do, and you will forget things, you will stumble on issues and so on. So you can delegate the bookkeeping to a custom structure and it will do that for you. What's more, it's usually better to have a good and legible interface. For instance, here you've got an example taken from the library graphology, which implements graphs in JavaScript. You can just ask for the out-neighbors of a node, you can just iterate over the edges of a node, etc. So it's quite easy, and all this is done in constant time, so you don't have to bother about indexing things. So the first point is: it's good because abstraction is a good thing.

The second point is that if you only use arrays and objects, you will mess up, because it's not good enough. Sometimes you have to develop things which are a bit more complex. That's just because nowadays JavaScript and the web are not something for script kiddies anymore. Node.js became a thing, we have to process a lot of data on the client to be able to power useful applications, and sometimes you have algorithms which cannot be implemented without the use of custom data structures. Dijkstra, for instance, for those who know it.

I will show you a concrete example. Let's say you have an HTML5 canvas and you draw points on the canvas. Now you just want to answer this precise question: my user has their mouse on the screen, and you want to know if the mouse is over a node, a point. This seems trivial, and in the DOM it's really easy to do, but in canvas the naive approach is to test all your points in an array and say: is this node under my cursor? No. Is this node under my cursor?
No. And so on and so forth. So the more points you have, the longer it will take. Then you will have to implement a structure which is called a quadtree, which is a recursive partition of the space, and you will just descend recursively into the tree to find the nodes you need to check. So basically it will change your linear-time access into something which is more like logarithmic-time access.

So that was the second point. The first point is that bookkeeping and abstraction are good. The second point is that arrays and objects will only get you so far, and you will need custom data structures.

So the question is: what are the challenges when you try to implement data structures in JavaScript? Because you could implement data structures the old way, like in C, C++ and so on, and it's quite easy to do, but in JavaScript you've got some traps and some pitfalls that you need to avoid, so let's take a step back to be sure what the challenges are.

The first one is that we JavaScript developers are handling a language which is interpreted. It's not compiled. So we are far from the metal, and we can't really know what's happening. We have no control over the memory layout and how the memory is organized by the interpreter, and we also have no control, or almost none, over garbage collection. Garbage collection is a system that cleans unused memory automatically for you, and it's quite painful to write something that controls the logic of this, because it will slow down your code.

Another challenge is that we all use just-in-time compilation schemes and optimizing engines, such as Gecko in Firefox and V8 in Chrome, and those have their own logic. They will transform your code into something that is really alien to you, and you don't have any control over what they will do. So how do you manage
to implement something which is efficient?

So basically the gist here is that benchmarking code accurately in JavaScript is pretty hard. But it doesn't mean that we cannot do it, and it doesn't mean that we cannot be clever about it. There are a lot of people on the internet who will argue that since we cannot know anything, it doesn't matter anymore, and you don't have to optimize code because all this is pointless and we are all going to die in the void and oblivion. So don't do that.

I'm going to give you some generic implementation tips on how to implement data structures efficiently in JavaScript without being killed by the engine, and we are going to try to outsmart the engine in a way.

First thing: minimize your lookups. What is a lookup? A lookup is when you need to access an object property, or when you need to access a key in a map or in a set, etc. Those things are the most fucking costly thing in JavaScript, so if you minimize them you will go up in performance. For instance, here you've got an example where, in the graph, I just want to check some attribute of a node, and the nodes are stored in a map. What I do is that I first check that the map has the node, to be sure I'm not doing something which is not possible, and then I get the node. So there I did two lookups. This is bad. Here, I only made one, because I just get the node. I infer from the fact that the data is undefined that the node doesn't exist, and that's all. So I made one lookup. And if you do a quick benchmark, you will see that it's quite straightforward: two lookups is 30 milliseconds and one lookup is 15.
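The two variants just described can be sketched as follows. This is a minimal illustration, not the talk's actual slide code: the `nodes` map, the `attributes` field and both function names are assumptions, and the single-lookup version only works if `undefined` is never stored as a value in the map.

```javascript
// Hypothetical node storage: key -> node data.
const nodes = new Map([
  ['a', {attributes: {color: 'red'}}],
  ['b', {attributes: {color: 'blue'}}]
]);

// Two lookups: one for has(), one for get().
function getAttributeTwoLookups(nodes, key, name) {
  if (!nodes.has(key)) throw new Error(`node "${key}" not found`);
  return nodes.get(key).attributes[name];
}

// One lookup: get() once and infer absence from undefined.
// Assumption: undefined is never a legitimate stored value.
function getAttributeOneLookup(nodes, key, name) {
  const data = nodes.get(key);
  if (data === undefined) throw new Error(`node "${key}" not found`);
  return data.attributes[name];
}

console.log(getAttributeTwoLookups(nodes, 'a', 'color'));
console.log(getAttributeOneLookup(nodes, 'b', 'color'));
```

Both functions behave the same from the caller's point of view; how much time the second one actually saves depends on the engine and its version, so it is worth measuring on the targets you care about.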
So it's about half of the time. The point here is that the engine is really clever, but it's not that clever. It's still a bit dumb. It improves frequently, though, so you have to benchmark things to be sure that you are not doing something stupid. So the approach which goes "oh, I'm just going to write bad code and the engine will clean up for me and make it blazing fast" is stupid. It won't work. So first thing: don't do too many lookups.

The second tip is that creating objects and allocating memory in JavaScript, like in any language, is very costly. So avoid allocating objects when you don't need to. Avoid recreating regexes, for instance: if you create regexes in a function, you will have bad issues. And avoid nesting functions when you can. Concretely it looks like this. This is bad. This is good. So just move the regex outside the function so that it won't be recreated each time. Here it's very, very bad: you have an array forEach, and inside it you iterate again, and because you are nesting the loops you are going to create one function per fucking element in your array.

Third thing: mixing types is really bad in JavaScript. Don't mix types. Here you've got an example, which is a bit distorted. Oh, that's too bad. Okay, so that's a shitty array anyway. We've got an array which contains numbers, strings, strings looking a lot like numbers, and this will mess up V8 for instance. You've got a regex and you've got an object.
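The allocation tips above (hoisting the regex, and not creating functions inside a loop) can be sketched like this. The function names and the pattern are hypothetical, chosen only for illustration, and how much each version costs in practice depends on the engine, so treat this as something to benchmark and not as a guaranteed speedup.

```javascript
// Regex literal inside the function: a new RegExp object is created on every call.
function isIdentifierInline(string) {
  const pattern = /^[a-z_][a-z0-9_]*$/i;
  return pattern.test(string);
}

// Regex hoisted outside: created once and reused.
// (No "g" flag here, so the shared regex keeps no state between calls.)
const IDENTIFIER = /^[a-z_][a-z0-9_]*$/i;

function isIdentifierHoisted(string) {
  return IDENTIFIER.test(string);
}

// Nested forEach: the inner callback is a new function for every row.
function sumNested(rows) {
  let total = 0;
  rows.forEach(function (row) {
    row.forEach(function (value) {
      total += value;
    });
  });
  return total;
}

// Plain loops: no function is allocated while iterating.
function sumLoops(rows) {
  let total = 0;
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    for (let j = 0; j < row.length; j++) total += row[j];
  }
  return total;
}

const rows = [[1, 2, 3], [4, 5], [6]];
console.log(sumNested(rows) === sumLoops(rows));
```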
So if someone here does that, I want to meet you, honestly, because I don't understand why you would want to do that.

The next tip, and this is actually my favorite one, is the poor man's malloc. malloc in C is a way to allocate some piece of contiguous memory. The gist is the following: in JavaScript, recently, we got a new shiny thing, which are byte arrays, typed arrays. This means that you can allocate an array of n elements using the number type you want. For instance you have Uint8Array, you've got Float32Array, and you have a lot of other types. You can simulate a kind of memory allocation with that, so you can be clever about it and cheat a little bit. My point here is that we can implement our own pointer system. You can have your own C in JavaScript, in a way, and this will speed things up and make memory really lighter.

Let's use a concrete example to explain that, because I guess it's a bit obscure. Who here knows what a linked list is? Okay, that's fine. A linked list is just nodes linked to one another, with a pointer which points to the next item. Here you've got a basic list: A to B, B to C, C to the void and the oblivion of life. And under the hood you've got object references as pointers, because in JavaScript you don't have C pointers and so on, so the only way to simulate this is to use object properties. Basically any sane JavaScript person would do it this way: create a node, which is a kind of class, and have a next property which will be a reference to the next node. If you need to change a pointer, you just take node.next and assign the thing.
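As a rough sketch of the two ways to link items (object references versus home-made pointers stored in a typed array), consider the following. `IndexedList`, its fields and its `push`/`toArray` methods are hypothetical names for this illustration, and because the pointers live in a `Uint8Array`, this version assumes a capacity of at most 256 items.

```javascript
// Object references as pointers: one allocated object per item.
class Node {
  constructor(value) {
    this.value = value;
    this.next = null;
  }
}

function nodesToArray(head) {
  const out = [];
  for (let node = head; node !== null; node = node.next) out.push(node.value);
  return out;
}

// Home-made pointers: an item is an index, `next` maps an index to the following index.
class IndexedList {
  constructor(capacity) {
    this.values = new Array(capacity);
    this.next = new Uint8Array(capacity); // indices must fit in 0..255
    this.head = 0;
    this.tail = 0;
    this.size = 0;
  }

  push(value) {
    if (this.size === this.values.length) throw new Error('list is full');
    const pointer = this.size; // next free slot
    this.values[pointer] = value;
    if (this.size === 0) this.head = pointer;
    else this.next[this.tail] = pointer;
    this.tail = pointer;
    this.size++;
    return this;
  }

  toArray() {
    const out = [];
    let pointer = this.head;
    // There is no null pointer in a Uint8Array, so we count items instead.
    for (let i = 0; i < this.size; i++) {
      out.push(this.values[pointer]);
      pointer = this.next[pointer];
    }
    return out;
  }
}

const a = new Node('A');
const b = new Node('B');
const c = new Node('C');
a.next = b;
b.next = c;

const list = new IndexedList(3);
list.push('A').push('B').push('C');

console.log(nodesToArray(a), list.toArray(), Array.from(list.next));
```

The second version allocates its arrays once, up front; after that, linking items only writes small integers into existing memory.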
This is the sane way to do things. This is the insane way to do things, but it's way faster. Basically we are going to do a linked list, but we are going to roll our own pointers. You keep an index which will be the head, and you keep an array in which you will have your values, for instance A, B, C. And in another array, which will be a typed array (here you only need a Uint8Array), you will have an index pointing toward the next item in the list, for instance 1, 2, 0. This means that if you need to check which is the next item after B, you check the index stored for B in next, which is 2, and then you check index 2 in the values. It's here, so the next item is C. This is how you can try to implement your own C pointers in JavaScript.

Let's use a more concrete example to tell you why you would do that, because linked lists are quite useless in most languages, but you can do nice things with this. I don't know if you know about a structure which is called an LRU cache. An LRU cache is an object which has a fixed capacity, in which you cannot set more than a fixed number of keys. It's a good thing when you have a constrained environment, when memory is really critical and you need to save some RAM. The idea is that you only want to keep the most recently used keys in the dictionary or in the object, so you can evict the other ones from the map. For instance, if we add a new key and the object is already full, we are going to throw away the least recently used one, and this is why it's called an LRU cache.

To implement an LRU cache, you have to maintain an object mapping keys to values, and you have to maintain a doubly linked list in which you can go forward and backward. Because when you add a new thing and you need to throw something away, you are going to take the last element and pop it away, and take the new element and put it on top of the list.
In the same way, if you get an element from the list, you are going to take it away from the list, put it back at the front and so on, and so you can maintain a list of used items ordered from most to least recently used. What you would do in this precise case is have a pointer to the head, a pointer to the tail, an array of pointers to the next item, and an array of pointers to the previous item. You only keep the items as a JavaScript object mapping the key to the pointer, where the pointer is just the index, and you keep your values in an array. It may seem pointless, but it's not. Here you've got this approach used, and it kind of beats everything that was made before. The really good advantage is that it does not allocate, you don't have garbage collection, and it's really, really light in memory. So that was the most interesting tip.

To go fast on the last tips: function calls are costly in JavaScript, like in any language. Everything is costly, life is hard, so don't worry. This means that recursion is usually a worse idea than using iterative versions with stacks. For instance, this recursive scheme to traverse a binary tree is actually slower than doing this strange alien thing. But your mileage may vary, so please benchmark it, because sometimes it is the case and sometimes it's not. That points back to the challenges from earlier.

Okay, and last thing: what about WebAssembly and so on? Because we say "oh yeah, JavaScript is so fast", but we have faster things. You've got lots of shiny options. You have asm.js, you've got WebAssembly, and in Node.js, for instance, you can just use native C++ code and optimize things. But for data structures, if you need to keep a bridge to JavaScript and be able to use, say, a set from the JavaScript side, the issue is that communication between those and JavaScript has a really heavy cost. If you need to do a lot of computation on the WebAssembly side, it might be a good perf boost. If you need to call back and forth between WebAssembly and JavaScript really, really fast, it will slow you down. It's improving, and for instance in Firefox this kind of performance went really up, but we are not there yet. So either you do everything in WebAssembly or you don't.

Okay, so as a conclusion, and to wrap up all we said and learned, some parting words. Yes, optimizing JavaScript is hard, but that does not mean we cannot do it. Please do it; we can do it. Most tips I showed you here are applicable to any kind of high-level language, mostly, but JavaScript has its own kinks. For instance, the byte array tip does not work in Python: if you try to use lists in Python to simulate this kind of pointer, it will go badly, and if you use NumPy it's even worse, because you have the bridge between native code and Python code.

So the gist, as a conclusion: to be efficient, your code must be statically interpretable. If you do that, the engine will have no hard decision to make, and if the engine has no hard decision to make, it will safely choose the best path to optimize your code. To rephrase: optimizing JavaScript is squinting a little and pretending really hard that the language is statically typed and that the language is low-level. If you do that, you will go fast. So just pretend that JavaScript is C and everything will roll.

Okay, and the next frontier, if you want to improve the state of the art, is that for now nobody has been able to beat associative arrays in JavaScript. It means that you cannot go faster than a map.
You cannot go faster than the object. It's not possible yet when doing key-value association. But maybe with some kind of trees or some clever hashing schemes we may be able to beat some native optimizations on the JavaScript side by being clever. So please implement away, and use all those tips to flourish and make new data structures, so we can all go lead a happier life.

Some references to wrap up all those things. All the examples I showed were taken from the following libraries: you've got mnemonist, which is a library implementing a lot of data structures in JavaScript with fancy APIs and TypeScript. You've got graphology, which deals with graphs, and sigma.js, which is a graph rendering engine in the browser using WebGL and so on. So that's it, basically. Thank you a lot. Any questions? This one.

Thank you very much for the presentation. I have two questions. What's your opinion about Babel and TypeScript? Because these are transpilers, and they transpile newer ECMAScript data structures to ECMAScript 5. What's your opinion about the performance?

Yeah, so I heard TypeScript, but what was the other one?

Babel. It's a transpiler.

What?
Okay, so basically the gist is that TypeScript will have no impact on your runtime, because it only acts at compilation time. But if you want to write performant code, don't use Babel, because Babel will by default use some kind of helpers to ensure that the spec is respected, and sometimes those helpers are functions which are costly, and you don't have to pay for that. So either you transpile using the loose option in Babel, or you don't use it and you write ECMAScript 5 code, basically.

And my second question: what's your opinion, because Medium has a lot of articles about how ECMAScript 6 hurts performance compared to using only ECMAScript 5?

I'm not sure I get it. So your point would be that some people say that if you use ECMAScript 5...

Yes. There are a lot of articles, for example, about using forEach versus...

Yeah, sure, sure. So it's not about ECMAScript 5. It's more a generic thing, which is that if you use forEach, performance is usually bad, because a loop will always be faster. But it's not related to ECMAScript 6 or 5. It's more that function calls are costly, so don't use them.

A follow-up on that: since function calls are costly, should we not use map and reduce that often?

I use them in application code because, like everyone, I don't like writing for loops. It's a harsh way to do things. But when I write optimization-critical code, I don't. It's somewhere in between, though, because in V8 map recently became really fast, so there it's okay. But if you need to ship code which will work across different Node versions, or in Gecko and so on, you don't use map, basically. That's all.

Any other question?

Hi, thank you for the presentation. Just one question, similar to the one already asked about the Babel transpiler. So yes, if you use a transpiler like that, it really hurts performance, but the code could be more maintainable.
And also, some of your other solutions look smart, but where's the boundary between performance and maintainability of a project, especially if more people are working on it?

Sure. The boundary is yours to draw, but basically data structures need to be optimized, like really optimized. So I will write code which is less maintainable in a way, but that's the cost I have to pay to be sure that it's the best. For instance, if I write application code, I will never do this kind of silly thing. I don't do that. So for data structures, it's okay to have this kind of weird code, but for anything other than that, I guess it's not an issue.

Last question, somebody?

Hey, uh, do you use like some