Hi. I'll be talking about immutable data trees in JavaScript today. First, a bit about myself. My name is Szymon Witamborski, and this is where you can find me. Apart from the country I come from, I basically come from the Clojure world. At university we were allowed to write our assignments in any language we wanted, so I used Clojure. Most of the assignments had a GUI, so I wrote a GUI library, and it ended up looking a lot like HTML, CSS, and JavaScript. That's actually what brought me into web development: I figured, why not just use the real thing, it's already there. So, a few things about Clojure: it's a functional programming language and it has very cool immutable data structures. Today we'll only be worried about the immutable data structures, and we'll actually be stealing them from Clojure. Whoops, I think we're getting ahead of ourselves. So why would we want immutability in our programs? One reason is to enforce separation between functions, modules, or third-party code. That pretty much explains itself: we don't want anybody messing with our stuff. We want to avoid the side effect of one thing changing the state of another. If you pass a value or a collection from function to function, you don't really want the other function to modify the state of your object, and you probably want to enforce that somehow. The same goes for other modules, or for third-party code such as widgets that also run on your website. And why is immutability about security? If there's third-party code that we don't necessarily trust, we want immutability to forbid it from changing our data in any event. If we want to actually enforce that, we might just copy the data and give the other party the copy, so it won't actually mess with our internal state.
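To make the "hand them a copy" idea concrete, here is a minimal sketch in plain JavaScript; the data and the `copyForSharing` helper are made up for illustration:

```javascript
// Hand third-party code a deep copy so it cannot touch our internal state.
// A JSON round-trip is a simple deep copy for plain, JSON-safe data
// (no functions, Dates, or cycles); structuredClone is a modern alternative.
function copyForSharing(data) {
  return JSON.parse(JSON.stringify(data));
}

const internalState = { user: { name: "Ada" }, items: [1, 2, 3] };
const shared = copyForSharing(internalState);

// Even if the third party mutates its copy...
shared.user.name = "Mallory";
shared.items.push(4);

// ...our internal state is untouched:
console.log(internalState.user.name);    // → "Ada"
console.log(internalState.items.length); // → 3
```

The obvious cost is that every hand-off pays for a full copy, which is exactly what the structural sharing described later avoids.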
But we might be able to avoid the copying by using smart immutability, which is what I'll explain today. What I just described are basically different ways of sharing state. When we pass data from one place to another, be it a function, a module, or third-party code, we can do it with a convention: a set of rules that we agree to abide by in our organization. One convention for sending mutable data could be that the data belongs to the sender, and the receiver shouldn't modify it. So whenever I receive some data, I assume I can't change it, and I have to make my own copy if I want to do anything with it. Another convention could be that the data belongs to the receiver, and the sender shouldn't modify it: whenever we send data, we assume it now belongs to the receiver, so it's the sender who has to create a copy if he wants to keep working with it. The third convention is that both sender and receiver can modify the data. That's a pretty bad strategy, because sender and receiver are now coupled very closely together: you have to take into account that the other module can change your data at any time, which is just asking for bugs. So you probably want number one or number two. But I'm not really a big fan of conventions, because somebody will break them at some point. A newcomer will probably break them first thing, because they don't know about them yet. You will probably break them yourself at some point, because you forgot, or had a bad day, or had to rush to fix something very quickly. So there's one more problem with third-party code: you and your organization may agree to these conventions.
But somebody writing third-party code that runs on your website isn't bound by them, even if it's just a library and not a widget. Fortunately, we have these things called computers, and we can make computers enforce our conventions instead of making people remember them. That's a win: we don't have to remember the rules, they are enforced, and everybody who comes in abides by them because the computer gives them no other way. So, back to ways of sharing state between things. One option is Object.freeze, which ES5 gives us. It forbids anyone from modifying the thing. But there's a catch: for real immutability it has to be applied recursively. If you have a tree of objects, you have to recursively traverse the whole tree and make sure everything is frozen. And for any modification you have to make a full copy of the part you modify: clone that part of the tree, make the change, and freeze it again. It's also ES5-only, so no IE8 here. I think the best way, though, is to do it with a function. This is probably the most secure and portable way, because we make the state private to a function via a closure: instead of passing the data itself around, we pass a function that holds it in its closure, and that function controls access to the data. Whatever rules we impose are programmed into that function. Now on to another subject: how can we make these immutable data structures behave nicely, with good performance? There's something called multiversion concurrency control, which unfortunately has the acronym MVCC, which is very close to MVC. Just bear with it.
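The two approaches just mentioned, recursive Object.freeze and closure-guarded state, can be sketched roughly like this; `deepFreeze` and `guard` are illustrative names, not part of any library:

```javascript
// Object.freeze is shallow, so for a tree we have to walk it recursively.
function deepFreeze(obj) {
  for (const key of Object.keys(obj)) {
    const value = obj[key];
    if (value !== null && typeof value === "object") deepFreeze(value);
  }
  return Object.freeze(obj);
}

const config = deepFreeze({ server: { port: 8080 } });
// In strict mode a write now throws a TypeError; in sloppy mode it is
// silently ignored. Either way the value cannot change:
// config.server.port = 9090;

// Closure-based alternative: the state is private inside the function,
// and the returned function is the only way to read it.
function guard(data) {
  const state = deepFreeze(data);
  return (key) => state[key];
}

const store = guard({ a: 1, b: 2 });
console.log(store("a")); // → 1
```

The closure version also works in pre-ES5 environments if you drop the freezing and simply never let a direct reference to the state escape.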
Its main characteristics are that the data is immutable and versioned. MVCC as a concept is quite old: it has been in relational databases for years, actually decades, for transactions and the like, and it has quite recently been adopted by Clojure, Haskell, and Scala for in-memory systems. The basic assumptions are that each version is immutable and each mutation creates a new version. That's basically the whole concept. In databases it was originally there so that writes don't block reads, which is pretty important with concurrent access, and so that readers always see a consistent state: new versions accumulate inside a transaction, they can be rolled back, and nothing becomes visible unless the transaction is committed. Clojure's persistent data structures are basically an implementation of MVCC. They do some pretty cool stuff, and my favorite part is the sharing of structure, and data, between versions. Version zero is a full tree, and when you create a new version, only part of it is affected by the update. So only that part is copied into the new version, and the new version points back into the old version for everything else. That's the main concept here. A collection is internally a tree instead of being flat: instead of a flat array, we have a tree, and I'll explain later with pictures how that works. This gives us nice, logarithmic performance characteristics, and the structure sharing saves RAM and saves operations when creating versions. Those are the main wins. Lookups are log base 32 of the size of the collection, which means one lookup up to 32 elements, and two lookups after that, up until 1024. So how does it work? Let's say we have this very simple array. What do we do? We create a binary tree out of it.
Instead of having a flat array, we create a tree. To do that, we first write the keys in binary, then split those keys into bits, and each bit becomes an address in our tree. For example, A has the address 0,0: the first bit is 0 and the second bit is 0. S has 0,1, because its key is 1 in binary; D has 1,0; and F has 1,1. Cool. The lookup cost here is 2, because the depth of the tree is 2, so there will be two lookups, which is, not accidentally, log base 2 of 4. That's just how binary works. If we stick with a binary tree, then when we grow the collection we have to grow the depth: to store eight elements we need a depth of three, because log base 2 of 8 is 3. I hope the logarithms don't scare anybody. So how can we stop the tree from growing so deep? We can increase the size of the chunks we divide the key into. In this example we have an array of 16 elements, and we divide the keys into two-bit chunks. The lookup cost is now back to 2, because we have only two levels: log base 4 of 16 is 2. To make it more intuitive: we chose a two-bit chunk size, each key is four bits long, so each key splits into two chunks of two bits, the tree is two levels deep, and two chunks mean two lookups. Now, how do we make this really perform? Let's increase the chunk size to 5 and the array size to 32,768, which is 2 to the power of 15. Same logic.
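The depth arithmetic in these examples can be checked with a short helper; the function name is mine:

```javascript
// Number of lookups = tree depth = ceil(log2(n) / bits),
// i.e. the ceiling of the log of n in base 2^bits.
function lookups(bits, n) {
  return Math.ceil(Math.log2(n) / bits);
}

console.log(lookups(1, 4));     // binary tree, 4 elements:     2
console.log(lookups(1, 8));     // binary tree, 8 elements:     3
console.log(lookups(2, 16));    // 2-bit chunks, 16 elements:   2
console.log(lookups(5, 32));    // 5-bit chunks, 32 elements:   1
console.log(lookups(5, 1024));  // 5-bit chunks, 1024 elements: 2
console.log(lookups(5, 32768)); // 5-bit chunks, 2^15 elements: 3
```

Plugging in different chunk sizes shows why 5-bit chunks (nodes of width 32) are a sweet spot: the depth grows extremely slowly with n.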
We have 15-bit keys, and with a 5-bit chunk size each key splits into three chunks, so the tree has to be three levels deep: three lookups for 32,768 elements. That's pretty good performance, and you don't always deal with data that big anyway. The next slide is a bit of math, basically formalizing how we derive the complexity; you can look at it later, I'll post a link to the slides in the group. There's more math you can experiment with: different configurations of this function, which says that for 5-bit chunks and 32,768 elements you get a depth of three, for 5-bit chunks and 1024 elements you get two, and so on. You can play with it if you're interested in how many lookups you get for how much data, and what the optimal chunk size is. Basically, with 5-bit chunks you get log base 32, and that's what Clojure uses. So now let's try to mutate things. A mutation produces a new version, and we want to keep that version, so we assign it to v1 to keep a reference. The index we're updating is 2, which is 1,0 in binary; I just wanted to make clear what the address is, because right now we're back to our binary-tree example. So, mutation: how does it work? We copy only the nodes on the path to the updated leaf. With the address 1,0, we copy the root, follow index 1 and copy that node, and set the new value in the new copy of the leaf.
So this is what happens. Unfortunately the colors don't come out too well on the projector, but everything at the bottom, everything green, is the new stuff we allocated, and everything white is the old stuff that we still reference. You can see that the root node had to be copied, because the root is on the path to any leaf, so it always has to be copied. But index 0 of the new root points to a whole branch of the old tree that we don't care about, because we don't modify it; we just keep the old version of it. Then, because we set address 1,0, we had to put a new copy of the other branch at index 1; F remains where it was, and we set index 0 of that leaf node to Z, because the second part of the address is 0. Cool. That was the most important part: how we achieve nice performance characteristics with updates. Let's do the intuitive reasoning. The tree is log(n) levels deep, so the affected path is log(n) nodes long, and log(n) new nodes have to be created, because the whole path has to be recreated. Each node has 32 elements, which means that for each new node we do 32 assignments, so it's roughly 33 operations per level, over log base 32 of n levels. But as with any complexity argument we drop the constants, because we want the order of the expected performance, not the exact value. So updates are log base 32 of n, the same as lookups, just with a bit more work per level, so obviously slower than a lookup. Now, with all of that in mind, we can return to why we would want this in the first place.
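The path-copying update just described can be sketched on the depth-2 binary tree from the example, using plain nested arrays rather than the real library internals:

```javascript
// ["a", "s", "d", "f"] stored as a depth-2 binary trie:
// the bits of the index pick the branch at each level.
const v0 = [["a", "s"], ["d", "f"]];

// Persistent set: copy only the nodes on the path to the updated leaf.
function set(root, index, value) {
  const hi = (index >> 1) & 1;       // first bit of the address
  const lo = index & 1;              // second bit of the address
  const leafNode = root[hi].slice(); // copy the affected leaf node...
  leafNode[lo] = value;              // ...and update the copy
  const newRoot = root.slice();      // copy the root
  newRoot[hi] = leafNode;            // point it at the new branch
  return newRoot;                    // the old version stays intact
}

const v1 = set(v0, 2, "z"); // index 2 is 1,0 in binary

console.log(JSON.stringify(v0)); // → [["a","s"],["d","f"]]  (unchanged)
console.log(JSON.stringify(v1)); // → [["a","s"],["z","f"]]
console.log(v0[0] === v1[0]);    // → true: the untouched branch is shared
console.log(v0[1] === v1[1]);    // → false: the branch on the path was copied
```

With 5-bit chunks the nodes are 32 wide instead of 2, but the copying logic is the same: one new node per level on the path, everything else shared.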
So why would we want to use immutable data? One big case is MVC frameworks: Ember, Angular, React, whatever you use, they all do the same thing, they ask your data whether it changed in any way, because an MVC framework needs to know when to update the view, and the view is generated from the data. With mutable data you either do one recursive check or many pinpointed checks. For example, in Angular, any time you use a value in a view, a watch expression is registered on the thing you bind to, so you end up with a lot of watches for basically everything. Recursive checks are slow, and so are many pinpointed checks; recursive checks are also RAM-hungry, because you have to keep two copies of the data in memory at the same time to compare each value at each position. All of that makes the MVC framework unhappy. With immutable data, only the root reference has to be checked, because whenever you change anything inside the tree you get a new root, and you have to keep that new root, otherwise you haven't changed anything, since the old version is immutable. So the only thing the MVC framework has to do is compare the reference of the root element, which is just crazy fast. It's also much more memory-efficient if you use the structural sharing that Clojure uses. That makes the MVC framework very happy, so using immutable data might actually speed up your MVC application. Obviously immutable data is a bit slower to manipulate than mutable data, but when you need to check recursively whether anything changed, the win is something
that can outweigh the disadvantage of slightly slower manipulation. You can also choose where you use immutable data: everywhere; only inside a function, passing immutable versions between functions; at module boundaries; at the boundary between you and third-party code; or just wherever you really want to. That's your choice. In a perfect functional program, you would make the data immutable as soon as you get it, from an HTTP request, a file, whatever. Then you'd use a profiler, which you probably use anyway, to find the bottlenecks in your program, and if immutable data turns out to be the cause, you'd switch to mutable data, but only in that small part, and you should probably convert back to immutable once you're done with the critical section. In functional programming, mutable data is considered a kind of premature optimization: making some parts of the program mutable is one of the techniques for optimizing. So, how do we do this in JavaScript?
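Before we get to that, the root-reference check from the MVC discussion above can be sketched in a few lines; the names are illustrative:

```javascript
// With immutable versions, "did anything change?" is one reference check.
function hasChanged(prevRoot, nextRoot) {
  return prevRoot !== nextRoot; // O(1), regardless of tree size
}

const v0 = { todos: ["milk"] };

// An update produces a new root; untouched branches can be shared as-is.
const v1 = { todos: [...v0.todos, "eggs"] };

console.log(hasChanged(v0, v0)); // → false: the framework skips re-rendering
console.log(hasChanged(v0, v1)); // → true: the framework re-renders the view
```

Compare this with a recursive deep-equality walk over two full copies of the data, which is what mutable data forces on the framework.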
First, a quick reminder: some things are already immutable. Numbers, strings, and booleans, that is, all the primitive types, are already immutable, so you don't need to do anything with them; they are perfect as they are. Then there are existing libraries, one of which is mori, probably the most popular. It derives directly from Clojure via ClojureScript and is basically a JavaScript wrapper around ClojureScript's collections. The main problem I found with it is that when you give it a tree, only the root level is made immutable; it's non-recursive. Whenever you pass it a tree, only the root node is immortalized and the other parts remain as they were, which is the same situation as having to apply Object.freeze recursively by yourself. Beyond that, mori exposes basically the whole collection API from ClojureScript, which is very powerful if that's what you want. But I wanted to experiment a bit with both the interface and the implementation, so I created ancient-oak, which is a Clojure-inspired immutable data library for JavaScript data trees, with a real emphasis on trees, because it seems nobody else is doing it this way. So what do you get?
You give it a whole tree of data, and it processes it recursively, so you don't need to wrap anything by hand. You give it data, and it gives you back a function that guards access to that data, the whole tree, with various update and iteration methods hanging off that function. For example, you give it a very simple tree with an array inside, and you get back a function with set, patch, map, and a bunch of others. It's very easy to get data in and out, and the data always keeps the same structure: the types are never changed, they are preserved. That's pretty much self-explanatory. Every node of the tree is itself a tree, so you can always take a part of a tree: if we're only interested in the B array, we just get B from the tree, and we get another tree. The main assumption that lets us do this recursively is a one-to-one mapping between native JavaScript types and immutable types: whenever there's an array, there's exactly one type it converts to when we immortalize it. For example, an array, as with JavaScript arrays, is a sorted collection with integer keys; the only difference is that the size is reported in a size property instead of length, because on a function, length is reserved for the number of arguments. An object is an unsorted map with string keys, with the same constraints as JavaScript objects. The other main assumption is that functions and primitive types are treated as immutable, because functions are assumed to be interfaces to data, getters: wherever there's a function, we assume it's another tree of data. This makes it good for storing plain data, but not really for anything else; if you want to store module interfaces in it, that's probably not a good idea at the moment, though maybe something can be done to make it workable. So let's go quickly through the API and
we'll be done. The main part of the API is I, capital I, which is the immortalizer. I'm not sure about this yet; using a capital I as a global variable is maybe not the best idea, but for these examples it's enough. The data you get back is a function. If you get A, you get a number; it's already immutable, so we don't do anything, we just return it as it is. If you get B, you get a function, because it's the getter for the array, and to get a value out of that array you pass it the index. So just as you would chain square brackets for nested lookups, here you chain parentheses, which makes for a very easy transition compared to working with mutable data. That's the same array; here we just call dump and get JSON-ready data back, so it's a very easy way to get data out. set always returns a new version. In this example we do two sets: we set C to 5 and A to 4. Setting C to 5 returns a new version, and on that version we do the second modification; I just wanted to show that they chain easily. If you dump v0, you can see it's still unchanged; if you dump v1, both modified fields are there. rm removes an address in the tree: in this example we say rm D inside B, so we first get to the B subtree and remove D from it. You can see that when you dump it, B is still there, but D inside B is missing. Sorry about those names, they should probably be something like G and K so they don't sound so similar. update is like a one-off map: you map a function over a single value. We update one version, the function gets the value, and A is increased by one here. And patch, I think this is the coolest thing here. This is how it works: you give it a diff of the tree, which gets applied on top of the old tree. In this example we have an A field and a B array with some stuff in it, and we say: set A
to 2, and inside B set index 0 to 4 and index 3 to 5; the numbers under B are indexes. When you dump it, you can see that A is updated to 2, and inside B, index 0 is updated to 4 and index 3 to 5, with an empty element in between, but never mind that. Iteration is the boring part, because it always looks the same. Currently there's forEach, map, and reduce, with mostly the same semantics as the native methods, because I want to be compatible here. Only reduce is a bit incompatible: we always require the initial value, but that's an easy fix. One interesting thing about map is that it always returns the same type of collection. Unlike underscore, for example, where map over an object gives you an array, here the returned values are assigned to the same keys: we just increase every value in this array, but the type and the keys stay the same. Another thing: because the data is a function, you can do crazy things like applying data on top of other data. Here the data function will be called with first A, then B, then C, so the first iteration gets 1, the second gets B, which means 2, and the third gets C, which means 3. You can do that if it's your thing; in Clojure it's actually very common to use collections as functions, for example with two sets, when you want their common part, you can apply one set over the other, something like that. About ancient-oak: it's early-stage and experimental. The target is to handle data that can be JSONified, so trees, not graphs, nothing with loops in it. The plan is to tweak it for speed, to make it really performant, and there are some open questions I haven't resolved yet: for example, whether the API is good, so tell me if you have any ideas about it, and how to store dates, because dates are probably the last very common kind of data passed between servers and clients that we still don't have
no way of storing it either you can just put in a string but that's not really a take time the perfect way would be to have a way to interface with date, getters and setters each time create a new version so if you have any suggestions how to handle this I'll be very nice to hear so that's pretty much it and any resources for you is very cool post by Jean Niklas Lorange I hope that this is how you pronounce it by how those things work together and there are some things that I haven't covered like you have to sometimes grow and shrink the tree the other constructors and there's some things like we're going to have our docs included so you can actually open your developer tools and start just experimenting experimenting with it my twitter and the talk can be found under this address I'll post the talk to the AmberJay's meetup group sorry comments so we're running a little bit behind time so do a couple quick questions and then we'll move on to James how much did it take how much did it take well we started late it wasn't until 7.30 now so in terms of questions yeah because the browser is single thread how is it going to help me so in the closure it's one of the reasons is that you have multiple threads and you have conflicts when they override each other so in here mainly the reason why you want to do it is the security and also the MVC scenario here I was thinking this is cool but yeah you don't have multiple threads but if you want to pass data from one one thing to another I think it's nice to have this guarantee that they won't modify my data that's the main reason basically so why not just using closure script yeah I had actually a slide on it but I thought hey somebody will just ask a question about this so yeah if you're fine with closure and closure script you can just use it and I was really interested how can we do that in plain java script and the chase of language is usually very subjective and in my opinion java script is a good enough language and I 
really like just doing stuff with functions and plain data. You can do that in Clojure too, yes; there's a reason I don't use CoffeeScript either, for example, and that's just my preference: JavaScript doesn't need to be replaced, and it's good enough for my stuff. So that part is basically subjective. For technical reasons: Clojure is quite big, and ClojureScript is big too, I guess, and you probably don't want such a big third-party library in your code. And if the outcome of this is that you try ClojureScript, I'll be very happy, because ClojureScript is cool and Clojure is cool. Any other questions? Thank you. Thanks.