Okay, so I'm Andreas Rumpf, the original inventor and still lead developer of Nim. This is my talk on move semantics, which is the new stuff coming to Nim, inspired by Rust and C++, but we tweaked it. So let's get started.

The unofficial motto of Nim is actually "copying bad design is not good design". It's already a useful motto because it tells us what not to do: we should not copy bad designs. But more useful than knowing what not to do is knowing what to do, so let's paraphrase it into "recombine good bits from several sources". And that's what we did. We looked at Rust, C++, and Swift, at how they do memory management, and at whether these concepts also apply to Nim. It turns out the answer is yes.

Here is an example. I have an array with two elements inside, and then I append the number three to it. This is a growable array; I think in C++ it's a vector, and in Nim it's called a sequence. Here is what happens in memory. The growable array has a length, a capacity, and a single pointer to a block of memory that can grow. When we append the number, since the capacity is already full (we had capacity for two elements), we need to allocate a new block of memory big enough to contain all three numbers, and we need to do something with the old memory block. Usually you would do a realloc, as in C, which frees the old block immediately, and that is the most efficient way of doing things. However, it causes a problem: if there are other aliases to this pointer, I must ensure that it doesn't become a dangling pointer. So here in line two, I have this other variable, and it should have the same contents as someNumbers.
And if I do a shallow copy and just copy all the bits, then I would copy the pointer, which is invalidated in line three by the append, and this variable would then contain a dangling pointer. That would be very unsafe, so it's a very bad idea.

To solve this problem, there are a couple of solutions. One is to deep-copy the elements in the container, which is what C++ does, and also what Nim's semantics do. You could also say: let's have a pointer to a pointer, so everybody gets the update. This is done in Java and C#, I think, but it's slightly less efficient because you have another indirection. You could also say: well, this is an assignment, but it's a bad assignment, so let's just forbid it. That would be a terrible solution, but you could. The fourth solution: have a garbage collector clean up the old block for you, but only once no other variable refers to it. Finally, we could move it, and that is the fifth option: we steal the block of memory and perform a move, which is also available in C++.

This is an explicit move here in Nim, so you can do this: you can say, I'm going to move someNumbers over to other, and afterwards the source is invalidated; it becomes an empty sequence. If you then append the three, that's the only thing left inside, so as you can see in line six, afterwards someNumbers only contains the three. So this is the explicit move. You can try to program in this style, and it's not really pleasant, but since it's explicit, it's okay: you are aware that someNumbers is empty afterwards. But there are plenty of cases where you can move implicitly. The first, famous example: if you have the result of a function call, you know it's not going to be used afterwards, so you can move it directly into the variable a.
And the compiler can do the same: if it knows a value is not used afterwards, it can move it. One design goal was to make this work. Okay, we know function call results can be moved, but I want to be able to name my results for readability without performance overhead. So as long as namedValue is a local variable, the Nim compiler can see that namedValue is used for the f call and not afterwards, so it moves namedValue into f, and then it moves f's result into a.

Here's another example. I have a list with three integers inside. If I say y = x, then since x isn't used anymore, we can move, and likewise for z = y. So this works for local variables.

Parameters, though, cause problems, because we don't know whether the value that's been passed to our put is used afterwards. This example is pseudocode for a hash table implementation; usually it would be more than two lines. We hash the key, and we want to move this key-value pair into t. Given the default semantics, this would mean making an expensive copy here. But you can annotate this value parameter with the sink keyword. Then the constraint that it shouldn't be used afterwards bubbles up the call chain. So now, because it's a sink parameter, we know it won't be used afterwards, and we can perform the move in line three. So again, if I have values, a list with three strings inside, and I don't use it afterwards, I can move it.

Now what happens if I do use values afterwards? Since we want to take ownership of the guts of this object, the compiler produces a warning telling us: you are about to sink something that is used afterwards, and I will make a copy for you. So safety is ensured. This has also been a design criterion.
If you get it wrong, performance suffers, but no weird crashes take place. And that's true, and the compiler warns about the performance aspect. Currently this warning is a bit too aggressive, so I need to make it better. One solution is to move things around: if you echo the values before you embed them into this hash table, then it works, because the compiler knows echo doesn't want to take ownership of values, but the table's put does, because of the sink annotation. That's one solution. And of course, if you are just adding some code for debugging purposes, you don't care whether it causes more copies, because this code will be removed again soon.

Okay, so as I said, the sink parameter is an optimization. You don't have to use it. If you get it wrong, performance is worse than before; if you get it right, you get better performance. And we're also working on inferring this property so that you don't have to annotate anything at all, because I actually went through the standard library trying to add these sink annotations everywhere, and I was like: yeah, no, I'm not going to do that. I'll let the compiler figure it out.

Anyway, here are a couple of quiz examples. We have a hash table and its put, call it insert-or-update. Then we have equality on some generic type T, plus on T, and finally append, or add, on a growable sequence. The question is: where do the sink annotations go? And you don't have to guess, I'm telling you: putting stuff into a hash table takes the sink annotation, and the append for sequences takes the sink annotation.

And now here is a subtle case. The first line is an insert-or-update. If I insert into the hash table, I also want to take ownership of the key. But if it's just an update, the table already has the key, and then what happens? Should this be a sink parameter or not? Well, I don't know.
But the thing is, if you declare it as sink, the compiler will actually ensure that the value is consumed in all cases, so you don't have to check that yourself. And there is a precise notion of what it means to consume something; we are getting to destructors anyhow.

So that's one problem solved. Now I can put stuff into a hash table very efficiently, that's good. But how do I get values out of it? And again, the same problem: result = ... is the same as a return statement in this case, but I wrote it as an assignment so that it's more obvious that this, again, is an expensive copy. So, okay, we can try to move this, and then the compiler will complain that t is not mutable: you cannot move out of it, because a move mutates the source. Okay, so let's make it mutable. This works. But now you need to think about what happens: you move the value out of the table, so you can access it exactly once, and then it's gone afterwards. That's pretty bad. Well, if you have a pop operation for your stack, that's exactly what you want, but for a hash table it's pretty shitty.

So we need another annotation for lending a value: lent T. This is a borrowing operation. In Rust, this would be a borrowed pointer; in C++, it's a reference. So it's actually the same thing, and we need to ensure that the borrow doesn't outlive the collection's lifetime and so on. So it's like in Rust, and not like in C++, where it can't be checked. The point is: in Rust it would be checked, in C++ it wouldn't, and in Nim it is checked, except that we need to get better at it.
So now that we understand how to optimize complex assignments, deep copies or whatever you want to call them, we can apply this knowledge to something else: take reference counting. With reference counting, the pointer assignment simply got way more expensive than it used to be, because if I copy a pointer around, I need to increment the reference count of the source, I need to decrement the reference count of the destination, and then I can do the pointer copy. But if I'm able to move the pointer, then it can just be a bitwise copy, and maybe nil out the source afterwards, if required.

This insight led us to the development of a new garbage collector mode. I mean, it's called a GC, but "GC" is actually Nim's name for any kind of memory management you want. And here I have a benchmark, the binary trees benchmark, a standard benchmark for the throughput of a garbage collector. I don't expect you to understand all of it, but the point is that none of the annotations, the sinks and lents, appear here; even so, they work under the hood for us. We create binary trees, lots of them, up to some depth, and this is the main part. As I said, this is a standard benchmark, and the results are really, really nice. We have a couple of garbage collectors, so we can compare all of them, and the new one, ARC, is the fastest by quite a lot, a factor of two or three, depending on what you compare it with. And memory consumption is about the same as before. For Boehm GC I haven't been able to measure the memory consumption precisely, so that's not available.

So now the question is: okay, this is much better than before, but how does it compare to manual memory management? And Nim can do both; you can use raw pointers. So previously this was a ref in line four; now it's a ptr, and to make a tree we have this nasty allocation with a cast in line 12.
And of course we need to free the tree manually. This is a recursive free: first free the left and the right subtree, then deallocate the node itself. And in the main part we have to free these trees manually, which is very annoying. In line 18, for instance, you can see this, or in line 15, where you actually have to introduce a new temp variable just to be able to free it later on.

And the result is: it's still slower. I'm sorry. But here is the thing. What ARC actually does is optimized reference counting, and what the manual version does is basically say: I don't need a reference count, because I know these are unique pointers. If you add just the one machine word for this reference count back to the manual version, it's back to almost the same, 6.2 seconds, whereas ARC is at 6.7 seconds. And the memory consumption is identical, under the assumption that I fix this one bug that's left. So we are getting close to manual memory management, and for this particular benchmark I think we can get the difference down to the noise level. (Audience interjection.) No, they are not; we will get to that.

Okay, there's a different benchmark, for latency. I don't have the source code for this one. Previously we had a soft real-time garbage collector, and the latency was 0.3 milliseconds for this benchmark. Now with ARC it's better by over a factor of three; the total runtime has been reduced as well, and the peak memory consumption is also better. So not just throughput is better, but also latency.

Now, I've already outlined what's going on under the hood: we have destructors, move operators, and assignments, and we can exploit them for other things; they are exposed to you, as we'll see in a minute. So you can now make your files close automatically after you. That's very nice. And there's better composition between these custom containers.
Previously we had manual memory management and GC'ed memory management, and you had to be careful not to mix them, because it doesn't really work well. But with these extension points, the interop between these two worlds is much, much better than before.

Here's another thing that we can now do. Again the same benchmark, but now we want an object pool; I think it's better called an arena. So we have an arena allocator, still dealing with these silly nodes that only have two pointers inside. To allocate a new node, we basically check if there's capacity left, kind of like for a sequence, but the node itself is an unchecked pointer: we take the address of the element in the array that is the backing storage for our nodes. And here we can say: look, if you want to copy a pool, it's not supported, because I couldn't be bothered to implement it. So if you accidentally try to copy the full pool around, the compiler will complain and tell you: no, you can't. And when the pool goes out of scope, the destructor is called, this is in line three, and what do you do in a destructor? Well, you free the blocks of memory, which have been chained in a linked list via this next pointer.

Then you need to change the program, unfortunately: if you want to make a tree, you need to be aware of this pool to get the new nodes from, so the pool becomes a parameter of makeTree, and you need to pass it on recursively, as you can see in lines 11 and 12. And now it's a bit easier to use, because these pools are freed for us automatically afterwards. In this case I had to make two pools, one for the long-lived data and one for the short-lived data; you can see this in lines 5 and 14. And the question is: how does it perform? Here's the result: it's still much faster, over a factor of two performance improvement, and memory consumption is roughly the same.
Okay, so in summary: move semantics mostly work under the hood for us, and they give us really good optimizations. (Five minutes left.) We have seen the speedups, and they make the memory management deterministic. What that actually means: if you use a reference counting scheme and optimize it, you can attach a cost model to your programming language, and once you do that, you get into the realm of hard real-time systems. So you can use Nim for hard real-time systems with this technology. We have seen it improves throughput, latency, and memory consumption. And threading: well, I don't have an example, but as you can imagine, if you can move data from one thread to another, then because you are guaranteed to be the last user of this data, you cannot have data races. So that's a very nice property. It also improves the ease of programming: just imagine your files closing automatically, and your sockets, and we get better composition between these different container classes. You can play with these benchmarks, I have uploaded them to GitHub, and if you don't know it already, this is our website and the forum, and we are active on IRC as well. So that's my talk, thank you for your attention. Also, the next speakers can come up and start preparing.

Okay, questions.

Q: Is this available right now, or is it still a work in progress? A: It's on GitHub, in the development branch. Sorry, I need to repeat the question: the question is how to get it. It's not in version 1 of the language; it will be in our next major release this year, but you can already use it if you use the development version of Nim.

Q: What about owned refs? A: Ah, owned refs. We've been tinkering with owned refs, which are kind of unique pointers, Nim style. But I did these benchmarks and I noticed that it doesn't pay off. For instance, you can run this binary tree benchmark with owned
annotations and the new runtime, and I got the same numbers. So in this case we didn't see a benefit, and currently we do not quite know what to do with owned refs.

Q: How do you define "used afterwards"? In particular, how subtle is the analysis with if statements and the like? And what about void safety, as in Eiffel? A: Okay, so two questions. The first was: how do we know that it's the last usage of a variable? The answer is: we have a control flow graph, and we track these usages precisely. If the usage is conditional, we notice. Say you move it on every path, then it's fine; or, how to put it, say you move something in a loop and it's used again, so it's moved every time except the last iteration: we notice that, and the compiler will tell you, I cannot move, because you use it in the next loop iteration. We are smart enough about it in practice.

The second question is void safety, nilability. That's also work in progress: we want this "not nil" annotation in the language to guard against it. And you're right: if you use an explicit move, you need a value to put into the source afterwards, and then you need to make the pointer nilable just for that. But if you use an implicit move, we know it's not used afterwards, so we can simply pretend that nil is not a valid value for your pointer.

Q: The reason I was pointing to void safety is that in Eiffel's case, they decided that for something like "if test then ... else ...", if my test is specifically on this pointer, then the then-branch is considered void safe and the else-branch is not. A: We do the same thing for moves: if the move sits inside an if and the test covers it, the then-branch can keep the move and the else-branch cannot. For anything beyond that, I need to think about it.
Host: You do have time for another question, if you want. Really? Okay, further questions will have to wait.

Audience member: It's not actually a question. (Okay, good.) I just want to say a big thank you, because I learned about Nim last year in Boston, in a lightning talk, and I have to say a big thank you because you saved me from having to use Go.

A: You're welcome.