Welcome back. It's the first stream of 2020, and this is going to be probably the last stream on porting the concurrent hash map from Java to Rust. This will be part three, I think. We did part two about a month ago, so I'm going to spend a little bit of time walking through where we left off last time so that we're all on board with what we're going to do this stream. For those of you who don't know me (if so, it's a little weird that you're on part three; you should probably start at part one), I am Jon. I'm a PhD student at MIT, and I do a bunch of these live streams where we try to write real Rust code that does interesting stuff, really get into the weeds, and program something complicated. If you want to support me, I have an Amazon wish list where you can get me stuff that I need in my life, or want in my life, I guess. The code for part one and part two is already up on GitHub, and I'll push the changes that we make today there as well afterwards. Hopefully, by the time we finish today, this page will be a lot more helpful: my guess is that in addition to doing things like writing tests, we're also going to expand upon the README, write some decent documentation, maybe set up CI, a bit of the more meta stuff for projects. The Java code, I think I've linked to it in past streams, but the Java code we're porting is available online under a sort of weird license, but one that allows us to publish a port under the same license, at least as far as I'm aware. If you know otherwise, please tell me. And the Java test suite is also online. All of these files I've linked from the repository. So if you look down here, the source file is linked here from the GitHub repository, and the tests are linked right here. Before we dive in, one thing you should know is that this will probably be the last stream for a little while. My plan is to graduate sometime at the end of this year, as in somewhere in the September to December range.
And so I need to actually sit down and do that. So I don't know how much time I'll have to do streams in the coming months. I'll try to get in maybe a few more in 2020, but I don't know when they'll be; just so you're prepared for that. I am not planning to stop, though. I think this is really fun. I think it's interesting to build stuff in this way. And so even though there might be a bit of a hiatus, the plan is to keep going. All right. So let's dig into the code. I think that should be large enough for everyone to see. So the code we had last time (let's make that a little bit smaller to fit some more on screen; that's probably good) is a pretty straightforward port of the Java code. The trickiest parts, you'll remember from part two, were dealing with all the safety guarantees and dealing with garbage collection. While we did the implementation, we left a couple of to-dos for ourselves that we might deal with in this stream. Things that I marked as to-do here are things that aren't necessary in order to have a working implementation, but they're things that we will want to fix before, not just a 1.0 release, but before a public release. Either because there are safety guarantees that we need to really write down the explanation for, or there are API changes that we really want to get to. So for example, the get method for this hash map currently returns you this Shared type that you might remember from Crossbeam. And it's a little awkward for this to be exposed in the public interface. It also means that if we, say, upgrade to a semver-incompatible version of Crossbeam, then that will also require us to update the major version of this library, because we directly expose one of their types. And so that's a little awkward, and we probably want to mask this at the very least behind a newtype, but maybe something more than that. So one way to do that, for example, would be this.
If you recall, with these Shareds you need this notion of a guard, and that is why get currently takes a guard. One option here is for get to construct the guard itself, and then to wrap that in some kind of returned guard type. This is something we can think about a little later. Let's look at what other to-dos we have. Yes, we did not implement this one: the Java version has this notion of upgrading. If you have a bucket that gets very long (for us the buckets are linked lists), then the bucket gets converted into a tree. That's an optimization that we have not implemented, but it's something that you'd probably want to add. It is entirely behind the scenes, though, so it's not something that end users will notice except in terms of performance. Let's see, what else do we have that's a to-do. This ordering can maybe be relaxed; okay, that's somewhere we're going to leave for later. Treeify, if you notice that you want to turn something into a tree: we can ignore that for later. So add_count. This is used whenever you add or remove items from the map; add_count is used both to keep track of the number of elements in the table, and also to figure out whether we need to do resizes of the table, to grow it in particular. I don't think a shrink is supported. And the Java version does not just use an atomic integer for this; it uses this notion of a counter cell, which, as far as I could understand it (we'll have to go back and look at earlier streams), is essentially a sharded counter. And the reason you want to do this is that if you have just one integer, like one memory location that holds a number that all the writer threads are going to be adding to, then you create a lot of contention on that one number.
It might turn into a scalability bottleneck, because if every write has to go through that one number, then, barring some kind of concurrent optimization made by the CPU because it notices that they're all adds or something, it could become a point of contention and something that reduces performance in highly concurrent scenarios. And so if you have a sharded counter instead, where you might have, for example, one counter per core, then when you increment it there's no cost. It's only when you have to read the value of all of them that you have to pay that cost. And the original Java code has a bunch of code for dealing with that situation. Here we've simplified it into just having a single integer counter, because it was just easier. And this in some sense comes back to the old adage that you don't want to do premature optimization, right? The counter cell business is an optimization. And I don't think it's premature in the Java code; I think it wouldn't have been added unless they found that it was useful in real-world scenarios. But for us, when we first port this, we should start with the simpler version, and then we can add the optimizations later. This might make for good PRs down the line. Given that this code is a little different from the Java code, this is one place I suspect there might be bugs, right? Because much of the other code is almost a direct transcription of the Java code into Rust, whereas this code is not; this code is different. What other to-dos do we have? Yes, so this one: you'll remember in the concurrent hash map, if there's a resize, then the way the Java version does this, and this is pretty cool, is that if a thread tries to do an insert and it notices that the table is being resized, it joins in the resize effort. It joins the thread that is currently doing the resize and helps it with the resize.
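To make the counter-cell idea above concrete, here is a minimal sketch of a sharded counter using only the standard library. The shard count and the way a shard is picked are illustrative assumptions; this is not what either the Java ConcurrentHashMap or our port actually does, just the general shape of the trick.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

// Number of shards; a real implementation would size this from the
// number of cores. This constant is an assumption for illustration.
const SHARDS: usize = 8;

pub struct ShardedCounter {
    cells: Vec<AtomicIsize>,
}

impl ShardedCounter {
    pub fn new() -> Self {
        Self {
            cells: (0..SHARDS).map(|_| AtomicIsize::new(0)).collect(),
        }
    }

    // Writers spread across the cells (here the caller passes a shard
    // hint, e.g. derived from its thread id), so concurrent adds mostly
    // touch different memory locations instead of contending on one.
    pub fn add(&self, shard_hint: usize, delta: isize) {
        self.cells[shard_hint % SHARDS].fetch_add(delta, Ordering::Relaxed);
    }

    // Reads pay the cost instead: sum over every cell.
    pub fn sum(&self) -> isize {
        self.cells.iter().map(|c| c.load(Ordering::Relaxed)).sum()
    }
}
```

Reading becomes a sum over all shards, which is the trade the Java code makes: increments are frequent and must scale, while reading the exact count is comparatively rare.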
And the reason this can work is that when you allocate a new table that's twice as large, with twice as many buckets, you need to move all the elements from the old table to the new table, and each bin can basically be moved independently. And so you can have multiple threads that are assisting with the resize. And this is sizeCtl, the size control field, that's atomically updated and managed in order to keep track of how many threads are doing what work. And this is one clause where, for some reason, it adds two, and I don't know why that is. So this is something that I think we'll want to look at: figure out why this code does what it does, like why this is an rs plus two. I think we spent a little bit of time thinking about this last time, but it's something where, once we figure it out, we should probably add a comment. This is one thing where the Java code, let me pull that up here. Actually, how about we do, yeah. This is one place where the Java code is not great at comments. So the Java code has a lot of top-level comments for how the thing works, but it doesn't really dive into, let's see if I can find this here. It talks a lot about the representation, but it doesn't necessarily talk about all of the different fields. For example, when we get down to sizeCtl, it talks a little bit about it, but in the place where it does rs plus two, this place, there aren't any comments that explain the code. Even though there's an explanation of what sizeCtl is, this code is not very well explained. And so I think once we figure that out, it would be good to update the documentation here. And we've tried to do that as we've been doing the port. You'll notice as I scroll through here as well a bunch of the safety arguments that we made last time, right? We looked for all the blocks where we needed to write unsafe, and we documented why we believe that they're safe.
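As a small, entirely made-up example of that practice, this is the shape we aim for: every unsafe block carries a comment justifying why it is sound. (The relevant Clippy lint is, I believe, undocumented_unsafe_blocks.) The function itself is not from the port; it just shows the convention.

```rust
// A made-up example of the "document every unsafe block" practice.
pub fn first_byte(v: &[u8]) -> Option<u8> {
    if v.is_empty() {
        return None;
    }
    // SAFETY: we just checked that `v` is non-empty, so index 0 is in
    // bounds and `get_unchecked(0)` cannot read past the end of the
    // slice.
    Some(unsafe { *v.get_unchecked(0) })
}
```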
And this is also something that the Java code does not have, in part because there's no notion of unsafe in Java, even though they do sort of assume that certain pointers are valid. Here in Rust, of course, we have to use unsafe to basically tell the compiler that we promise this is safe. And alongside any such unsafe block, it's good practice to document why we believe that it's safe. I think there's even a Clippy lint that requires there to be a safety explanation above every unsafe, but I haven't used it myself; I just learned about it recently. Yeah, so this is essentially if you have many threads helping to do a resize. Currently, we just sort of hard-code how many. Let's say you have four threads, right? They need to know how many bins to skip when they have to pick the next bin that they're going to move. And this is something that really should depend on the number of CPUs; there's no reason to have more threads helping with the resize than you have CPUs, for example. And here's something where we also just don't deal with that currently, but it should be easy to add. Okay, I think that's all the to-dos. And then we left one FIXME. Oh, I remember this one. Okay, so this is a somewhat complicated safety argument, and here there's a particular case where we couldn't quite figure out what the argument was for why it was safe. The Java code assumes that it's safe, and so in some sense we have it on good authority that it is safe. But if we want to stick to this notion of giving the safety guarantees explicitly, we should finish this paragraph on why exactly this is safe. And the other FIXME, which is where we left off last time, is starting to write tests for this map. And so that is actually where we're going to start today: write a test that creates a map and does nothing with it, and write a test that creates a map, inserts into it, and tries to read from it. And these might seem like simple tests.
And it might be that they work the first time we try them; it seems somewhat unlikely, but we should try it, and we're about to find out. Let's look at, did we actually add a tests directory? No, we did not. I thought we did. I guess maybe I was wrong. All right. So the first thing we have to choose is whether we're going to write these as unit tests, basically within the crate itself, or whether we're going to have a separate tests directory. I like to do the latter. Unit tests are best for testing things that are internal only, that you can't access through the normal APIs. So we might end up adding some of those eventually, but I think for the time being what we want to add are integration tests. So I'm going to do mkdir tests here. And then in there, let's call the file just basic. The reason I call it basic here is because we probably want different testing modules for different things. So for example, I'm going to assume that we're going to have one test file for all of the Java test cases; in fact, I think there are multiple, even. This is going to be a little bright for those of you who are in dark rooms. Yeah. So here the Java concurrent hash map comes with a bunch of tests, and all of these we probably want to port into the Rust version. I wonder if there's just a straightforward, what's MapCheck? Oh, these look like relatively basic tests. But first of all, I want to keep all the ported tests in their own files. This will make it easier, if you imagine that the Java authors add more tests to these, to keep the Java ones and the Rust ones in sync, because they'll sort of map one-to-one in the files. But I also want some tests that are just completely dumb. I'm thinking things like new, right? I want to see that new doesn't panic.
And here, of course, we're going to do use flurry::*, just because it's handy in the test. And let's look in src/lib: what is the name of our type? FlurryHashMap. So, FlurryHashMap::new. Do we even have a new? Do we have a Default? Okay, so we don't currently have a way of constructing a new FlurryHashMap. So that sounds like something we need to fix first. So let's do pub fn new. And in fact, this is a good question: we don't even necessarily know how you create a new one of these, because we didn't implement the new method. And if you look at it, there are a bunch of fields here that need to have sensible default values. All right, so let's look at, I wonder what the constructor is called in Java. I forget. readObject? Oh, I think it's just named the same as the type: "Creates a new, empty map with the default initial table size". All right, so you'll notice they have a bunch of different constructors here, and they all seem to call into this one. I see, so there's sort of a default: you get the default initial table size. Okay, how is that initialized? I guess for each field there's probably a default value then. Let's look at a field like sizeCtl. Interesting. It doesn't actually say anywhere what the default value is; maybe the default value there is just zero. Yeah, maybe these really are just defaults. Interesting. Because it looks like the default constructor here doesn't change any of the fields, which presumably just means that they keep their default values. But then where does the 16 come from? Let's look for 16. DEFAULT_CAPACITY. Okay, so where is DEFAULT_CAPACITY used? initTable? Well, initTable seems pretty promising. Okay, where is initTable called? Oh, you know what? I think the constructor actually just sets the table to null, and then insert is the thing that actually initializes the table.
So this is why the constructor is empty: it doesn't actually have to do anything, because insert will already do the creation of the table for us. All right, so let's look at what new is going to look like. It's going to be sort of Self, and then we're going to have, I guess, table, which might just be null; maybe the intention here is we set all of these to null. We set count to be AtomicUsize::new, transfer_index just defaults to zero, I guess, size_ctl is going to be zero, and then build_hasher. Right, so here we actually need to take a page out of the book of the Rust HashMap. You'll notice, if we look at it, that the map itself is generic over K, V, and S, where S defaults to RandomState, and that new is only implemented for S = RandomState. And what is S here? Well, remember that we need to be able to choose how to hash keys into bucket indexes, right? So you need some kind of hashing scheme. And RandomState over here, RandomState is a thing that constructs a hasher that has a random initial state. So this means that any given key is going to hash to the same value within one RandomState. But if you have two different RandomStates and you hash the same key, you might get different hashes, even though within each one they'll remain consistent over time. The reason for this is to provide resistance to adversarial choices of keys. There's a lot more you can read about this. For now, just think of this as one way to initialize a hasher, and a sort of secure way to initialize a hasher. No, size_ctl has to be isize, because it can be negative. But specifically, you'll notice that HashMap::new is only implemented when the S is RandomState.
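The pattern being described can be sketched with a stand-in type. Map here is just an illustration of the S = RandomState default, assuming the same shape as std's HashMap; the real map has many more fields.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// A stand-in for the map type, keeping only the hasher state, to show
// the constructor pattern: `new` exists only for the RandomState
// default, while `with_hasher` works for any S.
pub struct Map<S = RandomState> {
    build_hasher: S,
}

impl Map<RandomState> {
    pub fn new() -> Self {
        Self { build_hasher: RandomState::new() }
    }
}

impl<S: BuildHasher> Map<S> {
    pub fn with_hasher(build_hasher: S) -> Self {
        Self { build_hasher }
    }

    // Hash a key the way the map would when picking a bucket.
    pub fn hash_key<K: Hash>(&self, key: &K) -> u64 {
        let mut h = self.build_hasher.build_hasher();
        key.hash(&mut h);
        h.finish()
    }
}
```

Within one map, hash_key is deterministic; two maps built from independently created RandomStates will generally disagree on the same key, which is the resistance to adversarial keys mentioned above.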
And then you'll see that for a generic S, it's a with_hasher method that actually takes the S you want to use. And I believe we also set ours to default to S = RandomState, so we're in essentially the same position as the standard map: we want to allow the user to provide their own hasher should they so choose, but the new method, just like for the standard one, is only for RandomState and is not generic over S. And in that case build_hasher is going to be RandomState::new. And in theory now, this should work. Let's just try to compile and see what happens. We get a number of, what is this, 427? I guess there's some of this, I wonder. Oh, we don't currently use the resize hint for anything. So here, I guess, we'll add a to-do: use the resize hint. This is something the Java code uses to be smarter about when and how it does resizes, and it's something we didn't implement, so I just want to silence that warning. And load_factor is also a to-do: actually use the load factor. Actually, let's look at how the load factor is used here. It's used in readObject, writeObject, and in the default constructor; I wonder why it's even there. Load factor, it's this thing. All right, so what is the load factor used for here? It's used to set the number of buckets. Okay, so this is if the user says what capacity they want us to support, right? Hash maps generally have this constructor that's called with_capacity. If the user gives that, then in order to support that number of elements, our table actually needs to be a little bit larger than that, because we want to resize once we reach a certain load factor. The reason for this is that at some load factor you end up with too many keys that hash to the same bucket, even though not every bucket is full. And so that's where this comes in.
So it's actually correct that this one is not currently used, because it will only be used to compute what the size we use should be. All right, so I guess that means that we'll also do, just for our own sanity's sake, a with_capacity(n), right? The same thing that the standard library has. And what does that actually do? Well, we're not going to allow the concurrency level to be set, so we want the constructor that just takes the capacity, with a concurrency level of one. Okay, we'll just blindly copy what they do, right? And so there's an assertion here that the n is greater than zero, which I guess is really an assert_ne. If initial capacity is less than concurrency level... initial capacity here is the number that they pass in, and concurrency level is going to be one, because we don't let the user set it for now. And the initial capacity can't be zero, so this condition can never be met. So it's really going to be these. So really, what's going to happen here is we're going to create the map using new, and then we're computing 1.0 plus the initial capacity divided by the load factor, where this is really an as f64, and this whole thing as isize, probably usize. And then the cap is going to be: if the size is greater than the maximum capacity, then we limit it to the maximum capacity; otherwise, we do this tableSizeFor. And then we set m's size_ctl equal to that cap, and then we return m. What does tableSizeFor do? Do we already have this tableSizeFor? No. "If the n is equal to zero, should you return an error rather than assert?" That's a good question. My thinking here is that it's really annoying for with_capacity to return a Result just for the one case where n is zero. I would probably instead just document that the value provided to with_capacity must be greater than zero or it will panic. It would be a totally legitimate choice to make the return type a Result.
In this particular case, I think it's probably overkill. Okay, so what does tableSizeFor do? "Returns a power of two table size for the given desired capacity." Really, is this really just next power of two? Why is it not called that, then? Section three. "Do you actually have to buy it?" I have this book. Actually, I have this book right here: Hacker's Delight. Let's see, section three. So Hacker's Delight is an interesting book; it's basically got all these neat, weird tricks you can pull. Three-two, rounding up or down to the next power of two. Let's see what they say. Minus one, left shifted by the number of leading zeros of n minus one. Okay, that looks like the next power of two, but it's computing it like, yeah, and then it's capping it at the maximum capacity. Okay, so in that case, instead of doing all of that stuff, we are just going to do size.next_power_of_two(), which I believe is in the standard library. "The smallest power of two greater than or equal to self." Great. And let's just make a note here: this is tableSizeFor in Java, just for our own sanity's sake. And now we should no longer have the warning about load_factor, because we actually use it. Let's see what's next. Unused import: BuildHasher. Okay, so there are a couple of things here. First, this has to be an AtomicIsize::new, and I guess it's going to be as isize. Something about this build_hasher that's not quite right. 45? That's not at all what I intended to do. What I intended to do was just to have that impl block be there for the new method, and then this one be for any S where S: BuildHasher. Let's see how it feels about that. Didn't we make a function for this hash thing? What does our get... wait, did I just delete a bunch of code that I did not intend to delete? Where's our get function? I definitely deleted a bunch of code I did not intend to delete. Okay, let's undo that terrible change of mine and bring get and friends back. Yeah, I don't know how I ended up deleting those.
There we go. That's more like it. Yeah, it somehow just deleted the entire top chunk of this block; that's not what I intended to do. "next_power_of_two could be greater than the maximum capacity." You are entirely right. In fact, why does it check that before computing the power of two? I feel like the right thing to do here is actually let size = size.next_power_of_two(), and then cap that at the maximum capacity. I feel like that's really what it's trying to compute, right? Yeah, I think that's actually what it's trying to compute. Let's try that. Cannot infer type for type parameter K. That is true; we haven't said what the type of our map is here. So let's say this is going to be from String to String. Or maybe, sure... that's going to complain about lifetimes. It's going to be from usize to usize. Oh, great. It crashed. Oh, and it's complaining about this. That's fine. Okay, we have our very first test and it fails. Great, so we actually have something to do this stream. It does not just immediately work: "converting a null Shared into Owned". Oh, that seems bad. I want these to go away. In the next nightly (maybe I should switch to nightly) they fixed it so that panics would point here, as opposed to at the begin_panic business, when you do unwrap and stuff. All right. So let's look at this: it's in drop, at 792. Oh, I know why this is. If we never put into the table, we never allocate a table. And because we never allocate a table, there's nothing here for us to drop, right? So I think actually, let's look at what... oh, there's no destructor in the Java version. In Java, the destructor is just the default, I assume. Is there a destructor? Is it like tilde or something? I forget. I guess not. So for us, what we actually want to do here is: if table is null, then we just want to return. The table was never used, was never allocated. Okay.
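Putting the with_capacity pieces together, here is a sketch of the sizing computation we just walked through. The constants mirror Java's defaults (0.75 load factor, 1 << 30 maximum capacity); the function name is ours, not the port's.

```rust
const LOAD_FACTOR: f64 = 0.75;
const MAXIMUM_CAPACITY: usize = 1 << 30;

// Given a requested capacity, compute the initial table size: pad by
// the load factor, then round up to a power of two (Java's
// tableSizeFor), capping at MAXIMUM_CAPACITY. Computing the power of
// two first and then capping avoids overshooting the cap.
pub fn initial_table_size(capacity: usize) -> usize {
    let size = (1.0 + capacity as f64 / LOAD_FACTOR) as usize;
    size.next_power_of_two().min(MAXIMUM_CAPACITY)
}
```

For example, a requested capacity of 16 becomes a 32-bucket table: 1 + 16/0.75 is about 22.3, which rounds up to the power of two 32, so the table only resizes once it is about three-quarters full.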
So our first test passes. It's perhaps entirely unsurprising, because all this test does is create a new map and then drop it; it doesn't actually do any operations on it. But it does mean that we have a passing test. That's always a good place to start. Now let's see. Again, this is sort of the set of basic tests, right? We're going to do map.insert. Is it put that it's called here? No, we have an insert. map.insert, and we insert 42 and zero. And we're going to unwrap... no, it's not an unwrap. This returns whether or not there was an old value. So I guess we can do let old, and then we want to assert that old is None. "What would happen if you insert and then remove the element?" Then we would do an allocation, right? If you insert something, then that's going to allocate the first set of bins, and then when you remove it, the bins are still going to be there. It doesn't eagerly deallocate. Okay, let's see if just a straight-up insert works. It does not; it crashes, "converting zero into Owned". Okay, that seems like not something we want to do. What is this crash? This crashes also in drop, at 803. It also crashes in drop, which is interesting. See, notice that this code is not in the Java code, right? This is code that we ported ourselves, or that we wrote ourselves, in order to deallocate. And one thing we didn't account for here is, remember how the concurrent hash map lazily allocates things, including bins? And so each bin can actually be null, and that's something we didn't take care of here. So if bin is null, then continue; the bin was never used. One thing that's worth noting here is that the fact that we panicked in drop means we did not panic in the insert. The insert code actually ran; I can't guarantee that it did the right thing. Okay, how about that? Oh, interesting. I see. So we actually need to, I guess, load it. It's a little wasteful, because we know that we own it.
Isn't there a method for doing that? Well, it's into_owned, I guess. But into_owned fails if the thing is null, so I'm a little surprised. Let's look at crossbeam here. So we have an Atomic; what can we do with that? Okay, we really just need to do a load. That's fine. Okay, so we're just going to check whether that bin is null. All right. So what this means is we successfully created a map, allocated the map (because that's the first thing insert will do, allocate the table), did the insert, and then dropped the map, and none of that panicked, right? Let's just run this a few times, just to see that it isn't obviously broken. Okay, I mean, there's only one thread here, so it'd be very surprising if this was racy, but it's worth testing. Okay, this seems fine. And now, get_empty. Now what we're going to do is a get of, let's say, 42. And remember that currently our get API is such that you have to provide a guard. This is something that we want to change, but haven't changed yet. Crossbeam... epoch... I forget, is it just... here, pin. Yeah. So we need to do let guard = epoch::pin, then map.get with the guard, and then we want to assert that e is None. Right. So notice that it's a little awkward to have to do this every time you want to do a get. There are reasons to enable this API, which is: if you know that you're going to do multiple gets, you might only want to use one guard, because pinning the guard does involve some atomic operations, and in particular, whenever you release the guard, you might have to collect some garbage. And so there's an argument for still exposing this API for things like batch operations. But that's something we can think about later. In particular here, we just want to check that if we do a get on an empty map, it doesn't crash, basically. Let's see what that does. Okay, that works. Great.
So get can correctly handle an empty map. That seems promising. All right. Now for the big daddy of basic tests, which is insert and get. What do we think here? How's this going to work? We're just going to do map.insert, and then we're going to do a get and .unwrap it, and then we're going to assert that e equals zero. Now, this won't quite be right, because the get is going to return us a Shared, right? This is again because we don't have this guard API; this gives us sort of a reference to the value. But because it's a concurrent map, it's not just a reference to V, where V is the value type. Instead, it's sort of a wrapper around that reference, to keep track of things like garbage. And so we can't quite do this. This will probably have to be an unsafe deref. This is another reason why we want that API to be better. The reason this is unsafe, the thing the unsafe here implies, and we could write this here: SAFETY: the map guarantees that it will not free something there is a Shared to, right? Yet another reason why this API is just not great and will want some additional wrappers. "Do you really need to make the map mutable?" No, you're right, I do not, because it's concurrent. Yep, exactly. So this is an important observation. One of the reasons you can't use a normal hash map in a concurrent context is that you have multiple threads that all have pointers to the map, through an Arc or whatever, which means that they have shared references, right? They don't have a mut reference; they just have a regular reference. And the hash map API for things like insert takes a mutable reference, and this is why you just can't use it in that context. Whereas for our map, we want it to be accessible from multiple threads at once, which means that insert just takes &self; it does not take a mutable reference to self.
And therefore we don't need mutable access to the map. "You cannot write to the Shared." Well, actually, that's a good question: what does Shared let me do? Yeah... well, you can do a deref_mut, but it's unsafe, because it's a shared reference, right? But this is another reason why we want this wrapper: Shared is a really low-level concurrency primitive that we don't want our users to have to think about. All right, let's see how that works. usize versus integer... right. Insert and get works. Okay, what does this mean? It means that we created the map, we allocated the map, we did an insert into the map, and then we did a get of the key that we inserted, and we indeed got the value that we wrote. And then the map gets dropped. So this is amazing, right? This already means that the map doesn't immediately break when you try to use it for something. Mm. It's still a very basic test, right? All this means is that you can insert and get. But that's still pretty exciting. Right. So the obvious thing to do next is this: we want to create an entry; then, we don't really need the read here, but we want to update that entry with a different value; then we want to check that we indeed got that value; and then we want to remove the value. I think we added a remove, right? We did not add remove. Let's search... delete? Okay. So we don't have a remove. Great, that's fine. Create, read, update, read. So this is really update. And let's see that this actually works. One thing we'll want to do eventually, too, is check that our garbage actually gets collected. This is not something we're really testing at the moment, right, because we don't know that the zero actually gets deallocated. In the case of a zero there's no deallocation that happens, but we'll want to actually test drop. But let's see whether this works. That also works. Okay. This seems pretty promising.
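To illustrate the &self-versus-&mut self point in isolation: std's HashMap::insert takes &mut self, so threads that share the map through an Arc cannot insert at all. Any concurrent map needs interior mutability so that insert can take &self. A Mutex-wrapped map is the simplest (and slowest) way to get that API shape; this is only a demonstration of the signature, not how the lock-free port works.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// The simplest possible map with an `insert(&self, ...)` signature:
// interior mutability via a Mutex. It serializes all operations,
// unlike the real concurrent map, but the API shape is the same.
pub struct LockedMap {
    inner: Mutex<HashMap<usize, usize>>,
}

impl LockedMap {
    pub fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    // Note: &self, not &mut self, so shared references can insert.
    pub fn insert(&self, k: usize, v: usize) -> Option<usize> {
        self.inner.lock().unwrap().insert(k, v)
    }

    pub fn get(&self, k: &usize) -> Option<usize> {
        self.inner.lock().unwrap().get(k).copied()
    }
}
```

The key point is the receiver type: because insert takes &self, a &LockedMap shared across threads (e.g. inside an Arc) is enough to write to the map.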
And now we want something that drops a value. What are we going to do here? Well, we're going to derive, I guess, Hash, Eq, PartialEq on a struct — and what do I want from this struct? I want it to be a NotifyOnDrop, and then I want an impl Drop for NotifyOnDrop. There's going to have to be some code here and some code here. And here's the intention: if I make the value here a NotifyOnDrop — make a NotifyOnDrop here, make a NotifyOnDrop here — then I don't even need to care about the get. What do I want to check? I guess I want to check that the first NotifyOnDrop was dropped, then I want to drop the map, and then I want to check that the second NotifyOnDrop is dropped. Right? Except this doesn't quite make sense, because that's not guaranteed, right? We've implemented our own Drop implementation that does this garbage collection using epochs and stuff, and so it's not immediately obvious that this is going to work correctly. The question becomes: how do we actually track whether or not something has been dropped? I think the way we're going to have to do this is with std::sync. Let's do Arc and Mutex — or Arc and AtomicBool, which is probably sufficient. So NotifyOnDrop is going to contain an Arc<AtomicBool>, and the Drop impl is going to be self.dropped.store(true). And I guess we'll also use atomic Ordering, which is also where AtomicBool lives. So we're going to have dropped1, which is Arc::new(AtomicBool::new(false)), and dropped2; this one's dropped is dropped1, and that one's is dropped2. And here we want to assert that dropped1.load() is true, and here we want to assert that dropped2 has been dropped. There's one thing missing, though, which is that this won't actually implement Hash, Eq, and PartialEq. And I think the way we're going to do this is just add a field v, which is going to be a usize, I guess.
And then we're going to implement — this is going to be a pain. The problem is that the derived Hash, Eq, and PartialEq are going to include the dropped flag, and I don't think AtomicBool implements Hash, Eq, or PartialEq, because comparing would require atomic loads. So I think we might actually have to write manual implementations here. Like, is that so bad? This might be something we'll actually need outside of this particular test, so I'm going to hoist it up here. And then we're going to need, I guess, Hash. It's a little awkward — this is sort of a helper type, and I'm sure there's a crate that actually does this for you, but what we need here is just so straightforward. Right. So it's only going to hash the v field. And similarly, I guess, we want PartialEq. And Eq. Notice that if we did end up doing the tree-bin optimization, we would also have to require Ord — as in, being able to order these. But luckily we do not. So all we need is Eq and PartialEq, at least for now. And then we don't need the derive. I guess we might need Clone on that; I forget. And then we probably want something here like v is one, v is zero. And I guess what we could do here too is assert that it has not been dropped up here. It's a sort of uninteresting assertion, but we might as well do it. When it was replaced by the second, when the map was dropped — why do you need Hash for a value? You are totally right: I do not need any of this for the value. You are entirely correct; the value does not need this at all. That was a good observation. So let's just get rid of that. I'm still going to keep the v — that seems like it might be handy — but of course it's the key type that needs to implement Hash and Eq and PartialEq, so for the value we can just not do that. Let's see what this does. Oh, right. This needs to be Clone.
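A minimal sketch of the drop-notification type described above, using only std. The shape follows what the stream describes — a shared AtomicBool flipped on drop, with manual Hash/PartialEq/Eq that look only at the v field since AtomicBool implements none of them — but the exact names are mine, not necessarily what ends up in the repository.

```rust
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// A key/value type that flips a shared flag when it is dropped, so a test
// can observe the drop from outside. Hash/Eq are written by hand and only
// consider `v`, because `AtomicBool` implements neither.
#[derive(Clone)]
pub struct NotifyOnDrop {
    v: usize,
    dropped: Arc<AtomicBool>,
}

impl NotifyOnDrop {
    pub fn new(v: usize, dropped: Arc<AtomicBool>) -> Self {
        NotifyOnDrop { v, dropped }
    }
}

impl Drop for NotifyOnDrop {
    fn drop(&mut self) {
        // Signal that this instance (or a clone of it) has been dropped.
        self.dropped.store(true, Ordering::SeqCst);
    }
}

impl Hash for NotifyOnDrop {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Deliberately ignore `dropped`: only `v` identifies the key.
        self.v.hash(state);
    }
}

impl PartialEq for NotifyOnDrop {
    fn eq(&self, other: &Self) -> bool {
        self.v == other.v
    }
}

impl Eq for NotifyOnDrop {}
```

A test would hold one Arc<AtomicBool> per inserted value and assert on its load() after dropping the map.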
And this needs to be Clone because we want to hold on to the Arc so that we can check it separately. It's true that key types should also be drop-checked, but for what we're doing right now it doesn't actually matter. All right, finally, something that doesn't work. A field is never used — that's fine; actually, let's remove the v for now. It doesn't work: dropped1.load, line 95. It failed where? At 92, because this assertion does not hold. So — remember that our garbage collection is a little bit lazy, right? Our garbage collection only collects garbage when it's safe to do so. And if you recall — let's see where we do this. What is this from? This is from put. Yeah, it's this defer_destroy business, and we don't actually have control over when that happens, right? So let's look up the documentation for defer_destroy, on Guard::defer_destroy. Okay: defer_destroy stores a destructor for an object so that it can be deallocated and dropped at some point after all currently pinned threads get unpinned. This method first stores the destructor into a thread-local cache. At the same time, some destructors from both local and global caches may get executed, in order to incrementally clean up the caches as they fill. There is no guarantee for when exactly the destructor will be executed; the only guarantee is that it won't be executed until all currently pinned threads get unpinned. In theory, the destructor might never run, but the epoch-based garbage collection will make an effort to execute it reasonably soon. So this makes this particular assertion hard to deal with, because we have very little control over exactly when this drop will happen. This assertion is actually not one that we can easily write, because we have no control over when the destructor is going to run.
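To see why the assertion can't work, here is a toy model of deferred destruction — emphatically not crossbeam-epoch's real implementation, just a std-only sketch of the idea the docs describe: destructors are queued rather than run immediately, and nothing runs until the collector decides (here, modeled as an explicit flush). The DeferredQueue type and its method names are invented for this illustration.

```rust
// A toy model of epoch-style deferred destruction. Destructors are queued
// instead of run at the call site, so a test cannot assert that a replaced
// value was dropped at any particular point in time.
pub struct DeferredQueue {
    deferred: Vec<Box<dyn FnOnce()>>,
}

impl DeferredQueue {
    pub fn new() -> Self {
        DeferredQueue { deferred: Vec::new() }
    }

    /// Queue a destructor instead of running it now (crossbeam's
    /// `defer_destroy` is morally this, plus epoch bookkeeping).
    pub fn defer_destroy(&mut self, f: Box<dyn FnOnce()>) {
        self.deferred.push(f);
    }

    /// Run everything queued so far; returns how many destructors ran.
    /// In the real library, you don't control when this happens.
    pub fn flush(&mut self) -> usize {
        let n = self.deferred.len();
        for f in self.deferred.drain(..) {
            f();
        }
        n
    }
}
```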
However, it does make sense that after the map has been dropped — at the very least at that point — the values certainly should be deleted. So let's check that that is actually the case. That is still not the case; it fails at line 95. Okay. So this suggests that the value that got replaced is just never dropped. There are many reasons why that might be the case. One thing we could try here is to do an epoch::pin and then immediately drop it — basically create a guard and then drop the guard, just to advance the epoch and try to make sure that garbage gets collected. We might not have any control over that though. 97 is the same line. Flush. Okay, so there's flush, and there's Collector, which is useful when you need to ensure that all guards used within a data structure belong to it. Interesting. It is true that in some sense we would like everything associated with the map to be tied to one collector. But one reason the crossbeam-epoch stuff works this way is that if you have many data structures that all use this epoch business, they can actually share their collection queues and such, so that you amortize the cost. I wonder what happens if we create a guard and then do guard.flush(), even though there's nothing actually associated with that guard. Interesting. It might be that there's just no way for us to really force this, which would be a little awkward, but interesting. So it sounds like we could just have our own Collector associated with the map — at least that way, when the map is dropped, all the things are dropped — but I kind of don't want to do that. Because maybe we don't even want to. If you force the epoch forward by two, then you can assert something gets dropped? I don't actually know whether that's true. I mean, we could try. Let's just try advancing it twice and see if that makes a difference. Nope. See what that does. Or maybe — I guess it's not entirely clear when the garbage collection actually happens. Interesting.
Flush might also be necessary. Well, one issue here is that, remember, flush does not get called on the guards that were created inside of the map, right? Insert itself creates a guard for us that it only keeps for the duration of the insert, and that doesn't call flush. Well — but apparently this works. Is two sufficient? Is three sufficient? Three is sufficient. And that seems to — yeah, but not always, right? So what this suggests to me is that we don't really have a guarantee for when garbage is going to be collected. If we make this 10, for example — yeah, even then, sometimes it does not get dropped. Interesting. One thing we could do here is, in put, right — we create a guard, and we could do guard.flush() here and see whether that makes a difference. Nope. Interesting. Unreachable statement — interesting; yeah, we actually need to deal with this up here, which is a return, and there's a return here which we can ignore, and there's a return here. Let's see what that does. Yeah, so — oh, interesting. Still not always. Interesting. What's tricky about this is: does this cargo test run multiple threads? Well, the insert happens on the same thread. One thing that's tricky here is that it's hard to tell whether this is a bug in our implementation of dropping, or whether it's just the fact that deferring destructors doesn't give you a guarantee about when the drop actually happens. It's a little hard to say. I wonder whether, if we do cargo test -- --test-threads=1 — yeah. So cargo test does run each test in a separate thread, and so it could be that the garbage happens to be shifted to some different thread. I think what we're going to do for now, actually, is — I don't want to delete this test, but what I want to do is just ignore it for now.
I'm going to pull this in here: #[ignore], because we do not have control over when dropping happens. Exactly. The one thing we do have control over, though, is that when the map is dropped, we know that all of the keys and values are dropped. Is that true? Yes, that is true — all of the keys and values that are currently in the map, because they are dropped immediately, as opposed to with a deferred drop. So actually what we can do is keep this outside and still do our — I guess I should not have removed those implementations for NotifyOnDrop. And where is Hash? Here, let me demonstrate. So if we bring this back to what it was, have this be v of one and this be v of two — otherwise they'd both be zero and wouldn't be distinguishable. But then, instead of doing this whole assert-that-it-works-during business, I guess current_kv_dropped is really what I want to test here. And I want to check that if I do this, neither of them is dropped, and then I drop the map, and now dropping the map should immediately — not deferred — drop all keys and values. So this test we actually don't want to ignore, because this should be the case: here, because we own the map, the implementation of the map knows that no one else has any handles to our keys or values, and so any current keys and values really should be dropped. That's what we're testing here. And I probably need to use std::hash::{Hash, Hasher} — oh, and Clone must exist for the key. That's interesting. Sure. I guess really what we want here is something like a refcount. Because the key can be cloned — it might have to be, if we're doing a resize, right? The key might be present in both the old table and the new table, in which case they're clones of each other, and then we want to make sure that both instances of the key get dropped. Good question. Good question indeed. Okay, how about we make this a refcount?
I almost wonder whether this could just be Arc, because Arc already does this tracking for us. Here's what I'm thinking: what if we don't do this? What if we say Arc::new(0)? Same here. The key and value types are both going to be Arc<usize>; these are going to be dropped1 and dropped2. And then we're going to assert — so if you look at Arc, there's a way to get the reference count. I'm pretty sure it's racy, but there's a way. What is the main con of not having each map manage its own collector? It's that you don't get the sharing benefits if other things are also using crossbeam. There's an advantage to this shared notion of queues, where you get to amortize more of the cost. It's really hard to write tests that assert destructors have run in code using crossbeam-epoch. The tricky part is that it tries to be really good about deferring things, because if you don't defer well — if you don't amortize that cost — then these concurrent operations become really expensive. As far as I can tell, each thread will queue 256 deferred destructors before pushing them to the global queue, and then once the epoch has rolled forward by two, the global destructors will run. That sounds about right. I guess in theory the biggest problem would be if the insert happened on a different thread, but that shouldn't be the case here. There's something not quite right. What I was going to look up is — I think Arc has a way for you to check how many outstanding references there are to a thing, which is strong_count. Yeah. So what we want to do is assert_ne that Arc::strong_count of dropped1 is not equal to one. So keep in mind that this needs to be a clone, right? We have a copy right here and the map has a copy. And in fact, we can be even stronger about this and say we want to assert that there are exactly two copies: the one we have and the one that's in the map.
And there shouldn't be any resizes going on here, so that two should be right. When we drop the map, the number of references should be one for both of them. Let's see if that's the case. And I guess this NotifyOnDrop now — we might as well at least make sure this compiles, right? So this is going to be dropped1.clone(). This assertion is going to be: there are two of dropped1, and one of dropped2. Once we insert dropped2 to replace dropped1, our expectation is that after the map is dropped they are both one — and at that point we can also assert here that there should be two of dropped2. Okay, so that passes. That is good to know. Let's just get rid of the things we didn't need. So this first test is still ignored, because we can't actually control exactly when destructors run. Let's comment that here: ignored because we cannot control when destructors run. But this one runs just fine, which means that if we ever modified the drop code of the map and accidentally didn't drop keys or values or something, this test should catch it. All right, so all our basic tests now work, which was surprisingly little work, right? Constructing the tests was a little bit of work, but the implementation seems to work just fine. Of course, we haven't really stress-tested it much yet, and we don't know whether things like resizes work. But at the very least, the basic operations of get and insert work fine. So let's commit what we have right now: very basic tests, and some drop fixes. Will you be looking at loom today? I don't know, maybe. Loom is really cool for this sort of stuff, but I think there are more basic things we need to test first. In particular, how about we write a test that — it's a little annoying that we don't have remove; arguably we should implement that — but let's have a thing that tries to insert from two different threads. Actually, yeah, let's do that: concurrent_insert.
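The Arc::strong_count technique just described can be shown in miniature against std's HashMap (the concurrent map follows the same shape): the count is exactly 2 while the map holds its clone, and falls back to 1 once the map is dropped. The function name strong_counts is mine, for illustration only.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Drop-tracking via `Arc::strong_count`: we keep one handle and the map
// keeps the other, so the count is exactly 2 while the value is in the
// map, and 1 again once the map is dropped. Note that `strong_count` is
// racy in the presence of other threads; here everything is single-threaded.
pub fn strong_counts() -> (usize, usize) {
    let dropped1 = Arc::new(0usize);
    let mut map = HashMap::new();
    map.insert(42usize, Arc::clone(&dropped1));
    let during = Arc::strong_count(&dropped1); // our copy + the map's copy
    drop(map); // dropping the map drops its copy immediately
    let after = Arc::strong_count(&dropped1); // only our copy remains
    (during, after)
}
```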
Here we're going to do Arc::new of the map; map1 is going to be a clone of that Arc. And then t1 is going to be std::thread::spawn, and for i in 0..64 it's going to do map1.insert(i, 0). And t2 is going to do the same, but using a second clone of the map, and it's going to insert ones. And then we're going to t1.join().unwrap() — wait for both threads to finish. And then we create a guard, and for i in 0..64 again we look up that i, and we say v is that result, and assert that v is either equal to zero or equal to one. This particular assertion is probably unnecessary, but we might as well keep it. Apparently, our type does not implement Send. Why is that? Well, if we look at the implementation for the flurry HashMap, we need — oh, this is probably a property of Atomic. So under the implementations for Atomic, you'll see — interesting — impl Send for Atomic<T> where T is Send and Sync. BinEntry — ah, yeah, so the challenge here is that the compiler does not know that the raw pointers to tables that we stick inside here — let me pull up Node. Remember that a BinEntry can either be a Node, or it can be a Moved, which is sort of a redirection saying "this table is being resized, go look over in that table". And for that *const, the compiler does not know that it's safe to share that pointer across threads. So we will actually need to either do an unsafe impl Send for BinEntry here, or, more realistically, what we want to say is that Table is Send, even though BinEntry is not necessarily Send. It's unclear which one we want to do. In some sense, neither of them is true in general. Basically, we don't want to add an implementation that lies, right? It is not true that any Table is Send, because you could stick a BinEntry in there with a pointer that actually — could you do that? — a pointer that was not valid across threads?
Basically, what I'm trying to figure out is the right place to add an unsafe impl Send — and the question is, what type do we add that unsafe impl for? We could add it for BinEntry, we could add it for Table, or we could add it for the top-level flurry HashMap type. The reason we have to have this at all is that by default, raw pointer types like *const and *mut are not Send or Sync. I wonder why that is — if the underlying type is Send and Sync, then why wouldn't the raw pointer be too? That's a very good question. It's just the default, but it's not necessarily true. Let's look at the Nomicon. The Nomicon is great at this. Let's see — is there a dark theme? Oh, Coal — maybe Ayu, I like Ayu. Let's do that. Where is Send and Sync? Here: raw pointers are neither Send nor Sync because they have no safety guards. Raw pointers are, strictly speaking, marked as thread-unsafe as more of a lint. Doing anything useful with a raw pointer requires dereferencing it, which is already unsafe. In that sense, one could argue it would be fine to mark them thread-safe. However, it's important that they aren't thread-safe, to prevent types that contain them from being automatically marked as thread-safe. Those types have non-trivial, untracked ownership, and it's unlikely that their authors were necessarily thinking hard about thread safety. In our case, we are thinking hard about thread safety. And we know — let's see, what is it that we have to guarantee? Yeah, in this case it certainly should be the case that these raw pointers are just fine, because we are managing the concurrent case for Table — the whole implementation here is concurrency-safe. So what we're going to do is: unsafe impl<K, V> Send for BinEntry<K, V> where K: Send, V: Send — and Node<K, V> implements Send, and Table<K, V> implements Send, where K: Send and V: Send.
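A minimal, self-contained version of that pattern — a type holding a raw pointer, which loses the auto Send impl, plus a bounded unsafe impl Send that asserts thread safety ourselves. The Table name echoes the port, but this toy owns a single boxed value rather than an array of bins; send_across_thread is an invented demo function.

```rust
use std::thread;

// Holding a raw pointer strips the automatic Send/Sync impls, even though
// this type owns its pointee outright.
pub struct Table<T> {
    ptr: *mut T,
}

impl<T> Table<T> {
    pub fn new(v: T) -> Self {
        Table { ptr: Box::into_raw(Box::new(v)) }
    }

    pub fn into_inner(self) -> T {
        // Safety: `ptr` came from `Box::into_raw` in `new`, and `self` is
        // consumed here, so the Box is reconstructed exactly once.
        unsafe { *Box::from_raw(self.ptr) }
    }
}

// Safety: `Table` owns its pointee, so sending the pointer to another
// thread is as safe as sending the value itself. The `T: Send` bound keeps
// this impl from lying for types that must not cross threads — the same
// reason the stream adds `K: Send, V: Send` bounds on BinEntry/Node/Table.
unsafe impl<T: Send> Send for Table<T> {}

pub fn send_across_thread() -> usize {
    let t = Table::new(7usize);
    // Without the unsafe impl above, this spawn would not compile.
    thread::spawn(move || t.into_inner()).join().unwrap()
}
```

(For a real type you would also add a Drop impl; this sketch leaks if into_inner is never called.)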
Now, this additional restriction of Node being Send is probably not necessary, but it does mean that if we accidentally changed Node so that it was no longer Send, that error wouldn't be silently swallowed by this unsafe impl. So we're going to do that for this, and we're going to do the same for Sync: as long as these types are Sync, BinEntry is also Sync. That seems to hang. Oh, it might just be slow — my CPU seems busy. Huh? Well, that certainly doesn't seem great. So something ain't right. Good. Finally, something for us to debug. Let's see. Let's just add some prints here: this is t1 at i, this is t2 at i, and this is get i. See what this gives us. Ah, okay, here's my guess: my guess is that this starts a resize, and the resize is what is blocking. Right? Because you'll notice that the first couple of inserts were just fine, and then when it gets to 11, it stops working. And 11 is roughly 75% of 16, right, and 16 is the default table size. So my guess is we're entering the territory where a resize is happening. So let's do a gdb -p — oh, what on earth did I just do? What? I did something very weird. I have no idea what I just did, but I guess we've got to fix it. How? Sorry about that; no idea what caused that to happen. All right, well, so we're going to pgrep for the test binary, I guess, and sudo gdb -p. Okay, so let's look at what threads there are. It's in a syscall — it's waiting on a mutex somewhere. That's interesting. It's waiting on a mutex on line 298. And what about thread three? It is also waiting on a mutex, but it is waiting on a mutex in transfer. Okay, so that's interesting. So let's look at our lib.rs. One was at 662 — that's where thread two is stopped — and thread one was stopped at 298. Interesting, interesting. And they've deadlocked here somehow. Fascinating. Wasn't there some deadlock detection feature in parking_lot? That might be the case.
I'm not sure. The question is, why are they both waiting to take the lock — or rather, who is currently holding the lock if they're both trying to take it? This implies that one of them is trying to take the lock while holding the lock, right? Which could happen if there were a recursive call here. But I don't think there's any recursion in this one, at least. Well, there's add_count. Oh, here's what happened — look at this. Aha, look at this. Okay, so an insert happened. It did a put; the put called add_count, and add_count called transfer. But it does this while still holding this lock, right? The bin lock is still held when it calls add_count. And that is not what we want, because transfer tries to take that same lock. So the easy fix for this, right, is to drop the guard here. What was it called? head_lock — let's call it head_lock instead. Remember, there's a lock that sort of belongs to every bin, which you have to take if you're going to modify the linked list under that bin. And here we want to make sure that we don't hold that lock when we call add_count, because add_count might actually end up resizing the table. I'm interested, though, in what the Java code does here — addCount, in putVal. What does it do? Ah — the synchronized block ends exactly right before the point where it checks the treeify threshold. So this is where we actually forgot to release the lock: where the synchronized block ends. And so we need to make sure that we do that now, which is right before we check this. Let's see if there are any other synchronized blocks. There's replaceNode. Interesting. Oh, replaceNode we haven't implemented — that's what does removal, for example. So that's something we probably want to implement, but we haven't yet. There's a synchronized in clear, and computeIfAbsent is also not something we implemented.
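The deadlock and its fix can be shown in miniature with a std Mutex. This is a toy, not the port's actual code — the names put, add_count, and the single u64 "bin" stand in for the real per-bin head lock — but it shows the shape of the bug: put holds the bin lock and calls into a path that takes the same lock, so the guard must be dropped first, mirroring where Java's synchronized block ends.

```rust
use std::sync::Mutex;

// Stand-in for the resize path: it takes the bin lock itself.
fn add_count(bin: &Mutex<u64>) {
    let mut guard = bin.lock().unwrap();
    *guard += 100;
}

// Stand-in for `put`: it must NOT hold the bin lock across `add_count`.
pub fn put(bin: &Mutex<u64>) -> u64 {
    let mut guard = bin.lock().unwrap();
    *guard += 1;
    // Had we kept `guard` alive past this point, `add_count` would block
    // forever waiting on the lock we already hold. Release it explicitly —
    // Rust makes the end of the Java synchronized block a visible `drop`.
    drop(guard);
    add_count(bin);
    *bin.lock().unwrap()
}
```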
computeIfAbsent, computeIfPresent, compute, and merge, which we don't have — and transfer. Okay, so transfer has a synchronized section, which is down here somewhere, probably where it takes its lock — which is down here, right? And that synchronized block ends at the end of that else. So here it just so happens that the synchronized block matches up perfectly with the else that we're wrapped in, which was not the case in the other place. So here it's actually fine for us to leave it — but what did I just do? Let's make it explicit here too, just for future sanity. Right? All right, let's see what that does. Okay, it ran, but it crashed. This is probably a crash in — oh, crashing in get, that's fascinating. So all the inserts worked fine, and that probably included a resize, right — because remember, we hit transfer, which probably means that a bin was transferred — and our get code is wrong: when the map got resized, it's either looking in the old table and not finding the key, or — in fact, that probably has to be it — it's looking in the new table and the thing hasn't been placed in the new table yet. Interesting, right, because a write did happen for every key, so every key should be present when we do the get, but the transfer probably made that not be the case. So we need to figure out why that is. Let's go to get. What was this test called? concurrent_insert. Okay, so that test fails. Does it always fail on the same key? No. This almost definitely has something to do with the transfer. All right, so let's look at our get method, and let's look at where it decides to return None. Let's go with good old-fashioned print debugging. We could probably use rr here — that would be pretty cool — but this is probably just fine. All right, let's see where it decides to exit. Of course, now it passes. For get seven.
It exits at C. The bin is null. Why would the bin be null? Interesting, interesting. Why would the bin be null? Well, that's certainly concerning. What does the hash compute to here — table, bin i, hash h? So this means it's looking up in what is presumably the new table, and the bin that the key hashes to is null — it is empty. Which I think means — what I want to look at is transfer. There's sort of a transfer-bin step, I think, or maybe that's just the thing down here. Yeah, this is the thing that moves the bins. If the bin is null — right, this leaves a Moved entry in its place. So what this suggests is that it's not looking at the old table, because in the old table, every bin is going to have a Moved entry in it. Right: this is the thing that moves a bin from the old table to the new table, and in the old table, if the bin is null, it inserts one of these Moved pointers that points to the new table to consult. And this means that if the get sees a null, the key can't be in the old table — or at least, it shouldn't be looking at the old table, because that should have all Moved entries. So I guess what we want to do here is first print "transfer into" plus the pointer of the next table, and then in our get, let's also print "get from", just so we can make sure that we're really looking at the right tables. Okay, so we can see that there have actually been multiple resizes — there's one here, there's one here, and there's one here — and they occur at increasing distances, which is what you expect, because each is the next power of two. And all the reads happened from 3580, which is indeed the latest table — there's a table here, that's the wrong address, this is a table, this is a table — 3580 is the last one we transferred into. All right, so we are certainly reading from the right table.
The question then becomes: why is the bin empty? That suggests to me that maybe some of the bins get left behind somehow. So let's see — this has finished. Let's say I just want to check that all the bins actually got moved. All right, so here we move 0, 1, 2, … up through 15, right, because there are 16 bins. Here we move 0 through 31. And then this final move goes 0, 1, 2 — I'm going to stop counting out loud, because you can all count with me. All right, so all the bins at least hit the case where they know they're supposed to be moved. Interesting. Redirecting. Let's label this one. Moving. And then in the get method, what I want to do is actually print out the bin index — and the hash as well, just so we have a little bit more information to work with here. Let's see. The get is trying to read from bin 99. Well, that certainly seems wrong. Actually — why are there also so many of these? This doesn't seem right either, because this is moving the redirection notices. Is that even what it's supposed to do? Yes, that is what it's supposed to do. So why is it trying to read from bin 99? All right — because at this point the next table size is 128, so that index seems plausible. So really what we need to do here is, in the code that does the movement — this is where we have the runs of things going to the same target bucket — low is going to go into i, and high is going to go into i plus n. Thank you. So what goes into bin 99, if anything? Moving table bin 35: low goes to 35 and high goes to 99. Okay, so it is certainly moving something there. The question becomes: what does it move there? Let's print the low bin and the high bin. I don't even know whether this will work. Probably not, but we can try it.
It's probably not going to compile. Oh, okay — that's handy. Now it's looking at bin — oh, it exited at D now. That's different. Let's stick with the run where it crashes at C first. 103 is what we're now looking for. What goes into 103? Nothing goes into 103. 39 was empty, which would have gone into 103. All right, this one did 102, this one did 104. Interesting. Interesting indeed. We probably need more of a chain here — we need to know which bin it originally moved into, probably. So in fn put, let's print "direct insert to bin i" — do we have a bin i here too? bin i, great — or "indirect insert to bin". So — you know, that's D again. Let's look at get six. All right, where did six get inserted? Thread two inserted six. When does this print get printed — before the insert? So thread two did an indirect insert into one of these bins, probably into bin four, because modulo two, six modulo two is zero and three modulo two is one. So there's an indirect insert into bin four. Okay. So key six is in bin four. And then moving table bin four: bin four stays in bin four, and nothing gets moved to 20. Okay, so six is still in bin four. This is an indirect insert to bin four. And then where do we move bin four a second time? Moving bin four: the low moves to four and the high moves to 36. Okay, so we now need to look at four and 36 — those are the places where six should now be. Four and 36: 36 goes to 100. So now four and 100; four goes to 68. So 68 and 100 are the ones that could have this key. Why does it go for 84 then? Something ain't right here. Somewhere, this decides to use the wrong bin. Maybe — one thing that would be helpful is to print what the bins are, but doing so currently might be a little bit of a pain. I think we need it though.
We're going to need — let's go up to the top here and say we want the key to implement Display, std::fmt::Display, and we want these to implement Display, just so we're able to print out bins. Because what I want to see is — for transfer, that's fine. For insert, I guess now we can actually print out the key too; we can do that up here somewhere — yep, indirect insert of that key. And then in transfer, we should now be able to actually print out the bin that gets moved — moving table bin such-and-such — and in the bit that loops through all the nodes, moving key. So the key is going to be node.key, although we might have to match on this. And high is going to be whether the bit is set, which is going to be whether this is one. Let's try that. All right, so what are we looking for this time? Key 10 and bin 33. Key 10 and bin 33. All right. So where does 10 originally end up? 10 ends up in bin one. Something seems wrong to me — I feel like it should be in bin zero. All right, so 10 is in bin one, and then moving table bin one moves the key 10 to high — no, it moves it to low. Okay. So 10 is going to be in low, which is going to stay one. All right, what happens to one the next time it gets moved? Moving table bin one: 10 stays in low, so it's still in bin one. And down here, moving table bin one: 10 stays in low, so it's still in bin one. So all the way to the end, it's in bin one. So why does it choose bin 33 here — that's the real question, because 33 is not in the high half of 128; it is the high of 64. So this suggests that — remember how — let me try to formulate this in a useful way. When we do a transfer, we are moving from n bins to 2n bins, right? And for every old bin, some of the things are going to move to the same bin index in the new size, and some of them are going to move to a different one, right?
Because for a given key, the way you choose a bin is the hash of the key modulo the number of bins. How do I draw this is really the biggest question. I think I talked about this a little last time, but let's just use 10. 10 mod 16 is 10, because 10 is less than 16 — actually, that's a terrible starting example. 10 mod 4 is 2, right? You take away four, you take away four again, and you're left with two; that's basically how modulo works. 10 mod 8 is also 2. But 10 mod 16 is 10. So notice that if we resize from four bins to eight, things with a hash of 10 would not move: they stay in bin two. But for the ones that do move — in this case, when going from eight to 16, 10 modulo the number of buckets does change, and in particular the key lands in the high half of the 16. You can think of 16 as eight and then eight: 10 used to be in the first half, which corresponds one-to-one to the old eight bins, but in the new table it actually ends up in the high bin, so to speak — the last eight rather than the first eight like it was before. And its index there is going to be the old bin index plus the old size, which is eight: it moves from bin two to bin two plus eight, which is 10. And when we look at this bin, notice that the key 10's hash is some obscure large number, and that number happened to map to bin one initially, as we saw looking up through the logs — bin one modulo 16, which is where we started.
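The arithmetic I keep describing can be written down as a tiny sketch. The function names here are mine for illustration, not flurry's API; the point is that for power-of-two table sizes, `hash mod n` is `hash & (n - 1)`, and on a doubling resize an entry either stays at its old index ("low") or moves to old index plus old size ("high") depending on whether bit `n` of the hash is set.

```rust
// Sketch of the low/high split when a power-of-two table doubles.
// `n` is the old number of bins; names are illustrative, not flurry's.
fn old_bin(hash: u64, n: u64) -> u64 {
    hash & (n - 1) // hash mod n, since n is a power of two
}

fn new_bin(hash: u64, n: u64) -> u64 {
    if hash & n == 0 {
        old_bin(hash, n) // "low": index unchanged
    } else {
        old_bin(hash, n) + n // "high": old index plus old table size
    }
}

fn main() {
    // 10 mod 8 = 2, but 10 mod 16 = 10 = 2 + 8: the entry moves to the high bin.
    assert_eq!(old_bin(10, 8), 2);
    assert_eq!(new_bin(10, 8), 10);
    assert_eq!(new_bin(10, 8), old_bin(10, 16));
}
```

This is also why bin 26 shows up later in the session: a key in bin 10 of a 16-bin table whose hash has the 16 bit set lands in bin 10 + 16 = 26 after the resize.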
Notice that 33 is one plus 32. So the moment we resize from 32 to 64 bins, 10 should have moved to bin one in the high set of bins. Instead it just stayed in bin one. So the question becomes: why? We have 64 bins already here, so up here, when we're moving table bin one, why does 10 not move to high? That's the real question: why was 10 not moved? Let's see — the mask is n. Let's print out both the hash and the mask. Something here is definitely off. This is a D again. See, bin 26. So where did — oh, this is 10 again, but notice its hash is different. That is because of RandomState. I should just turn off the randomness — no, that's not really going to help. Key 10 ended up in bin 10. Oh, that's handy. And down here we saw that the final get looked up in bin 26, and 26 is 16 plus 10. So it should have moved to the high bin when there were 32 bins, and it did not. When there were 32 bins, which is up here — you see inserts into things like 26 — when we moved bin 10, "moving key 10 to high" decides not to move 10 to high, which would be bin 26; instead it keeps it in bin 10. And in fact it almost looks like all of these decisions are false. Looking through it, it never, ever moves anything to the high bin, which suggests that something is wrong. Okay, so this is the hash, and the mask is 16. Interesting. Something here ain't right. We can check this pretty easily — this is going to be bright — in binary: one, two, four, eight, 16. Right, so the 16 bit is set, and the mask is 16.
So why is this not equal to one? Because that's the actual check — that's why! It does actually move it to high; it's just my printing that was wrong. We're doing an AND with the mask, and that leaves the 16 bit set, but 16 is not equal to one — it's equal to 16. We need to compare against n, not one. That's where that got screwed up. So those log lines were just lies. That's frustrating — and I guess this has to be a u64 too. That was kind of silly. All right, let's try that again. Get of 16, and bin 57. What's the last move into 57? Wait, did I read that backwards? No: key 16 and bin 57. All right, let's look at the insert for 16. The insert for 16 is bin 25. Where do we move bin 25? "16 moves to high" — ah, look at this. Okay, so my guess is it's our logic for moving runs, which you might remember from part two; we had some difficulty there. 16 is supposed to be moved to the high bin — it does match the mask — but the high bin ends up empty. So the question becomes — okay, so zero and 16 are both in bin 25, which means 16 is at the tail. So last_run is — okay, so here it's looking at the bin and looking for the tail: the longest suffix of nodes that all share whether they're going to high or low. And in our case they do not share it, because zero is going to low and 16 is going to high. So that means, in theory, the run it finds should be just 16, and zero should not be in the high-bin run; last_run should be pointing at the 16. That's the hope. If the run bit is zero, the last run goes in the low bin; otherwise it goes in the high bin.
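The printing mixup is easy to reproduce in miniature. This is a hedged sketch, not the port's code: `hash & n` yields either 0 or n for a power-of-two n, so a "does it move to high?" check must compare against zero (or n), never against one.

```rust
// The debugging mixup in miniature: `hash & n` is 0 or n, never 1, so a
// "did it move to high?" print comparing against 1 always reports "low".
fn moves_to_high_wrong(hash: u64, n: u64) -> bool {
    hash & n == 1 // always false for any power-of-two n > 1
}

fn moves_to_high(hash: u64, n: u64) -> bool {
    hash & n != 0 // correct: is the n bit of the hash set?
}

fn main() {
    let hash = 26; // binary 0b11010: the 16 bit is set
    assert!(!moves_to_high_wrong(hash, 16)); // the lying log line
    assert!(moves_to_high(hash, 16)); // what actually happens
}
```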
Then, for every node p not equal to last_run, it walks everything that comes before the last run and sticks each node in the right list. At least that's the idea. And for some reason here, even though 16 should go to high and zero should go to low, high is just null. So we're going to need a little more debugging. In particular, one thing we'll want to know is where the run starts: a print like "run starts at {node} and goes to high: {bool}", where the bool is whether the run bit is not equal to zero. That'll tell us where the run starts. And then the other thing we'll want is this information down here on what it chooses to link where: "linking pre-run node {key} to high: {bool}", with node.key and node.hash & n as u64 != 0. All right, let's see what that gives us. By the way, hash & n on its own is not a helpful value to print — hash & (n - 1) gives you the modulo, but hash & n lets you check whether a given bit is set, which is actually what we want to do here. At least from memory, that's what the code does — yeah, it doesn't check the mask you'd use for the modulo, it checks whether that one bit is set, because you can use that to decide, when the modulus doubles, whether a particular entry moves to the high bin or the low bin. We don't actually want the full modulo; we just want to see whether it moves. That's a D, that's a D, that's a C. Okay, so seven should be in bin 27. Seven goes to bin 11. Where do we move bin 11? "Moving six to high and seven to high; run starts at six and goes to high: true." Okay.
So that works correctly. Great. That went to 27 because here it's at least claiming that both key six and key seven are moved into bin 27, which is where we want them — that's where the read is trying to look. So the question is: why is the read not finding it in 27? My guess is that at some later time bin 27 gets moved again. All right: "indirect insert seven to bin 27". Interesting. Then: "moving bin 27; moving key six to high: true; moving key seven to high: false." Aha, look at this. Here six is supposed to move to the high bin, but seven is supposed to stay in the low bin — stay in 27. And yet our code says the run starts at six and goes to high, which is not right. The run should start at seven and go to low; but low ends up empty, and seven ends up moving along with six to 59, which is not what we want. So why does it do that? Why does it think the whole bin is one run? Okay, it starts out saying the run bit is — what are we at here, 16, 32, 64? Okay, so the run bit is 64 — let's just say "the run bit is one", meaning the 64 bit is set — and last_run is the start of the bin. So initially it assumes the run is the entire bin and that it's moving to high, which is what it ends up doing, too. So this suggests something's off. Then it looks at six, and sees that six has next set. Let's just walk through it — see if you also spot what's wrong. Initially it thinks the run is the entire bin, going to high. Then it looks at six, sees that six has a next. Then it looks at the bit for six, sees that it's 64 and therefore six goes to high, so it skips this and just moves on to seven. Then it looks at seven.
For seven, it looks at whether next is non-null — and next is null for seven, because seven is the last element — and then it breaks. So it never gets around to checking that seven doesn't match the run! The break needs to come after the run-bit check. So what do we do here? Yep: this check, which we transposed into the middle of this loop, actually needs to go at the end. That should be it. Let's see what that does. Hey, great! All right, now let's get rid of these prints. Technically — one thing that would be kind of cool — oh, we should instrument this whole thing with tracing. That'd be so cool. Man, I really want to do that now, but it's unclear it's worth the time. It would help a lot with debugging, right, if we had tracing support for this whole concurrency business. All right — and in the test as well. So here we have a bunch of concurrent reads and writes, and they seem to work just fine. Let's check that the other tests still work — presumably they do. What other kind of shenanigans can we get up to? Well, here's a possibility: how about reading while writing? So here we're going to have a writer that just writes into the map, and then we're going to have a reader. "How do you implement that? Which crate?" Oh, for tracing there's a really cool crate called tracing that gives you a really nice, high-performance way to add trace points, group related events together, and annotate them with additional information and such. It's sort of like log, but more sophisticated and lower overhead. And it's useful in that you can choose what to do in response to every event — it does not have to be to log it.
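The run-scanning bug is easier to see in a simplified model. This is a sketch, not flurry's actual transfer code: it scans a chain of hashes and finds the "last run" — the suffix of nodes that all go to the same low/high side. The stream's bug was breaking on "no next node" before checking the current node's bit, so a final node on the other side never got to start a new run.

```rust
// Simplified model of the "last run" scan from transfer. Returns the index
// where the final run starts and whether that run goes to the high bin.
// The bug was breaking on "last node" *before* this bit check ran for it.
fn last_run(hashes: &[u64], n: u64) -> (usize, bool) {
    let mut run_start = 0;
    let mut run_bit = hashes[0] & n != 0; // initially assume one big run
    for (i, &h) in hashes.iter().enumerate() {
        let high = h & n != 0;
        if high != run_bit {
            // this node goes the other way: a new candidate run starts here
            run_bit = high;
            run_start = i;
        }
        // the "is this the last node?" break belongs here, after the check
    }
    (run_start, run_bit)
}

fn main() {
    // hash 38 has the 32 bit set (goes high); hash 7 does not (goes low):
    // the run is just the final node, going low — like keys 6 and 7 above.
    assert_eq!(last_run(&[38, 7], 32), (1, false));
}
```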
And so just annotating this library with tracing events would be really cool, because you could do things like analytics over them. The log crate, for example, is only built for logging output — you can adjust it to do different things, but tracing is built so that you write a subscriber that chooses what to do with the events. I highly recommend you look into it, but I'm not actually going to spend time adding tracing right now. Okay. So read-while-writing is going to spin up a thread that does a bunch of writes, and then spin up a thread that does a bunch of reads. That's a good question: what is the reader going to do? There are two options here. One is we just let them race: have them all do reads and writes and not really check the results, just check that the code doesn't panic or anything. The other option is to make the reads actually read reasonable things — like only read keys that have been written. How do we want to do this? One option is to have the writer write keys in order, or maybe randomly, and then have the reader read randomly. Actually, you know what? The right solution here is to not write these by hand. Let's remove that, keep the concurrent insert test, and move over to actually porting some of the Java tests, because I'm sure they do a bunch of this stuff already. No, tracing is not related to Tokio — it's by some of the same developers, but it has nothing to do with Tokio. And it's really cool for this kind of tracing, especially in low-latency, high-performance situations. "Why do you need that Arc there?" I need the Arc because the flurry hash map is accessed from multiple threads, and otherwise — thread::spawn requires that you give it a 'static closure.
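That Arc question deserves a concrete sketch. Since `thread::spawn` demands a `'static` closure, a borrowed map can't cross into the spawned threads; each thread needs shared ownership via `Arc`. Here std's `HashMap` behind a `Mutex` stands in for the concurrent map (flurry itself wouldn't need the `Mutex`, but the ownership story is the same); `concurrent_fill` is a name I made up for the sketch.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Each spawned thread gets its own Arc clone, satisfying the 'static bound
// on thread::spawn; the map itself is shared, not copied.
fn concurrent_fill(threads: usize, per_thread: usize) -> usize {
    let map = Arc::new(Mutex::new(HashMap::new()));
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                for i in 0..per_thread {
                    // distinct keys per thread, so the final length is exact
                    map.lock().unwrap().insert(t * per_thread + i, i);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let len = map.lock().unwrap().len();
    len
}

fn main() {
    assert_eq!(concurrent_fill(2, 64), 128);
}
```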
"Doesn't it seem like this language is a little heavy on the syntax side, introducing a lot of friction for the developer?" Nope, I don't think so. There's not actually that much syntax here beyond what you'd see in C — or C++, rather. I don't think it's that heavy. It is true that the borrow checker introduces friction, but it introduces friction in a way that makes for better programs: it tells you that your code has bugs. It's true that it's friction, but it's the good kind of friction. Okay, how about we start porting some of these Java tests? This is going to be bright again. Let's start with — oh, I don't know what these different tests are. Let's look at something like this one: it times and checks basic map operations. Sounds great. I guess this relies on a bunch of things we haven't implemented, like contains, containsKey, iterators. Okay, maybe we really do need to implement removal and iteration. Yeah, I think we do — it looks like a lot of these are testing iterators or removal. Well, I guess let's do that. Fine, fine, fine. Actually, let's commit these fixes properly first, because they have some somewhat interesting observations around why things were broken. So here we have: "drop lock guard when the Java synchronized block ends — otherwise put/add_count/transfer produces a deadlock". What was the other one we had? "Also consider last bin entry for runs". This one is "bin entry Moved is thread-safe". I want to keep the Display bound because it might be useful for debugging later. "First concurrent test!" — and we're going to keep that. All right. Since the tests rely on it, let's implement the iterator; I think that's going to be worthwhile. BaseIterator extends Traverser. Okay, what is Traverser? The stack is created on the first encounter of a forwarding node.
Oh, this is another thing where we probably want to add some of this documentation ourselves. One thing that's interesting about traversal is that you want to make sure that, if there's a resize, you don't visit keys that you have already visited. Right? If you're halfway through bin seven, say, and then a resize happens, you have to make sure that if a thing you already visited got placed in a high bin, you don't yield it again when you get to that high bin. Yeah, I wonder what the code actually does. It's going to be interesting; I'm excited. containsValue — actually, how about we do containsKey first, because that seems like such an obvious one. Oh yeah, that's relatively uninteresting. Fine, I'll do it just to add a method: contains_key — does this map contain this key? — returning bool. The guard is epoch::pin, and then I guess we do self.get(key, &guard) and check whether it's something. Good, we added contains_key — but Traverser, that's where we were. Traverser, Traverser, Traverser. All right, I'm going to take a quick break and then we'll do Traverser. Actually, how about a five-to-ten-minute break? I'll be back in a few minutes, just making tea and stuff, then I can do some Q&A, and then we do iterators. Let's do that. Excellent, I'm back. Let's do a quick Q&A while I eat a crispbread, and then we continue with iterators. Doesn't have to be about this — doesn't have to be related to Rust, for that matter. If you have questions, now's the time. In the meantime, we can also read the documentation for the Traverser implementation. "Have you heard about the drama around Actix-web and its maintainer?" Yes, I have heard of the drama. I have opinions on it, like on many other things in this world; I don't know that they're particularly useful. I think in general unsafety is something to take very seriously.
But at the same time, I think people shouldn't be as scared of unsafe as they often are — while still taking it seriously. In this case, I think it's a combination of a difference in priorities and a bad mix of communication styles that caused some problems. "Is it always possible to avoid unsafe blocks in Rust?" No: if you have to call out to a C function, for example, that's inherently unsafe. "Other than Rust, which languages do you think have potential for the future?" Many. I think Go does; I think C is going to remain for a long time. There are some cool up-and-coming languages like Zig and Nim, though those are the more esoteric ones. Elm is pretty cool. I think JavaScript is going to be with us for a long time. So yeah, many languages — Java too, for that matter. "Why did you choose the Java ConcurrentHashMap to implement in Rust?" Because it's a very well-known and stable thing that is used by pretty serious consumers, and it's seen a lot of vetting. So it seemed like an interesting thing to port — especially because it's open source, the license is such that I can do this work, and it's well commented. That made it interesting. "What do you think of Microsoft's new language, Verona?" Oh, this is the "Rust, but with arenas" one, sort of. I think the notion of arenas is really cool — it's something you often want to do in performance-sensitive applications anyway, and having a language with that as a first-rate primitive seems pretty attractive. But I haven't looked enough into the language to really say. "How is flurry different from the chashmap crate?" chashmap, from memory, has a lock per bucket — we talked about this a little in part one. Whereas this library only locks a bucket for writes after the first insert into it, which in theory should mean it's a lot faster, with a lot less contention.
One thing that's neat here is that reads do not have to take a lock, whereas they do in chashmap, I believe. "I don't think he even sees the chat." I mean, I'm looking at the chat right now. What particular questions do you have? "Could you give a quick overview of your development setup?" I actually have a separate video on my desktop setup linked from my YouTube channel, so you can look at that — and all my dotfiles and such are online. "What resources would you recommend to learn how to use unsafe in a safe way?" Oh, that's tough. I don't think there are that many good resources for unsafe. There's a GitHub repository called Unsafe Code Guidelines, I think, where they're basically trying to work out what exactly the rules around unsafe are. The Nomicon is also fantastic at this. And hopefully, over time, we're going to see more resources for dealing with it. Some of that is going to be stuff like what we're building here, which does have some unsafe and where we actually talk through what that unsafety means. There are tools being developed, like Miri, that let you at least be more confident in your unsafe code. And hopefully we'll have some more talks about this — I might give one in New York in a month or so about unsafe. And then hopefully there'll be more books and such on advanced uses of Rust, and that might be a good way to learn. "Do you follow Jai at all?" I have not. "Is there a lib arena in the Rust repo?" Yeah, so lib arena and the various arena crates are for arena allocation. But the idea with Verona from Microsoft is language-level support for things like borrow checking at the arena level: you can say all of these things are related to this collection and have the lifetime of the collection, in a way that's hard to express in Rust currently. What do you mean, Basil? "The Rust Nomicon touches on a lot of subjects of unsafe Rust."
The Nomicon is great if you already sort of know some unsafe: it's great at telling you how a lot of this stuff works. But it's not great for learning about unsafe from scratch. It's gotten better, but if you're starting from scratch, it's not really good at that — we almost need tutorials, in some sense. "Have you seen the dashmap crate implementation?" Yeah, it's pretty cool. I actually ran some multi-core benchmarks that I posted on Reddit in one of the sub-threads. dashmap is really neat; it doesn't try to innovate that much. The author said the basic idea was "let's just build something where the structure is really simple", and that ends up performing well — and so far that seems to be true. I'm really interested to see it compared against something like what we're porting here, which is a very well-thought-through concurrent implementation, and see how they scale. For example, here gets take no locks, and writes often take no locks either, if they don't contend on a bin. So this should scale better, but it's hard to say. "Can you test the crate with Miri?" One restriction Miri has — I believe it still has this — is that it only works for single-threaded code. That's not to say it isn't useful, but it means there's a limit to the kinds of interactions you can test with it. Oh, I'm glad you like the streams. "Do you know why something like Box::pin(async { ... }) has a 'static lifetime, but if the future calls .await, the 'static lifetime goes away?" I don't know what you mean by "if the future calls await". "Do you recommend another book for beginners, apart from the book?" — as in the Rust book — there's a book called Rust in Action that deals with systems-level programming in Rust. I haven't read it myself, but I talked a little to the author, and it seems cool. I don't know that many Rust books currently, but hopefully that's something that will change.
All right, let's dig back in — maybe one more question while I finish the last bit, and then we'll dive into the implementation again. "Also a cool book." That's not valid syntax, so I don't quite know what that first bit is — I think you're at least missing a move keyword — but what Box::pin does is allocate something on the heap rather than the stack, which means that rather than only living as long as the current function, it lives on the heap, and so it gets a 'static lifetime. "My coworker helped you out with a project." Oh, WP2Ghost — yeah, that's a while ago. That's a bit of a pain, it's true. All right, let's dive into how to write this concurrent iterator. "Encapsulates traversal for methods such as containsValue; also serves as a base class for other iterators. Method advance visits once each" — is this valid English? — "method advance visits once each still-valid node that was reachable upon iterator construction. It might miss some that were added to a bin after the bin was visited, which is OK with regards to consistency guarantees. Maintaining this property in the face of possible ongoing resizes requires a fair amount of bookkeeping state that is difficult to optimize away amidst volatile accesses. Even so, traversal maintains reasonable throughput. Normally, iteration proceeds bin-by-bin traversing lists. However, if the table has been resized, then all future steps must traverse both the bin at the current index as well as at (index + baseSize), and so on for further resizings. To paranoically cope with potential sharing by users of iterators across threads, iteration terminates if a bounds check fails for a table read." Oh, that's neat.
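The visiting order that comment describes can be modeled with a toy, before we worry about the real atomics. This is only a sketch under my reading of the doc: walk the initial table's bins in order, and when a bin has been forwarded by a resize, visit its low and high bins in the new table before moving on. The `Bin` enum and `visit` function are illustrative names, not the port's types.

```rust
// Toy model of the traversal rule: for a forwarded bin i of an n-bin table,
// look at new_table[i] (low) and new_table[i + n] (high) before bin i + 1.
enum Bin {
    Nodes(Vec<u64>),
    Moved, // forwarded by a resize into the new table
}

fn visit(old: &[Bin], new: &[Vec<u64>]) -> Vec<u64> {
    let n = old.len();
    let mut out = Vec::new();
    for (i, bin) in old.iter().enumerate() {
        match bin {
            Bin::Nodes(keys) => out.extend(keys),
            Bin::Moved => {
                out.extend(&new[i]); // low bin
                out.extend(&new[i + n]); // high bin
            }
        }
    }
    out
}

fn main() {
    // old table of two bins; bin 0 was split into new bins 0 (low) and 2 (high)
    let old = [Bin::Moved, Bin::Nodes(vec![1, 3])];
    let new = [vec![4], vec![], vec![6], vec![]];
    assert_eq!(visit(&old, &new), vec![4, 6, 1, 3]);
}
```

Because the detour into the new table finishes before the cursor advances, no key is visited twice and none are skipped, which is the property the Java doc is protecting.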
Okay, so if I'm reading this right, here's the trick we're going to pull — how would this work? Interesting. I think what it means is: normally we just read the bins in order, but if a bin has been moved by a resize, we look at both its low and its high version at the same time, before we move on to the next bin in the original iteration. Because that way we only need to keep track of the keys we've seen while we're still in this bin — so we don't duplicate them — as opposed to for the entire iteration. That would be my guess. (Oh yeah, rustlings is great, as a side note.) But let's see how this actually works out. I think at this point we're probably going to want to split this into smaller files. How big is table? Table is pretty small; I think I want to keep table where it is. But what I do want in its own file is mod iter — we probably actually want to make a source directory, iter/, with traverser.rs in it. The other question here is whether we want the iterator to hold its own guard. Quite possibly. It does mean that while you have an iterator, you're actually going to hold up all garbage collection. So that's something we're going to have to think about; I'm not quite sure yet. But we are going to need here — at the very least, probably just the table. It might actually just be the table, but we'll find out soon. It's going to be generic over K, V, and S. They call the field tab; we're going to call it table, because we like full words. Then next, whatever next turns out to be — these are probably going to be Shared pointers, would be my guess; we're not going to quite get away from these being Shared. Then a table stack, which we're going to have to figure out the purpose of later.
An index, a base_index, a base_limit, and a base_size. I guess we might as well keep the comments too: "current table; updated if resized" — I think that might actually end up being a straight-up reference, but I'm not sure yet; "the next entry to use"; "index of bin to use next" (by "use" I think they mean access); "current index of initial table"; "index bound for initial table" — I see what this is about — and "initial table size". When you create the iterator, you read out the current table. But then, if you encounter one of these moved nodes, you follow the pointers inside them, and afterwards you have to continue iterating on the original table you started from. So that's why it has to keep track of where it was in the original table: index is where in the table we currently are, and base_index is where we are in the original traversal, or something along those lines. Okay. And there's an impl later, so I guess we're going to do impl over K, V, S, with a new. What it takes is a little unclear, actually; we'll find that out later. But it is going to set self.table — actually, why does it take all these parameters? It takes a table, which is going to be a Shared to a Table of K and V. I think we can just load this once, although it's going to be tied to its own guard. Remember that when you load out of an Atomic, what you get back is a reference to the underlying value whose lifetime is tied to the lifetime of the guard, which I think means the caller has to hold on to the guard. That's going to be awkward. Think of it this way: if we stick the guard into the iterator — actually, with Shared you already get — we're going to have to think a little carefully about how these lifetimes work out.
But for now, let's just do this. I guess next — oh, next is a node. Okay, so next is actually not a Shared table; it's a Shared node, or node entry — we call these Node. So we're going to need both node and table. Yeah: while we're walking a bin, next is a pointer to the next node to look at, so next is going to be a Shared, initially null. base_size — I wonder what that is for, because isn't the size just dictated by the size of the table? I don't know why it's a separate parameter. Where do people construct Traversers? Yeah, it's just t.length — so why is that passed in separately? It's always t.length. That seems entirely unnecessary; I don't buy that for a second. I don't think that argument is necessary. I think base_size is just the number of bins, which we can find with table.bins.len(). And I think base_index is just going to be zero, and index is just going to be zero — in all of these call sites, index is zero. This might be for if you want to iterate over only a subset of the bins, but it looks like nothing uses that, so I think we're just going to ignore it. And then base_limit is also going to be table.bins.len(). Now, these might change over the course of iteration, I don't know, but for initialization it seems like you should always initialize it to traverse the entire table you're given. And I guess advance is really what our implementation of Iterator is going to be, right? This is just the standard Rust Iterator trait, which takes &mut self and gives you an Option of Self::Item. What Item is, is also a good question. Yeah, I think we're going to have to be given a guard — the caller needs to maintain the guard — which is pretty awkward, actually.
Because it sort of means that we need to take in a guard here too, which we'll use to load any additional items we read out. So this is going to be a &'g Guard, and then, sort of implicitly, the lifetime of the map is going to be longer than the lifetime of the guard. How exactly we expose this to users is not clear, unless they provide the guard — which, again, is an interface we probably want to avoid. There might be ways for us to deal with that. So the Item here — this is sort of the same thing as what happens for get, right? — at least for the time being, we're just going to expose these Shared values directly. So the Item type is actually going to be a Shared to — that's also a good question; this does have to have a guard — what is the type here? I think the type here is Node, because the node lets you get at both the key and the value. So in some sense, this is going to be a very low-level iterator: it just iterates over all of the nodes. And that's probably what advance does here too — yeah, it returns a node. And then you can have wrapper iterators around that that filter it down to just the key, or just the value, or something like that. Iterating over the values in particular is going to be tricky, because that's going to involve a mutex guard; we're going to have to think about how to do that. Not entirely sure yet. Okay, so what does next do? Well, it sets an e. E is the entry that we just yielded. So we're going to have an e, which is going to be a Shared<'g, Node<K, V>>, initially null. And then if self.next — wait, so really, it's not "next", is it? Because this is taking e.next. So I think "next" is really "previous". Yeah, see: if it returns e, it sets next to e. So it's not the next, it's the previous element you yielded.
So I think this is a lie, and that this should be called the last node yielded or iterated over. That makes more sense to me. And if the previous node we iterated over is not null, then what we're going to do is we're going to say e.next — right, it has a next field — .load. And I'm pretty sure that we know that if we hit a node, all the subsequent things are nodes. That's what this loop here is, right? If the first entry in a bin is a node, all the remaining ones are nodes. And since we know that this particular one is a node, we know that all its subsequent ones are nodes. And so — let's look at what we do for load here. An Ordering, and the guard. And then there's a loop. And what does this loop do? Well, okay, so if the previous node we iterated has a next, then we just return that. That seems straightforward enough. So this is going to be a loop. And if not e.is_null(), then we're just going to do self.prev is e, and then we return e. And then the question becomes, can you clone a Shared? Which I assume that you can; they're probably not Copy. Sorry, I should have warned you that this is going to be bright. But if we go back here to Shared — yeah, Shared implements Clone. And it should be a cheap clone because Shared is really just a pointer. What's the difference between Ordering::SeqCst and Ordering::AcqRel? That is a big debate. Well — it's not a big debate, it's just fairly complicated. We go through this a little bit in some of the earlier streams on this port. It has to do with which memory reads and writes you can see, and also whether the compiler and the CPU are allowed to execute your instructions out of order. I recommend that you look up some of the resources I talked about in the previous part, and also just look up, for example, LLVM's memory ordering documentation, which has a pretty good write-up on it. How old is he? I am 30. Okay, so if it's not null, then we're sort of done.
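To make the "prev drives iteration" idea concrete, here's a minimal single-threaded sketch of walking one bin. This is an assumption-laden simplification, not flurry's actual code: plain `Box` links stand in for crossbeam's `Shared` pointers and the guard, and the field names are illustrative.

```rust
// Minimal sketch: `prev` is the last node *yielded*, mirroring the Java
// field that is confusingly named `next`.
struct Node {
    key: usize,
    next: Option<Box<Node>>,
}

struct BinIter<'a> {
    head: Option<&'a Node>,
    started: bool,
    prev: Option<&'a Node>, // the previous element we yielded
}

impl<'a> Iterator for BinIter<'a> {
    type Item = &'a Node;
    fn next(&mut self) -> Option<&'a Node> {
        let e = if !self.started {
            self.started = true;
            self.head // first call: start at the head of the bin
        } else {
            // afterwards: follow the `next` link of the last node we yielded
            self.prev.and_then(|p| p.next.as_deref())
        };
        self.prev = e;
        e
    }
}

fn keys(head: &Node) -> Vec<usize> {
    BinIter { head: Some(head), started: false, prev: None }
        .map(|n| n.key)
        .collect()
}

fn main() {
    let chain = Node {
        key: 1,
        next: Some(Box::new(Node {
            key: 2,
            next: Some(Box::new(Node { key: 3, next: None })),
        })),
    };
    assert_eq!(keys(&chain), vec![1, 2, 3]);
    println!("ok");
}
```

The `started` flag is needed because `prev` is `None` both before the first yield and after exhaustion; in the real traverser the surrounding table state plays that role.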
Otherwise — I don't know what this "must use locals in checks" comment means. This seems like a Java thing. If self.base_index is greater than or equal to self.base_limit — or why does this need to use locals? Wait, really? Shared is Copy? Oh, Shared is Copy. Nice. That makes me feel a lot better. Or table is null. I guess actually this doesn't handle the case where — interesting. This is actually going to crash if the table hasn't been allocated. So we might want to special-case that. It's a little awkward: if table is null, then zero, else table.bins.len(). And then this should be len. What did I mess up? Oh, maybe something down here. Yeah. Or self.table is null. Or — why are all these locals necessary? I don't understand, because these aren't concurrently accessed anyway. They're not volatile. So "must use locals in checks" sounds like it's to make sure you only read them once, but that doesn't really seem like it makes sense either. t is equal to — oh, it's because they might get overwritten as we walk through. That's why. Yeah, that's definitely why. self.table.bins.len() less than or equal to self.index. Or if i is less than zero — how is that even possible? Where is i reassigned? i is equal to index. Can index ever be less than zero? Where does index get overwritten? Index gets ++, base_index — where does base_index come from? base_index never changes. So I'm pretty sure index can never be negative. Which means that i can never be negative. So I'm pretty sure that check is unnecessary. So in this case, prev is null. And we return None. I guess up here we need to return Some of that. All right, what else do we have? All right, now we get to the gory parts. So that — I think tabAt is the one that we renamed to bin. And so this is really saying: let bin is self.table.bin, and i here is self.index.
Right, so this is where they do some funky stuff, where they say like: let i is self.index, let t is self.table. And then this is a read on t, and this is a read of i. And the reason they do that is because presumably they're claiming that these values might change, and you want to do the read on the local variable. I'm not so convinced that's true. But I guess we're about to find out. All right. So if not bin.is_null() — remember how they're playing this trick where if the hash is less than zero, then it's a special type of node. And they want to handle in particular here forwarding nodes. Whereas for us, that's a little different. What we want to do here, if you remember — I guess we need to pass in the guard here, don't we? The bin gives us back a Shared<BinEntry>. Right. So what we actually want to do here is we want to match on that bin. And if it's a BinEntry::Moved(table), then we want to do whatever they're saying right here. What about overflow? It could be overflow, you're right — the check for i less than zero. But Rust should panic in that case, I'm pretty sure. I guess if it's moved, then self.table is equal to — that's what it does. I see, I see, I see. Then self.table is going to be — I guess this is where — where's the place where we follow Moved? Oh, right. That's going to be in node — find is where we follow Moved. Right. So this is where it got moved to. And in this case, we want to say that we're now iterating over that next table instead of the table we were iterating over. And in this case, self.prev is now null because we're no longer on a particular entry. And then they're doing this pushState business. And I think what they're doing here is actually, every time they move into a new table, they keep track of which table they were in and what i they were at, and then they recurse into that table.
And then when eventually that table yields something interesting — when you're done iterating over that recursion, then you iterate back up. I see, I see. Yeah. Okay. This is kind of special. So what we're going to do here is — this iterates back up. So here we're going to do self.push_state. And here you'll notice it actually uses t and i in here, and n is the length of t. I don't think they need to do this. I think they could just push tab and stuff before they modify them. But seems fine enough. Alright, so in this case, we want to recurse down into the target table, and make sure we can get back up to where we're at. All right. And then otherwise, right, so that hits a continue. And there's technically a tree bin case, which we're going to ignore for now. If we get to a normal table — right, so this is e's hash is less than zero — so that's if it's a special node. If it is not a special node, so BinEntry::Node, right, which is this case, then what? And they're saying if the stack, which is this TableStack business that we haven't looked at yet — what is spare for? — if self.stack — I don't know what this does yet — is Some, then recover_state. We haven't written this state-recovery business yet, right? So we're gonna have to deal with that at some point. Otherwise, I guess, else if index is i plus base_size. Okay, so it does modify index here. This i being able to be less than zero is weird to me, but fine. As isize, I guess. But it's — no, it's instantiated at the top of this loop. Okay, so this would only be the case if self.index itself is an isize. Fine. So this will be an isize, then that will be an isize. Then, or self.index is less than zero — this else if is: else if i plus self.base_size is greater than or equal to n, then self.index is equal to i plus self.base_size. Gee — this is executed regardless. So this is really an else: self.index equals that.
And if self.index is greater than or equal to n, then self.base_index plus-equals one and self.index is self.base_index. Okay, so let's see if we can actually reason about what this does. If the stack — if we're walking — okay, so this is the case where we have hit the end of a bin, right? If we were in a bin, then this would happen. So we're at the end of a bin. This is: we're done, there is no next bin. Otherwise, there is a next bin, and we look at the head of the next bin. If the next bin has moved, then we need to recurse into the table that it was moved to. Otherwise, if we've stored some state, then — why do we recover the state? This one's unclear to me as of yet. Let's ignore that for now. Then — oh, this is why this is not bin, this is e. Oh, I see what's going on. Then this is really e is equal to bin, which is not really true, but it's kind of true. It's also a little awkward; it's not clear that I'm allowed to do that. Notice this little sneaky clause here of e equals tabAt. So if we get to here, then the next node we're going to consider is going to be the start of the next bin, right? And then if that happens to be a forwarding node, then we set e to null because we're going to start iterating over the other table instead. But if it was not a forwarding node, then we — why do we recover the state? Still don't know that. But then, let's say we take neither of these cases; then we will just start iterating over that bin. Okay, so then it becomes: what are these two cases? Well, this case has to do with this pushState. So I guess we need to look at what pushState is. "Saves traversal state upon encountering a forwarding node." Okay, so it just preserves a bunch of state, and recoverState just brings us back. The question is why that matters. Like, isn't it just going to — when it hits a forwarding node, it's going to set the table to the next table and e to null.
And it's going to store the table we were at. And it's going to loop through; e is going to be null, so it's going to go down here. It's going to look at the first bin. It's not special, so it's going to go down here. And stack is going to be not equal to null, so it's going to immediately recover that state. So this write to tab is just going to be overwritten. But we're going to read from tab first up here before it gets recovered. Ooh, this is some sneaky code. I see what's going on. Oh, that is all sorts of sneaky. Okay, let me see if I can explain what's going on here. When we hit a forwarding node, here's what we're going to do. We're going to store the table we're currently in — basically our current state. And then we're going to recurse into that table. And then we're going to continue. And when we continue, we're going to execute this code as if we are in that table, right, because we set tab, and t gets set to tab here. And then we read the bin from that table. And then we're immediately going to recover the table we were in. So we're going to recurse down just for the purposes of looking at that bin, and then we're going to pop back up. It's a little unclear to me why this is necessary — like, why the solution isn't just to do, here, e is equal to tabAt of this and i. I'm not sure. What does pushState do? It doesn't actually clear any state. Okay, so it's going to immediately recover the state again. And then it's going to continue. And now e is going to be not null, and then it's going to walk the bin — it's just going to hit this case repeatedly. And then eventually, it'll get to the end of the bin that we sort of popped into. And then it's going to try to read from the top-level table again. And how does it know not to continue where it was? What does recoverState do? Oh, that's awful. Why would they write it like this?
Index plus-equals; len is s.length. The question is, how does it end up — also, how does it hit this clause? Oh, oh man, okay: the first time it hits a forwarding node, it's going to go into this clause. It's going to push the state, change table, and descend into that table. The second time it hits a forwarding node, it's also going to do the same. It's going to push the state. It still ends up in this clause. Hmm. Oh, this code is written in such a convoluted way — the control flow here is awful. Okay. Okay. Nope. Um, okay, let's say we're at bin zero and bin zero is a forwarding node. We're going to execute this code. This is just going to straight execute, because e is equal to null, so it is not going in here. We're not out of range, so we're going to keep going. It reads bin zero, and its hash is less than zero because it's a forwarding node. Now it changes table to the forwarded table. And then it remembers that we read bin zero in the original table. And then it continues. So it goes back to here. Then e — e is null, so it does not go in here. It now reads the i-th bin — so the same bin we were at in the original table, the i-th bin of the target table. And that is a normal node; it's not a forwarding node. So it does not enter this if; instead it goes down here. Stack is not equal to null, so it's going to recover the state. And when it recovers the state, it's going to do this business, which resets the table and the index — even though we haven't changed the index at any point here. None of this changes the index; only this clause does. So this recovers — what is this?
If s is null — if we've recursed all the way up to the top — okay, if we have recovered all the way up to the top-level table, then we add base_size to the index, and adding base_size to the index brings us to the high half of the target table, except we're still in — tab refers to the low table. So this is the termination clause saying: if that is greater than n, then we have completely exhausted this bin. Oh, I think I see how this is starting to hang together. Okay. So what we're actually going to do is, when we hit a forwarding node, we're going to follow it all the way. There might be multiple forwarding nodes, right? We're going to follow the forwarding nodes to the same bin in the deepest table. And then once we've gone through that bin, then we're going to pop up. We're going to pop up all the way — ah, we're going to pop up one level. Yeah, this recoverState is not really a recover-state; it doesn't recover this state. It recovers us to the high bin of the table above the one we were in. Okay. I think I see how this hangs together now. I think what we're going to do is actually just port this code relatively directly and then leave a comment as to roughly what it is doing. But this is definitely going to take a little bit of cleverness on the iteration. All right, let's give it a shot. So we're going to need recover_state and we're going to need push_state. So up here, we're going to do — what was it called? push_state: &mut self, and it's going to take t and i and n, over K, V, S. Okay, so t, i, and n. And push_state seems pretty straightforward. This reuse business is probably to avoid allocations would be my guess. So there's going to be this notion of a TableStack, which we're also going to need. I guess up here — actually, no, it's going to go at the bottom.
So a TableStack. And a TableStack is also going to be generic over 'g, K, V, S, because we don't really have a choice. And a TableStack has a length, an index, a table, and a next TableStack. And I think for these, these are actually going to be boxed. That's one of the tricky things here. We're going to leave that for now. All right. And then this TableStack — I guess we're going to have up here somewhere: stack is going to be a TableStack of 'g, K, V, S. And it's going to be something like an Option<Box<_>>, because initially it's probably None. And then push — and there's also this notion of a spare. What is the spare for? Oh, the spare is so that we hopefully do not have to allocate. It's one that we can reset and reuse if we wish. In theory, this could use — not an arena allocator, but sort of a pool allocator. And this is a poor man's pool allocator. Okay. So pushing the stack: let stack is self.spare.take(). And if it's not null — some s — then self.spare is s.next. Where next — because next is an Option here as well. Yeah, these next pointers in spare are just there to — basically this is a linked list, right? This TableStack is really just a linked list. And what we're going to do is, if we need a new TableStack, we're going to try to steal whatever is at the head of the linked list of things that we're not using. And when we stop using something, we're just going to stick it onto that linked list. It seems fine. I don't really know why it has to be a linked list. Like my intuition here would be to make it a VecDeque or something, rather than allocate each one. But shrug, there might be a reason. I guess we're about to find out. Otherwise, we have to allocate one.
I guess we're going to do stack is stack.unwrap_or_else — if there wasn't one, then we have to allocate one. Which is going to be just a TableStack of, I guess, only default values, it seems like. So it's going to be like length zero, index zero. My guess is the fields are going to be instantiated just below: table, and next is going to be None. Yeah, exactly. So then we're going to say — there's a different way for us to do this, which is arguably nicer. But instead I'm going to say — yeah, actually, let's do this instead. target is going to be a TableStack, which I'm going to set up on the stack, notice. And it's going to have table, which is going to be t; it's going to have length, which is going to be n; it's going to have index, which is going to be i; and it's going to have next, which is going to be — stack is going to be self.stack.take(). And then what we're really going to do here is: if let Some(s) = stack, then we're going to do — self.stack is going to be — we could technically do a map here instead. But — this is not really "stack"; self.stack is the stack we currently have, so I want to use a name that's not that. So if there already is one, then what we're going to do is Some — actually, we don't even need to do that. We do *s = target, and then Some(s). Otherwise, we do Box::new(target). That way we don't have to repeat the instantiation of the fields. Right. So this — allocating target on the stack is basically free. And then we're going to set self.stack to be: if we did get a spare, then we're just going to overwrite that spare with whatever is in target and reuse it. Otherwise, we're just going to allocate a new TableStack.
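Here's a hedged sketch of that push with a `spare` free list. The `TableStack` is just a linked list, and spent frames get parked on `spare` so the next push can reuse them instead of allocating. Field names are my guesses at what the port might use, and the table pointer is omitted so the sketch stays self-contained.

```rust
// Sketch of the "poor man's pool allocator" discussed above.
struct TableStack {
    length: usize,
    index: usize,
    next: Option<Box<TableStack>>,
}

#[derive(Default)]
struct Traverser {
    stack: Option<Box<TableStack>>,
    spare: Option<Box<TableStack>>,
}

impl Traverser {
    fn push_state(&mut self, length: usize, index: usize) {
        // Prefer reusing a frame from the spare list before allocating.
        let mut frame = match self.spare.take() {
            Some(mut s) => {
                self.spare = s.next.take();
                s
            }
            None => Box::new(TableStack { length: 0, index: 0, next: None }),
        };
        frame.length = length;
        frame.index = index;
        frame.next = self.stack.take();
        self.stack = Some(frame);
    }

    fn depth(&self) -> usize {
        let mut d = 0;
        let mut cur = self.stack.as_deref();
        while let Some(s) = cur {
            d += 1;
            cur = s.next.as_deref();
        }
        d
    }
}

fn main() {
    let mut t = Traverser::default();
    t.push_state(16, 3);
    t.push_state(32, 3);
    assert_eq!(t.depth(), 2);
    assert_eq!(t.stack.as_ref().unwrap().length, 32);
    println!("ok");
}
```

Note the `take()` calls: they're what lets us move the boxes between `stack` and `spare` without cloning, which is the ownership-friendly version of Java's pointer shuffling.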
So that seems about right. All right. And then recover_state: &mut self, which also takes an n, which we still have to figure out why that is. But recover_state is going to do let mut n — okay, I guess let's do a loop. s is going to be self.stack.take(). I guess this is then really a while let Some(s) = self.stack.take(). But this doesn't actually take it, does it? Yeah, this doesn't actually take it. This is while let Some — s is self.stack. That's really what we're doing. So while there is a current stack frame, then we increment index by s.length. And then if self.index is greater than or equal to n — I guess actually if it's less than n, then we break. See, this is where comments would come in really handy, if the original author of this code had actually commented it. Okay. So n is going to be mutable, apparently. Then n is going to be s.length. And index — this is self.index. I don't like that in Java, you don't have to say whether something is referring to a field or a local. It's not great. self.table is s.table. Right. So this is the popping part. And then s — oops, table — and s.table is Shared::null(). This is just to make sure, I assume, that we don't have dangling pointers — pointers to things we're not allowed to have pointers to anymore. And then next — let next is s.next. What is this, though? Oh, I see. At this point, we're popping s. And that s is self.stack.take() — we are popping the stack. And so at that point, once we have decided that we're popping this stack frame, then we want to save that stack frame for reuse. So here: save stack frame for reuse. And that includes resetting the table. We're going to add it to — oh, interesting. Oh, I see what's going on. We need to keep track of what the next stack frame is, right?
We're popping, and s.next is the next stack frame we're looking at. And so I wonder — why isn't the next variable just not needed here? If you just move this line to here, then you're fine. Right: self.stack is s.next.take(). So it's unclear why that next is necessary at all. But we do need to reset it for the next use. And that's going to be s.table equals null. I don't know why they set s.table equals null. Maybe just to not get an NPE? Oh, it's probably for garbage collection. You want to make sure that — in Java, right, it's going to analyze the entire stack to figure out what things to free. And if you had these things sticking around in spare that held up garbage collection, that would be bad. For us, that doesn't really matter because we don't have garbage collection in the same way. But we can at least make sure we don't leave an old pointer around, even though it probably won't matter. Right. And next is going to be self.spare.take(). And then self.spare is going to be Some(s). Okay, so this code is already easier to read, at least to me. Look at recover_state, right: it doesn't actually pop the stack. The first time you try to pop the stack, it increments index. But if that index is still within range of the current table, it's not going to pop. Instead, it's just going to go to the high bucket. Don't pop if we are still in bounds. Instead — in fact, this changes self.index, right, that sets index. So really, what we could do is this, and that is much easier to read. Although it's going to use that here. So we can write this as: if self.index plus s.length is less than n. It does mean that we repeat this operation, right — this addition is going to happen twice — but it makes for much easier-to-read code. So what we're going to say is: if we haven't checked the high side of this bucket, then do so, and do not pop the stack frame.
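To pin down the "don't pop while the high half is still in bounds" reading, here's a simplified sketch of that pop logic. It is an assumption-heavy illustration, not the port itself: tables are left out entirely (only the index bookkeeping matters here), and the spare-list recycling is elided.

```rust
// Frames record the state saved when we descended past a forwarding node.
struct Frame {
    length: usize, // length of the table this frame was pushed for
    index: usize,  // index we were at in that table
    next: Option<Box<Frame>>,
}

struct Walk {
    index: usize,
    base_index: usize,
    base_size: usize,
    stack: Option<Box<Frame>>,
}

impl Walk {
    /// `n` is the number of bins in the deepest (current) table.
    fn recover_state(&mut self, mut n: usize) {
        while let Some(mut s) = self.stack.take() {
            self.index += s.length;
            if self.index < n {
                // Still in bounds: move to the next split copy of this bin,
                // and do NOT pop the frame.
                self.stack = Some(s);
                return;
            }
            // Exhausted this bin in the deeper table: pop one level.
            n = s.length;
            self.index = s.index;
            self.stack = s.next.take();
            // (the real code also parks `s` on the spare list for reuse here)
        }
        // Back at the top-level table: try the bin's next split copy there,
        // or advance to the next top-level bin if we've run past the end.
        self.index += self.base_size;
        if self.index >= n {
            self.base_index += 1;
            self.index = self.base_index;
        }
    }
}

fn main() {
    // One saved frame, deepest table of 16 bins: 1 + 4 = 5 stays in bounds,
    // so the frame is kept and we just move to the next split copy.
    let mut w = Walk {
        index: 1,
        base_index: 1,
        base_size: 4,
        stack: Some(Box::new(Frame { length: 4, index: 1, next: None })),
    };
    w.recover_state(16);
    assert_eq!(w.index, 5);
    assert!(w.stack.is_some());
    println!("ok");
}
```

Running past the end instead pops the frame, restores the saved index, and eventually bumps `base_index`, which matches the "pop up one level" behavior puzzled out above.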
And instead move on to that bin, right. So notice that this does not pop the stack frame. Only once we have gone through the other instances of the bin that we started forwarding down, only then do we actually pop the stack frame. And then at the end here — that's just equivalent to a break. Yeah. And then down here, if s — what is s here? s is just self.stack; that's the first thing it's assigned to. And again, these inline assignments in conditionals make for terribly hard-to-read code. If self.stack is None, then we're going to do self.index plus-equals self.base_size. And then if self.index is greater than or equal to n, then self.base_index plus-equals one, and self.index is self.base_index. All right, let's see if we follow this. If we've gone all the way back up to the top frame — so this is like the original table that we started iterating over — then we're going to try index plus-equals self.base_size. Does base_size ever change? I feel like base_size does not change. Very weird. It does not. But base_size — how can index plus base_size ever be — oh, I see. That's sneaky. If we go all the way up to the top — this might actually be easier to draw. Let's see how this works. All right, back to one of my famously bad drawings. So when you initially create an iterator, what you get is a pointer to a table. And that table — oh, wow, I need this to be less stupid — this is the pointer that the iterator holds. And it has a bunch of bins. And if a particular bin has been forwarded to a new table, that new table is twice as long. Right. And if we imagine that this is the current bin, then really you can imagine that there's sort of an artificial line here at the same n as the number of bins in the original.
And the low half, like we talked about before — the low half of the bin is here, and the high half of the bin is here. Right. So we want to iterate through all of them. And this is at the old i, and this is at the old i plus n, where this is n. But now imagine that this is also a forwarding node — a forwarding node to a table that is mirrored yet again. Right. So now there's one here, and there's one here. And then both of those have their own mirrors; they both have a low and a high. Right. And so there's a low — what would this be? This would be the high of the low bin. And this would be the high of the high bin. And notice the difference between these is n all the way along, where n is the original n. And so if we have recursed all the way down to this table — so the current table and the current n, if you will, is really four n — then one way for us to iterate through all these bins is to take the original i we had for the bin we're at, and then iterate through i, i plus n, i plus 2n, i plus 3n, etc., until we have exceeded the length of the current table, which is four n. And so that is what this code is doing. It's saying: we're going to follow the forwarding nodes down; once we no longer find forwarding nodes, we're going to go back up to the top. And we're going to add base_size — which is the size of the first table, so the n here — to index. And if that index is now greater than n, where n is the size of the deepest table, only then have we iterated through all the highs and lows of all the intermediate bins. And then we move on to the next base_index. And now the index is the next base_index, right? So that would be equivalent to us moving on to — why can't I do this? Because apparently I'm bad at this. So at some point, we're going to move on to this bin, right?
And we're going to move on to that bin only once we have iterated through — this n is actually — no, what's it called — base_size, right? And this n is equal to four base_sizes, right? And so we're going to keep adding base_size to the current index we're at, until that index is greater than n, which is some power-of-two multiple of the base_size. And at that point, we want to increment the base_index. So base_index — sorry, my writing is terrible — points to here, even once we start recursing into these. And only once we've tried to move into this area and been like, oh, that's out of bounds even in the deepest one, then we increment base_index. So now we're going to move to the next bin, which is over here. And then the whole juggle starts over again. This is also why it's useful — you might see from this drawing why it's useful to keep these table stacks. Because if you remember, when we do a resize, we do it bottom-up. And so once you have hit one of these forwarding nodes, you're going to keep hitting the same forwarding nodes going down, and the depth will be at least as deep as it was further up. You will need at least as many TableStack nodes. All right, hopefully that made a little bit more sense than my rambling earlier. That roughly makes sense. I'm not sure if the chat is working currently; sometimes it gets sad. Okay, but I think now we have a decent sense of — this is, I guess, "move to the next part of the top-level bin in the deepest (largest) table". "We've gone past the last part of this top-level bin, so move to the next top-level bin." Oh, chat seems to work. That's good. Okay, you both work. Great. Just wanted to double-check. All right, so now at least we know roughly what this means. recover_state is a bad name for this function, right? It's really about moving to the next bin.
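The index arithmetic from the drawing can be captured in a tiny pure function. This is just an illustration of the visit order for one top-level bin, with made-up parameter names; it doesn't touch any of the real traverser state.

```rust
// For a top-level bin `i` in a table of `base_size` bins that has been
// forwarded down to a deepest table of `n` bins (a power-of-two multiple
// of base_size), the traverser visits i, i + base_size, i + 2*base_size,
// ... until it runs past n. Only then does base_index advance.
fn split_indices(i: usize, base_size: usize, n: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut idx = i;
    while idx < n {
        out.push(idx);
        idx += base_size;
    }
    out
}

fn main() {
    // Base table of 4 bins, deepest table of 16 bins: bin 1 has split into
    // four copies, visited in this order.
    assert_eq!(split_indices(1, 4, 16), vec![1, 5, 9, 13]);
    println!("ok");
}
```

This is exactly the "highs and lows of all the intermediate bins" walk: each doubling adds one more copy of the bin at a fixed stride of `base_size`.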
But let's ignore that for now. So that's going to recover state. Yep. I don't know why this is also necessary, but I'm just going to believe them. i plus base_size. Yep. All right. Great. Okay, I think that's all of the iterator code, right? Yeah, I think that's all the iterator code. So now the only place where this is probably going to go wrong is — if I remember correctly from node, it's not possible to have a Shared<Node>; you can only have Shareds of BinEntry, which means that we're going to have to do some unnecessary matching, which is a little sad. Specifically, this is going to be a BinEntry — the last BinEntry iterated over — even though we know it will always be a Node. Which also means that in our iterator implementation — didn't we have an as_node? Yeah, we did as_node. Because we know that this must be a Node, we're going to do an expect: we only ever iterate over nodes. I guess that's why this isn't being called Traverser. So I'm doing something wrong here. Where's my syntax error? Traverser, line 35. Great. Finally, formatting. Oh, cool. Hi, Zorin. Thanks for editing Rustacean Station episodes. It's really awesome. All right. So I think in theory, that means that this iterator should now work. So here, notice, we don't actually get to use the n here. This is going to be equals bin. But that's fine; we don't really worry about that. Okay, yeah, I think that's good. And in theory here, we could have a bunch of test cases for this iterator in particular, right? I think where this is going to get complicated is the guards. I don't have a good solution for that. I think what we'll want here is actually a node iter — yeah, this really is, I guess, a BinEntryIter. But it really is a node — why does it do that? We can call it a BinEntryIter. That seems fine.
Because that is what it's going to yield, even though we know that they are all nodes. And so this is going to be a BinEntry, sadly. And so here, one thing we could do is — in fact, why don't we do it anyway? mod tests — we can have an integration test here, which uses super — super, I guess just crate is fine. And here, we could actually set this up if we wanted to. That might actually be the right thing to do: rather than have this do all the inserts through the flurry top-level hash map interface, this could just construct a table itself and then do the iteration, just so we have more control over where there are forwarding nodes and that sort of stuff. Maybe we just want to do that. This doesn't care about things like hashes, so it's totally fine for us to construct what is essentially an illegal table here. So let's do that. That seems great. iter_empty, right? So I'm going to just create a — I forget exactly what's in a Table again. So Table — I guess, fine, it'll be an Owned::new. Use super, because I want the types from this module. And this is just going to be, I don't know, like 16 — seems fine, because it's the default anyway, right? I guess DEFAULT_CAPACITY — no, I want it to be 16. And then we're going to say iter is going to be a BinEntryIter. I guess we need a guard. So let's do — here, use crossbeam_epoch. And we're going to construct a new one of these. Didn't we add a constructor to this? We didn't. Great. So we're going to do a BinEntryIter, and we're going to pass table — I guess Shared::from(table) — and a reference to the guard. And then we're going to do assert_eq: iter.count() — it should be an iterator, right? — zero. And we're just going to see what happens.
Well, a bunch of things are failing. `BinEntry`: okay, that is easy to fix. "Unexpected type argument: `Table` doesn't take `S`." That's lovely. We don't need the `S` here at all, because all it cares about are the tables. Oh, that makes me so happy. This is great news. Love simpler type signatures. All right, what else do we have? A bunch of stuff here. I'm guessing a lot of it is going to be related to `Shared` not implementing `Deref`. 112... what? Fine, I guess I'll do `cargo check` first. Yeah, a bunch of these are going to be the same thing. Okay, so `traverser` is not going to need `Atomic` or `Owned`. Well, we are going to need `Owned` down here, so we'll keep that. But mostly it's going to be things like here, right, where it's unsafe for us to deref `bin`, for example, and we need to argue why this is safe. This is the normal argument we have to make. And the argument here is... why does this even need to be unsafe? It's a good question why that even matters; how could this possibly be unsafe? The `Shared` was read under the guard, I guess: flurry guarantees that a table read under a guard is never dropped or moved until after that guard is dropped. Right? This is the same sort of safety guarantee we've been using everywhere, essentially, and my guess is that most of the other ones here are probably the same. What is this index? The `TableStack` index is an `isize`, even though there's just no way it's ever negative. I just do not believe it. I guess they said overflow; I don't want to care about overflow, I really don't. I don't buy it for a second. Maybe I'm just being stupid, but I don't want to inherit their weird oddities for how this should be. Oh, yeah: it didn't compile because I mixed `usize` and `isize`.
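The "argue why this is safe" step is the `// SAFETY:` comment convention on every `unsafe` block. The actual guarantee in flurry comes from crossbeam-epoch's guards; the sketch below only illustrates the comment pattern itself, with a `Box` standing in for an epoch-protected allocation.

```rust
// Sketch of the SAFETY-comment convention only; flurry's real argument is
// "read under a guard, not dropped or moved until the guard drops".
fn main() {
    let boxed = Box::new(21usize);
    let ptr: *const usize = &*boxed;

    // SAFETY: `ptr` was created from `boxed` above, and `boxed` is alive
    // until the end of `main`, so the pointee is valid for this read.
    let doubled = unsafe { *ptr } * 2;

    assert_eq!(doubled, 42);
}
```

The point is that every `unsafe` deref in the traverser gets a written-down justification, and in this file they essentially all reduce to the same guard-based argument.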
But instead of doing the casts, I'm just going to use `usize` everywhere, because I believe that's the right thing to do: not inherit the overflow behavior from Java. Line 86: no field `length` on `Option<TableStack>`. Okay, this we can unwrap because of the `while let Some` above; that should take care of a bunch of these. 117: this is another unsafe deref. The safety comment is that flurry guarantees that a `BinEntry` read under a guard (really, anything read under a guard from inside `Table`) will remain valid until after the guard has been dropped. 131 is probably going to be the same argument, is my guess. See, this unsafe's safety comment: flurry only deallocates after the guard drops. And we could really use this for all of them; all of the unsafes in this file, at least, are relying on that same safety guarantee. Line 33: "take value of method `bin`". No, `bins`, that's the one. 145 is going to be the same thing, where we do a `match` on an unsafe `bin` deref, with the same safety guarantee. 148: here we have to make the same safety argument as the other place where we follow `Moved`. If you recall, in lib, this one, yeah: the argument here is the same as the one for why following `Moved` is safe in `BinEntry::find`, which is that we still hold a reference to the top-level table, and since that hasn't been freed, nothing newer than it has been freed either, because we haven't dropped the guard that we used to read the top-level table. And 49, actually, this is another thing that's kind of awkward: we need to keep the table, and it needs to be passed as a `Shared` down here. The `TableStack` in theory could store references instead of `Shared`s, probably, but it wouldn't be able to restore them.
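On the `usize`-vs-`isize` point: Java's `Traverser` keeps its state in signed `int`s and a linked `TableStack`. A hypothetical Rust equivalent can just use `usize` indices and a `Vec` as the stack, which is the "don't inherit Java's arithmetic" choice being made here. This is a simplified model (the real `recover_state` also strides by `i + base_size`), with invented names:

```rust
// Hypothetical sketch: save/restore of traversal state with usize indices
// and a Vec-based stack instead of Java's linked TableStack of ints.
struct SavedState {
    index: usize,
    length: usize,
}

struct Traverser {
    index: usize,
    length: usize,
    stack: Vec<SavedState>,
}

impl Traverser {
    // Called when we hit a forwarding node: remember where we were,
    // then start at bin 0 of the (larger) next table.
    fn push_state(&mut self, new_length: usize) {
        self.stack.push(SavedState { index: self.index, length: self.length });
        self.index = 0;
        self.length = new_length;
    }

    // Called when the nested table is exhausted: pop back out and
    // resume at the bin after the one that forwarded us.
    fn recover_state(&mut self) -> bool {
        if let Some(saved) = self.stack.pop() {
            self.index = saved.index + 1;
            self.length = saved.length;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut t = Traverser { index: 3, length: 16, stack: Vec::new() };
    t.push_state(32);
    assert_eq!((t.index, t.length), (0, 32));
    assert!(t.recover_state());
    assert_eq!((t.index, t.length), (4, 16));
    assert!(!t.recover_state());
}
```

With `usize` everywhere, an accidental negative index becomes a loud debug-mode underflow panic instead of a silently negative value.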
So, in fact, one thing we could probably do here (let's finish it with `Shared` first) is make this a regular reference rather than a `Shared`. And same with this, in which case we could even make it a `Node`. In fact, how about we just do that? So this is going to be a reference to a `Table` valid under that guard, right? Oh, I guess we can take a `Shared` here and then deref it like this. Oh, I see. No, this can just be an `Option<&Table>`. The real way to do this is to say that the table length is this, or `let table = ...`, and then return `Some(table)` and `table.bins.len()`. And the reason for this is that we've already established that we're allowed to deref these `Shared`s when we access them. And because we know that the guard outlives this iterator (we store a reference to the guard in the iterator itself), we already know that all of the references we deref are going to remain valid at least as long as the guard. So these can be stored as references, because they're going to remain valid for the entire time. The reason we want this one to stay a `Shared` is that it's going to be read out by the caller, and that means it might be null, right? For example, if the table hasn't been allocated yet. And we want to keep that interface; we're just trying to be nice to the caller. But everywhere else here, these can just be `&Node`. In fact, it means this can even yield those references, because the guard, again, remember, is tied to the iterator: as long as the iterator lives, the guard lives, and the returned thing is tied to the lifetime of the guard.
And so it's fine for us to return them, which even means that if you drop the iterator, this reference is going to remain valid, because it remains valid until the guard is dropped. It is independent of the lifetime of the iterator, which is what we actually want. Which also means that this can be a reference. It also means that this is really a `None`, and that this `unsafe` can go away. And here, this is going to be `next`: if `next.is_null()`, then `e` is `None`. I guess we can just do `let e = `: if `next` is null, then `e` is `None`; otherwise, this is just another deref, right, where if it's not null, we do an unsafe `next.deref()`. And the safety comment here is again that flurry does not drop or move it until after the guard is dropped. The "move" part is important here, because if it could move, that would also invalidate the pointer stored inside the `Shared`, right? And then this is going to be a `Some`. And the `next` here we also know is going to be a `Node`, because only nodes follow nodes. And now this becomes `if let Some(e) = e`. `self.table` no longer has to be unsafe, because we already do the deref in the constructor; this can now just be a straight-up `self.table`. This one does have to have an `unsafe`, because here we're getting another `Shared` with `bin.deref()`. But then we can match on the bin just fine. That's fine. I guess these could probably all just be star-derefs... no, they probably can't, actually; this probably does have to say `.deref()`. And then `prev` is going to be `None` now; no more of this `Shared` business where we can avoid it. This is going to be `if let Some(prev) = self.prev`. And I guess this doesn't even need to say that, because it already starts as `None`, so we can just say: if `next` is not null, then this is that. Let's see, what else have we got here? That does have to be unsafe. That does have to be unsafe.
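The payoff of tying everything to the guard's lifetime is that the node chain can be walked with plain `Option<&Node>` values. A simplified, std-only stand-in (owned `Box` links instead of epoch-managed atomics) showing the shape of that walk:

```rust
// Simplified stand-in: with lifetimes tied to a guard, the iterator can hold
// Option<&Node> and follow `next` as ordinary references instead of Shared.
struct Node {
    value: usize,
    next: Option<Box<Node>>,
}

// Collect values by walking the chain, the way next() advances through a bin.
fn walk(head: &Node) -> Vec<usize> {
    let mut out = Vec::new();
    let mut e = Some(head);
    while let Some(node) = e {
        out.push(node.value);
        // Only nodes follow nodes, so this is always the next list entry.
        e = node.next.as_deref();
    }
    out
}

fn main() {
    let chain = Node {
        value: 1,
        next: Some(Box::new(Node { value: 2, next: None })),
    };
    assert_eq!(walk(&chain), vec![1, 2]);
}
```

In flurry the `next` load would still be an atomic read plus a guarded deref, but once that deref has happened, the rest of the code gets to be this boring.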
`e` can now be set to `&Node`; it no longer needs to refer to `BinEntry`. The reason it couldn't before is that you can't construct a `Shared<Node>`, only a `Shared<BinEntry>`. But now that these are references, it's fine. Okay. That makes me so much happier. Let's see what this looks like. Oh, I mean, it's going to complain, right? We still need to import `Node` into here; that should make it a little better. Line 45: `prev` is now also going to be a `None`. 89: it's going to be `Some(s.table)`. So for this iterator business to work, this really has to be an `Option<&Table>`, because it could be that the table stack... okay. So the thinking here, first of all, was that `table` can be `None` if it hasn't been allocated yet. But I don't think you should construct a `TableStack` on something that isn't allocated yet, because you wouldn't see any forwarding nodes. So technically, you should never be able to construct a `TableStack` where the table is `None`, and so really this is guaranteed to be `Some`. And here, where we call `push_state` (where is our `push_state`? up here): so this is an `is_none`, this is an `as_ref().unwrap()`, and down here we actually know we're in the `is_none` branch of the `if` above, right? We know this is `Some` because of the `is_none` up here, and so therefore we can operate on it. All right, let's see what that gives us. 45: expected `Node`. `prev` is also an `Option`, because there might not be a previous node that we've iterated over. Where are we now, 94? This is just getting rid of the reference; it probably doesn't matter there either. 121: oh, this is just bad parenthesizing by me. 94: this is just not a thing anymore, because remember, this is where we set, when we're storing one of these in the stack, the `TableStack` that we use to keep track of how deep we are.
There, Java sets the table to `null`, which is what the Java code did to get things garbage-collected, I assume. Whereas for us, that's now a reference, and you can't set references to `None`; they need to always be valid. But I don't think there's any reason why this one wouldn't be, so I think we're just going to drop that. And then 130 is going to be the same. All right, how are we doing? 151: oh, I see, this is a raw pointer. It's not a `Shared`. And 151: "expected `Option<&Table>`, found raw pointer". Now I see: it's because this does a match by reference. And 156: "cannot borrow `s.0` as mutable, as `s` is not declared as mutable". That's fine. Nice. All right, now let's see if this test actually does anything. So here we're going to do a `NodeIter` with a `Shared` from that table. Let's test the lib. We need `Atomic` and we need `Shared`. Wait, how do you go from an `Owned` to a `Shared`? Isn't there `into_shared`? That's fine, although it consumes `self`, which is a little sad. But I guess what we can do here is `let table = table.into_shared(guard)`, and now this table can be passed in here. We assert on it, the iter is dropped, and so now we can do `table.into_owned()`. Actually, I don't even need that scope. And the safety comment here is that nothing holds references into the table anymore; something like that. Great. And "cannot infer key type": fine, the key type is going to be `usize`. Ooh: segmentation fault, invalid memory reference. All right, so something here is broken. Actually, there's an even simpler test here, which is: we don't even do this, we just pass `Shared::null()`. Let's just make sure that that actually works, right? This is just super straightforward stuff. And I guess we need a type annotation here as well; that's fine. All right, so at least that works. So this one is `iter_new` and this one is `iter_empty`. Right, notice that they're different.
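The `Owned` to `Shared` to `Owned` dance in the test has a std-only analogue in `Box::into_raw` / `Box::from_raw`: hand out a raw pointer for the duration of the iteration, then reclaim ownership once nothing refers into the allocation. This is only an analogy for the ownership flow, not crossbeam's actual API:

```rust
// Std-only analogue of the Owned -> Shared -> Owned dance in the test.
fn main() {
    let table = Box::new(vec![0usize; 16]);
    let shared: *mut Vec<usize> = Box::into_raw(table);

    // "Iterate" through the raw pointer.
    // SAFETY: `shared` came from Box::into_raw above and has not been freed.
    let count = unsafe { (*shared).iter().count() };
    assert_eq!(count, 16);

    // SAFETY: the iteration above is finished, so nothing holds references
    // into the allocation anymore; we are the sole owner again.
    let table = unsafe { Box::from_raw(shared) };
    drop(table);
}
```

That final safety comment ("nothing holds references into the table anymore") is exactly the argument the test has to make before calling `into_owned`.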
This one has been allocated, but all the bins are empty, whereas this one is a complete blank. And my guess is that the segfault occurs up here, where we need to check: if `bin` is null, then what does it do? It executes this whole business anyway; it's just going to loop. I think this code gets executed even if the current bin is null. So we actually need these lines to come down here, and then guard them: if `bin` is not null, then we're going to do this, and then we're going to do that. Nice. Okay, so we now have an iterator that iterates over nothing. That seems helpful. Now let's see if we can get it to iterate over something relatively straightforward: `iter_simple`, or something, right? So here, what are we going to do? Well, I guess there's going to be a `BinEntry::Node`. Remember, we're not actually constructing a valid table here necessarily; all we're doing is constructing something that can be iterated over. The node is going to be: `hash` is 0, `key` is `0usize` (this cast should no longer be necessary), `value` is `Atomic::new(0usize)`, `next` is `Atomic::null()`, and `lock` is, I guess, `Mutex::new`, so we're going to have to use `parking_lot`'s `Mutex` here. And now let's first just check that it gives a count of one. It might very well not. "Expected `BinEntry`, found `Owned`", like so. This is actually not what we want: we want `Atomic::null()` bins, and then we say `bins[0]` is the node. The other way would have created 16 copies of this, which is not what we want. All right, this has to be a semicolon. We want 16 of that, and `Mutex::new(())`, a mutex of nothing. All right, that crashes: "dropped table with non-empty bin". We're hitting a panic during testing. If you remember, in our lib code, let's say, in the drop of `Table`...
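The segfault fix is an ordering fix: the null check on the bin has to happen before the bin's contents are touched. A tiny sketch of that control flow, with `Option` standing in for a possibly-null bin pointer:

```rust
// Sketch of the fix: advance over the bins, and only descend into a bin
// after checking it is non-null (here: Some); otherwise just move on.
fn count_entries(bins: &[Option<usize>]) -> usize {
    let mut found = 0;
    let mut i = 0;
    while i < bins.len() {
        // The null check must come *before* we touch the bin's contents.
        if let Some(_value) = bins[i] {
            found += 1;
        }
        i += 1;
    }
    found
}

fn main() {
    // An "allocated but empty" table yields nothing...
    assert_eq!(count_entries(&[None; 16]), 0);
    // ...and a single occupied bin yields exactly one entry.
    let mut bins = [None; 16];
    bins[0] = Some(42);
    assert_eq!(count_entries(&bins), 1);
}
```

This mirrors the pair of tests here: `iter_empty` (allocated table, all bins null) and `iter_simple` (one occupied bin).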
Well, in our drop of `Table`, we drop all the bins, but we assume that all the nodes have already been dropped by the flurry map itself. And so we'll have to work our way around that, and the easiest way is probably just to provide a helper for emptying bins. So this is just going to be, like, "empty the bins of a new table". (I have water now. Thanks, good call.) Okay. And all it's really going to do is... actually, I'm going to make this even easier for us: I'm going to say that in test mode, we're going to be really nice and drop the tables. Ah, that's not right either. It's a little annoying, right? Because we don't really have to drop things in tests, but it would mean that if you run a lot of tests, you're just going to consume a bunch of memory. It might not matter too much here, but it just feels dirty not to do it. How about we do something like: fine, we're going to go back to what I had here, and then this is going to move through all the bins, and then here, I guess, we're going to actually walk the nodes. And this is the same thing the flurry map does, right? Remember, the flurry map will walk the table and, for each bin, drop all the nodes within it. It's a little awkward to replicate this functionality here; that seems a little excessive. The other option is for us to just construct a flurry `HashMap`, but that doesn't seem great either. Here's the actual solution to this: in an `impl` here, we're actually just going to have a private `drop_bins` on `Table` that does the same thing. Boom. And then we're going to call `table.drop_bins()`. Great, great stuff. And the reason we want to do it that way is that in `traverser` we can do `let mut t` and `t.drop_bins()`, and then we can do the same here. Great. Okay. So now it can iterate over at least one element.
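The shape of the `drop_bins` idea, sketched with simplified types: the table's `Drop` asserts the bins are already empty (normally the map empties them), so tests that hand-build tables call a helper that empties the bins first.

```rust
// Sketch: a drop_bins helper so hand-built test tables can be dropped
// without tripping the "dropped table with non-empty bin" check.
struct Table {
    bins: Vec<Option<usize>>,
}

impl Table {
    fn drop_bins(&mut self) {
        for bin in &mut self.bins {
            // In flurry this walks and frees each bin's node chain.
            *bin = None;
        }
    }
}

impl Drop for Table {
    fn drop(&mut self) {
        // Mirrors flurry's debug check that the map emptied the bins.
        assert!(
            self.bins.iter().all(Option::is_none),
            "dropped table with non-empty bin"
        );
    }
}

fn main() {
    let mut t = Table { bins: vec![Some(1), None, Some(3)] };
    t.drop_bins();
    // With the bins emptied, dropping the table no longer panics.
    drop(t);
}
```

Making `drop_bins` a private method keeps the invariant ("tables are empty when dropped") in the library while still letting tests construct deliberately illegal tables.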
And I guess really what we want is something along the lines of: `let e = iter.next().unwrap()`, and then at the end we want `assert!(iter.next().is_none())`. That's really what we want, to make sure the iterator doesn't yield items that aren't there. And then we want to assert something like: the key should be zero. Actually, key-is-zero should be fine; we can make it 42 if we want to. And then we could make this have a bunch of elements, but I think that is relatively uninteresting. The one thing we can do is at least move this to, like, the middle of the table; it shouldn't make a difference. What we will want, though, is something like `iter_forward`, right? This is where it gets a little bit more hairy: we're going to have one table, and then we're going to have a next table. This is going to be `deep_bins`, `deep_table`. So here, what we're doing is constructing the deep table that's going to be pointed to, and constructing the shallow table, which is the one that has the forwarding entries. And here, I guess really what we want is: for `bin` in `bins`... remember how, when a table is moved during a resize, all of the bins are replaced with these forwarders? But we kind of want to emulate the case where only some of them have been. So what we're going to do is: for all of the bins from, let's say, eight onwards, we want `bin` to be equal to a `BinEntry::Moved` pointing at... so this is going to be: construct the forwarded-to table, and then construct the forwarded-from table. And now we want to make sure that it still yields just that one element, right? So you'll notice that bin eight is going to get forwarded, and so in theory it should arrive at this entry in the deeper table. I guess we're about to find out. This needs to be a star.
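A simplified model of what `iter_forward` exercises: a shallow table whose upper bins are `Moved` entries pointing at a deeper table, and an iteration that follows the forward instead of stopping. This sketch simplifies the real traversal (flurry's traverser also visits bin `i + base_size` of the next table and uses the table stack to avoid revisits); here a forward just resolves the same bin index in the next table.

```rust
// Simplified model of the iter_forward test.
enum BinEntry<'t> {
    Node(usize),
    Moved(&'t Table<'t>), // forwarding entry pointing at the next table
}

struct Table<'t> {
    bins: Vec<Option<BinEntry<'t>>>,
}

fn collect(table: &Table<'_>) -> Vec<usize> {
    let mut out = Vec::new();
    for (i, bin) in table.bins.iter().enumerate() {
        match bin {
            None => {}
            Some(BinEntry::Node(v)) => out.push(*v),
            // Simplification: follow the forward into the same bin index
            // of the next table (the real code also checks i + base_size).
            Some(BinEntry::Moved(next)) => {
                if let Some(BinEntry::Node(v)) = &next.bins[i] {
                    out.push(*v);
                }
            }
        }
    }
    out
}

fn main() {
    // The deep table holds the one real entry...
    let mut deep = Table { bins: (0..16).map(|_| None).collect() };
    deep.bins[8] = Some(BinEntry::Node(42));
    // ...and the shallow table forwards bins 8 onwards to it.
    let mut shallow = Table { bins: (0..16).map(|_| None).collect() };
    for i in 8..16 {
        shallow.bins[i] = Some(BinEntry::Moved(&deep));
    }
    // Iterating the shallow table still yields exactly that one element.
    assert_eq!(collect(&shallow), vec![42]);
}
```

This is exactly the "only some bins have been forwarded" mid-resize state the test wants to emulate.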
And 261: this should say `deep_table`. And I guess, actually, here this is going to be `deep_table.drop_bins()`. "Can't compare `usize` with `i32`": why does it think the key here is `i32`? Interesting. Well, and this does need to be `mut table`; that's fine. Great. Okay, so it does follow forwarding, right? We're constructing the iterator over the top-level table, it follows the forwarding record, and it returns the entry from the table below. Okay. So now we have a low-level iterator test, and now, finally, we can go back to the original test we were trying to implement, which is these MapCheck tests. So I guess `git add`... oh man, I have messed up; it's going to include the formatting changes. Yeah: add `src`, add the iterator. So here, what we want, I guess, is: we added `NodeIter`, we added `contains_key`, and the formatting I think we can get rid of at this point. All right. So now it's time for us to try to port the first of these concurrent tests, and that's going to be MapCheck. And I think what I want to do here is make their tests, like, Java, or I guess upstream-JDK... "jdk" is good: `jdk/map_check.rs`. Because I want these to map easily onto the Java tests, if possible, right? So I'm going to use the same imports as I used over there. And now we're going to have to figure out what to do with this file. And to save all of your eyes... no, that's not how I'm going to do it. I'm going to view the raw `MapCheck.java`. There we go; that way we don't have to switch back and forth all the time. And I'll go over to somewhere darker. Okay: `main`, `newMap`. Okay, that's fine. `runTest`, `forceMem`. `t1`, I'm guessing, is test one. How are these even run? How weird. Oh, the setup of this file is awful. Okay.
I guess the idea is that `t1` is a particular test that they're going to run with keys that are absent and keys that are not. Okay, that's fine, I can do that. Why does it run multiple iterations? But all right. `t1`: just go with it, man. This is going to be a `&'static str` name and a `usize`; it's going to be a flurry `HashMap`. Who knows what the types here are: `Object`. Thanks. What calls `test`? `runTest` calls `test`, `test` calls `t1` with the keys. Okay. `runTest`: `map`, `mapClass`, `newMap`. Okay, what does `newMap` do? `mapClass.newInstance()`. Okay, and what is the key type? `new Object()`. I see: they're actually not even using a specific type, they're just using `Object`. That's chickening out. It is true, actually, that we could make all of these generic over the key and value types, and because we can, we will. Because why not? Awful stuff; awful stuff is what it is. Okay, `t1`. So `key` (this should say `keys`, not `key`; they're lying) is a `Vec<K>`, and `expect` is a `usize`. Sure, if you say so. `let sum = 0`; `let iters = 4`. They do timers, which we're not going to do, although my guess is they probably use those for benchmarking as well; I'm just going to ignore the benchmarking aspect for now. `for j in 0..iters`, `for i in 0..n`. Why are they doing `++j` on one of them? And then: if `map.get(keys[i])` is `Some` (this is where we'd probably want to implement the square-bracket operator on the map, just so we don't have to call the `get` method; we could totally clean up this API a lot, right?), then `sum += 1`. I can think of such better names for these variables. `timer.finish()`, and then at the end, `assert_eq!(sum, expect * iters)`. All right, we have ported `t1`. I'm going to set up the structure around it before we start porting the other t's. Okay. So they then have an `fn test`. Great. I don't think I want that.
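The shape of the ported `t1` ("get present"), with std's `HashMap` standing in for flurry's here, since flurry's `get` additionally takes an epoch guard and trait bounds:

```rust
use std::collections::HashMap;

// Shape of t1: look up every key `iters` times and count the hits.
fn get_present(map: &HashMap<usize, usize>, keys: &[usize], expect: usize) {
    let iters = 4;
    let mut sum = 0;
    for _ in 0..iters {
        for key in keys {
            if map.get(key).is_some() {
                sum += 1;
            }
        }
    }
    // Every present key should be found on every iteration.
    assert_eq!(sum, expect * iters);
}

fn main() {
    let mut map = HashMap::new();
    map.insert(1, 0);
    map.insert(2, 0);
    // Key 3 is absent, so only two of the three lookups hit per iteration.
    get_present(&map, &[1, 2, 3], 2);
}
```

The `expect * iters` assertion is the Java test's structure carried over: the same key set is probed on every repetition.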
I think I want a test `t1_present`, which is really "get present"; let's use sane names for these. And this takes a size and a key size. Okay, so there's some notion of a set of keys. Do they run tests multiple times? Yes, they repeat the test multiple times, but with the same set of keys. What an odd way to run these tests, but okay. So the call to `test`: they do `let size`... ah, they shuffle the keys; that's how the repetitions differ. Maybe this would be good as a macro, actually. The size is `keys.len()`; don't care about the start time. Okay, so there's a `let keys = ` some vector of some form, and that vector is of size `size`. Where does `size` come from? `numTests`? No, I refuse. `size` is 50,000. Where does `absentSize` come from? Okay: so they have `const SIZE`, which is going to be 50,000, and they have `const ABSENT_SIZE`, which is going to be `1 << 17`. And then this is really a `macro_rules!`, something like `stress`, maybe, which really just takes the name of a testing function. Well, let's write it for one test by hand and see what comes out the other side. `let keys` is going to be some `Vec`. Then there's going to be a shuffle of the keys, which means we're going to need the RNG stuff; let's do that in a bit. `absent[i] = new Object()`... why do they even have an array of absent keys? This variable is not used. Oh, it's awful, awful stuff. Okay, so they have a vector of absent keys, which are just `Object`s. `runTest`: size is `keys.len()`. And then what else does `test` do? The keys are just basically random numbers, because, if I remember correctly, in Java objects are hashed by their identity, which is sort of kind of random. And so what this is going to do is give you different keys each time. So really, this is going to be something like `(0..SIZE).map(|_| rng.gen()).collect()`.
And there's an argument that you could shuffle it if you wanted to, but since it's random anyway, it's not clear it matters. And then this computes `size` again, because of course it does. `get_present` is then passed in. Why is this a separate parameter? Isn't it always just the length of the third argument? It's definitely always just the number of keys: they say `size = keys.length`, and `absentSize`, if we recall, is just the size of `absent`. So in all these cases, the first argument is just the length of the third argument. Right? Am I missing something? Certainly for `t1`. So this is not going to take an `n`; it's also not going to have a name; it's going to take a `keys` and it's going to take an `expect`. And then this is just going to be `for key in keys`, and then this is just `key`. Great. That is much better; that makes me much happier. Now `get_present` is going to do that. And is the map actually just empty? Nothing populates it, right? It's just `newMap`, which just creates a new instance. Okay, so it is indeed initially empty. And that is presumably why... no, no, no. Here it actually expects all of them to be present. Where does it fill the map? It doesn't. Instead, this *put* test fills the map. So that's why. So stupid. It's so stupid. So really there's just one test, and it's "everything". And the everything test is going to construct a bunch of keys, then it's going to run a put test, which is `t3`, because of course it's `t3`, and it's going to give... okay, okay. So there's a map, which is going to be just a `flurry::HashMap::new()`, and `t3` is going to be given a reference to the map and a reference to the keys, and I think that's probably it, actually. So `t3` (bear with me here; why these names? it's awful) is going to be generic over that. This map is actually going to be this, and this is going to be this.
It's going to take a map of `K` and `V`, of course, and it's going to take a `keys` and an `expect`. Very exciting. And what is it going to do with them? Well, `sum` is zero. `n`, if I recall, is `size`, which is the length of `keys`. Awful. `for i in 0..keys.len()`, and then: if `map.put(keys[i], absent[i & absent_mask])` (not entirely clear what this is going to be yet) is `None`... Okay, so `expect` here is the number of non-overwrites, and then `assert_eq!(sum, expect)`. Because it's a little unclear what this absent business is: I think this is really just going to be, like, `i % absent_size`. What an insane way to... `absent_mask` is `absent_size - 1`. And this is going to be, sure, an `Option<usize>` of `absent_size`; I'm just going to put `absent`. That seems fine. Although the absent things are all supposed to be distinct, so that's not going to work, but let's leave it for now, because it's stupid anyway. Okay. And so "everything" is going to run `t3` with the map and the keys, and expect all of the inserts to succeed; then it's going to run it again, and expect none of them to succeed. So this is "put absent", and this is "put present". Then I'm just going to ignore `contains`. Yeah, let's ignore that for now; let's do the ones we have. Why is `t6` different from `t1`? Why? Why are any of these things the way they are? What an insane way to... I can't find... okay. So `t1` is going to be given a map and keys, and we're going to expect to get zero; and then "get absent" is still the same map, but the keys are going to be all the absent keys. I see. So there's going to be some set of absent keys and some set of present keys, and we need them to be separate from one another. And here is where Java has a bit of an advantage, in that its keys are all distinct objects anyway. And what we are going to do is `keys.shuffle(...)`.
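The shape of `t3` ("put"): count how many inserts were *not* overwrites, i.e. how many keys were newly added. Again std's `HashMap` stands in for flurry here (both return `Option` from insert, though flurry's `insert` also takes a guard):

```rust
use std::collections::HashMap;

// Shape of t3: the number of inserts that found no existing entry.
fn put(map: &mut HashMap<usize, usize>, keys: &[usize], expect: usize) {
    let mut sum = 0;
    for (i, key) in keys.iter().enumerate() {
        // insert returns None when the key was absent.
        if map.insert(*key, i).is_none() {
            sum += 1;
        }
    }
    assert_eq!(sum, expect);
}

fn main() {
    let mut map = HashMap::new();
    let keys = [1, 2, 3];
    put(&mut map, &keys, 3); // first run: every key is new
    put(&mut map, &keys, 0); // second run: every insert overwrites
}
```

Running it twice with the same keys is exactly the "put absent" then "put present" pair: all inserts succeed the first time, none the second.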
And here we're going to need an RNG at some point. And then we're going to say: `absent_keys` is `keys[..absent_size]`, and `keys` is `keys[absent_size..]`. And I don't know why this is using `absent` as the *values*. Any ideas? "Rerun the test with the same keys but in shuffled order": yes, we can do that too pretty easily now. But why is it using the absent things as values? Is that just a way to choose random values? I think maybe so. It can't quite be random, because we have to be able to check for the values later; that's what's going on. Okay, so we're going to go with `values`, and this is going to be `values[i % values.len()]`. Yeah, the trick here is: we need to fill the map with a bunch of values, but later the test needs to check that the right values are there for the right keys, so we need to be able to refer back to what the value should be for any given key. Which is why it's using a list here rather than just randomly choosing the values. Okay, let's make all the values zero right now, because currently we don't have any tests that actually check that the value is something reasonable. Because otherwise we're going to have to... it can't actually be zero. How about: currently this will only work with `usize`s. 43? And that's fine. Yep, that would do it. And I think the only thing we're really missing now is `rand`. What's the current version of `rand`? Current version of `rand` is 0.7. And here we want an RNG; I guess I probably need this, don't I? So we've got an RNG, and now we want to shuffle using that RNG. And now if we try that, fetch `rand`, that's fine. "And by using `new Object()` they guarantee that the values are unique as well." Yeah, so that's why, for this test, the values actually are all unique; ours are just increasing integers, but they're still unique. "Method is never used: `traverser`". That's fine; I guess this can be `pub(crate)`, because this will be used. `NodeIter`, just to get rid of that warning.
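The key setup being described: shuffle the keys, then carve off the first `ABSENT_SIZE` as the "absent" set that is never inserted. In the real port this uses the `rand` crate (0.7's `SliceRandom::shuffle`); to keep this sketch self-contained, a tiny hand-rolled xorshift and Fisher-Yates stand in for it, and `ABSENT_SIZE` is shrunk to a toy value.

```rust
// Sketch of the key setup; xorshift + Fisher-Yates stand in for rand.
const ABSENT_SIZE: usize = 4;

fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn shuffle(keys: &mut [usize], seed: u64) {
    let mut state = seed;
    for i in (1..keys.len()).rev() {
        let j = (xorshift(&mut state) as usize) % (i + 1);
        keys.swap(i, j);
    }
}

fn main() {
    let mut keys: Vec<usize> = (0..16).collect();
    shuffle(&mut keys, 0x9E3779B97F4A7C15);
    // Absent keys are never inserted; the rest are the map's key set.
    let (absent_keys, keys) = keys.split_at(ABSENT_SIZE);
    assert_eq!(absent_keys.len(), ABSENT_SIZE);
    assert_eq!(keys.len(), 16 - ABSENT_SIZE);
    // Together they still cover all 16 distinct keys.
    let mut all: Vec<usize> = absent_keys.iter().chain(keys).copied().collect();
    all.sort();
    assert_eq!(all, (0..16).collect::<Vec<_>>());
}
```

Because the keys start out distinct and the shuffle only reorders them, the two halves are guaranteed disjoint, which is the property the "get absent" tests rely on.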
The `NodeIter` unused import: that's fine, I don't really care about it anymore. But it doesn't actually run. Oh, right: for this to work, I need a `mod.rs`, and I need `mod`... was it called `map_check`? What am I missing? `tests/jdk`... no: test target name `jdk`. Oh, I probably need `tests/jdk/mod`... it probably has to be `tests/jdk.rs` for it to pick it up. This seems like a bug, actually; this seems like something that should be fixed. Yeah, this is definitely broken: "file not found for module `map_check`". Yeah, this is very broken. This `tests/jdk/map_check` business should not be necessary, but it suggests to me that the paths are just completely off. That's too bad. All right. So, `map_check`: "`keys` is not an iterator". That is totally right, because it's now a slice. "`put` is private". That seems unfortunate... right, that's because it's called `insert`. Also, this needs a guard. This should say `insert`. What else do we have? `absent_size`: this was `absent_mask`, and in fact these two I'm not using yet, so we can get rid of them. 38: okay, we need some brackets there. That seems fine; pretty sure this is valid. "Method not found in flurry `HashMap`": 15. Interesting. `size`: method not found. Oh, yeah: these methods are only available assuming you satisfy all the bounds that flurry's `HashMap` requires of you, which is admittedly a little annoying. That's fine. We're going to have the same for `t3`, except we don't need the value. 18: don't care about the iteration, that's fine. Don't care about `Arc`. Do we care about 34? Nope. 33: "cannot move out of `keys`". I see. I think we're actually going to require that `K` is `Copy`. Either that, or we could have all the values be references, but I think we're just going to say `Copy` instead for now. Hey, all right! I've been amazed, actually, this stream, at how everything has just worked. Well, not quite; there have been some things we've had to debug. But this is one of their test cases, and it just works. I'm sure there are things that are broken.
But I think this also implies, given the tests we've written so far, that the implementation is actually a pretty faithful port, because otherwise a bunch of things would be broken, right? Here it actually looks like resizing, inserts, and gets all interact correctly, and the forwarding iteration works fine. That's pretty neat. That's pretty neat, not going to lie. We're already almost six hours in, so I think I'm going to end it there. Let me just commit the stuff that we have: "Add tests, start porting Java tests, visibility issues". Obviously there's a bunch more of the test suite that needs to be ported, but I think this is something where it would be great if you, the viewers, took a stab at porting some of these tests now that we have the rough framework in place, and just see if they work. If something is broken, then we should fix it. But at least now we have get, insert, and iterators, and that should be enough for you to develop at least most of those tests, and, in theory, for people to start doing benchmarks. Once we get some more tests, we can set up CI and so on as well. The biggest feature that is missing, apart from the optimizations we talked about very early in the stream, is support for remove. The `remove` method, I think, is not that complicated compared to what we've ported so far, so if you want to take a look at it, I think you should. As mentioned, though, it'll be a while until I do the next stream. I'll still monitor the repository and try to handle things like pull requests, but I won't be doing a stream on it for a while. My guess is that the next stream will be in a couple of months; I don't know exactly. But know that I still want to do more streams; I just need to settle my PhD first. So thanks, everyone, for watching. Keep watching this repository and see what happens.
And please consider this project *our* project, not my project. If you want to submit PRs to add CI, improve the README, improve the docs, or add tests: please, please do, and I will do whatever I can to actually keep up with whatever you contribute. I hope it's been useful. I don't know whether I'll do more streams on ConcurrentHashMap itself, because it's unclear that a fourth part would teach you that much more about new topics, as opposed to just more about ConcurrentHashMap. So take a look at the live coding voting page... that's a good question, does that even still work? It does. So there you can vote on what streams you want to see next; feel free to go in there and vote, and I'll post the link in chat. The repo, I think, was linked above; it's just on GitHub, jonhoo/flurry. I don't know if that's a name that will stick, but I'll post it into chat as well. And yeah, vote there for the upcoming stream idea that you want to see, and whenever I end up streaming next, that will probably be what I'm doing. I'm also giving a couple of talks in February that I hope will be published online, and I'll tweet out links to those when they are; same as with this episode. As always, the recording will be up. Just feel free to follow me on Twitter, and then you'll see all of these announcements. Great. Thank you for watching. It's been fun, and I will see you the next time I stream. So long for now.