Hello folks, welcome back to part two of our video series, if you will, on lock-free to wait-free simulation in Rust. The idea here is to take this paper that someone sent me a while back, and that I think is just really cool, on a practical wait-free simulation for lock-free data structures. It's basically a way to turn a lock-free data structure into a wait-free data structure. I'm not gonna rehash all the definitions and what that all means and how we get there. All of that is covered in part one, which, if you're watching this video after the fact, should show up as a little pop-up somewhere up here that you can click. And if you're watching live, hopefully there'll be a link to the first part in chat, and you should go watch that. I don't think it's likely you'll get much from this video if you haven't watched part one, so go watch part one first, and you can come back to the on-demand version of this afterwards. I'll also take a quick aside, before we dive back into the paper and the implementation where we last left off, to mention that I wrote a book. It's called Rust for Rustaceans, and it's pretty much what the title tries to say: a book that covers the Rust language and how to use it, the idiomatic programming techniques, the mechanisms and how stuff works under the hood, how things fit together, best practices; basically all the knowledge and experience with the language that I've built up through the years, in book form. And it's specifically written for those who already know Rust, so it is not a replacement for the Rust book at all.
This is specifically a book for those who already know Rust and want to deepen that understanding: make sure you actually understand what's really going on under the hood, understand good programming patterns, how code should fit together, what idiomatic Rust looks like, or just get more exposure to more parts of Rust and more cool features of Rust. It's available for early access now. Ooh, wrong book. It's available for early access now, so you can buy the early access, and then you get all the chapters as they're released, plus the final book at the end. And you can also order the print book once it's actually released. I'll leave the link in the YouTube chat. All right, sweet. Now let's go over to the paper. So last time we went through mostly the construction of the abstraction that we want to provide. The abstraction here is, remember, a translation from something that's lock-free to something that's wait-free, but only if the lock-free data structure is expressed in this normalized form. And what we did last time was mostly try to capture what that normalized form is through the use of a trait. Specifically, we wrote this trait, where is it? This trait, NormalizedLockFree, which has some associated types, and then these two methods, generator and wrap_up. generator is the stuff that happens before the critical section, the commit point for the lock-free algorithm, and it generates a list of compare-and-swap operations that are the actual commit points. And wrap_up is the method that executes after the commit point has happened and then tries to complete the operation and give some final output of the computation. And then what we did was we wrote this, no, not the help queue, but this WaitFreeSimulator struct, which is generic over any type that implements the trait, which is basically the lock-free data structure.
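Just so we're all on the same page, here's roughly the shape of that trait and its supporting pieces. This is my reconstruction from the discussion, so every name here (NormalizedLockFree, CasDescriptor, ContentionMeasure, the method signatures) is an assumption, not the stream's exact code:

```rust
use std::ops::Index;

// Rough tally of how much contention we've observed (e.g. failed CASes).
pub struct ContentionMeasure(pub usize);

// A single compare-and-swap the algorithm wants executed at its commit point.
pub trait CasDescriptor {
    fn execute(&self) -> Result<(), ()>;
}

// The "list of CASes" abstraction: indexable, with a known length.
pub trait CasDescriptors<D>: Index<usize, Output = D> {
    fn len(&self) -> usize;
}

pub trait NormalizedLockFree {
    type Input: Clone;
    type Output: Clone;
    type Cas: CasDescriptor;
    type Cases: CasDescriptors<Self::Cas>;

    // Runs before the commit point; produces the CASes that *are* the commit point.
    fn generator(&self, op: &Self::Input, contention: &mut ContentionMeasure) -> Self::Cases;

    // Runs after the commit point; turns what happened into the final output.
    // `executed` is Ok(()) on success or Err(i) with the index of the failed CAS.
    fn wrap_up(
        &self,
        executed: Result<(), usize>,
        performed: &Self::Cases,
        contention: &mut ContentionMeasure,
    ) -> Result<Self::Output, ()>;
}

// A trivial descriptor just so the trait is exercised.
pub struct Noop;
impl CasDescriptor for Noop {
    fn execute(&self) -> Result<(), ()> {
        Ok(())
    }
}
```

The real version from part one also threads contention information through the return types in slightly different ways; this sketch elides that.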
And then it does all of the logic around when you call generator, when you call wrap_up, how multiple threads help each other out to make progress, and deals with all the concurrency aspects of it. Where we left off was basically the cas_execute method, which we only have a very, very basic implementation of. And then there's also the help queue, which is this wait-free queue of pending operations that threads might help each other perform. This wait-free queue can't itself use the wait-free simulation we're building, because that would be a recursive dependency. So instead we're gonna have to implement this wait-free queue from scratch. Luckily, the paper includes a tailor-made wait-free queue implementation in its appendix, towards the bottom, that we're gonna be implementing here. I think, just to get back into things, before we implement the wait-free queue, let's go ahead and finish up cas_execute, because that's where we were last time. Before we do, let's go one step up just to see what the higher-level call graph looks like here. This is all in the WaitFreeSimulator struct. We have a run method that takes an operation that some thread wants to execute and returns an output for that operation. And if you remember, this was something that made us a little bit sad, right? Technically there's one output type for each input type, and we can't easily express that the way it's currently set up. There are ways for us to get around this, but I think for now what we're gonna do is just say that both of these are enums, and if you send a particular input enum variant, you should expect the corresponding output enum variant. We can try to find ways to enforce that with the type system later on, but for now let's leave it the way it was. And you'll remember the algorithm we're following in run, right?
First, we're gonna try to help another thread if another thread needs help. Then we're gonna try the fast path of the lock-free algorithm, which is basically: we're gonna assume that there's not that much contention and just run the lock-free algorithm start to finish. And then we're gonna look for signs of contention; that's what this ContentionMeasure struct is for. We run the generator, and if we detect a bunch of contention, by virtue of, for example, compare-and-swap operations failing, then we switch to the slow path. Then we do the cas_execute step on the compare-and-swap operations the generator generated. If we detect contention there, we switch to the slow path. And then we do the wrap_up, and if it completes, we're done. Otherwise, we either do the slow path if there was contention, you'll see a pattern here, or we go around again. So the idea is: first help, in case there is someone who needs help; then do the fast path a couple of times; and if we detect contention, or if the fast path doesn't succeed within a certain number of tries, which is equivalent to saying there's probably contention, then we switch to the slow path. And the slow path, if you remember, is: we enqueue this encoding of what operation we want to execute. That's this OperationRecord, which we stick in an OperationRecordBox because we need to be able to do RCU on it; we need to atomically replace this operation record as we execute. We enqueue the help operation, and then we keep checking whether our operation has completed. If it has, we return; otherwise we keep helping. So the idea is that the slow path is: stick my operation on the queue, and then just help everything that's in the queue until my thing completes. And that's how we guarantee that over time, as long as some threads execute, all threads make progress.
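The fast-path/slow-path decision described above can be sketched as pure control flow, with the lock-free algorithm stubbed out as closures. RETRY_THRESHOLD and the "Err means we saw contention" convention are my assumptions, not the stream's actual code:

```rust
// How many fast-path attempts before we give up and go wait-free. Assumed value.
const RETRY_THRESHOLD: usize = 2;

// Control-flow sketch of WaitFreeSimulator::run:
// `fast_attempt` runs generator + cas_execute + wrap_up once and returns
// Err(()) if it detected contention; `slow_path` enqueues the operation
// record on the help queue and helps until it completes.
fn run<O>(
    mut fast_attempt: impl FnMut() -> Result<O, ()>,
    slow_path: impl FnOnce() -> O,
) -> O {
    // 1. (elided) help another thread if one is waiting on the help queue.
    // 2. Try the fast path a bounded number of times.
    for _ in 0..RETRY_THRESHOLD {
        if let Ok(out) = fast_attempt() {
            return out;
        }
        // Err means contention (e.g. a CAS failed); retry, and after too
        // many tries, stop assuming the fast path will ever win.
    }
    // 3. Enqueue our operation record and help everything until ours completes.
    slow_path()
}
```

The bounded retry count is what keeps this wait-free: no thread can spin on the fast path forever.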
And then if we look at this help method, which is the core of what makes this algorithm wait-free rather than just lock-free, remember the basketball analogy from the previous stream: help first looks at the front of the help queue, and if there's something there, it tries to help it. The actual helping, if you recall, is: look at the current state of the operation record. If it's completed, we just remove it from the help queue. If it's in a pre-CAS state, if it hasn't executed any of the compare-and-swaps yet, then we execute the compare-and-swaps, and then we try to do basically RCU, right? We read the current operation record, we modify our local copy of that operation record, and then we try to atomically swap the updated version back in. And even if we fail, that just indicates that some other thread helped instead, so progress is still made; that's the general idea here. So every operation record is really a state machine, and helping is trying to move the operation record through multiple stages of execution, where each stage means the computation made it a little bit farther than last time. And it does this by read, copy, update, then atomically writing back using a compare-and-swap. Each of these help operations is really just calling into the underlying algorithm repeatedly, right? Remember, generator can be called concurrently from multiple threads, cas_execute can be called concurrently from multiple threads, and wrap_up can also be called concurrently from multiple threads. And the idea is that overall, even if lots of threads are trying to help a single operation, you'll still make progress over time. Yeah, so this is that compare_exchange where we take our RCU copy and actually write it back, or try to write it back, into the OperationRecordBox.
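The read-copy-update step described here might look roughly like this. The Stage enum and the immediate-free on the winning path are simplifications; real code needs a safe memory reclamation scheme (epochs, hazard pointers, or similar) before freeing the old record:

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Stand-in for the operation record's progress through execution.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Stage {
    PreCas,
    PostCas,
    Completed,
}

// Try to advance the record to `next` via RCU: read the current pointer,
// allocate our updated copy, and CAS the pointer. Losing the CAS is fine;
// it just means another helper already advanced the record.
fn advance(record: &AtomicPtr<Stage>, next: Stage) -> bool {
    let cur = record.load(Ordering::SeqCst);
    let updated = Box::into_raw(Box::new(next));
    match record.compare_exchange(cur, updated, Ordering::SeqCst, Ordering::SeqCst) {
        Ok(old) => {
            // We won. A real implementation would retire `old` through a
            // reclamation scheme rather than freeing it immediately.
            let _ = unsafe { Box::from_raw(old) };
            true
        }
        Err(_) => {
            // Someone else helped first; throw our copy away.
            let _ = unsafe { Box::from_raw(updated) };
            false
        }
    }
}
```

Either outcome of the CAS means the record moved forward, which is exactly the "progress is still made" property.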
Great. So all of that depends on this machinery of the generator, which is dictated by the algorithm; the wrap_up, which is dictated by the algorithm; and the cas_execute operation, which is dictated by the implementation provided by the paper, right? cas_execute takes a list of compare-and-swap operations, and then it's gonna execute each one in turn until it finds one that fails. If one fails, it's gonna return the index of the one that failed, because subsequently we need to make sure that we resume execution from there. That's the basic idea. So if we go back to the paper, what does it say? Figure 5 is what we want. Let's see if we can find figure 5. It's probably further up. This is my guess, actually. I should have written down the page number; that was silly of me. Figure 5, the execute CASes method. So if you remember the Cases type, if we backtrack a little bit, right, the NormalizedLockFree trait has an associated type called Cases, which really should be spelled CASes, but Rust doesn't like that, as I mentioned. We might try to find a better name here, like CasList maybe. But the idea is that the implementer of the trait can choose the representation of the list of CAS operations to execute. The reason we wanted that is because most lock-free algorithms will just have a single CAS, so we don't want to force it to be a vector or something, but some operations might actually want multiple CASes to be executed, and we want to support that case without making it a cost for any algorithm that doesn't need more than one. In addition, the Cases value is also passed to, where is it, further up here?
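That loop is small enough to sketch directly. The trait name CasDescriptor is carried over from the part-one reconstruction, and the Flag type is a toy stand-in for a real descriptor:

```rust
// A single compare-and-swap the algorithm wants executed.
pub trait CasDescriptor {
    fn execute(&self) -> Result<(), ()>;
}

// Sketch of cas_execute as described: run each CAS in order, and report
// the index of the first one that fails so a later retry knows where to
// pick back up.
pub fn cas_execute<D: CasDescriptor>(descriptors: &[D]) -> Result<(), usize> {
    for (i, cas) in descriptors.iter().enumerate() {
        if cas.execute().is_err() {
            return Err(i);
        }
    }
    Ok(())
}

// Toy descriptor for illustration: "succeeds" iff its flag is true.
pub struct Flag(pub bool);
impl CasDescriptor for Flag {
    fn execute(&self) -> Result<(), ()> {
        if self.0 { Ok(()) } else { Err(()) }
    }
}
```

In the real simulator the failed index flows into the operation record and eventually into wrap_up, as discussed below.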
I really need to reorganize this file. It's passed to the wrap_up method, which also receives the Cases. So the Cases are a way for the generator both to convey which CASes have to be executed, and also to communicate any additional meta-information to the wrap_up method. So in some sense, this shouldn't really be called Cases; it should be called something like generator output, and it just so happens that we require that the generator output type is something we can index into in order to get out the CAS descriptors. Currently we did this with Index; it's a question of whether that's actually a good idea, but we can figure that out later on. I think what we might want here is actually just to call this, actually, no, I think what we want is to call this generator, or, what's a good word for this? It's sort of the commit meta, right? This is the information that the generator produces that informs how to commit and what information to convey to wrap_up. So maybe it's like CommitSequence; but the commit doesn't have to be a sequence. CommitState, maybe; really just CommitState. That seems like at least a less confusing name than Cases-slash-CASes. So we get this CommitState. Descriptor is also pretty good. Staged is not bad. Actually, CommitDescriptor isn't bad. CommitDescriptor is good; let's go with that. As for the CAS, I mean, they call it a CAS list. What we've really done here is that the CommitDescriptor is a type that has to implement CasDescriptors, which is part of what makes it a list. But the CommitDescriptor itself we don't actually require to be a list; this trait bound requires that it's indexable and can give its length, which gets us pretty far. So in that case, it says: iterate through the length of the list, right? Get out the ith element. Do we ever need, I guess we need i, and we need each individual element. So maybe all we need here is really an iterator. Maybe the length isn't important, right?
I'm thinking here: we iterate over it, and we do need to get, like, we only use the i for this i here. I wonder what this i here is even used for, right? Why is that i returned? Where does it go? The failed index goes into failed_cas_index. And where is failed_cas_index even used? In fact, where is our run method? So that should be in help, which does this. That's the outcome, which we stick in post-CAS. And what do we use the outcome for? The outcome goes into wrap_up. Does it go anywhere else? Seems like it only goes to wrap_up. wrap_up gets the output of execute CASes. So what does wrap_up even do? Post CASes, all right. So let's look at post CASes, which is down here. I wanna see what it does with the record, with the failed index. It seems like it doesn't do anything with that failed CAS index, which is kind of interesting. Maybe it's not even important which one it is. I guess, maybe, okay. So let's imagine that we have a lock-free algorithm that does two CASes to commit. Maybe the wrap_up method needs to know which one failed in order to know what kind of recovery it has to do. So it needs to know the index. That might be the case. But it sort of feels like it shouldn't necessarily have to be an index. I guess let's stick with the index for now. I think then what we actually want here is that we don't care that it's indexable; I think all we care about is that it implements IntoIterator. It's a little awkward, right? Because we want a reference to the CommitDescriptor to implement Iterator. And I think you can technically do that; I think you can add a where bound here. But I think what we'll actually want to do is this. And then I think we can say here: where CommitDescriptor, or Self::CommitDescriptor, implements, no, not even that, but where any reference implements IntoIterator where the item is, that's real ugly, but Self::Cas.
All right, that's sort of what we want to say. But I kind of don't want to express it that way. I think what we're actually going to do is not even include it there, and instead stick it on the impl block down here, where, oh yeah, the for syntax. So this is a higher-ranked lifetime bound: basically I'm saying, for any lifetime 'a, I want a &'a reference to a CommitDescriptor to implement IntoIterator, with Item being a reference with that same lifetime 'a to a Cas. Higher-ranked trait bounds, yeah. And yeah, I think that's good, because I don't think we need the length, right? All we need here is for (i, cas) in descriptors.into_iter().enumerate(), right? And now this will be a Cas that we can execute. That looks nicer, right? This also gives us a couple of things: it might eliminate a bounds check in the generated code, and it also means that if people use an Option, for example, it works pretty nicely. I forget if tuples, no, probably not. It'd be nice if a one-tuple implemented IntoIterator over its inner type. That feels like a cool implementation to add. Yeah, I like this. That looks nice, because now we're no longer requiring the implementer to implement the Index trait, which is a little weird anyway. It feels like all it should require is iteration anyway. So I think this is fine. All right, so back to this. So we get the CAS, and then we look at the state of the CAS. Right, so here's one thing that we're currently missing: currently we just say that the Cas type has to be a CasDescriptor, and the only thing CasDescriptor has to do is execute. But in reality, a CAS also needs to have state; it needs to be able to do things like clear a bit. So it looks like there's actually some more stuff that goes in the Cas than just execute. So I wonder if Cas is actually a type that we want to control. Like, I think CasDescriptor maybe should be a struct instead.
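Spelled out, the bound and the loop might look like this. Cas is a stand-in unit type, and the real execute call is elided:

```rust
// Stand-in for the real CAS descriptor type.
pub struct Cas;

// The higher-ranked trait bound: for any lifetime 'a, a &'a C must
// iterate over &'a Cas. This replaces the Index + len requirement.
pub fn execute_all<C>(descriptors: &C) -> usize
where
    for<'a> &'a C: IntoIterator<Item = &'a Cas>,
{
    let mut executed = 0;
    for (_i, _cas) in descriptors.into_iter().enumerate() {
        // Real code would call _cas.execute() here and return Err(_i)
        // on the first failure.
        executed += 1;
    }
    executed
}
```

The nice property mentioned on stream falls out for free: both Vec<Cas> and Option<Cas> satisfy this bound, so a single-CAS algorithm can use Option with no allocation.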
Maybe backed by a type that implements, I guess, Cas. Right, so here we can say that a CasDescriptor should contain, I guess, a state, which we don't know the type of yet, whatever this bits stuff is: CAS state field, write state field, modified bit set. Okay, so there's gonna be some kind of bit set in here too, which we don't know what is yet. And it looks like there's an enum of CAS state, which is gonna be either success or failure or pending, right? So there's Success, there's Failure, there's Pending. And that looks like all of them. Okay, so state is gonna be a CasState. And it looks like we have to be able to do a compare-and-swap on the state field. So what we're gonna do here is actually, I guess, repr(u8). And then we're gonna have this be a, oh, it's a little awkward. I was thinking this should be like an AtomicU8. But that's a little awkward, because we wanna say that this is really a CasState, but I think it's gonna have to be an AtomicU8, because otherwise we can't do a compare-and-swap on it, for example. But this repr(u8) means that we should be able to trivially cast between a u8 and a CasState. And the bit set, this is like clear bit and modified bit set. Oh, modified bit set, I see. So we need to figure out what this modified bit is. There are no atomic enums, no. I guess let's maybe actually read the text. The execute CASes method receives as its input a list of CAS descriptors to be executed. Each CAS descriptor is also associated with a state field which describes the execution state of this CAS: succeeded, failed, or still pending. Great, so we got those three. Careful execution of these critical CASes is required to ensure that each CAS is executed exactly once.
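A sketch of what that state field might look like: the paper's per-CAS state stored as an AtomicU8 so helpers can compare-and-swap it. The discriminant values and the mark_success helper are my assumptions:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// The paper's three execution states for a CAS descriptor.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CasState {
    Pending = 0,
    Success = 1,
    Failure = 2,
}

pub struct CasWithState {
    // Logically a CasState, but atomics only come in integer flavors,
    // so we store the repr(u8) discriminant.
    pub state: AtomicU8,
    // ... the target address, expected value, and new value would live here.
}

impl CasWithState {
    pub fn new() -> Self {
        CasWithState { state: AtomicU8::new(CasState::Pending as u8) }
    }

    // Only one helper gets to move Pending -> Success; every other helper
    // loses the CAS on the state field and learns the result was published.
    pub fn mark_success(&self) -> bool {
        self.state
            .compare_exchange(
                CasState::Pending as u8,
                CasState::Success as u8,
                Ordering::SeqCst,
                Ordering::SeqCst,
            )
            .is_ok()
    }
}
```

This is exactly the "no atomic enums" workaround: repr(u8) pins the discriminants so the u8 in the atomic always corresponds to a valid CasState.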
The success of the CAS gets published even if one of the threads stops responding, and an ABA problem is not created by letting several threads execute the sensitive CAS instead of the single thread that was supposed to execute it in the original lock-free algorithm. The ABA problem is introduced because a thread may be inactive for a while and then successfully execute a CAS that had been executed before, if after its execution the target address was restored back to its old value. I see, okay, so what they're trying to solve for here is, so we talked about the ABA problem on the previous stream, but the idea is that the compare-and-swap is really an operation that's trying to swap some other value in the system. It's trying to swap, let's say, something that currently has the value A to something that has the value B. And imagine that we have all these threads trying to help, and one of those threads just goes to sleep for a long time, and eventually that operation succeeds: A does get changed to B, the operation succeeds, and the whole rest of the program keeps executing. And then at some point, because of some other operation, that B gets turned back into an A again. And now that helping thread that was asleep comes back, and it resumes trying to redo that old CAS that was trying to change A to B. Normally that would have failed, because the value had changed from A in the meantime, but since it's just looking at the value, it's gonna now see: oh, the value is A, so I'm gonna change it to B again, even though that's the wrong thing to do. And so what they're trying to get at is, let's see: ideally, we would have liked to execute three instructions atomically: read the state, attempt the CAS if the state is pending, and update the CAS state.
Unfortunately, since these three instructions work on two different locations, the CAS's target address and the descriptor's state field, we cannot run this atomically without using heavy mutual-exclusion machinery that foils wait-freedom and is also costly. To solve this atomicity problem, we introduce both a versioning mechanism for the fields being CASed and an additional bit, named the modified bit, in each CASed field. In a practical implementation, the modified bit is on the same memory word as the version number. The modified bit will signify that a successful CAS has been executed by a helping thread, but possibly not yet reported. So when a CAS is executed in the slow path, a successful execution will put the new value in place together with the modified bit set. As a result, further attempts to modify this field must fail, since the expected value of any CAS never has this bit set. I see. So the idea is that we're gonna have a modified bit in the value that gets CASed, and that bit is always gonna be off in the expected value when you do the CAS. When a field has the modified bit set, it can only be modified, hmm, and this prevents the ABA because, let's see: the CAS is gonna try to modify A to B, but it's only gonna be allowed to do that if the modified bit is unset. There's something weird here. I think this is gonna come back to, I think what we're missing here is the versioning, which they talked about a little bit earlier. I think we're gonna end up having to implement a way to do versioning together with the modified bit here.
When a field has the modified bit set, it can only be modified by a special CAS primitive designed to clear the modified bit. This CAS, which we refer to as the clear-bit CAS, is the only CAS that is executed without incrementing the version number. Yeah, so there's actually a combination here of a version number and a modified bit. It only clears the modified bit and nothing more. Interesting. Okay, so what we really need here is to implement the versioning. That's a little awkward. So to recap the versioning, let me go back up here to find where they talk about that. Finally, so this is in the requirements for how to do the generation: we require that the CASes that the generator method outputs be for fields that employ versioning; that is, a counter is associated with the field to avoid the ABA problem. The version number in the expected-value field of a CAS that the generator method outputs cannot be greater than the version number currently stored in the target address. Right, so every CAS has to end up incrementing the version number. This requirement guarantees that if the target address is modified after the generator method completes, then the CAS will fail. Okay, so we're gonna have to do a little bit of drawing here, I think, to explain what's going on. So let's see here. The basic idea is, so I think we drew a little bit about the ABA problem last time, but essentially, this cable is in the way, go away cable. Essentially what we have is, let's say, some place in memory over here. This is foo; foo is some address in memory, and it currently has the value A, right? And now imagine we have some thread that wants to do a CAS. It wants to CAS foo from A to B, right? And let's say that we have multiple threads that are all trying to do this.
So we have T1 trying to do this, and T2 also trying to do this: CAS foo from A to B. And now imagine that T, apparently my writing is really bad. Let's say that this one goes to sleep, but this one succeeds. So now this is no longer A; this is now B. And then some T3 comes along. This is what I described earlier, right? It tries to change foo from B to A, and it succeeds, so this now gets changed to A. This T2 then resumes, and it executes that old operation, and it succeeds, so this becomes B again. Even though those two were supposed to be a single operation and only be applied once, we ended up applying the A-to-B change twice, when really we should have only done it once, right? Because this operation here succeeded, even though it was an earlier update that had already happened. And so the versioning that we're gonna be introducing here, or what the paper is asking us to do, is to instead say that any value you want to be CASable needs to have versioning embedded in the value that gets CASed. So if this is A, it actually needs to be like (A, 0). And then, let's see if I can actually, let's do this over here, the way the CASes will look. So with versioning, foo is gonna be a box over here, and it's gonna have the value (A, 0). And then let's walk through what happens with the same scenario. All right, so we have T1 and T2 both trying to help each other. This is a CAS of foo from (A, 0) to (B, 1). And this is the same, right? It's the exact same operation. This one goes to sleep; this one completes. So foo is now gonna be (B, 1). And notice that this CAS incremented the version number, right? The second element here is the version number. And now T3 comes along, and it wants to do a CAS of foo from (B, 1) to (A, 2), right? And you'll notice the difference here, right?
Now the new value here is no longer the same as the old value up here, because the version is being incremented again. So even if this now succeeds and then T2 runs, the value that's in here is (A, 2). And so, if we go back here, T2's CAS of foo from (A, 0), that writing was bad, to (B, 1) will now fail. Because, remember, compare-and-swap means: only if this first argument is equal to the current value do you update. Well, this value, which is the current value, is not equal to (A, 0), or if you prefer math, they're not equal, and therefore the compare-and-swap will fail. So versioning solves that ABA problem. Of course, the challenge with versioning is that now the value you have to compare-and-swap is no longer just a value. Let's take the example of the type we originally wanted to CAS being a pointer, right? So we're using AtomicPtr. Well, AtomicPtr only holds a thing that's of pointer size, right? So where would the version number go? Where does this go, right? Because the value A fills the pointer value. We can't just make it a tuple, because then we couldn't do a compare-and-swap on it in the first place. So there are a couple of ways to deal with this. One is by doing bit fiddling. One observation you can make is that a pointer looks something like 0x, on 64-bit that is, and then, I forget exactly what, I think generally the high bits are ones, and then some address digits, right? But if you know that the high bits are always ones, then you can leech off some of the high bits. Or if you know that the low bits are always zero, because for example the value needs to be aligned, then you can stick some secret bits in there, right?
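Before we deal with pointers specifically, the versioned-CAS walkthrough above can be sketched with a value small enough that the value and the version fit together in one atomic word. The 32/32 split is an arbitrary assumption; the point is just that every write bumps the version, so a stale (value, version) pair can never match again:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Pack a 32-bit value and a 32-bit version into one CAS-able word.
fn pack(value: u32, version: u32) -> u64 {
    ((version as u64) << 32) | value as u64
}

// A "versioned CAS": the expected side carries the version we read, and
// the new side always carries version + 1.
fn versioned_cas(
    field: &AtomicU64,
    expect_value: u32,
    expect_version: u32,
    new_value: u32,
) -> bool {
    field
        .compare_exchange(
            pack(expect_value, expect_version),
            pack(new_value, expect_version + 1), // bump the version on every write
            Ordering::SeqCst,
            Ordering::SeqCst,
        )
        .is_ok()
}
```

Replaying the whiteboard scenario with A = 10 and B = 20: T1 takes (A, 0) to (B, 1), T3 takes (B, 1) to (A, 2), and T2's stale attempt against (A, 0) fails even though the value is A again.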
So we can take the pointer that we were supposed to use and just stick a version number in those low bits, let's say 02, if we wanted to store (A, 2), where this is A. What I wrote out here was the value of the A pointer; then (A, 2), so A versioned as 2, would be the same thing for all the leading digits, but would end in 02 instead. And then we just need to remember to zero out the low bits before we ever actually try to dereference the value. This works. In fact, there are a decent number of libraries and tools and programs that do this kind of magic. There are a couple of downsides to it, though. The first one is that you are relying on particular features of the address space. In particular, if you use the high bits, then you're assuming that the high bits can all be set back to zeros or ones to recover the original pointer, right? Imagine that you're running on, like, an ARM platform or something where the high bits are actually some other values entirely, right? Then if you tried to store your version number up there and then just flip them all back to ones, to all Fs, in order to recover the original pointer, that wouldn't be the original pointer anymore, because the original pointer started with this other value instead. And so that wouldn't work. So if you use the high bits, it often ends up being architecture dependent. If you use the low bits, you can always say that we require your target addresses to be, like, 64-byte aligned or something, to basically dictate how many bits at the end are gonna be zeros. But the problem is that you don't end up with too many bits to spare, right? Let's say that we require that their values are 256-byte aligned.
So anything that they point to has to start at an address that's aligned to 256 bytes, which basically means that the last eight bits of the address have to be zero. Well, then all we have to store the version number in is eight bits, right? Which gives you just 256 values, which means that you can't have more than 256 versions before you wrap around. So at some point, if we imagine continuing the sequence, eventually you get to (x, 255), and at this point you've set all the bits you can. So your only option, if someone modifies to y, is gonna be (y, 0). But now you're exposing yourself to the ABA problem again. Because imagine, I mean, you're making it less likely, but imagine that someone changed x to (a, 0), because the version wrapped around, so it's really 256, but it has to be stored as zero because we only have so many bits. Then T2 might end up succeeding in its stale operation, even though lots of operations have happened, because we wrapped around. So you really need the version number space to be large enough that you're not gonna wrap; otherwise you reintroduce the ABA problem. If you use the high bits, you usually have more bits to play with, and so you're less likely to wrap around. But even then, you need a decent number of bits before you can reliably assume that you won't wrap. With 64 bits, you're just not gonna wrap around; that's like the total lifetime of the universe or something. But if you only have 10 bits, realistically, you're gonna get there. Okay, so then what do we do if we don't wanna do this bit twiddling? There are a couple of options. For example, here's another one: if you have access to AtomicU128 and you're on a 64-bit platform, then voila, your problem is solved.
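Going back to the low-bits option for a second, the tagging itself is just masking. The 8-bit budget here assumes 256-byte-aligned targets as discussed, and the last assertion below shows exactly the wraparound hazard: version 256 is indistinguishable from version 0:

```rust
// With 256-byte alignment, the low 8 bits of any valid pointer are zero
// and can carry a small version number.
const TAG_BITS: usize = 8;
const TAG_MASK: usize = (1 << TAG_BITS) - 1; // 0xff

// Stash a version in the low bits of an (aligned) address.
fn tag(ptr: usize, version: usize) -> usize {
    debug_assert_eq!(ptr & TAG_MASK, 0, "pointer must be 256-byte aligned");
    ptr | (version & TAG_MASK)
}

// Recover (pointer, version); the pointer half must be masked clean
// before it is ever dereferenced.
fn untag(tagged: usize) -> (usize, usize) {
    (tagged & !TAG_MASK, tagged & TAG_MASK)
}
```

Because the tagged word is still pointer-sized, it can live in a plain AtomicUsize and be compare-exchanged as one unit.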
You just have your atomic pointer be an AtomicU128 where the left 64 bits — it doesn't really matter whether it's left or right — are the pointer, and the right 64 bits are the version. Ta-da, okay, great. Now, let's go and see whether we can actually get AtomicU128. There's no AtomicU128 here. Like, what's in here if we look down? Yeah, it only goes to 64. Now, I'm fairly sure that there are intrinsics. Intrinsics. If we go to intrinsics, atomic exchange... I'm pretty sure that on, like, x86-64 on newer CPUs, you can do... why does this not... I guess it's because it's in intrinsics. Let's go ahead and look at... maybe the C++ spec will tell us. Yeah, cppreference. That's not very helpful. atomic_exchange — it doesn't actually say which types. Well, it doesn't really matter. We're not going to start using intrinsics anyway, but clearly in the Rust standard library we don't have access to an atomic 128-bit value. I think some CPUs support it. If you have that, that's great, you can just go with that. But for all of us simpletons who only have 64-bit processors that can do 64-bit operations atomically, what do we do? Well, there's a little trick we can play, which is the following. We're still gonna have foo — and you're not gonna like this, but it's sort of a forced reality — now, instead of saying A, which is what we really wanna say, what we're gonna do is make foo be a pointer. And it's gonna be a pointer to a new object: on the heap, we're gonna heap-allocate an object, and we're gonna say that it's gonna have a value, which is gonna be A, and a version, which is gonna be zero. Okay? And then the compare-and-swap is gonna swap this pointer for a pointer to a newly allocated object, which is gonna have a value of B and a version of one. That's a B right there. So the swap is gonna be to this pointer. Ew, the heap — yeah, I know, right? And then it's gonna be another heap object.
I'm just gonna use V for both of these because at this point, you know what they mean: A and two. And now, the reason this works: let's do a compare-and-swap to use this pointer instead, right? So let's say this is 0x1, this is 0x2, this is 0x3 — I mean, that's not actually what their pointers are gonna be, but let's say it is. And remember our good old friend T2, who's doing the old CAS. Well, its CAS is gonna be of foo from 0x1 to 0x2. But when we're in this state, right — which is where it ends up after, this was like T1 executing, this was T3 executing — if T2 comes along down here, it's trying to compare-and-swap 0x1, but that's not the current value, right? These are not the same. And therefore the compare-and-swap fails, which is what we wanted. It doesn't fail because of a version number in the compare-and-swap itself; it fails because of a version number in the heap allocation. So you might wonder, well, okay John, why can't the ABA problem happen now? And you're right, it totally can. So the trick here is to make use of the way that we do garbage collection. The idea is that in order for T2 to be able to name this value, it must still have a reference to this value. Therefore, this value cannot have been deallocated — it must still be allocated, otherwise T2 would have no way to reference it. Okay, so when this is allocated, it cannot have the same address as this one, because this one is still allocated; it hasn't been freed yet. Therefore the two addresses must be different. Therefore the compare-and-swap must fail. Therefore you're not subject to ABA. Now, later on, when T2 no longer holds a reference to the original, like (A, 0), then (A, 0) is deallocated and that address can be used again. But that's okay, because there's no longer anyone who has that address and is trying to do a compare-and-swap with it — because if they did, we wouldn't deallocate (A, 0).
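Here's a rough, self-contained sketch of the heap-indirection idea from the diagram: the shared location is an `AtomicPtr` to a heap-allocated (value, version) pair, and every update installs a freshly allocated box, so a thread holding a stale pointer must fail its CAS even when the value is 'A' again. This sketch deliberately leaks every box, since memory reclamation (hazard pointers) is a later topic; all names are mine.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// A heap-allocated (value, version) pair; the shared word is only ever
// a pointer to one of these.
struct Versioned<T> {
    value: T,
    version: u64,
}

// Returns true if a stale CAS fails even though the value is 'A' again.
// Every box is intentionally leaked: reclamation is a later problem.
fn stale_cas_fails() -> bool {
    let a = Box::into_raw(Box::new(Versioned { value: 'A', version: 0 }));
    let foo = AtomicPtr::new(a);

    // T2 reads the current pointer (the "0x1" in the diagram)...
    let seen_by_t2 = foo.load(Ordering::SeqCst);

    // ...then other threads run A -> B -> A, each step a new allocation.
    let b = Box::into_raw(Box::new(Versioned { value: 'B', version: 1 }));
    foo.compare_exchange(a, b, Ordering::SeqCst, Ordering::SeqCst).unwrap();
    let a_again = Box::into_raw(Box::new(Versioned { value: 'A', version: 2 }));
    foo.compare_exchange(b, a_again, Ordering::SeqCst, Ordering::SeqCst).unwrap();

    // T2's CAS must fail: T2 still holds the old allocation, so the new
    // 'A' box cannot share its address.
    let c = Box::into_raw(Box::new(Versioned { value: 'C', version: 1 }));
    foo.compare_exchange(seen_by_t2, c, Ordering::SeqCst, Ordering::SeqCst)
        .is_err()
}

fn main() {
    assert!(stale_cas_fails());
}
```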
So this works. The big downside, of course, is that we have to do an extra allocation for any compare-and-swap value. And, I mean, we're gonna have to deal with memory reclamation at some point, and that also now has to feed down into the compare-and-swap operations. But that's sort of where we get to. "What's the point of having a version?" So that's another good question: do we even need the versioning at this point, or is the versioning sort of embedded in the fact that we're doing allocations? You know, it's a good question. We might actually not need the version. Why would we not need the version? I think we still need it because we can do a proactive check for whether the CAS is going to fail if we store the version. You're right, we probably don't strictly need the version, but it's also not very costly for us to keep it at this point. "Does that mean you need ref counting?" No, not necessarily. We could reference-count these, but that wasn't my plan. Actually, maybe these are a good candidate for reference counting. Is that true? No — this is gonna use the same memory reclamation scheme that we end up using for the rest of the data structure, which is probably gonna be hazard pointers, which I haven't talked too much about. We'll talk about those in a separate stream where we implement hazard pointers. But the reason you don't wanna use reference counting here — and I talked a little bit about this in the previous video — is that with reference counting, you still have a race condition between when you read the pointer to the reference-counted value out of the atomic, and when you try to increment the reference count for the copy you just got. There's a race condition there, and hazard pointers deal with that race condition. "Can you use an arena allocator?" Yeah, I mean, you can use whatever allocator you want here, and it should work just fine.
I don't know if you really need an arena allocator here, but it might be reasonable to associate an arena allocator with the whole data structure, especially because all of these allocations are fixed size. "Virtually all modern x86-64 processors support 16-byte compare-and-exchange." Yeah, that's what I thought too. Let's add it to the Rust standard library. "This is all general to the simulation, but specifically the thing running under it — so you could also have a data structure which uses the pointer-manipulation version or whatever." Yeah, I'm imagining that we could totally make this generic so that if someone wanted to do the bit fiddling, they could. At the same time, I kind of don't want to do that, because people are going to get it wrong — they're not going to take into account the fact that you will still potentially encounter the ABA problem. I would rather just give a correct solution and have that be a little bit less efficient for now. And then what we could do is swap out the implementation behind the scenes when we get something like a 16-byte, or 128-bit, compare-and-swap. And one additional benefit of doing the scheme this way — let's say that we're willing to bite the bullet and just do it this way — well, now we have a great place to stuff other stuff. Like, for example, this modified bit. Previously we had to figure out how to stuff that into the pointer. Well, now we can just have a bool right here in the struct — it's just a struct, we're just heap-allocating it. So now we can add all this additional information that we want straight into that CAS descriptor, which is pretty nice. Sweet. Okay, so now that we have an idea of what the scheme is gonna look like, let's go back to the code. So if we think about this for a second, the commit descriptor is going to be a list of CASes and... interesting, interesting, interesting.
So rather than have this state be an AtomicU8 — maybe we do still want that — I'm thinking we basically end up doing, like, RCU here again, where we're still gonna have a CAS descriptor. There's a weird problem here where we sort of want the user's data structure to wrap a type of ours. Right, so we're gonna require that the thing that the user does a CAS against — the value that they do a CAS against — has to embed the version and stuff, right? So if we go back to this for a bit: we require that the CASes that the generator method outputs be performed on fields that employ versioning, right? So this, the CAS descriptor — I'm trying to figure out what the right way to model this is. So each descriptor they produce is gonna be... I guess this sort of has to be a trait. It's gonna be a little bit awkward; I think you'll see why in a second. But basically, that's gonna be a type that they provide. Although I sort of don't want it to be something they provide. Okay, let me try to articulate my thoughts here. What I want is for the user to not have to think about this. I want them to be able to just think in terms of their values, right? Like, the values that their data structure is going to use, rather than having to think about this versioning and additional bits and stuff. So I sort of want their commit descriptors to be of a type that I control, one that does this heap wrapping and stuff, and that they can just express in terms of what value they end up swapping out. So I think what I'm gonna do is say that they have to provide something that implements CasDescriptor. And our CasDescriptor is really just going to be an atomic pointer to a type that we also control, which is gonna be... I guess this is gonna be a CasDescriptorBox, to follow the same terminology that we used for the operation records earlier, right?
So this is gonna hold an atomic pointer to a CasDescriptor of T. And a CasDescriptor of T — sure, let's have it still be called Cas — is gonna contain the state. It's gonna contain the version. And I think we want the version to specifically be a u64; we don't want it to be u32 even on a 32-bit platform, because wrapping around would still be really bad. And then we're probably gonna end up with this modified bit, right? And my guess is that these may have to be atomic as well — I'll get into why that is in a second. And then what we'll say is we're also going to store... I'm trying to think ahead here, right? What do they have to be able to do with the CAS? I think their CAS... it's almost like the CAS just has to tell us what the new value is. Like, they're not actually gonna do the CAS; we're gonna do the CAS by swapping out this box instead. Chat comment: "in the paper, it should all fit in a word." I don't think it said that, did it? I'm just trying to find it... "to the physical CAS", the "modified" bit added "to each CAS field"... it's on the same memory — oh, on the same memory word as the version number. I don't think that's important; that's more of an optimization that they did. The reason I want the box type is because I don't want to expose the fact that we're using an atomic pointer. That's not important to the consumer. I'm trying to think ahead here — the implementing algorithm might also want additional... like, you could totally imagine that they also have some additional bookkeeping they wanna stick in this descriptor. So I almost sort of wanna say that there's, like, a meta, which is T::Meta, so that they have a way to stick additional information in here, right? But then there also has to be: what is the actual CAS going to be? Like, I think what's actually gonna happen is... okay, so imagine that the user actually just wants to do a compare-and-swap of, like, a boolean.
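As a rough snapshot of the shapes being sketched at this point — working names from the stream, not a final API — the heap box can carry the version, the state, and the modified bit as plain fields, and the only shared word is the pointer to it:

```rust
use std::sync::atomic::{AtomicBool, AtomicPtr, AtomicU8, Ordering};

// Working shapes (names not final): the heap box holds everything a
// helper might need — no pointer bit-stuffing required.
struct CasByRcu<T> {
    version: u64,         // u64 even on 32-bit targets, so it never wraps
    state: AtomicU8,      // e.g. pending / success / failure, for helpers
    modified: AtomicBool, // the "modified" bit lives in the box too
    value: T,
}

// The type we control: just an atomic pointer to the box, swapped
// wholesale, RCU-style.
struct Atomic<T> {
    ptr: AtomicPtr<CasByRcu<T>>,
}

fn demo() -> i32 {
    let boxed = Box::into_raw(Box::new(CasByRcu {
        version: 0,
        state: AtomicU8::new(0),
        modified: AtomicBool::new(false),
        value: 42,
    }));
    let a = Atomic { ptr: AtomicPtr::new(boxed) };
    // Safety: fine here only because this sketch never deallocates.
    unsafe { (*a.ptr.load(Ordering::SeqCst)).value }
}

fn main() {
    assert_eq!(demo(), 42);
}
```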
It's like there's something fuzzy in my head here, like something is not quite right. Let me look at the actual implementation they give. So, they have to be able to get the state separately, clear the bits separately — that's all fine. And I sort of wanna see what the execute-CAS is, because the execute-CAS is the thing that has to take the versioning into account. So it's almost like the execute-CAS that they do has to be over this type itself. It would actually help to see how a given data structure uses this. Actually, let's go and look at what the real implementation does. So, yeah, as I mentioned, I managed to reach out to the original author to get the original source code. And let's see what we get here. Is there a diff between... this is one thing you'll see with research code bases: there's a lot of just copies of files for various iterations of the same thing. Okay, so for a CAS descriptor, they just have it be an interface. That's interesting. It just does execute CAS, modify. Okay, so now I wanna see an implementation of this interface, ICasDesc. That's fine. Oh, that's probably in one of the implementations. So let's go ahead and look at, I guess, the skip list. Or... it doesn't really matter. So, this CasDesc... holder. Oh, this is a linked list, and the compare-and-swap of the successor — compareAndSet. I see what's going on. I see what's going on. Okay, let me try to find the right way to articulate this. So holder here is a reference to, like, the current node in a linked list or something. You see the FRNode here is, like, the actual type for any given node. And you see that basically all of these methods end up forwarding into the succ — so the next pointer, if you will — of whatever this node type is. And you see it has all of these implementations. And you see compareAndSet here takes, like, old version, old ref, new ref, old mark, new mark, old flag, new flag.
So, a bunch of arguments to differentiate which version essentially we're talking about, which makes me think that we're actually requiring that the data structure use our provided type — the one that does the heap indirection — anywhere it would normally use just a pointer type, so that it gets to embed this versioning. So my guess is that if we go look at FRNode... yeah, so you see the successor, the next pointer, has a versioned triple-mark reference. Ah, and my guess is that's the heap-indirection thing. So a versioned triple-mark reference is an atomic reference to a reference quintet. And the reference quintet has the actual reference, a mark bit, a flag bit, a help bit, and a version. Yeah, so you see, this is the thing that gets stored on the heap. They just allocate a new one. And what does the compare-and-swap look like for this? The compareAndSet is: get the current value, check that all the fields are the same, and then compare-and-set the... oh, current, I see. So it's basically doing RCU. It's doing RCU on the type itself. Okay, so when you want to do a compare-and-swap of one of these, you're not actually doing a compare-and-swap — I mean, you kind of are, but you're really doing an RCU: you allocate a new one of these heap objects, and then you do a compare-and-swap between the pointers, which is exactly what we explained in the diagram. But this helps. Okay, so the question is: how do we translate this into Rust? Sorry, actually, let me stop there. We've been talking a lot around this topic — let's do some Q&A about this to see that we all have a shared understanding of roughly what's going on. And if we don't, which is pretty reasonable, we'll try to talk through it. Let's see. "Surely it's more efficient to leak the versioning to the consumer?"
So, it shouldn't be — because in general... I mean, it could be if they happen to know that they will never go through more than, say, 256 iterations before they hit ABA, but in general they just won't know, and they'll get it wrong. It seems better to just provide a safe implementation. Yeah, this source code — I'm working with the author to find a good way to publish it beyond just me showing it on stream. Part of it is that it has a bunch of other code that's not quite related, so it needs to be tidied up a little before it can be posted anywhere. My plan is basically to post it alongside the code when I put that in, like, a GitHub repo. All right, but let's look at whether the architecture we're planning here makes sense. The heap allocation makes sense. It makes sense why we need to have this extra indirection. And the latest observation is that really what we're gonna do is implement, basically, a compare-and-swap for the user. Like, this type parameter isn't gonna be a CAS; it's just gonna be the value that they want to compare and swap. And then we're gonna do RCU on this atomic pointer by allocating new instances of this. "Why can't wrap_up prevent future repeats of the CAS?" Can you say more about what you mean? wrap_up is called concurrently from multiple different threads, and will be called regardless of whether a given CAS succeeded or failed — it's just always called at the end. Part of the challenge here, right, is that if the CAS fails, we may have to do a bunch of cleanup, which is what wrap_up is for. And if you have multiple compare-and-swaps — like, I don't know whether any of the data structures in the paper require this, but the reason it's constructed this way is because there are lock-free data structures that have multiple commit points. And so you have to deal with the fact that, like, the first CAS succeeded but the second CAS failed.
And then when people help, you need to deal with that situation. And part of that might be: in wrap_up, you go, we're now in a weird state, we need to redo generation as well in order to make progress. Part of the reason I'm stalling here for Q&As is because I don't believe that no one has questions. So if you have a question, you should ask it, because I'm pretty sure other people will have similar questions. Even if it's just "I don't know what's going on, explain it again" — I will try. The basic thought here is — in fact, I think we may not even need this — the basic idea is that we're not gonna do compare-and-swaps directly on a user's type. Instead, we're gonna require that the user uses our atomic type, basically. So really this should be called, like, Atomic. And we're gonna require that the user uses our implementation of Atomic. I guess we could make this a trait, right? We could say VersionedAtomic, right? And we're gonna require — and this is sort of the way the Java code was going, right — we're gonna require that they implement, like, clear_bit, and what were the other ones? Like, the modified bit, and version. And execute. And then we could implement VersionedAtomic for Atomic<T>, right? We could do that, and then we just require that the user uses some type that implements VersionedAtomic, and then they could choose to implement their own scheme. But I'm sort of like, is that better? I'm not entirely sure. It sort of feels like, when would you ever use a different implementation? I feel like you're likely to use one that's just wrong. I mean, maybe it would be nice for testing. It's a good question. Maybe we should go this way, if just because it's sort of easier to present this way. "Can you post the Rust so far to the playground?" Yes, I can. Share. "What if we take ownership of the user data and store it as heap allocated?" That's basically what we're doing here, right?
That is, we're telling the user: if you want some atomic reference type, you have to use our atomic reference type, so that we can add versioning and this modified bit and all the other bits — stuff that we need. And at that point, because we're controlling the wrapping structure, you can use any T. One benefit here is that you don't need to use one of the standard atomic types. It doesn't have to be one of the things that the CPU can do atomic operations on, because we're basically going to do RCU for you. So it can be any T here. One thing that's a little awkward is that we're currently requiring that they use the same T everywhere, which is kind of silly. That might be one reason why we want VersionedAtomic here: I think maybe it'd be nice if the commit descriptors were actually, like, dyn VersionedAtomic, so that a data structure could emit... like, let's say you have an atomic, I don't know, root pointer in a B-tree, but you also have an atomic... like, your commit operations are: do a CAS on the root pointer, and do a CAS on a value, right? Those aren't the same type, but we still want you to be able to express that those are your two commit points. And so I do think we actually need this trait, because we need this to be a dyn VersionedAtomic so that you can have different types for it. I think that's actually going to be necessary. And at that point, this does need to be a trait. And then, I guess at this point, we need to look at the Java ICasDesc, and then we basically just need these operations. Oops. Which is going to be fn execute_cas, returning a bool — this is going to be execute; let's take a &self. Has-the-modified-bit-been-set... why does clear_bit return a bool? So this is going to take a &self. This has to return, I guess, a CasState. And set_state also has to be able to run atomically, I think, right?
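A hedged sketch of what that trait might look like in Rust, using the working names from the discussion (`execute`, `clear_bit`, `state`, `set_state`) rather than the Java signatures; the `Noop` implementation is just a toy to show the trait being usable as a trait object, so descriptors over different underlying value types can live in one list.

```rust
use std::cell::Cell;

// Working names from the stream; the real signatures may differ.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum CasState {
    Pending,
    Success,
    Failure,
}

trait VersionedCas {
    /// Attempt the CAS this descriptor describes; true if it took effect.
    fn execute(&self) -> bool;
    fn has_modified_bit(&self) -> bool;
    fn clear_bit(&self) -> bool;
    fn state(&self) -> CasState;
    fn set_state(&self, new: CasState);
}

// A toy descriptor, just to exercise the trait.
struct Noop {
    state: Cell<CasState>,
}

impl VersionedCas for Noop {
    fn execute(&self) -> bool {
        self.set_state(CasState::Success);
        true
    }
    fn has_modified_bit(&self) -> bool {
        false
    }
    fn clear_bit(&self) -> bool {
        true
    }
    fn state(&self) -> CasState {
        self.state.get()
    }
    fn set_state(&self, new: CasState) {
        self.state.set(new);
    }
}

// The simulator only ever sees `dyn VersionedCas`, so descriptors for
// different underlying value types can share one list.
fn execute_all(cases: &[Box<dyn VersionedCas>]) -> bool {
    cases.iter().all(|c| c.state() == CasState::Pending && c.execute())
}

fn main() {
    let cases: Vec<Box<dyn VersionedCas>> =
        vec![Box::new(Noop { state: Cell::new(CasState::Pending) })];
    assert!(execute_all(&cases));
    assert_eq!(cases[0].state(), CasState::Success);
}
```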
And then we can provide sort of a standard implementation in the form of our... I guess CasDescriptor may be a bad name here. This is really a CAS-by-RCU — CasByRcu is maybe a better name — and it doesn't have to be public. And then we can implement VersionedAtomic for our Atomic type, where execute is going to be... I guess here we can actually look at what the Java one did. Hmm, but there's more information around this, which is what makes me hesitant — maybe our wrapper doesn't even make sense in the first place. Let's see. Yeah, you see here, executeCas is executing it for a particular... this holds context as well. So maybe what we need to do is: we're going to expect that the user uses our atomic type and that they sort of... oh, I see what's going on. The CAS descriptor they give us is something that wraps our atomic with additional information that that atomic might need in order to execute. So our atomic is going to implement — I guess it needs to implement all of these operations. Is that even true? Yeah — really, what we're implementing is this, like, triple-marked reference. And it has methods like compare-and-swap. So we're going to say compare_and... I guess "set" is what they're using here. And it takes expected, which I'm guessing is a reference to a T, and a new, which is going to be a T. There's something weird here. I don't know what this mark is; I'm guessing that's something we're going to get to later. Where does it actually check the version, is what I want to know. So it checks all this associated metadata, checks whether the reference is equal to the current one. But where does it actually... all right, all it has to do is increment the version. I see. So really, without any additional information, this is just: if... so this is self.0.load. And this is where we're going to get into this business of the unsafe load again.
And then: if this dot... I guess this should maybe be value, or referent, or something. If this loaded value equals expected, then, like, self.0.store — we're going to replace the whole thing. This is the RCU part. My RCU, with some updated stuff down here. Am I being silly? Ordering::SeqCst. Then return true, otherwise return false. And I guess I'm being kind of silly, because we just replaced all these methods with ones with different names. Oh, I see — no, that's what I did. Right, so I actually think Atomic shouldn't implement this trait. Atomic should provide methods that the user then uses to implement VersionedAtomic for their CAS descriptors. So I'm talking a little bit around this because I'm grappling with it in my head too. But I think what we want here is: the user is going to implement VersionedAtomic for — or maybe VersionedCas is a better name for this — they're going to implement VersionedCas for every CAS descriptor they have. Right, so remember, a CAS descriptor isn't just an atomic. It's a descriptor of what CAS you should do, which includes the current value, the expected value, the new value, and any associated metadata that it might need to actually execute the CAS. And ultimately we require that they implement VersionedCas — that is, they have to provide these methods — and the way they're going to do that is their CAS descriptor will probably internally contain one of our provided Atomic types and just call into its methods, which is what we provide down here. But Atomic itself doesn't implement CasDescriptor. Right, it just implements, like, a versioned CAS. So really, maybe this is like a VersionedCasDescriptor. But really it's... well, it's like a versioned CAS, like a prefilled versioned CAS, or a... no, I think VersionedCas is right. This is a ready-to-execute compare-and-swap that is itself versioned.
And what it's probably going to use internally is one of our Atomic types, which provides the methods that you need in order to implement VersionedCas. I think that's the way this thing is gonna go. Okay, so in our case, what do we want this Atomic to provide? I guess we can continue to take inspiration here from the versioned triple-mark reference from the Java code, because presumably that is all that you really need. So, you can obviously make a new one. That sounds like a pub fn new — certainly seems like something that we need. And what does new take in this case? It takes an initial ref, which is a T. We don't know what mark, flag, and help are. My guess — and this is presumably why it's called a versioned triple-mark reference — is that it has three of these additional pieces of meta information. So really, it sounds like what we want here is for our Atomic to hold, like, a T and an M, maybe, where M is meta that is not part of the value, but rather is something that's considered part of the current state and something that should be considered for whether the compare-and-swap should happen. So I think that's actually what we're going for here: this is a meta, which is M. And you can think of these as conditions for doing the swap — which means that state probably shouldn't be in there, actually, is my guess. I don't think this talks about state at all, and it probably doesn't talk about the modified bit either. It just has version and meta, and then this is the value that will actually be CASed. I think that's what we want. And then the user's CAS descriptor is going to implement the modified bit and the state, but that's not something that we have to provide in our Atomic type. We're probably going to require that M implements PartialEq, and probably also Eq. And then what we'll do is, right?
So you see it has these is-marked, is-flagged, and is-help methods, which are really just accessors for those three booleans. But really, what we're saying is: you can include whatever meta information you want. So maybe we have a method — the equivalent for us is meta, which takes &self and gives you an M back, and that's just self.meta. And for compare_and_set, it's like: if this.value equals expected and this.meta equals... so this is going to also take a meta. Actually, yeah — you see, the compareAndSet takes in both the old meta and the new meta. So it takes in the expected value and the expected meta, and it takes in a new value and a new meta, and it checks that the current value matches the expected value and the current meta matches the expected meta. And if that is the case, then it stores the updated value, right? So it checks that the bits are all the same; it checks that the current value is still the same. I'm not quite sure why they're separated, because it could just be a tuple. Like, why does it need to be separated? Not entirely clear yet. It might be that we just make this T and that's good enough. And — oh, I see. And then: if the new values are equal to the old values, then we don't need to do the compare-and-set at all. So this is like: if expected equals new and expected meta equals new meta... or, inverted: if that's not the case, then we do actually have to do the store. And that's gonna be not a store, but a compare-and-set. So we're gonna have to actually read out the current value so that we know what to... like, there's a race condition here, right? The way that I wrote this so far, if we have this be a store, right — we read out the current state of the thing in the heap, the meta descriptor, and then we check it, and then we just store the updated one.
But while we're checking, someone else could go in and swap it out from under us. So we need to make this be a compare-and-swap instead, and we need to make sure that we only do the store if the thing in the heap is still the same one. So we're gonna do that by doing this: this is gonna be, I guess, this ref — or pointer, rather — and then we're gonna compare-and-swap this pointer with the updated one. This is the RCU bit. And the version is gonna be, I guess, this.version plus one, meta is gonna be new_meta, and value is gonna be new. Whew. "Why can't the user directly just use our Atomic and not have to implement VersionedCas?" So, the reason we want the user to implement VersionedCas — why they can't use Atomic directly — is because there's additional information in a CAS descriptor, right? A CAS descriptor also includes the address of the thing to do the CAS on, what the expected value should be, what the new value should be. It's a descriptor of a CAS that you haven't run yet. An Atomic is really just a CASable thing, right? Like, in some sense, Atomic implements CASable, but what we want is something that implements VersionedCas: a thing that describes a CAS that we haven't done yet. So the user's type is a CAS descriptor that's gonna ultimately invoke a CAS on an Atomic. So it's like a prepared, versioned CAS, right? But Atomic itself isn't a CAS; it's a CASable type. And that's why we need that intermediate layer, which is the actual descriptor. Does that make sense — why we need the distinction? This is gonna be clearer with some documentation, right? But I think once we actually write an implementation, you'll also see why this distinction makes sense, which is basically: all the stuff that needs to go in a descriptor has to go somewhere, and it can't go in Atomic itself.
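Putting the pieces together, here's a sketch of a `compare_and_set` along these lines: check the current box's value and meta, skip the write if nothing would change, and otherwise install a freshly allocated box with `version + 1` via a pointer compare-exchange (the RCU bit), so a concurrent swap between our check and our write makes us fail rather than clobber. Deallocation is deliberately omitted (that's the hazard-pointer problem), so this sketch leaks the old box; all names are working names.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// The heap box: version, swap conditions ("meta"), and the value itself.
struct CasByRcu<T, M> {
    version: u64,
    meta: M,
    value: T,
}

struct Atomic<T, M> {
    ptr: AtomicPtr<CasByRcu<T, M>>,
}

impl<T: PartialEq, M: PartialEq> Atomic<T, M> {
    fn new(value: T, meta: M) -> Self {
        Atomic {
            ptr: AtomicPtr::new(Box::into_raw(Box::new(CasByRcu {
                version: 0,
                meta,
                value,
            }))),
        }
    }

    fn compare_and_set(&self, expected: &T, new: T, expected_meta: &M, new_meta: M) -> bool {
        let cur_ptr = self.ptr.load(Ordering::SeqCst);
        // Safety: sound only because this sketch never deallocates boxes.
        let cur = unsafe { &*cur_ptr };
        if cur.value != *expected || cur.meta != *expected_meta {
            return false;
        }
        if *expected == new && *expected_meta == new_meta {
            // nothing would change, so no store (and no version bump) needed
            return true;
        }
        let next = Box::into_raw(Box::new(CasByRcu {
            version: cur.version + 1,
            meta: new_meta,
            value: new,
        }));
        // the RCU bit: a pointer CAS, not a store, so a concurrent swap
        // between our check above and this write makes us fail
        self.ptr
            .compare_exchange(cur_ptr, next, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok()
    }
}

fn main() {
    let a = Atomic::new(1i32, 0u8);
    assert!(a.compare_and_set(&1, 2, &0, 0)); // matches: succeeds
    assert!(!a.compare_and_set(&1, 3, &0, 0)); // stale expected value: fails
    assert!(a.compare_and_set(&2, 3, &0, 1)); // also updates the meta
    assert!(!a.compare_and_set(&3, 4, &0, 0)); // meta is now 1, not 0: fails
}
```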
"Instead of relying on the user to invoke methods of Atomic, would it be better to make them implement a function that returns an Atomic?" No — because if this trait had a function that returned an Atomic, then we wouldn't know what arguments to invoke that Atomic with. That's the problem, right? Again, an Atomic is just something that is CASable, and the descriptor contains basically all the arguments. And because the arguments are gonna necessarily depend on whatever the data structure wanted to do, we don't have them — we, as in the executing library, don't have them. They're part of the descriptor, which is part of the user's type. So I think ultimately they have to call into the Atomic. And this also means that if they have some other way to represent the same things, then that's great — all good for them. Okay, so new takes an initial T and, I guess, an initial meta. I still feel like meta and T can probably just be combined. That looks like all it takes. And they presumably start the version out at zero. Yeah, it starts out at zero, so that makes a lot of sense. So this is gonna just be Self with, I guess, a version, which is gonna be zero, a meta, which is gonna be meta, and a value, which is gonna be initial. What do you mean, no such field? Right — self, AtomicPtr::new, Box::into_raw, Box::new, a CasByRcu of this. Foolish me, thinking that that would work. Yeah, and this probably needs Box::new. Great. So creating a new Atomic is pretty straightforward: you give the initial value, which is the thing that you will be doing compare-and-swaps on — so this might be a pointer type, it might be a number, it doesn't really matter — and the meta is the other information you need to compare. Okay, and what do we need to implement for this? We need to implement clone — which doesn't clone version. Okay, we'll figure out whether clone is necessary later.
There's a getReference, which I think we don't wanna implement for now. Oh, I see. So the tricky part here, right, is that I don't wanna use the word "reference", because it implies that the only type you can use this with are references, which isn't true. So it's like value, right, which takes self and returns a T. And the challenge here is that this is really sort of what this used to be, right? Like, this is an unsafe dereference of this.value. And so this is definitely another place where we're gonna need that, whatever that guarding for memory reclamation ends up being, that has to be a part of this too. Because otherwise, if you had an atomic, like, what happens if the target of the atomic has been deallocated? Maybe the atomic actually just owns this value; that might be the answer here. I know that specifically the problem here is: imagine someone calls value at the same time as some other thread calls compareAndSet, right? Then the compareAndSet might remove the thing that value is referencing. And so if that gets dropped, then this reference would be invalid. So this is why we need this concurrent memory reclamation scheme, which we don't currently have. And why all of the unsafes here are currently fine: because we don't deallocate, ever. But the moment we start deallocating, these unsafes will no longer be fine. So like, yeah, we could totally say "safety: this is safe because we never deallocate", right? Which is not a good safety guarantee. Like, that's not where we want to go. We want to do better than that, but we don't currently have the mechanism to do so. And the same here. And I guess if we wanted to, and we probably should do this, we should have an unsafe fn get, which returns you a reference, actually it can be safe, and it returns you a reference to the CasByRcu<T, M>. But at that point, it's unclear whether it really buys you that much.
But it does mean that this can now be self.get. Right, so the compareAndSet is really just gonna compare the old value and then do a compare and swap of the pointer, right? So this is implementing the scheme we drew earlier, of the heap allocation being sort of the differentiating factor here. What else does it provide? So these are all just to get the value and the meta. There's a get that gives you the V. Gives you the V and the mark in an array. getWithAllBits. See, this is just weird. I feel like realistically what we would do here is, maybe we require that M is Copy. That might be a good way to do this. But like, I don't know why this meta isn't just a tuple. isHelpInVersion, I see. So really what's going on here is: if the user has an atomic, one thing they might want to do is look at all of the values in the heap box at the same point in time. And so what we're gonna do here, ah, here's what we're gonna do. We're gonna go the more Rusty way and say with_current. Or maybe just with. It takes a self and an F. We can be helpful and make it generic over an R that it returns; it runs F. Where F is an FnOnce from a &T and an &M. I really feel like this should just be T; like, fairly tempted to just make this T, but that's fine. And what this is gonna do is: this pointer is self.get, and then we're gonna do this bit, and then it's going to invoke F with this.value and this.meta, right? The important bit is, the reason why it has all of these, like getWithAllBits, get, isHelpInVersion, the reason it has all of these is because it needs to make sure that the V it gets back is at the same instant as the mark it gets back. This is really the Java way of saying "return two values", right? Really this is like (V, boolean), but you can't write that in Java, presumably, right? So this is a way to get both of them at the same time.
And the way we're gonna do that is to say: you just give us a closure, and we give you the references at the same time. Like, this could alternatively just return a reference to a T and a reference to the M; it's sort of equivalent. Though having the closure seems kind of nice. Like, maybe this should just be, like, the alternative to with, right? Is, let me do get2 for now, right? And it just returns the T, the M, and the version at some single point in time. Maybe that's just nicer. The reason the with is nice is because it makes it a little bit easier for us in the future to do guarding of the value, right? Like, once we want to do memory deallocation, once you hand out references, you need to deal with keeping track of those references, so that you don't deallocate anything while the references are alive. If you do it with a closure, it's a lot easier, because the moment the closure returns, you know that they're no longer holding onto the reference. At least if you require that R is 'static or something, right? So I think we're gonna stick with the closure way. And that way, I think these accessors aren't even gonna be relevant anymore. getWithAllBits, isHelpInVersion. So isHelpInVersion, for example here, right, is a get, and then check whether the version is equal to the given value, and then whether the help bit is set. And you could implement that using our Rust with, right? By doing something like, I just wanna demonstrate the Rust version of this: self.with. And I guess actually this should also be given the version; that's important. You would write it as: I don't care what the value is; I care that the, let's say that I've defined that my meta is these three booleans. Right, so what were they called? Like mark, flag, help. So let's say that's a struct that holds three bools, right? So MFH for short, and version. This was, what's it called, isHelpInVersion. And it takes a V, or a version, I guess, which is a usize.
And this is V. And then this is just: V equals version, and mfh.help, right? That is the way you would write the equivalent of this method using just the with. And it has the same guarantees about both of these reads accessing the same instant in time, by virtue of this just doing one read of the underlying atomic pointer. Great. So because we have with, I don't think we need all of these helper methods. The helper methods are lame, and we don't want them. "Do we want to have meta behind a trait? How will users know that they're required to have version, help, and mark?" We're not going to require that, is the thing. Like, whether the user needs help and mark and flag and stuff is going to depend on the data structure. It's not going to depend on, like, we're not going to depend on them in our simulator, in our executor. This is why this VersionedTripleMarkReference is a thing that existed inside of a particular data structure implementation, not in the implementation of the simulator itself. Because, for example, this might be needed for the linked list, but it might not be needed for the B-tree. The B-tree might need, who knows, other... in fact, we could look at this, right? So, this class here, I think, is a linked list, and it includes this VersionedTripleMarkReference that has mark, flag, and help. If we go into the BST, I guess, what does it do for its BST CAS desc? Okay, it also uses a VersionedTripleMarkReference. What about the skip list? Where do we have... who knows. SkipCombined2 implements IList and CList; that doesn't seem right. Ah, SkipCasDesc. It uses a VersionedDoubleMarkReference. Where is VersionedDouble... ah, so in source, there's a VersionedDoubleMarkReference, and that has only mark and help. So the skip list doesn't need the flag, but it needs mark and help, right? So I think this is gonna come down to whatever additional meta information any given data structure might need.
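Putting that together, here's a hedged sketch of the `with` accessor and the `isHelpInVersion` equivalent just described. The `Mfh` struct and the function names follow the on-stream discussion, not the paper; there's still no deallocation, which is what makes the unsafe dereference fine for now.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

/// The three mark bits as a plain struct ("MFH" from the discussion).
#[derive(Clone, Copy)]
struct Mfh {
    mark: bool,
    flag: bool,
    help: bool,
}

struct CasByRcu<T, M> {
    version: u64,
    meta: M,
    value: T,
}

struct Atomic<T, M>(AtomicPtr<CasByRcu<T, M>>);

impl<T, M> Atomic<T, M> {
    fn new(value: T, meta: M) -> Self {
        Atomic(AtomicPtr::new(Box::into_raw(Box::new(CasByRcu {
            version: 0,
            meta,
            value,
        }))))
    }

    /// One pointer load means value, meta, and version are all observed
    /// at the same instant; the closure keeps the borrows from
    /// escaping, which will make memory reclamation easier to add.
    fn with<R>(&self, f: impl FnOnce(&T, &M, u64) -> R) -> R {
        // Safety (sketch only): this sketch never deallocates.
        let this = unsafe { &*self.0.load(Ordering::SeqCst) };
        f(&this.value, &this.meta, this.version)
    }
}

/// The equivalent of Java's isHelpInVersion: the help bit and the
/// version come from the same snapshot.
fn is_help_in_version<T>(a: &Atomic<T, Mfh>, version: u64) -> bool {
    a.with(|_value, meta, v| v == version && meta.help)
}
```

The Java helper-method zoo (getWithAllBits, getWithBothBits, and friends) all collapse into different closures passed to this one `with`.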
If we look through this, my guess is that this looks exactly like the other one, except the helper methods are a little different. Yeah, like, it has getWithBothBits instead of getWithAllBits. isHelpInVersion is still there. It has weakCompareAndSet and compareAndSet. That's interesting. Yeah, set. But ultimately the methods are sort of the same. Why is there a different compareAndSet? There are multiple compareAndSets. So that one takes all the arguments; that one also takes a version. Interesting. All right, we'll have to figure out which ones of these are actually needed. But I think, oh yeah, so this one has the same. So it has compareAndSet, it has just a straight-up set, and it has a compareAndSet that takes a version as well. getState, so this can be implemented with our with. getWithBothBits can also be implemented with our with. Okay, so I think what we want to provide... I don't think we need version, because that can also be expressed with just with. So there's sort of a compareAndSet, and then it looks like they also want a compareAndSet-with-version, and just a straight-up set method. So I guess we can implement set. So set doesn't take an expected. It just, I guess, sets the value. If new reference... I see, so the set really just does: if this.value != new and this.meta != newMeta, then store. Or I guess it still has to be a compare and swap, actually. Oh, it is not; it just does a store. That's interesting. So set really is just a set. But all of them do increment the version, which I guess is the important part. And then this one that increments the version as well, I guess, should be... well, it's a little silly. In some sense it should be, I guess version can be an Option, right? This is where function overloading in Java is a little nice. Where we want to say: here, this just also includes the current version as a thing to compare.
So: if let Some(v) = version, and if v is not equal to this.version, then we return false, right? So this is like an additional guard if version is specified. "Why isn't version part of meta?" Version isn't part of meta because we need to be able to directly manipulate it and touch it. The stuff that's inside of meta is stuff that we don't care about. As in, the implementation of the atomic doesn't need to access any of the stuff inside of meta. It just cares about meta as a whole. But looking at this, like, I still don't think there's a reason to have meta be separate, you know? I think that it can just be a T, and then we just ask the user to have a more structured value in there. I'm just gonna get rid of the meta, man. And then it's certainly very clear what CasByRcu does, right? So this no longer has an M, but we are gonna require that T implements PartialEq and Eq, because otherwise we can't check whether the expected value is the same. This is gonna return just this. This is just gonna be given a reference, which I feel like is just gonna be nicer. This now just takes a value, which seems nicer. This just stores the new value. This takes expected and value and just does that: if expected != value. No need to check the meta. This is just nice. That seems much nicer. Oh, did I also remove T? That was silly. I think that's gonna be good enough. And then, I think, the implementation of VersionedCas. Like, now, in fact, we can even provide our own, what was it called, VersionedTripleMarkReference. Just to see that we can do it using what we have so far, right? It's gonna have a reference, which is gonna be a T. It's gonna have a mark, which is gonna be a bool. It's gonna have a flag, which is gonna be a bool. Gonna have a help, which is gonna be a bool. And then the reference actually is gonna be an Atomic of T.
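A hedged sketch of where the type lands after dropping the meta parameter: `T: PartialEq + Eq` for the expected-value check, and an RCU-style compare-and-set that frees the new box if the pointer CAS loses the race (that box was never shared, so no fancy reclamation is needed on that path). Names follow the on-stream code, not the paper; the old box still leaks, as discussed.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

struct CasByRcu<T> {
    version: u64,
    value: T,
}

pub struct Atomic<T>(AtomicPtr<CasByRcu<T>>);

impl<T: PartialEq + Eq + Clone> Atomic<T> {
    pub fn new(value: T) -> Self {
        Atomic(AtomicPtr::new(Box::into_raw(Box::new(CasByRcu {
            version: 0,
            value,
        }))))
    }

    pub fn get(&self) -> (T, u64) {
        // Safety (sketch only): this sketch never deallocates.
        let this = unsafe { &*self.0.load(Ordering::SeqCst) };
        (this.value.clone(), this.version)
    }

    /// RCU-style CAS: allocate the new (value, version + 1) pair, then
    /// swap the pointer only if the current pair is still the one we
    /// compared against.
    pub fn compare_and_set(&self, expected: &T, new: T) -> bool {
        let ptr = self.0.load(Ordering::SeqCst);
        // Safety (sketch only): never deallocated.
        let this = unsafe { &*ptr };
        if this.value != *expected {
            return false;
        }
        let new_ptr = Box::into_raw(Box::new(CasByRcu {
            version: this.version + 1,
            value: new,
        }));
        match self
            .0
            .compare_exchange(ptr, new_ptr, Ordering::SeqCst, Ordering::SeqCst)
        {
            // The old box leaks here; real reclamation comes later.
            Ok(_old) => true,
            Err(_) => {
                // Safety: the new box was never shared, so we can free
                // it directly -- no concurrent reclamation needed.
                drop(unsafe { Box::from_raw(new_ptr) });
                false
            }
        }
    }
}
```

Note how the version bump rides along inside the same pointer swap, which is the whole point of the heap indirection.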
Oh, actually, it's not even gonna be that. It's gonna be a triple-marked... this is actually just gonna hold an Atomic of a TripleMark of T, which is gonna hold these things, right? And then implement, if we now wanted to implement this, just to sort of have the analog to the Java code. The question is: can we write all of the methods that are in the Java code? And I'm pretty sure we can. So, oops. So, this is just pub fn new, which takes an initial, which is a T. Mark... I guess it's obvious that they're all initial, because this is a constructor. So, it takes a flag, which is a bool. And it takes a help, which is a bool. And it returns a Self. And that's really just gonna be a Self of Atomic::new. And then this is gonna be a TripleMark. So, this is where I mean that there's no reason to differentiate between the meta and the type, because we can just use a newtype, right? I guess this can just be value, to make this a little nicer. So: value, mark, flag, help. So, that's easy enough. Let's ignore Clone for now, because it seems complicated, or seems annoying, rather. So, what can we do about these? pub fn, this is supposed to return a reference to the V. So, getReference takes self. That's just gonna be self.0.with, v, and we don't care about the version, and we're just gonna return v.value. isMarked... so, you see the pattern here, right? Like, all of these can be expressed through the use of with. I guess flag is gonna return flag; isHelp is gonna return help. So, I think this is a pretty good helper type, no pun intended. And all of these we could also trivially implement with with. Great. I guess we can look at these just to make sure we can. But yeah, it's the same thing. This just looks at the mark bit and the flag bit. This could also just be a closure. So, this VersionedTripleMarkReference is trivial to implement in terms of our new Atomic type, which means that our Atomic type seems pretty general.
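Here's a hedged sketch of that exercise end to end: the triple-mark payload as a plain struct (the newtype point: no separate meta parameter needed), with every Java-style accessor expressed through `with`. A minimal no-deallocation `Atomic` is inlined so the sketch stands alone; all names follow the on-stream discussion rather than the paper's code.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Minimal stand-in for the on-stream Atomic<T> (no deallocation).
struct CasByRcu<T> {
    version: u64,
    value: T,
}
struct Atomic<T>(AtomicPtr<CasByRcu<T>>);
impl<T> Atomic<T> {
    fn new(value: T) -> Self {
        Atomic(AtomicPtr::new(Box::into_raw(Box::new(CasByRcu {
            version: 0,
            value,
        }))))
    }
    fn with<R>(&self, f: impl FnOnce(&T, u64) -> R) -> R {
        // Safety (sketch only): never deallocated.
        let this = unsafe { &*self.0.load(Ordering::SeqCst) };
        f(&this.value, this.version)
    }
}

/// The payload is just a struct -- no separate meta type parameter.
struct TripleMark<T> {
    reference: T,
    mark: bool,
    flag: bool,
    help: bool,
}

/// Analog of the Java VersionedTripleMarkReference, as a newtype.
struct VersionedTripleMarkReference<T>(Atomic<TripleMark<T>>);

impl<T: Clone> VersionedTripleMarkReference<T> {
    fn new(reference: T, mark: bool, flag: bool, help: bool) -> Self {
        VersionedTripleMarkReference(Atomic::new(TripleMark {
            reference,
            mark,
            flag,
            help,
        }))
    }
    // Every Java-style accessor falls out of `with`:
    fn get_reference(&self) -> T {
        self.0.with(|v, _| v.reference.clone())
    }
    fn is_marked(&self) -> bool {
        self.0.with(|v, _| v.mark)
    }
    fn is_flagged(&self) -> bool {
        self.0.with(|v, _| v.flag)
    }
    fn is_help_in_version(&self, version: u64) -> bool {
        self.0.with(|v, ver| ver == version && v.help)
    }
}
```

If the skip list only needs mark and help, the same pattern gives a `VersionedDoubleMarkReference` with a two-bool payload, which is the generality argument from above.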
And then we might actually want this implementation, right? Because it seems like that's a thing that many of these data structures use. So maybe it's worthwhile to include this. I feel like ultimately all you really need is this, so I'm gonna go ahead and just remove this. All right. So, we have this Atomic type of ours, and we have the VersionedCas. And now we don't need this trait anymore. Instead, what we're gonna require is that this implements VersionedCas. So, the commit descriptor we get needs to be able to produce a sequence of VersionedCas operation descriptors, like, things that are versioned CASes that have yet to be executed. Maybe PendingVersionedCas is a good name. Hi, Cat. Hello. All right. Let the cat say hi. Cat, do you wanna say hi? Oh, the light turned on. She had to leave. She's out of here. All right, so back to this. So, now I think we're in pretty good shape to just implement this very straightforwardly. So, we loop over them. If the cas.getState, which I guess can just be state... if that, if let, oh my, this is just: if it's success, if it's failure, and otherwise. So, this is a match, popularly known as a match. Match on cas.state. Ah, Success. So, if we get a CasState::Success, then what do we wanna do? Then we would do cas.clearBit and continue. That's fine, that's implied. CasState::Failure, I'm gonna guess, is just return. Yep. And CasState::Pending means that it still needs to be executed. So, if it's pending, that's the remainder here, then we're gonna cas.execute, and it looks like we don't do anything on error, which is kind of interesting. We just do cas.execute, and then if cas.hasModifiedBit, then CAS the state field. That's interesting, cas.stateField? I don't remember us adding that as a method. Is that a method we missed? There's just setState, which then sounds like it should be something else, right?
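The loop being described can be sketched like this. It follows the code's variant of the modified-bit handling (set Success unconditionally, then clear the bit), and keeps the paper's returned index, which the Java code dropped; the `FlagCas` toy descriptor is mine, purely so the loop has something to run against.

```rust
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CasState {
    Pending,
    Success,
    Failure,
}

trait VersionedCas {
    fn execute(&self) -> bool;
    fn state(&self) -> CasState;
    fn set_state(&self, s: CasState);
    fn has_modified_bit(&self) -> bool;
    fn clear_bit(&self);
}

/// The executeCASes loop: on failure, return the index of the failing
/// descriptor (as in the paper) so wrap-up knows where things stopped.
fn execute_cases(cases: &[Box<dyn VersionedCas>]) -> Result<(), usize> {
    for (i, cas) in cases.iter().enumerate() {
        match cas.state() {
            CasState::Success => {
                // Someone (maybe a helper) already did this one.
                cas.clear_bit();
            }
            CasState::Failure => return Err(i),
            CasState::Pending => {
                cas.execute();
                if cas.has_modified_bit() {
                    cas.set_state(CasState::Success);
                }
                cas.clear_bit();
                if cas.state() != CasState::Success {
                    cas.set_state(CasState::Failure);
                    return Err(i);
                }
            }
        }
    }
    Ok(())
}

/// Toy descriptor (mine, illustration only): "executing" just records
/// whether the simulated CAS won.
struct FlagCas {
    state: AtomicU8,      // encodes CasState
    modified: AtomicBool, // the "modified bit"
    will_succeed: bool,
}

impl FlagCas {
    fn new(will_succeed: bool) -> Self {
        FlagCas {
            state: AtomicU8::new(0),
            modified: AtomicBool::new(false),
            will_succeed,
        }
    }
}

impl VersionedCas for FlagCas {
    fn execute(&self) -> bool {
        self.modified.store(self.will_succeed, Ordering::SeqCst);
        self.will_succeed
    }
    fn state(&self) -> CasState {
        match self.state.load(Ordering::SeqCst) {
            0 => CasState::Pending,
            1 => CasState::Success,
            _ => CasState::Failure,
        }
    }
    fn set_state(&self, s: CasState) {
        self.state.store(s as u8, Ordering::SeqCst);
    }
    fn has_modified_bit(&self) -> bool {
        self.modified.load(Ordering::SeqCst)
    }
    fn clear_bit(&self) {
        self.modified.store(false, Ordering::SeqCst);
    }
}
```

Whether the unconditional-set variant is actually equivalent to the paper's conditional CAS is exactly the paper-versus-code divergence discussed next.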
So, if we go back to the infrastructure here, ICasDesc... see, this one just has getState and setState, so it seems like something changed between the paper and the code. That's interesting. Is there even a casStateField? There isn't. All right, so let's look at the equivalent of executeCASes. So, line 53. All right, so clearBit, this is the same. Yeah, so see, this one is actually different than the paper. The paper says: if the modified bit is set, then do a CAS. And the code says: if the modified bit is set, then just setState. In fact, this one is a CAS, and if it turned to success, then clear; this one is a set to success, and clear. That's interesting. I wonder which one is right. That's problematic. These seem different. All right, we're gonna have to think, and while we're thinking, we're gonna say hi to the cat, right? All right. Hi. Do you wanna say hi? Do you wanna say hi? No, they're up there. They're up there. Hi. Where are you going? Are you leaving? Meow. Do you wanna smell the microphone? Does it smell weird? Do you wanna go say hi all the way up to the camera? All right, I'll let you go, I'll let you go. You're the one who came here to meow at me. You came to meow at me, and now you just wanna leave? All right. All right, back to the code. Where were we? All right, let's see. You can see my screen again, right? I think I switched it back. The difference between these is that in one, we compare-and-swap and then see whether we succeeded; in the other one, we just blindly overwrite it with success and then clear the bit. You know, that's very interesting. I'm gonna go with what the code does as probably right, because the code was presumably run, although it is a little worrying. But all right, let's go with what the code does, which is setState(CasState::Success). Paper and code diverge here. CasState should definitely not be taken by reference.
And then cas.clearBit, and then if cas.state != CasState::Failure... oops, != Success, then cas.setState. This also feels weird. And return. See, here it doesn't even return the index. Remember how we talked about this in the paper? Like, it returns the index of the thing, but the code version doesn't actually return the index. This is real weird. I mean, it's easy enough for us to just return i here, but why doesn't it? There's something real fishy going on here. Clone, Copy, PartialEq, Eq, Debug. Yeah, it's real weird that this doesn't return anything anymore. Maybe it's because now that's handled by the caller or something, or maybe postCASes walks the list. Nope. Well, I guess the paper had more things than the code does. So maybe what happened was the paper gave a more general description, and then in the course of implementation, they were like, this seems like it wasn't needed. That's a little disturbing. There is no git repo, no; this is just a source-dump tarball. "Does setState internally compare and swap?" That's a good question. Let's take a look at, I guess, this FRCasDesc, line 45. Nope. setState is not even a CAS; it just changes the value in the descriptor, which in and of itself is bizarre. Because how can it even do that? Like, there's concurrent access to these descriptors. So how can it mutate this in place anyway? Yeah, there's something weird. Like, this setState doesn't seem like it would be possible to implement, because in Rust, this is gonna take an immutable reference to self, right? Like, if we look back at where this gets called from, which is in the base combiner, right, the cases here are just a shared list of descriptors, so you can't call setState on it. I guess maybe this ends up working in Java because of volatile? But it's not volatile. Yeah, this seems super sketchy.
I mean, maybe it really is just, like, an atomic u8 that it's fine to just store to. Like, maybe what we'll end up doing is, when we implement the descriptor, this will just be an AtomicU8 internally. But this certainly seems kind of weird. Yeah, I'm not sure where that's going. Okay, that's fine. So what does the paper say? It says: if the modified bit is set, then set it to success, and if it's now success, then clear the bit. And what we did was: always set it to success, regardless of whether it's been changed since we set it to pending, and then always clear the bit, regardless of whether we succeeded at setting it to success. So in some sense this change, I don't want to say simplification, but this change is, like, accurate, because we set it to success and then we clear the bit because it is success. And in the old one, it would conditionally be success, so the conditional was required. I don't see why it's okay to do this unconditionally, but I'm more inclined to trust the code that was actually run. This might be something I can check with the author about, too. And then the reason, of course, we have to check here is in case, for example, the modified bit is not set; then some other thread might still have put us in the success state, or in a non-success state. There's definitely something weird here. Why the capital I? That's also a big problem. Like, this should really not be capitalized, because this is the lowercase i, which is the variable name here, I assume. Okay, that's fine. That now at least matches what the code does. So now, I guess, let's go down here. What is this complaining about? Result of CommitDescriptor and Contention. Oh right, we changed generator to return an error on contention. Oh, so that people could use question mark. That's right. So that makes it a little bit more awkward to do here, though.
We can do this really nicely with a try block, actually, if we did like try... but try blocks aren't stable. But basically we sort of want a question mark here that breaks rather than returns. I mean, we can always do that by, like... I guess a match is fine. So okay, Ok(cases) is going to be cases, and Err is going to be a Contention, which is going to just break. And then this is going to be the same, which is going to be match this. Oh, CAS... except cas.execute does not do that. And I guess we don't check the contention here. In fact, it looks like there isn't even a check for the contention counter in there, which is a little odd. Like, it feels like there should be. It feels like cas.execute should also... to-do: should this also return on contention? I feel like it probably should, in which case we would want to do the same kind of structure here. Why is this an Option? Right, we changed wrap up. So the wrap up could succeed by resolving, or it could succeed in just cleaning up, or it could error in the case of contention, in which case we break. Right. And index is no longer used; that's great. AtomicU8 might be used later, but not used right now. And now it's complaining about this; that's fine. And this version is going to be a u64. That's right, we specifically chose for it to be a u64. This is because Rust is annoying about references. Version, that should be a u64, of course. If you want to compare the version, this should be a compare_exchange. Should arguably be a compare_exchange_weak, actually, but that's fine for now. The same thing; the this pointer, that's fine. This can be a *mut; that's what we get from load anyway. And then here, I guess what we want to return is: if these are already equal, then true. Otherwise, we want to return whether this is Ok. And here too, if this fails, immediately deallocate the box, because we never shared it, right?
So we don't actually need the fancy garbage collection in this particular case. "The capital Return is the comment that wrapped to the next line." Yeah, but then there's not a code line there. Also no, because it has its own line number, whereas this wrapped comment does not. This is something weird. Also, why is this typeset literally when this was not? There's something weird about the typesetting. Okay, I just want to get rid of the errors, right? cas.execute does not currently deal with the contention, which seems maybe problematic. I feel like maybe execute should take care of the... if we go back up to look at where they talked about the contention, I feel like there was a... hi cat, hi cat, hi cat, hi cat, hi cat, hi cat, hi cat. You want to come say hi? All right, come on. Do you want to meow again? Why are you sitting so weird? Sit normal. There you go. That's better. You happy now? Is that better? Wait, let me make you full screen again. What, you want to leave again? Hey, what now? You come in meowing and then you just leave. That's very rude, very rude. I'm just looking for the place where they talk about the contention. Normalized representation, contention failure counter. Yep, I expect to find this address. Because I'm pretty sure we're just going to want to pass that to executeCASes, and have the atomic compareAndSet method take a mutable reference to the contention as well. Is it helping to synchronize the critical points? Normal state, normalized representation. I feel like it must have been further down somewhere. All right, they talked about this in terms of a monitored run, but there's no monitored run here. So maybe it specifically shouldn't be monitored here, then. It's weird. I feel like there's almost certainly... you should track contention in executeCASes too. Because if you think about it, if the commit operation CAS fails, that is definitely a sign of contention, and you may want to back off to the slow path, right?
But I wonder whether they talk about this a little later. "Counting": a contention failure counter for all the methods in the linked list can be implemented by counting the number of failed CASes. Generator, the wrap up... okay, where's the part where they talk about... original algorithm for fast path, memory management, comparison, appendix for the wait-free queue, contention and failure counter. Yeah, I'm just gonna assume that that's the case. One thing that's nice about the contention counter is that it's not a correctness risk. Because if you over-approximate contention, what that means is you're just gonna take the slow path more often than you otherwise would need to, so your performance goes down, but correctness is still right. So this is okay; this is contention, and then we're gonna do this, and then this is gonna be Ok of Err, and then this is gonna be Ok of Ok. Okay, okay. So execute is gonna take a contention measure, and it's gonna return a Result of bool or Contention, is what I'm thinking, right? And then what we'll do is we'll say that this one also takes a contention, which is gonna be a &mut ContentionMeasure. So the expectation is that you're gonna pass this into the atomic compareAndSet, and if it fails, then contention.detected(). This should now return a Result<bool, Contention>, which is a little weird, I'm not gonna lie. Like, an Ok(false) is also contention, but you don't have to measure it yourself, I guess. And this is Ok(true). And this is... I guess what we'll do is we'll match on this, because we sort of have to match on it anyway, because we want to deal with the case where we can immediately deallocate. So if it's Ok... what does that even mean? We don't care about the value in the case of an Ok. We can just return Ok(true).
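A hedged sketch of that contention plumbing (the names `ContentionMeasure` and `detected` follow the on-stream code; the threshold is an arbitrary choice for the sketch). The key property from the paper holds: over-counting contention only costs performance, never correctness.

```rust
/// Marker error: "too much contention, go to the slow path".
#[derive(Debug)]
pub struct Contention;

/// Counts failed CASes; callers can use `?` to bail to the slow path.
pub struct ContentionMeasure {
    failures: usize,
    threshold: usize,
}

impl ContentionMeasure {
    pub fn new(threshold: usize) -> Self {
        ContentionMeasure { failures: 0, threshold }
    }

    /// Record one failed CAS. Over-approximating is fine: it only
    /// sends us to the slow path earlier; it never breaks correctness.
    pub fn detected(&mut self) -> Result<(), Contention> {
        self.failures += 1;
        if self.failures >= self.threshold {
            Err(Contention)
        } else {
            Ok(())
        }
    }
}
```

Passing a `&mut ContentionMeasure` into the atomic's compare-and-set is what lets the library count failed CASes on the user's behalf.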
But if it returned an error (I guess we'll probably care about the error type eventually), then contention was detected, and we want to return Ok(false). And this returns the current value, which we don't actually care about, but we do care about the value we tried to stick in there, which is gonna be new. So this is gonna try to compare_exchange in the new, or I guess new_ptr, maybe. So in this case, we want to do Box::from_raw(new_ptr). Safety: the box was never shared. So this way we guarantee that that allocation ends up going away, but we still detect contention. And so now, I guess, this will still be Ok(false). What does help do, though, if it detects contention? I guess if we detect contention when executing a CAS, then we just continue, right? This is just gonna be a match on this: Ok(outcome) is just gonna be outcome, and Err(Contention) is gonna be continue. Same here; let's format this nicely. There's probably another one of these, is my guess, right up here. Right, so now this one can be the same thing, where if there was contention, then we break. So the observation here is: we're basically trying to make it so that the user doesn't have to manually deal with the contention counter, as long as they use our Atomic type. They may want to increment the contention counter themselves, which is sort of why we provide it to them in certain other cases. And this is described in, like, appendix B, I think, of the paper. Like, there are certain cases where you might wanna say that there's contention even if it wasn't because a particular CAS failed. But it really does feel like cas.execute also needs to sort of participate in observing contention. This does need to be pub, that's true. Does it compile? Is it possible? It does compile. Great.
So I think now we have... if we look down at figure five, we've now encoded all of executeCASes, and now let's look at the postCASes, which I think we did last time, but let's just make sure that that is the case. So: postCASes. So this is cas.execute, and those are the postCASes. Where does postCASes get executed from? Is it only from help? I think it's only from help. The real question is whether on the fast path we also need to call postCASes. I don't think we do; I think it's only used by help. I mean, this is easy enough to find out if we go to the base combiner, I guess, which is the main one. So where is, like, the main operation? There is no main base. Oh, I guess it's all done in an extension, because of course it is. FR combined three? Let's go with the highest number. Extends base combiner. So let's look at some random operation, like, I guess, delete. Interesting. So here there certainly seems to be a bunch of stuff that happens in delete beyond ask-for-help. So this is what we've tried to encode in our run method, right? If help, then ask for help... except here that's encoded in the consumer, and then we call generator... Oh, I see what's going on. In our implementation of run, our fast path calls generator, then cas.execute, and then wrap up. But for them, they actually call the original algorithm, right? And then only if they detect contention there do they end up asking for help. That's interesting. This is probably so that the fast path is even faster, right? Like, rather than go through the generator and stuff, which is what you end up doing if you do this ask-for-help business, they just have the fast path directly encoded in the method, and then sort of fall back to the slow path as appropriate. So maybe what we wanna do here is have a... where's our trait? I have too many things in this file now. Like, maybe what we want is a fast_path, which takes a self and an op and a contention, and it returns a Result of either Self::Output or Contention.
And then I feel like we probably want to encode this first bit. But what we'll do is... what does help do first? So help first calls help-op, which is the one that ends up calling generator and cas.execute and wrap up. And then in run, I think we're just gonna say, rather than it doing the same thing, it's gonna do: match self.algorithm.fast_path of the operation, and of course a &mut to the contention counter. And if that just gives you the result, then we return the result directly. And if it returns Contention, then we do nothing. I think that's what we want, because then we retry the fast path, and otherwise we fall back to the slow path. So that's something that seems to be encoded directly in here. Like, this is the first bit for helping. I see, this is the encoding of... there's a setting where you always go through the helping loop rather than do the fast path; this is basically "always use the slow path". So: first help, then the normal fast-path operation, and if it returns null, I guess that means contention, which means ask for help. And this is the retry loop, where they retry the fast path, and on contention go to the slow path, and otherwise fall back to ask for help. So is that generally the case for all of them? So delete: this is the always-slow path, this is the contention, this is generally the fast path. And again, contention, then go to the slow path. Who knows what this does? Yeah, I guess there's a question here of how much you leave up to the implementer of the data structure to choose when to go slow or fast. All right. Okay, for... do they have a contains? That's not helpful. Interesting, here it even gets to override the postCASes method. I was like, there's definitely a leaky abstraction here, which is a little worrying. It even overrides preCASes. Oh no, this is just the... this is the implementation of the method for the trait. Got it. Although, shouldn't that just be generator?
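The run-loop shape just described can be sketched like this: try the data structure's own fast path a few times, then fall back to the help-queue slow path. The names (`NormalizedLockFree`, `fast_path`, `WaitFreeSimulator`) follow the on-stream trait; the retry count, the toy `AddOne` algorithm, and the stubbed slow path are my own assumptions for the sketch.

```rust
/// Marker error for contention.
pub struct Contention;

pub struct ContentionMeasure(usize);

pub trait NormalizedLockFree {
    type Input;
    type Output;
    /// The data structure's own, hand-written fast path.
    fn fast_path(
        &self,
        op: &Self::Input,
        contention: &mut ContentionMeasure,
    ) -> Result<Self::Output, Contention>;
}

const RETRIES: usize = 3;

pub struct WaitFreeSimulator<LF> {
    pub algorithm: LF,
}

impl<LF: NormalizedLockFree> WaitFreeSimulator<LF> {
    pub fn run(&self, op: LF::Input) -> LF::Output {
        let mut contention = ContentionMeasure(0);
        // Retry the fast path a few times; on repeated contention,
        // fall back to the help-queue slow path.
        for _ in 0..RETRIES {
            match self.algorithm.fast_path(&op, &mut contention) {
                Ok(result) => return result,
                Err(Contention) => continue,
            }
        }
        self.slow_path(op)
    }

    fn slow_path(&self, _op: LF::Input) -> LF::Output {
        // On stream this is where the operation goes on the help
        // queue; stubbed out in this sketch.
        unimplemented!("help-queue slow path")
    }
}

/// Toy algorithm (mine) whose fast path always succeeds.
pub struct AddOne;

impl NormalizedLockFree for AddOne {
    type Input = u32;
    type Output = u32;
    fn fast_path(&self, op: &u32, _c: &mut ContentionMeasure) -> Result<u32, Contention> {
        Ok(op + 1)
    }
}
```

This keeps the generator/wrap-up machinery entirely on the slow path, matching the observation that the Java code inlines its own fast path into each operation.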
Like, I feel like something's off here, but maybe I'm just... why aren't these called generator and wrap-up, is what I wanna know. This definitely differs from the way that the paper is structured, right? In the paper, help manages the state machine, and there isn't a pre-CASes and a post-CASes to implement. There is a generator and a wrap-up, and pre-CASes and post-CASes are fixed implementations that call generator and wrap-up. So here I sort of trust the paper more, or at least the paper has a cleaner abstraction for this, rather than having each data structure implement pre-CASes itself. I mean, I guess one way to look at this is to compare this to, I don't know, source... CombinedList2.java? That was probably not very helpful. Let me just open up this one and go to pre-CASes. And then open up, I guess, source, CombinedList2, and search for pre-CASes. Yeah, they're the same. And you see, really what they do is call the appropriate generator method for the operation and then just stick that compare-and-set into the box. And what about the post-CASes? So there's the post-CASes here and the post-CASes here. You see, the post-CASes here just loops through and doesn't do any of the other things that post-CASes does. Like, the post-CASes from the paper has this monitored-run and should-restart business. I guess this does have a restart, but this post-CASes has other bits in it. So it's a little disturbing that the implementation varies from the paper's model here. It just makes it hard to figure out what the right abstraction should be. I think what we should do is stick to the paper, and then when we implement the data structures later, we figure out whether we can make them fit in the model the paper sets out.
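The paper's cleaner layering, as described here, would look roughly like this: pre-CASes and post-CASes live in the simulator and merely delegate to the data structure's generator and wrap-up, rather than being overridable per data structure. The trait shape, the method names, and the Noop stand-in are all hypothetical:

```rust
trait NormalizedLockFree {
    type Input;
    type Output;
    type Cas;
    // everything before the commit point: produce the list of CASes
    fn generator(&self, op: &Self::Input) -> Vec<Self::Cas>;
    // everything after: turn the CAS outcome into a final result
    fn wrap_up(
        &self,
        executed: Result<(), usize>,
        performed: &[Self::Cas],
    ) -> Option<Self::Output>;
}

struct Simulator<LF>(LF);

impl<LF: NormalizedLockFree> Simulator<LF> {
    // "pre-CASes": fixed simulator code that just calls generator
    fn pre_cases(&self, op: &LF::Input) -> Vec<LF::Cas> {
        self.0.generator(op)
    }
    // "post-CASes": fixed simulator code that just calls wrap_up
    fn post_cases(
        &self,
        executed: Result<(), usize>,
        cases: &[LF::Cas],
    ) -> Option<LF::Output> {
        self.0.wrap_up(executed, cases)
    }
}

// Trivial data structure so the sketch is runnable: one no-op CAS,
// and wrap_up reports 42 on success, None on failure.
struct Noop;

impl NormalizedLockFree for Noop {
    type Input = ();
    type Output = u32;
    type Cas = ();
    fn generator(&self, _op: &()) -> Vec<()> {
        vec![()]
    }
    fn wrap_up(&self, executed: Result<(), usize>, _performed: &[()]) -> Option<u32> {
        executed.ok().map(|()| 42)
    }
}
```

Under this layering, no data structure ever overrides pre-CASes or post-CASes; it only supplies generator and wrap-up, which is the separation the Java code seems to blur.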
It does raise the question of, like, should the fast path be... I do think it makes sense for the fast path to be its own implementation, so that the implementer can choose to have their own good fast path rather than have to go through the generator and all that. That I think makes sense. I think we're gonna keep our run method here and just see whether this structure, and forcing the structure, makes sense. I'm guessing it probably does. One option, right, is that we don't have a retry loop; we just say that the retry loop is something that should live in the fast path, that is, in the fast-path implementation in the implementer. But we can look at that later. So I think post-CASes then is one thing we need to adjust, because it needs to match this postCASes. Because currently we just call wrap-up, which sort of is what this does, right? A should-restart and an operation result. Oh, we've actually already encoded this; I remember us doing this. There's just the question of the monitored run, and monitored run we do with a contention measure and then matching on Contention. So I think we're good there. I think that means we actually now have an encoding of the simulator that, in theory, is right. And I think the big thing that's missing now is the actual wait-free queue. There's gonna be the help queue, the underlying thing, and then trying to implement a data structure on top of it. Let's pause there. We've been talking for a long time. We've gone through a lot of different bits. I think this is a great place to pause and collect our thoughts before we dive into this help queue, which is really just gonna be its own separate implementation. So let's do some questions about how far we've gotten so far. Then we'll take a little bit of a break to go to the bathroom or make tea or whatever, and then we'll keep going with this. Whew. Ship it. Yeah, it compiles, I know, right? "You can use the ternary operator in pre-CASes." Where?
In pre-CASes? I'm not sure what you're referring to. I mean, I know what the ternary operator is, but I don't see how it applies here. All right, does the general structure of what we built so far make sense? And why it was a little weird that the paper and the implementation differed, and also sort of what we have and what we don't have so far? Actually, how about I go away quickly and come back shortly, and then you discuss amongst yourselves whether you have questions. I'm back. I assume you can hear me and see my screen. I'm always paranoid that I come back and then forget to either unmute myself or forget to switch to my screen, so everyone's like, "I can't see anything." No. Yeah, this is very much research, here be dragons; the only person who has successfully run the code is probably the person who wrote the paper, which is unfortunate, but it is nice that we have the code for reference at least. What I'll probably do is chat to the original author and ask him a little bit about it. One thing that's sort of unfortunate, right, is that this paper was originally published in 2014, 2015, and then there was a revision of it in 2017, which is the longer version that we're working from now, but the author hasn't worked on this for like five years and, as far as I've understood, has worked on completely different things since. And so it's unclear that anyone really knows how this works at this point, or at least how the code works. So we're sort of trying to reverse-engineer understanding. "Not fully grasping the paper as a whole, but it seems like a pretty big deviation." What deviation? We haven't really deviated from anything currently. Or do you mean the deviation between the Java code from the author and the paper? That's not really a deviation as much as it is a leaky abstraction.
Like, the paper presents a clean abstraction where post-CASes and pre-CASes are implemented by the simulator, and the generator and wrap-up methods are implemented by the data structure. And that clean separation doesn't seem to quite be there in the code. What could be the case, actually, is that maybe they realized later that they could do that clean separation, but didn't update all of the data structures to use it, or it's not in the version of the code I have. It's not entirely clear. Now, I guess we dealt with this to-do. Result<Result<(), usize>, Contention> is a real weird return type. I think Clippy is gonna yell at me, but it sort of is accurate. I mean, we could have a type alias for it, like MaybeContention, but it really is: either CAS-execute failed to do its job in that there was contention, so it returned early, or CAS-execute ran and what we get is the result of the CASes. But the other way to represent this, right, is to unify the error types and say CAS-execute returns either unit or an error type that is an enum of CAS-failed and contention. Maybe that's nicer; maybe we should just do that. It's only really used internally anyway. Right, so we can do, like, enum CasExecuteFailure. A CasExecuteFailure is either a... and we're also gonna implement From<Contention> for it, so that the question mark will work. It can either be a CasFailed of a usize, or Contention. All right, and then we say this is gonna return a Result whose error is CasExecuteFailure, and this is gonna be an Err(CasExecuteFailure::CasFailed). And the question is: is this nicer? I think maybe it is, I suppose. And this question mark still works because of the From implementation.
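A minimal version of that unified error type, with the From impl that makes the question mark work. The enum and variant names follow the discussion above; the two helper functions are made up purely to exercise the conversion:

```rust
struct Contention;

enum CasExecuteFailure {
    // the i-th CAS in the list failed
    CasFailed(usize),
    // we detected contention and bailed out early
    Contention,
}

impl From<Contention> for CasExecuteFailure {
    fn from(_: Contention) -> Self {
        CasExecuteFailure::Contention
    }
}

// hypothetical stand-in for the contention check inside cas_execute
fn check_contention(ok: bool) -> Result<(), Contention> {
    if ok { Ok(()) } else { Err(Contention) }
}

fn cas_execute(contended: bool, cas_ok: bool) -> Result<(), CasExecuteFailure> {
    // `?` converts the Contention error via the From impl above
    check_contention(!contended)?;
    if cas_ok {
        Ok(())
    } else {
        Err(CasExecuteFailure::CasFailed(0))
    }
}
```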
Then anywhere that calls this now needs to deal with it, right? So now we get into a slightly weird case where this becomes Ok(outcome), and this becomes: if this is CasExecuteFailure::CasFailed(i), then it's Err(i), and if it's CasExecuteFailure::Contention, then we continue. Maybe that's nicer. Not entirely sure. All right, still haven't done a git commit. Maybe I should do a git commit. Nah, do it later. Okay, unless there are other questions about what we've done so far, I think the big next question now is how we implement the wait-free queue. So as a reminder: any time we go to the slow path, we need to put up a description of the work we wanna do and stick it on that queue, and every time we're about to do an operation, we check whether the queue of people looking for help is non-empty and, if so, try to help them make progress. And because we want the whole thing to be wait-free, any operation on the help queue has to be wait-free, which means we need to implement a wait-free queue. Now, luckily, the paper authors already wrote one, which is an adaptation of another wait-free queue that someone else implemented, adapted specifically for this use case, and the code is down here. I love reading Java code in literal PDFs. Literally typeset PDFs, with line wrapping and page wrapping. That's great. Okay. So, a help queue. It's got an atomic reference. All right, so there's a node type; that's pretty common. I think we should probably make this its own module. I really need to organize this code. The biggest reason I haven't split this into multiple files and such is because, as you've seen over the course of the stream, we've moved so much stuff around, because we're still coming to grips with what the interface should be, what the abstraction points should be, and what the layers should be.
So it's nice to have it all in one file where you can just re-jig it around. But the help queue is sort of unique in that we know what its interface is gonna be, and we know that it's just gonna be a self-contained thing. So it makes a lot of sense for it to be in its own module. Yeah, so that's my plan. So we're gonna have our first module: help_queue. I don't think we're actually gonna need PhantomData, but, you know, we need OperationRecordBox and NormalizedLockFree. We may actually not need these types, but we'll have them there for now. Right, and this needs to use help_queue::HelpQueue, and this needs to be pub(crate), because we're gonna use it outside of here. Same goes for enqueue, peek, and try-remove-front. Great. And the PhantomData use up here can go away. All right, so now we need to figure out how to implement the help queue. Let's see what we got. So we have a node type. It has a value; that makes sense. There's a next, which is an atomic reference. For us this is an AtomicPtr, and it's also eventually gonna have to include the logic around garbage collection. It has a tid, which is probably the thread ID. So there's, like, the enqueuing thread's ID and the dequeuing thread's ID; they're a little weird. And I guess an operation descriptor: phase, pending, and enqueue. So this is probably, remember how the way you make something wait-free is you make threads help each other? Which means that this wait-free queue, which we're using for the queue of things to help, needs its own way to describe things to be helped internally, which is probably what this OpDesc is gonna be. Atomic references for head and tail. An AtomicReferenceArray. What is an AtomicReferenceArray? Oh, it's an array of atomics. Okay, good. You need to give a length. But what's the length? Oh no, that's awkward. It's an array that's bounded by the total number of threads, which is a value we don't really know.
And in general, you don't really know. Like, this Test.numThreads sounds an awful lot like the wrong value. I wanna look at what the code does here. BaseCombiner? No, that doesn't look right. This doesn't look right at all. Okay, this is not giving me the results that I need. I'm guessing this is, like, a JUnit value or something. But this seems really problematic. Because, okay, what if someone spawns another thread that wants to access this thing? What happens? Then I guess the tid will just be out of bounds and the whole thing will just crash. Like, what happens if I call get with a value that's too large? I assume it just throws an exception or something. This seems sketchy. There's a Test.java file in source. Well, that's awful. Well, that's interesting. Not quite sure what we do here. I mean, it seems like this implementation needs to know the number of threads. Yeah, because you see, what it does is, when you enqueue, you basically set the state of this thread in that array. And the idea, of course, is that because every thread has its own index, it can atomically operate on that index, and other threads can help on its index by just walking the list of things. But if you don't know the number of threads, then you have no way of doing this. Maybe we could say that every time you want to clone a new handle to it, we'll take a mutex, and... the problem is this can't be a vector, right? Because if the vector needs to grow, who grows the vector when there are lots of threads with concurrent access? That's why it's an array here. How does it even get the thread ID? What does it use as the thread ID here? That's what I want to know. Ask-for-help just takes the thread ID, okay. And delete is just passed the thread ID. Ah, I see. So basically you need to say it in advance, and it doesn't have to be thread IDs.
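The per-handle state-array idea can be illustrated in miniature: each handle only ever writes its own slot, any thread may read every slot to find operations that need help, and the size N has to be fixed up front, which is exactly the problem being discussed. This is a toy, not the queue's real state array (the real slots hold full operation descriptors, not just a phase number):

```rust
use std::sync::atomic::{AtomicI64, Ordering};

struct State<const N: usize> {
    // one slot per handle; -1 means "no pending operation"
    phases: [AtomicI64; N],
}

impl<const N: usize> State<N> {
    fn new() -> Self {
        State {
            phases: std::array::from_fn(|_| AtomicI64::new(-1)),
        }
    }

    // only handle `id` ever calls this with its own id, so there is
    // no write-write race on a slot
    fn announce(&self, id: usize, phase: i64) {
        self.phases[id].store(phase, Ordering::SeqCst);
    }

    // any thread may scan all slots to find who needs help
    fn max_pending_phase(&self) -> i64 {
        self.phases
            .iter()
            .map(|p| p.load(Ordering::SeqCst))
            .max()
            .unwrap_or(-1)
    }
}
```

Because each slot has exactly one writer, plain stores suffice for announcing; only the helping protocol itself needs CAS.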
If we think about this for a second, what this is saying is that every handle to the data structure needs to have a distinct identifier. Thread ID is a proxy for that here, but it doesn't need to be. And you need to declare in advance how many handles you want there to be. So one option, well, there's a question of whether it should be Send or not, but one option here, right, is to use const generics and say, oops, there. So our help queue is gonna be generic over that N. And that means our generator, sorry, our simulator rather, also needs to be generic over that. And then what we can do is we can say... how's this gonna work? Like, all right, there isn't a TryClone, is there? But if we say... why do we need this trait bound all the way down? Oh, it's because this needs to name an associated type. That's really stupid. Okay, so here's what we can do, here's what I'm thinking. We make it so that if you have a wait-free simulator, you can always create your first one. What am I doing there? pub fn new, right? There's gonna be some new that gives you a new one. And then we can say pub fn fork, which gives you a Result of Self and TooManyHandles, right? And what will fork do? So let's say that the help queue... there's gonna have to be a sort of shared struct that holds the actual help queue and holds the algorithm and holds identifiers. And this is gonna be an AtomicUsize. Maybe you already see what I'm getting at here. And then we're gonna say that this is gonna have shared, which is gonna be an Arc to Shared of LF and N. So I want AtomicUsize and I want std::sync::Arc. This needs to be generic over all of this as well. That's fine. Two, okay, that's fine. This is const N: usize, and this is self.shared.help. self.shared.help. So all of these are just gonna have to access it through shared.
So the idea here, right, is... it's really unfortunate that we need to have this handle business. But basically what we're saying is that rather than letting people just have a wait-free simulator that they themselves stick in an Arc, they actually need to explicitly have a handle to one. We have to do the arcing for them. It's really unfortunate. It means that you can't stick it in a static, for example, but we may just have to live with that. This might actually mean that we can use epoch-based garbage collection too, because we can store the guards in the handles, right? So we could store, like, a crossbeam-epoch guard in here. But what I'm imagining, right, is: i is self.shared, this should really be handles, handles.fetch_add(1). And then if i is greater than or equal to N, then return Err(TooManyHandles). Otherwise, Ok of Self, where shared is self.shared and i is i. And then this needs to do self.shared.handles.fetch_sub, because it needs to make sure that if someone gives a handle up in the future, we don't just keep incrementing. And then we'll implement Drop for this, taking &mut self, which is going to just return the... I can't just return any i, because it needs to return its own i, not just any i. Handles are a pain. All right, my thinking here was that this way we're ensuring that there are only ever N handles. The problem is with dropping. Imagine that I create eight handles, and the maximum number is eight. I create eight handles, then I drop the fifth one. Now, if someone tries to create a new handle, it needs to get an ID of five, right? Or four. It needs to get the ID of one that has been dropped. But this scheme won't do that. Okay, so the other alternative here, and this is even stupider: we're just going to do this, screw it. So you can't give back handles. The moment you reserve a handle, it's yours forever.
And then we'll just tell users: don't fork if you don't have to. This is dumb, right? I'm not saying this is a good API, but in the interest of making progress, I think we can do this, because behind the scenes you could always improve fork so that it knows how to manage these multiple IDs. Ooh, here's an option. This is even stupider, but maybe we're going to do it. free_handles is going to be a vector of usize. In fact, it's not even going to be that. In some sense it's going to be a [bool; N], but I don't want to make it a [bool; N]. So my thinking here is I want this to be a Mutex, because I'm fine with fork not being wait-free. If you have to take a lock in order to get a handle, I'm okay with that. That way we can at least fix this silliness. So what we'll do is say free_handles is going to be, I'm just going to make it a Vec<usize> for now, and it's dumb for so many reasons, but we're going to lock().unwrap().pop(). And then we're going to say: if let Some(i), then you can get your handle; otherwise there are too many handles. And now we can implement Drop for this, which is going to be self.shared.free_handles.lock().unwrap().push(self.i). And then new is going to be... first of all, it's going to assert that N is not zero. There's going to be an Arc::new of a Shared, and we're going to start out with i being zero. This is going to take an algorithm, which is going to be the algorithm. It's going to take a HelpQueue::new, and free_handles is going to be real dumb: it's going to be (1..N).collect(), in a Mutex::new. And we don't have a new for HelpQueue yet. And this is fine, right? It's just a vector of usizes. It's not actually a vector of, like, thread descriptors. It's just an integer; it's really just a handle ID, if you will. And so now when you fork, you take a lock, so it's not wait-free to get a new handle.
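Putting the whole handle scheme together as a sketch. The struct and field names are guesses at what the stream's code looks like (the real Shared also holds the algorithm and the help queue, omitted here); the point is the Mutex<Vec<usize>> free list and the Drop impl that gives an ID back:

```rust
use std::sync::{Arc, Mutex};

struct Shared<const N: usize> {
    // IDs not currently claimed by any handle
    free_ids: Mutex<Vec<usize>>,
}

struct Handle<const N: usize> {
    shared: Arc<Shared<N>>,
    id: usize,
}

#[derive(Debug)]
struct TooManyHandles;

impl<const N: usize> Handle<N> {
    fn new() -> Self {
        assert_ne!(N, 0);
        Handle {
            // the handle we return claims ID 0, so the free list starts at 1
            shared: Arc::new(Shared {
                free_ids: Mutex::new((1..N).collect()),
            }),
            id: 0,
        }
    }

    // Not wait-free (takes a lock), which we decided is acceptable:
    // only operations *through* a handle need to be wait-free.
    fn fork(&self) -> Result<Self, TooManyHandles> {
        if let Some(id) = self.shared.free_ids.lock().unwrap().pop() {
            Ok(Handle {
                shared: Arc::clone(&self.shared),
                id,
            })
        } else {
            Err(TooManyHandles)
        }
    }
}

impl<const N: usize> Drop for Handle<N> {
    fn drop(&mut self) {
        // return *our own* ID so a later fork() can reuse it
        self.shared.free_ids.lock().unwrap().push(self.id);
    }
}
```

This fixes the fetch_add scheme's problem: a dropped handle's specific ID goes back on the free list, so the count of live handles never exceeds N and IDs are actually reusable.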
But once you have a handle, all the operations are wait-free. And when you drop, you just return whichever i you originally got when you created the handle. It is really unfortunate that this N has to be hard-coded. Maybe one day we can get rid of that. But now, the help queue. So the help queue is going to be generic over N, with a pub fn new, which is going to return a Self. It's going to be a todo! for now, of course. Now, will this compile? Great. And I can get rid of the AtomicUsize. Nice. "I don't understand why we pop in fork." We pop in fork because we need to get an identifier. free_handles is a list of handle IDs that aren't currently used, and we just take one of them; pop gives us one, and any one of them is just as good as any other. That's why. So now I guess we can actually implement this if we go back to look at the actual code, because that seems nicer. All right. So node is just a node in a linked list, right? next is an atomic reference, then those identifiers, that's fine. The operation descriptor, that's fine. That's basically the help queue. I see the observation here, what this is doing, right? In the other, larger data structure, they use a linked list as the way to encode all of the possible things that might need help. Here, we're not using a linked list for that, we're using an array. In some sense, once you have this requirement, why not just use an array in the other one as well? Something is weird, but it's fine. So the constructor here for the wait-free queue is: you create a sentinel, which is gonna be the indicator for the empty list, and you create a new thing that has an operation descriptor per thread. So this is where the observation comes in that you don't actually need a linked list of help operations, because you know the maximum number of help operations, which is the number of threads, or number of handles rather.
And I guess maybe the reason they use a list in the other one is because this is gonna allocate N of this size: if there are N handles, N times the size of OpDesc. That's probably fine here, because OpDesc is kind of small, but for the other data structures the help structure might itself be pretty large. So maybe that's why they want to use the list there. Interesting, okay. I guess we just start encoding this. "Shouldn't (1..N).collect() be (0..N).collect()?" No, that's very intentional. It's a good catch, but no. It's because the self we return has already taken identifier zero. The one we return has already claimed zero, therefore one. So let's see, we should just start encoding this and then... actually, no, let's read a little bit more. Head and tail pointer, that's pretty standard. It initializes the state array to "I don't need help", I suppose. What is in the actual operation descriptor? Phase, pending, enqueue, and a node. I see. In some sense, this could be a thread-local, except that you need some way of iterating over all of the threads' thread-locals. I'm thinking about whether we can make this const, but I don't think we can because of the node. Oh, maybe, maybe actually, because that node is an option. This is really an Option<Node>, because it's allowed to be null, which really suggests that OpDesc should be an enum, or maybe the whole OpDesc should be an Option. Like, this should be an array of Option<OpDesc>. That's probably really what's going on. Interesting: enqueue gets the phase and sets this OpDesc to be a new OpDesc. All right, are these heap-allocated? They probably need to be, because there's an atomic reference. So that means it also needs to be handled by whatever memory reclamation scheme we use. It probably doesn't need to compare-and-swap, because any given index will only ever be updated by that handle and be read by others. So there's no chance of an ABA problem happening.
I think, because I don't think any other thread will ever... hi, cat. I don't think any other thread will ever modify it. Oh, there's help. What does finishEnq do? It gets state.get(tid).phase, false, true, next. Phase, false, true, next. Also, there is a compare-and-swap in help-finish. I'm just trying to figure out whether there's an ABA problem here. And it looks like there might be, which maybe means this needs to be versioned as well. Well, there's no versioning here; this just declares a new one. Although, because it's using the pointer, you don't have... like, this is doing RCU again, which means you don't have the ABA problem, for the same reason we don't have the ABA problem when we're using a heap allocation for the thing instead of doing versioning and metadata and stuff. It's a little awkward if each of these is a heap allocation as well, but it seems like it might have to be, because this updates... let's see. So when you enqueue, what do you put there? Phase, true, true, and a new node. And when it gets updated, it's phase, false, true, and next. Why isn't this curDesc.phase? This feels like it could be curDesc, which would mean it's updating that in place. Maybe it's only this one that gets updated. next is last.next.get(). This just feels super racy, but I'm assuming this is a reasonable implementation. All right, let's try not to optimize too much and go with the way this is implemented in the first place: just do the heap allocations, do RCU with heap allocations, and then, if necessary, change it back. So I guess peek-head is just: you look at the head. That's easy enough. And enqueue is: set the state for your thread, then help, and I'm guessing that means help everyone in the same phase as your operation, and then finish the enqueue of the help action you inserted. So really, here all the work is happening in help. Unlike the simulator, where you had a fast path, here there is only the slow path.
Doing an operation means sticking it on the help queue and then helping until your thing completes. Conditionally-remove-head... I wonder why this has to help. All right, let's try this out. So we're gonna need a... class? Oh, no. We're gonna need a node, which is gonna hold a value V. So, to your question: what's even gonna be in these? I think it's gonna be these things, which means node is gonna be generic over LF, and we're gonna have to require this bound so that we can name this type. So that is the value. There's a next, which is an atomic... oops, I guess we should also grab these... which is an AtomicPtr to a Node<LF>, sorry about that. Actually, let's be even better and say to a Self. There is the enq ID. I was gonna go with enq_i, because we've used i instead of tid, but maybe that's gonna be confusing, because that might be the index. So let's have all of these be id instead of just i. So this becomes free_ids, and this becomes id, and this becomes ids, and so on, all the way down. So I guess this also means that probably all of the enqueue operations are gonna take that identifier, which sort of suggests that the fork method should actually be on the help queue rather than on the wait-free simulator. But let's leave it up there for now and just say that this is all gonna take an id, which is gonna be a usize. This one probably doesn't need to. In fact, is it only enqueue that needs it? Peek-head doesn't, and conditionally-remove-head doesn't. So only this one takes an id. And then over here, this is then gonna take self.id. Okay, so the enq id is gonna be a usize. What else have we got up here? We have the deq id, which is an AtomicUsize. Not entirely clear why that is yet, but I guess we will soon find out. And there's a constructor for node, that's easy enough, which takes a value, which is one of these.
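A sketch of that node type as just described. The field names are our guesses, and, as comes up a bit later, we use isize with -1 standing in for "none" because there is no atomic Option type:

```rust
use std::ptr;
use std::sync::atomic::{AtomicIsize, AtomicPtr};

struct Node<T> {
    // None only for the sentinel node
    value: Option<T>,
    next: AtomicPtr<Node<T>>,
    // ID of the handle that enqueued this node; -1 for the sentinel
    enq_id: isize,
    // ID of the handle that (helped) dequeue it; -1 until claimed
    deq_id: AtomicIsize,
}

impl<T> Node<T> {
    fn new(value: Option<T>, enq_id: isize) -> Self {
        Node {
            value,
            next: AtomicPtr::new(ptr::null_mut()),
            enq_id,
            deq_id: AtomicIsize::new(-1),
        }
    }
}
```

The -1 convention does mean the top half of the ID space is unusable, as noted below, but it keeps the fields plain atomics.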
I guess we could actually make a fully generic help queue and then say that HelpQueue is a variant that takes this. Maybe that'll make this a little bit nicer, so we don't have to carry all these bounds around. So we're gonna say that a HelpQueue is really just a WaitFreeHelpQueue of this. That way we can be fully generic down here, which is gonna be a little bit nicer. So this is just gonna be a T, and we don't necessarily know what that's gonna look like yet, but at least it means we don't have to carry these long bounds around everywhere, and can just say this holds a T, takes a T, et cetera. And then what we'll do is have these methods all be on WaitFreeHelpQueue<T>, right? And say that they all just take a T, return a reference to a T, or something; there's something odd here that we need to figure out. This may end up operating on raw pointers or something, I'm not quite sure. And then this is really just gonna be self dot the wait-free queue dot enqueue with the id; essentially this is just forwarding. In fact, this can just be a type alias, and that way these can all just go away. And we don't actually know what Node::new is. Oh, I guess that's pretty easy. It is just gonna be a Self where the value is the value, next is AtomicPtr::new of, is there a null? std::ptr::null_mut. The enqueuing id, I see, so this needs to take an enqueue id, which is gonna be a usize, and stick that in there. And this is gonna be an atomic integer, and this is gonna be an atomic integer. So it needs to be able to be minus one, and that's just because we don't have atomic Options. But that's fine, we can live with this being a minus one. Technically that means that half of the enqueue-id space can't be used, which is a little awkward, but such is life. And then what's this OpDesc? That's gonna be a struct.
I still feel like this is probably gonna end up being an enum, but let's see: there's gonna be a phase, which is an i64, pending, which is a bool, enqueue, which is a bool, and node, which is a Node, which means this has to be generic over T. And there's a constructor for that too: OpDesc<T>, fn new, and it takes all of the fields, so it doesn't really need a constructor, in other words. That's fine. And I guess now we start to look at what the actual help queue contains. So the actual help queue has a head, which is an AtomicPtr to a Node<T>. It has a tail, which is the same. And then it has this state array, which is gonna be an array of AtomicPtr to OpDesc<T> of length N. And this is where this const N: usize is gonna come in, right? And then there are also enqueued and dequeued counters, which feel like they're probably just there for debugging. Yeah, they're not actually necessary. They're basically for metrics collection, for stats; probably handy in debugging. In fact, we could do that here too, right? We could have an enqueued AtomicUsize, but I'm not gonna do that now. We're gonna keep it simple for now. "Is that const N an example of a refinement type?" No, this is a const-generics type. There's no refinement here. All right, so we have a head and a tail, and then this is gonna have a new. new takes no arguments, apparently, and returns a Self. What's it gonna do? It's gonna create a sentinel, which is gonna be Node::new of null. So that means the value is an Option<T>: Node::new(None, -1). So I guess this id does have to be an isize. That's fine. And then we're gonna say, oops, then we're gonna say that the head is gonna be AtomicPtr::new of, I guess, Box::into_raw(Box::new(...)). We've done this dance before. This is gonna point to the sentinel, and the tail is also gonna point to the sentinel.
State is gonna be, well, state is gonna be a little bit weird, because I guess we want a const empty state. Let me write this in the long form first and then see why we have to change it. So this is gonna be an array of OpDesc of length N, where the phase is minus one, pending is false, and, hmm, enqueue is true, apparently, and node is null, which means node is None, which means node here is an Option<Node>. Fairly common. So this won't actually compile. This eventually won't compile because you can't create an array this way, just by repetition; you can only do that if this is specifically a const value. So this is, like, an EMPTY_DESC const, and then you can use it here. But this is gonna complain, because you're not allowed to have one that's generic: "can't use generic parameters from outer function" in the inner one. So this is gonna be a little bit of a pain. Basically, the problem here is we need to construct the array, but to construct an array you have to have all of the values at the same time, and they're all the same, which you can only express if their type is Copy. Like, you can put a value directly in there if the value's type is Copy, but OpDesc isn't Copy, because it may contain a node, even though it doesn't currently contain one. So we tried to make it a const to indicate that it doesn't contain anything that can't be copied, but we can't use the T in that const. Maybe we can type-erase the node here, because it's just gonna be a raw pointer anyway. I'm gonna have to think about this a little. That's gonna be a little annoying to deal with. You can always work around this by creating a vector of the appropriate type; it's real dumb. I mean, the restriction is there for a good reason, but the way we have to work around it is real dumb, which is basically: we say state is (0..N), map, ignore the i, give me an OpDesc for each one.
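That collect-based workaround might look roughly like this. OpDesc here is a simplified stand-in (field names are guesses); the point is only that a non-Copy element type rules out the `[value; N]` repeat syntax and forces us through a Vec:

```rust
use std::sync::atomic::AtomicPtr;

struct OpDesc<T> {
    phase: i64,
    pending: bool,
    enqueue: bool,
    // holding a pointer makes OpDesc non-Copy, so [OpDesc { .. }; N] won't compile
    node: AtomicPtr<T>,
}

fn empty_state<T, const N: usize>() -> [OpDesc<T>; N] {
    (0..N)
        .map(|_| OpDesc {
            phase: -1,
            pending: false,
            enqueue: true,
            node: AtomicPtr::new(std::ptr::null_mut()),
        })
        .collect::<Vec<_>>()
        // Vec<T> -> [T; N] conversion; only fails if the lengths differ
        .try_into()
        .unwrap_or_else(|_| unreachable!("we collected exactly N elements"))
}
```

On newer compilers, std::array::from_fn can build such an array directly and safely, without the intermediate Vec; at the time of this stream the Vec route was the easy safe option.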
.collect(), .try_into(), .expect("gave N elements"). And then we say this has to be of type [AtomicPtr<OpDesc<T>>; N]. And it's gonna complain, probably, that each of these has to be AtomicPtr::new(Box::into_raw(Box::new(..))). So now that's state. But this is a restriction that's gonna ultimately be relaxed, so that you're allowed to name the T, or just use a value whose constructor happens to be const — it's just not something you can currently do. So: TODO: once const can depend on T, make this const instead of going via Vec. We could also do MaybeUninit — that's the other way to do it. But if I can avoid writing unsafe code, I will avoid writing unsafe code. And N here is likely to be small anyway, and this is one-time construction, so I'm not too worried about it. And eventually there'll be a fast, safe way to do it, so I'm okay with it. And then it just constructs, I guess, Self from these: head, tail, state. So that's great. Now, enqueue is the most complicated operation, so let's just skip over that for now and look at peek_head. So peek is just real straightforward, right? node is self.head.load — I'm just gonna make all of these sequentially consistent, even though that's not necessarily necessary. And we know that this is always okay to dereference because we never deallocate, which is such a lame reason. So we load… wait, why does this get the next? This doesn't peek. Oh, I see, it's head.next. And head.compareAndSet. Doesn't this skip the thing that's actually at the head? It gets what's at the head — what the head pointer is pointing at — then it looks at the next element, gets that, and returns its value. So this should return the second element, not the first. Right?
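The Vec detour just described can be sketched on its own; `OpDesc` here is only a placeholder standing in for the real, non-Copy descriptor type.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct OpDesc; // placeholder for the real, non-Copy descriptor type

// `[value; N]` repetition needs a Copy (or const) value, which AtomicPtr
// isn't, so we build the array by collecting an iterator into a Vec and
// then converting the Vec into a fixed-size array.
fn make_state<const N: usize>() -> [AtomicPtr<OpDesc>; N] {
    (0..N)
        .map(|_| AtomicPtr::new(ptr::null_mut()))
        .collect::<Vec<_>>()
        .try_into()
        // the iterator yields exactly N elements, so this cannot fail
        .expect("collected exactly N elements")
}
```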
I was wondering, like, maybe the head is always a sentinel value or something, but if you look at conditionally_remove_head, it does head.get to get the current head, gets the next, and then replaces the thing that's in head. Like, it replaces what head points to — it doesn't replace head's next pointer — but it does compare the expected value to… there's something weird going on here. Now, peek_head is supposed to give you a reference to the first element of the list, but this seems like it gives you the second element of the list. Unless the head is always a sentinel, right? Like, head is always just a node that doesn't hold a value in and of itself, and only the next one holds the value — but that doesn't seem to be true, because in conditionally_remove_head, we truly remove the thing that's at the head, not its next. conditionally_remove_head gets the current head, finds the next pointer — it shouldn't be possible for the next pointer to be null unless the head is the sentinel, right? So this is checking for the sentinel; that's fine. Or, if the next value does not equal the expected value, then return false. Okay, that's fine. And then we replace the current head with its next pointer. So what ends up getting removed is the thing that's at the head. So the thing at the head can't be the sentinel, because then conditionally_remove_head would only ever remove the sentinel, which makes no sense. Also, why does this set… that's the thing that was removed, I see. Yeah, this head.get gets the first node, right? Like, head is an atomic reference to a node that initially starts out being the sentinel. But when you enqueue — see, here's what's weird — enqueue pushes to the end, like it uses the tail pointer, right? Which makes me think that the head node really is always the sentinel. That's the feeling I get.
And if the head is always the sentinel, then peek_head makes sense, because the first element is the sentinel, so you always want to return whatever's immediately after the sentinel. But conditionally_remove_head doesn't make sense — specifically this line. Why would that not end up removing the sentinel? Because this is checking the value of the next thing. Like, shouldn't this be head.next? There's something real fishy about this. It feels like conditionally_remove_head will just never succeed. Or rather — no, it will succeed, but it won't do what I thought it was gonna do, right, which is: if head is indeed always the sentinel, then this should compare-and-set head's next pointer to be the following pointer. But that's not what this does. This changes head itself to point directly to next, which would bypass the sentinel, and from that point forward peek_head would be wrong, because the first element wouldn't be the sentinel anymore. What does the paper say here? What's the paper's code? No, the paper's code is the same thing. And what makes me think that this is wrong is that this is comparing the value of next. So clearly they expect that the expected value that's passed in is gonna be contained in next, not in head, right? So it's intending to remove next, but that's not what this does. Unless… no, the compareAndSet method is on the atomic reference. Like, my expectation here would be that this would be curhead.next.compareAndSet(next, next.next.get()), right? That's how you remove next, if that is indeed your goal. And then this would be next.next.set(null). That feels like what the code should be doing. The reason I'm hesitating — the reason I'm doubting myself here — is that this code has been run, right? They did all sorts of experiments with it; there are results in the paper. So clearly it's not wrong?
Or if it's wrong, it's not wrong in a way that matters. So what would happen with this code as-is? conditionally_remove_head would end up removing the sentinel, which is fine. Although — yeah, I think removing the sentinel is probably fine, although… okay, this breaks if you do a… okay, you can't do a conditionally_remove_head until the sentinel is no longer the only node in the list. So you need to do an enqueue first; otherwise you can't do conditionally_remove_head anyway. So someone does an enqueue, which means that now the tail pointer points to the thing we just enqueued. Then you conditionally remove the head, which is gonna remove the sentinel, but not the value — now the value is first. Now you call peek and you get null, when really there is a value there. I'm pretty sure this is just wrong — but the way it'll manifest is just that you'll end up leaving a completed operation at the start of the help queue. Sorry, I'm lost in thought. Let me try to explain what's going on. So this data structure has a head pointer and a tail pointer, right? And initially it constructs a node I'll call the sentinel — that's terribly written, but it says "sentinel" — and head and tail both point to it. What happens when you do an enqueue — I mean, we haven't gone through it, but basically the end result after an enqueue, at least as far as I assume, is that head still points to the sentinel, and the sentinel has a next pointer. So let's say I enqueue this little box. Then there's gonna be a second box that's gonna have a pointer to this value and a next pointer that points to nothing. And the tail is now gonna point to this node. Great, so far so good. Now, if you look at the code for peek_head, which is supposed to give you the first element in the list: peek says head.get — so head.get, that… let me maybe switch colors.
So head.get, which is this node, right? Then head.get.next.get, so that is this node, which is indeed the next node. If next is null, then return null — that's fine, next is not null. Then return next.value. So peek will return that. Great, that is indeed the first element of the list. Everything is fine. Similarly, if another enqueue happens — notice that none of this touched anything past here. Oops, ooh, what did I do? What on earth did I do? Ah, I have done something weird. Oh, that's because I have the eraser on. Nothing in peek touched anything past here, so if there were more enqueues, it doesn't matter; it still does the right thing. Okay, so now we look at conditionally_remove_head. Let's do a third color — orange, I guess. So conditionally_remove_head takes an expected value and will only remove the head if that is its value. Okay, so it does head.get — head.get is this guy; this is curhead. Then it does next is curhead.next.get, so that again is this node. That's gonna be next. Great. Then it says: if next is null — it's not — or if next.value does not equal the expected value… So it compares this V to this value, which is what we wanted, right? This is checking whether the head's value is equal to the provided argument. Great, so far so good. If they're equal, then it won't return false; if they're unequal, it will. So it goes down to this line. And then it says: if head.compareAndSet(curhead, next) — so it's saying head used to point here; now stop pointing here and point here instead. Right, that's what this line of code says. And then it modifies the current head's next to be null.
So it takes this node, right, which is now disconnected, and removes this little arrow, both so that this node can be deallocated and so that this next node might eventually be deallocated too — at least that's my read of it. Okay, but now — let me tidy this up a little bit — we have a graph where head points to this node whose next pointer is null and whose value is still what was enqueued, right? If you ignore this node, which is the one that was just deleted, this is now the head of the list. But conditionally_remove_head was supposed to remove the head of the list whose value is the one we gave it, and that's not the one that was actually removed. What was removed was the sentinel node that was here. And so, in fact, if we now do a peek again — new color, purple… pink. If we now do a peek here, peek, remember, is gonna look at head.next.value. Okay: head.get, next, which is nothing. So peek is gonna return null — but there is a thing with a value here. So I guess peek does return the right thing after conditionally_remove_head. So maybe this is why it works? Because even though it's messed up — even though it leaves the node in place — peek ends up skipping it anyway. I mean, maybe it does work. It's just really strange, right? Because we did conditionally_remove_head of this value, but that value wasn't removed; it's just gonna be skipped over by peek. I guess, what happens now if we do an enqueue? I guess the enqueue will still work, because it's gonna operate on this node, which used to be the tail. It's just really weird to me that the remove leaves the node with the value we said to remove in place as the head — it removes the node before the thing we told it to remove. It's just really weird. But now that I've seen that this peek does return null, I guess maybe it's right. If there was a node here, then peek would return that node.
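To convince ourselves that this "the removed node becomes the new sentinel" behaviour really does keep peek consistent, here's a tiny single-threaded model of the list — no atomics, plain Vec-backed storage, hypothetical names. It only mimics the logical behaviour walked through above, not the real concurrent code.

```rust
// Single-threaded model: slot 0 is always the current sentinel. peek looks
// at slot 1, and conditionally_remove_head drops the old sentinel so that
// the node holding the removed value becomes the new sentinel.
struct SeqQueue<T> {
    nodes: Vec<Option<T>>,
}

impl<T: PartialEq + Copy> SeqQueue<T> {
    fn new() -> Self {
        SeqQueue { nodes: vec![None] } // start with just the sentinel
    }
    fn enqueue(&mut self, v: T) {
        self.nodes.push(Some(v));
    }
    fn peek(&self) -> Option<T> {
        // skip the sentinel in slot 0 and return the value after it
        self.nodes.get(1).copied().flatten()
    }
    fn conditionally_remove_head(&mut self, expected: T) -> bool {
        match self.nodes.get(1) {
            Some(&Some(v)) if v == expected => {
                // remove the *old* sentinel; the removed value's node stays
                // in place but is skipped by peek from now on
                self.nodes.remove(0);
                true
            }
            _ => false,
        }
    }
}
```

The model confirms the "always one node behind" reading: the value you asked to remove stays allocated as the new sentinel, but peek never shows it again.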
And if you do another conditionally_remove_head, it'll skip past the head, look at this node's value, and then remove this one. I guess it's just always one node behind. This is really weird. I mean, it seems to maybe do the right thing, which I guess means it's not all wrong. Yeah — next is made the new sentinel. That's weird, but okay, fine, I guess. So next is node.next.load, and then if next is null, then None, else unsafe next.value? That's a weird-ass algorithm. Not quite sure what it's complaining about here now. All right, well, we'll try it. And then this is curhead is self.head.load, and then next is curhead.next.load — this is a standard pattern, I suppose. If next is null — I'm gonna go ahead and make this a little more explicit — then return false. next is then gonna be unsafe next. If next.value is not equal to front, then return false. I'm just splitting this up because I kind of want… oh, actually I guess I don't use next for anything else. The alternative is to have this read like unsafe { next }.value != front; it just looks a little less nice, but I suppose it's technically correct. That's fine. And then, I guess, self.head.compare_exchange(curhead, next, ordering, ordering). And this one, I think — yeah, if it succeeds, then we do some stuff; if it errors, then we return false. And what we do on success is self.help_finish_enq, and then we do curhead.next.store(ptr::null_mut()). I think we probably don't need this extra store. Is this needed? The reason I think we might not need the store is that I think it's only there for, like, Java garbage collection. Oh no, it actually does need to be here — this is the way that we turn it back into a sentinel. Or is that true? It's not clear, actually, why this is necessary. Because curhead isn't the sentinel; next is the sentinel now. Next is the thing that we actually removed and is now the sentinel. Not sure why setting that to null is necessary.
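As an aside on the mechanics: the compare_exchange at the heart of this remove has the following shape — a sketch with plain u32 nodes rather than the real Node type. The swing only succeeds if head still points at the node we read earlier.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Swing `head` from `cur` to `next`, but only if nobody beat us to it.
// On failure we just report false, matching the early-return style above.
fn swing<T>(head: &AtomicPtr<T>, cur: *mut T, next: *mut T) -> bool {
    head.compare_exchange(cur, next, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}
```

(The nodes here are deliberately leaked, in keeping with the "we never deallocate" stance of the stream so far.)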
It could be for garbage collection in Java, in which case we wouldn't need it in Rust, but I'm not sure. Okay, and — ooh, it does not like this. Why does it not like this? Right, so we're gonna need this help_finish_enq method, which we don't actually know what it does yet, but that's fine. And curhead — yeah, it's a little frowned upon, but we'll just do this the old-fashioned way: curhead is unsafe deref of the curhead pointer. So now we have this, and we can say: take the pointer, please. This should return Err, that's right; this should return Ok, and this should return Err. Right. And one question here is whether this value comparison should be by PartialEq or whether it should be by pointer. I feel like maybe it should actually be by pointer, because we wanna check whether it's the same T, not just whether they compare equal. Yeah. So, why is the value an Option<T>? It's an Option<T> because of as_ref… oh, we had to stick an Option<T> there because for the sentinel node there is no T. Which sort of means that this should really be an enum that is either Sentinel or Node, but that probably gets annoying right around down here, where we want to access fields like next anyway. So I think we're just gonna keep this as Option<T>, that's fine. And then we'll just have to do, here, .as_ref().expect("not a sentinel node"). "No field value on &mut Node" — that's false; I just made that not be the case. And then same thing here: this should probably then be unsafe, .value.as_ref().expect("not a sentinel node"), because the only node that ever has a value of None is the original head node. Why is this complaining? Oh, that's why. Okay. And this collect is complaining because — oh, I wonder, can I just collect directly into an array? I don't think so.
I think this does need to be a collect into a Vec, and then I'm pretty sure Vec implements TryInto for arrays. Great. What do we not need? We don't need PhantomData anymore. Great. Right. And now we just need enqueue and all the helpers, of course. So in enqueue, phase is self.max_phase() plus one. And then self.state[id] — this looks like it's just a store of Box::into_raw(Box::new(OpDesc { .. })), where the phase is that, pending is true, enqueue is true, and node is Node::new of the value and tid, which is id. And this should be Some, and this should be Some. And id — isize feels wrong for id here. Feels like phase should really just be an Option<u64> — u32 is probably fine, but u64, let's not make this silly — and enq_id should be an Option<usize>. So this should be an Option<usize>. This is where it might be nice to make Node an actual enum, because otherwise you get into this business of every field having to be an Option; it just feels nicer to use an actual None here, and a Some here. Really, what we should do is have a sentinel constructor, which returns a Self, and that is just gonna be value is None and enq_id is None. And this one should take an actual owned value, and value should be Some(value) and enq_id should be Some(enq_id). We don't want people to do this — I mean, this is only internal code anyway, but it just feels wrong to have to pass in all these Somes. And then what does it do? help and help_finish_enq. Okay, so it does self.help_finish_enq and self.help. Nice. And I guess we're gonna need whatever this max_phase is, which I guess is gonna be a u64 — maybe it's gonna end up being an Option<u64>, because the phase might not be known. And who knows what fn help is gonna be, but I do know that it's gonna take a phase, which is gonna be a u64.
And this is gonna be a todo!(). And where did I mess up? "Cannot add integer" — so this is gonna be an unwrap_or… it's gonna be a map of p plus that, unwrap_or — oh, it's actually even gonna use the fancy map_or — oh right, Some(phase). I can't spell anymore, which is a good indication that I've been programming for too long. And this is gonna take the phase that we chose. Right, and then it's really all the helper methods, which are gonna go all down here. "Could start numbering IDs at one" — also true. It's just, like, Option is the right thing to do here, I feel. The other thing we could do, if we were worried about the overhead of the Option, is something like Option<NonZeroUsize>. I don't really want to unwrap Options all over the place, though. At the very least, it should be expect, and you should document why the expect is accurate — like, none of these are unwraps. unwrap_or is fine because it's not actually an unwrap, but the places where you have to use expect, like here, are really unfortunate. What I really want to do is encode it away so that this just can't happen. It's a little harder with concurrency, because, given one of these, you can't atomically change the variant of an enum, right? So this is why we're sort of forced into this pattern of it being an Option that we then always know has a Some value. It's a little awkward. Maybe it could be a union — like, we could manually implement a tagged union, but it doesn't seem quite worth it. All right. Let's see if we can power through these helper methods. Let's do max_phase first. So, max_phase. I'm guessing there's some construction here where, like — and it looks like this is the case, right?
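Coming back to the Option<NonZeroUsize> idea for a second: thanks to the niche optimization, wrapping the id in an Option is free if we start numbering handles at 1, because None can use the forbidden zero bit pattern. A quick size check makes the point:

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

// Option<NonZeroUsize> reuses the forbidden zero value as its None
// representation, so it's exactly one word, while Option<usize> needs
// an extra word for the discriminant.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<usize>(),
        size_of::<Option<NonZeroUsize>>(),
        size_of::<Option<usize>>(),
    )
}
```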
Every operation is given a phase, and when you help, you help everything in your phase, no matter which thread — or in our case, which handle — instantiated that operation, and you help all of the ones in that phase before you move on to the next one. I don't know why that's necessary, but wait — this is a manually implemented max loop. We can do better than this, which is self.state.iter().map — s.load, I guess unsafe { s.load }, no, s.load(Ordering::SeqCst) — in fact, it doesn't even need the unsafe, that's fine — s.load().phase, then .max(). Of course, the problem here is that phase is an Option, so what we're gonna do is a filter_map. Great. That way, any OpDesc that hasn't been given a phase yet is just gonna be ignored. We're gonna take the max; if none of them have one set yet, then max will be None as well, because the iterator will be empty. Great. So that's max_phase; that was easy. is_still_pending? This seems like a helper method that we might as well just add straight away. fn is_still_pending — it takes &self, an id, and a phase (I'm gonna guess that that's what that is supposed to be), and returns a bool. And that's gonna be — why does that not need to do a load? Oh, I see, that's fine. It's just because, I guess, the AtomicReferenceArray in Java lets you .get a particular index, and it does both the indexing to find the right atomic pointer and the load in one method, whereas for us, we have to index to get the AtomicPtr and then do the load. They're equivalent in terms of what they actually compute; that's fine. But why does it do two loads? That seems bad. It's doing the load twice, which seems problematic.
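Back to max_phase for a moment: the scan can be modeled without the atomics, with plain Option<u64> slots standing in for the loaded descriptors, which shows why filter_map plus max gives the None-when-empty behaviour for free.

```rust
// Skip slots whose descriptor has no phase assigned yet, then take the max.
// An empty (or all-None) iterator yields None, which is exactly the
// "no phases assigned anywhere" case.
fn max_phase(state: &[Option<u64>]) -> Option<u64> {
    state.iter().filter_map(|slot| *slot).max()
}
```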
Like, the Java code does an atomic load here and then another atomic load here, which, even if it's not wrong, seems kind of wasteful. unwrap_or(0) — because it's less-than-or-equal: the Java code uses minus one, for which this will always be true, and for us, None will be turned into zero, for which the condition is true because it's less-than-or-equal. Great. So we now have is_still_pending. And I guess we can move these down so that they're in the same places as in the Java code. All right. So now we have the help functions left. Let's do help_finish_enq. So, help_finish_enq: last is self.tail.load; next is — am I going to have to do anything with last? This is last_ptr. And again, for all of these, the safety argument is that we don't deallocate. This is why I kind of consider doing hazard pointers as a separate stream, because ultimately there's going to be so much unsafe here just because we need memory reclamation, and ultimately that shouldn't even need to be unsafe, I don't think. The hope is that the hazard-pointer library we end up writing takes care of the unsafety around whether things are safe to dereference — or rather, ensures that they always are safe to dereference. So this is next_ptr. So this is like: if it's not equal to null, then indent this whole block — and we're just going to do: if it is null, then return, because that seems nicer. So now we're going to say id is — I guess next is unsafe { next_ptr }, then next.enq_id. And I think here, next is never the sentinel, so we know that this enq_id must be set; it must be Some. And then we get, I guess, curdesc is state.get(id), which is really state[id].load. So this is a pointer. The reason I'm splitting ptr and non-ptr names here is that sometimes we'll want to refer specifically to the pointer even after we've dereferenced it. This matters for things like compare-and-swap, where you actually need to give the raw-pointer argument rather than the reference that we end up producing.
The other reason is that that way I can easily mark just the dereference as being unsafe, rather than also marking the indexing as unsafe by putting an unsafe block around the whole thing. All right. So, if last_ptr is equal to — why does it load the tail again here? If the last pointer is still the same, and — so this is also weird: why does it do another load of this here, after reading the tail again? It's annoying, because this one might actually be important. It might be important that you do this read after you do this read, but it's not entirely clear from the way it's currently set up. Like, this could be curdesc.node, but instead each of these is a separate load, which is a little worrying. Is it intentional that each of these is a separate load of the state? Maybe, but also maybe not. Here, too, we can invert it to avoid the indentation. Well — I want to try to understand what this code actually does. So, first of all, let's get proper indentation by not having the Java code in here. What does this actually do? It looks at the tail pointer. It looks at what comes after the tail pointer. Oh, I see what's going on. Okay, the name is pretty indicative, right: help_finish_enq. I think the idea is that the actual enqueue is probably just setting the next pointer of whatever the tail is, but then something has to update the tail pointer as well, and that's what help_finish_enq is: updating the tail pointer. So — the tail points somewhere; if the next of that node is null, then the tail pointer is already correct, and there's nothing to help. So this is: tail pointer is already correct, nothing to do. Let's go with "up to date". Then it reads what the next pointer is, and looks up the current operation of the thread that enqueued the thing that is after where the tail pointer is pointing.
So the tail pointer points at something that isn't the tail; we look at the next of that node, and then we look at what the thread that enqueued that next is doing. That's what curdesc is. And if the last pointer has already been updated — I guess this can be, just to make it very clear: "tail pointer has already been updated". And then this is: if the owner of the next node is now working on a subsequent operation, the enqueue must have finished. I'll fix that in a second. The observation here is that if the tail has already been updated, we're done. We're looking at the current operational state of the thread that added the thing after the tail: if that thread is currently working on something that isn't the next node, then its enqueue must have completed for it to have moved on to something else, which means that its call to help_finish_enq must have completed, which means that the tail must be up to date. I think. I think that's what it's going for here. curdesc.node — I think this curdesc.node is actually just a raw pointer, maybe. Maybe this shouldn't be a Node. I think this has to be a *mut Node<T>, because this is the pointer I'm trying to put into the data structure, which is going to be a raw pointer because we've already turned it into a raw pointer. We don't really have the nodes; nodes are only ever represented as raw pointers. So I think this is going to be Box::into_raw(Box::new(..)) — in fact, I think Node::new should return a *mut Self, honestly. This is an Option — why is it an Option? It's an Option because of the sentinel: initially, an OpDesc won't be modifying a node. And so what happens if that's the case? The next pointer… I see, this really should be curdesc.node.unwrap_or(std::ptr::null_mut()). unwrap_or_else… if node is None —
I don't think we can get here if node is None, but if node is None, then that's equivalent to it being a null pointer, which is what we wanted to compare, because that's what it was in the Java code. Great. So now we want new_desc is Box::into_raw(Box::new(OpDesc { .. })). Phase is going to be — see, this is another place where, presumably, the phase isn't changing just because we made progress on the enqueue. Nothing changes the phase; there's no increment to the phase except by the owning thread. But yeah, I guess you see: now the operation is no longer pending, because it completed. And this is now Some(next_ptr). But isn't that — we're already comparing whether node is equal to next, so this is just the same as it was. Like, this is just curdesc.node; otherwise we wouldn't get in here in the first place, because this comparison checks that they are the same. So really, this is just setting pending to false, which feels like it shouldn't require allocating a new descriptor. This is really just setting pending = false; it shouldn't need full RCU, but okay. And then it does state.compareAndSet, so this is self.state[id].compare_exchange(curdesc_ptr, new_desc_ptr). Like, this sort of feels like it could maybe just be an AtomicBool that we update. But then self.tail.compare_exchange(last_ptr, next_ptr), ordering, ordering. And it doesn't even check whether these succeeded or failed, which is kind of interesting — it just does them. I really feel like that could just be an AtomicBool, but let's not mess with the algorithm too much. Okay, so I think that only leaves help. So this is help, and this is help_enq. I guess let's do help_enq: fn help_enq, id is usize, phase is u64, on &self. While self.is_still_pending — last_ptr and next_ptr, I guess this is really just the same as what we did here.
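That "really just setting pending to false" step from a moment ago can be sketched on its own: since we can't atomically flip one field of the descriptor in place, we allocate a copy with pending = false and CAS the slot pointer over. A failed CAS just means another helper finished the same update first. Names here (OpDesc, finish_desc) are my stand-ins, not the stream's final code.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

struct OpDesc {
    phase: u64,
    pending: bool,
}

// RCU-style update of a descriptor slot: publish a new descriptor that is
// identical except pending = false. We deliberately ignore the CAS result,
// as the algorithm does: losing the race means someone else already won.
fn finish_desc(slot: &AtomicPtr<OpDesc>, cur: *mut OpDesc) {
    let new = Box::into_raw(Box::new(OpDesc {
        // safety is hand-waved here, as in the stream: descriptors are
        // never freed, so `cur` stays dereferenceable
        phase: unsafe { (*cur).phase },
        pending: false,
    }));
    let _ = slot.compare_exchange(cur, new, Ordering::SeqCst, Ordering::SeqCst);
}
```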
Right? And if last is equal to — I see, so: the tail was concurrently updated. This just seems like a huge waste. Like, here you read the updated pointer, so why not use that the second time around the loop? We're reading the tail pointer, then we're reading its next pointer, then we're reading the tail pointer again to see that it didn't change. Okay, that's reasonable: you want to make sure that the next pointer you got actually corresponds to that tail. That's fine. But the next time around the loop, you could save yourself a load. Let's just leave it the way it is for now. If next_ptr is null, then do something; otherwise self.help_finish_enq — so this is: tail is not up to date, help update it. And this is a continue. Okay. And then here — this is next_ptr is null — it calls is_still_pending again: the phase is already over, so in that case we can continue. But I'm guessing that if the phase is over, it's going to remain over, I would think. So I think this can just be a return. I mean, it's just going to end up checking twice, which is fine, but I guess: TODO: can this just return? Notice that instead of doing this indentation, I'm just inverting all the conditions and returning early. I find it leads to slightly easier-to-read code, but to each their own. If last.next.compare_exchange(next_ptr, node) — node is self.state[id].load… This is a very involved protocol. I'm not explaining why it's safe, because I don't necessarily know. Wait-free algorithms are super complicated, so I'm sort of taking it on blind faith that the implementation that's provided is correct. So this is going to be curdesc_ptr; curdesc is unsafely dereference that; compare_exchange this with curdesc.node — and I guess this will be an unwrap_or_else(ptr::null_mut)… this Java code needs to be in a comment, because otherwise I can't see what I'm typing.
I see — this is like: if that is Ok, then help_finish_enq and return, is what the Java code says. So what is this doing? This is helping enqueue a thing. While the phase is still pending — which means there are still some operations that some threads have to complete in this phase — try to make sure that we know what the tail is. This is basically us checking that we have a tail, and a next pointer for that tail, that we know are both up to date — a consistent pair, if you will. That's what this is doing up to here. So here we know we have a valid tail/next pair, and that it likely still needs to be updated, so let's try to actually execute the enqueue. That's where we're at here, and we do that by looking up the to-be-enqueued node from the enqueuing thread's descriptor. So we look up the descriptor for the enqueue operation for that thread, and then we try to update the last node's next pointer to point to the node that is to be inserted. And in fact, I think this can even just be an expect — pending enqueues always… it's funny, this doesn't actually check whether this is pending, or check that we're actually trying to help something that is an enqueue. So — no, I'm actually going to go ahead and leave this as-is, because this would only be safe to unwrap if we know, here, which we really should. So here — where did I jump to? — this should be an expect: "node should always be Some for a pending enqueue". And here we should really do an assert — we can have it be a debug_assert, that's fine — that this is indeed a pending operation and that this is an enqueue operation. And in fact, if this operation is not pending, then why are we still executing it? That's the other thing I want to find out: if this is not pending — TODO: can we continue? It feels like we can probably assert
something about this. We should certainly debug-assert that this is indeed an enqueue operation, because you shouldn't be calling help_enqueue if that thread doesn't have an enqueue operation anymore. I feel like we can probably assert here that it is still pending, but I'm not confident enough to make that an assertion. This needs to be id, and then I guess help is all that's left. And what's nice is that we should be able to write a unit test for this specifically. That's definitely something I want us to write, because this is really gnarly code, and even just the single-threaded variant, it doesn't even need concurrency, I just want to see that the code we wrote is correct. Of course it's going to leak all sorts of memory, but that's fine, as long as we check that the logic is actually correct. This is for i... in fact, it doesn't even need that, this is just: for descriptor in self.state. The pointer is desc.load; I mean, this is the desc atomic, and then this is the... ooh, that's not what I meant to do... this is the desc pointer. And then: if desc.pending and desc.phase is less than or equal to phase, and then there's an additional if, which could sort of be collapsed into this, but really what this condition means is "this operation needs help", and then: if desc is an enqueue, which is currently the only helpable operation. So this feels like it could probably be simplified a decent amount. My guess is that they started with a wait-free queue implementation that supported other operations, but then stripped it down to only the things they needed for the wait-free simulation stuff, which is basically: you need peek, try_remove_front, and enqueue. As long as you have those three, probably a bunch of the other operations just went away, and so here there probably used to be something like "if it's a delete operation, then help in some other way".
But in this case there are only enqueue operations to help, so that's just the only one left. Why does help_enqueue take an id? I see, so this is self.state.iter().enumerate(), and this takes the id. It's a little unfortunate here too, because the other thing that having this be an array does is that you actually need to iterate through all of the threads every time you want to help. It's an array, not a vector, so you can't skip to the first one that needs help; you have to iterate through all of them, so this is linear. So if someone goes "let's just make this support 1024 handles" or something, then this code is going to iterate through 1024 entries even if you only have two handles, which seems maybe unfortunate. Maybe we should document that somewhere: help operations are linear in the number of handles, so be careful. This doesn't compile, because that's not right. I see what's going on: the way we're going to use this help queue is that it's going to store pointers, so when we return a reference to T, it's really a reference to a pointer. So I think what we're actually going to do is have this be T and just say where T is Copy and PartialEq and Eq, and then now this is no longer that. That's fine; this can now just be an expect. Where's the other place I used as_ref? Because this is Copy, this can now just be that. So that's nice. Ooh, only warnings... no, something's wrong. Oh, private type. Oh right, because we made this be a type alias, which means that this type now actually is the one we're really using, which means it also needs to be pub(crate). Which is a little dumb, but it means we don't have to implement all these forwarding functions here, which would be a little unnecessary. It also means I can get rid of this. And here we specifically don't care about
the result of this compare exchange, which is a little weird, but should we assert on these? I think the answer is no; if they fail, we're just going to continue helping anyway, so we're not going to assert. Unused import, that's good, that makes me happy. And the deq id was probably there just for debugging, which means we could put it under cfg(test) and ignore it entirely; in fact, I'm just going to go ahead and remove it, because it complicates the code unnecessarily. And now we don't need AtomicIsize, and that makes me happy. And it compiles. "Why does the unsafe block on the desc deref only last one line? Shouldn't it last for the whole function?" The unsafe block on the desc... it's done in help... what unsafe block? I'm not sure I follow. All of these unsafes are because we never deallocate; that's the answer to why all of the unsafes are safe. Oh, I see, the question is slightly different. The question is: why is it okay for this unsafe block to end here, when we keep using the reference until the end of the enclosing block? Shouldn't that mean that the unsafe block also needs to extend down to here? And the answer is no. The answer is that unsafe in Rust is an encapsulating annotation: when you annotate something as unsafe, you are promising that what you are doing in that instance is safe. In our case, this unsafe block is promising that taking this raw pointer and constructing a reference to it, with the returned lifetime, is safe. So what we are saying here is, first of all, that this is a valid pointer: it's aligned, it's not null, and whatnot. And we are also claiming, by writing unsafe here, that the pointer is going to continue to be valid for as long as this desc lives. Basically, think of this unsafe as really turning a *const T into a &'a T, and so the unsafe promise
that we are making is that this pointer really is valid for 'a, and that 'a here is until the end of the scope. Okay, we are at the five-and-a-bit mark, so I think we are probably going to end here. At least now we have what is, in theory, a complete implementation: we have the wait-free queue, and we have all of the simulation stuff. What's left now is testing, obviously. In particular I want to test the help queue, because that's some super gnarly low-level code, but we also need to write a data structure that uses this, and then test the overall simulation. And then, separately, we now have all these issues around memory reclamation: all the stuff that's currently unsafe is only safe because we never deallocate, and that's not good enough. We need to deallocate memory, and in order to do that we need to implement something like hazard pointers. Or, now that we are sort of forced to have this handle approach, where it's not that you share a single wait-free simulator but rather fork it for each thread that needs it, it might be that we can just go with something like crossbeam-epoch, which already has a guard primitive that we might be able to use for the memory reclamation here. I'm not quite decided either way. I kind of want to port the hazard pointer stuff anyway, because hazard pointers are cool and neat in their own right, but that's an obvious shortcoming of this implementation right now too. So I think from here there's a branching point: we could either go "let's just test it and not worry about memory reclamation now", or we could go "let's fix memory reclamation and then test it". I'll see which one I feel more like. My guess is the next stream will be in like three weeks; that's been the schedule lately, and I'll try to stick to that. So yeah, now that it all compiles and we have code, I don't think we have any big to-dos. That's
fine, that's fine. I don't think we have any actual todo! macros left, so I think we wrote all the code, a first draft of all the code at least. So I'm happy to end it off there. Before I do, though, are there any questions about what we did today? I know that's a large question, but things where you're still like "I'm a little fuzzy on how this piece fits into what we did", with the handles, say. Anything I can clear up so that you don't have lingering questions for the next three weeks or six weeks or so. "What would point in favor of testing last: it would be good to see whether it actually works before memory reclamation gets implemented." So my concern with testing first is that the code may end up having to be re-architected a bit to do memory reclamation, so the testing we do may not buy us that much confidence. That said, if we run the tests and they all pass, that's great; that's obviously a big point of value. But my concern is: we add memory reclamation, a bunch of the code ends up shifting around, and even though the old tests passed, that doesn't give us any indication of whether the new implementation is correct, so we may end up having to redo most of the tests. I don't think that'll be true, though; I think these are fairly orthogonal. The first stream on memory reclamation will be a new library that just exposes memory reclamation, and then there'll be a third stream that brings that into what we've done here. So what we might do is implement hazard pointers, for a breath of fresh air, then do a stream of writing tests for the non-memory-reclaimed one, and then do one where we bring it all together. "Does the wait-free queue require a wait-free queue to be considered wait-free? How does contention get resolved for this queue?" So this queue that we just wrote is itself wait-free.
There's no inner queue in this one; the simulation we wrote depends on a wait-free queue, but this wait-free queue is wait-free in and of itself. And if there's contention, basically what happens is: every peek is obviously wait-free, there's nothing that waits in there. Every try_remove_front never waits; it does call help_finish, but help_finish is guaranteed to make progress in a finite number of steps, so it's wait-free. The complicated one is enqueue, and the trick is that it always immediately stores that it wants help, and then it tries to help everyone in its phase. So if I enqueue, I must help others that are stuck before me in order for my own operation to make progress, and therefore everything makes progress, in these phases. Alright, I'll push this code to GitHub as well and include that in the video description, so you have a place to actually look at it. There's still some way to go; my guess is another two or three streams. I'm imagining that porting hazard pointers to a library is a stream in and of itself, maybe two, probably one, it's a little hard to say, but let's say it's one. Testing this is like half a stream, depending on how many problems we find. Moving all of the code to use hazard pointers, or whatever the memory reclamation scheme ends up being, is about a stream. So it's like two and a half streams left, is my rough guess. "If getting a handle is not wait-free, why does it not affect the wait-freedom of the queue?" That's a good question. The idea is that every thread is going to have its own handle, so once all the threads have handles, all operations are wait-free; getting a handle is not wait-free, and it's important to keep the two distinct. Getting a handle is something you do very early on; it's almost setup, it's initialization. In the main running of your code you won't be forking, or rather cloning, new handles, at least you shouldn't be. When I say testing is half a stream, I meant testing
of the wait-free queue. Testing the whole thing we should probably not do until the memory reclamation is in place too, for the reasons I outlined earlier, so I think realistically it's like three streams. Thank you all for coming out; I hope this was interesting. I'll see you in three weeks, and I'll let you know in advance whether we do hazard pointers or more of this. So long, farewell, auf Wiedersehen, goodbye!