Hi everyone, welcome back to yet another... I guess we might as well just start calling them Impl Rust streams, because that's what everyone is calling them now anyway, which I guess means I need an intro for them. I need like an Impl Rust intro. I don't know; Crust of Rust was easy to make an intro for, because I just eat a pie. I don't know what the Impl Rust one is going to be. But regardless, this is the fourth installment, and I'm going to claim that it's the last installment, of the stream where we port the implementation of hazard pointers from C++, specifically from the Folly library, to Rust. I'm not going to go over what hazard pointers are and everything in this stream; we're pretty far along. The reason I say this is going to be the final part, and this is related to the tweet I made about this in the first place, is that I think we've gotten so deep now that anyone who's not interested in hazard pointers is thinking, okay, we need to do another stream that I'm actually interested in following. And there are also diminishing returns from this particular stream now, because this is a topic we've covered so much ground on. I wanted to do one more because there's some stuff around testing and benchmarking that I think is going to be worthwhile to cover. I'm hoping it won't be that long; I'm guessing this will be about a three-hour stream to cover those bits, because mostly it's going to be porting the tests and benchmarks from Folly and then probably trying to adapt them to be a little more Rust-like.
So in particular, we're probably going to make some of the tests be represented using Loom, so that we get more exhaustive checking rather than the stress-style testing that's in the C++ implementation. And if there are benchmarks, which I'm pretty sure there are in Folly, we're going to port those to Criterion, which is this really nice benchmarking library in Rust that gives us statistical significance for results and that kind of stuff. So I think that's going to be interesting. And even if at the end of today's stream we're not quite done with those things, or even if we are, whatever is left I'm probably just going to do on my own time, asynchronously, to make this a library that people can actually pick up and use. Before we dive in, I want to cover a couple of changes since the last stream that people submitted via PRs that I think are worth talking about, so I'm going to bring those up real quick. The first one hasn't landed yet. If you remember from the previous stream, one of the things we ported in part three was sharding the list of free hazard pointers to avoid contention on it as much as possible; you then look in the appropriate sharded list. The sharding function we used there was a little naive, and someone has implemented a much better one. It's unclear whether it's worth the trade-off. This is PR number 12; it's really exciting, and there's an interesting discussion there if you're interested in this topic. I don't think it's going to land quite yet, because we need proper benchmarks to see whether it's worth it, and that's where today's stream comes in. But it is a neat change that's worth looking at. The other is a PR of three different fixes and a feature that all landed at the same time.
One was that it turned out there was an error in how we updated the next and previous free pointers when we introduced the linked list of hazard pointers that we can walk for the free list. The second was that we wouldn't correctly link the list returned from allocate when you batch-allocate a bunch of hazard pointers; those weren't properly relinked before being returned. And finally, there was a head assignment that was wrong in a loop. All of those have been fixed now and have also gotten tests added, so that's really nice. As part of that PR, we also got this hazard pointer array implementation, or feature, that Folly has, which is "I want to allocate four hazard pointers at once". We now have an API for that in haphazard. It's pretty straightforward, really: it just means that you steal a chunk of things from the free list at once, rather than stealing them one by one by one, which lowers the overhead; it amortizes the cost of getting these hazard pointers in the first place. It's a fairly simple API. It uses const generics in order to give you back an array of the size that you specified, which is really neat. We also landed no_std support. This is a really cool PR too, just figuring out what this should look like if you only have alloc. Mostly it was straightforward; there were only really, I think, two big caveats to no_std support. The first one is that we can't use HashSet. Remember how we walk all of the addresses that are currently guarded by a hazard pointer and stick them into a set, then walk all of the things that have been freed and check whether any of them are guarded? We were using a HashSet for that.
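As a rough illustration of the chunk-stealing idea with const generics, here's a minimal sketch. The `FreeList` and `HazPtrRecord` names are made up for illustration and are not the actual haphazard types; the point is just that one lock round-trip hands back N records at once.

```rust
use core::array;
use std::sync::Mutex;

// Hypothetical stand-ins -- not the real haphazard types.
struct HazPtrRecord(usize);

struct FreeList {
    free: Mutex<Vec<HazPtrRecord>>,
}

impl FreeList {
    /// Steal up to N records from the free list in one lock acquisition,
    /// allocating fresh records for any shortfall. One round-trip instead
    /// of N amortizes the cost of acquiring hazard pointers.
    fn acquire_many<const N: usize>(&self) -> [HazPtrRecord; N] {
        let mut free = self.free.lock().unwrap();
        array::from_fn(|i| free.pop().unwrap_or(HazPtrRecord(1000 + i)))
    }
}
```

The caller writes `acquire_many::<4>()` and gets a `[HazPtrRecord; 4]` back, which mirrors the array-of-requested-size shape described above.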
We've now switched to a BTreeSet, because HashSet depends on HashMap, and HashMap depends on the standard library, because it relies on the denial-of-service-resistant hashing that's in the standard library, which depends on randomness, which depends on the operating system. There are ways to work around this if we wanted to work harder to keep a hash set here, through something like hashbrown, maybe with ahash for the hashing, and just not have collision-resistant hashing. But it seemed fine to just switch to BTreeSet; it doesn't cost us that much. The other thing that is a little tricky: if you remember from last time, we have this batch deallocation mechanism in haphazard, which we inherit from Folly, where when you free something, it doesn't necessarily get reclaimed right away. Instead, we stick it on this free list, and sometimes when things are freed, we go and check the free list and see if we can reclaim a bunch of items. One of the ways in which we determine that it's time to go do that is if a certain amount of time has passed since the last time we did it. The idea is that we want to give guarantees both in space and in time for how long we can hold on to garbage: if there's a certain amount of garbage, we want to reclaim it, or if a certain amount of time has passed, we want to reclaim it. Otherwise, that memory could just sit there forever and never get deallocated if no more garbage arrived. But in a no_std environment, of course, you don't have a notion of time. So the question becomes, well, what do you do about that? And the answer is we basically just disable that feature and say that in no_std mode, we only do collection, or reclamation, based on the number of items that are in the trash, and not on how long it's been.
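The space-versus-time policy can be sketched as a single predicate. The thresholds and the Option-for-no-clock representation here are my own assumptions for illustration, not haphazard's actual code; the idea is just that when no clock exists, only the count guarantee applies.

```rust
use std::time::Duration;

// Made-up thresholds, purely for illustration.
const RECLAIM_COUNT_THRESHOLD: usize = 128;
const RECLAIM_TIME_THRESHOLD: Duration = Duration::from_secs(2);

/// Decide whether a reclamation pass is due. `since_last` is None when
/// no clock is available (the no_std case), so only the space guarantee
/// can trigger reclamation there.
fn should_reclaim(retired_count: usize, since_last: Option<Duration>) -> bool {
    retired_count >= RECLAIM_COUNT_THRESHOLD
        || since_last.map_or(false, |t| t >= RECLAIM_TIME_THRESHOLD)
}
```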
We might be able to improve this, right? You could imagine using something like RDTSC, the read time-stamp counter, which is the CPU instruction for getting the current time; I think there's a crate called quanta that uses that directly. So we might be able to find some way to reintroduce a notion of time here, but it seemed not worth blocking the PR on. There's a third, smaller PR or two, which is that there's at least one point in haphazard, I think there are two, where we have a spin loop: if we detect that someone else is also collecting garbage at the same time, we yield and then check again to see if they completed. Of course, yielding is an operating-system primitive; it's not something the raw hardware supports. So we had to figure out what to do with that. We ended up using the standard std::hint::spin_loop method, which is a way to instruct the CPU that this is a spin loop, so it might be able to run another hyperthread or something. It's not perfect, but it's the closest approximation we have. So that was neat; that's landed. And this was a neat one that I actually completely missed when we were reading the Folly code, which is that the compare-exchange function from the C++ standard library, which is what Folly uses, has a particular behavior when a compare-exchange fails. Remember how it works: you pass in what you think the current value is and what you want to replace the value with. If the actual value matches what you expected, the swap happens; otherwise, compare-exchange tells you, no, I didn't make the swap, here's the value that was there instead of the one you expected. In C++, it will actually update the value that you passed in as "this is what I expect the current value to be"; it updates its first argument. Whereas in Rust, it just returns that value in the error case.
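The yield-versus-spin-hint fallback pattern looks roughly like this. The `wait_for` function and the `std` feature gate are illustrative assumptions about how such a crate might be organized, not the exact haphazard code.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Wait until another thread finishes its reclamation pass. On std we
/// can yield to the OS scheduler; with only `core` available we fall
/// back to the spin-loop hint, which tells the CPU this is a busy-wait.
fn wait_for(in_progress: &AtomicBool) {
    while in_progress.load(Ordering::Acquire) {
        #[cfg(feature = "std")]
        std::thread::yield_now();
        #[cfg(not(feature = "std"))]
        core::hint::spin_loop();
    }
}
```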
And that means, and you can see this in the diff actually, that we missed the case of needing to update our current value before we loop again. In C++, this happens for you; that's why there was no code to do it in the C++ source, because it wasn't necessary, but that means we missed it in the port. So that's now been fixed. And we also looked through all the other compare-exchange call sites and made sure they actually match the expected semantics. So that's where we're at; that's what's happened since last time. I did also go ahead and look at the commit history of Folly in the synchronization folder. The commit from the last stream is this one, and after that, from looking through briefly, there are no commits around hazard pointers except this one, which I don't think is relevant to us: it's for the thread-local implementation, which we have not ported. So that commit is irrelevant, and all the later commits are to other bits of synchronization, so they don't actually matter for our implementation. Luckily, there's nothing to forward-port; there are no known bug fixes or anything as far as I can tell. All right. One more thing before we dive into the tests, which I have open here: I want to talk briefly about this survey, because it was really fun. I don't know if you can see the survey results. I guess not. So this is the result of me asking whether people wanted a part four, and the results were fairly unhelpful, but made me do this one last stream: about 40% of you wanted another one. Only 3% of you actually care about the library, which is fine; that's still a decent number of people given the number of votes, so that incentivizes me to actually get the library out there. About a third of you want to just do something else instead, which is why this will be the last part.
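Here's a minimal compare-exchange loop showing the difference being described. The reassignment in the `Err` arm is exactly the step that C++'s `compare_exchange` does implicitly by rewriting its first argument, and which therefore had no explicit code in Folly to port.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Atomically add `n` via a compare-exchange loop, returning the
/// previous value. Forgetting the `cur = actual` reassignment (the bug
/// found in the port) would retry against a stale value forever under
/// contention.
fn add(counter: &AtomicUsize, n: usize) -> usize {
    let mut cur = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(cur, cur + n, Ordering::AcqRel, Ordering::Acquire) {
            Ok(prev) => return prev,
            // Rust hands back the observed value in Err; C++ would have
            // already written it into our `expected` variable for us.
            Err(actual) => cur = actual,
        }
    }
}
```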
And then somehow a quarter of you just like answering polls and don't actually care. So I thought that was a fun result to share. All right. I'll send the haphazard repo link in chat just so that you all have it. I'll also send, sorry about that, the link to the hazard pointer C++ standardization proposal, so that you have that for reference too. Oh, I see what's happening. Some people in chat are observing that I'm on the work side of my office. I'm not, actually. I realized last time that my video was flipped; I realized this because I held up a copy of my book and everyone was like, it's backwards. So I have flipped the video, but I am sitting in the same spot. So we're all good. Let's see. Great. Okay. Now that that's out of the way, let's go ahead and pull up the code here. I have a diff. Why do I have a diff? Oh yeah, this is an observation I had separately; I don't think we're going to go into that right now. So, in our tests directory, we have the Loom tests that we've written, a couple of those; we might expand some of those today. We also have the lib test, which arguably should get a better name. What I want to do is copy the lib test to a folly test. And the reason I want to do this, and it's the same argument for why, when porting, we've been using the same names for things, keeping the code organized roughly the same way, and having the same functions as the C++ code: it's much easier to then forward-port bug fixes and features that land in Folly into this. And in theory the other way around too: if we discover problems, or performance enhancements we can implement, it should be easier to figure out where in Folly those could be contributed back. I want to do the same for the tests, so that it's very clear how our test suite maps onto theirs.
Now the hope, of course, is that our test suite becomes more expansive through things like Loom. But at the very least, the tests that they have, I want represented in one place where we can easily compare the two. So let me just check that cargo test works. Cargo test works, great. And if I run the folly test: great, zero tests. Okay, that's where we want to start. Are there any questions before we begin? Let me make this larger, actually. How wide are their lines? We can zoom more. I think that's the limit. Great. This is why line wrapping still matters, folks: it lets you zoom. I think that was covered extensively in the Q&A video from last weekend, well, two weeks ago. All right, let's see here. It looks like they have benchmarks in here too. We'll look through and see what they have the most of, but my intuition is that we should do the tests first, because the benchmarks are what we're more likely to want to iterate on. Oh yeah, this is a vim thing I do a lot: instead of doing dG for delete-to-end-of-file, I often just do something like 1000dd or 10000dd. I don't know why. I think it's because I can never remember whether Shift-G or gg is bottom-of-file, so it's faster to just use the thing I know will work; a vim pro would just use dG. It's funny, actually: looking just at the top of their tests, remember how when we wrote our tests, we made this type called CountDrops? And lo and behold, they have a Count class that counts the number of destructor calls. So I'm guessing we might find this helpful. They also count constructor calls, which is a little interesting; I guess that's because they have a move constructor, which we don't. Let's take a look here. Node. I wonder why they have... I see, they're probably going to implement some of these on top of a linked list.
So if you remember, the Folly library has support for working with linked data types, where a given object might have children, and if you deallocate the object, you also retire all of its children. They probably have tests that exercise that child recursion, which we don't have an implementation for, so I think that will probably be less relevant for us. Although we might want a data type like this anyway: this is basically the data structure they need to test with, right, where they can do retires and stuff. So we may want to do something similar. Construct a list of a given size: this just creates a giant linked list. Allocation of a linked list, helpers for working with the linked list. So a lot of this is just data structures that they're going to test over. Basic objects test. Oh, I see what's going on here. Okay, so they have a Count type, but the Count type is really just a static global that they use to keep track of things. We probably don't want to do this, because it means you can't run the tests in parallel. Right? If you have a static global like this, then running tests in parallel won't work, because they will all increment and decrement it separately, so you can't reliably check the results. But if you look at it, Node, for example, calls inc_ctors whenever a node is created, on this static stats-keeper, basically. So I'm going to guess that if we look at the test down here, let me highlight this line. Yeah: it clears the counters, then allocates and immediately retires the object. It does the same for this NodeRC: it does acquire_link_safe and unlink. I see, and I'm going to guess here that unlink on NodeRC is... where's NodeRC? Up here somewhere. NodeRC. I don't see an unlink. It might be that they inherit it; oh, they probably inherit it from here.
This is going to be a little bit annoying, because they rely a lot on the base class they have that provides these additional functionalities. It's not going to be too painful, though. I'm just wondering if they have an unlink method directly on the list, or where they get that from. acquire_link_safe. Okay, so these tests are really just a bunch of create, then maybe acquire, maybe not, then immediately reclaim. And at the end, check that the number of constructor calls matches what you expect, and likewise for the destructor calls. But notice that because this is using the static, you couldn't run two instances of this test at the same time. And you also couldn't run any of these other tests that rely on c_ at the same time. Because if you did... let's see if there's anything else here that checks c_. Yeah, this test also calls clear, for example, on the stats, so that wouldn't work. All right, I'm a little tempted to skip this first one. I guess we can do it; we might as well just add this. Maybe we should just port this whole linked list so that we have it, because it sounds like it's something they're going to use in a bunch of their tests. The counters are atomic, yes, but it's not the atomics that are the problem. The problem is: imagine you run two tests in parallel; one calls, say, inc_ctors, which is atomic, and one calls clear, which undoes that increment. But then the first test progresses and expects that number to be one, and it's been cleared, so it's zero. So you can't run them in parallel. Yeah, this business over here smells an awful lot like... oh no, why does the search bar go away when you zoom in? Let's see where that comes from. Yeah, so that comes from this hazptr-obj-linked type, which is something we haven't implemented; it's the link-counted-objects-with-automatic-retirement stuff.
This is stuff that we haven't implemented in ours, which is what makes me hesitant to implement these tests ourselves: I would rather port them when we actually have the necessary features. That's what makes me want to skip that first test, since it seems to specifically rely on this linked property. But maybe we can do the first two, just so we have the basis for the tests. So let's go over here and, just to get into the rhythm of it: it's the basic objects test, and we don't need the test suffix, because we know it's a test. Do they actually use the generics here? It's a little unclear, but they might. So the question is, how are we going to do the equivalent of what they're doing for constructors and destructors? All right, let's do the same basic structure that they have. Actually, I really don't want to do it that way. Let's do struct Stats... actually, I guess we can call it Count, because that's what they call it, and it's going to have AtomicUsize fields for constructors and destructors, and retires. And then I think what we'll actually do here is say that Node is going to hold a static reference to a Count. And the way we're going to do this is probably with Box::leak: we're just going to leak a counter per test. That way each test has its own, but we still produce 'static values. And they're so tiny that leaking them doesn't really matter to us; it doesn't actually constitute a problem. It would be a problem in something like a benchmark, where you might run it millions of times, but realistically, for these tests, these allocations aren't going to matter. All right. So what else do they have in Node? Is it just a singly linked or a doubly linked list? Node up here is a value and a next; the value is a usize, and next is an AtomicPtr to a Node. And what else have we got?
So there's impl Node, with fn new, which takes the value and returns Self. That's easy enough. It also takes, I guess, a next, which is a *mut Node, and in their version it takes a bool that defaults to false. Why does it take a Boolean? It takes a Boolean that defaults to false, with no name. Odd. That's fine; we don't really care about that, we can leave it off. And then, I guess it also has to take the static Count, and it's going to create a Self where the count is there, the val is there, and next is AtomicPtr::new(next). We're also going to do count.ctors.fetch_add(1). Why on earth does my tab-complete not work anymore? That makes me sad. And I guess we'll implement Drop for Node too, because we want self.count.dtors.fetch_add(1). What else do they have on Node? They have a value, they have a next. So I guess we can provide next(), which takes self and returns... I'm hesitant; I feel like maybe this should be const. It's going to be self.next.load with Ordering::Acquire. AtomicPtr::new requires a mut? So I guess it needs to be mut; that's fine. So we've got new and next, and I guess we might as well have a val on here too. Oh, this is the problem of having a different keyboard for work and for home: I keep putting my finger in the wrong place. Great. So now we have this basic objects test. We're going to allocate a Count, and I guess here we're actually going to derive Default and Debug, so the count is going to be Count::default(). Actually, I think what I want here is impl Count, with something like a constructor that returns a &'static Count, and it's going to be Box::leak(Box::new(Self::default())). And we can't call it static, because that's a keyword. Should it be global? It's not really global. I mean, it is static...
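Putting the pieces from this section together, here's a sketch of the leaked per-test counter and the node type. The names follow the stream, but the orderings and exact field layout are my guesses at what the code might look like, not necessarily what landed.

```rust
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

#[derive(Default, Debug)]
struct Count {
    ctors: AtomicUsize,
    dtors: AtomicUsize,
}

impl Count {
    /// Leak a fresh counter so each test gets its own &'static Count.
    /// This sidesteps Folly's shared static, which forbids running the
    /// tests in parallel, at the cost of one tiny leak per test.
    fn test_local() -> &'static Count {
        Box::leak(Box::new(Self::default()))
    }
}

struct Node {
    count: &'static Count,
    val: usize,
    next: AtomicPtr<Node>,
}

impl Node {
    fn new(count: &'static Count, val: usize, next: *mut Node) -> Self {
        count.ctors.fetch_add(1, Ordering::AcqRel);
        Node { count, val, next: AtomicPtr::new(next) }
    }
}

impl Drop for Node {
    fn drop(&mut self) {
        self.count.dtors.fetch_add(1, Ordering::AcqRel);
    }
}
```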
Or test_local, maybe. We could use a thread-local here, actually, but given that I expect these tests will end up being multi-threaded, we're going to stick with this instead. So now the count is going to be Count::test_local(). Then let's see what they actually have in this test down here. They have a clear, which we don't need, since we create a new counter for each test. They have a mut num equal to zero. Then, in a scope, they do num += 1. That seems fine. They do let obj = Node::new, and then they retire it. So here we've got to think back to what our API is. I mean, we could probably use the global domain here, but let's have a domain per test too; that seems pretty reasonable. And right, remember, our API is still a little bit ugly here, in that you need to use an AtomicPtr and this hazard pointer object wrapper. Actually, we might not even need the AtomicPtr, given that we're not inserting it into a data structure. So I'm just going to do this. Right, so the first argument is the domain you want to make the object within, sort of guarded by, and the other one is the object you actually want to make. And Node::new here is going to take the count. They're probably using zero, and I guess the value doesn't really matter here, and there's no next. Then they immediately call x.retire. And, right, we need unsafe; this complains because retire is unsafe. This is the reason why we need the whole AtomicPtr bit. So if I do this, that should do it. Really? A deleter, right. The deleter here is going to be... where's our retire? This business. Yeah, the API is not very ergonomic here, which does make me sad, but it sort of has to be. So, x.retire. I don't think we need this to be an AtomicPtr, though; I think it just needs to be a raw pointer. Right, so the interface here is: we allocate a new node, that's fine.
We have to wrap it with this hazard pointer object wrapper. I would like to get rid of this wrapper, but what it does is associate the object with a pointer back to the domain, so that when we later retire the object, it knows what domain it should use to retire itself and what domain is going to reclaim it, rather than that having to be passed in every time you retire, which is more error-prone, right? You'd rather just have the object know. And then we have to turn it into a raw pointer, because otherwise, if we just kept this, then when we dropped it, it would just be dropped normally; it was never shared in the first place. So we create a raw pointer, and that's what the API operates on too, right? It only operates on raw pointers, because it doesn't know whether the object was allocated through a Box or an Arc or whatever. Maybe we should make this API require Box, or maybe make a simpler version of the API that assumes everything is boxed. That way you wouldn't need to pass which deleter to use, because we would always assume it's the drop-the-box deleter, and you could probably give it the Box directly. But the reason we use raw pointers here, first and foremost, is because we're assuming this is going to be used in a library that uses AtomicPtrs in various places, because it needs to swap pointers for the data structure's concurrent operations. And that means that all you will ever have are raw pointers, and therefore that's what the API has to operate on. It also couldn't be a Box here, because then when the Box goes out of scope, it's deallocated, but it might still be shared; that's the whole purpose of having hazard pointers in the first place. So it needs to be a raw pointer. All right. Let's just see that this first bit works. So now we should be able to do assert_eq.
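To make the raw-pointer-plus-deleter reasoning concrete, here's a toy model of the retire flow. This is NOT the haphazard API; it's just an illustration of why the domain stores (pointer, deleter) pairs: the domain can't know how the object was allocated, so the caller supplies the matching deleter.

```rust
use std::sync::Mutex;

type Deleter = unsafe fn(*mut u8);

#[derive(Default)]
struct ToyDomain {
    retired: Mutex<Vec<(*mut u8, Deleter)>>,
}

/// The "it was a Box" deleter: rebuild the Box and drop it.
unsafe fn drop_box<T>(p: *mut u8) {
    drop(unsafe { Box::from_raw(p as *mut T) });
}

impl ToyDomain {
    /// Record a pointer for later reclamation instead of freeing it now.
    /// Unsafe because the caller promises `p` came from Box::into_raw
    /// and won't be retired twice.
    unsafe fn retire<T>(&self, p: *mut T) {
        self.retired.lock().unwrap().push((p as *mut u8, drop_box::<T>));
    }

    /// Reclaim everything, returning how many objects were freed. (A
    /// real domain would first skip anything still guarded by a hazard
    /// pointer.)
    fn reclaim_all(&self) -> usize {
        let mut retired = self.retired.lock().unwrap();
        let n = retired.len();
        for (p, del) in retired.drain(..) {
            unsafe { del(p) };
        }
        n
    }
}
```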
Actually, I guess we can make some convenience methods on here, like ctors, which returns a usize, and that's going to be self.ctors... actually, what did they use for the load? I'm guessing it's an acquire load, but I want to make sure. It's just a .load. What is the C++ atomic default load ordering? There should never be a default load ordering; that's a bad idea. It defaults to sequentially consistent. Okay, so that means this is Ordering::SeqCst, and same with dtors. And then here we should assert_eq num with count.ctors(). All right, so we have registered as many objects allocated as we expect. And then they call, where is it, hazptr_cleanup. That's interesting. I forget whether we actually exposed something like that. Do we have a cleanup on Domain? Because if not, we probably should. We don't have a hazptr_cleanup. Let's go figure out how they have implemented theirs. hazptr_cleanup: that's fine, it just forwards. What does it forward to? The hazptr domain's cleanup method. Do we really not have a cleanup method on Domain, for example? That's interesting. I wonder why we didn't implement that, because it seems extremely useful. I mean, let's see what happens if I add pub fn cleanup. What do we do in the other tests we have? We called it eager_reclaim. I feel like eager_reclaim is not a real method; I think we probably shouldn't have made eager_reclaim. We should have called it cleanup and implemented it the same way as here, which is: bump the bulk-reclaims counter with a fetch_add(1), do_reclamation(0), and then wait for zero bulk reclaims. We don't have this wait_for_zero_bulk_reclaims; that sounds like something we need. Let's stick that in here too. This is good, though: it means we're going to end up with a more standard cleanup than this thing we just sort of hacked together, I think. Let's go ahead and find where this is declared. Free hazptr recs.
Here we're going to have fn wait_for_zero_bulk_reclaims. What's that going to do? It's going to be a while loop... oh, wow. Let's do while true for now. It's just a yield_now, which means that here we need this little cfg bit, because if you don't have the standard library enabled, then you can't yield; it's just a spin loop then. What about load_num_bulk_reclaims? That's probably not a method we have, but it is a method they have, which is an acquire load of the bulk-reclaims counter, and the loop continues while it is greater than zero. Why not "not equal to zero"? All right, fine. Great. Let's see that the tests still pass if we do that. 309. Oh, the old method, eager_reclaim, used to return the number of things reclaimed. So I guess what we can do here is keep eager_reclaim, and it's just going to return the number of things reclaimed; remember, do_reclamation returns the number of things it ended up reclaiming. And eager_reclaim is sort of a try... actually, maybe this should be try_cleanup. Although "try" sort of implies an error result is possible, and in some sense this is more of a maybe-cleanup, or do-the-best-that-you-can, whereas cleanup blocks while waiting, right? And then this is going to be that. Arguably the other tests should maybe be using cleanup, but the advantage of eager_reclaim is that you can assert that the number of things you expected to be reclaimed were reclaimed. Here, though, let's stick with what the C++ one does. And if we run the folly test: okay, basic objects passes. Great. So if we now go back to the tests, they have a NodeRC. What's the difference with NodeRC? It sounds like reference counting, but it's doing something with deleters and links. It's not entirely clear; it doesn't sound like it's reference counting. It looks like it sort of takes out...
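A sketch of what that cleanup shape might look like. The names and the exact counter protocol, in particular where the decrement happens, are my guesses rather than the code from the stream; the point is the structure: bump the in-progress counter, run a reclamation pass, then wait until no pass is in flight.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct Domain {
    num_bulk_reclaims: AtomicUsize,
    retired: AtomicUsize, // stand-in for the real retired list
}

impl Domain {
    /// Pretend everything retired is unguarded and reclaimable; return
    /// how many items we reclaimed, like do_reclamation does.
    fn do_reclamation(&self) -> usize {
        self.retired.swap(0, Ordering::AcqRel)
    }

    /// Blocking cleanup: reclaim, then wait for concurrent reclaimers.
    fn cleanup(&self) {
        self.num_bulk_reclaims.fetch_add(1, Ordering::AcqRel);
        self.do_reclamation();
        // Assumption: our own pass ends before we start waiting.
        self.num_bulk_reclaims.fetch_sub(1, Ordering::AcqRel);
        self.wait_for_zero_bulk_reclaims();
    }

    fn wait_for_zero_bulk_reclaims(&self) {
        // An acquire load, looping while the count is greater than zero.
        while self.num_bulk_reclaims.load(Ordering::Acquire) > 0 {
            std::thread::yield_now();
        }
    }
}
```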
Oh yeah, this is the bit that's related to the linked-object stuff. So I think we're going to just ignore that part of the test, because we don't have the functionality implemented, which then means all these other ones are also out of scope. That's fine, but at least we have the basic structure here. Copy and move tests. Let's try that. All right, copy and move. Now, this is going to be a little different in Rust than in C++, because in C++ you have move constructors, whereas in Rust you don't. Well, let me rephrase slightly: everything is a move in Rust, but you are not told when things are moved. Instead, when you move, the thing you moved out of is no longer accessible. Therefore, you don't need a move constructor, because passing something to a function is a move. Whereas if you want the C++-like behavior where moving is sort of a copy, that's what the Copy trait in Rust gives you. Which is why the semantics of these tests are going to be a little weird in Rust, maybe. So here, this isn't going to use the Count stuff at all; there's just a struct Obj, which holds a usize, and extends the object base. Yeah. So this is the copy constructor, and this is the move constructor. Their object base here is the same as our object pointer wrapper, so we don't really have the same thing. I guess we can look at what their move constructor does, but I'm going to guess that this doesn't really apply to us at all, actually. So, the hazptr object base, right? This is the same as our hazard pointer object wrapper, where we provide... this is the thing we provide the retire function on; basically the thing that implements this trait for you. So what else do they provide? They provide the set-reclaim, pre-retire stuff. But what are the constructors for this?
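A tiny illustration of the move-versus-Copy distinction being described: moving a non-Copy value invalidates the source with no user-visible hook running, while a Copy type just gets duplicated.

```rust
#[derive(Debug, PartialEq)]
struct Owned(u32); // not Copy: moving it consumes the source

#[derive(Debug, Clone, Copy, PartialEq)]
struct Plain(u32); // Copy: "moving" it leaves the source usable

/// Taking by value is a move; unlike a C++ move constructor, no code of
/// ours runs at the moment of the move.
fn take<T>(t: T) -> T {
    t
}
```

After `let b = take(a);` with `a: Owned`, any further use of `a` is a compile error, which is why a Rust port has nothing analogous to test for the move-constructor cases.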
The constructors for this are the same as the constructors for hazard pointer object. And what are the constructors for hazard pointer object? It's up here probably. Now that's the cohort stuff. Object list. Object. Interesting. This seems to just, in all cases, duplicate the pointer. That's all it really does, as far as I can tell: it just creates another pointer to the same thing. Which kind of makes this test nonsensical. I think the equivalent Rust code, rather, is here... I guess they don't even give a value, which means it defaults to zero, I think, in C++. So this really just creates another raw pointer with the wrapper. And for us, the wrapper is in the pointed-to memory. So really, if we're going to go by the same thing, `p1` is this, and `p2` is just `p1`. That's all it really does, which means `p2` is now a pointer to the same bit of memory. Which is fine. And then they call `p1.retire()` and `p2.retire()`. I see what this is testing, actually — that's interesting. I think what this is testing is: if you retire the same pointer multiple times, does it cause a problem? And it's interesting to me that it does that, because I wouldn't expect this to be okay. If you retire the same pointer multiple times, should that even be okay in the first place? That sort of implies that we're doing reference counting. Imagine there's a race here where this retires, then reclamation runs in some other thread and reclaims the memory, and then we call retire again. That suggests it would reclaim that memory again, even though it might have been reused. So that's interesting to me — I'm surprised that they expect this to be okay. It's also interesting that for them — let me look more carefully at this thing. So an object is a reclaim function pointer, a domain — this is sort of the cohort stuff — and the actual pointer to the object. I see.
So if we were to think about it in Rust terms, I think for them the construction is really this, right? The wrapper holds the raw pointer, rather than the raw pointer being to the wrapper containing the object. Which is different, because in that case this test tests something different: if you have multiple wrappers that point at the same object, does cloning them do the right thing? Does moving them do the right thing? Even then, I wouldn't expect retire on the same underlying thing to be okay, though. I do wonder whether wrapping it that way around makes more sense. I guess it's a question of: is the domain used to retire a property of the allocated memory, or a property of the pointer? I think there's an argument that it's a property of the pointer, because — well, this is one place where inheritance makes things nicer. Because if you look at the test lib, if we look at the type of `x` here, `x` is an atomic pointer to a wrapper of the type. And if the hazard pointer object wrapper wrapped the pointer, then you would end up with a double indirection here. I guess the wrapper could be generic over the pointer type as well, or just require that you use an atomic pointer. I mean, I think we could tidy up the API here if we wanted to. I think the way we would tidy this up is to say: we're going to require that you use `Box`, and we're going to require that you use an atomic pointer. If we did both of those things, we could encapsulate those decisions in the hazard pointer object wrapper, and then I think that could just be the top-level type. So rather than sort of all of this, you would just write this, and it would do the allocation for you. Or maybe it takes an argument that's already a `Box`, and it returns you this type, which internally contains an atomic pointer built from that box. The question is: should we do that change now, or should we leave this alone? This is where I'm like, this could easily be many more streams.
All right, so I'm going to leave a sort of XXX here, which is: idea — what if this contains an `AtomicPtr<T>` (a `*mut T`), where the `*mut` comes from `Box::into_raw`, and the argument to the constructors is a `Box<T>`? It would make for a nicer API. It will certainly make for a nicer API, because we could hide that, and maybe we could rename this to atomic pointer, right? Or maybe just `Atomic`. I think I've seen this in — I forget which library it is — this is in crossbeam, where you have this atomic type that is sort of garbage collected behind the scenes. And I think the idea here would be the same. The domain references may be a little sad, but I think that's a pretty interesting idea. I think that would make for a nicer API. And then the question becomes: should it be `Clone`? Should it be `Copy`? And then we have the copy-and-move test. I think without making that change, this particular test doesn't really make sense. Actually, let's look at retire. Do they have some protection in retire for if you double-retire the same thing? Retire is down here: pre-retire, set reclaim, push object. Pre-retire, check, set reclaim. There's nothing here that prevents a double deallocation, which is a little worrying. Aha — pre-retire check, "only for catching misuse bugs like double retire". So: if next not equal this. Interesting. So then shouldn't that test always fail, because this is a double retire? I guess — okay, the fact that this test doesn't fail means that what they expect this to do is to clone the underlying object, which I guess is what this dereference ends up doing, right? You are creating a new object that's a clone of the previous object. In Rust, of course, you can't do this unless the underlying type is `Copy` — this wouldn't be sufficient for you to clone it. So I think this test is just weird to port, and I think we should skip this one for now. That's fine; I think it was still a useful discussion. Basic holders test.
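The API idea in the XXX note above can be sketched. This is a hypothetical shape, not the library's actual type — the names `HazAtomic`, `new`, and `load_raw` are invented, and a real version would retire through a domain rather than freeing directly in `Drop`:

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Sketch: the wrapper owns the allocation (constructor takes Box<T>) and
// hides the AtomicPtr, so callers never touch raw pointers to construct it.
struct HazAtomic<T> {
    ptr: AtomicPtr<T>,
}

impl<T> HazAtomic<T> {
    fn new(value: Box<T>) -> Self {
        // Box::into_raw hands us the *mut T the atomic stores.
        HazAtomic { ptr: AtomicPtr::new(Box::into_raw(value)) }
    }

    fn load_raw(&self) -> *mut T {
        self.ptr.load(Ordering::Acquire)
    }
}

impl<T> Drop for HazAtomic<T> {
    fn drop(&mut self) {
        // The real library would retire into the domain here; for the
        // sketch we just reconstitute the Box and drop it.
        let p = self.ptr.load(Ordering::Relaxed);
        if !p.is_null() {
            unsafe { drop(Box::from_raw(p)) };
        }
    }
}

fn main() {
    let a = HazAtomic::new(Box::new(5u32));
    let p = a.load_raw();
    assert_eq!(unsafe { *p }, 5); // still alive: `a` owns the allocation
}
```

Encapsulating both the `Box` and the `AtomicPtr` decisions like this is what would let the wrapper become the single top-level type the discussion is aiming at.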
So this creates a holder — let's look at `make_global`. And I guess, oh, each one is made in a scope. Okay, that's fine. So this really just checks that it doesn't immediately fail when you drop the holder or something. Oh yeah, here we need to say how many, which is going to be two. This is the const-generics part that landed in a PR. And we don't have `local`, so this just checks that they can be created. Great. "Would it be safe to..." — I'm guessing you mean implement `Copy` on the wrapper if we made this change. If we implement `Clone` and/or `Copy` — there's not really an "or": in order to implement `Copy`, you must implement `Clone`. So really, if we implement `Clone`, we must have double-retire protection. So it's okay, it's just, you know, we need to watch out for that. There's nothing inherently problematic about allowing it to be `Copy`; it's just up to the caller to ensure that they don't retire the same thing more than once. Or if they do, we need to make sure that we don't do the wrong thing as a result. We could define it as undefined behavior, but we could also just have a check for it. All right. So the basic holder test was very simple. Basic protection tests. Basic protection — we got here. All right, this should be easy. This is very similar to basic objects, except we're also going to actually protect the value here. So we create a domain, we create a node, we create a holder. And I guess we don't really need `num` here; that seems excessive. So we do this, and we're not going to retire it yet — we're going to do that down here. We make an `h`, we do `h.reset_protection()`. Not entirely clear why we want reset protection here, because it should start reset anyway. Oh — why does their reset protection take an argument? That's interesting. They probably have an overload for this.
This is another thing that we didn't do: I think they have overloading for reset protection, where reset protection with an argument is sort of a shorthand for "reset and then protect this thing". So if you look at protect, it's reset protection of `f(p)`. That's interesting. Their protect is a loop around try protect, which is really just a call to reset protection. I don't know why they have this. Oh, right — okay, this try protect — right, right, right, I remember. So try protect is: you don't give it a pointer to guard, you give it a pointer to a pointer to guard. The idea is: load this pointer, protect it, and then check that it's still the value that was in there — because otherwise it's useless to protect it. Imagine you tell it to protect the next pointer of some object you're holding, and you want to know what the actual value of next is; all you have is an atomic pointer. And so you want this thing to protect whatever the next value is. But in order to know what value to protect, it needs to load the atomic pointer. So it loads the atomic pointer, and then it protects that value. But it could be that in between those two, another thread went and deallocated that next. So it needs to re-load the atomic pointer again, to see that it protected the right value. If the two differ, then it needs to retry, right? It needs to reset the protection and do the whole dance again until it succeeds. Which is this business here, right? While we fail to protect — while the value changed in between the two — just try again. And so in this case, reset protection with an argument is a little bit different. It is "protect this value that I've already loaded". Notice that that's different: it's not saying "load this for me and then protect whatever you loaded", it is just "blanket protect this value".
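The retry dance described above can be sketched in isolation. This is a simplified stand-in, not the library's implementation: "protecting" is reduced to recording the pointer in the holder, with no domain bookkeeping, and the struct and method names are assumptions:

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

struct Holder<T> {
    protected: AtomicPtr<T>,
}

impl<T> Holder<T> {
    // Blanket-protect a value the caller already loaded (what Folly's
    // reset_protection-with-an-argument does).
    fn protect_raw(&self, p: *mut T) {
        self.protected.store(p, Ordering::Release);
    }

    // Load the source, protect what we loaded, then re-load to check the
    // source still holds that value; if it changed in between, retry.
    fn protect(&self, src: &AtomicPtr<T>) -> *mut T {
        let mut p = src.load(Ordering::Acquire);
        loop {
            self.protect_raw(p);
            let p2 = src.load(Ordering::Acquire);
            if p2 == p {
                return p; // protection is known to cover a live value
            }
            p = p2; // the pointer moved under us: protect the new value
        }
    }
}

fn main() {
    let mut v = 9u32;
    let src = AtomicPtr::new(&mut v as *mut u32);
    let h = Holder { protected: AtomicPtr::new(std::ptr::null_mut()) };
    let p = h.protect(&src);
    assert_eq!(p, h.protected.load(Ordering::Acquire));
}
```

The load/protect/re-load structure is the whole point: only after the second load agrees with the first do we know the protected pointer was never reclaimable in between.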
It's a little annoying, maybe, that it's a very unsafe API, really — to just say "blanket protect this". It's sort of ripe for misuse, I should say. But we can totally do that. So what do we end up doing here? Not this one, but down here — our try protect, right? This is one place where our implementation just looks a little different, but it sort of is the same thing: our reset protection is really just a call to reset the hazard pointer, which is really just this call right here. It's just that we call that "protect" rather than "reset to this value, which may or may not be null". I see. And so what they're saying here is — I think instead of us saying that reset protection is just going to take a pointer, we should say, let's say, `protect_raw`. And I guess somewhere over here — so this is going to be `protect_raw`, and it's going to take a `*mut T`. It's not going to return anything, which means we don't need the lifetimes for any of these, and we don't need an `F`. So what does our reset do? You notice here, reset and protect are really the same thing, right? They both just do a store. In the Folly implementation this is just one function called reset protection that takes a pointer argument, and we sort of separated them. And I guess that means we can just call this, of course. I guess it does have to be `'static`; that's fine. And this has to be — we have to do the same thing as up here. And I'm going to erase that comment for now, because it's not appropriate here. So over here, then, instead of reset protection, this is going to be `protect_raw` of object. The idea here being — "trait bound implements hazard pointer object global"... that's because it's not global. This is not `make_global`; this needs to be `make_in_domain`. So I'm actually very happy that that failed to compile — even though that was a terrible error message.
But what we were trying to do here — and it rightly complained about this — was: we allocated an object in one domain, this domain that we made just for the test, and then we made a hazard pointer in the global domain, and we tried to protect one using the other. And the compiler was like: nope, that's not okay. Let's look more carefully at that error message we got. You see it says: the trait bound "hazard pointer object wrapper of `Node`, for unit, implements hazard pointer object global" is not satisfied; the following implementations were found... It's really hard to understand from this error that that is what it means. But the `F` here stands for family, right? This is the notion of domain families that we set up earlier to get type-level safety, ensuring that you protect using the right domain. So maybe this is a good indication that instead of this being `F`, this should be like `DomainFamily` or something more verbose. And here, the fact that this is just unit is also confusing. I wonder where this comes from — can I make this error a little nicer while we're here? So that returns a global... oh, the unit comes from here. This should arguably be a unique domain; that's what that should be. Same with this — should be a unique domain. And now what do I get? Yeah, so now you see that the family here is this random closure that ends up always being unique, but that closure is not the same as `Global`. Of course, that's not what the error tells us. Yeah, I don't know that there's a good way to make this error nicer, unfortunately. Even if `F` had a different name here, I don't know that it helps you that much. Ah — someone pointed out in the chat: doesn't this only protect you against local domain versus global? And it depends, right? If you create your domains this way — if you always use unit — then that's true, because unit is equal to unit.
But if you use this macro we provide, then they will actually always be distinct. So if I create domain one and domain two, they have different `F`s. I can demonstrate this: if I do — this is in domain one — oops, let me do that not here, let me do that down here. So domain one, domain two; `make_in_domain` with domain one, then domain two — this will still fail. And the reason for that is that this `unique_domain!` macro we have internally ends up using an anonymous closure type, and the closure type is tied to the current location in the source code. Any two closures are always distinct types. And that's why we ended up doing it this way: so that this ends up not working, which is nice. It doesn't protect you from everything, but it gets you pretty close. So now if I do this, this will compile. All right. Back to the test: it resets protection, then it calls object retire, then it asserts the number of constructors, then it calls domain cleanup, then it checks that the number of destructors is zero — right, because we have retired it, but it's still protected. Then we do `h.reset_protection()`, then we do `domain.cleanup()`, and then we check that the number of destructor runs is one. Just to make this stop yelling at me — great. All right. "Is there a way to restrict creation of the domain to just that macro?" Not really, because the problem is the macro has to be able to call `new`. The closest we can do is — where is this `new`? Right, so this is our `new` method, and obviously what we could do here is `#[doc(hidden)]`, and we could say "prefer `unique_domain!`". The reason not to do this is that sometimes you have to name your domain type — you might want to stick your domain in a struct somewhere, and then you need to be able to name it, and you can't name it if it has a closure type in it. So you may actually want to be able to choose your own.
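The closure-type trick described above is easy to demonstrate on its own: every closure expression in Rust has its own anonymous type, so any macro that expands to a closure yields a fresh type at each call site. A minimal sketch (no hazard-pointer machinery, just the type-distinctness fact):

```rust
use std::any::TypeId;

// Capture the (anonymous) type of whatever we're handed.
fn type_id_of<T: 'static>(_: &T) -> TypeId {
    TypeId::of::<T>()
}

fn main() {
    // Two closure expressions, even syntactically identical ones,
    // have two distinct types — which is what makes each
    // unique_domain!-style expansion a unique family marker.
    let a = type_id_of(&(|| ()));
    let b = type_id_of(&(|| ()));
    assert_ne!(a, b);
}
```

Because the compiler can never unify the two closure types, a hazard pointer parameterized by one family type simply won't type-check against a domain parameterized by the other.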
I think what we should do here is: "Construct a new domain with the given family type. The type checker protects you from accidentally using a hazard pointer with family type `F` with an object protected by a domain in a different family. However, it does not protect you from mixing up domains with the same family type. Therefore, prefer creating domains with `unique_domain!` where possible, since it guarantees a unique `F` for every domain." "The color of your screen changed periodically" — no, I think that's a stream encoding issue, and it's really annoying. It's something I've tried to fix in the past, and then it sometimes just randomly comes back. All right, virtual tests. Virtual is probably not going to matter for us, because we don't really have that. I don't even know why this has to be virtual; it just looks like it allocates a bunch of things and then resets and retires them. It is true that it checks that it can still access the value after retiring while it's still protected, but it doesn't seem very relevant. I guess it's because the destructor is virtual. I don't think this matters to us. Destruction test. So this is interesting. This seems like it would be better suited for a loom test, but there aren't multiple threads, so it doesn't really matter. It makes me wonder, though: why 2000? Why does this need to be a large number? I mean, we can write the test. And "destruction" sounds even better in Rust, because it's not a destruction test — it's just destruction; this is the destruction function. So we have a struct `Thing` — very well named — and it has a `next`. That's interesting. That's really interesting. Okay, so a `Thing` is really a linked list, and dropping it drops the next thing in the list. So `Thing` is a terrible name for this.
It should be named something else — "head drop next" is like one name for it. It's a linked list: it has a pointer to the next thing, and when you drop it, it's going to retire the next element in the list. That element doesn't necessarily get reclaimed straight away, but it gets retired. And then we're going to allocate a long list of these, retire the head of the list, and then just wait for cleanup to finish, because eventually it's going to reclaim all the items. So that's kind of interesting. So let's do a `HeadDropNext`. This is going to have a `next`, which is going to be a `*mut Self`. It's going to have a domain — I'm not sure why it needs the domain here. I mean, it's easy enough, I suppose; let's do domain. This is also the reason not to use the global domain here: our tests are going to run in parallel, and — let me just check that I did that; that's fine. Using the global domain here would make tests like this really weird. Imagine one test demonstrates some broken behavior, so it ends up never deallocating an object or something. Then this test would block forever, because it's waiting for every item to be deallocated, but it can't deallocate every item. So you end up with cross-contamination between tests. So let's go ahead then and say unique domain. This is one place where we might actually not be able to use a unique domain, because the domain field here needs to be able to name it. This is going to be a little annoying, but let's say a static domain for now. Remember how every item we drop has to be `'static`? So if it holds a reference to the domain, it needs to be a `'static` reference. I don't think we're going to be allowed to do that, for example. This is also unfortunate in terms of the API. The problem, of course, is: what `F` do we use here? Maybe we can get away with this. That might be okay. And it holds a value, so a `usize`. That's fine.
And we're going to have an `impl Drop for HeadDropNext<F>`. It's going to be: if `next` is null — if not, then it's going to retire. And actually this is not going to be quite `Self`, just because we don't have inheritance here; this is going to be a hazard pointer object wrapper of `Self`. And I guess it's probably going to have a domain, which means this is going to have a domain, which is going to be a little awkward — which means this doesn't need to store a domain. And `F` is going to be the family here. So this is then going to be `next.retire()`. The deleter is the drop-box one. `F` is going to be `'static`; that's fine. And this is going to require `'static`; that's fine. Right, so this is the whole "when you drop, retire next" — so I guess "head retire next" is maybe a better name here. And then we're going to do, down here — the head, they call it `last`; I guess we can call it `last` too, that's fine. It's going to be a head-retire-next where `next` is a null pointer and `val`, I guess, is zero. And this is going to be `Box::from_raw(Box::new(...))` — no, it's not, I take that back. I think it might be able to figure this out, actually. So we're going to say `mut` here, and then we're going to do a for loop — let's do `i in 0..2000`, because that's what they do. We're going to do `last =` another one of these, and `next` here is going to be `last`, and let's say `val` is `i`. And let's say from one, because, you know, why not? Even though that's not what they do. Oh, actually, no, that is what they end up doing — I see, okay, so this can really be fine; we can do it this way instead, which matches what they do as well. And then they do `last.retire()`, and then they do cleanup. And it's true that the value field is never read here — arguably we can just remove it; we don't need this value, the value doesn't matter. So let's just get rid of it. That's fine. And we don't need the value. Oh, yeah.
So, okay, we did get away with not having to name the domain, because we just say we're generic over `F` — whatever the domain family is, this type works for it. Now let's see if that works. Nice. So this is neat, right? What we're doing here is seeing this chain destruction, where all of these should be deallocated. Of course, we could stick in a `dtors` counter here — `AtomicUsize::new` — and then say that here... in fact, why not `Box::leak`? So this is going to be `dtors`. I don't think I can do this, because it's not allowed to close over the environment — but it can be here. Then `self.dtors.fetch_add` — this is the thing that they're not doing, but I want to do it, so that we can check down here that `dtors.load()` is 2000. Nice. Because otherwise there's nothing that really says that when cleanup returns there are no objects left — maybe there's a retire call missing or something. But this way we check that all of the things are dropped, even though they're retired one at a time. "Maybe instead a value that implements drop?" — I don't think we need to; we just reuse the drop for head-retire-next. All right, sweet. What's the next test? I'm guessing we're going to get to some benchmarks further down pretty soon. So here: move tests. These may not be relevant again. Because we don't have a move constructor, we know that moving has no semantic meaning anyway — there's no implementation for a move, so it can't mess anything up. So this test just wouldn't do anything. And even moving into an existing variable makes no difference. But I suppose we can write it anyway. I think `move` is a keyword, so it might not let me use that name. So I guess let's call it `move_test`.
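The chain-destruction idea above can be sketched in plain Rust without any hazard-pointer machinery — this is an illustration of the test's shape, not the ported test itself (the real test retires each node through a domain; here `Drop` chains `Box`es directly and we count drops the way the `dtors` counter does):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Global drop counter standing in for the test's dtors counter.
static DTORS: AtomicUsize = AtomicUsize::new(0);

struct HeadDropNext {
    next: Option<Box<HeadDropNext>>,
}

impl Drop for HeadDropNext {
    fn drop(&mut self) {
        DTORS.fetch_add(1, Ordering::Relaxed);
        // `self.next` is dropped after this body runs, which walks down
        // the chain, dropping (here: freeing) each node in turn.
    }
}

fn main() {
    // Build a 2001-node chain: a tail plus 2000 links, like the 0..2000
    // loop in the test.
    let mut last = Box::new(HeadDropNext { next: None });
    for _ in 0..2000 {
        last = Box::new(HeadDropNext { next: Some(last) });
    }
    drop(last); // dropping the head takes the whole chain with it
    assert_eq!(DTORS.load(Ordering::Relaxed), 2001);
}
```

The assertion at the end is the point being made above: without counting drops, "cleanup returned" alone doesn't prove every retired object actually got reclaimed.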
This test is mostly irrelevant in Rust, since there is no move constructor to check the correctness of. So it does a `for i in 0..100`. It says `let x =` — all right, so this is using `Node` again, where this is going to be `i`, and we need our count thing again here too. This test is also using the global domain, which I don't think is necessary for us; I think we can easily avoid that. And then inside of it, it creates a protector. Let me dig up where — let me bring all of these down here. So they have `hptr[0]` is that thing, and they call protect raw on `x` — this is where they call reset protection with an argument, right? Then they immediately retire `x`. Then they do `hptr[1] = move(hptr[0])`. So this is the move that is irrelevant to us, really. I don't know what this assert is supposed to check — it's a self-move. I guess a shadow is the closest equivalent here. It's not really a self-move, right? They're taking the address of the thing and then doing a move construction from the dereference of that pointer they took. So that's not that relevant here. And then they have `hptr[2]`, which they don't assign anything to, and then they say `hptr[2] = move(hptr[1])`. So this one doesn't need to be `mut`, but this one does. And then they assert that `x.value` — which we'll get to in a second — is equal to `i`. And then `hptr[2]` resets protection. And then down here they do `domain.cleanup()`. It's interesting that they even use `Node` here, because they don't actually care about the counts, I suppose. So one thing that's interesting is: for us, protection returns a reference to the value. So really, in normal Rust here, this bit would be unsafe, and what you get back is an actual reference to the value that you can keep using. But because we're going to move the holder, that reference would be invalidated, right? So if we actually used protect here instead, we would get back a reference.
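The point about the safe `protect` returning a reference can be shown with a toy borrow. A sketch under assumptions: `Holder` here is a stand-in, not the library type, and the real `protect` does domain bookkeeping rather than just handing the reference back — but the lifetime relationship is the same:

```rust
struct Holder;

impl Holder {
    // The returned reference borrows from the holder, tying the value's
    // usable lifetime to the holder staying put.
    fn protect<'h, T>(&'h self, v: &'h T) -> &'h T {
        v // real code would also record the protection in the domain
    }
}

fn main() {
    let h = Holder;
    let x = 5;
    let r = h.protect(&x);
    // let h2 = h; // rejected while `r` is alive: `h` is still borrowed
    assert_eq!(*r, 5);
}
```

This is exactly why the C++ move-assignment pattern has no safe Rust equivalent here: the borrow checker refuses to let the holder move while a protected reference derived from it is live.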
Instead, here we're going to do a sort of unsafe `(*x).val` — how do we get inside of `Node`? `.val`, right. The idea here is that `x` is still protected from the call to protect raw earlier. So this is definitely unsafe, but this is what you end up with in the C++ code. Great. I wouldn't expect this to not work, because there are no semantics to moves in Rust anyway, so it can't really mess up the safety state. Array test. Great, we can actually write this now. Is `array` a keyword? `array` is not a keyword. Great. So this is probably going to be very similar to the previous test, so let's start with that here, just to have some code to go from. Here we still create the object, but we do `make_many_in_domain` of domain, and in particular I think the count is three. They call that `h`. And then they just do the move assignment again, which is real annoying. I kind of don't want to use the same API they do — we have a safer API in Rust here, right? Instead of this sort of reset raw, we have one that can actually give you the value back, assuming it's live. Okay, fine, fine, fine — I guess we can use protect raw. "Give back..." — that's interesting. I definitely want it to be the case that you can index into it. But I suppose, if we do `let hptrs = h.hazard_pointers()`, then I can do this. And I guess protect raw could return — no, I think the right thing is — I mean, I was thinking it could return a reference to `T` the same way protect does. But I don't know if that's meaningful, because the reason protect can do that is that it can load the value, protect it, and then see that the same value is still there. With protect raw we have no guarantees — the caller promised that there is no race, basically — which means I think it's on them to do the dereference as well. It's not okay for us to transform it into a reference for them.
So we go back here: they do the move assignment, which we're just going to skip. But then they retire the object, and `x` is still protected, so they can dereference `x`. And then we can do each pointer's reset protection. Great. What else have we got? Array dtor full TC test — so this is testing the thread locals. Oh, I see, it's because they have this thread-local cache of pointers; I don't think that matters to us. Local test doesn't matter because we haven't implemented hazard pointer local. Link test doesn't matter because we don't have support for links. Same thing here. Auto retire test also doesn't work for us, because we don't have link support. Free function retire test — so this is if you have a custom deleter, right? Remember how the deleter we're currently using is this drop-box one. This is if you have a custom deleter, checking that it actually gets invoked — which I suppose we can do; that's easy enough. Test free function retire. That's interesting. So I guess here what we're really doing is something like: `foo` is a new int of zero, and we're going to do this in a domain as well, because we want tests to run in parallel. And I wonder why they end up calling retire in a bunch of different ways. So here we want — not drop box this time. All right, I guess it can be drop box; that's fine. `foo2` is going to have a custom retire here. So remember how we have this drop box, which is an unsafe `dyn` reclaim function. And what we want here is actually this. I think the awkward part about making this unsafe is that it can't be a closure, because I don't think you can inline-declare unsafe closures — I don't even know if you can have an unsafe closure. But let's say `custom_drop` here. And `custom_drop` is really just going to be — here they use `delete`; in our case we're going to do the same thing as what drop box does, which is this.
And this is now going to be `custom_drop`, taking a pointer, right. And this trait is only — right. This is the whole "we had to reassign it to a const to get the compiler to realize that these are okay", which was annoying. But I think maybe we can do this. It expects a reference. And I think we need to do — was it this? Oh man, I forget what we had to do to get the signature here to work out. Oh, actually, maybe it was just `dyn`, maybe it was just this. I think it is — `custom_drop` is `custom_drop`... oh, actually, maybe it was just this: "as `dyn` deleter" — trait bound "implements deleter" is not satisfied. Was it just this? Oh man. I remember we had to play around with this so much to get it to accept that this function signature can be used as a drop box. And this is because function definition types are different from function pointer types in Rust, and we're relying on this implementation — the deleter should be implemented for the function pointer type, which I think we can get by specifically declaring the type. I remember this being really annoying. I think it's — this is `custom_drop`. So this coerces it into a function pointer type, and then we can do that, and then we can do this, and then we can do that. And this can be a const. It's real stupid. There. What is it complaining about? There's something very long — oh, it's just a warning in here: unnecessary unsafe block. That's fine. And then this is going to be `foo2.retire`. So this is the way that you end up — yeah, we could make a macro to make that cast. That's not a bad idea, actually, because it is really annoying to have to do this cast, and you have to do it through a const so that it's `'static`. Yeah, see, it has to be unsafe — but actually, maybe if I do this, maybe that's the way to go about it. There we go. You are correct.
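The function-item versus function-pointer distinction above can be shown in isolation. A sketch under assumptions: `custom_drop` and `DELETER` are invented names, and a real deleter registration would go through the library's trait; the point here is only the coercion through an explicitly typed const:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Flag so we can observe that the deleter actually ran.
static CALLED: AtomicBool = AtomicBool::new(false);

// A free function doing what a drop-box deleter does: reconstitute the
// Box from the raw pointer and drop it.
fn custom_drop(ptr: *mut u8) {
    unsafe { drop(Box::from_raw(ptr)) };
    CALLED.store(true, Ordering::Release);
}

// `custom_drop` by itself has its own zero-sized function-item type.
// Assigning it to a const with an explicit `fn(*mut u8)` type coerces it
// to the function *pointer* type, which is what a dyn-based deleter
// registration can actually store — and consts are 'static.
const DELETER: fn(*mut u8) = custom_drop;

fn main() {
    DELETER(Box::into_raw(Box::new(42u8)));
    assert!(CALLED.load(Ordering::Acquire));
}
```

An `as fn(*mut u8)` cast at the use site achieves the same coercion; the const just gives it a nameable, `'static` home, which is the annoyance the macro idea in the transcript would paper over.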
That is a better way to do it. Okay, right. So they do that, they do this. And then they have this `retired` flag — I'm going to `Box::leak` an `AtomicBool` here. And then in here, they create a custom domain just for this test, which is odd. We're not going to do that, because that's silly. So I see — this is going to be a sort of `Retire` struct, and we're going to impl the deleter for `Retire`. It's still just going to do that, but it is also going to have a `&'static AtomicBool`, and it's going to do `self.0.store(true)` with `Release`. And I guess we're going to do the same thing up here, except we can do it a little simpler, I think: we do this — what do they call it in here? They call it `foo3`. Very, very good names, really, if you think about it. So, of `retired`, and this needs `Box::new`; it does not need to be `mut`. "Cast requires that this lives for `'static`" — yeah, the deleter bit is really annoying. I understand why it needs to be `'static`, right? Because it can't guarantee when it's going to reclaim this thing, and it's only at reclamation time that it's going to call the deleter. But it means it's weird for it to be an object. I wonder what they end up doing for this — I'm wondering if this is even really a custom retire, or is this just an object? I don't think this is a deleter, even though the name sort of implies it; I think this is just an object, right? Maybe I'm lying — I could be lying. What other retires do they have? There's only this one, and it takes a pointer to the object and a reclaimer — but this doesn't give a reclaimer; I think this just gives the object. So I don't think this test tests what they think it tests. And actually, you can see this because it doesn't call `delete` either. So I think this is just a drop test; I don't think it's even a custom retire like this one is.
It's still kind of tempting to have a test for this, but I don't think that's actually what it's testing. I think it's just testing that drop gets called with the default deleter, which in this case is sort of the test we already had, right? It's sort of the same thing as we're testing in this destruction test, or in basic protection, which tests node. I don't want this last bit. I'm going to leave a comment here saying the third test just checks that the destructor is called, which is already covered by other tests; it is not using a custom deleter/retire. So I'm just going to not have that. And I don't even know why their version of this test doesn't even call cleanup. So, like, what is it for? Cleanup test. This smells like a loom test. This smells a lot like a loom test. Yeah, because here they have a bunch of different threads, and the threads are spinning and whatnot. So let's make this into a loom test. And we're going to do this with... ooh, I'm kind of tempted to put it in here. This is the cleanup test from folly. And then we're going to go up here and we're going to say this. I'm also wondering if it's a mistake for all the loom tests to be using a global domain, but you know, it is what it is. So each run is going to have a domain and one of these counters, which sort of means that this Count type needs to be brought over here as well. That's fine. So that Count is the Count for the test. This is a place where our leak is actually going to become a problem, but I think what we can do here is just share it and then do count.clear(). This is where we're going to need a clear method on Count. We're going to self.ctors.store(0), and the same for dtors and retires. So here they make a lot of threads. I think we can probably get away with two. But sure, for tid in 0..2. Remember that loom will already take care of interleaving, so we don't actually need a lot of threads here. Even two might be excessive.
And then they do thread::spawn. I forget whether we have already imported that. That's great. So we're going to do thread::spawn. We could have a normal test that does all this threading too. I'm not opposed to that, but it shouldn't be necessary. Loom should be covering the interleavings that are relevant here. The things that loom doesn't cover are more at the extremes: the very strong end, sequential consistency, and the very weak end, relaxed. The stuff in between, loom tends to cover pretty exhaustively. All right. So in here, for j in... for j... so this is like striping. So, all right, let's do... so we have these. What I don't understand is why the indentation there is weird. Great. Right, because it skips by num_threads. So the other way to express this is for j in zero to... we could either write out the loop manually, or... actually, there is a range adapter for this, right? There's like a range, 0..thread_ops — just so this won't yell at me. I thought it was like skip_by... and I'm pretty sure there is an iterator method for it. I think it's step_by. Yeah. So I think we want this .step_by, and it consumes self, right? So step_by, fine, num_threads, which is I think what they're using over here. They called it... yeah, num_threads is two threads. So we're going to step by num_threads, and then it starts from tid. So for j in this, which is what this translates into, right? It's from this to this, stepping by that. And each one creates a new node, which means we need to bring the Node type over to the loom module too, which is a little annoying but not the end of the world. I think we specifically want these to use std::sync::atomic, because we don't need them to be considered in the ordering of the loom test. The atomic pointer here, though, probably does need to be a loom AtomicPtr. That's fine. And then if we go back to our test here, just to sort of grab the node creation bit.
So they create a new one and then immediately retire it. And this, too, seems like more operations than are necessary. It's just going to cause more interleavings for loom to explore, so I'm guessing that this number can be reduced too. And then afterwards it's going to store threads_done. So we're going to need... it's going to be, I guess, an AtomicUsize, but it's going to have to be 'static... or, I guess we can just have an Arc::new, from std::sync::Arc. Great. This is going to be, I guess, an Arc::new of AtomicBool::new(false), num_threads of them — we're going to have one for each thread — and we say done[tid].store(true, Ordering::Release). And this is going to be a std::sync AtomicBool. And I guess I need to pull this thing back up. We specifically want folly's cleanup. I think once we finish this test, I want to see if I can find a benchmark, so we get a chance to cover that too. This can be j. That's fine. Associated... all right, it's not for_test, it's test-local. std::sync atomic... actually, this is the thing that's always tricky when you're using loom: figuring out whether you should use the loom version of an atomic, so that it becomes part of the loom scheduler, or whether you can get away with using the standard atomic and sort of hide it from the loom scheduler. In this case, I think we do want it to be part of the loom scheduler. So this is actually going to be the real AtomicBool from loom. And then this is going to be while not main_done. So this is threads_done, and then there's main_done. So there's main_done here and threads_done here. I don't know why they used threads_done and not, like, a barrier, because I think realistically that's what this is, right? It's trying to synchronize on all of them moving forward... oh, wait, they're not waiting for each other. They're only waiting for main to be done.
And main isn't going to be done until all the threads are done. Where does main_done get set to true? Later. Okay, so there — it's not quite a barrier. There are things that happen in between. All right, fine. So while not main_done.load(). This one's tricky, because here we kind of want to tell loom that this doesn't matter, or rather that this thread can't make progress until this value changes. And I think there is a way to do that: yielding. Yeah. So as long as we use yield_now... even yield_now is too weak, right? Like, yield_now is telling loom some other thread needs to make progress, but in this case we actually know which thread has to make progress, and we don't really have a way to communicate that here. One way we could do this is to have, like, a channel that we block on or something instead. That way loom would actually know what unblocks this thread. Which may be necessary, otherwise loom is going to end up with too many permutations here. But that's fine. Ordering::Acquire, thread::yield_now. Great. So it does that, and then: include the main thread in the test. So it does the same thing here for the main thread, but this one has no step_by, starts from zero, and uses i — because, you know, of course they couldn't have them do the same thing. And then here, this threads_done actually doesn't need to be a bool. It's just an AtomicUsize counting how many threads are done. So this is a fetch_add of one. And then I guess here we can do a while threads_done... I think really this should be a barrier on the threads, but: while threads_done is less than num_threads, yield_now. Right. The problem here is loom is going to do something like... imagine the main thread and one thread have entered this loop. Loom thinks it will make progress if it just bounces between them, when in reality that doesn't cause any progress. So those are wasted executions. They're wasted to explore.
And the only way to tell loom about this would be to let it know that specific threads need to make progress, not just any thread. And this is going to blow up the execution tree size, which is going to make the test run long. All right. So down here we want to assert_eq the number of constructors against thread_ops plus main_ops, and then domain.cleanup(). And then we want to do the same for destructors, because all of them have been retired as part of the test. And then main_done.store(true, Ordering::Release), and then join all the threads. Yes, main_done is a barrier, right? Because it stores this, and then here it joins all the threads, and all the threads are blocking on main_done. So that is just definitely a barrier. Does loom have a barrier? That's the next question. The sync type is not yet supported in loom. It could cause an infinite loop. Yeah. Can we do better here is the next question. Okay. You're going to hate me for this, but I think it's going to work. It's awful. It's going to be awful. So we're going to take a Mutex. In main, all of these threads are going to try to take the mutex, and main is only going to drop the mutex guard when it's done. When main drops its guard, then all of these threads can resume, and all they'll do is finally get the lock, then drop it, and terminate. It's great, right? .map — I'm just rewriting this so that we get the handles. Yeah, it's awful, right? But it works. In fact, we could do the same thing for threads_done, but I'm not going to. I mean, a barrier is a semaphore, so... but Barrier is not supported, so we use a Mutex. I was thinking about whether we could do something with RwLock to get both directions, but I don't think we can. All right. So we join the threads. "Clean up after using array." I don't understand. This test down here has nothing to do with any of the stuff that happened before. Everything has terminated. There's nothing left. So why is this random array test here?
Why did they stick this here? I mean, okay. I'm just going to put that in here, because... where's our array test? Didn't we have an array test? Move test. Right, array. So they just randomly made the array part of the cleanup test, when it's really a separate test. And this first part of it makes no sense. This one just checks that it's possible. This is the same as this one. I don't understand why. I mean, I'll have it there, that's fine, but why make many in a domain? Oh wait, is there a loom Notify? Ooh, that's tempting. Yeah, we could use Notify the other way around. We could totally use Notify to fix the other wait. It's a little awkward, though, because we'd still need the counter. So it would really just be a way to let loom know what is waiting. But I don't think it matters, because the current scheme, I think, is not going to confuse loom, because it goes through this, then it yields — so "some other thread has to run" is what loom is aware of, right? And the only other threads there are are these threads. And if any of them have done their fetch_add, then they're blocked on lock, so they're not runnable. So loom knows that the only other thread it can run is the one that hasn't done the fetch_add yet. So I don't think we actually need the Notify here, but it's a good callout. All right. So this is just a random array test we're going to stick in here. That's fine, I don't mind. So we have an h, which makes two of these. We allocate two nodes, p0 and p1. We protect both of them. It's a little sad that it's not directly indexable. Arguably it should be. I think the reason why you can't just do h and then directly an index is that it would be a little useless, because the borrow checker wouldn't be able to realize... basically, the borrow checker has special knowledge of arrays. It knows that if you index with one index and then with a different index, they're independent.
Actually, maybe it doesn't even know that. I was going to say that with this, you could actually use two hazard pointers at the same time, but I don't know if you even can. So maybe this really could just be that we implement Index straight on the array — IndexMut, more importantly. The concern, right, is that if you try to use index zero to protect one thing and then use index one to protect a different thing, the borrow checker is going to say no: both of those require mutable references to the array, and therefore everything is sad. Ah, so this is why it works: this returns you an array of mutable references, one to each element, so you can use them independently. If you index, you're going to borrow the entire array for that one index, and therefore you couldn't use multiple hazard pointers from that array at the same time. So we're going to do h0.protect_raw(p0), h1.protect_raw(p1), p0.retire()... oh, what did I do? p1.retire(). End of scope. Actually, that's a sort of interesting test, right? So this is checking that... I mean, it doesn't actually matter that I made this up here, but I'll do what they do. This is effectively checking that dropping this — and in turn, dropping this — is going to remove the protection we set here. Therefore, the retires are going to take effect, and we're going to get the cleanup. So now we can go back here to the... where are my ctors? So now I want to assert_eq ctors to two, and assert_eq dtors here. "You use named lifetimes but not named generics — is there a reasoning for that?" I find that people are sufficiently confused by lifetimes that things like 'a are not necessarily obvious, especially once you have more than one lifetime; keeping track of which is which needs a name. For generic parameters that does happen too, but it's often slightly more obvious what's going on. I think the domain family is maybe one exception.
But I think we have better conventions there. Like, for lifetimes, the conventions are just 'a, 'b, 'c, 'd, whereas for generics we have things like T for a contained type, and I think we're using D for a deleter. So we could expand them — I'm not opposed to expanding them. All right, so let me first check that I didn't break that test. Okay, great. And let's go back to the loom tests. "No method fetch_add"? What do you mean there's no fetch_add? "threads_done is a function"? That's not true. Oh, really? So, the standard library recommends that when you clone an Arc, you use Arc::clone rather than .clone, because it makes it clear that you're cloning the Arc and not cloning the inner element. But it sounds like loom doesn't have that. Great. But it does have this, so I feel like it should still work. So why does it not? main_done, threads_done... am I missing something stupid I've done here? I mean, does that make it better? No. Well then, thread... it's not giving me type annotations because it's behind a cfg, so it's not actually compiled. Ah, an unknown type: AtomicUsize. That's what it's complaining about. That's where the other thing stems from. There we go. That should do it. "count is borrowed here, but count may outlive borrowed value"... count moved, borrowed value, domain. Yeah, this one's also a little awkward, but maybe we can do this. We can't do that either, but we can maybe do Box::leak(Box::new(...)) here. No problem, right? So we're spawning a thread, which means these need to be 'static, so it can't live inside of loom::model, because that's not 'static. And then this now becomes "not &Domain, which does not implement the Copy trait", right? That's because Box::leak actually returns a 'static mutable reference.
So if you look at the signature of Box::leak, it returns a... well, any lifetime, but in this case we're using 'static — but it's a mutable reference, and mutable references are not Copy, because they can't be aliased. So we need to reborrow it as a shared reference so that it works out. "Cannot access loom execution state from outside of a loom model. Are you accessing a loom synchronization primitive from outside of a loom test?" Am I? I didn't think I was. I probably am, because... yeah, I see what's going on here. The domain that we create internally contains atomic pointers and stuff, which we want to use loom's types for, but that means they have to be constructed inside of the loom model, which means we're going to have to do this a different way. What does loom::model take? Specifically, can I give loom::model... it's an Fn. The problem, right, is that it needs to be 'static for it to be passable to spawn — because I assume spawn requires 'static — which sort of implies that it needs to use globals. Actually, maybe I can do this with a lazy_static. Maybe that's the way to go about this. So loom exposes a lazy_static. So if we do loom::lazy_static, we're going to have a static Domain, and I think here we're going to end up with a sort of weird... we're going to have to use a named family, because we can't have a generic here — another example of somewhere where you need a named family. And here we can then do Domain::new of this. And now this can do this, and the same thing here. This needs to use D, and this needs to use D. "No rules expected this token"... is it like static_ref? Is that what they need me to write? "Expected reference, found struct D." Yeah, this needs to be &D and &D. Ha. Well, now it panicked with something else: "Model exceeded maximum number of branches." Of course it does. So this is probably because our numbers are too high here.
So we could set this, for example, to... if I do 7, 17, and 9, is it going to complain? Well, it doesn't look like it. I can hear my computer fans spinning up. Unfortunately, loom is single-threaded — one of the great ways to speed up loom would be to let it run multiple model simulations in parallel, but it doesn't currently do that. "Can you put the domain in an Arc?" Well, that may mess things up. We could put the domain in an Arc — I don't think it needs to be 'static; we could have an Arc per thread instead. Well, the CPU usage is not that interesting, because it's single-threaded, so you see one core that's real busy. What I do want, though, is for loom to be more helpful and tell me how many iterations it's on. I don't want to turn on LOOM_LOG, but I do want... oh, did I not run it with --nocapture? No, I did. Oh wait, why am I running it with LOOM_LOG? I don't want it to run with LOOM_LOG, because that slows it down. I do want to just check something, though: multi-reader protection. Actually, what I really want to see is that it actually is running multiple iterations. Okay, it is. It's running lots and lots of iterations. So this one, at least in theory, is running through, and all the asserts are working fine, and there are probably lots of iterations here. Let's try something smaller, right? So let's say there are going to be two threads, the main thread is only going to do three operations, and the other threads are only going to do five. I just want to see if we can make it terminate. Yes — this is the problem with these topics, right? There are so many possible interleavings to consider that even with a small number of operations... like, if you just think about it, right?
Every retire, for every thread, for every operation — including both the main ops and the other two threads — all of those combinations need to be checked. And in some cases one thread is going to end up reclaiming; in other cases another thread is not going to end up reclaiming; and all of these cases need to be considered by loom. So there are just lots and lots and lots of iterations. I'm surprised it's not printing the number of iterations it's done; I thought that was a thing it did by default, but apparently not. The other thing we could try is loom's max preemptions. A preemption here is a thread being interrupted while executing. Generally, threads get to run until they yield — like, until they do a blocking system call or something — but the operating system is allowed to preempt them, to stop them early. The setting I just set is telling loom that it shouldn't interrupt a thread as often as it normally would consider. And this reduces the number of possible executions, because the number of possible interleavings is smaller when threads yield less often. I can probably not make this very large, even still. Yeah, so it still takes a while to run, but it lets us explore more of these interleavings. Is it better? Unclear. In this case, I think it's probably better to have more operations, just because we know that there are limits — actually, remember how there are limits on reclamation, these thresholds for the number of items you retire before we start doing a collection? We might want to set a lower limit for loom here, to force it to do collections with lower numbers. But this is probably the reason why they set it to 1007 in that test, right?
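The preemption bound being set here is part of loom's model configuration. A sketch of that config, assuming loom's `Builder` API with its public `preemption_bound` field (this is a configuration fragment, not a full test):

```rust
use loom::model::Builder;

fn main() {
    let mut builder = Builder::new();
    // Bound how many times any thread may be preempted at a point
    // where it didn't voluntarily yield. Lower bounds prune the
    // interleaving space dramatically, at the cost of exhaustiveness.
    builder.preemption_bound = Some(2);
    builder.check(|| {
        // body of the loom test goes here
    });
}
```

The trade-off is exactly as described in the stream: fewer explored executions, so bigger operation counts become feasible, but some interleavings are no longer checked.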
Because they wanted to get over the count where you're forced to do a reclamation, which is set to a thousand. Which makes it tempting to say cfg(loom) and cfg(not(loom)). So with loom, if we set this to, like, five, right? And the same thing probably for the number of shards — let's do two, that's probably enough. So now, as long as this number is... so they used 1007 — I don't know why seven — but let's do, I guess, ten, or maybe it should be prime. And I don't know why they chose... I think they had 17 for main ops; I don't know why they chose 17, but let's do seven. So, that panicked — that's actually kind of good for us. "Model exceeded maximum number of branches." So that's still too many, because what we're also doing by reducing the threshold, right, is that loom is going to be forced to deal with some thread deciding to do reclamation, which again increases the size of the exploration space. If it didn't have to think about reclamation at all, there's a bunch of code paths that aren't explored, and so loom wouldn't have to think about their interleavings — but now we're forcing it to think about more of them. Okay. So that ran. That's promising. And main_ops here — I think main_ops is unlikely to make that much of a difference; the main thread probably doesn't really need to participate in this. It's probably going to be the same: "exceeded maximum number of branches." That's fine. I'm trying to see if I can do, like... oops. Yeah, let's do three and seven, and see if I can do that here. "What about yielding stuff? Isn't that the way to reduce the number of branches?" So, yielding doesn't reduce the number of branches. It just says some other thread has to run. It does indicate to loom, "don't continue running me, run someone else," but it doesn't necessarily guarantee forward progress. All right, so I'll leave this running in the background to try to find a bug.
The fact that it hasn't crashed yet is a good sign, but it's still exploring the state space, right? All right, so let's see then if we can find a... core test, destruction test, cohort safe-children test, fork test, lifo test, swmr test, wide CAS. Why is this one just labeled "tests"? Oh, I see. This is the thing that actually initializes the tests and potentially runs multiple copies with different underlying types or underlying schedulers, like here. Yeah, so see — it runs the destruction test using different underlying types and different domains. Now show me a benchmark, please. "Reclamation without calling cleanup" — this is an interesting test. So this is just checking that if you do nothing, eventually your stuff will be reclaimed. Which we don't support, because we don't have asynchronous reclamation at the moment. We have time-based reclamation, but not asynchronous reclamation; there's no collecting background thread. Benchmark drivers — aha. barrier.wait(). That's funny. I wonder why they do... I guess that makes sense: they want all the threads to be ready to run, then they let the main thread start the timer, and then they release all the threads to do the work. All right, let's see what this... yeah, so we're not going to really deal with the... it's interesting that they need to do this. So, I wanted to use criterion for this, because criterion is great. I want to see whether there's a way in criterion for me to do the timing loop myself: iter_batched, or iter_custom — iter_custom might let me do it. Specifically because... well, it's still for-looping, and this isn't necessarily a loop. But this would let me do the... yeah, iter_custom would let me do the correct setup here.
Specifically, if you're doing a multi-threaded benchmark, right, you need to make sure that you don't bias the sampling because some thread took a while to start and therefore the others looked slow. That's why they have this barrier for a synchronized start. But I want to see what the actual test does. rep_fn — so rep_fn here is just the function that is being benchmarked. So this is just the benchmarking harness; both this thing up here and this thing here are harnesses. holder_bench: so this is a benchmark of — wow — making 10 million hazard pointers across some number of threads. All right, that's an easy enough one to make. This is still running — that's good. All right. So criterion's getting-started is real nice, so we're just going to do this. And here, what are we going to call this one? I guess we can call it folly. And then we're going to make their benches, and then we're going to have benches/folly.rs. We're just going to bring in all of this... test folly — I'm going to want all of these. There's no Fibonacci in here; instead, I guess we're going to benchmark make-holders. And the benchmark here is really just... actually, I'm going to leave this up for a second to explain black_box, because black_box is its own little annoyance. So here we're going to do: let h is... let me just grab an example of this, like so, and let me remove these. This is a very straightforward one to start, and this is just going to keep running — that's fine. And I want to run cargo bench. So, the criterion setup here is that you have a bunch of groups of benchmarks. The criterion_main! here just says: these are my groups. In this case, we only have one group; it's called benches. And each group has some number of benchmarks — in this case, we only have one benchmark, called criterion_benchmark.
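The getting-started shape being described — `criterion_group!`, `criterion_main!`, and a `bench_function` — looks roughly like this. The hazard-pointer operation is replaced by a placeholder here, since the crate's actual API isn't shown in this excerpt:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("make_holders", |b| {
        b.iter(|| {
            // stand-in for "make a global hazard pointer";
            // black_box keeps the result from being optimized away
            black_box(42u64)
        })
    });
}

// One group ("benches") containing one benchmark function.
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```

This file would live in `benches/folly.rs` with `harness = false` set for the bench target in Cargo.toml, which is what lets `cargo bench` hand control to criterion.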
And each benchmark can have any number of benchmarking functions — in this case, we only have one, called make_holders. And that benchmark runs a number of iterations. So we're going to do iterations of doing this operation; in this case, that operation is just making a global hazard pointer. black_box here is an interesting function. It's unstable in the standard library, and criterion provides its own version of it that works on stable, but that's not quite as powerful. The idea here is sort of straightforward at a high level. You might imagine that we wrote this like this, right — just call the make_global function. The problem is that the compiler might decide that this is dead code and eliminate it. So your benchmark takes no time, and therefore it looks like everything is super fast, but it's just because it did nothing. black_box is a function where they're still trying to figure out exactly what kind of promises it's going to make, but one way to think about black_box is that it is a function where the mapping from its one input argument to its return value is unknown to the compiler — hence its name; it's a black box to the compiler. So the compiler is not allowed to assume anything about what the function does with its input, or how that is transformed into its output. In practice, black_box is the identity function; it just returns its input. But the compiler isn't allowed to know that, and therefore black_box could, for example, have side effects, so the compiler can't eliminate the production of the input. Now, this doesn't necessarily save you, right? For example, in this case I'm using black_box on the make_global, which forces the compiler to actually call this function to produce the argument, because it needs to give that argument to black_box. But it doesn't prevent other things.
Let's imagine that the compiler realizes that this is loop-invariant. There's nothing that stops the compiler from hoisting this call out of the loop and then doing this, right, if it decides that that's okay. It's still providing an argument to black_box; black_box is still getting a valid argument that was produced. But nothing here tells the compiler that it's not allowed to do this optimization. In this case it shouldn't matter — the iteration is taken care of for us by criterion — but it's actually somewhat tricky to make compilers not optimize away benchmarks. All right. So here we're running the benchmark. I mean, I'm running the benchmark and I'm also running loom, but they're both single-core at the moment. Notice that we haven't... this is not running multiple threads, even though that is definitely a thing that we want to do. So you'll notice I ran cargo bench up here, and then I ran it again, and it tells me how long each call took. So this is not the total run of the benchmark; one call of the closure inside the iteration function took this long. And you'll notice here it told me the change too, and that's because I had already run cargo bench: it took 5.39 nanoseconds, and now it takes 5.63. So it's slower, and it's also telling me that the whole range is slower. So it's measuring — I think this is probably the P95 and the P5 — and it's also doing a statistical t-test. Criterion is trying really hard to give you statistically meaningful information. So when it tells you that it's regressed, it's not just "the number I saw was higher than the previous number"; it's "given the variance that I see in the results, this is a statistically significant regression in performance." The fact that it regressed here is probably because this time is so short — like, five nanoseconds is too little.
And you can see it told us it found some outliers, and stuff about the samples. Criterion is using black_box for other things; it can't inject black_box into the code in your closure, so you still need to use it in the closure yourself. Oh, there's an HTML report. Where does it place the HTML report, I wonder? There's a place where it's in, like, some criterion... normally, this would be in the target subdirectory, but I'm an annoying person, and so it's not there for me. Right, so we're going to have to open index.html and see what this looks like. Right. So here it shows me the distribution of timings for calling this particular function, showing me, across iterations, all of the samples that it got, and the linear regression across them. There seems to be a sort of dual mode here, where sometimes it takes this long and sometimes it takes that long, which is interesting — there sort of seems to be a step function, a bimodal distribution of sorts. And here you see, because I ran it twice, it shows me the change from the previous benchmark. So there's a lot of really good statistics in these reports. "Is there a way to tell criterion to run until you have statistically significant information?" That's always what it does. So if you look back at the result here — you only see it when you run it, I think — you see "warming up for three seconds", and then it says collecting 100 samples in however long, and the number of samples it collects depends on the variance it saw in the warm-up period. So there's definitely a lot of slick stuff here. And you see, here I ran it again, and now it said "change within the noise threshold". Loom is still running. That's great. So this is like the obvious first step here. The question now is: what do we actually want to benchmark? This is how long it takes to make a hazard pointer, but really what we want to measure is how long it takes to make hazard pointers while they're also being made on other threads, right?
That's really the measurement we want to make. There are a couple of ways we could go about doing this. I don't know that Criterion has a great way of dealing with multiple threads, with thread contention. Let me see. I thought there was... I can't find it now. Async benchmarking is not what I want. I'm sure I've seen a way to do this, but I might just be completely misremembering. So there's iter_custom, which basically gives you a function to call when you know how long something has taken. The value also takes time to drop, and that's fine; one of the things they point out here is that you want to think about whether the dropping of the value should be measured as part of the test or not. Right. So iter_custom takes a closure that is told how many iterations to run and has to return the amount of time spent. See if I can, yeah: Instant::now and then start.elapsed. So this is an example where we could do iter_custom, and then in here, let start be Instant::now and return start.elapsed at the end. So here's what we're going to do. This is the closest I think we can get to what they're doing while staying within the bounds of how Criterion expects us to do this: barrier is going to be an Arc::new of Barrier::new of... actually, I think there's a way to parameterize this test too. So if you look at Criterion's benchmarking with inputs, over here there's a bench_with_input, which we can use for concurrent holders. And we want a range of values. I sort of want to not give the range, but that's okay. So here, what we're going to do, and I'm just following the setup here, is let mut group = c.benchmark_group("concurrent_new_holder"). So the idea is we create a group, and each benchmark within the group is going to be for a particular thread count.
And then I want to do, for threads in... let's just do one, two, four, eight; a good place to start. And I guess here I need to figure out what I actually want. So this is going to be group.bench_with_input, and the input is going to be n_threads, and then it's going to be b.iter_custom. And just to bring in the types so that this is a little less painful, and so I actually get some formatting and stuff again. So this is going to be n_threads plus one, because there's also the current thread that's doing the timing. This is sort of mirroring the... I don't know why it's saying "file not included in module tree," that's weird. So in here, now we can do the bit that we wanted to set up, which is the original benchmark: for i in 0..n_threads, thread::spawn(move ...). And it's going to need the barrier, so we're going to have to have a clone of the barrier for each thread. It's going to wait on the barrier twice in a row, and then it's going to do the operation. And I guess we're going to collect the handles so that we can wait on them at the end. We're probably going to want to pull this out, because all of our benchmarks are going to have the same setup, but I'm just writing it inline now so that we can test it for this one benchmark. Then we can figure out ways to pull out this particular benchmarking harness so that it's easy for us to write other benchmarks that have a different body for basically this one line. Right, so that's going to be this part. So that's the operation. And then here is where we do barrier.wait, then we get the start time, then we do another barrier.wait to release all the threads to do their actual work. And then I guess we do, for thread in threads, thread.join().unwrap(). And at that point, we take the elapsed time. Oh, and we want to do the domain cleanup. How are we going to do that?
I guess that's going to be global. How do we do global cleanup? I forget... Domain::global() is where we put it, then cleanup, and then the time measurement. And the closure is expected to take one argument, which is the number of iterations that we want to do. So, for _ in 0..iters. Right, so the basic setup here is: this benchmark is going to be a group called concurrent_new_holder. Each benchmark in the group is going to be for a particular thread count. For each thread count, when we run it, we're given the number of iterations to run. And this closure might be called multiple times, right, because Criterion might run it once with, like, five iterations, decide there's too much noise, and run it again with 100 iterations. When we're told the number of iterations to run, we spin up that number of threads and make sure the threads have all started. Then we take the current time, and then we let all the threads go. Each thread is going to run that many iterations of creating a new hazard pointer in the global domain. Then we join all the threads, which is when they're all done. Then we call cleanup, and then we measure how much time elapsed. So that's our setup here. Let's see what this does. Right, so this is with one thread, collecting 100 samples. So here it says creating a new holder when there's one thread takes about 5.8 nanoseconds. Remember it was 5.5 or something before; it's taking a little bit longer because now we also have to spin up the thread and stuff. Really, this is also measuring the overhead of doing the barrier waits and the thread joins. So here we're seeing this: as the number of threads goes up, the time it takes to allocate a new holder is also going up. And the question now, for example, is 701 divided by eight.
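As a rough std-only sketch of that shape (timed_concurrent is a name I'm making up here, and I'm using scoped threads rather than the exact Arc-cloning code from the stream), the closure we'd hand to iter_custom could look something like this:

```rust
use std::sync::Barrier;
use std::thread;
use std::time::{Duration, Instant};

// Sketch of the body handed to Criterion's iter_custom: spin up n_threads
// workers, synchronize them on a barrier, have every thread run `iters`
// iterations of `op`, and return only the time spent between releasing the
// workers and joining them.
fn timed_concurrent<F>(n_threads: usize, iters: u64, op: F) -> Duration
where
    F: Fn() + Sync,
{
    // +1 for the current (timing) thread.
    let barrier = Barrier::new(n_threads + 1);
    let mut elapsed = Duration::ZERO;
    thread::scope(|s| {
        let handles: Vec<_> = (0..n_threads)
            .map(|_| {
                s.spawn(|| {
                    // First wait: signal that this thread has started.
                    barrier.wait();
                    // Second wait: block until the timer has started.
                    barrier.wait();
                    for _ in 0..iters {
                        op();
                    }
                })
            })
            .collect();
        barrier.wait(); // all workers are up
        let start = Instant::now();
        barrier.wait(); // release the workers to do their actual work
        for h in handles {
            h.join().unwrap();
        }
        // The real benchmark would also run the domain cleanup here,
        // before taking the elapsed time.
        elapsed = start.elapsed();
    });
    elapsed
}
```

Inside the Criterion benchmark, the call would then be something like b.iter_custom(|iters| timed_concurrent(n_threads, iters, || { /* create a holder */ })).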
So you see it's not linear, right? With one thread it took 5.8; with eight threads it took 700, which is about 87 per thread. So clearly there's contention here somewhere. So if we go back, I wonder if I can now load this: it's going to be concurrent_new_holder, report, index. Yeah, so here you see average time given the input. And this looks very much like an exponential curve, where we're clearly not scaling well with the number of cores in terms of making new holders. So that seems like potentially a problem. Interestingly, it looks like it's sort of bimodal for eight threads. Loom is still running in the background; let's see if that makes a difference. Loom is still just chugging along. In fact, what I can do here is just rerun it and see whether it claims there were any performance improvements. The bimodality could be because of that; I think I only have 12 cores, so with eight threads there's a decent chance that something got scheduled at the same time. So for one thread, no measurable difference; for two, no measurable difference; for four, and you can see how long it takes to estimate also changes, no change; for eight, it runs fewer iterations, right, because it realizes that each one is longer, and it has regressed. Okay, so it got worse when I shut off loom. That's interesting; I wonder why. So this suggests that there's a decent amount of contention in allocating new holders. This is something we should definitely look into and figure out where that contention comes from. I suspect the Folly implementation has the same problem. This is probably contention on the allocated-holders linked list, because everyone is going to be contending on the head. So as you have more threads all allocating holders, you start failing the compare-and-swap, and as you fail the compare-and-swap, you're wasting work. And so more threads means more wasted work.
So it slows down. That's what I'm guessing is happening here. I don't think this is in and of itself a problem, because allocating new holders should be relatively rare, right? In general, you would expect that people keep holders around, and what they actually care about is the performance of protecting pointers. The idea is that any given thread can really just have one holder and then reuse it for all of its different accesses. And so that's the next thing we might want to benchmark. The next test they have here is testing arrays. So we could test that; object bench might be interesting. So this is just allocating and retiring, but nothing is protected. I mean, we could do that just like concurrent new holder. So this is an example of where we probably want to start generalizing out this infrastructure, or even just writing a macro for it; honestly, that would get us really far. So maybe we should do it: macro_rules!, a good example of a macro being helpful. folly_bench. It takes a name, which is an ident. It takes an iter, which is a block. And it takes an n_iters, which is an ident. And it produces: name, which is stringify!(name) here. All of this stays the same. This has to be a variable so that it can be accessed inside the block. Actually, it doesn't need this; this is just going to be iter here, because the actual value of the iteration counter shouldn't matter. The thread ID arguably should be exposed, but it doesn't seem super important right now. And this should mean that I can now write folly_bench! for concurrent_new_holder with a block that is just this. And then we should be able to do concurrent_retire. So what does concurrent_retire have to be? concurrent_retire has to allocate an object.
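Here's a simplified, std-only sketch of that macro idea (folly_bench and concurrent_noop are just illustrative names, and the runner body is a stand-in: the real macro expands to the Criterion group, bench_with_input, and iter_custom scaffolding we wrote by hand):

```rust
// Sketch: the macro takes a benchmark name (an ident) and the per-iteration
// body (a block), and expands to a function that runs the body across several
// thread counts. Here the "harness" is just nested loops that count how many
// times the body ran; the real version would emit the Criterion setup instead.
macro_rules! folly_bench {
    ($name:ident, $body:block) => {
        fn $name() -> u64 {
            let mut total = 0u64;
            for _n_threads in [1u64, 2, 4, 8] {
                // Stand-in for "run the iterations on _n_threads threads".
                for _ in 0..100u64 {
                    $body
                    total += 1;
                }
            }
            total
        }
    };
}

// The block is whatever one benchmark iteration does, e.g. creating a holder
// or allocating and retiring an object.
folly_bench!(concurrent_noop, {});

fn main() {
    // 4 thread counts times 100 iterations each.
    assert_eq!(concurrent_noop(), 400);
}
```

The payoff is that each new benchmark is just a name plus a one-block body, and all the barrier/spawn/join/cleanup boilerplate lives in one place.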
So I guess here we can just use, back in our tests, our folly_test, the Node that we had. In fact, it doesn't even need to be Node; this can just be an int. So if we go back to the thing that makes foos, it can just do this. And this should probably be retired in the global domain, I forget exactly what the method is called, like so. And, I mean, let's see what happens. To be clear, these are definitely micro-benchmarks, right? They don't necessarily tell you, oh look, everything scales well, or everything works well for all data structures. This is a micro-benchmark for how long it takes to make a holder, and how long it takes to allocate and retire an object, including the cleanup phase, crucially. So it's not as though these are the only benchmarks you should have; you should have end-to-end benchmarks where you have a data structure that uses this, and the benchmarks from there are going to matter a lot more. But it is useful to have these to spot low-level regressions as well. Well, our macro seems to work. So here, retiring seems to take 76 nanoseconds. So when there's one thread doing retires, it takes 76 nanoseconds; two threads, 185; four threads, 272. So this seems much more linear. It's not quite linear, but it's not awful. So if I go here and open concurrent_retire, report, index, this is a much better line. It's not perfect, but it's much more linear than the other one was. So this suggests that our retires actually are scalable in some sense. Of course, we haven't looked at things like what happens if objects are protected, and checking for protection and that kind of stuff, but retire and reclamation seem fine. All right, I'm running over time and I need to eat. So I think we're going to end it there, because now we have some benchmarks, we have a benchmarking setup that we can reuse later as well, and we have a bunch more tests. So I'm going to commit these and push them.
And because this is going to be the last stream: please, if you feel like you want to port more tests or more benchmarks, do so and submit PRs. That's the way we make this library better. The big thing I'm going to do when I get some asynchronous time is along these lines, adding more of this, but also trying to document this whole thing and maybe make the API a little more ergonomic in the ways we've discussed on stream. If you want to do that, do it, send a PR, I'll be very happy. But those are high on my list, because if someone were to actually use this, I would want an interface that's nicer than what we have right now, and certainly better documentation. All right, I think that's where we're going to end it. Any last-minute questions before we end? What is the biggest user of hazard pointers? Is it useful to port that to Rust? So hazard pointers are used basically any time you want to implement a concurrent data structure without garbage collection: you need to figure out when it's safe to free objects, and hazard pointers are one very well-known way to do that. Another one is epoch-based reclamation, which is also really nice but has slightly different trade-offs. That's what you get with something like crossbeam-epoch. crossbeam-epoch has some other problems too, where its performance is not great in some kinds of concurrent situations. That's not necessarily a property of epoch-based reclamation so much as of that particular implementation, which can be improved. But it is another common way to do this. A final quick overview of why this is better than spamming Arcs: there are two big reasons why you can't just use Arcs here.
The first one is that there's a race condition. Imagine that you do an atomic pointer load and it gives you back a pointer to an Arc. An atomic load is just that, an atomic load; all it gives you back is a pointer. And let's say that's a pointer to an Arc. Now there's a race between you incrementing the reference count, and the Arc getting dropped by whoever is mutating the data structure, that memory being reused for something else, and you then modifying the memory location where the reference count used to be. So there's a race condition on the reference-count field. It's maybe possible to work around that with weak references, but it's pretty painful. The other problem with Arcs is, let's say you solve that problem: you're now making it so that every read has to write to memory, and it has to write to shared memory. Imagine that you have lots of threads all reading the same key from a concurrent hash map. Every read has to clone the Arc, which means that every read has to increment the reference count, and all the reads are accessing the same reference-count field. So they're all contending on one value, and that's how you end up with performance collapse, right? You might have a hundred threads, but all hundred threads are blocked on updating one value, and they have to do that serially, so they all slow down. So those are the two main reasons you don't want to use Arc. Sweet! I'm gonna end it there. Glad it was helpful. It was fun to go through this long, long (both in time and space) series on hazard pointers. I don't know what we'll be doing next. I think we might return to the wait-free/lock-free adapter; that would be really fun, and we could try to make use of this library there. That's sort of why we stopped working on it. Or we'll find something completely different. I guess tune in next time and find out. See y'all!
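To make that second problem concrete, here's a small runnable sketch (contended_reads is a hypothetical name) where every "read" clones a shared Arc, so every reader writes to the same reference-count word:

```rust
use std::sync::Arc;
use std::thread;

// Each "read" clones the Arc: an atomic increment now and an atomic decrement
// on drop, both on the SAME refcount field. All threads serialize on that one
// cache line, which is exactly the write-on-read contention hazard pointers
// avoid (a protected read never writes to shared memory).
fn contended_reads(n_threads: usize, iters: u64) -> u64 {
    let shared = Arc::new(42u64);
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let mut sum: u64 = 0;
                for _ in 0..iters {
                    let r = Arc::clone(&shared); // contended refcount bump
                    sum += *r;
                } // r dropped here: contended refcount decrement
                sum
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    assert_eq!(contended_reads(8, 10_000), 3_360_000);
}
```

The result is correct either way; the point is that the refcount traffic scales with the number of readers, whereas a hazard-pointer-protected read is a plain load plus a write to a slot owned by that one thread.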