Welcome back, everyone, to yet another Crust of Rust episode. I'm trying to find a way to fit in as many of these as I can before I move to LA and all these changes in my life happen, after which it'll be a little unpredictable when the next episode is going to be. I've been trying to cram in as many good episodes as I can in the midst of my thesis writing. For those of you who aren't aware of what Crust of Rust is: it's a variant of the live streams I normally do where I try to tackle beginner-to-intermediate content, which is the best way I can describe it. This is stuff for when you've read the Rust book, you have some familiarity with the language, you've maybe built some things, but you're looking to understand how some of the slightly more advanced topics work. If you look at some of the past videos, I've done things on lifetime annotations, declarative macros, iterators, smart pointers, and interior mutability: a lot of the topics that, once you start getting deeper into Rust, you start seeing pop up, and you might wonder how they work. And for the stream we're about to do, I tweeted out to ask people: what would you like to see next?
There was a pretty overwhelming plurality for looking at the std::sync::mpsc module. mpsc is basically, well, not basically, it *is* a channel implementation that comes in the standard library. A channel, if you're not familiar with it, is just a way to send data from one place and receive it somewhere else. The mpsc part stands for multi-producer, single-consumer. This means you can have many senders, but only one receiver, so it's a many-to-one channel. In the stream, what we're going to do is implement our own channel and see how it compares both to the standard library channel and to some of the other channel implementations that are out there: some of the other ways to design these channels, and some of the considerations that come up when you design them and when you decide how to use them. Before we dig into that, oh yeah, if you're interested in these Crust of Rust streams, just follow me on Twitter or subscribe on YouTube or Twitch and you'll be notified whenever I do an upcoming stream. So, as I mentioned, the standard library has a built-in mechanism for these mpsc channels. Usually, for any crate that provides something like this, you have a Receiver type and you have a Sender type; in the case of the standard library there's also a SyncSender type, and we'll talk a little about why that is and how it's different. As you can see, the examples are fairly straightforward: when you create a channel, you get a sending handle and a receiving handle, and you can move these independently. So you can give the receiver or the sender to some different thread, and give the opposite end of the channel to some other thread, and now they can communicate over that channel. The channel is unidirectional, though: only the senders can send, and only the receiver can receive. You can clone the sender, but you cannot clone the receiver.
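As a quick refresher before we build our own, this is roughly what using the standard library's channel looks like: clone the sender for multiple producers, keep the single receiver on one side (a minimal sketch using only the std API):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    // Multiple producers: clone the sender and move each clone into a thread.
    let handles: Vec<_> = (0..3)
        .map(|i| {
            let tx = tx.clone();
            thread::spawn(move || tx.send(i).unwrap())
        })
        .collect();
    // Drop the original sender so the channel closes once the clones finish.
    drop(tx);

    for h in handles {
        h.join().unwrap();
    }

    // Single consumer: the iterator ends once every sender is gone.
    let mut received: Vec<i32> = rx.iter().collect();
    received.sort();
    assert_eq!(received, vec![0, 1, 2]);
    println!("received all values");
}
```

Note that the values can arrive in any order across producers, which is why the sketch sorts before asserting.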
Hence the multi-producer, single-consumer part. Before we dig into how to implement a channel, let's first make sure we're all on board about what a channel is and maybe why it might be useful. If you have questions about that, let's get those out of the way first, before we start getting into the weeds of the code. Oh yeah, there's a Discord server as well; all the chat will also be in Discord if you're interested in following along, or if you're watching the video on demand. "What does crossbeam do differently from the standard lib?" That's something we'll cover after we've done our implementation. "What kind of data can I send through a channel?" The channels in Rust all take a generic parameter T. If we look at the Sender, for example, it has a type parameter T, and you can send any T through that channel. When you create the channel in the first place, when you call the channel function, which is the thing that gives you a sender and receiver, it's parameterized by that type. So the sender and receiver are both parameterized by the type of the thing you're going to send and receive. Yeah, they're very much like... oh, I can zoom in.
Yeah, sure. So they are very much like Go channels; channels are something that exists in most languages, and very often they function in a similar way: you have senders and receivers. In Rust parlance, this is specifically an mpsc channel, whereas in some other languages you have, for example, many-to-many channels. I forget what Go's channels do there. "The data has to be Send though, right?" Not quite, actually. Imagine you have a channel, but you never give away the sender or receiver to a different thread. Then the T stays on the same thread, and so it doesn't actually need to be Send. I don't know whether the standard library makes this distinction, but hopefully it does. Yeah, you see that the Sender is Send if the T is Send. What this means is that you can construct a channel and send stuff over it that's not Send, as long as you don't move the sender or receiver across a thread boundary. But if you do, then T must be Send. "Are there any constraints on what kinds of types you can send through the channel, though? Does it have to be Send?" It does not have to be Send, and no, there are no other constraints: you can send any type through a channel. Remember that it's not like serialization; it's not TCP, it's nothing like that. It is really just moving the data that's stored in the value. If you send a Vec, for example, it's going to send the length, the capacity, and the pointer to the data across the channel. "Does the data need to be Sized, or can it be dyn?" Ooh, I think it has to be Sized. Remember from one of our previous streams: the Sized bound is special in that it's an implicit bound, so unless you write `?Sized`, every T you write has to be Sized. Here, because there's no `?Sized`, T has to be Sized. "Can you point out differences between other implementations of channels?"
We'll look at that later in the stream; we'll specifically look at what the other implementation strategies are, compared to the one we pursue. "How is the sender distinguished from the receiver?" They have different types. When you call channel, which constructs a new channel for you, you get back two halves, if you will: a sender half, which has the Sender type, and a receiver half, which has the Receiver type. "Can the data be non-'static?" Sure, but you need to own it. "What's the point of a channel if you're not going between threads?" There are some cases where you might not go between threads, but you might have things that you want to execute in parallel but not concurrently, or concurrently but not in parallel. You might have one thread that's an event loop or something, and it might end up sending to itself, so you might still want that sender/receiver abstraction. That would be one example; tests are another. "Who owns the data in the channel, the channel object?" Yes, the channel type itself owns the T. So if you send something on the channel, you don't receive it, and you drop the sender and receiver, the channel will make sure the data gets dropped. "What's the performance impact of the channel?" We'll look at that a little when we write the implementation. "How does it do backpressure?" We'll look at that as well once we start to get into the implementation. All right, it seems like the questions now are more about implementation details.
So let's start with that. As always, we will start with an empty project, `cargo new --lib`, and we're going to call it "panama", because Panama has a channel that enables communication between... well, there are also good reasons not to call it panama, but it was the first thing that came into my head. OK, so we're going to have a pub struct Sender, generic over T (we don't know what's going to go in there yet), and we're going to have a Receiver<T>. Then we're going to have a function `channel`, generic over T, which returns you a Sender<T> and a Receiver<T>. By convention, the sender type comes first, so you return a tuple where the first part is the sender and the second part is the receiver, and who knows what this function is going to do yet. So this is the setup that we have, and the question becomes: what is the actual implementation we want? Here we have a lot of possible choices. I'm going to go with one that is very straightforward and demonstrates some useful concurrency primitives in Rust, but that is not necessarily the most performant, in part because implementing a very performant channel requires a lot more subtlety and trickiness than would be easy to cover in a stream.
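The skeleton described above, with the bodies still stubbed out, might look like this (a sketch: the shared state is deliberately undecided at this point, so each half just holds a placeholder):

```rust
// Skeleton of the channel API being set up; the shared state comes later,
// so each half holds only a PhantomData placeholder for now.
use std::marker::PhantomData;

pub struct Sender<T> {
    _inner: PhantomData<T>,
}

pub struct Receiver<T> {
    _inner: PhantomData<T>,
}

// By convention the sender half comes first in the returned tuple.
pub fn channel<T>() -> (Sender<T>, Receiver<T>) {
    (
        Sender { _inner: PhantomData },
        Receiver { _inner: PhantomData },
    )
}

fn main() {
    let (_tx, _rx) = channel::<i32>();
    println!("skeleton compiles");
}
```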
So in particular, we're going to be using other parts of the sync module: Mutex, Arc, and Condvar. Mutex we talked a little about in the stream on smart pointers and interior mutability. A mutex is a lock; "mutex" stands for mutual exclusion. The idea is that you have a lock method, and the lock method returns a guard, and while you have that guard, you are guaranteed to be the only thing that can access the T that is protected by the mutex. The way this works in practice: if two threads both try to lock the same mutex, one will get to go, and the other one will block. It will have to wait until the first one releases the guard, and then it gets to go. So this ensures there's only ever one thread modifying the T at any given time. Arc, as we talked about in the stream on interior mutability and smart pointers, is a reference-counted type. It's Arc because it's an *atomically* reference-counted type, which means we can use it across thread boundaries, which obviously we would want for a channel.
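The Mutex behavior just described (two threads contending, one blocking until the other's guard is dropped) can be seen in a small self-contained sketch:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Two threads increment a shared counter; the Mutex guarantees only one
    // of them is inside the critical section at a time.
    let counter = Arc::new(Mutex::new(0));

    let handles: Vec<_> = (0..2)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1000 {
                    // lock() blocks until any other holder releases its guard.
                    let mut guard = counter.lock().unwrap();
                    *guard += 1;
                } // guard dropped here, releasing the lock
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    // Without the Mutex, the increments could race and we could lose updates.
    assert_eq!(*counter.lock().unwrap(), 2000);
    println!("counter is consistent");
}
```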
It's kind It's not useless on a single thread, but we would certainly want it to work across thread boundaries This is also the reason why we're using mutex instead of something like ref cell Um, and then con var con var is interesting if you haven't done a lot of concurrency Work before you might not know what a con var is A conditional variable a con var is a way to Announce to a different thread that you've changed something it cares about So think of this as like if there's a receiver who's waiting because there's no data yet And you have a sender that sends something it needs to wake up the the thread that's sleeping Right the thread that was waiting to receive something and go there's now stuff you can read And that's what a con var lets you do And together these are very useful concurrency primitives that give you a very nice Model for how to write concurrent code in a in a safe in a safe way We might not even need any unsafe code in this implementation of channels Okay, so what we're going to do is we're going to Define and this is a pretty common pattern in rust when you have things that are that are shared Like there are multiple halves or multiple handles the point to the same thing We're just we're going to declare an inner type which holds the data that is shared And for us that's going to be sort of the the things in the channel This is effectively a queue right because if a sender sends something and then the receiver receives something The receiver should be should receive the thing that was sent the longest ago Right Now let's start with this being a veck. 
It's not actually going to end up being a Vec, but for now that's a useful starting point. Then we're going to say that the Receiver has an inner, which is just an Arc<Mutex<Inner<T>>>, and the Sender has the same thing, so the sender and receiver actually contain the same stuff, at least at the moment. From std::sync we want Arc and Mutex, and we'll want Condvar. And now we even have a way to create this channel in the first place: we create the Inner, with a queue that's just an empty Vec to begin with, and then we have the shared inner be an Arc of a Mutex of an Inner. Then we return a Sender of that inner and a Receiver of that inner. All right, this is not a T but an Inner<T>. The heat is making me type more poorly. Great, so that now compiles. This is the rough structure we've laid out. There are obviously plenty of things missing, and there are some things that aren't quite right, but it's the general idea of how we're planning to set up this shared state. "RefCell does runtime borrow checking, right?" Yes. A Mutex in a sense is also a runtime borrow check, but it doesn't so much borrow-check as borrow-enforce: if two threads try to access the same thing at the same time, it'll block one thread, whereas RefCell will tell you "you can't get this mutably at the moment". "Why does the Condvar always need a mutex guard?" We'll get back to that in a second, once we start adding the Condvar. "Why not make it a linked list?" We'll touch on that when we discuss alternative implementations. "Can you zoom your vim text a little?" Absolutely. How's that? "Is it possible to specialize the struct so that if T is not Send, the Arc<Mutex> would just be an Rc?" No, not easily; you can't specialize the definition.
You would have to have a sort of unsync version. This is something... I forget if the standard library has something like this; I don't think so, but you could imagine a std "unsync" or "unsend" mpsc. Although I think the actual use case for those is less clear than for a cross-thread channel; in general, channels are used for things to be sent across threads. "I've seen a lot of people using mutexes from parking_lot and channels from crossbeam. Does parking_lot have a similar, better Condvar implementation?" As far as I know, parking_lot also provides parking and notification, which is what condvars give you, so yeah, you could totally use that stuff from parking_lot as well. I know there's been some talk of trying to take the parking_lot implementations of things like Mutex and Condvar and make them the standard library ones, and that might happen someday. "Why would you not put the Mutex in Inner?" Yeah, I guess we could do that. That's certainly a change we'll end up wanting in about five minutes anyway. So make this be this, and you'll see why that's actually necessary in a bit. And here we can even use the Default implementation of Vec. "Why does the receiver need to have an Arc protected by a Mutex if the channel may only have a single consumer thread?" OK, so the question is: why does the receiver need a Mutex? The answer is because a send and a receive might happen at the same time, and they need to be mutually exclusive to each other as well. That's why they both need to be synchronized with the Mutex. "Is there a difference between an Arc<Mutex> and a boolean semaphore?"
A mutex effectively *is* a boolean semaphore, so no. But I don't think there's a reason to use a boolean semaphore over the mutex implementation. In particular, what Mutex buys you is that it integrates with the parking mechanisms and user-mode futexes that are implemented by the operating system. A boolean semaphore is basically a boolean flag that you check and atomically update. The problem there is: if the flag is currently set, so someone else is in the critical section, someone else has the lock, then what do you do? With a boolean semaphore you have to spin; you have to repeatedly check it. Whereas with a mutex, the operating system can put the thread to sleep and wake it back up when the mutex is available, which is generally more efficient, although it adds a little latency. "At that point you can just use a queue, right?" I don't know what the question is. "Why is the Arc needed?" The Arc is needed because otherwise, if there was no Arc here, the sender and the receiver would just have two different instances of Inner, and if they did, how would they communicate? They need to share an Inner, because that's where the sender is going to put data and where the receiver is going to take data out. All right.
So let's just start implementing and see what happens. For a Sender, we obviously want a send function. It's going to take `&mut self` and the T that we're going to send, and it's going to return something; for now, let's just say it returns nothing, and we'll see why that's a problem later on. And similarly (actually, let me move this up here) for the Receiver, we want a recv method, which does not take a T but returns one. OK, so what would send do? Well, send is first going to take the lock: `self.inner.lock()`. You'll notice, if we go back to the documentation for Mutex, that lock returns a LockResult. The reason for this: imagine the last thread that took the lock panicked while holding it. It might have been in the process of updating something under the lock when it panicked, which might mean the data under the lock is now in some not-quite-consistent state. The way the lock communicates this is that when the thread panics, it releases the lock, but it also sets a little flag to say "the last thing that accessed this panicked". So LockResult, you'll see, is either a guard or a PoisonError containing a guard; basically, if you get an error back, it's saying the other thread panicked, and you should know about that. Of course, you could always choose to ignore the fact that it was poisoned, that the other thread panicked, but it could also be that you don't want to ignore that. In our case, we're going to unwrap this for now. So we lock the queue, and then we do `queue.push(t)`. And the receiver does sort of the opposite: it locks the queue and then pops it. Now, there should immediately be some obvious problems with this. The first is that this is not actually
a queue. If the sender pushes and the receiver pops, then if you have two sends and then a receive, the receiver gets the *last* thing sent rather than the first thing sent. The problem, of course, is that by using Vec this way, we're using it like a stack. Now, in theory you could remove the first thing from a Vec, but then you end up having to shift all the other elements down to fill the hole of the thing you removed. In practice, the way to do this is to use a ring buffer. We might cover those in a later stream, but for now, know that in std::collections there's a type called VecDeque, which is sort of like a vector, but it keeps track of the start and end positions separately. If you push to the end, it pushes to the end; if you pop from the front, it removes the element and then just moves a pointer to where the data starts. This way the data might end up wrapping around the buffer, but it can be used as a queue as opposed to a stack. And this allows us to have send do a push_back and recv do a pop_front. You don't want swap_remove, which someone suggested as an alternative: if you swap-remove, the last thing sent becomes the next thing to be received; it changes the order of the elements in the vector. Apparently "vec-deck" is the correct pronunciation; thanks, Atlas. All right, so the other problem here is that when we receive, pop_front, as the compiler tells us, returns an Option. It doesn't return you a T, because it could be that there's nothing in there. And then what do we do?
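The FIFO-vs-LIFO point above is easy to see directly with VecDeque's API (a small sketch):

```rust
use std::collections::VecDeque;

fn main() {
    // A Vec used with push/pop is a stack (LIFO); VecDeque gives us FIFO
    // by tracking head and tail indices into a ring buffer.
    let mut queue = VecDeque::new();
    queue.push_back(1);
    queue.push_back(2);
    queue.push_back(3);

    // pop_front returns Option<T>: None once the queue is empty.
    assert_eq!(queue.pop_front(), Some(1)); // first sent, first received
    assert_eq!(queue.pop_front(), Some(2));
    assert_eq!(queue.pop_front(), Some(3));
    assert_eq!(queue.pop_front(), None);
    println!("FIFO order preserved");
}
```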
One option here is that we can provide a try_recv method that returns an Option<T>. It will try to receive something, but if there's nothing to receive, it just returns None. That seems totally fine. I'm going to remove it for now, because we're going to change some things later, but really, what we want to provide is what's known as a *blocking* version of receive: a recv that, if there isn't something yet, waits for there to be something in the channel. So we need to figure out what to do here, and this is where the Condvar comes into play. Here we're going to split it into... tx_ok? Or tx_available? Actually, we're just going to call it `available`. Is that what I want to do? Yeah, we're going to go with `available` for now; it's not quite right, but it's close enough. And the Condvar needs to be *outside* the Mutex. The idea is: imagine you're currently holding the mutex and you realize you need to wake other people up.
That's sort of the assumption But you're currently holding the mutex So if you tell them to wake up while holding the mutex And then they wake up they try to take the lock they can't they go to sleep And then you continue running and then you release the mutex Then now no threat is awake and you end up with what's known as a as a deadlock No thread can make progress even though it is possible to make progress Uh, so this is why the con var has to be outside the mutex The idea is that you sort of let go of the mutex at the same time as you notify the other thread Uh, this is why to get to the question that was raised earlier Why the con var requires you to give in a mutex guard You have to prove that you currently hold the lock and then it will make sure that does this step As one step as an atomic step Um, so here what we're going to do is we're going to match Uh on Q pop front, uh, and if it's some t Then we're just going to return t Uh, but if it's none we're going to block Uh, so we're going to do self inner available Wait And we're going to wait on the q Now, of course The the problem as the compiler also points out is that okay, we wait, but then what? So this actually ends up needing to be a loop Right, so we're going to be doing this in a loop And not only that, but if you look at the signature of weight You'll see that the weight actually gives you a mutex guard back And the idea is that if you get woken up you automatically have the mutex Someone else chose to wake you up and you now are sort of it basically hands the mutex to you And then you do something appropriate with it Um And so instead of having to lock it each iteration to the loop what we can do is this Right and this too can be poisoned if the previous holder was poisoned And so what we're going to do is we're just going to keep looping But this isn't going to be a spin loop, right? 
So if we end up in the None clause, what we're going to do is wait for a signal on this `available` Condvar, and the operating system will make sure that the blocked thread goes to sleep and only wakes up if there's some reason for it to wake up. This also means that the sender now needs to make sure it notifies any waiting receivers once it sends, because otherwise, imagine some thread enters this loop and is just sleeping, and then a send happens: we need to make sure that thread wakes up. If it doesn't, we have a problem. So we're going to use the Condvar for this as well. It has a notify_one and a notify_all call. We're going to drop the queue guard here, because we need to drop the lock so that whoever we notify can wake up, and then we're going to notify one thread. And because we are the sender, we know that the one we wake up will be a receiver. Does that make sense so far? "Is a double-ended queue a VecDeque?" Yeah, basically; a VecDeque is just a vector with a head and tail index. "Isn't that kind of loop the raison d'être of async?" (I don't know how to pronounce that in French; I should probably not try.) Not quite. This loop is fine; it's not a spin loop. Where you need async/await is more when you're I/O-bound, not CPU-bound. It's for slightly different reasons; it's basically so that you don't need to have a million threads running. Actually, I lied about the need for wait to take a guard, come to think of it. You'll notice up here that notify_one does not require me to drop the mutex; it doesn't require me to hand in the mutex, though you sort of need to do that regardless. But wait *does* require you to give up the guard.
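The sender side just described, push, release the lock, then notify, can be sketched alongside the recv loop. Again the `Shared` struct and field names are my stand-ins for the stream's shared state:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

struct Shared<T> {
    queue: Mutex<VecDeque<T>>,
    available: Condvar,
}

fn send<T>(shared: &Arc<Shared<T>>, t: T) {
    let mut queue = shared.queue.lock().unwrap();
    queue.push_back(t);
    // Release the lock *before* notifying, so the woken receiver can
    // immediately take it instead of waking up just to block again.
    drop(queue);
    shared.available.notify_one();
}

fn recv<T>(shared: &Arc<Shared<T>>) -> T {
    let mut queue = shared.queue.lock().unwrap();
    loop {
        match queue.pop_front() {
            Some(t) => return t,
            None => queue = shared.available.wait(queue).unwrap(),
        }
    }
}

fn main() {
    let shared = Arc::new(Shared {
        queue: Mutex::new(VecDeque::new()),
        available: Condvar::new(),
    });
    let tx = Arc::clone(&shared);
    // The receiver blocks until the sender thread pushes and notifies.
    let sender = thread::spawn(move || send(&tx, 42));
    assert_eq!(recv(&shared), 42);
    sender.join().unwrap();
    println!("blocking handoff worked");
}
```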
The idea is that you can't wait while still holding the mutex; you need to give up the mutex in order to wait, because otherwise whoever would have woken you up can't get the mutex. That's why it requires you to hand in the guard. "How is it protected from Condvar spurious wakeups?" Yeah, so one thing that can happen with condvars is that when you call wait, the operating system doesn't guarantee that you aren't woken up without there being anything for you to do, and that's what the loop handles. Imagine you're woken up for some other reason: not because a send happened, but because of a signal to the process or some other random reason; the operating system basically doesn't guarantee that you wake up *for* a reason. Then what you'll do is loop around, check the queue, realize it's still empty, and go back to sleep. So that's fine. "How can someone send while we hold the mutex lock during receive? Doesn't it block insertions?" That's what wait handles: it gives up the lock just before it goes to sleep, and that allows the sender to proceed. I'm not using ALE anymore; I'm using coc.nvim, which gives me the inline type annotations and errors. "Wouldn't the lock be dropped after the notify?" Yeah, but I specifically drop it *before* the notify, so that when the other thread wakes up, it can immediately take the lock. "How does the operating system know which thread to wake up?" It doesn't. When we do notify_one, that means "notify one of the threads that's waiting on this specific condvar". And because this is the sender doing the notifying, we know the thread we wake up must be a receiver. "Wouldn't it be nicer to use a scope around `let queue` and the send instead of drop?" I mean, that's true; we could do that instead.
I don't know that that's any nicer. I prefer to be explicit about the release. "But if you're woken up, it takes the mutex for you." Right, that's why we reassign to the guard: we get it back when wait returns. "There's a wait with timeout, right?" Yeah, there's wait_timeout as well, which... actually, I think it does give you the guard back too. "Wouldn't one thread empty the entire queue and only allow other threads in again once it's empty?" No, because notice that we return when we manage to pop something from the front, which releases the mutex. "Is there a notify variant which takes the guard and drops it for you?" Not as far as I'm aware, no. "If N threads are waiting, is one of them randomly chosen to be woken up?" Yeah, notify_one does not guarantee *which* thread is woken up. There's also notify_all, which notifies all the waiting threads. "Is it possible for the queue to be locked between the drop and when the receiver locks, by another sender?" Yeah, it can be that there's another sender that also manages to push to the queue, but that isn't really a problem; the receiver will still eventually get to go. So in the current setup, remember, senders basically never block: when a sender gets the lock, it always succeeds in sending, so there are never any waiting senders in the current design. We'll talk more about that in a second. All right, so this setup will actually work pretty well; in fact, we can try it out. `available` is going to be a `Condvar::new()`. And the other thing we want to do here is make sure that the sender is cloneable.
So your first instinct might be to derive Clone, and now you can clone the sender. Unfortunately, derive(Clone), at least at the moment, actually desugars into `impl<T: Clone> Clone for Sender<T>`, with some auto-generated stuff by the compiler. So if you put derive, this is what it turns into, and one thing you'll notice is that it added the Clone bound to T as well. Very often this is what you want, because if the struct you're deriving Clone on contains a T, then T does need to be Clone in order for you to clone the whole type. In our case, though, Arc implements Clone regardless of whether the inner type is Clone. That's sort of what reference counting means: you can clone an Arc, and there's still only one of the thing inside. So for our implementation of Clone, we don't actually need T to be Clone. We want the implementation without the bound, and that's the reason we need to implement Clone ourselves, manually. Luckily, it's pretty simple: it's just `inner: self.inner.clone()`. Actually, while that's technically legal, it's usually not what you want to write. The reason: imagine that Inner also implemented Clone. Rust won't know whether this call is supposed to clone the Arc or the thing inside the Arc, because Arc dereferences to the inner type, and the dot operator recurses into the derefs. So usually what you want to do here is use `Arc::clone(&self.inner)`, to say "I specifically want to clone the Arc, not the thing inside the Arc". "Couldn't lock block?" Yes, lock blocks.
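The manual Clone impl described above, with no `T: Clone` bound and the explicit `Arc::clone`, can be sketched like this (the `Inner`/`queue` field shapes here are simplified stand-ins for the stream's types):

```rust
use std::sync::{Arc, Mutex};

struct Inner<T> {
    queue: Mutex<Vec<T>>,
}

pub struct Sender<T> {
    inner: Arc<Inner<T>>,
}

// A manual Clone impl: note there is no `T: Clone` bound, because cloning
// a Sender only bumps the Arc's reference count.
impl<T> Clone for Sender<T> {
    fn clone(&self) -> Self {
        Sender {
            // Arc::clone rather than self.inner.clone(): the dot operator
            // auto-derefs, so this spelling makes "clone the Arc itself"
            // unambiguous even if Inner ever implements Clone.
            inner: Arc::clone(&self.inner),
        }
    }
}

// A type that is deliberately not Clone, to show the impl still works.
struct NotClone;

fn main() {
    let tx = Sender {
        inner: Arc::new(Inner {
            queue: Mutex::new(Vec::<NotClone>::new()),
        }),
    };
    let tx2 = tx.clone();
    assert_eq!(Arc::strong_count(&tx.inner), 2);
    drop(tx2);
    assert_eq!(Arc::strong_count(&tx.inner), 1);
    println!("cloned without T: Clone");
}
```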
That's the whole point of lock, and that's what we want: send and receive should be blocking methods. Well, send will only block for short amounts of time, but if you try to receive and there's nothing in the channel, we want the thread to block. "I don't understand why you're talking about waking up multiple receivers while implementing mpsc." Sorry, you're right; I misspoke. I meant there's only one receiver, and therefore notify_one will notify the right thing; there are never sleeping senders in our current setup. "Will there be a max size on the queue? That would need senders to wait." Not currently, although I'm about to get to it. "It's also easier to read that it's a trivial clone that way, by removing the Clone bound." Yeah. "Is there a way to disable auto-deref?" Yeah, you just don't use the dot operator; you use the fully qualified call syntax. "Why does Rust bubble EINTR (or whatever the OS equivalent is) up into the wait function?" It often doesn't have a choice. The wait implementation basically ends up being the OS implementation; they want to add as little as possible in between, except for things like poisoning. So yeah, I think wait specifically says it does not give you a guarantee that you don't get what's known as spurious wakeups. OK, so let's check that this works. We're going to do `mod tests` and have a test; we're going to do a ping-pong test. We create a tx and an rx (I guess here we need `use super::*`), and that creates a channel. I can't spell. Then we do `tx.send(42)`, and then `assert_eq!(rx.recv(), 42)`. And what did I call this thing? Panama, that's right. And tx and rx both need to be mut. Actually, rx doesn't technically need to be mut, but we're going to make it mut.
I'll show you why in a second. And send also technically doesn't need to be mut, but we might as well make it be mut, which is why they need to be mut up here. Now if we run it: great, the ping-pong test succeeds. So we now have an implementation that works. There are still some things wrong with it, but we have one that works.

"The only problem with our clone is that it cannot be coerced to trait objects; you have to do it manually, for example if our Sender is used as a dyn trait." Yeah, that's true.

"Do you think tx and rx are good names for channels? I know the std docs use them, but I've always hated them." I like tx and rx, but it's true that it sort of comes down to personal preference.

All right. So the first and most obvious problem with this is on the receiver's side. Imagine that there are no senders left. So here we're gonna write a test called closed: we just immediately drop the sender, and now it's not even clear what the receiver should do, right? What happens when I now call receive? It's gonna block forever, even though there are no senders left. I guess I need to give this a type. If there are no senders left and the receiver tries to receive, there can never be any future senders, because in order to get a sender you have to clone a sender, but all the senders are gone. So if you run this test, you'll see that it just hangs forever, which is obviously not great.
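For reference, here's roughly where the implementation stands at this point, assembled from the pieces so far (field names like `available` are my guesses at what's on screen). Note it has exactly the problem just described: recv on an empty channel blocks forever even after every sender is gone.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};

struct Inner<T> {
    queue: Mutex<VecDeque<T>>,
    available: Condvar,
}

pub struct Sender<T> {
    inner: Arc<Inner<T>>,
}

impl<T> Clone for Sender<T> {
    fn clone(&self) -> Self {
        Sender { inner: Arc::clone(&self.inner) }
    }
}

impl<T> Sender<T> {
    pub fn send(&mut self, t: T) {
        self.inner.queue.lock().unwrap().push_back(t);
        // Wake the receiver if it is blocked waiting for data.
        self.inner.available.notify_one();
    }
}

pub struct Receiver<T> {
    inner: Arc<Inner<T>>,
}

impl<T> Receiver<T> {
    pub fn recv(&mut self) -> T {
        let mut queue = self.inner.queue.lock().unwrap();
        loop {
            match queue.pop_front() {
                Some(t) => return t,
                // Nothing queued: block until a sender notifies us.
                // Spurious wakeups are fine; we just loop and re-check.
                None => queue = self.inner.available.wait(queue).unwrap(),
            }
        }
    }
}

pub fn channel<T>() -> (Sender<T>, Receiver<T>) {
    let inner = Arc::new(Inner {
        queue: Mutex::new(VecDeque::new()),
        available: Condvar::new(),
    });
    (Sender { inner: Arc::clone(&inner) }, Receiver { inner })
}

fn main() {
    // The ping-pong test from the stream.
    let (mut tx, mut rx) = channel();
    tx.send(42);
    assert_eq!(rx.recv(), 42);
}
```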
So realistically, what we want is some way to indicate to the receiver that there are no more senders left, that the channel basically has been closed. And the easiest way to do this is: we're gonna change the naming here a little and call this Shared (shared, shared, shared, shared). I'll show you why I changed it to Shared: it's because we really want some additional data that's guarded by the mutex. So we're gonna have the mutex protect an Inner<T>, and the Inner<T> is going to hold the queue. So this is now going to be Inner, and it's also going to hold a senders count, which is going to be a usize. Now, of course, this becomes shared.inner.lock(), then queue, right? So the lock now guards both the queue and this additional usize that we added. (That should say inner. That should say inner.) Great. And now what we'll do is: every time you clone a sender, we're going to increment the number of senders in that value. So clone is actually going to take the lock, the senders count gets incremented by one, we drop the inner, and then we clone the shared. And then similarly, we need to deal with the case where a sender goes away. When a sender goes away, we also need to grab the lock, and one thing we want to keep track of here is whether we were the last one. If the number of senders is now zero, it might be that the receiver was blocking and the last sender went away; we need to make sure to wake it up.
Otherwise it might never wake up. And so, if we were the last, we also call self.shared.available.notify_one(). And then the receiver now basically needs to return an Option<T> rather than just a T, because it could be that the channel truly is empty forever, in which case we want to return None. So in the None case, if inner.senders is zero, then we return None. And I guess we can do this a little bit nicer: only if the sender count is more than zero do we actually want to block and wait.

So, does this addition of keeping track of the number of senders make sense? I'll show you with the test. Right, this test is now not quite right. The inner is going to be an Inner::default()... actually, Inner could derive Default, but let's not do that, because it would require T to be Default. So we're gonna have an Inner with a queue, which is an empty VecDeque, and the number of senders initially one, and this is going to be a Mutex::new over that inner state. I guess I need a semicolon. This now is inner. And the closed test should in theory now succeed: we can assert that if we try to receive after the sender goes away, we get a None.

Oh, fun, it does not work. Let's figure out why.

"Closed hangs forever; can't the receiver check if shared is unique?" Potentially, actually. Yeah, you might be right that we could use the reference count in the Arc instead. It gets a little complicated because of weak references, but because we're only using strong references here, it might be that we can get away with it. So with Arc there's Arc::strong_count, which you can give an Arc, and it tells you how many references there are to that Arc, how many instances of that Arc exist. And if there's only one, then that must be the one held by the receiver, and therefore there are no senders. You're right, so we get that optimization: we could get rid of the senders field. And we no longer need to deal with... actually, this is the complicated case: if you drop a sender, you don't know whether to notify, because if the count is two, you might be the last sender, or you might be the second-to-last sender and the receiver has been dropped. So I think we're going to keep it the way it was. It's also easier to read. There are plenty of optimizations you can make over this implementation; I'm more trying to build a representative way in which it might work.

"Could we use an AtomicUsize in Shared rather than creating Inner?" You could, although the moment you take a mutex, there isn't really that much value to it. It would mean that you don't have to take the lock in drop and clone, but those should be relatively rare, and the critical sections are short enough that the lock should be fast anyway.

"Wouldn't you want to notify_all on drop?" No: when the last sender goes away, that means there's only the receiver left, so there will be at most one thread waiting, which will be the receiver, if any.

Someone pointed out that receive should probably return a Result instead of an Option; I'll get to that.

"Can you overflow the sender count?" In theory; probably not in practice.

"Is there any immediate benefit to adding it to the mutex rather than an AtomicUsize?" I mentioned that a little bit. And yeah, Patrick, I'll get to your question later.

"What's the difference between VecDeque::new and VecDeque::default?" None.

"I think the error was initializing senders to one in the constructor and then calling clone on the sender we return." So, in channel, what we're cloning is the shared; we're not cloning the sender.
So this won't increment the sender count.

"Could you get false sharing between the VecDeque and the sender count?" You could, but they're under a mutex anyway, so that shouldn't matter.

"Can't you just notify every time a sender is dropped?" You could, but that would cause a lot of extra wakeups, and you want to avoid spurious wakeups because they're costly: you're waking up a thread that didn't need to be woken up. There's no correctness issue with waking up more threads, but it is a performance issue.

All right, great. So now we need to figure out why this didn't work. Why did the closed test not work? When we run it, it hangs forever, presumably down here. Yeah, it hangs on the receive, so the question becomes: why does it hang on the receive? We take the lock, we try to pop from the front; if it's Some we return it; if it's None and there are zero senders, we return None. So here's what we're gonna do: we're gonna debug-print that value and see what comes out. Okay, so senders is one. So then the question becomes: why isn't the sender count decremented here? It probably won't let me do that... the drop sender count was this... oh, it's not dropped, huh? Or maybe it never gets the lock? No, then it would hang sooner. Yeah, so for some reason the sender is not being dropped. Why is that? I guess if I do this, does that make a difference? Interesting. Okay, so I was under the impression that assigning to underscore, `let _ = tx`, would drop immediately, but I guess that's not true. I think there was actually an open discussion about this for a while. So the implementation actually is correct; it's just that `let _ = tx` does not drop tx, apparently, which I found weird. I thought it did, but apparently not, whereas an explicit call to drop will do what we want. I was a little surprised by that, because this implementation is pretty straightforward. Okay, does the change make sense so far?
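Putting this section together, here's a sketch of the counted version (names assumed from context). The test at the bottom mirrors the closed test, including the pitfall from the stream: `_` never binds, so `let _ = tx;` does not move or drop the sender, and only an explicit `drop(tx)` closes the channel.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};

struct Inner<T> {
    queue: VecDeque<T>,
    senders: usize, // number of live Sender handles
}

struct Shared<T> {
    inner: Mutex<Inner<T>>, // guards both the queue and the sender count
    available: Condvar,
}

pub struct Sender<T> {
    shared: Arc<Shared<T>>,
}

impl<T> Clone for Sender<T> {
    fn clone(&self) -> Self {
        let mut inner = self.shared.inner.lock().unwrap();
        inner.senders += 1;
        drop(inner);
        Sender { shared: Arc::clone(&self.shared) }
    }
}

impl<T> Drop for Sender<T> {
    fn drop(&mut self) {
        let mut inner = self.shared.inner.lock().unwrap();
        inner.senders -= 1;
        let was_last = inner.senders == 0;
        drop(inner);
        if was_last {
            // Wake a receiver blocked in recv() so it can observe closure.
            self.shared.available.notify_one();
        }
    }
}

impl<T> Sender<T> {
    pub fn send(&mut self, t: T) {
        self.shared.inner.lock().unwrap().queue.push_back(t);
        self.shared.available.notify_one();
    }
}

pub struct Receiver<T> {
    shared: Arc<Shared<T>>,
}

impl<T> Receiver<T> {
    pub fn recv(&mut self) -> Option<T> {
        let mut inner = self.shared.inner.lock().unwrap();
        loop {
            match inner.queue.pop_front() {
                Some(t) => return Some(t),
                // Empty and no senders left: closed forever.
                None if inner.senders == 0 => return None,
                None => inner = self.shared.available.wait(inner).unwrap(),
            }
        }
    }
}

pub fn channel<T>() -> (Sender<T>, Receiver<T>) {
    let shared = Arc::new(Shared {
        // The constructor builds the Sender directly (it does not clone one),
        // which is why senders starts at one.
        inner: Mutex::new(Inner { queue: VecDeque::new(), senders: 1 }),
        available: Condvar::new(),
    });
    (Sender { shared: Arc::clone(&shared) }, Receiver { shared })
}

fn main() {
    let (tx, mut rx) = channel::<i32>();
    // `let _ = tx;` would NOT drop tx here; an explicit drop is needed.
    drop(tx);
    assert_eq!(rx.recv(), None);
}
```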
"Why not use an AtomicUsize instead of a mutex?" Well, we need the mutex for the queue, and because we have the mutex anyway, an AtomicUsize doesn't save us anything: we have to take the mutex regardless, so we might as well also update the count under it, since we have it anyway.

"Do you ever use gdb to debug Rust programs?" I do, but I find that print debugging is easier, especially for small examples like this. I guess we can get rid of this too. Great.

Okay, so it turns out there's still a problem with this implementation, and it's that the situation can go the other way around. This was a closed-tx test, but what if it's the rx that's closed? What if we drop the rx here and then try to do tx.send(42)? Actually, this isn't really a problem as written... oh, let's remove the type annotation here. So if I do this (I guess mut, and rx), this test will run just fine. But the question is: should it run fine if I try to send something on a channel where the receiver has gone away? Maybe the right thing is that I should be told the channel has been closed, rather than the send just blindly succeeding. It's not entirely clear what the right answer is; this is a design decision of whether a send should always succeed, or whether it should be able to fail in some way. I think in this case we're just going to keep it the way it is, because it's okay.
It's fine. But in a real implementation, you might imagine that you actually want send to get back a signal, like send returning a Result or something, if the channel is closed. Some implementations do, some don't. Keep in mind, though, that if you want send to be able to fail, then you have to make sure you give back the value the user tried to send. If the send fails, the user should get their value back, so they can try to send it somewhere else, or log it, or something like that. And basically the way you implement this is you add a sort of closed flag to the inner, just a boolean that (sorry, not the sender) the receiver sets. Just like we have a Drop for the sender, the receiver sets the closed flag in its drop, and it notifies all, although there aren't blocking senders in this particular implementation. And when you send, if the flag is set, you return an error rather than pushing to the queue.

"Can we resurrect a dropped channel?" No. If the senders go away, you have no way to send anymore in our particular design. Because the sender and the receiver have the same representation, in theory we could add a method that lets you construct a sender from the receiver. Most implementations are not quite as symmetric as this one, and you can't easily create a sender from a receiver. And in our implementation you could get a receiver from a sender, but we wouldn't want to provide that, because then people could create multiple receivers, which would not work, right?
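The closed-flag scheme just described might be sketched like this. It's a minimal stand-in, not the stream's full code: blocking and the condvar are omitted so the flag logic stands out, and the error type is a bare `Err(T)` where std uses `SendError<T>`.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

struct Inner<T> {
    queue: VecDeque<T>,
    closed: bool, // set by the Receiver's Drop impl
}

pub struct Sender<T> {
    inner: Arc<Mutex<Inner<T>>>,
}

impl<T> Sender<T> {
    // On failure, hand the value back to the caller so they can retry it,
    // log it, or send it elsewhere.
    pub fn send(&self, t: T) -> Result<(), T> {
        let mut inner = self.inner.lock().unwrap();
        if inner.closed {
            return Err(t);
        }
        inner.queue.push_back(t);
        Ok(())
    }
}

pub struct Receiver<T> {
    inner: Arc<Mutex<Inner<T>>>,
}

impl<T> Drop for Receiver<T> {
    fn drop(&mut self) {
        // Mark the channel closed; a synchronous variant would also
        // notify any blocked senders here.
        self.inner.lock().unwrap().closed = true;
    }
}

pub fn channel<T>() -> (Sender<T>, Receiver<T>) {
    let inner = Arc::new(Mutex::new(Inner {
        queue: VecDeque::new(),
        closed: false,
    }));
    (Sender { inner: Arc::clone(&inner) }, Receiver { inner })
}

fn main() {
    let (tx, rx) = channel();
    tx.send(1).unwrap();
    drop(rx);
    // The receiver is gone: the send fails and we get the value back.
    assert_eq!(tx.send(2), Err(2));
}
```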
Multiple receivers would be wrong with our notify_one. Actually, this particular channel would kind of work as a multi-producer multi-consumer channel, but we make some assumptions later that would make that annoying.

"There's an audio/video sync problem with the stream." Try Twitch instead; sometimes it's better. YouTube gets confused and is slow.

All right, so now let's look at some design decisions here that might not always make sense. The first one is that every operation takes the lock. That's fine for a channel that isn't very high-performance, but if you wanted something super high-performance, where you have a lot of sends competing with each other, for example, then you might not want the sends to contend with one another, right? Imagine ten threads all trying to send at the same time. Realistically, you could perhaps write an implementation that allows that: the only thing that really needs to be synchronized is the senders with the receiver, as opposed to the senders with one another, whereas we're actually locking all of them against each other. I'll talk a little about what that implementation might look like later.

The other thing you might have noticed when we looked at the standard library is that its channel has one receiver type and two different sender types: there's Sender and there's SyncSender, and with both, when you construct a channel, you get a Receiver. So the receiver type is the same, but the sender types differ. The difference is that one is synchronous and the other is asynchronous. Now, this is not async in the async/await sense; it's not that kind of asynchronous. What they mean by a synchronous channel is whether it forces the senders and receivers to synchronize. That is: imagine a sender that's much faster than the receiver. In the current design we have, the sender would just produce content much faster than the receiver could consume it, and the queue would keep growing. With a synchronous channel, the sender and receiver go in lockstep: the channel has a limited capacity, so at some point, if the sender sends so much that the receiver can't keep up, the channel fills up and the sender now blocks. And so the primary difference between a synchronous and an asynchronous channel is whether sends can block. In our implementation, sends can't block: all send does is take the lock, push to the VecDeque, drop the lock, and notify the consumer. That works fine, but it means there's no backpressure; if the sender is too fast, nothing in the system is told that the receiver isn't keeping up. So the advantage of a synchronous channel is that there's backpressure: the sender will eventually start blocking as well. Obviously this creates some additional challenges: now you might have blocking senders, and the receiver might have to notify a sender, as in, "hey, I know you were blocking, but I just received something, so you can go ahead and send." The way this works out in practice, in a design based on mutexes and condvars, is that you need two condvars: one for notifying the senders, and one for notifying the receiver the way we're currently doing. But you can guard them with the same mutex, I believe. Now, once you implement these designs in practice, there are some other implementations that are a little better suited, and we'll talk about those in a second. In particular, notice that our channel function doesn't take any kind of capacity.
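For contrast with our capacity-less constructor, here's what the bounded behavior looks like with the standard library's sync_channel (discussed next). try_send reports Full, handing the value back, at exactly the point where a plain send would block:

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

fn main() {
    // A synchronous (bounded) channel with capacity 2.
    let (tx, rx) = sync_channel(2);
    tx.try_send(1).unwrap();
    tx.try_send(2).unwrap();

    // The channel is now full: send() would block here until the receiver
    // catches up; try_send instead reports Full and returns the value.
    assert!(matches!(tx.try_send(3), Err(TrySendError::Full(3))));

    // Once the receiver drains a slot, there is room again.
    assert_eq!(rx.recv().unwrap(), 1);
    tx.try_send(3).unwrap();
}
```

Our implementation, by contrast, never pushes back on a fast sender.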
It's just an infinite queue. Whereas if you look at the standard library, you'll see there's channel and there's sync_channel, and the sync_channel function takes a bound, which is basically the channel capacity, and returns a SyncSender and a Receiver. Those function very much the same way as our Sender and Receiver types, except that sends there are synchronous.

Questions about that, before I move on to alternative implementations a little down the line?

"Why not have senders use Weak? If Sender::send tries to deref the Weak and gets None, then fail; and if the weak count is zero, the receiver knows its recv will fail." So yeah, you could have the senders use Weak. In general, I wouldn't optimize this particular implementation too much, because there are better implementations, as we'll see later, though they're more complicated. If the senders used Weak, what you would do is: Weak is a version of Arc that doesn't hold a strong reference count, but gives you a way to try to reacquire one if the count hasn't already gone to zero. So the sender would try to upgrade its Weak, and if it succeeds, it knows the receiver is still there and it tries to send. One downside of Weak is that every time you try to send, you have to atomically increment the reference count and decrement it afterwards, so it actually adds a decent amount of overhead.

"Is there a way to have a condvar without a mutex?" Not really, no. As you can see, Condvar::wait requires you to hand it a MutexGuard.

"Wouldn't send technically block if a send caused a Vec resize?" So this is an important point.
I've spent a bunch of time on resizing in my research recently, and this innocuous call to push_back is not necessarily free. It might be that the VecDeque we're using has capacity 16 and you're pushing the 17th element, in which case it will allocate a new backing buffer of capacity 32, copy over the 16 elements, deallocate the old one, and then push the element, and that takes some time. Now, this still isn't blocking in the usual sense; if you resize, the send just takes longer. But it is true that in the meantime no one else can send or receive. In practice, most implementations of these things don't use a VecDeque, so they don't have this problem.

"Given the implementation you currently have, how hard would it be to write an Iterator implementation which consumes values from the channel until all senders are gone, and then ends with None?" Really easy, actually. We could even do impl Iterator for Receiver, where type Item is T; next takes &mut self, returns Option<Self::Item>, and just calls self.recv(). And now Receiver is an iterator.

"Are you planning to make a video about your resizing insights?" Maybe; it could be interesting. It might be like a thesis-talk video.

"Is there a good way to send multiple items at once?"
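The iterator impl just described is a one-liner around recv. As a self-contained sketch (with a simplified, non-blocking stand-in for our Receiver, whose recv already returns None once the channel is exhausted):

```rust
use std::collections::VecDeque;

// Stand-in for the channel's Receiver: recv() yields queued items and
// returns None once the channel is exhausted, the way our real recv does
// when all senders are gone.
struct Receiver<T> {
    queue: VecDeque<T>,
}

impl<T> Receiver<T> {
    fn recv(&mut self) -> Option<T> {
        self.queue.pop_front()
    }
}

impl<T> Iterator for Receiver<T> {
    type Item = T;

    fn next(&mut self) -> Option<Self::Item> {
        // The entire impl is just delegation to recv.
        self.recv()
    }
}

fn main() {
    let rx = Receiver { queue: VecDeque::from([1, 2, 3]) };
    // Receiver can now be used anywhere an iterator is expected.
    let received: Vec<_> = rx.collect();
    assert_eq!(received, vec![1, 2, 3]);
}
```

Because recv returning None doubles as the end-of-channel signal, the iterator terminates cleanly when the channel closes.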
In theory, that would be pretty easy to add: you could have a send_many that just appends, so that shouldn't be too hard, and you would only need to take the mutex once.

Okay, so there's actually one more optimization that many implementations do that I think is worth mentioning here, and it's this: we know there's only one receiver, and this is the first place we're going to encode that assumption into our code to make it more efficient. Because we know there's only one receiver, we don't really need to take the lock for every receive. Instead, here's the trick we can do. Bear with me; I'm just going to write the code first and then talk about it in a second. Here's what we're going to do: if inner.queue is not empty, then we swap...

All right, this optimization is a little cool, and it's something you'll see in a lot of other implementations, including ones that don't use mutexes. The idea is that because there's only one receiver, any time we take the lock, we might as well steal all the items that have been queued up rather than just one. No one else is going to take them, and if recv is called again, we might as well have kept a local buffer of the things we stole last time. So what we're going to do is: when someone calls receive, we first check whether we still have leftover items from the last time we took the lock, and if so, we can just return from there; we don't even have to take the lock. Only if that buffer is empty do we need to take the actual lock.
And when we do take that lock, we try to take the front item. If the queue is empty, we do the same thing as before: we have to wait. But if the queue is not empty and we get an item, then we check whether there are more items in the queue, and if there are, we steal all of them: we swap that VecDeque with the one we have buffered inside ourselves, and we leave the empty self.buffer (it must be empty, because our pop_front returned None) in its place. So we just swap the two, and subsequent calls to receive pop from the buffer until it's empty again. This means that instead of taking the lock on every receive, we only take it once per stretch in which there were no additional sends, if that makes sense. It's a neat little optimization. It's not really double buffering, but it has a little bit of that flavor. It is true that this ends up keeping roughly twice the amount of memory, because you have two VecDeques; they'll both grow as you add more items, and you'll be swapping between them, so you end up holding two VecDeques' worth of capacity.

"Won't the receiver-buffer optimization trigger a lot of extra memory-allocator activity?" Only twice as much, and it's amortized: resizes happen at power-of-two sizes, so in theory this triggers twice as many resizes, at predictable intervals. This is also a good way to reduce contention: the lock is taken fewer times, which means it will be faster to acquire.

So now we have that implementation, and we've talked a little bit through it.

"How do you come up with that optimization? Looking at implementations of channels, or did you get it from somewhere else?" Yeah, this is a pretty common trick if you look at some of the more optimized implementations.
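Here's a sketch of the receive path with this buffer-steal, reduced to its essentials: the shared state is just a `Mutex<VecDeque<T>>` here, with blocking and close detection left out so the swap stands out.

```rust
use std::collections::VecDeque;
use std::mem;
use std::sync::{Arc, Mutex};

struct Receiver<T> {
    shared: Arc<Mutex<VecDeque<T>>>,
    buffer: VecDeque<T>, // items stolen from the shared queue earlier
}

impl<T> Receiver<T> {
    fn try_recv(&mut self) -> Option<T> {
        // Fast path: drain leftovers without touching the lock at all.
        if let Some(t) = self.buffer.pop_front() {
            return Some(t);
        }
        let mut queue = self.shared.lock().unwrap();
        let t = queue.pop_front()?;
        if !queue.is_empty() {
            // Steal everything that's queued. self.buffer is empty here,
            // so the swap leaves an empty (but still allocated) deque
            // behind for the senders to refill.
            mem::swap(&mut self.buffer, &mut *queue);
        }
        Some(t)
    }
}

fn main() {
    let shared = Arc::new(Mutex::new(VecDeque::from([1, 2, 3])));
    let mut rx = Receiver {
        shared: Arc::clone(&shared),
        buffer: VecDeque::new(),
    };

    assert_eq!(rx.try_recv(), Some(1)); // takes the lock, steals 2 and 3
    assert_eq!(rx.try_recv(), Some(2)); // served from the local buffer
    assert_eq!(rx.try_recv(), Some(3)); // served from the local buffer
    assert_eq!(rx.try_recv(), None);
}
```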
The optimized implementations generally pull this trick.

"It's important that swap is used, rather than just discarding the local buffer each time." Yeah. You could imagine that instead of swapping, we did self.buffer = ..., using std::mem::take for example, which leaves a fresh, empty VecDeque in place of the one that's there. But that would be much less efficient, because the deque you leave behind has no capacity and the senders would have to allocate from scratch, and you'd deallocate the old self.buffer. The swap lets us reuse both allocations.

"Do you think it might be faster without the branch around the swap?" Could be; there's nothing really stopping you from doing it. In terms of which is faster: without the branch is probably faster, but probably not by a significant amount. The branch predictor should be pretty good at this, because generally your channel is either usually empty or usually not empty. If the channel is usually empty, the branch will usually be false, so the branch predictor does well; if the channel is usually not empty, the branch predictor will predict that it's not empty.
And so it will do well either way.

Oh, the branch predictor: the CPU has a built-in component that observes all your ifs, all your conditional jumps, and tries to remember whether each branch was taken last time. And this is where speculative execution comes into play: when the CPU runs that code again, the branch predictor says "it's probably going to take the branch" (or probably not), so it starts running the code under that assumption, and if that turns out to be wrong, it goes back, unwinds what it did, and runs the other path instead.

"Or maybe receive could just return a list of values?" It could, but it's usually nicer for receive to return an Option so you can use it as an iterator, and this way it'll just be fast regardless. If we returned a list, we'd have to allocate the list every time.

"What about extending the buffer with inner.queue.drain()? That would save memory but probably be a lot slower." It wouldn't actually save memory; you'd still end up with both deques holding capacity.

All right. So now that we have an implementation that's pretty reasonable, I think it's time to talk about some alternative implementations. Usually what you'll see is that there are multiple kinds of implementation differences, and these are usually referred to as flavors. Well, actually, there are more than two; let's say there are multiple different flavors you'll see, and implementations usually take one of two approaches. One is to have different types for different implementations of channels; we saw an example of this in the standard library, where there are two different sender types for the different flavors, right?
One synchronous flavor and one asynchronous flavor, although not async, but asynchronous, as we talked about for channels. The other approach is to have just a single sender type, and under the hood (think of it as an enum, although that's usually not how it's implemented) it figures out what type of channel it is, and that way you can use the same sender type no matter where you are. In practice, implementations vary in what they do, but they all usually have this notion of flavors, and the idea is that you have multiple backing implementations of your channel, and you choose which one to use depending on how the channel is used.

So, flavors (I can't spell). There are some common flavors we've seen: one is synchronous channels, one is asynchronous channels, another is rendezvous channels, which we haven't talked about yet, and the last is one-shot channels. These are usually the flavors you see. Sometimes they're represented as explicitly different channel types, but very often they're under the hood; you won't see whether or not they're there. It's something that is dynamically chosen. So, a synchronous channel:
this is a channel where send can block, and usually it has limited capacity. An asynchronous channel is a channel where send cannot block, and it's usually unbounded: any number of sends can build up as much stuff as can possibly fit in memory. A rendezvous channel is a synchronous channel with capacity equal to zero. The idea is that a rendezvous channel doesn't really let you send things; it's usually a channel you use only to synchronize two sides. Very often you see rendezvous channels where T is just unit, the empty tuple, and the idea is that it's used, I don't want to say for time synchronization, but for thread synchronization. If you have one thread that you want to kick another thread with, to make it do something, you don't actually want to send it anything; you just want it to do stuff. That's where a rendezvous channel comes in: you create a channel with capacity zero. And what capacity zero means is that you can only send if there's currently a blocking receiver, because you can't store anything in the channel itself. The only way to send is to hand the value over to a thread that's currently waiting. That basically means one thread must be at the receive point of its execution, and the other gets to its send point, and now they've rendezvoused: they're both at a known location, and they can move forward from there. This is often achieved with something called a barrier (you'll find those in std::sync as well), but you can do it with a channel.
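With the standard library, a rendezvous channel is just sync_channel with a bound of zero. In this sketch, the spawned thread's send cannot complete until the main thread reaches its recv:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

fn main() {
    // Capacity 0: nothing can be buffered, so every send must rendezvous
    // with a receiver that is (or will soon be) blocked in recv.
    let (tx, rx) = sync_channel::<&str>(0);

    let sender = thread::spawn(move || {
        // Blocks until main() is at rx.recv() and takes the handover.
        tx.send("ping").unwrap();
    });

    assert_eq!(rx.recv().unwrap(), "ping");
    sender.join().unwrap();
}
```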
The channel version still ends up being a two-way synchronization, because the receiver also can't proceed until the sender arrives. I'll get to questions once I've done the last flavor. One-shot channels are channels you only send on once. Usually these can be any capacity, although in practice there's only one call to send. These are for things like: imagine an application with a channel used to tell all the threads that they should exit early; the user pressed Ctrl-C or clicked the X, and you want them all to shut down. You might have a channel that you only ever send on once, and you don't send anything useful (although you could). Some thread is running somewhere, and when you send the signal, the thread drops what it's doing and shuts down. So that channel is only ever used for one send: it's a one-shot channel. And these flavors are different enough that you can have different implementations that take advantage of their patterns, and we'll look at some of those in a second.

All right, let's do questions about flavors.

"I've heard the terms bounded and unbounded for synchronous and asynchronous, in case others are wondering." Yeah, synchronous channels are often called bounded channels, because that's what they are, and asynchronous channels are often called unbounded channels, if you look at the ones in tokio::sync and, I think, also the ones in crossbeam, and in fact in many of the other implementations.
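The shutdown scenario just described, sketched with the standard library channel standing in for a dedicated one-shot implementation:

```rust
use std::sync::mpsc::channel;
use std::thread;

fn main() {
    // A channel we will only ever send on once: the shutdown signal.
    let (shutdown_tx, shutdown_rx) = channel::<()>();

    let worker = thread::spawn(move || {
        // A real worker would poll shutdown_rx.try_recv() between units of
        // work; here we simply block until the signal (or sender drop).
        let _ = shutdown_rx.recv();
        "shut down cleanly"
    });

    // User pressed Ctrl-C: fire the one-shot signal. The () carries no
    // data; the send itself is the message.
    shutdown_tx.send(()).unwrap();
    assert_eq!(worker.join().unwrap(), "shut down cleanly");
}
```

A dedicated one-shot implementation can exploit the single-send pattern to avoid the general queue machinery entirely.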
These are referred to as unbounded and bounded, in particular because of the potential confusion with async.

"Ronde Devoo?" Rendezvous, right, sorry.

"So basically what you use if you need a condvar but don't have a mutex with locked data?" So, a rendezvous channel is not a mutex, because it doesn't guarantee mutual exclusion. It is sort of like a condvar in that you can wake up another thread, that's true, but it doesn't give you any way to also guard data.

"More like Unix pipes?" I mean, all channels are sort of like Unix pipes.

"Can rendezvous channels actually send anything useful?" Yeah. You can totally send data over a rendezvous channel; it still has the T type. The idea is that the sender can only send if the receiver is present, and the receiver can only receive if the sender is present; if they're both present, then you can just hand the data over, you can think of it that way. It's just that the sender can't put data somewhere and keep going, because the capacity is zero. But if there's a handover, it can hand data over, yeah.

"Seems Rust has a lot of specific channel designs, where Go just has a simple channel implementation." Well, Go only appears to have a simple channel implementation; as far as I'm aware, Go has all of these implementations under the hood. Specifically, in Go there's only one type, just like in Rust there's only one type, at least for things like crossbeam.
These flavors aren't different channel types in the type system; they're different implementations that are chosen between at runtime, and from memory, Go does the same thing. The way it works is basically: initially you assume the channel is a one-shot channel, and the moment an additional send happens, you upgrade it to a different type of channel. And so the first send will, in a sense, be more efficient than the later ones. Similarly, you know a channel is a rendezvous channel because the capacity is set to zero, so you can just choose that flavor; and you choose between synchronous and asynchronous based on whether a capacity is set at all.

"Could a synchronous channel where T is unit be a rendezvous channel?" A rendezvous channel is any channel whose capacity is zero; specifically, it is a synchronous channel, for any T, where the capacity is set to zero.

"Kind of like a baton pass." Yep.

Okay, so in the last few minutes, what I want to talk about is different implementations. I also want to try to touch on async, as in actual async/await and futures; we'll see whether we get to that. So, for a synchronous channel: what we implemented was a mutex... well, we didn't actually implement a synchronous channel; we implemented an asynchronous channel, but you can do a synchronous channel with a mutex plus condvar as well. And usually what you do behind the scenes is very much the same thing: use a VecDeque, and just have the sender block if the VecDeque happens to be full. So the implementation is fairly similar. If you want to not use a mutex, what do you do?
There are a couple of different approaches here. The simplest is to use what's basically an atomic VecDeque, or an atomic queue. The way this usually works is that you have head and tail pointers, just like the way a VecDeque is implemented, but you update them atomically, and this means you don't need to take a mutex in order to send, which happens to help a lot. And as long as you update the head and tail atomically, there's basically an algorithm for how to implement this data structure in such a way that no thread ever tries to touch data that another thread is currently touching.

And then for wake-ups, you use the thread park and notify primitives that are in the standard library, or you can use the ones from parking_lot if you wish, to ensure that things are woken up appropriately. Basically, you need some signaling mechanism: if the sender is asleep because it's blocking, the receiver needs to wake it up when it receives something, because now there's capacity available. Or similarly, if the receiver is blocking because the channel is empty and a sender comes along and sends something, it needs to make sure to wake up the receiver. So you need some kind of notification mechanism; often it's park and notify, although it doesn't have to be.

And this kind of atomic VecDeque is very often the implementation you see: it's basically a fixed-size array where you atomically keep track of the head and tail. That's also the only implementation I really know about there. I think flume, which is one of the implementations I'll point out later, actually uses a mutex, but it does it in a slightly smarter way than we did; we have a sort of dumb use of mutexes, but there are smarter tricks you can play, and some of them have been mentioned in chat already, like tricks
you can play with the mutex implementation to make it slightly faster, like taking advantage of the fact that there's an Arc there. That, I believe, flume does. And I think crossbeam uses the atomic VecDeque approach, or rather they're not actually using a VecDeque, but a head-and-tail-pointer implementation.

For asynchronous channels, similarly, the thing we did was a mutex, a condvar, and a VecDeque. In practice, VecDeques have some sad properties, like resizing. So very often what you want to do, and this is one of the few places you actually want to do this, is use a linked list. What that means is you never resize, right? Because when a sender comes along, you just push to the front of the linked list. And then what the receiver does is it just steals the whole linked list: it sets the head to null, or to None, steals the whole linked list, and then walks it backwards. And that is an implementation that doesn't require resizing; it doesn't have the memory problems that the VecDeque does, and it plays the same trick that we did below, where if you take the mutex as a receiver, you can steal all the items rather than just one. Usually you want this to be a doubly linked list so you can efficiently get to the end, or you can just keep track of the tail.

And for non-mutex implementations, usually what you see here is an atomic linked list; this is often referred to as just an atomic queue. So it's not a ring buffer like a VecDeque is, but an actual atomic queue. Usually the implementation is a linked list, but it doesn't have to be. In crossbeam, what they do is actually kind of interesting.
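The "receiver steals the whole list" trick can be sketched with an atomic head pointer. This is a simplified illustration with invented names, no wakeups, and no cleanup of nodes still queued at drop time, not crossbeam's actual code:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Senders push nodes onto an atomic singly linked list; the single
// receiver swaps the head to null and takes every queued item in one
// atomic operation, then walks the list backwards for FIFO order.
struct Node<T> {
    value: T,
    next: *mut Node<T>,
}

pub struct StealList<T> {
    head: AtomicPtr<Node<T>>,
}

impl<T> StealList<T> {
    pub fn new() -> Self {
        StealList { head: AtomicPtr::new(ptr::null_mut()) }
    }

    // Called by senders: push to the front with a CAS loop.
    pub fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
        loop {
            let head = self.head.load(Ordering::Acquire);
            unsafe { (*node).next = head };
            if self
                .head
                .compare_exchange(head, node, Ordering::Release, Ordering::Relaxed)
                .is_ok()
            {
                return;
            }
        }
    }

    // Called by the single receiver: steal everything at once.
    // The list is newest-first, so reverse it to get FIFO order.
    pub fn steal_all(&self) -> Vec<T> {
        let mut cur = self.head.swap(ptr::null_mut(), Ordering::Acquire);
        let mut items = Vec::new();
        while !cur.is_null() {
            let node = unsafe { Box::from_raw(cur) };
            items.push(node.value);
            cur = node.next;
        }
        items.reverse();
        items
    }
}
```

Note how `steal_all` is a single `swap`: the receiver pays one atomic operation for the entire batch, which is the same amortization the mutex version gets by draining the queue under one lock acquisition.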
What they have is an atomic, I don't know what to call it, block linked list. And the idea here is that rather than have every push be an atomic operation that appends a new item to the list, you sort of mix the atomic head-and-tail thing with an atomic linked list. So instead of a linked list of T, this is a linked list of, like, atomic VecDeques of T. Only occasionally do you need to actually append to the linked list, which is a problematic operation. Because imagine you have two senders that both want to send at the same time with a linked list: one of them is going to succeed in updating the next pointer of the tail of the linked list, but the other will fail and will have to retry. If you have a list of these blocks of Ts, then only occasionally does a thread actually need to update the next pointer. Usually they just need to increment the tail, which it turns out you can do concurrently using fetch_add. And so this sort of block atomic linked list turns out to be a lot more efficient in practice. And here too, you need some signaling mechanism for waking people up.

For rendezvous channels, it turns out you don't need the linked list at all. All you really need is this wake-up primitive, and then a single place in memory to store the item for the handoff. I haven't looked too carefully at the flavor implementations of this; I know that crossbeam has one, and the standard library has one.
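Going back to the block-based list for a second: the reason the fetch_add on the tail index helps is that concurrent senders each get a distinct slot with no CAS retry loop. A small illustration, with invented function names:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Every sender that increments the shared tail via fetch_add is
// guaranteed a distinct slot index, even when many threads race:
// no thread ever has to retry, unlike a CAS on a next pointer.
pub fn concurrent_claims(threads: usize, per_thread: usize) -> Vec<usize> {
    let tail = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::new();
    for _ in 0..threads {
        let tail = Arc::clone(&tail);
        handles.push(thread::spawn(move || {
            (0..per_thread)
                .map(|_| tail.fetch_add(1, Ordering::Relaxed))
                .collect::<Vec<usize>>()
        }));
    }
    let mut claimed: Vec<usize> = handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect();
    claimed.sort();
    claimed
}
```

Sorting the claimed indices shows that every index from 0 up to the total number of sends appears exactly once, which is what lets each sender write its item into its own block slot without touching anyone else's.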
I don't think flume has this optimization yet. But the trick to play here is basically that you can get rid of the whole linked-list part and have a much simpler implementation that just synchronizes the threads; almost all you need is a mutex and a condvar. Well, in practice I think these things are a little bit smarter. And a one-shot channel, if you know that you have a one-shot channel, is the same kind of thing: you don't need a linked list of any kind, you only actually need to store the one T. And because you know there's only one item, what you can do is basically just have an atomic place in memory that is either, like, None or Some, and you can just atomically swap the element in there. Think of it like a single slot: the sender fills the slot, and the receiver consumes the slot and marks it as empty, or completed, at the same time. So basically, you can write more specialized implementations that are faster for those use cases. Okay, let's discuss that briefly; we're getting towards the end here.

"How do async/await channels differ?" Different implementation, and we'll look at that in a second. "YouTube stream is gone." Yeah, YouTube is not always great at streams. "Doesn't a linked list guarantee you always end up doing an allocation and deallocation on each push and pop?" Yes. With a linked list, you will be allocating and deallocating on every push or pop, right? The push is going to allocate a node to stick on the end of the linked list, and a pop will have to deallocate that node. This is another advantage of the block linked list variant. Of course, often the memory allocation system is not your bottleneck. Usually, especially if you're using something like jemalloc, you have basically thread-local allocation.
So this turns out to be fine. But it is true that that is a downside; you're really measuring memory overhead versus memory allocator performance.

"Wouldn't a smart list just keep spare nodes around?" So one option is to basically keep a pool of these nodes and reuse them. The problem is that now you need to atomically manage the pool, which also needs a bunch of synchronization primitives to do correctly. But in theory you can. It's not clear that you can write a better implementation of a reusing pool than the memory allocator can allocate and deallocate memory. Maybe, but it's unclear. You might want to use something like an arena allocator, and that might work well.

All right, so I think the last thing I want to touch on, because we're sort of out of time, but I'll do it anyway, is that there were some questions about async/await. What do you do with async/await? It's pretty hard to write a channel implementation that works for both the async/await, futures world and for the blocking-thread world, because the primitives are a little bit different. Right? If you do a send and the channel is full, then in the async/await world you don't want to block: you want to yield to the parent future, yield to the executor, ultimately yield back from the task, and then at some point in the future you'll be woken up to poll again. And that sounds a little bit like waiting on a condvar, but in practice it's not quite the same, because you actually need to return; you don't get to sit in the current function. The same thing goes for receive, of course. And the notification primitives are a little bit different, although they do have the same flavor: you notify a Waker, and that's going to cause that other thing to be polled again. So they're similar, but not quite the same. Where it gets hard is to write an implementation that internally knows whether it's being used in an async
context, like a futures context, or in a blocking context, without exposing that to the user. Very often, what you might end up with is some additional type parameter that is, like, the waker or signaling mechanism, which gets really ugly. Now, there are ways to do this. Both flume and crossbeam, I believe, have both blocking and asynchronous versions. If you look in the code, you might be able to see how they do that; basically, it requires a bit more bookkeeping, and you have to be a little bit more finicky with the types. And usually what you end up with is a channel that looks much the same, but not quite the same, and at runtime it basically ends up diverging into different ways of managing the underlying data structure, depending on whether you're in the blocking world or not. For example, often in the blocking world you can do some additional optimizations that you can't always do in the async world, and vice versa. They're just a little different. But in practice, the data structure that's used, whether a VecDeque or an atomic linked list or anything like that, is fairly similar, and the flavors that exist are fairly similar.

"You could probably beat the allocator, because we always need allocations of the same size; allocators are really good at taking advantage of repeated patterns." So it's not clear to me, actually, because it's really hard to write good, performant, basically garbage collection, and the memory allocator has had a lot of practice at it. You could use a bump allocator or something like that.

"Did you use any channels in your thesis?" Yes, many. I would never roll my own channel; it's a bad idea unless you specifically want to work on concurrency stuff, which is fun. I used the Tokio channels, because I needed one that did async/await, and the Tokio one I know pretty well. I don't have a particularly strong feeling about it.
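Returning to the async side described a moment ago: here is a rough sketch of how a `poll_recv` returns `Poll::Pending` after stashing the task's `Waker`, instead of blocking on a condvar the way a synchronous `recv` would. All names are invented, and this is far simpler than any real async channel:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Shared state: a queue of items plus at most one stashed Waker for
// the receiving task. send() wakes that task; poll_recv() returns
// instead of sitting in the function like a condvar wait would.
struct Shared<T> {
    queue: VecDeque<T>,
    waker: Option<Waker>,
}

pub struct AsyncInner<T>(Arc<Mutex<Shared<T>>>);

impl<T> AsyncInner<T> {
    pub fn new() -> Self {
        AsyncInner(Arc::new(Mutex::new(Shared { queue: VecDeque::new(), waker: None })))
    }

    pub fn send(&self, value: T) {
        let mut s = self.0.lock().unwrap();
        s.queue.push_back(value);
        // Wake the receiving task so the executor polls it again.
        if let Some(w) = s.waker.take() {
            w.wake();
        }
    }

    pub fn poll_recv(&self, cx: &mut Context<'_>) -> Poll<T> {
        let mut s = self.0.lock().unwrap();
        match s.queue.pop_front() {
            Some(v) => Poll::Ready(v),
            None => {
                // Register interest and *return*; we don't get to
                // block here the way a condvar-based recv does.
                s.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

// Tiny single-threaded demo using a no-op waker: the first poll is
// Pending, and after a send the next poll is Ready.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn demo() -> (bool, Option<i32>) {
    let ch = AsyncInner::new();
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let was_pending = ch.poll_recv(&mut cx).is_pending();
    ch.send(5);
    let got = match ch.poll_recv(&mut cx) {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    };
    (was_pending, got)
}
```

This only uses `std::task` items (`Waker`, `Wake`, `Context`, `Poll`), which is exactly why a channel like this works under any executor.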
Noria is not really channel-bound, so the decision wasn't that important. I couldn't use the standard library one because it didn't support async/await, and I think crossbeam, at the time when I started adding them, didn't either, but I might be wrong about that. Great.

Okay, so, just to give you some pointers on where to look next if you're curious about this. If you want to see how a real implementation works, I recommend that you actually have a look at the standard library implementation. If you go look at the mpsc module, you probably don't want to read it through the docs.rs interface, but it has really good documentation of what's going on under the hood, what the implementation is, and some of the optimizations that they do, like internal atomic counters. Similarly, if you go to crossbeam (this will be bright for those of you who are reading this at night), there's a crossbeam-channel subdirectory that holds the crossbeam channel implementation. And if you look at it, you'll see that there's a flavors directory that holds all the different flavor implementations. You'll see array is the one for synchronous channels that does this head/tail business; there's list for the atomic block linked list; and some of these are for things like rendezvous channels and one-shot channels. Select we didn't really get into, but there's usually a bunch of additional stuff you need to do to support selection. Selection is the ability to, for example, receive from one channel or the other, whichever sends first, which requires some additional mechanisms in the implementation. There's also flume.
So flume is a different implementation of channels that popped up fairly recently. It has a very different implementation from what crossbeam does: there's no unsafe code, and part of the idea here is that it uses mutexes under the hood, but in a slightly more clever way. I think the experience I've had is that crossbeam is better for very high contention cases, because it doesn't use mutexes, whereas flume is often faster for cases where contention is lower, because then mutexes end up not adding quite as much overhead.

All right, I think that's all I wanted to cover. Let's see, are there questions about this before we end for today, sort of at the tail end of the channel here?

"Any thoughts on benchmarking channel implementations?" Benchmarking channel implementations is hard. I know there's been some work on this; I forget where that was. I think BurntSushi did a bunch of benchmarking of channels, looking at Go channels, the standard library channels, flume, crossbeam-channel, and his chan crate, which I think is deprecated now. That's worth looking into. In general, when you benchmark channels, you want to try to benchmark all of the different flavors, because they do represent real use cases. You want to benchmark cases where the things you send are large and where the things you send are small, where there are many senders and where there are few senders. For example, if you have a single-producer, single-consumer case, you might be able to optimize better for that, basically write a flavor for it, and that might perform much better than your general multi-producer, single-consumer version. You want your benchmark to test cases where the channel is usually full, for a bounded channel, and cases where the channel is usually empty, which basically means you adjust the relative rates of produce and consume calls. The number of senders is important too: how do you scale with the number of senders?
So you basically want to do a grid of all the possible configurations, and then try to benchmark each one separately.

"Rendezvous channels are like the default Go channels with zero capacity?" Yeah, that sounds about right. "A bump allocator would be really good, since you would likely allocate memory atomically." Quite possibly, and also because you don't need to drop anything in that case, because the memory has already been handed off, so the drop implementation is a no-op. You might be able to use something like bumpalo, which is pretty cool.

"How do they support async without tying it to a specific executor like Tokio?" So the primary reason for the current lack of harmony in the async/await ecosystem is around the I/O traits, like AsyncRead and AsyncWrite, and also the spawn feature, being able to run a future in the background. For implementing a channel, you don't need either of those. All you need is the primitive that's provided by the standard library, which is the Waker type, and the ability to yield or go to sleep, and the ability to wake something up, or notify something. Those come from the standard library's task module, so you can use them independently of what the executor is, and that's why a channel sort of trivially is cross-executor.

"If you have a sleeping sender thread and you'd like to wake it when the receiver is dropped so it can free up resources, is there a standard way to do that? Just have it wake up every few seconds to check?" No, you do it the same way we did in our implementation here. Right? You would implement Drop for the receiver, where it will do a notify_all to wake up all sleeping senders, which could then do whatever freeing up of resources they need to do. All right.
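That drop-based wakeup can be sketched like this. The struct layout and helper names are invented for illustration, but the idea, flag the channel closed and `notify_all` from the receiver's `Drop`, is the one described above:

```rust
use std::sync::{Arc, Condvar, Mutex};

// Shared channel state: a "closed" flag plus the condvar that
// blocked senders sleep on. A real channel would also hold the queue.
pub struct Shared {
    closed: Mutex<bool>,
    available: Condvar,
}

pub struct Receiver {
    shared: Arc<Shared>,
}

impl Drop for Receiver {
    fn drop(&mut self) {
        // Mark the channel closed, then wake every sleeping sender;
        // each one re-checks the flag and can release its resources.
        *self.shared.closed.lock().unwrap() = true;
        self.shared.available.notify_all();
    }
}

// Hypothetical constructor and inspector, just for the demo.
pub fn channel_parts() -> (Receiver, Arc<Shared>) {
    let shared = Arc::new(Shared {
        closed: Mutex::new(false),
        available: Condvar::new(),
    });
    (Receiver { shared: Arc::clone(&shared) }, shared)
}

pub fn is_closed(shared: &Shared) -> bool {
    *shared.closed.lock().unwrap()
}
```

No polling loop is needed: senders wait on the condvar, and the single `notify_all` in `Drop` is enough because each woken sender re-checks the `closed` flag under the mutex.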
I think we got it. I will, as always, put the recording up on YouTube; it might even have the intro that I demoed to the people who showed up to the stream early. It should be up hopefully in a couple of hours; I'll tweet about it as I always do. And apart from that, thanks for watching. Hopefully you learned something, and hopefully there'll be some more of these. As always, just follow me and you'll learn about upcoming ones. I also announce them on Discord now; there's a channel with automatic updates whenever I go live, whenever I upload a video, or when there are upcoming streams, and that also includes other Rust streamers. We have a bunch of them on there now, including Steve Klabnik, which is pretty funny. So join the Discord; I'll put the link somewhere, maybe someone put it in chat already. And I'll see you there, or on Twitter, or in the next video. Thanks for joining.