Today, the TAs are going to be giving a lecture on concurrency and Go. Basically, this lecture is going to be full of design patterns and practical tips to help you with the labs. We're going to be covering briefly the Go memory model, the reading which we went over, and then spend most of the lecture talking about concurrency primitives in Go, concurrency patterns in Go, and how you do the things that you will need to do in the labs. And then finally, we'll talk through some debugging tips and techniques and show you some interesting tools that you might want to use when debugging the labs. So very briefly on the Go memory model, on the reading. So why did we assign this reading? Well, the goal was to give you some concrete examples of correct ways to write threaded code in Go. So the document, like in the second half of the document, has some examples of correct code and then incorrect code and how it can go wrong. So one thing you might have noticed in the document is early on it says, if you need to read and understand this, you're being too clever. And we think that that's good advice. So focus on how to write correct code. Don't focus way too much on the happens-before relation and being able to reason about exactly why incorrect code is incorrect. Like, we don't really care. We just want to be able to write correct code and call it a day. One question that came up in the lecture questions was about goroutines in relation to performance. And so we just wanted to say that goroutines, and concurrency in general, can be used for a couple different reasons. And the reason we use concurrency in the labs is not necessarily for performance. Like, we're not going for parallelism using multiple cores on a single machine in order to be able to do more work on the CPU. Concurrency gets us something else besides performance through parallelism. It can get us better expressivity, like when we want to write down some ideas.
And it happens to be that writing down code that uses threads is a clean way of expressing those ideas. And so the takeaway from that is, when you use threads in Lab 2 and beyond, don't try to do fancy things you might do if you're going for performance, especially CPU performance. Like, we don't care to do things like using fine-grained locking or other such techniques. Basically, write code that's easy to reason about. Use big locks to protect large critical sections. And just don't worry about performance in the sense of CPU performance. So with that, that's all we're going to say about the memory model, and we'll spend most of this lecture just talking about Go code and Go concurrency patterns. And after we go through these examples, feel free to ask any questions about what's on the screen or anything else you might think about. So I'm going to start off talking about concurrency primitives in Go. So the first thing is closures. This is something that will almost certainly be helpful in the labs. And this is related to goroutines. So here's this example program on the screen. And what it does is the main function declares a bunch of variables and then spawns this goroutine in here with this go statement. And we notice that this goroutine is not taking as an argument a function call to some function defined elsewhere, but this anonymous function just defined inline here. So this is a handy pattern. This is something called a closure. And one neat thing about this is that this function that's defined here can refer to variables from the enclosing scope. So for example, this function can mutate this variable a that's defined up here, or refer to this WaitGroup that's defined up here. So if we go run this example, it does what you think it does. The wg.Done() here lets the main thread continue past the wg.Wait() and print out this variable, which has been mutated by this concurrently running thread that finished before the Wait happened.
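A minimal runnable sketch of the closure pattern just described; the variable names here are my own, not necessarily the ones on the lecture's slide:

```go
package main

import (
	"fmt"
	"sync"
)

// incrementViaGoroutine mutates a local variable from a goroutine
// via a closure, using a WaitGroup to wait for it to finish.
func incrementViaGoroutine() int {
	var wg sync.WaitGroup
	a := 0
	wg.Add(1)
	go func() {
		// The anonymous function is a closure: it can refer to
		// (and mutate) variables from the enclosing scope.
		a = 1
		wg.Done()
	}()
	wg.Wait() // blocks until the goroutine has called Done
	return a  // safe to read: Wait happens-after Done
}

func main() {
	fmt.Println(incrementViaGoroutine())
}
```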
So this is a useful pattern to be able to use. The reason we're pointing this out is because you might have code that looks like this in your labs. Very similar to the previous example, except this is code that is spawning a bunch of threads in a loop. This is useful, for example, when you want to send RPCs in parallel. So like in Lab 2, if you have a candidate asking for votes, you want to ask for votes from all of the followers in parallel, not one after the other, because the RPC is a blocking operation that might take some time. Or similarly, the leader might want to send AppendEntries to all of the followers. You wanna do that in parallel, not in series. And so threads are a clean way to express this idea. And so you might have code that looks kind of like this at a high level. In a for loop, you spawn a bunch of goroutines. One thing to be careful about here, and this is something that was talked about in a previous lecture, is identifier capture in goroutines and mutation of that identifier in the outer scope. So we see here that we have this i that's being mutated by this for loop. And then we want to use that value inside this goroutine. And the way we do that, like the correct way of writing this code, is to pass this value i as an argument to this function. And this function can name it x inside here and use the value inside. And so if we run this program, so here I've kind of stubbed out the sendRPC function so it actually just prints out the index. This i might be like the index of the follower we're trying to send an RPC to. Here it prints out the numbers zero through four in some order. So this is what we want, like send RPCs to all the followers. The reason we're showing you this code is because there's a variation of this code which looks really similar, and maybe intuitively you might think it does the right thing, but in fact it doesn't.
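Here's a sketch of that correct loop pattern, with the RPC stubbed out as in the lecture (the function names are my own stand-ins):

```go
package main

import (
	"fmt"
	"sync"
)

// launchAll spawns n goroutines, passing the loop index as an
// argument so each goroutine gets its own copy of the value.
func launchAll(n int) []int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	var seen []int
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(x int) { // x is a copy of i at spawn time
			defer wg.Done()
			mu.Lock()
			seen = append(seen, x) // stand-in for sendRPC(x)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return seen
}

func main() {
	// Prints the indices 0 through 4 in some order.
	fmt.Println(launchAll(5))
}
```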
So in this code, the only thing that's changed is we've gotten rid of this argument here that we're explicitly passing. And instead we're letting this i refer to the i from the outer scope. So you might think that when you run this, it does the same thing, but in fact, in this particular run, it printed four, five, five, five, five. So this would do the wrong thing. And the reason for this is that this i is being mutated by this outer scope, and by the time this goroutine ends up actually executing this line, well, the for loop has already changed the value of i. So this doesn't do the right thing. So at a high level, if you're spawning goroutines in a loop, just make sure that you use this pattern here and everything will work right. Any questions about that? It's just a small gotcha, but we've seen this a whole bunch of times in office hours, so we just wanted to point this out. All right, so moving on to other patterns that you might want to use in your code. Oftentimes you'll want code that periodically does something. A very simple way to do that is to have a separate function that in an infinite loop does something. In this case we're just printing out tick, and then it uses time.Sleep to wait for a certain amount of time. So very simple pattern here. You don't need anything fancier than this to do something periodically. One modification of this that you might want is to do something periodically until something happens. For example, you might want to start up a Raft peer and then periodically send heartbeats, but when we call .Kill() on the Raft instance, you want to actually shut down all these goroutines so you don't have all these random goroutines still running in the background. And so the pattern for that looks something like this.
You have a goroutine that will run in an infinite loop and do something and then wait for a little bit, and then you can just have a shared variable between this goroutine and whatever control thread is going to decide whether this goroutine should die or not. So in this example we have this variable done that's a global variable, and what main does is it waits for a while and sets done to true, and in this goroutine that's ticking and doing work periodically, we're just checking the value of done, and if done is set, then we terminate the goroutine. And here, since done is a shared variable being mutated and read by multiple threads, we need to make sure that we guard the use of this with a lock. So that's where this mu.Lock() and mu.Unlock() comes in. For the purpose of the labs, you can actually write something a little bit simpler than this: we give you this method rf.killed() on your Raft instance, so you might have code that looks a little bit more like this. So while your Raft instance is not dead, you want to periodically do some work. Any questions about that so far? Yeah, question? So here it's able to observe the done that was written as true by the other thread, but in general is this guaranteed? Does using the locking mechanisms or channels make it so that any writes done to any variables in those functions are guaranteed to be observed by the other side, or do you need to send done across a channel to ensure that? Okay, so let me try to simplify the question a bit. I think the question is: do you need to use locks here? Can you use channels instead, and can you get away with not using locks, and like, what's the difference between nothing versus channels versus locks? Is that basically what you're asking? So, whether you need to send done down a channel for the other thread to be able to observe it. Okay, so I think the question is: does this done not need to be sent across a channel? Does just using these locks ensure that this read here observes the write done by the other thread?
Okay, so the answer is yes. Basically, at a high level, if you want to ensure cross-thread communication, make sure you use Go synchronization primitives, whether it's channels or locks and condition variables. And so here, because of the use of locks, after this thread writes done and does unlock, the next lock that happens is guaranteed to observe the writes done before this unlock happened. So you have this write happen, then this unlock happen, then one of these locks happens, and then the next read of done will be guaranteed to observe that write of true. Question? That's a good question. In this particular code it doesn't matter, but it would be cleaner to do it. So the question is, why don't we do mu.Unlock here before returning? And the answer is, in here there's no more code, like the program's done, so it doesn't actually end up mattering, but you're right that in general we would want to ensure that we unlock before we return. Yeah, thanks for pointing that out. So in main we're acquiring the lock; what's the ordering between the two threads? So I'm not sure entirely what the question is, but maybe something like, can both of these acquire the lock at the same time? Is that the question? Yes. All right. We'll talk a little bit more about locks in just a moment, but at a high level, the semantics of a lock are: the lock is either held by somebody or not held by somebody, and if it's not held by somebody, then if someone calls lock, they have the chance to acquire the lock. And if, before they call unlock, somebody else calls lock, that other thread is going to be blocked until the unlock happens and the lock is free again. So at a high level, between the lock and the unlock for any particular lock, only a single thread can be executing what's called the critical section, between the lock and unlock regions. Any other questions? In this code, periodic doesn't necessarily have to exit right away, even if you are waiting, right?
So it could be the case that some other thread just blocks periodic, and you're actually going to be waiting for a second. So wouldn't you want to have some code somewhere where main waits for periodic to somehow finish? So the question is related to timing. Like, when you set done equals true and then you unlock, you have no guarantee in terms of real time when periodic will end up being scheduled, observe this write, and actually end up terminating. And so yes, if you wanted main to actually ensure that periodic has exited for some particular reason, then you could write some code that communicates back from periodic, acknowledging this. But in this particular case, the only reason we have this sleep here is just to demonstrate that tick prints for a while and then periodic is indeed canceled, because it stops being printed before I get my shell prompt back. And in general, for a lot of these background threads, you can just say that you want to kill them, and it doesn't matter if they're killed within one second or within two seconds or when exactly Go schedules it, because this thread is going to just observe this write to done and then exit and do no more work. So it doesn't really matter. And also another thing in Go is that if you spawn a bunch of goroutines, one of them is the main goroutine, this one here, and the way Go works is that if the main goroutine exits, the whole program terminates and all goroutines are terminated. That's a great question. Okay, so I think the question is something like, why do you need locks at all? Like, can you just delete all the locks? And then, looking at this code, it looks like, okay, main does a write of true at some point and periodic is repeatedly reading it. So at some point it should observe this write, right? Well, it turns out that this is why Go has this fancy memory model, and you have this whole thing on the happens-before relation.
The compiler is allowed to take this code and emit a kind of low-level machine code that does something a little bit different than what you intuitively thought would happen here. And we can talk about that in detail offline after the lecture and in office hours. But at a high level, one rule you can follow is: if you have accesses to shared variables and you want to be able to observe them across different threads, you need to be holding a lock before you read or write those shared variables. In this particular case, I think the Go compiler would be allowed to optimize this to lift the read of done outside the for loop. So it reads this shared variable once, and then if done is false, it makes the inside an infinite loop, because now, the way this thread is written, it uses no synchronization primitives. There's no mutex lock or unlock, no channel sends or receives. And so it's actually not guaranteed to observe any mutations done by other concurrently running threads. And if you look on Piazza, I've actually written a particular Go program that is optimized in this unintuitive way. Like, it'll produce code that does an infinite loop even though, looking at it, you might think that the obvious way to compile this code would produce something that terminates. Yeah, so the memory model is pretty fancy, and it's really hard to think about why exactly incorrect programs are incorrect, but if you follow some general rules, like hold locks before you mutate shared variables, then you can avoid thinking about some of these nasty issues. Any other questions? All right, so let's talk a little bit more about mutexes now. So why do you need mutexes? At a high level, whenever you have concurrent access by different threads to some shared data, you want to ensure that reads and writes of that data are atomic.
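Pulling the earlier discussion together, here's a runnable sketch of the lock-protected done flag; the details (sleep durations, the finished channel used so the demo can observe the exit) are my own additions:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

var (
	mu   sync.Mutex
	done bool
)

// periodic does some work in a loop until another thread sets done.
// It closes finished on exit so callers can observe that it stopped.
func periodic(finished chan<- struct{}) {
	for {
		fmt.Println("tick")
		time.Sleep(10 * time.Millisecond)
		mu.Lock()
		if done {
			mu.Unlock() // unlock before returning
			close(finished)
			return
		}
		mu.Unlock()
	}
}

func main() {
	finished := make(chan struct{})
	go periodic(finished)
	time.Sleep(50 * time.Millisecond)
	mu.Lock()
	done = true // guarded write: the next Lock in periodic observes it
	mu.Unlock()
	<-finished // wait until periodic has actually exited
	fmt.Println("periodic stopped")
}
```

Because both the write and the reads of done happen while holding mu, the loop is guaranteed to eventually observe the write and terminate.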
So here's one example of a program that declares a counter and then spawns a goroutine, or actually spawns a thousand goroutines, that each update the counter value and increment it by one. And you might think, looking at this intuitively, that when I print out the value of the counter at the end, it should print a thousand, but it turns out that we missed some of the updates here, and in this particular case it only printed 947. So what's going on here is that this update here is not really protected in any way, and so these threads running concurrently can read the value of counter and update it and clobber other threads' updates of this value. Basically, we want to ensure that this entire section here happens atomically. And so the way you make blocks of code run atomically is by using locks. And so in this code example we've fixed this bug: we create a lock, and then all these goroutines that modify this counter value first grab the lock, then update the counter value, and then unlock. And we see that we're using this defer keyword here. What this does is basically the same as putting this code down here. So we grab a lock, do some update, then unlock. Defer is just a nice way of remembering to do this. You might forget to write the unlock later. And so what defer does is, you can think of it as scheduling this to run at the end of the current function body. And so this is a really common pattern you'll see. For example, in your RPC handlers for the lab. So oftentimes RPC handlers will read or write data on the Raft structure, right? And those updates should be synchronized with other concurrently happening updates. And so oftentimes the pattern for RPC handlers will be like: grab the lock, defer unlock, and then go do some work inside. And so we can see, if we run this code, it produces the expected result. So it prints out a thousand. And we haven't lost any of these updates.
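Here's a sketch of the fixed counter with the lock and defer; note that I use a WaitGroup here instead of the lecture's one-second sleep so the result is deterministic:

```go
package main

import (
	"fmt"
	"sync"
)

// count increments a shared counter from n goroutines, holding a
// mutex around each update so that no increments are lost.
func count(n int) int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock() // runs when this function returns
			counter++
		}()
	}
	wg.Wait() // wait for all n updates, unlike sleeping a fixed time
	return counter
}

func main() {
	fmt.Println(count(1000)) // prints 1000; no updates are clobbered
}
```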
So at a high level, what a lock or mutex does is guarantee mutual exclusion for a region of code, which we call a critical section. So in here, this is the critical section. And it ensures that none of these critical sections execute concurrently with other ones. They're all serialized; they happen one after another. Question? So technically this code doesn't guarantee that all the updates happen before the final read of the counter, right? Yeah, so this is a good observation. This particular code is actually not guaranteed to produce a thousand, depending on how thread scheduling ends up happening, because all the main goroutine does is it waits for one second, which is some arbitrary amount of time, and then it prints out the value of the counter. I just wanted to keep this example as simple as possible. A different way to write this code that would be guaranteed to print a thousand would be to have the main goroutine wait for all these thousand threads to finish. So you could do this using a WaitGroup, for example. But we didn't want to put two synchronization primitives, like WaitGroups and mutexes, in the same example. So that's why we wrote this code that is technically incorrect, but I think still demonstrates the point of locks. Any other questions? Great, so at a very high level, you can think of locks as: you grab the lock, you mutate the shared data, and then you unlock. So does this pattern always work? Well, it turns out that that's a useful starting point for how to think about locks, but it's not really the complete story. So here's some code. This doesn't fit on the screen, but I'll explain it to you and we can scroll through it. It basically implements a bank at a high level. So I have Alice and Bob, who both start out with some balances, and then I keep track of the total balance, like the total amount of money I store in my bank. And then I'm going to spawn two goroutines that will transfer money back and forth between Alice and Bob.
So I have this one goroutine that, a thousand times, will deduct one from Alice and send it to Bob, and concurrently running I have this other goroutine that in a loop will deduct one from Bob and send it to Alice. And notice that I have this mutex here, and whenever I manipulate these variables shared between these two different threads, I'm always locking the mutex, and this update only happens while this lock is held. And so, is this code correct or incorrect? There actually isn't really a straightforward answer to that question. It depends on, like, what are the semantics of my bank? What behavior do I expect? So I'm going to introduce another thread here. I'll call this one the audit thread, and what this is going to do is every once in a while check the sum of all the accounts in my bank and make sure that the sum is the same as what it started out as. Like, if I only allow transfers within my bank, the total amount should never change. So now, given this other thread: what this does is it grabs the lock, then sums up Alice plus Bob and compares it to the total, and if it doesn't match, then it says that, oh, I've observed some violation, that my total is no longer what it should be. If I run this code, I actually see that a whole bunch of times this concurrently running thread does indeed observe that Alice plus Bob is not equal to the overall sum. So what went wrong here? Like, we're following our basic rule of whenever we're accessing data that's shared between threads, we grab a lock. It is indeed true that no updates to these shared variables happen while the lock is not held. What we're intending is for the decrement and increment to happen atomically together, but what we're actually doing is making the decrement atomic and then the increment atomic separately, which means that if the audit runs in the middle, it won't see the right total. Yeah, exactly, so let me repeat that for everybody to hear.
What we intended here was for this decrement and increment to happen atomically, but instead what we ended up writing was code that decrements atomically and then increments atomically. And so in this particular code, actually, we won't lose money in the long term. Like, if we let these threads run and then wait till they finish and then check the total, it will indeed be what it started out as. But while these are running, since this entire block of code is not atomic, we can temporarily observe these violations. And so, at a higher level, the way you should think about locking is not just that locks are to protect access to shared data, but that locks are meant to protect invariants. You have some shared data that multiple people might access, and there are some properties that hold on that shared data. Like, for example, here I as the programmer decided that I want this property that Alice plus Bob should equal some constant, and that should always be that way. I want that property to hold, but then it may be the case that different threads running concurrently are making changes to this data and might temporarily break this invariant here. Like here, when I decrement from Alice, temporarily the sum Alice plus Bob has changed, but then this thread eventually ends up restoring this invariant here. And so locks are meant to protect invariants: at a high level, you grab a lock, then you do some work that might temporarily break the invariant, but then you restore the invariant before you release the lock, so nobody can observe these in-progress updates. And so the correct way to write this code is to actually have fewer calls to lock and unlock. We lock, then we do a bunch of work, and then we unlock. And when we run this code, we see no more printouts like this; we never have this audit thread observe that the total is not what it should be. All right, so that's the right way to think about locking.
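A minimal sketch of the corrected bank, with the whole transfer inside one critical section so the invariant (the balances always sum to the same total) can never be observed broken; the type and method names are my own:

```go
package main

import (
	"fmt"
	"sync"
)

// Bank maintains the invariant alice+bob == constant total
// whenever the lock is not held.
type Bank struct {
	mu         sync.Mutex
	alice, bob int
}

// Transfer moves amount from alice to bob in one critical section.
// The invariant is broken only while the lock is held.
func (b *Bank) Transfer(amount int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.alice -= amount
	b.bob += amount
}

// Audit returns the sum of the balances, read under the lock.
func (b *Bank) Audit() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.alice + b.bob
}

func main() {
	b := &Bank{alice: 1000, bob: 1000}
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(amount int) { // +1 one way, -1 the other
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				b.Transfer(amount)
			}
		}(1 - 2*i)
	}
	// A concurrent audit never observes a broken invariant now.
	go func() {
		for j := 0; j < 100; j++ {
			if b.Audit() != 2000 {
				fmt.Println("violation!")
			}
		}
	}()
	wg.Wait()
	fmt.Println("final total:", b.Audit())
}
```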
At kind of a high level, you can think about it as: make sure you grab locks whenever you access shared data. That is a rule, but another important rule is that locks protect invariants. So grab a lock, manipulate things in a way that might break the invariants, but restore them afterwards, and then release the lock. Another way you can think about it is that locks can make regions of code atomic, not just single statements or single updates to shared variables. Any questions about that? Great, so the next synchronization primitive we're going to talk about is something called condition variables. And it seems like this has been a source of confusion from Lab 1, where we mentioned condition variables but didn't quite explain them. So we're going to take the time to explain them to you now. And we're going to do that in the context of an example that you should all be familiar with: counting votes. So remember, in Lab 2A you have this pattern where whenever a Raft peer becomes a candidate, it wants to send out vote requests to all of its followers, and eventually the followers come back to the candidate and say yes or no, like whether or not the candidate got the vote, right? And one way we could write this code is to have the candidate, in serial, ask peer number one, peer number two, peer number three, and so on. But that's bad, right? Because we want the candidate to ask all the peers in parallel so it can quickly win the election when possible. And then there are some other complexities there. Like, when we ask all the peers in parallel, we don't want to wait till we get a response from all of them before making up our mind, right? Because if a candidate gets a majority of votes, it doesn't need to wait till it hears back from everybody else. So this code is kind of complicated in some ways. And so here's a kind of stubbed-out version of what that vote counting code might look like, with a little bit of infrastructure to make it actually run.
And so here I have this main goroutine that sets count, which is like the number of yes votes I got, to zero, and finished to zero. Finished is the number of responses I've gotten in total. And the idea is I want to send out vote requests in parallel and keep track of how many yeses I've got and how many responses I've gotten in general. And then once I know whether I've won the election or whether I know that I've lost the election, then I can determine that and move on. And then in the real Raft code you'd actually do whatever you need to do to step up to a leader or to step down to a follower after you have the result from this. And so, looking at this code here, I'm going to, in parallel, say I have 10 peers, spawn 10 goroutines in parallel. Here I pass in this closure, and what I'm going to do is request a vote, and then if I get the vote, I'm going to increment the count by one, and then I'm also going to increment this finished by one. So this is the number of yeses, this is the total number of responses I've gotten. And then outside here, in the main goroutine, what I'm doing is keeping track of this condition. I'm waiting for this condition to become true: that either I have enough yes votes that I've won the election, or I've heard back from enough peers and I know that I've lost. And so I'm just going to, in a loop, check and wait until count is greater than or equal to five, or wait until finished is equal to 10. And then after that's the case, I can either determine that I've lost or I've won. So does anybody see any problems with this code? Given what we just talked about with mutexes, I guess. Yeah, exactly. Count and finished aren't protected by mutexes. So one thing we certainly need to fix here is that whenever we have shared variables, we need to protect access with mutexes. And so that's not too bad to fix. Here I declare a mutex that's accessible by everybody, and then in the goroutines I'm launching in parallel to request votes.
And this pattern here is pretty important: I'm going to first request a vote while I'm not holding the lock, and then afterwards I'm going to grab the lock and then update these shared variables. And then outside, I have the same pattern as before, except I make sure to lock and unlock around reading these shared variables. So in an infinite loop, I grab the lock and check to see if the results of the election have been determined by this point. And if not, I'm going to keep running in this infinite loop; otherwise I'll unlock and then do what I need to do outside of here. And so if I run this example, it seems to work. And this is actually a correct implementation. It does the right thing. But there are some problems with it. So can anybody recognize any problems with this implementation? I'll give you a hint: this code is not as nice as it could be. It's not about waiting for exactly the right amount of time. The issue here is that it's busy waiting. What it's doing is, in a very tight loop, it's grabbing the lock, checking this condition, unlocking, grabbing this lock, checking this condition, unlocking. And it's going to burn up 100% CPU on one core while it's doing this. So this code is correct, but, like, at a high level we don't care about efficiency, like CPU efficiency, for the purpose of the labs. But if you're using 100% of one core, you might actually slow down the rest of your program enough that it won't make progress. And so that's why this pattern is bad: we're burning up 100% CPU waiting for some condition to become true. So does anybody have any ideas for how we could fix this? So here's one simple solution. I will change a single line of code. And all I've added here is: wait for 50 milliseconds. And so this is a correct transformation of that program, and it kind of seems to solve the problem, right?
Like, before I was burning up 100% CPU; now only once every 50 milliseconds I'm going to briefly wake up, check this condition, and go back to sleep if it doesn't hold. And so this is basically a working solution. Any questions? So this kind of sort of works, but one thing you should always be aware of whenever you write code is magic constants. Why is this 50 milliseconds? Why not a different number? Like, whenever you have an arbitrary number in your code, it's a sign that you're doing something that's not quite right, or not quite as clean as it could be. And so it turns out that there's a concurrency primitive designed to solve exactly this problem of: I have some threads running concurrently that are making updates to some shared data, and then I have another thread that's waiting for some property, some condition on that shared data, to become true. And until that condition becomes true, the thread is just going to wait. There's a tool designed exactly to solve this problem, and that's a tool called a condition variable. And the way you use a condition variable, the pattern, basically looks like this. So we have our lock from earlier. Condition variables are associated with locks. So we have some shared data, a lock that protects that shared data, and then we have this condition variable that is given a pointer to the lock when it's initialized, and we're going to use this condition variable for kind of coordinating when a certain condition, some property on that shared data, becomes true. And the way we modify our code is, we have two places: one where we're making changes to that data, which might make the condition become true, and then we have another place where we're waiting for that condition to become true.
And the general pattern is: whenever we do something that changes the data, we call cond.Broadcast, and we do this while holding the lock. And then on the other side, where we're waiting for some condition on that shared data to become true, we call cond.Wait. And so what this does is, let's think about what happens in the main thread for a moment. The main thread grabs the lock, it checks this condition, suppose it's false, it calls cond.Wait. What this will do is, atomically, you can think of it as it'll release the lock in order to let other people make progress, and it'll add itself to a list of threads that are waiting on this condition variable. Then, concurrently, one of these threads might be able to acquire the lock after it's gotten a vote, and then it manipulates these variables, and then it calls cond.Broadcast. What that does is it wakes up whoever's waiting on the condition variable. And so once this thread unlocks the mutex, this one, as it's returning from Wait, will reacquire the mutex and then return to the top of this for loop, which is checking this condition. So this Broadcast wakes up whoever's waiting at this Wait. And so this avoids having to have that time.Sleep for some arbitrary amount of time. Like, this thread that's waiting for some condition to become true only gets woken up when something changes that might make that condition become true, right? Like, if you think about these threads, if they're very slow and they don't call cond.Broadcast for a long time, this one will just be waiting. It won't be periodically waking up and checking some condition that can't have changed, because nobody else manipulated the shared data. So any questions about this pattern? Yeah? So that's a great question. I think you're referring to something called the lost wakeup problem. And this is a topic in operating systems, and we won't talk about it in detail now. So feel free to ask me after the lecture.
But at a high level, you can avoid funny race conditions that might happen between Wait and Broadcast by following the particular pattern I'm showing here, and I'll show you an abstracted version of this pattern in a moment. Basically, on the side that makes changes that might change the outcome of the condition test, you always lock, then manipulate the data, then call Broadcast, and call Unlock afterwards; the Broadcast must be called while holding the lock. Similarly, when you're checking the condition, you grab the lock and then always check the condition in a loop, and inside, while the condition is false, you call cond.Wait. Wait is only called while holding the lock, and it atomically releases the lock and puts the thread on a list of waiters. Then as Wait returns, before we go back to the top of this for loop, it reacquires the lock, so the check only happens while holding the lock. And outside of the loop we still hold the lock, and we unlock after we're done doing whatever we need to do there. At a high level, the pattern looks like this. We have one thread, or some number of threads, doing something that might affect the condition: they grab the lock, do the thing, call Broadcast, then unlock. On the other side, we have some thread waiting for the condition to become true: it grabs the lock, then, in a while loop, while the condition is false, it waits. So when we get past the while loop, we know the condition is true and we're holding the lock, and we can do whatever we need to do, and finally we unlock. We can talk about all the things that might go wrong if you violate one of these rules after lecture, if you're interested, but at a high level, if you follow this pattern, you won't need to deal with those issues.
So any questions about that? So that's a great question: when do you use Broadcast versus Signal? Condition variables have three methods on them. One is Wait, for the waiting side. On the other side, you can use Signal or Broadcast, and the semantics are: Signal wakes up exactly one waiter, one thread that may be waiting, whereas Broadcast wakes up everybody who's waiting, and they'll all try to grab the lock and recheck the condition. Only one of them proceeds at a time, because only one of them holds the lock until it gets past this point. I think for the purposes of this class, always use Broadcast, never Signal. If you follow this pattern and just don't use Signal and always use Broadcast, your code will work. You can think of Signal as something used for efficiency, and we don't really care about that level of CPU efficiency in the labs for this class. Any more questions? Okay, so the final topic we're going to cover in terms of Go concurrency primitives is channels. At a high level, channels are a queue-like synchronization primitive, but they don't behave quite like queues in the intuitive sense. I think some people think of channels as a data structure you can stick things into, where eventually someone will pull those things out. But in fact, an unbuffered channel has no queuing capacity, no internal storage. Channels are synchronous. If you have two goroutines that are going to send and receive on a channel, and someone tries to send on the channel while nobody's receiving, that thread will block until somebody's ready to receive, and at that point, synchronously, the data is handed over to the receiver. And the same is true in the other direction.
If someone tries to receive from a channel while nobody's sending, that receiver will block until another goroutine is about to send on the channel, and that send will happen synchronously. Here's a little demo program that demonstrates this. I declare a channel, and then I spawn a goroutine that waits for a second and then receives from the channel. In my main goroutine, I record the time, then I send on the channel, so I just put some dummy data into it, and then I print out how long the send took. If you think of channels as queues with internal storage capacity, you might expect this to complete very fast, but that's not how channels work. The send is going to block until the receive happens, and that won't happen until the one second has elapsed. So from here to here, we're actually blocked in the main goroutine for one whole second. All right, so don't think of channels as queues; think of them as a synchronous communication mechanism. Another example that makes this really obvious: here we have a goroutine that creates a channel, then sends on the channel, and then tries receiving from it. Does anybody know what'll happen when I try running this? I think the file name might give it away. Well, the send blocks until someone can receive, so it never gets to the part where it receives. Yeah, exactly. The send is going to block until somebody's ready to receive, but there is no receiver, and Go actually detects this condition: if all your threads are sleeping, it detects this as a deadlock and it'll actually crash. But you can have more subtle bugs. If you have some other thread off doing something, like if I spawn a goroutine that does nothing in a for loop and try running this program again, now Go's deadlock detector won't notice that no thread is doing any useful work.
Like, there is one thread running; it's just that this send is never received, and while we can tell by looking at this program that it'll never terminate, here it just looks like it hangs. So if you're not careful with channels, you can get these subtle deadlock bugs. Yeah? Would it also be true that if we take the receive line and put it first, it still deadlocks? Yeah, exactly. There's no data, nobody's sending on this channel, so it's going to block here; it's never going to get to this line. Yeah. So channels, as you point out, can't really be used within a single goroutine. It doesn't really make sense, because in order to send, or in order to receive, there has to be another goroutine doing the opposite action at the same time. If there isn't, you're just going to block forever, and that thread will no longer do any useful work. Yeah? Sends block on receives, and receives block on sends? Yes, exactly. Sends wait for receives, receives wait for sends, and the exchange happens synchronously once both a sender and a receiver are present. And is that true for both buffered and unbuffered channels? So what I've talked about so far is unbuffered channels. I was going to avoid talking about buffered channels, because there are very few problems they're actually useful for solving. Buffered channels take in a capacity; so if I just switch this, here's a buffered channel with a capacity of one. This program does terminate, because buffered channels have some internal storage space, and until that space fills up, sends are non-blocking: they can just put the data in the internal storage. But once the channel does fill up, it behaves like an unbuffered channel, in the sense that further sends will block until there's a receive to make space in the channel.
But I think at a high level we should avoid buffered channels, because they basically don't solve any problems. And another thing you should be thinking about: whenever you need to make up arbitrary numbers, like this capacity here, to make your code work, you're probably doing something wrong. Yeah? Is it really a deadlock in the sense that two locks are involved, or is it just that one operation is blocking, waiting on the other, without locks being involved? So I think this is a question about terminology: what exactly does deadlock mean, and does this count as a deadlock? Yes, this counts as a deadlock. No useful progress will be made here; these threads are just stuck forever. Any other questions? So what are channels useful for? I think channels are useful for a small set of things. For example, producer/consumer queue situations. Here I have a program that makes a channel and spawns a bunch of goroutines that are going to do some work; say they're computing some result and producing some data, and I have a bunch of these goroutines running in parallel, and I want to collect all that data as it comes in and do something with it. This doWork function just waits for a bit and produces a random number, and in the main goroutine I continuously receive on the channel and print the results out. This is a great use of channels. Another good use of channels is to achieve something similar to what wait groups do. Rather than use a wait group, suppose I want to spawn a bunch of threads and wait until they're all done doing something. One way to do that is to create a channel and then spawn the threads; I know how many threads I've spawned, so I have five goroutines created here. They do something, and then send on the channel when they're done. Then in the main goroutine I can just receive from that channel the same number of times.
And this has the same effect as a wait group. So, question? Can you repeat that? Could you use a buffered channel with a capacity of five, since you're waiting for five acknowledgments? So the question is, here, could you use a buffered channel with a capacity of five, because you're waiting for five of these sends? I think in this particular case, yes, that would have the equivalent effect, but there's not really a reason to do that. And at a high level, in your code you should avoid buffered channels, and maybe even channels in general, unless you think very hard about what you're doing. Yeah? You said wait group, but I haven't seen a class for it; is that part of the broadcast mechanism, or is it an additional mechanism? So, what is a wait group? I think we covered this in a previous lecture, and I talked about it very briefly today, but I do have an example of wait groups. A wait group is yet another synchronization primitive provided by Go, in the sync package, and it does what its name advertises: it lets you wait for a certain number of threads to be done. The way it works is you call WaitGroup.Add, which basically increments an internal counter, and then when you call WaitGroup.Wait, it waits until Done has been called as many times as Add was called. So this code is basically the same as the code I just showed you using a channel, except this is using a WaitGroup. They have the exact same effect; you can use either one. Yeah? What if one thread's Done runs before another thread is spawned? So the question here is about race conditions, I think, like what happens if this Add doesn't happen fast enough before this Wait happens, or something like that? Well, notice that the pattern here is that we call WaitGroup.Add outside of the goroutine, and it's called before spawning the goroutine.
So this Add happens first, the spawn happens next, and so we'll never have the situation where Done happens before Add has happened for this particular goroutine. Yeah? We have a lot of mechanisms now; do we know what the compiler does under the hood — are these all just semaphores on shared data, or how many mechanisms are there at the machine level? So I think the question is how this is implemented by the compiler, and I won't talk about that now, but talk to me after class or in office hours. For the purposes of this class, you need to know the API for these things, not the implementation. All right, and so I think that's basically all I have on Go concurrency primitives. One final thought on channels: channels are good for a specific set of things, like the producer/consumer queue I just showed you, or implementing something like wait groups, but when you try to do fancier things with them, like if you want to kick another goroutine that may or may not be waiting for you, so that it gets woken up, that's a kind of tricky thing to do with channels. There are also a bunch of other ways to shoot yourself in the foot with them. I'm going to avoid showing you examples of bad code with channels, just because it's not useful to see, but I personally avoid using channels for the most part, and just use shared memory and mutexes and condition variables instead, and I personally find those much easier to reason about. So feel free to use channels where they make sense, but if anything looks especially awkward to do with channels, just use mutexes and condition variables; they're probably a better tool. Yeah? What are some of the differences between this producer/consumer pattern and a thread-safe FIFO? So the question is, what's the difference between this producer/consumer pattern here and a thread-safe FIFO?
I think they're kind of equivalent; you could do this with a thread-safe FIFO, and that's basically what a buffered channel roughly is. Because this is a channel that's not buffered, the receiver won't ever have a situation where there's more than one item ready to be received. What if we're interested in enqueuing a bunch of things and then having the receiver dequeue a bunch of things? Yeah, so if you're interested in enqueuing things and then dequeuing them later, like if you want this send line to finish and have this thread go do something else while the data sits in a queue, rather than this goroutine waiting to hand it off, then a buffered channel might make sense. But I think at least in the labs you will not have a pattern like that. All right, so next Fabian's going to talk about more Raft-related stuff. Do you need this? All right, can you all hear me? Is this working? Yeah, all right. So, basically, I'm going to show you two bugs that we commonly see in people's Raft implementations. There are a lot of bugs that are pretty common, but I'm just going to focus on two of them. So in this first example, we have the start of a Raft implementation, sort of the beginnings of what you might see for 2A. In our Raft state, we have primarily the current status of the Raft peer, either follower, candidate, or leader, and we have these two state variables keeping track of the current term and who we voted for in the current term. I want us to focus, though, on these two functions, attemptElection and callRequestVote. In attemptElection, we're just going to set our state to candidate, increment our current term, vote for ourselves, and then start sending out RequestVotes to all of our Raft peers.
And so, similar to some of the patterns that Anish showed, we're going to loop through our peers, and for each one, in a goroutine, separately call this callRequestVote function to actually send an RPC to that peer. All right, so in callRequestVote, we're going to acquire the lock, prepare the arguments for our RequestVote RPC by setting the term to the current term, then actually perform the RPC call over here, and finally, based on the response, reply back to the attemptElection function, which should eventually tally up the votes to see if it got a majority and can become leader. So what happens when we run this code? In theory, what we might expect to happen is: there's some code that's going to spawn a few Raft peers and actually have them attempt elections, and what should happen is they just start collecting votes from the other peers. We're not actually going to tally them up, but hopefully nothing weird goes wrong. But actually something does go wrong here: we activated Go's deadlock detector and somehow ran into a deadlock. So let's see what happened. For now, let's focus on what's going on with server zero. Server zero says it starts attempting an election at term one; that's just the start of the attemptElection function. It acquires the lock, sets some stuff up for performing the election, and unlocks. Then it sends out a RequestVote RPC to server two. It finishes processing that RequestVote RPC over here, so we're just printing right before and after we actually send the RPC. Then it sends out a RequestVote RPC to server one, but after that, we never actually see it finish sending that RPC. So it's actually stuck in this function call, waiting for the RPC response from server one. All right, now let's look at what server one is doing. It's pretty much the same thing.
It sends a RequestVote RPC to server two; that succeeds, and it finishes processing the response from server two. Then it sends an RPC to server zero, and now what's actually happening is that zero and one are waiting for the RPC responses from each other: they've both sent out an RPC call but haven't gotten their responses yet, and that's the cause of our deadlock. Really, the reason we're deadlocking is that we're holding this lock across our RPC calls. Over here in the callRequestVote function, we acquire the mutex associated with our Raft peer, and we only unlock at the end of the function, so throughout this entire function we're holding the lock, including while we try to contact our peer to get the vote. And later, when we handle this RequestVote RPC, in the handler we also try to acquire the lock, at the beginning of the function, but we never actually succeed in acquiring it. So just to make this a little more clear, the order of operations is: in callRequestVote, server zero first acquires its lock and sends an RPC call to server one. Simultaneously and separately, server one does the same thing: it enters its callRequestVote function, acquires its lock, and sends an RPC call to server zero. Now, in server zero's handler and server one's handler, each is trying to acquire the lock, but they can't, because each server is already holding its own lock while trying to send the RPC call to the other, and that's what leads to the deadlock. So to solve this, basically, we want you to not hold locks across RPC calls; that's the solution to this problem. In fact, we don't need the lock here at all: instead of reading the current term when we enter this callRequestVote function, we can pass it as an argument here.
We save the term while we hold the lock earlier in attemptElection, and just pass it as a variable to callRequestVote. That removes the need to acquire the lock at all in callRequestVote. Alternatively, we could lock while we're preparing the arguments, unlock before actually performing the call, and then, if we need to process the reply, lock again afterwards. So: just make sure to unlock before making an RPC call, and then, if you need to, you can acquire the lock again. So now, if I save this, then... it's still triggering the deadlock detector, but that's just because we're not doing anything at the end. But now it's actually working: we finish sending the RequestVotes on both sides, and all of the operations that we wanted to complete are complete. All right, any questions about this example? Yeah. You were basically suggesting you don't use locks when there is a... Yeah, sort of. You might need to use locks when you're preparing the arguments or processing the response, but you shouldn't hold a lock through the RPC call while you're waiting for the other peer to respond. And there's actually another reason for that, in addition to deadlock. The other problem is that some tests have an unreliable network that could delay some of your RPC messages, potentially by something like 50 milliseconds. In that case, if you hold the lock through an RPC call, then any other operation you try to do during those 50 milliseconds won't be able to complete until that RPC response is received. So that's another issue you might run into if you hold the lock: it's both to make things more efficient and to avoid these potential deadlock situations. All right, so just one more example, again using a similar Raft implementation. So again, in our Raft state, we're keeping track of whether we're a follower, candidate, or leader, and then also these two state variables.
In this example, I want you to focus on this attemptElection function. We've first implemented the change that I just showed you, storing the term here and passing it as a variable to the function that collects the RequestVotes. But additionally, we've implemented some functionality to add up the votes. What we do is create a local variable to count the votes, and whenever we get a vote, if the vote was not granted, we return immediately from the goroutine where we're processing the vote. Otherwise, we acquire the lock before editing this shared local variable that counts the votes. Then, if we did not get a majority of the votes, we return immediately; otherwise, we make ourselves the leader. So, as with the other example, if you look at this initially, it seems reasonable, but let's see if anything can go wrong. This is the log output from one run, and one thing you might notice is that we've actually elected two leaders in the same term: server zero made itself a leader in term two, and server one did as well. It's okay to have leaders elected in different terms, but two leaders in the same term should never happen. So how did this come about? Let's start from the top. At the beginning, server zero actually attempted an election at term one, not term two, and it got votes from both of the other peers, but for whatever reason, perhaps because those reply messages were delayed, it didn't actually process those votes until later. In between attempting the election and finishing the election, server one also decided to attempt an election, perhaps because server zero was delayed so much that server one ran into the election timeout and started its own election. And it started it at term two, because it couldn't have been term one: it had already voted for server zero at term one over here.
Okay, so then server one sends out its own RequestVotes to servers two and zero at term two. Now we see that server two votes for server one; that's fine. But server zero also votes for server one. This is actually also fine, because server one is asking server zero for a vote at a higher term, and what server zero should do, if you remember from the spec, is set its current term to the term in the RequestVote RPC message, term two, and also revert itself to a follower instead of a candidate. All right, so the real problem is on this line, where server zero, although it really got enough votes at term one, made itself a leader at term two. One explanation for why this is happening is that in between where we set up the election, our attempt at the election, and where we actually process the votes, some other things happened: in this case, we actually voted for someone else in between, and so we're no longer in term one, where we thought we started the election; we're now in term two. And so we just need to double-check: because we don't hold the lock while we're performing the RPC calls, which is important for its own reasons, some things might have changed, and we need to double-check that what we assumed was true when we set ourselves to be the leader is still true. So, there are a few different ways to solve this. You could imagine not voting for others while we're in the middle of attempting an election. But in this case, the simplest way to solve it, at least in this implementation, is to just double-check that we're still in the same term and that we're still a candidate, that we haven't reverted to a follower. Actually, one thing I want to show you: if we print out our state over here, we do see that server zero became a follower, but it's still setting itself to leader on this line. So yeah, we can just check for that.
If we're not a candidate, or the current term doesn't match the term at which we started the election, then we just quit. And if we do that, then server one becomes a leader, and we never see server zero become a leader, so the problem's solved. All right, any questions? Yeah. Would it be sufficient to just check the term? Yeah, I think that would... because we would not... if the term is higher now, then... actually, no, it might not be sufficient, because we might have attempted another election. It depends on your implementation, but it's possible that you could have attempted another election at a higher term afterwards. Oh wait, no, that's the same thing, right? Yeah: it would not be sufficient to only check the state, but I think you're right, if you only check the term, then it is sufficient. All right, any other questions? All right, so that's it for this part. Next, she's going to show you some examples of actually debugging some of these Raft implementations. Can you all hear me? Yeah? All right. So in my section, I'm going to walk you through how I would debug a bug in your Raft implementation. I've prepared a couple of buggy Raft implementations, and I'll try to walk you through them. First, I'm going to go into my first buggy implementation, and if I run the test here, let's see the results. So for this one, it doesn't print anything; it just gets stuck, and it's going to be here forever. Let's assume that I have no idea why this is happening. The first thing I want to find out is where it gets stuck, and we do have a good tool for that, which is printf. In the starter code, if you go to util.go, we have a function called DPrintf. This is just a nice wrapper around log.Printf, with a debug variable to enable or disable the logging messages. So I'm going to enable that and go back to my Raft code. First of all, when some bug is happening, I always go check whether the code actually initializes a Raft server.
So here I'll just print. And here, if I run the test again, now I know that there are three servers that got initialized. So this part is okay, but that's not where the bug is happening. So I'll go deeper into the code to find where it gets stuck. Now, if you look at the code, we're kicking off the leader election, so I'm going to go to that function, and just to make this faster, I'll check whether it kicks off an election. That part is still fine, so we go further. Now here we're in the election; I'll see if the server actually sends RequestVotes to the other servers. Now we have more of an idea of where it gets stuck, because it's not printing: the servers that kick off the election are not sending RequestVotes. So I'll go back further to see where it gets stuck. I always try to print: if we call some function, I always double-check whether it actually gets into the function. So now I'm going to print that this server is at the start of the election, and that works. So now we have an idea that the bug should be between here and here; we're trying to minimize the scope of the code that could be causing the bug. Say I print something here and it doesn't get there, so I move the print up. Say it's still not there. Now it's there. So the bug is probably in this function, and I'll just go check. Here the problem is that I'm trying to acquire the lock while I already hold the lock, so it's going to be a deadlock. So that's how I would find the first bug, using DPrintf. And it's nice to use DPrintf because you can just turn off the debugging prints and have a clean test output, without all the debugging, if you want. So that's how I would use DPrintf to hunt down a bug in your code. And for this example, there's actually another trick to help you find this kind of deadlock: if you press Control and backslash, you can see in the bottom left that I pressed Control-backslash.
This sends the quit signal to the Go program, and by default Go handles the quit signal by quitting all the goroutines and printing out the stack traces. So now, if you go up here, you can see where it gets stuck; there are a couple of functions printed here, and you can go through all the traces. Yes, so it's actually showing that the function causing the problem is convertToCandidate. So that's another way to find out where the deadlocks are. I can remove all this, and now it works. So that's the first example I wanted to go through. The second thing that you want to do, before you submit your labs, is to turn the race flag on when you run the tests. The way to do that is just to add -race before -run. Here, because my implementation doesn't have any races, it's not going to tell you anything. But just be careful about this: it's not a proof that you don't have any races; it's just that it didn't detect any races for you. I'm going to run the same command again with the race flag, but this time there's actually a race going on in my implementation, so it's going to yell at you that there's a data race in your code. I'm quitting that, and let's see how useful the warnings are. I'm going to go to my second buggy Raft implementation. And here, let's look at this race. It's telling us that there's a read going on at line 103; I go to that line, and there's a read, probably of the state here. And there's also a write at line 412, which is to the state, so I go to that line too. And now we know that this read is not protected by a lock. So the race flag is actually warning us and helping us find the bug, this data race that we have. So the fix is going to be to just lock this and unlock it, and that should solve the problem. So at this point, we know how to do some basic debugging. Does anyone have any questions? Okay.
So I'm going to go to the third one, where it's going to be more difficult to find the bug. I'm going to run the same test. Now, I actually have some debugging messages in there already, and you can see that I also print a debugging message with the test action. This is something you might want to consider doing. If you go into the test script here, you can see how the test runs, and there are some actions the test script takes to make your code fail. It's usually a good idea to print out where each action is happening among your actual debugging messages, so you can guess where the bug is happening, in which phase of the test, if that makes sense. So now, I was doing fine in the first test; I passed the first test, but I'm failing the second test. And here the test action is to find one as a leader, so I'm passing the test until this point. And if you go to... I'm actually passing until the leader rejoins. So this can give you a nice idea of how the test works, and help you make a better guess at where the bug is in your code. Now let's look at the debugging messages. It seems like when leader two rejoins, it becomes a follower, and we have a new leader. So that looks fine to me, and we probably need more debugging messages than just the state changes, so I'm going to add some more. My first guess is that when one becomes a leader, it might not be doing what a leader should do correctly. So we got stuck. In my code, after we convert a server into a leader, I have a goroutine that runs while we're the leader, just sending heartbeats to the other servers. So I'm going to print some stuff here: send heartbeat to the server. So two becomes a leader and sends the first heartbeat to each server, and one still tries to send a heartbeat to the new leader, and then one becomes a follower. So this doesn't look like the problem.
Now I'm going to check whether the other servers receive the heartbeats correctly. It's taking a while; I'm trying to finish this. So two becomes a leader and sends heartbeats, but no one receives a heartbeat from two. So if I go to the function that sends AppendEntries, I'm actually holding the lock through the RPC call, which is the problem that Fabian went through in the last section. So that's the problem I need to fix: what I should do is unlock here and then lock again here, and that should work. And then there are a couple of things you might want to do when you test your Raft implementation. There's actually a script to run the tests in parallel, and I can show you how to use it; the script is in a Piazza post someone made about it. Here's how we use the script: you run the script and specify the number of times to run the test. Personally, I do something like 1,000, but that depends on your preference. Then this is the number of workers you want running the test at the same time, and then here's the test. And if you run the script, it will show you that we have run four tests so far and all are passing, and it's going to keep going like that. So that's how I would go about debugging a Raft implementation. And you all are welcome to come to office hours when you need help.