Hi, and welcome back to Program Analysis. In this second video of the lecture on concurrency, we will look into how a program analysis can fight data races. In particular, we will look at a dynamic analysis that observes the running program and then tells you whether there has been a data race, or at least a potential data race, in this execution. There are, of course, many different ways a dynamic analysis could look for data races. What we'll focus on in this video is just one of these many approaches, one of the earlier ones that many people refer to: an approach called Eraser. If you want to know more about it, you should have a look at this paper here. The basic idea of Eraser is relatively simple. It looks for unprotected accesses to shared memory, where unprotected means that you're accessing a shared memory location without holding the lock that other threads hold when they access this memory location. The implicit assumption here is that all accesses to a specific shared memory location V should always happen while holding the same lock L. If this assumption is true, then the program is said to follow a consistent locking discipline, which essentially means that whenever you access this specific memory location, you hold this specific lock L, and every thread, and every instruction in these threads, actually follows this locking discipline. What Eraser does is look at the execution of a program (so it's a dynamic analysis) in order to find out whether the program violates this consistent locking discipline. To do this, Eraser observes a couple of concurrency-related operations that happen while the program is executing. Specifically, it looks at all acquisitions of locks, all releases of locks, and all accesses of shared memory locations. 
To find violations of this consistent locking discipline, the dynamic analysis uses the so-called lock set algorithm. We will look at this algorithm in its simple form first, and after having gone through that, we will look at a refinement of the core algorithm. If you've understood the simple form, the refinement is also very easy to understand. So let's get started. While the program is executing, the dynamic analysis keeps track of all the locks that are held by a particular thread of execution. This locks held of T basically tells us, for a thread T, what locks T currently holds. This set is updated simply by looking at all the lock acquisitions and lock releases that threads perform, and then adding and removing locks from the current locks held set accordingly. The analysis also looks at every shared memory location. For each such shared memory location V, it initializes a set C of V, which represents the locks that a thread could hold while accessing this memory location V. This set C of V is called the lock set, and there is one such set for every memory location that is shared among different threads. Then, while the program is executing, every access to every shared memory location is tracked, and the lock set of this memory location is refined by taking the current lock set of the memory location and intersecting it with the locks held set of the thread T that is executing, so that at the end, we have updated the lock set to the intersection of these two sets. In case this lock set ever becomes empty, which basically means that someone has accessed V without the right lock, a warning is issued, and the algorithm will report that it looks like there is a data race. 
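The bookkeeping just described can be sketched in a few lines of Python. This is a minimal sketch, not the paper's implementation; the names locks_held, lockset, and on_access are illustrative.

```python
# A minimal sketch of the simple lockset refinement rule.
# locks_held maps each thread to the set of locks it currently holds;
# lockset maps each shared location V to its lock set C(V).
# All names here are illustrative, not taken from the Eraser paper.

def on_acquire(locks_held, t, l):
    locks_held.setdefault(t, set()).add(l)

def on_release(locks_held, t, l):
    locks_held[t].discard(l)

def on_access(lockset, locks_held, t, v, all_locks):
    if v not in lockset:
        lockset[v] = set(all_locks)         # C(V) starts as the set of all locks
    lockset[v] &= locks_held.get(t, set())  # C(V) := C(V) intersect locks_held(T)
    return bool(lockset[v])                 # False signals a potential data race

# Example: thread T accesses V while holding only L1, so C(V) shrinks to {L1}.
held, cv = {}, {}
on_acquire(held, "T", "L1")
ok = on_access(cv, held, "T", "V", all_locks={"L1", "L2"})
```

A later access to V by a thread holding no lock would shrink C(V) to the empty set, and on_access would return False, which is the point where a warning would be issued.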
To illustrate this simplified version of the lock set algorithm, let's have a look at a simple example, which is a variant of one of the examples that we've seen already in the first video. In this variant, we again have a variable called balance, which for example represents how much money you have in a bank account. It's initialized to 10. Then we have two concurrently executing threads, where the first thread starts by acquiring a lock L1. Once the thread has acquired this lock, it reads the current value of balance into temp one, which is a local temporary variable, and then it writes into the shared variable balance the result of adding to temp one some value a. Once all of this is done, thread one releases the lock that it has acquired, which was lock L1. The concurrently executing thread on the other side also starts by reading the current value of balance, notably without having acquired any lock. Only afterwards does it acquire a lock, and a different one at that: lock L2. Once it has this lock, it writes into balance the value temp two, which we read as the previous value of balance, minus some value b. Afterwards, it releases this lock L2. So now let's have a look at an execution of this program, where for every instruction that is executed, we will write down the locks held of the thread that was just executing that instruction and the lock set of our shared variable balance right after this instruction has executed. In this execution, let's assume that the first thing that happens is that we initialize balance to 10. This must happen first, because it happens before the two threads start, and it happens, of course, without any lock acquired yet. Because the lock set of every variable is initialized to the set of all locks that exist in this program, C of balance will be initialized to L1 and L2. 
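As a sketch, the example program could look as follows in Python. The concrete values of a and b are arbitrary here, and thread 2's unlocked read is exactly the access the analysis should flag.

```python
import threading

# A sketch of the example program: two threads update a shared balance,
# but thread 1 protects its accesses with lock1, while thread 2 uses a
# different lock (lock2), and only for its write. The values of a and b
# are arbitrary example values.
balance = 10
lock1 = threading.Lock()
lock2 = threading.Lock()
a, b = 5, 3

def thread1():
    global balance
    with lock1:              # acquire(lock1)
        temp1 = balance      # read balance while holding lock1
        balance = temp1 + a  # write balance while holding lock1
                             # release(lock1) at the end of the with-block

def thread2():
    global balance
    temp2 = balance          # unprotected read: no lock held here!
    with lock2:              # acquire(lock2), a different lock than thread 1's
        balance = temp2 - b  # write balance while holding the "wrong" lock

# Run thread 1 to completion before thread 2, matching the schedule the
# lecture walks through; other interleavings could lose one of the updates.
t1 = threading.Thread(target=thread1); t1.start(); t1.join()
t2 = threading.Thread(target=thread2); t2.start(); t2.join()
```

In this particular schedule, the result is 10 + a - b; under a different interleaving, thread 2's stale read of balance could silently discard thread 1's update.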
Next, let's assume that we are executing the first statement in thread one. So the first thing that happens here is that the thread acquires lock L1. Once this instruction has executed, the locks held set of thread one will obviously contain L1, and because we haven't actually accessed our shared variable balance yet, the lock set of balance just remains the same as before: it's still L1 and L2. Next, let's assume that we are executing the two statements in the critical section of thread one. At first, we write into temp one the current value of balance. This happens while the thread still holds L1, and because this is an access of the shared variable balance, we intersect the current lock set of balance with the locks held set of the executing thread, which means we're intersecting L1 and L2 with just L1. The result is a set that contains just L1, and this is the new lock set of balance. Next, we execute the second statement in this critical section: we update balance with the value of temp one plus some other value a. Again, this happens while we still hold lock L1, so the locks held set doesn't change, and again, because this is an access to the shared memory location balance, we intersect the current lock set of balance with the locks held set, which in this case just returns the same set again: we still have L1. Then thread one releases the lock, so we execute this release statement, and after that statement, the locks held set is empty, because the lock L1 has just been released. Because there was no further access to the shared memory location, the lock set of balance is still L1. So now, after all the instructions of thread one have been executed, it's time to switch over to thread two. Of course, we could see a different execution where the instructions are interleaved in some other way. 
But in the execution that we look at here, let's just assume that we now switch over to thread two, which means we start with the first instruction in this thread, which is reading balance. Because the thread has not yet acquired any lock, the locks held set is the empty set. Now, because this is a read, and thus an access of the shared memory location balance, we again intersect the current lock set of balance with the locks held set, which means we're intersecting this set here, which contains L1, with the empty set down here. This means we have reached the point where the Eraser algorithm will actually report a warning, because now it seems that someone has accessed this balance variable without acquiring the required lock. And this is indeed what has happened in this example. So at this point in time, the lock set algorithm will report a warning and say: hey, you have accessed balance without following the required consistent locking discipline, and therefore it seems there is a data race. Next, we still continue with the execution and acquire lock L2 in thread two, which means that the locks held set then contains this lock. But intersecting the empty set with any other set will always yield the empty set, so the lock set of balance will never get out of this unfortunate state; since the algorithm has already reported a data race, though, it won't report it again. The program, of course, still continues its execution, and we update balance in thread two. This happens while thread two is holding L2, but the lock set of balance doesn't change. Finally, at the end of the execution, L2 is released, which means locks held is again empty, and the lock set of balance is also empty. So overall, in this execution, the lock set algorithm has found a data race and will report it to the user. 
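The trace we just walked through can be replayed against a self-contained sketch of the simple lockset algorithm. The event, thread, and lock names below are illustrative; as in the lecture, initializing balance just sets C(balance) to the set of all locks, and a warning is recorded only the first time the lock set becomes empty.

```python
# Replay the analyzed execution against a minimal sketch of the simple
# lockset algorithm.
ALL_LOCKS = {"L1", "L2"}
lockset = {"balance": set(ALL_LOCKS)}     # C(balance), set at initialization
locks_held = {"T1": set(), "T2": set()}
warnings = []

def acquire(t, l): locks_held[t].add(l)
def release(t, l): locks_held[t].discard(l)

def access(t, v):
    was_empty = not lockset[v]
    lockset[v] &= locks_held[t]           # C(v) := C(v) intersect locks_held(t)
    if not lockset[v] and not was_empty:
        warnings.append((t, v))           # report only the first violation

acquire("T1", "L1")        # thread 1: acquire(L1)
access("T1", "balance")    # temp1 = balance          -> C = {L1}
access("T1", "balance")    # balance = temp1 + a      -> C = {L1}
release("T1", "L1")        # thread 1: release(L1)
access("T2", "balance")    # temp2 = balance, no lock -> C = {}, warning!
acquire("T2", "L2")        # thread 2: acquire(L2)
access("T2", "balance")    # balance = temp2 - b      -> C stays {}
release("T2", "L2")        # thread 2: release(L2)
```

After the replay, the warnings list contains exactly one entry, for thread two's unprotected read of balance, matching the walkthrough above.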
So as you've seen, this lock set algorithm is able to detect data races, but it turns out that the simple lock set algorithm we've just seen is a little bit too strict: if you just execute it like this and analyze the program using that algorithm, you will get a lot of false positives, for the following reasons. One is that sometimes a thread may access a shared variable just to initialize it, before any other thread accesses it. In this case, it's actually not necessary to acquire a lock, and the lock set algorithm, as we've seen it so far, doesn't account for this kind of case. Another problem is that you may have multiple threads that are reading from the same shared memory location, and if you remember the definition of a data race, it's okay to have multiple concurrent accesses as long as none of them is a write. So if all of them are just reading, you actually do not need a lock, and you do not want to see a warning about a data race, but the simple algorithm we've seen so far would report one. And finally, another source of false positives is that there are more advanced locking mechanisms than the simple locks we've seen so far. For example, there are read-write locks, which allow multiple readers to hold a lock at the same time, but only one writer at a time, so that either multiple threads can read from a shared memory location concurrently, or a single thread can write into it. This happens typically if the program is written in a producer-consumer style, where you have one thread that is producing something that many others are then consuming. To address some of these false positives, there's a refinement of the lock set algorithm, which works the way the simple algorithm works, but in addition to the two data structures of locks held and the lock set, the algorithm also keeps track of the state of every shared memory location. 
And this state can be represented using this nice state machine here, where we have four different states called virgin, exclusive, shared, and shared-modified. The algorithm will only issue a warning about a data race when the lock set becomes empty while the memory location is in the shared-modified state. So let's go into these states in some more detail. Initially, every memory location is in the virgin state, and as soon as the first write happens, which happens by one and only one thread, it enters the so-called exclusive state, exclusive in the sense that only one thread has exclusively accessed this memory location so far. If in this state we have more reads and writes by this first thread that accessed the memory location, we stay in the exclusive state, and there's of course no reason to report a data race, because so far only one thread has accessed this memory location. At some point, there may be a read by a second or maybe even a third thread, in which case we move to the shared state; as long as these other threads are only reading, and even if they do so again, we stay in this shared state, and there's no reason yet to report a data race. This changes if there is a write to this memory location in the shared state, or also if, while we are in the exclusive state, there is a write by a second thread. In either case, we finally go to the shared-modified state, which tells us that now it's time to really check that the program takes the right locks when accessing this memory location. If we are in this state for a given memory location and the lock set becomes empty, then the algorithm will issue a warning about a data race. All right, so let me just summarize what this Eraser algorithm is doing, and also what it can and cannot do. It is, as you've seen, a dynamic analysis to detect data races, which is based on the assumption that the program follows a consistent locking discipline. 
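The state machine just described can be sketched as follows. This is a sketch following the four states above, with illustrative names; following the description here, reads of a still-virgin location simply leave it virgin.

```python
# A sketch of the per-location state machine of the refined algorithm.
# States: virgin, exclusive, shared, shared-modified. The lockset check
# is only armed once a location reaches shared-modified.
VIRGIN, EXCLUSIVE, SHARED, SHARED_MODIFIED = (
    "virgin", "exclusive", "shared", "shared-modified")

class LocationState:
    def __init__(self):
        self.state = VIRGIN
        self.first_thread = None  # the one thread seen in the exclusive state

    def on_access(self, thread, is_write):
        if self.state == VIRGIN and is_write:
            self.state = EXCLUSIVE          # first write, by the first thread
            self.first_thread = thread
        elif self.state == EXCLUSIVE and thread != self.first_thread:
            # A second thread shows up: a read means shared,
            # a write means shared-modified.
            self.state = SHARED_MODIFIED if is_write else SHARED
        elif self.state == SHARED and is_write:
            self.state = SHARED_MODIFIED    # a write to a shared location
        # Only in shared-modified should an empty lock set trigger a warning.
        return self.state == SHARED_MODIFIED

# Example: T1 writes (exclusive), T2 reads (shared), T2 writes (shared-modified).
v = LocationState()
v.on_access("T1", is_write=True)
v.on_access("T2", is_write=False)
check = v.on_access("T2", is_write=True)
```

Only once on_access returns True would the refined algorithm start treating an empty lock set for this location as a data race, which is how the initialization and read-sharing false positives are avoided.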
It works pretty well in some cases, but there are two important limitations. One of them is that, despite the refinement of the algorithm that we've just seen, it may still report false positives, because sometimes locks are acquired in a way that is inconsistent with the locking discipline but still correct: the developer may not follow the discipline of always holding the same lock when accessing a memory location, but may instead use different locks in different phases of the program for the same memory location, and the program can still be correct. The other limitation is that Eraser is just a dynamic analysis that looks at one execution, and as a result, it may miss data races: there may be other interleavings, not seen in this one execution, in which a data race gets exposed, and if this does not happen in the execution analyzed by Eraser, it's going to miss this data race. All right, this is the end of video number two in this lecture on concurrency. I hope you now have a better idea of how to detect data races and how to do this with a dynamic analysis. In the remaining two parts of the lecture, we will look into other kinds of concurrency bugs and how to detect them. Thank you very much for listening, and see you next time.