 Hi, hello and welcome back to program analysis. This is part three of the lecture on analyzing concurrent programs. And what we want to do in this third part is to look at a technique for testing thread safe classes. Again, as for the other parts of this lecture, this part is also based on a paper. And if you're interested in more details about this technique, then please have a look at it there. So let's get started by defining what thread safety actually is. It's a term that is sometimes used in a very informal way, but usually what it means is that it's a way to encapsulate the challenges of concurrent programming into a specific class in a language that has classes. And these classes then are called thread safe classes. So the basic idea is that instead of using concurrency everywhere and bothering about having the right locks and synchronizing memory accesses correctly everywhere in your program, you delegate this task to a few classes that hopefully do that correctly. And then in the rest of the program, you can basically assume that this class is ensuring the correct synchronization of all shared memory accesses so that the clients of this class can basically use instances of this class as if they were alone and no other threads would access these instances. So in a sense, the rest of the program can treat a thread safe class as a black box and just call its methods without really thinking about the other threads that may also use objects of this class at the same time. A popular book on concurrency in Java gives this definition of what a thread safe class is. So it basically says that the methods of this class behave correctly when accessed from multiple threads with no additional synchronization in the calling code. So the important bit here is that there's no need for clients of this class to acquire any locks, but this is already done in the class. 
What is not quite clear in this definition is what it really means to behave correctly, but fortunately there's a different definition that says that the operations of this thread safe class behave as if they occur in some serial order that is consistent with the order of the method calls made by each of the individual threads. So essentially the correct behavior is implicitly given by what could happen if you would execute the calls that happen concurrently in some serial order. And we will use this definition of thread safety as a way to find bugs in thread safe classes. Let's first illustrate this idea of thread safety with one particular class from the Java standard library that is supposed to be thread safe, namely the string buffer class. And what we do here in this example is to just initialize a string buffer and then there are two threads that are concurrently using the same instance of the string buffer and one of these threads is appending A and B while the other one is appending C. And now the question for you is, well, what would be possible contents of the string buffer B? Assuming that string buffer is indeed a thread safe class. So I invite you to just stop the video here for a few seconds and think about it and then I'll tell you what content of B could actually happen. So let's have a look at the solution. So there are three possible contents that B may have that are legal if this class is thread safe. One of them is ABC, which you basically get if thread one is executing its first two calls to append first and then thread two follows. We could also have CAB, which is what you get if thread two starts executing its call and then thread one is executing the other two calls. Or you could also legally get ACB if the call of thread two is basically interleaved in between the two calls of thread one. 
But any other state of the string buffer is not legal. For example, AC, where maybe the B is overwritten by the C, is not legal, and BAC is also not OK, because this would mean we have reordered the two calls in thread one, which is not OK with the definition of thread safety. Now, because programs that use thread safe classes put so much confidence into these thread safe classes, the correctness of these programs heavily relies on the correctness of the thread safe classes. So now what actually happens if these classes are not thread safe? Well, then you have a problem. So you better test whether these classes are actually thread safe. And one way to do this is the tool that we are talking about now here. It's called ConTeGe, which basically means concurrent test generator. And in a nutshell, what it does is to automatically generate multithreaded unit tests, which basically look like normal unit tests, just that they have multiple threads. And by doing this, it detects thread safety violations or thread safety related bugs by comparing the concurrent behavior of these multithreaded unit tests to linearizations of this concurrent behavior, so basically to alternative tests where you would put all the method calls that happen concurrently into a single thread. Before looking into how exactly this works, let's just have a look at one example bug that this approach has found. And this was a bug in the JDK, in the string buffer class that we have already seen earlier. So if string buffer is used as shown here in this example, then something unexpected happens. So in this example, the string buffer is initialized, and then some string, say ABC, is appended to it. And then the same string buffer object b is used in two different concurrently executing threads, where the first thread is trying to insert the content of the string buffer into itself at index one. 
So it basically tries to put ABC right after the A that is already there, so that at the end you would have AABCBC. And concurrently, thread two is trying to delete a particular character that is already in the string buffer, namely the character at index one. And now if you write code like this and you execute it concurrently, you may actually get an index out of bounds exception, which is, of course, not what you would expect, because if you would just execute one of these calls after the other, this kind of exception could never occur. And this has actually been a bug in the JDK, which has also been confirmed by the Java developers. To detect this kind of bug, ConTeGe is automatically generating test cases that test the thread safety of a given class. So the input to the approach is this class under test. And then the output is either nothing or a report about a bug that it has found. So to do this, there are three main steps. The first one is to generate a concurrent test. And we've already seen examples of these tests, and we'll see in a second how they are generated. Then the next step is to execute this test. And finally, there's a thread safety oracle, which we'll also see in a second, that looks at the execution of this test case and determines whether there was a thread safety violation or not. If there was one, it's going to be reported as a bug. And if not, then the approach goes back to one of the two earlier stages. So either it goes back to execution, which basically means it's just executing the same test again, hoping that in a different execution, it's going to hit different behavior that maybe exposes a bug. Or it takes this arrow back here and goes back to generating another test, hoping that maybe another test is going to expose a thread safety bug. So let's start by looking at how the generation of these concurrent tests works. 
So the example that you've seen before is actually a test that has been generated by this algorithm. And each of these tests consists of three parts. One is what is called the sequential prefix. So it's essentially a sequence of statements that creates and then sets up an instance of the class under test, for example, by just calling the constructor and then calling one or more methods on it. And then we have two parallel threads that each execute a so-called concurrent suffix. So the key idea here is that these concurrent suffixes are using the shared instance of the class under test and each are calling methods on this shared instance. To generate such a test, the test generation algorithm takes three steps. The first one is to generate this prefix. So essentially here it instantiates the class under test and calls some methods on it. And once it has done this, it moves on to step number two, where it's creating multiple suffixes for this prefix, which basically adds calls on this shared instance of the class under test. And then once the algorithm has produced a prefix and at least two suffixes, it puts them together into a test, which basically looks like what you've seen before, so where you have the prefix first followed by the two concurrently executing suffixes. And all of this generation of the method calls and as a result also of the prefixes and suffixes happens through so-called feedback directed random test generation, which we'll see in more detail in a second. Let's look into this algorithm in some more detail and let's start with the first step, which is to create this prefix for our test. So in the prefix, we start by instantiating the class under test. So we want to call one of its constructors and this happens by basically randomly selecting one of the available constructors. 
So let's say our class under test is string buffer and let's say we are randomly selecting the default constructor, which does not take any arguments, then we would basically have this call here. Now, whenever the test generation algorithm is adding a method call or a constructor call to the test, it's executing the entire test that it has at this point in order to check whether this call yields an exception. And only if it does not yield an exception, it's continuing. So this is an idea that is similar to what you've seen earlier in the lecture when we talked about Randoop and its feedback directed random test generation. Now here for the simple example, if you just execute this test that we have so far, we will see that it runs fine without any exception so we can continue extending this prefix. Now to extend this prefix, the algorithm wants to call some methods on this newly created instance of the class under test. So it will randomly select some method. Let's say it randomly selects to call the append method which requires a string argument. And now in order to get an argument, it goes through a couple of different options. So one option is to take one of the already available objects that we have. So if we had a string object in this test, we could just use that. The second option is to call another method which is returning an object or a value of the required type. And the third option is to just pick a random value. So for the sake of the example, let's assume the algorithm is picking a random value, let's say ABC, and then it has extended this test with a new call. So it will again execute this extended test to check whether this leads to an exception because if it does, then we should not use this prefix. But here everything is fine. So we are basically done creating a prefix. 
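As a rough sketch of this feedback step, here is how "add a call, execute everything so far, and keep the call only if nothing throws" could look for the string buffer example. The class `PrefixBuilder` and its method names are hypothetical and only illustrate the idea; this is not the tool's actual code, and the random selection of methods and arguments is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of feedback-directed extension of a call sequence.
final class PrefixBuilder {

    /** Executes all calls on a fresh StringBuffer; true if none of them throws. */
    static boolean runsWithoutException(List<Consumer<StringBuffer>> calls) {
        StringBuffer sb = new StringBuffer(); // constructor call that starts every sequence
        try {
            for (Consumer<StringBuffer> call : calls) {
                call.accept(sb);
            }
            return true;
        } catch (RuntimeException e) {
            return false; // a purely sequential failure
        }
    }

    /** Adds the candidate call only if the extended sequence still runs cleanly. */
    static boolean tryExtend(List<Consumer<StringBuffer>> calls,
                             Consumer<StringBuffer> candidate) {
        List<Consumer<StringBuffer>> extended = new ArrayList<>(calls);
        extended.add(candidate);
        if (!runsWithoutException(extended)) {
            return false; // feedback: discard the call, the caller may try other arguments
        }
        calls.add(candidate);
        return true;
    }
}
```

With this sketch, extending an empty sequence by `sb -> sb.append("abc")` succeeds, while a later attempt to add `sb -> sb.insert(-5, sb)` is rejected because the sequence throws when executed.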
Of course, we could also add more method calls, but for the example, let's just assume that one is enough and this is the prefix that we get. So let's now look into the second step which is creating suffixes for the already created prefix. And what we essentially want to do here is, after the object has been set up in the prefix, to call some more methods on this shared instance of the class under test. So we start with the prefix that we have already created and then in order to call more methods on it, we again randomly select one of the methods that this object is providing. So let's say we are selecting insert, and insert has multiple variants. So let's say we take the one that takes an integer as the first argument and then a character sequence, which tells us what to insert at the given index. So that means again, the algorithm needs to decide what arguments to use here for the integer and for the character sequence. And again, it has these three options of taking any available object that has a compatible type or calling a method that returns a value of the required type or just picking a random value. So let's say it takes a mix here of the first and the third option by taking a random value minus five for the int and the existing variable b, which is compatible with the character sequence type that we need because string buffer is also a character sequence. So this is the call we have added now, and now the algorithm is trying out again if this leads to an exception or not. So it will execute both the prefix and the suffix created so far. And if it does that, it'll actually get an exception. And the reason simply is that we cannot insert anything at index minus five. So we get an index out of bounds exception here. So now this is just a sequential problem. It doesn't have anything to do with concurrency or thread safety. 
So the algorithm does not want to have this kind of suffix but instead goes back to the previous step where it's trying to find better arguments for this call that do not lead to an exception. And let's say it now randomly chooses 1 and b as the arguments. So we again execute this entire prefix plus partial suffix, and now in this case we do not get an exception, which means we have created a suffix that is fine. So now we could of course add more method calls to the suffix, but for the sake of the example let's assume we are done and then move on to the second suffix that we also want to have and that we also create in a similar way to before. So let's now assume we add this call in the second suffix that calls deleteCharAt with index one. If you execute just the prefix and the second suffix one after the other, we will see that there's no exception so everything is fine, which means we basically have created another suffix that the algorithm can continue to work with. So now at this point the algorithm has a prefix and two suffixes, so it'll put these together into a complete test, which basically just works by spawning a new thread for each of the suffixes after the prefix has been executed. And this gives us exactly the test that we've seen earlier, which, if you execute it and are lucky to trigger the right interleaving, will expose a thread safety bug. All right, so zooming out a little bit, here's the overview of the approach again. You now have seen how to generate concurrent tests. Now these tests are executed. We will not look into the details of how this works. In practice, ConTeGe is just repeatedly executing the test on the standard Java virtual machine, but you could use more sophisticated techniques such as the one that we'll see in the fourth video of this lecture. 
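To make the shape of such an assembled test concrete, here is a minimal sketch, assuming a fixed prefix that creates a string buffer and appends "abc", and two suffixes passed in as lambdas. The class `ConcurrentTest` and its methods are hypothetical names chosen for illustration; the actual generated tests may look different.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch: a sequential prefix followed by two concurrent suffixes.
final class ConcurrentTest {

    /** Runs prefix, then both suffixes in parallel; returns exceptions seen in the suffixes. */
    static List<Throwable> run(Consumer<StringBuffer> suffix1,
                               Consumer<StringBuffer> suffix2) {
        // Sequential prefix: create and set up the shared instance.
        StringBuffer sb = new StringBuffer();
        sb.append("abc");

        List<Throwable> failures = new CopyOnWriteArrayList<>();
        Thread t1 = new Thread(() -> runSuffix(suffix1, sb, failures));
        Thread t2 = new Thread(() -> runSuffix(suffix2, sb, failures));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures;
    }

    private static void runSuffix(Consumer<StringBuffer> suffix, StringBuffer sb,
                                  List<Throwable> failures) {
        try {
            suffix.accept(sb);
        } catch (RuntimeException e) {
            failures.add(e); // recorded so that an oracle can inspect it later
        }
    }
}
```

The test from the example corresponds to `ConcurrentTest.run(sb -> sb.insert(1, sb), sb -> sb.deleteCharAt(1))`. Whether that call reports an exception depends on the interleaving and on the JDK version, so a single clean run says nothing about thread safety.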
So now instead of looking more into the execution, let's have a look into the thread safety oracle, which is trying to find out whether a given execution of this generated concurrent test is exposing a thread safety bug or not. Okay, so let's have a look at this thread safety oracle and let's see how it figures out whether a given test execution is actually exposing a thread safety violation or not. So there are two key ideas here. One is that the oracle is focusing on very clear signs of misbehavior, namely exceptions and deadlocks. When any of these two happens and the programmer does not expect it, it's obviously bad. And the second idea is related to the definition of thread safety itself, and this idea is to compare the concurrent execution of the given test case to linearizations of this test case to basically check if the misbehavior, the exception or the deadlock that we are seeing in a concurrent execution, could also happen in a linearization of this test. So what does linearization mean? Linearization essentially means we're putting all calls that happen in the concurrent test into just a single thread while preserving the order of the calls within each thread. So let's say we have a test case that looks like this, where we have some prefix up here followed by two concurrently executing suffixes. One executes some statements one and two and the other one executes a statement three. Then we would have three possible linearizations of this test case, namely the ones that you see here. So they always have the prefix at the beginning followed by all the calls in the suffixes, but preserving the order of calls within the individual threads, which basically means we're never swapping the order of one and two, but we may put three in between or before or after these two calls one and two. So now given this idea of linearizations, let's have a look at how the oracle figures out whether an execution of a concurrent test exposes a thread safety problem. 
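The set of linearizations of two suffixes can be computed by enumerating all interleavings that keep each thread's own order. The following is a small sketch of that enumeration; the class name `Linearizations` is made up for illustration, and the prefix is left out because it is identical in every linearization.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: enumerate all order-preserving interleavings of two suffixes.
final class Linearizations {

    /** All interleavings of a and b that preserve the order within a and within b. */
    static <T> List<List<T>> of(List<T> a, List<T> b) {
        List<List<T>> result = new ArrayList<>();
        interleave(a, 0, b, 0, new ArrayList<>(), result);
        return result;
    }

    private static <T> void interleave(List<T> a, int i, List<T> b, int j,
                                       List<T> current, List<List<T>> result) {
        if (i == a.size() && j == b.size()) {
            result.add(new ArrayList<>(current)); // one complete linearization
            return;
        }
        if (i < a.size()) { // next call comes from the first thread
            current.add(a.get(i));
            interleave(a, i + 1, b, j, current, result);
            current.remove(current.size() - 1);
        }
        if (j < b.size()) { // next call comes from the second thread
            current.add(b.get(j));
            interleave(a, i, b, j + 1, current, result);
            current.remove(current.size() - 1);
        }
    }
}
```

For suffixes `[1, 2]` and `[3]` this yields `[1, 2, 3]`, `[1, 3, 2]`, and `[3, 1, 2]`. These are the same three orders that make ABC, ACB, and CAB the only legal results in the earlier append example.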
So it starts by executing the test concurrently, and in this one concurrent execution of the test it checks whether there is an exception or a deadlock. If there's no such misbehavior, then we're basically done with this one execution. It has not exposed any thread safety problem and there will be no further analysis of this execution. In contrast, if the oracle sees an exception or deadlock, the question is whether this could also happen in one of the linearizations of the test. So in this case it's trying out a linearization of this test case and checks if the same failure also happens. If the same failure also happens, it basically means that, well, okay, it's an exception or deadlock, but it could also happen if you just call these two or more concurrent methods in a single thread, which means it's not a thread safety problem, it's not even a concurrency related problem, so there's no need to report anything to the user. But if the same failure does not happen in this linearization and if it also does not happen in any other linearization, so basically the algorithm has checked all possible linearizations of the concurrent test and hasn't seen the same exception or deadlock, then and only then a thread safety violation is reported, because then we know for sure that there actually is a thread safety bug in our class under test. So let's illustrate this idea of the oracle again with our running example. So here's the generated test that tests the string buffer class of the JDK, and as I've said earlier, if you execute this test concurrently and you're lucky enough to hit the right interleaving, you will actually get an exception. So now the oracle is checking whether this exception can also happen in one of the linearizations. 
In this case there are only two linearizations: one where we take the call from thread one first followed by the call from thread two, and if you do this you will not get an exception, and the other one where we just swap the order of the two calls from the two threads, so after the prefix we execute thread two's call and then thread one's call, and in this case we also do not get an exception. And this means that the exception is actually a thread safety violation, and this is going to be reported by the oracle. Good, so now that you've seen how this oracle works, let's take a step back and think about what properties this oracle is actually giving us. So it turns out this oracle is sound but incomplete, which here means that all reported thread safety violations are real. So there are no false positives: whenever the approach says that there is a thread safety problem, then indeed there is one. But on the downside, the oracle cannot guarantee that the class that is tested is indeed thread safe, because it may not see some more subtle misbehavior that, for example, does not result in an exception or a deadlock. What is nice about this oracle is that it's independent of the bug type. So in contrast to, for example, the Eraser approach that we've seen in the previous part of this lecture, it is not just looking at data races but it can also detect other kinds of concurrency bugs, including data races but also atomicity violations or deadlocks, for example. And as long as any of these bugs manifests through an exception or a deadlock, the oracle will be able to find it. 
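Here is a simplified sketch of the oracle's decision for the running example. It makes several assumptions that are not from the lecture: each suffix is a single call, so there are only two linearizations, failures are compared by exception class, and deadlock detection is omitted. The class `ThreadSafetyOracle` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified oracle: exceptions only, single-call suffixes only.
final class ThreadSafetyOracle {

    /** Runs prefix plus the given calls in one thread; returns the exception class or null. */
    static Class<?> failureOf(List<Consumer<StringBuffer>> calls) {
        StringBuffer sb = new StringBuffer();
        sb.append("abc"); // same prefix as in the concurrent test
        try {
            for (Consumer<StringBuffer> call : calls) {
                call.accept(sb);
            }
            return null;
        } catch (RuntimeException e) {
            return e.getClass();
        }
    }

    /** True only if the concurrent failure occurs in none of the linearizations. */
    static boolean isViolation(Class<?> concurrentFailure,
                               Consumer<StringBuffer> suffix1,
                               Consumer<StringBuffer> suffix2) {
        if (concurrentFailure == null) {
            return false; // no misbehavior observed, nothing to report
        }
        List<List<Consumer<StringBuffer>>> linearizations = Arrays.asList(
                Arrays.asList(suffix1, suffix2),
                Arrays.asList(suffix2, suffix1));
        for (List<Consumer<StringBuffer>> linearization : linearizations) {
            if (concurrentFailure.equals(failureOf(linearization))) {
                return false; // the same failure is possible sequentially
            }
        }
        return true;
    }
}
```

If a concurrent run of the insert and deleteCharAt suffixes reports a `StringIndexOutOfBoundsException`, this sketch classifies it as a violation, because neither sequential order throws. A suffix such as `sb -> sb.insert(-5, "x")` fails sequentially as well, so it is not reported.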
Finally, let me just quickly tell you about some results that this test generator has obtained. So it is implemented for Java classes and was applied to popular thread safe classes from the JDK itself but also from various Apache libraries, and in total it could find 15 concurrency bugs in these classes, including some previously unknown problems in the JDK itself, which is a nice finding because that's a piece of software that is used by many, many people. In the version of the tool that I've talked about here, it has taken between several seconds and several hours, in the worst case 19 hours, to find a bug, so it was actually pretty compute intensive, and one of the reasons is that this random generation of tests doesn't really look at what kind of concurrent behavior has already been seen. And in a follow-up piece of work that a master student in my group has actually worked on, we could reduce this worst case time to several minutes, so the overall approach has become much more efficient, and the key idea here was to look at the coverage that the test cases achieve and to try to cover new behavior more often so that we do not repeatedly test the same kind of behavior. All right, and this is the end of video number three in this lecture on analyzing concurrent programs. You hopefully now have a better idea of what thread safety means and also have seen how to automatically test whether a class is indeed thread safe by generating tests at random and then comparing their concurrent behavior to linearizations of the concurrent test. Thank you very much for listening and see you next time.