Hi and welcome back to the program analysis course. This is the second video of the lecture on random testing and fuzzing. In this second video we're going to look at an approach called Randoop, which is an implementation of feedback-directed random testing for object-oriented languages. If you are interested in more details on this work beyond what we're discussing here, then you should have a look at this paper, which appeared some years ago at ICSE and gives a lot more details and many more examples about this tool. The Randoop tool is also available for download, so you can even try it out if you want to. Before looking into the details of the approach, let me first show you a few examples of tests that Randoop can actually generate. Randoop is a tool for testing Java classes and the methods provided by these classes. In this example here, let's assume that we want to test the HashSet class provided by the Java standard library. What the test generator Randoop does is generate tests that somehow create an instance of the class under test, then call some methods on it, and then at some point have some assertions like this one, which checks that the set you have created is equal to itself, which by the contract of the equals method should always be true. Here you see another test, which again creates a HashSet, again calls add on it, calls another method, and then again checks the same assertion. Now, if you look closely at these two examples, you'll notice that the only difference is this call to isEmpty. And if you think a little more about what hash sets actually do and what this call means, you'll see that this is actually a redundant test, because the first and the second test are testing essentially the same behavior.
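The two HashSet tests described above might look roughly like this. This is a hypothetical reconstruction for illustration; the variable names and the concrete element added are assumptions, not taken from the paper:

```java
import java.util.HashSet;

public class HashSetTests {
    // First generated test: create a set, add an element, check reflexivity of equals.
    public static void test1() {
        HashSet<String> s = new HashSet<>();
        s.add("hi");
        assert s.equals(s); // contract of equals: every object is equal to itself
    }

    // Second generated test: identical except for the extra isEmpty() call,
    // which only observes state and does not change it -- so this test is redundant.
    public static void test2() {
        HashSet<String> s = new HashSet<>();
        s.add("hi");
        s.isEmpty(); // pure observer call; leaves the set unchanged
        assert s.equals(s);
    }

    public static void main(String[] args) {
        test1();
        test2();
    }
}
```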
So if you have already tested the behavior using the first test, there isn't really a need to also generate the second test. And this is one of the ideas that the Randoop approach implements. As another example, let's assume we are testing a class that implements dates, and we do this again by creating an instance of that class and then maybe calling some methods on it. At some point we will again have an assertion that, in this case, again checks whether the object is equal to itself. Now, if you look at the second and third tests we have here as examples, you might see that some of the inputs we are providing may actually violate the preconditions of these methods. For example, the setMonth method is likely to expect a value between 1 and 12, because this is the usual range for months. And if you pass -1, then what probably happens is that the class throws an exception. And once you've violated a precondition in a way that makes the class throw an exception, there's no need to generate more calls or assertions in the same test, simply because you already know that at this point the test will always throw an exception, so it doesn't make sense to extend it any further. In Randoop, this idea is handled via so-called illegal tests, which are tests that for some reason, for example because an exception is thrown, are not generated further, simply because it does not make sense to continue testing once an exception has been thrown. Now that you've seen some examples, let's talk about the more general idea behind the feedback-directed test generation that Randoop implements. The idea is that you create tests in a random manner while using feedback from the execution of already generated tests, and the feedback is used to guide the creation of new test inputs.
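An illegal test of the kind just described could look like this. The BoundedDate class and its precondition check are entirely hypothetical, invented purely to illustrate the idea of a precondition-violating call:

```java
// Hypothetical date class with an explicit precondition on setMonth.
class BoundedDate {
    private int month = 1;

    void setMonth(int m) {
        // Precondition: month must be in the usual range 1..12.
        if (m < 1 || m > 12) {
            throw new IllegalArgumentException("month must be in 1..12: " + m);
        }
        this.month = m;
    }

    int getMonth() { return month; }
}

public class IllegalTestDemo {
    public static void main(String[] args) {
        BoundedDate d = new BoundedDate();
        try {
            // Precondition violated: the call throws, so the generator would
            // stop extending this sequence here and not emit it as a test.
            d.setMonth(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("illegal input rejected: " + e.getMessage());
        }
    }
}
```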
In this case, a test input is a sequence of calls to a particular class under test, and the feedback is about the execution of previous inputs, so previous tests that you have already generated. While doing this, the test generation approach tries to avoid the two kinds of tests we have already seen: namely, tests that are redundant because they essentially do the same thing over and over again, and tests that are illegal because they, for example, violate some precondition of the code under test. As a heads-up, Randoop, the tool we're talking about here, is not perfect at achieving these goals. It may actually generate some inputs that are redundant, and it may also create illegal inputs, but it tries to avoid these cases, and it's definitely better than a purely random approach that does not use any feedback. As you've already figured out from the examples, a test input here means a sequence of method calls to a particular class under test. So the software or program we're testing here consists of classes, in this case classes in a Java-like language. The key idea of feedback-directed random test generation is that test inputs are created incrementally. That basically means that a new test input is not always created from scratch; instead, it extends a previous test input, so that we are building better and better test inputs, because we already know that the partial test inputs have some desirable property. Essentially, what happens in this approach is that as soon as a new test input is created, the program is executed with this test input in order to see if it has the properties we want. Specifically, this means that the execution result is used to guide test generation away from redundant or illegal method call sequences, like the examples we've seen before, and instead towards sequences that create new object states.
So, basically, bringing the class under test into states it hasn't been in before. Now, there are many different ways to implement this idea of feedback-directed random test generation, and the one we focus on here is the tool called Randoop. Randoop is one implementation of feedback-directed random test generation. Let's first have a look at what the input and output of Randoop are, and then we'll look into the algorithm that actually makes this whole tool work. The input consists of three things. One is a class under test, or also a set of classes under test. Randoop can test just one class or multiple classes that, for example, belong to the same library, because very often classes depend on each other, and if you have more than one class available, it's actually easier to test them all together. Then you can also set a time limit, which basically says how long Randoop should test the given classes under test. It will continuously generate more and more test cases within this time limit and then give you all the test cases it has generated. And then you can also provide a set of contracts, which can be two things. One, they can be so-called method contracts, which say that a particular method call should have a particular behavior. For example, you could say that calling the hashCode method on any object in Java should never throw an exception, which is essentially what the documentation of that method says. Or you can specify so-called object invariants, which are properties or conditions that should always hold on any object, or maybe on objects of a specific class. For example, you could say that for any object o, o.equals(o) should always be true, which again is something that is informally specified in the Javadoc of the equals method.
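The two kinds of contracts just mentioned can be sketched as small runtime checks. The checker methods below are only a sketch of the idea, not Randoop's actual API:

```java
public class Contracts {
    // Method contract: hashCode() must never throw (per the Object.hashCode Javadoc).
    static boolean hashCodeNeverThrows(Object o) {
        try {
            o.hashCode();
            return true;
        } catch (Throwable t) {
            return false; // contract violated
        }
    }

    // Object invariant: reflexivity of equals (per the Object.equals Javadoc).
    static boolean equalsIsReflexive(Object o) {
        return o.equals(o);
    }

    public static void main(String[] args) {
        Object o = new java.util.ArrayList<Integer>();
        assert hashCodeNeverThrows(o);
        assert equalsIsReflexive(o);
    }
}
```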
Now, given these three inputs, the classes under test, the time limit, and a set of contracts, what Randoop produces as output are test cases that also include assertions. So it's not just calling methods of the class under test to exercise that class; the test cases also have assertions, or oracles, to check that the behavior fulfills the set of contracts given as input to Randoop. As another, now very concrete example of a test case that Randoop is able to generate, let's have a look at this test case here. This is actually a test case that is mentioned in the Randoop paper and that exposed a bug in the java.util library some years ago. The classes under test are various classes in the java.util package, like HashMap and LinkedList and also TreeSet. What this test case does is create a collection in the form of a HashMap, then get just the values of this HashMap, which will be an empty collection, turn this into an array of objects, put this array into a newly created LinkedList, then transform that LinkedList into a TreeSet, then turn this TreeSet into a so-called unmodifiable set, and finally check the assertion that the resulting collection is actually equal to itself. While this may not look like the typical way of using these java.util classes to you, it is actually a legal way of using these APIs, and the bug exposed by doing this is a bug that could of course also be triggered in other scenarios. So just because this test looks like a generated test does not mean the bug cannot occur in more realistic scenarios. And now, why is this a bug? Well, simply because this assertion fails when the test is executed. And the reason is, as I said, that there was a bug in the underlying JDK, which implements these java.util classes.
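The call sequence described above could be written out roughly as follows. This is a reconstruction of the description, not the exact code from the paper; on current JDKs the final assertion holds, since the bug was fixed long ago:

```java
import java.util.*;

public class JdkBugTest {
    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();       // empty map
        Collection<Integer> values = m.values();        // empty value collection
        Object[] array = values.toArray();              // empty array
        LinkedList<Object> list = new LinkedList<>(Arrays.asList(array));
        TreeSet<Object> tree = new TreeSet<>(list);     // empty TreeSet
        Set<Object> s = Collections.unmodifiableSet(tree);
        // On the JDK version tested in the paper, this assertion failed and
        // exposed the bug; on current JDKs it passes.
        assert s.equals(s);
    }
}
```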
Before this assertion fails, it's worth noting that none of the contracts is violated up to this point, and this is actually one of the properties that Randoop checks while generating tests: it makes sure that everything looks fine according to the contracts it knows about. But then, if at some point there actually is a violation of a contract, this is given back as a failing test to the developer, who can then try to fix the bug. Now that you have an idea of the overall idea of feedback-directed random test generation and have seen a couple of examples, let's get into the algorithm that makes all of this possible. So here it is. There are basically two steps, but the main magic happens in the second step. What Randoop does is build a set of so-called sequences, which are essentially statements that are chained one after the other and that create some kinds of objects. And in order to initialize the creation of these sequences, it also has a couple of so-called seed components, which are simple statements like this one, where we create an integer, or that one, where we create a value of type boolean. Then, given these seed components, the main action happens here in step number two, which is a loop that runs until the given time limit expires. This loop does two things. First, it creates a new sequence, so a new piece of code that creates some objects and, by doing so, also tests the class under test. Then it classifies the created sequence in order to find out whether it should keep it and extend it, or maybe report it back to the user. We'll look into the second step more in a minute, so let's first go through the first step of creating a new sequence in some more detail.
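The two-step structure just described can be sketched in simplified Java. The class and method names here are invented for illustration and do not mirror Randoop's actual internals; sequences are just lists of statement strings, and the time limit is replaced by a fixed iteration count:

```java
import java.util.*;

public class GeneratorSketch {
    static final Random rnd = new Random(42);
    // Already-kept sequences; a sequence is a list of statement strings.
    static final List<List<String>> components = new ArrayList<>();
    // Stand-ins for "pick a random method of the classes under test".
    static final String[] calls = {"s.add(\"x\");", "s.isEmpty();", "s.size();"};

    static List<String> createNewSequence() {
        // Pick an existing sequence and extend it with one randomly chosen call.
        List<String> base = components.get(rnd.nextInt(components.size()));
        List<String> extended = new ArrayList<>(base);
        extended.add(calls[rnd.nextInt(calls.length)]);
        return extended;
    }

    static void classify(List<String> seq) {
        // Real Randoop would execute seq and check contracts; this sketch only
        // discards lexical duplicates and keeps everything else.
        if (!components.contains(seq)) {
            components.add(seq);
        }
    }

    public static void main(String[] args) {
        // Step 1: initialize with a seed component.
        components.add(new ArrayList<>(List.of("HashSet<String> s = new HashSet<>();")));
        // Step 2: main loop, normally bounded by the time limit.
        for (int i = 0; i < 10; i++) {
            classify(createNewSequence());
        }
        components.forEach(System.out::println);
    }
}
```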
So what is done here is that the test generator starts by randomly picking a method from the classes under test. I'm representing methods here using this format, which basically says that there's a type T0, the type of the class that implements the method we call m, then there are the types T1 to Tk of the arguments expected by this method, and then there's a type T_ret for the return value that this method is going to produce. Now, for each of these types, except for T_ret, we need some value that we can actually use in order to call this method. If we do not have a value of type T0, then we simply can't call the method, and the same goes for the arguments, because we need to pass arguments of the correct types into the method. So what the test generator does is look at all the components, or sequences, that it has already created, and then try to pick one that produces a value of the right type, and use this object of the right type to call the method m that we want to test. So basically, for every type Ti, it constructs a value vi of that type based on the components it has already created at some point. This then results in a new sequence, called S_new down here, which first contains all the previous sequences that are needed in order to get values of the right types, and then the new call, added at the end, where the method m is called with the values we've created as arguments. Now, it may be that this newly created sequence is lexically exactly the same as one of the previously created sequences, in which case there's no need to execute it again, and the whole algorithm just goes back to the first step, where it tries to create another sequence that is lexically different from anything we've seen before.
If this is not the case, then the sequence is actually classified, which means it will be executed, and then, based on what happens during the execution, it is either kept or discarded, as we'll see on the next slide. So here we see what happens while a sequence is classified. Randoop starts by executing the sequence, so basically executing the generated test, and checking all the contracts that the user has given to the tool. If any of these contracts is violated, it means we have already found a bug, and this is something we should report to the developer. In that case, what Randoop does is try to minimize the sequence by getting rid of calls that are not really needed to trigger the violation of the contract. After that, the result is a contract-violating test case, which is reported to the user. The other case, which is more common in practice, is that no contract is violated. In this case there's another check, namely whether the sequence is redundant. We'll see in a moment what redundant really means, but the idea is that a redundant sequence is equivalent to some other sequence, some other test, that we have already created in the past, and therefore there's no need to add this sequence to our components. In that case, the sequence is simply discarded. But if the sequence is not redundant, meaning that it, for example, creates an object that is different from the objects we've seen before, then the sequence is added to the set of components and will be used to create more tests in the next iteration of the loop. So what does it really mean that a sequence is redundant? Well, while generating tests, or sequences of method calls, Randoop maintains a set of all the objects it has already created. It then says that a sequence is redundant if all of the objects created during the execution of the sequence are already in this set of previously created objects.
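The classification step can be sketched like this. The helper below is hypothetical; contract checking and sequence minimization are reduced to placeholders, and only the keep/discard/report decision is modeled:

```java
import java.util.*;

public class ClassifySketch {
    // Objects created by previously kept sequences; redundancy is judged via equals().
    static final List<Object> seenObjects = new ArrayList<>();
    static final List<List<Object>> components = new ArrayList<>();

    /** Classify a sequence by the objects its execution produced. */
    static String classify(List<Object> createdObjects, boolean contractViolated) {
        if (contractViolated) {
            // Real Randoop would first minimize the sequence, then report it.
            return "report";
        }
        boolean allSeen = true;
        for (Object o : createdObjects) {
            boolean seen = false;
            for (Object old : seenObjects) {
                if (old.equals(o)) { seen = true; break; }
            }
            if (!seen) { allSeen = false; break; }
        }
        if (allSeen) {
            return "discard";               // redundant: nothing new was created
        }
        seenObjects.addAll(createdObjects);
        components.add(createdObjects);     // keep and reuse in later iterations
        return "keep";
    }

    public static void main(String[] args) {
        assert classify(List.of(new HashMap<String, Integer>()), false).equals("keep");
        // A second empty HashMap is equals() to the first one -> redundant.
        assert classify(List.of(new HashMap<String, Integer>()), false).equals("discard");
        assert classify(List.of(new HashMap<String, Integer>()), true).equals("report");
    }
}
```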
So basically, it checks whether there's a new object that it hasn't seen before, and only if there is such an object is the sequence kept, because it adds something new; otherwise, it's considered redundant and will be discarded. One question is how to actually compare objects. One thing you could do is compare the objects in a precise way by looking at the state, including the transitive state, that they contain, which could be done through some kind of heap canonicalization, because you somehow need to compare different objects and maybe see that they are equal even though they are arranged slightly differently in memory. But this is actually not what Randoop does, because doing that would be pretty expensive. Instead, Randoop uses the equals method to compare objects, which, depending on how the class is implemented, may perform a very shallow comparison or a deeper comparison, depending on what the implementers of the class believe to be a good notion of equality for that class. Then, if there is an object that is not equals to any of the previously seen objects, the sequence is new and not redundant; otherwise, it's discarded because it is redundant. So let me illustrate these ideas and this algorithm using a concrete example, and for this concrete example, let's assume that the classes under test given to Randoop are all the classes in the java.util package, so everything in this package. The very first step, after initializing the components, will be that Randoop picks one of the methods from the given classes under test. For this example, let's assume that it picks the new HashMap method, so the constructor of HashMap. The next thing it does is check whether any other objects are needed in order to call this method, and in this case the answer is no.
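Here is a small illustration of why the equals-based comparison is both cheap and class-dependent; the classes used are just examples:

```java
import java.util.*;

public class EqualsComparison {
    public static void main(String[] args) {
        // Collections implement deep, content-based equality: two freshly
        // created empty HashMaps are equals(), so a second sequence creating
        // one would be considered redundant.
        assert new HashMap<String, Integer>().equals(new HashMap<String, Integer>());

        // Plain Object uses reference equality: two fresh Objects are never
        // equals(), so every "new Object()" sequence would look non-redundant.
        assert !new Object().equals(new Object());
    }
}
```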
So there are no values needed, simply because there is a default constructor that does not take any arguments. What it will do is create a new sequence that simply consists of a call to new HashMap, and the result will be stored in some variable, let's call it m here, in order to use it for the remainder of the test. Now, before deciding that this is a good test to keep, what Randoop does is go into the second step of the main algorithm, where it classifies this newly created sequence in order to check two things. One is whether this is actually a sequence that exposes a violation of any of the given contracts, but in this case it does not, because no exception is thrown and none of the other contracts is violated here. The second property checked here is that the sequence is not redundant, and because this is the very first sequence we are creating, there is no other HashMap that looks like the object m we have created here. So this is a good sequence, so to say, and as a result, Randoop will add this sequence to our set of components. Then the algorithm moves on in the main loop, and Randoop will again pick a method. For the sake of the example, let's just assume that Randoop again decides to use the new HashMap method, so again the default constructor of HashMap. It will again create a sequence; again, no arguments are needed, but I'm not writing this down again. What will happen is that it creates a new sequence that stores the return value in some variable, let's call it m2, and calls the default constructor of HashMap. And now, before keeping or maybe discarding this sequence, the algorithm will classify the sequence, which means it first executes it and, after every call, checks that there's no violation of a contract. In this case, there is no such violation, so this is good.
But it turns out that the object created by this sequence, the object stored in m2, is actually equivalent to one of the objects we created earlier, namely the one we called m, because they are basically the same kind of object: a new HashMap without any elements in it. Therefore, at this point, this sequence is discarded and not added to our set of components. Moving on with the example, let's continue with the main loop, where Randoop again picks a method. To make things a bit more interesting now, let's assume that this time it's not yet another constructor call of HashMap; this time, the randomly chosen method is HashMap.values, which is a method that needs some HashMap to be called on, which means that Randoop needs a sequence that constructs a value of type HashMap. What the algorithm does at this point is look at all the components it has already created, and luckily it finds the sequence we decided to add to our components in step two, which was the call to the constructor of HashMap. So it reuses this sequence from step two and creates the following sequence, where we use the call to new HashMap, which stores the newly created HashMap in m, and then we can call the values method on m. And because this also creates another object, which may be useful later, the result is stored in a variable we call c here, which will have type Collection.
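Written out as code, the three sequences from this worked example look roughly as follows, using the variable names from above:

```java
import java.util.*;

public class WorkedExample {
    public static void main(String[] args) {
        // Sequence 1: kept -- the first HashMap is an object not seen before.
        HashMap<String, Integer> m = new HashMap<>();

        // Sequence 2: discarded as redundant -- m2.equals(m) holds, so it
        // creates nothing that hasn't been seen before.
        HashMap<String, Integer> m2 = new HashMap<>();
        assert m2.equals(m);

        // Sequence 3: kept -- it extends sequence 1 and creates a new object,
        // the (empty) value collection c.
        Collection<Integer> c = m.values();
        assert c.isEmpty();
    }
}
```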
Now, of course, before deciding what to do with this sequence, Randoop will classify it, which again means that it checks whether calling this method violates any contract. In this case, no contract is violated, and also, because this sequence creates an object, namely that collection c, which we haven't seen before, the sequence is not redundant. Therefore, the algorithm decides to keep the sequence and add it to our set of components, which will then be used to create more inputs, more sequences of calls. Let's finish the example here, but essentially this would now go on and on until Randoop runs out of time, and by then it has hopefully created many tests, including, hopefully, also some that trigger interesting behavior, for example violations of the given contracts. Of course, testing is only useful if you have some test oracle, so some kind of assertion or check that tests whether the behavior you see during the execution, and the state you get at the end, is what you expect or maybe something else. Randoop uses two kinds of oracles in the tests it generates. One of them is what you have already seen: the given contracts, where it, for example, inserts these assertions that an object should be equal to itself. This is useful to find test cases that violate some assertion like this one, which you definitely want to be true, and if you find any such test case, it means you have found a bug. The other option is that Randoop also outputs oracles that simply test the behavior that the class under test currently has, so an oracle for the normal behavior of the class.
For example, if it has created this list called l here, and if it knows that calling size on it returns two, because it has called size on it and seen that it returned two, then it will insert an assertion that just makes sure this is what really happens. Or, if it has seen that the isEmpty method on this list returns false, then this assertion will be added here. So why are these kinds of assertions useful? After all, they are only checking what we know anyway. Well, the reason is that you can use these oracles, these assertions, as regression tests. You can generate these tests for one version of the class under test and then check a future version of this class, to make sure that it actually still behaves the way it behaved before, because otherwise clients of this class will be surprised when it turns out not to be backward compatible. As a little quiz to check whether you have understood these ideas of feedback-directed random test generation, and specifically the Randoop tool I've been talking about here, I have three tests here, and my question for you is which of these tests may actually be created by Randoop. I invite you to stop the video at this point and think a little about this before you listen to what I'm going to say next. So, now that you've hopefully thought about it yourself, let me give you the solution. The first test is actually one that could not be generated by Randoop in this form, and the reason is simply that there is no oracle in this test. Randoop always outputs either an oracle that shows a violation of one of the given contracts, or an oracle that tests that the behavior is the same as the behavior observed while generating the test.
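A regression test with such observed-behavior oracles could look like this; the concrete list contents are an assumption for illustration:

```java
import java.util.LinkedList;

public class RegressionOracles {
    public static void main(String[] args) {
        LinkedList<Integer> l = new LinkedList<>();
        l.add(1);
        l.add(2);
        // Oracles capturing the behavior observed at generation time; they
        // guard against regressions in future versions of the class.
        assert l.size() == 2;
        assert !l.isEmpty();
    }
}
```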
The second test actually crashes and will therefore also be discarded by Randoop, because if you try to get the value at index -5 out of a linked list, the linked list will throw an exception, which means that one of the preconditions of this method has not been fulfilled. In this case, Randoop does not continue to extend the test case and also does not give it to the user, because apparently it has created an invalid test. The third example is actually one that could be generated by Randoop, because here nothing bad, like an exception or any other kind of assertion violation, happens, but then at the end we do have this oracle that checks that the first element of this linked list is actually seven. So this is an assertion that, in this case, is probably not violated, but Randoop will still emit this as a test that you can run in the future in order to check that the implementation of the LinkedList class still has this desired behavior. Finally, let me quickly mention some of the results obtained using this implementation of feedback-directed random test generation. Randoop has been applied to various data structure implementations and popular library classes. Depending on what kind of classes it is testing, it achieves between 80 and 100 percent basic block coverage, which is pretty high, so it tests most of the code it is given. Of course, this all depends a lot on what kind of class it is applied to, but for the classes it has been applied to, it was pretty successful. While doing this, it has also discovered various bugs, which is more important than coverage, of course, because the purpose of testing, after all, is to find bugs. For example, it has revealed bugs in the JDK collections, in some classes from the .NET framework, and in various Apache libraries.
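The second and third quiz tests could be reconstructed roughly like this. The concrete values are assumptions; the discussion above only mentions index -5 and the element 7:

```java
import java.util.LinkedList;

public class QuizTests {
    // Quiz test 2: illegal -- the call violates a precondition and throws,
    // so Randoop discards the sequence instead of emitting it.
    public static boolean test2Throws() {
        LinkedList<Integer> l = new LinkedList<>();
        try {
            l.get(-5);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    // Quiz test 3: legal -- no exception occurs, and the test ends with a
    // regression oracle on the observed behavior.
    public static void test3() {
        LinkedList<Integer> l = new LinkedList<>();
        l.add(7);
        assert l.getFirst() == 7;
    }

    public static void main(String[] args) {
        assert test2Throws();
        test3();
    }
}
```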
If you want to know more details, you're invited to read the paper that is also mentioned on the lecture slides, where you can learn more about these results and, of course, also about how the tool works in general. All right, and this is already the end of video number two in this lecture on random testing and fuzzing, where we've looked into one form of black-box random testing that uses feedback from the execution of the code under test, in this case the classes, in order to guide the test generation towards creating better tests in the future. Thank you very much for listening, and see you next time.