Hi, hello, and welcome back to program analysis. This is the lecture on symbolic and concolic execution. We are now in part three of this lecture, where we will look at a technique called concolic testing, which is essentially a combination of symbolic execution, as we've seen it in the previous two parts of this lecture, and concrete execution, in order to address some of the challenges that we've seen in the previous video. Most of what I'm talking about here is also explained in this paper on DART, which was the first paper to introduce this idea of concolic testing.
So what is this idea of concolic testing? As the name suggests, it's basically a mix of concrete and symbolic execution: "concolic" just puts together "concrete" and "symbolic". The idea is to perform these two kinds of executions side by side. So instead of just concretely executing one path without reasoning about why we are taking that path, or trying to reason about all paths concurrently, we are now doing something in the middle, which is to execute the program with one input but, at the same time, gather path constraints that tell us why we are taking a particular path in the program. Then, once this execution is done, we can negate one of the decisions that were taken in the previous execution and re-execute the program with a new input that will then trigger another path.
To illustrate this idea of concolic execution, let's look at this example here, where we have this function called test me. That's the function we are going to test. It has two inputs, x and y, and it relies on a helper function called double, which is called down here and which essentially just takes its argument, doubles it, and returns the result. Then, depending on the return value of this helper function, we have this check here, followed by another check that only depends on the inputs x and y.
And if both of these conditions are true, then we reach this point where we throw an exception. This is actually the interesting part of this program. So the question is: is there an input with which we can reach this throw statement?
Before looking into what concolic execution would do with that example, let's first look at how symbolic execution would work here. Symbolic execution, and again let me just draw the execution tree, would start by doing all the initial assignments. So x would have the symbolic value x0, and y would have the symbolic value y0. Then, before reaching the first branching point, we have this call to double, where we assign to z the result of two times whatever is given to double. So z will be 2 * y0. Now we reach the first conditional, where we check whether z, which in this case is 2 * y0, is equal to the value of x, so x0. As with every conditional, this can be true or false. If it is false, we don't do anything, because we just don't take the outer if and the program is done. If it is true, we reach another conditional, where we check whether the current value of x, so x0, is greater than the current value of y plus 10, so y0 + 10. Again, this can be true or false. In the true case, we would actually reach the throw statement. So basically, if we take this path down here, we reach the case where the error is thrown.
As you can see, symbolic execution works well on this simple example, but as we have also seen in the previous video, there are lots of cases where pure symbolic execution just does not work. Therefore, we will now look into this alternative idea of concolic execution, which combines concrete execution and symbolic execution.
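For readers following along without the slide, here is a minimal Python sketch of the example program as it is described above, together with the three leaves of its symbolic execution tree. The names `program_under_test`, `ReachedError`, `LEAVES`, and `which_leaf` are assumptions made for this sketch; the lecture only names the functions "test me" and "double", and the original slide may use a different language.

```python
class ReachedError(Exception):
    """Hypothetical stand-in for the exception thrown in the example."""


def double(v):
    # Helper from the example: returns twice its argument.
    return 2 * v


def program_under_test(x, y):
    # The lecture's "test me" function, reconstructed from the spoken description.
    z = double(y)
    if z == x:                 # outer if
        if x > y + 10:         # inner if
            raise ReachedError("error reached")


# The three leaves of the symbolic execution tree, each written as the
# path condition over the symbolic inputs x0 and y0 that leads to it.
LEAVES = {
    "outer if not taken": lambda x0, y0: 2 * y0 != x0,
    "outer if taken, inner if not taken": lambda x0, y0: 2 * y0 == x0 and x0 <= y0 + 10,
    "outer if taken, inner if taken (throw)": lambda x0, y0: 2 * y0 == x0 and x0 > y0 + 10,
}


def which_leaf(x0, y0):
    # Every concrete input satisfies exactly one of the three path conditions.
    matches = [name for name, condition in LEAVES.items() if condition(x0, y0)]
    assert len(matches) == 1
    return matches[0]
```

Calling `which_leaf` with any pair of integers reports which path that input would follow, which is the information a symbolic executor derives for all inputs at once.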
To do this, we will execute the program multiple times, and we'll start with the first of these executions, called execution one. For each of these executions, I will draw a little table where we look into what happens during the concrete execution. At the same time, we look at what happens in the symbolic execution, which we do side by side with the concrete execution. And while the program is executing, we also gather the path conditions that tell us why we are at a particular point in the program.
Because this is a concrete execution, we need to start with some concrete input values, and let's just assume that they are selected randomly for the first execution. Let's assume that we start with these values: x is 22 and y is 7. At the same time, we do what a symbolic execution would do. So we assign the initial symbolic input x0 to x, and the same for y. Then we call this double function, where we update some of the concrete and some of the symbolic state. After this first step here, we call double and assign the return value to z, which means that x and y do not change, but z now gets a value, and this is 14, two times y. On the symbolic execution side, we also update one value, namely that of z. So x and y stay the same here as well, and z is updated, but not with the concrete value 14. Instead, it gets the value 2 * y0, because during the symbolic execution we only reason about the symbolic state of the program.
After executing that statement, the next thing that happens in the concrete execution is that we reach this outer if, where we check whether z is equal to x. But 14 is not equal to 22, which means that we skip the outer if block and reach the end of this execution.
At this point, we still have some concrete and some symbolic state, which hasn't changed, but just for completeness, let me still copy it. So at the end of this outer if block, we have this concrete state and that symbolic state. What is different now is that we also have a path condition, because we only reached the end of this outer if block after checking this condition, which was that 2 * y0 is not equal to x0. This condition has been true, and this is why we have actually taken this step. Just to clarify where we are now: this is the state after the outer if.
Now that the concolic execution engine has executed the program once concretely and gathered the symbolic state and the path conditions on the side, what it will do next is try to trigger a different execution path. It does this by negating one of the conditions that we have in this conjunction of path conditions. In this case here, we have just one path condition, so it will negate this one condition. That means it gives the following to the solver: it asks for some input where 2 * y0 is equal to x0. And the solver will say, well, okay, that's easy to solve, I'll give you a solution. For example, the solution could be that x0 is 2 and y0 is 1.
Given this solution, what the concolic execution will do is execute the program again. So we will now look at execution number two. Just keep in mind that these are real executions of the program. So if there were a call to, say, a file system API, then this call would actually be executed, and we wouldn't have to stop there because the symbolic execution can't handle it. In this second execution, we'll again look at the concrete execution and the symbolic execution side by side. And while doing this, we will again gather the path conditions that explain why we are at the program location where we are.
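The table for execution one can be reproduced with a small hand-instrumented version of the example. This is only a sketch under stated assumptions: `run_instrumented` is a hypothetical name, symbolic values are kept as plain strings, and the instrumentation is written by hand, whereas a real concolic engine would derive it automatically from the program.

```python
def run_instrumented(x, y):
    """Run the example once, tracking concrete state, symbolic state,
    and the path condition side by side (hand-written instrumentation)."""
    concrete = {"x": x, "y": y}
    symbolic = {"x": "x0", "y": "y0"}
    path_condition = []

    # z = double(y): update both states.
    concrete["z"] = 2 * concrete["y"]
    symbolic["z"] = f"2*{symbolic['y']}"

    # Outer if: the concrete values decide the branch, the symbolic
    # values record why that branch was taken.
    if concrete["z"] == concrete["x"]:
        path_condition.append(f"{symbolic['z']} == {symbolic['x']}")
        # Inner if (the throw itself is omitted in this sketch).
        if concrete["x"] > concrete["y"] + 10:
            path_condition.append(f"{symbolic['x']} > {symbolic['y']} + 10")
        else:
            path_condition.append(f"{symbolic['x']} <= {symbolic['y']} + 10")
    else:
        path_condition.append(f"{symbolic['z']} != {symbolic['x']}")

    return concrete, symbolic, path_condition


concrete, symbolic, path_condition = run_instrumented(22, 7)
print(concrete)        # concrete state at the end of execution one
print(symbolic)        # symbolic state at the end of execution one
print(path_condition)  # why this path was taken
```

Negating the single recorded condition gives the formula that is handed to the solver, and the solution mentioned above, x0 = 2 and y0 = 1, satisfies it.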
So we'll start with the inputs that have been suggested by the solver based on the first execution. We now start with x being 2 and y being 1, and we use those as concrete inputs to re-execute the program. On the symbolic side, we do what we always do at the beginning: we assign the initial values x0 and y0 to x and y. So again, this is the state that we have after entering the function. Then we reach this call to double again, where we update our concrete state by leaving x and y the way they are, but now we say that z has the value 2, because this is two times the value of y. On the symbolic side, we do the same as before: we update z to have the value 2 * y0. We do not yet have any path conditions, simply because we have not yet reached any conditional during this execution.
Now, after this call to double, we reach the outer if and again check whether z is equal to x. If you look at the concrete values, you see that yes, they are equal, because both of them are 2, which means that we now take this outer if. This leads us to a new state in our program where the concrete state is exactly the same as before, and the symbolic state is also the same as before. So I'm not copying it; it's just the same as above. But we now have a path condition which tells us that 2 * y0 has been equal to x0, and this is why we are now inside the outer if branch. So this down here is the state after the outer if condition, which we now have taken, in contrast to execution number one.
Because we have taken the outer if, we now reach the inner if, where we check whether the condition x greater than y plus 10 is true. This check doesn't really change our concrete or symbolic state, but it tells us that, well, this condition is actually not true.
And this is something we will add to our path condition, because it essentially tells us why we are not taking this branch. So the path condition now gets a second conjunct. The first one was this one from up here. And then we say "and", and now we have the negation of what is checked in this inner if, where we have checked whether x, in this case x0, is larger than y plus 10, but it wasn't. So the condition that we now have is that x0 is smaller than or equal to y0 + 10. That's the state we have after the inner if, and because we do not take the inner if branch, we are done with this program, because we have then also left the outer if branch.
When concolic execution reaches the end of the execution, it checks whether there is some branch that we have not yet executed. And indeed there is, because we still haven't taken this branch that leads us to throwing the error. So what happens now is that the concolic execution engine takes the path condition and negates one of the conditions that we have not yet negated, which in this case means it looks at this second part of our conjunction, negates it, and then asks the solver whether there is a solution for the resulting path condition. So essentially what it does here is solve the following formula, where we say 2 * y0 is equal to x0, and now we negate what we see in the path condition in the table, so we say x0 is greater than y0 + 10. The solver will think a bit and then come back with an answer which says: hey, yes, this is actually satisfiable. There is a way to assign concrete values to x0 and y0 so that this entire formula is true. For example, this is the case if x0 is 30 and y0 is 15.
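To make the solver step concrete, here is a deliberately naive sketch that searches small integers for values satisfying the formula built from execution two. The brute-force `solve` function is an assumption of this sketch and only a stand-in: a real concolic engine would hand the formula to an SMT solver, and failing to find a solution within the bound does not prove that none exists.

```python
from itertools import product


def solve(constraints, bound=50):
    """Brute-force stand-in for a constraint solver: try all integer pairs
    (x0, y0) in [-bound, bound] and return the first one that satisfies
    every constraint, or None if there is none within the bound."""
    values = range(-bound, bound + 1)
    for x0, y0 in product(values, values):
        if all(constraint(x0, y0) for constraint in constraints):
            return {"x0": x0, "y0": y0}
    return None


# Path condition of execution two with its last conjunct negated:
# keep "2*y0 == x0", and turn "x0 <= y0 + 10" into "x0 > y0 + 10".
execution_2_negated = [
    lambda x0, y0: 2 * y0 == x0,
    lambda x0, y0: x0 > y0 + 10,
]

solution = solve(execution_2_negated)
print(solution)  # some satisfying assignment; it need not be the lecture's 30 and 15
```

Any satisfying assignment works equally well as the next input; the values 30 and 15 from the lecture are just one of many solutions.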
What happens next is that we enter a third execution, which, just to save some time, I will not draw. But essentially, it will enter the first if, it will also enter the second if, and then it will actually throw the error. So the solution that we've gotten out of execution two leads to an execution that takes us to this not yet covered branch, which then also happens to hit the error.
Now that you've seen this concrete example, let's think a little bit more abstractly about what happens during concolic execution. Again, there's this notion of an execution tree, so let me just draw some tree for you. What symbolic execution would do with this tree is basically explore all of its paths simultaneously, by reasoning about what would happen if you take this path or what would happen if you take that path. As we've seen, concolic execution is a little different, because here we are not reasoning about all the different paths concurrently but actually execute them one after the other. And we'll see, once I'm done with drawing this tree, what the underlying principle of this exploration of the possible paths is. So let me just finish this tree. Let's say it looks like this.
As we've seen, there are multiple executions of the program, each of which tries to take a different path through the tree. Let's say execution number one goes like this: it starts here, it takes this branch, then happens to take that branch, and then, let's say, happens to take that branch. What concolic execution tries to do next is negate one of the decisions. In principle, it could negate this one, or that one, or that one. For the algorithm that we look at here, the idea is to always negate the last decision that was taken, assuming that this leads us to a path that we have not yet taken.
So in this case, this would mean that it negates the decision that we have taken here, that is, it negates this one part of the path condition. Then the resulting logical formula is given to the solver, which will give us inputs that lead to another execution where we take the same path up to this point. But because we have on purpose negated the outcome of this condition, we will now take that branch here. As you can see, this is a different path, so the concolic execution engine has discovered some new behavior.
Now, of course, it wants to do this again. So it again looks for some condition that it can negate in order to get a different path, and it takes the last conjunct in the path condition whose negation leads to a not yet covered path, which in this example means it negates this one. So green is execution number two. Negating this condition here gives us some input that then leads to a third execution, which again leads us to this point, but now, because we've negated it, we go here. And now we don't really know what will happen. This really depends on the program, and the solver doesn't know anything about it yet, but let's assume that in the third execution we take this path.
Then this same idea goes on and on. It continues to pick conditions to negate, which will again lead to a different path. And this goes on until someone stops the tool because it has taken enough time, or maybe until it has actually covered all paths.
So now that we've seen the idea that concolic execution implements on a concrete example, and have also looked at it abstractly on the level of this execution tree, let's have a look at the underlying algorithm.
Essentially, there's one big loop that is repeated until either all paths are covered, or someone stops the tool, or maybe until some interesting behavior has been discovered, for example, the violation of some assertion, or that some exception is thrown. What happens in this loop is that the program is executed with some concrete input I. While the program is executing, the symbolic constraints are gathered at all the branching points that are taken, and these constraints, or path conditions, are stored in this capital C. Then one of the constraints is negated in order to force the program to take an alternative branch B prime, which leads to a set of constraints that we here call C prime. Then the algorithm calls a constraint solver to find a solution for this set of constraints C prime, which, if there is such a solution, means that we get some new concrete input I prime. It could also be that there is no such solution, which basically means that the path we are trying to cover is infeasible. In this case, the algorithm moves on and tries to negate some other branch.
But let's assume that there is a concrete input I prime that fulfills the constraints C prime. Then we take this new input I prime and execute the program again, which hopefully now covers the branch B prime that we intended to take. It could also happen, and we look at this on the next slide, that at runtime B prime is actually not taken. We'll see an example for why this could happen, and this is called a divergent execution. But in most cases, B prime will of course be taken, because we've chosen the constraints in order to get an input that leads us along this alternative branch, and then a new path through the program will have been executed.
To understand this potential problem of a so-called divergent execution, let's have a look at a concrete example.
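The loop described above can be sketched in a few lines of Python for the running example with inputs x and y. Everything here is an assumption made for illustration: the names `run`, `solve`, `concolic`, and `BRANCH_CONDITIONS` are hypothetical, the symbolic branch conditions are written out by hand instead of being collected by instrumentation, the solver is a brute-force search over small integers, and divergent executions are not handled.

```python
from itertools import product

# Hand-written symbolic condition of each branching point over (x0, y0).
BRANCH_CONDITIONS = {
    "outer": lambda x0, y0: 2 * y0 == x0,
    "inner": lambda x0, y0: x0 > y0 + 10,
}


def run(x, y):
    """Execute the example concretely with input I = (x, y).
    Returns the decisions taken (the constraints C) and whether the error was hit."""
    path, error = [], False
    z = 2 * y
    path.append(("outer", z == x))
    if z == x:
        path.append(("inner", x > y + 10))
        error = x > y + 10
    return tuple(path), error


def solve(decisions, bound=50):
    """Brute-force stand-in for the constraint solver: find I' satisfying C'."""
    values = range(-bound, bound + 1)
    for x0, y0 in product(values, values):
        if all(BRANCH_CONDITIONS[b](x0, y0) == taken for b, taken in decisions):
            return (x0, y0)
    return None  # treated as infeasible within the search bound


def concolic(first_input, max_runs=10):
    explored = set()          # path prefixes already executed or already requested
    log, current = [], first_input
    for _ in range(max_runs):
        path, error = run(*current)
        log.append((current, path, error))
        explored.update(path[:i] for i in range(1, len(path) + 1))
        current = None
        # Negate the last decision whose negation leads to a not yet covered path.
        for i in reversed(range(len(path))):
            branch, taken = path[i]
            target = path[:i] + ((branch, not taken),)
            if target in explored:
                continue
            explored.add(target)
            current = solve(target)
            if current is not None:
                break         # feasible: execute again with I'
        if current is None:
            break             # nothing left to negate: all feasible paths covered
    return log


for inputs, path, error in concolic((22, 7)):
    print(inputs, path, "ERROR" if error else "")
```

Starting from the lecture's first input, this sketch needs three executions to cover all three paths, the last of which reaches the error. The concrete inputs it picks for executions two and three differ from the lecture's, because any solution of the negated path condition is acceptable.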
This concrete example is a simple function that takes one argument, a. What this function does is to have an if here, and that if depends on a call of this random API, which just returns a random value between zero and one. What we're doing here is checking whether this random value happens to be smaller than 0.5. If it is, we then have another check where we check whether a is larger than one. And if it is, then we, for example, write to the console that we are here. There are no else branches, so if any of these conditions is false, we don't do anything.
Now let's have a look at what concolic execution could do in this case. It will start with one concrete execution, and let's say that in this one concrete execution, we start with a being zero, which is some random value that we just take as a starting point. Let's say the first branch is taken, so the first check that we have here returns true. So we take this first if, but then, if we check whether a is larger than one, we see that this is not the case. So this second check returns false. If we now look at this entire execution, we get a path constraint that looks like this. It essentially tells us that a0, the initial value of a, has been smaller than or equal to one, because that second if returned false.
Then what the engine will do is negate this one path constraint and give it to the solver, which means that we get some solution that fulfills the negation of the above path constraint. This solution could, for example, be that a0 is equal to 2. Given this solution, what we would naively expect is that we will now take the first if again and also go into the second if branch. So let's see if this actually happens. In execution two, the program will now be started with a being 2.
And the interesting thing is that now the first condition could evaluate to false, simply because it depends on the random value. So it's something that is out of the control of this whole symbolic or concolic reasoning about the behavior of the program. This means that we have actually not taken the path that we expected to take. And this is what is called a divergent execution.
Now that you've seen how concolic execution works and have also seen this corner case of a divergent execution, let's take a step back and reason about what the benefits of doing it this way actually are. The main benefit is that this concolic approach still works when purely symbolic reasoning is impossible or impractical, because in these cases, the execution engine can always fall back to concrete values and not do any symbolic execution for some part of the program, but still continue to reason about the program as a whole and try to generate new inputs for it. For example, if the program reaches a call of one of these native APIs, or a system call, or some API that is not analyzed by the symbolic execution engine, then it can simply execute this call concretely. It will not gather any path constraints while doing this, but it will still continue the execution and move on beyond the call of this function that cannot be analyzed.
The same is true when there are operations that cannot be handled by the solver. For example, I mentioned that floating point operations are pretty difficult for many solvers, and if any such floating point computation is triggered, a naive implementation of symbolic execution will basically stop and say, oh, sorry, I can't handle this. With concolic execution, you can instead continue the execution by just concretely executing the program, because you're executing with concrete values anyway, and then continue with the symbolic execution right after that.
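The divergent execution from the random-value example can be reproduced with a small sketch. The names `run`, `diverged`, and the `draw` parameter are assumptions of this sketch: `draw` stands in for the random API so that the random value can be scripted and the effect shown deterministically, whereas the program in the lecture calls the random API directly.

```python
import random


def run(a, draw=random.random):
    """The example function, instrumented to record which branches were taken.
    `draw` is a hypothetical injection point for the random value."""
    path = []
    first = draw() < 0.5
    path.append(("random < 0.5", first))
    if first:
        second = a > 1
        path.append(("a > 1", second))
        if second:
            pass  # here the original program writes to the console
    return path


def diverged(expected_path, actual_path):
    # A divergence means the run did not follow the path the solver's input was meant to trigger.
    return actual_path[:len(expected_path)] != expected_path


# Execution one: a = 0, and the random value happens to be below 0.5.
path_1 = run(0, draw=lambda: 0.3)

# The engine negates "a0 <= 1" and expects the same prefix with the last decision flipped.
expected_2 = path_1[:-1] + [("a > 1", True)]

# Execution two with the solver's input a = 2, but this time the random value is 0.8.
path_2 = run(2, draw=lambda: 0.8)
print(diverged(expected_2, path_2))
```

The engine has no constraint that could steer the random value, so the only thing it can do is notice after the fact that the recorded path does not match the expected one.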
And this brings us to the end of this third video in this lecture on concolic and symbolic execution. You've now seen this variant of symbolic execution called concolic execution, which combines a concrete execution and a symbolic execution and, by doing this, helps to address some of the challenges that a purely symbolic execution would face. Thank you very much for listening and see you next time.