Hi, hello, and welcome back to program analysis. This is the second video in the lecture on symbolic and concolic execution, and what we'll do in this second part is look into some of the challenges of symbolic execution. In the previous video, I introduced the idea of symbolic execution, and we've seen that it can work pretty well on simple programs such as the one I used in that video. We will now look into some challenges that occur when we actually try to apply this idea to larger programs, and we'll also see how these challenges can be addressed.

Here is a list of five problems with classical symbolic execution, which I will briefly explain now, and then we will go through them one by one.

Problem number one is about loops and recursion. If you think about the execution tree that we've seen in the previous video, and you try to draw this tree for, say, a program that has a loop, you'll quickly find out that this is actually an infinite execution tree. There is no way to really draw it, and that also means there is no way to really reason about all the paths in this tree.

The second problem, even if you do not have this infinity because of a loop, is that you will still have path explosion, which means that the number of paths in this tree is exponential in the number of conditionals. And in a large program, you have a lot of conditionals.
So this number of paths quickly reaches an order of magnitude that is just too large to reason about completely.

Yet another problem is about the environment in which a program is executing. No practically useful program executes in isolation like the simple examples that I used in the first video of this lecture. Instead, real programs typically deal with a lot of native calls, system calls, and library calls: basically an environment of other parts of a larger software system that also needs to be taken into consideration if you really want to reason about the behavior of a program.

Then there are some limitations of the solvers. We've seen that SMT solvers can solve some complex equations, but we will see that there are also limitations, for example when it comes to floating-point values, because most solvers cannot really handle floating-point values that well.

And finally, there is a problem with programs that have complex structures on the heap, typically data structures that consist of objects and pointers between these objects. If you want to reason about those, you somehow need a way to map the properties of these heap data structures into the constraints and logical formulas that are understood by the SMT solvers, and how to do this in a scalable way is actually not so easy.
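To get a feeling for the path explosion in problem two, here is a minimal Python sketch. It assumes the simplest possible setting: a program with n conditionals that run one after the other, where both outcomes of each conditional are feasible. The function names enumerate_paths and count_paths are made up for this illustration.

```python
from itertools import product


def enumerate_paths(num_conditionals):
    """List every path as a tuple of branch outcomes, one per conditional.

    Assumption for this sketch: the conditionals run one after the other
    and both outcomes of each one are feasible.
    """
    return list(product((True, False), repeat=num_conditionals))


def count_paths(num_conditionals):
    """Number of root-to-leaf paths under the same assumption: 2^n."""
    return 2 ** num_conditionals


# Three conditionals already give eight distinct paths.
paths = enumerate_paths(3)
print(len(paths), "paths for 3 conditionals")

# Enumerating paths stops being practical long before real program sizes.
for n in (10, 20, 40, 80):
    print(f"{n} conditionals -> {count_paths(n)} paths")
```

Real programs are not this regular, since some branch combinations are infeasible and loops add even more paths, so the numbers here are only meant to show how quickly the tree grows.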
So after this quick walk through these five challenges, let's now have a look at the first two in some more detail, and then let's also see how these challenges are typically addressed in practical tools that use symbolic execution.

Let's start by illustrating the problem that occurs when your program has, for example, a loop, and let's do this by looking at a very simple function. Let's call it f. It takes one argument, which I'll call a. In the function, we assign this argument a to a local variable, say x, and then we have a while loop where we check whether x is positive, and as long as it is, we decrement x. So it's a very simple function.

Now if you think about the execution tree of this function, we start with an initial edge where we assign the initial symbolic value a0 to a. Because of the first assignment, the same value a0 is also assigned to x. Now we reach the first conditional in this execution, where we check whether a0 is larger than zero, and of course this check may return true or false. If it returns false, we are done, because then we won't enter the while loop at all and the program is finished. If it returns true, we have the x-- statement on this edge in the graph, and then we again reach the loop condition. Because x has been decremented, the check that x is larger than zero now means that we are checking whether a0 - 1 is larger than zero. Then again we can have the two possible outcomes, true and false. If it's false, we are done. But suppose the outcome is true.
We will then again execute x-- and reach the loop condition another time, where we are now checking whether a0 - 2 is larger than zero. As you can hopefully see now, this goes on and on like this, because down here we can have infinitely many more of these executions, simply because we do not know what the initial value of a is.

So now that you've understood the problem, the question is what we can do about it. The solution is typically to not try to reason about the entire execution tree, and maybe to not even represent the entire execution tree in memory, simply because this would exhaust all memory and you still wouldn't be able to fit an infinitely large tree into it. Instead, we build the tree step by step and heuristically select which of the branches that we have not yet explored to explore next.

The heuristic that chooses which branch to explore next can be many different things. The simplest one is to just select one at random: you look at all the branches that have not yet been expanded further, pick one of them at random, and explore it.

Another option is to do this selection based on coverage. If you have some reason to believe that exploring a branch further would probably cover more code, maybe because you've never executed that branch, then this is a good branch to explore next.

Yet another option is to prioritize branches based on their distance to some interesting program location. Sometimes you have locations that you know would be interesting to reach, for example assertions or calls to particular APIs. In that case, you can compute a distance, for example in terms of how many branches you are still away from this program location, and then prioritize those branches that are likely to bring you closer to your interesting location.

Finally, there is also the option to interleave symbolic execution with random testing.
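The step-by-step construction of the tree for the loop example can be sketched in a few lines of Python. This is only an illustration under some assumptions: path conditions are kept as plain strings over the symbolic input a0 instead of being handed to an SMT solver, and the names Pending, loop_check, expand, pick_random, and explore are made up for this sketch.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pending:
    """An unexplored branch: we are about to check the loop condition."""
    decrements: int        # how many times x-- has run on this path
    path_condition: tuple  # constraints over the symbolic input a0


def loop_check(decrements):
    """The loop condition x > 0, written in terms of the symbolic input a0."""
    x = "a0" if decrements == 0 else f"a0 - {decrements}"
    return f"{x} > 0"


def expand(branch):
    """Expand one node of the tree into its false and true outcomes."""
    check = loop_check(branch.decrements)
    # False outcome: the loop exits, so this path is complete.
    finished = branch.path_condition + (f"not ({check})",)
    # True outcome: x-- runs and we reach the loop condition again.
    still_open = Pending(branch.decrements + 1, branch.path_condition + (check,))
    return finished, still_open


def pick_random(frontier):
    """Simplest heuristic: choose any unexplored branch at random."""
    return random.randrange(len(frontier))


def explore(budget, pick=pick_random):
    """Build the tree step by step, expanding at most `budget` nodes."""
    frontier = [Pending(0, ())]
    completed = []
    for _ in range(budget):
        branch = frontier.pop(pick(frontier))
        finished, still_open = expand(branch)
        completed.append(finished)
        frontier.append(still_open)
    return completed, frontier


completed, frontier = explore(budget=3)
for path_condition in completed:
    print(" and ".join(path_condition))
print("still unexplored:", len(frontier), "branch")
```

However large the budget is, one branch always remains unexplored, which is the infinite tree from the example. In this particular tree there is only ever one open branch, so the random choice is trivial; in a program with several conditionals the frontier grows, and a coverage-based or distance-based heuristic would simply be a different pick function.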
So maybe you have tried some of the above heuristics for a couple of executions, and somehow the exploration doesn't really make more progress or just takes too long, because symbolic execution tends to take relatively long. Then you take the existing inputs that you have and give them to a fuzzer, which will mutate them in a more random way, as we've seen in the last lecture. Once you've done this and maybe covered some more code and some more parts of the execution tree, you go back to symbolic execution and try to explore some more branches in a more systematic way.

So these are all different ways of addressing this problem of a large or maybe infinitely large execution tree. None of these heuristics really solves the problem in the sense that they magically help you explore the entire tree, but they are all ways to address the problem in practice.

Going back to the overview of the five problems, what we've just seen is basically different ways to address the first two problems, loops and recursion and path explosion, which both boil down to having a very large or maybe infinitely large execution tree.

Let's now have a look at problem number three, which is about the environment. Specifically, we will look into how a program can interact with a file system while it is being analyzed through symbolic execution.

The key problem here is that a program may have some behavior that depends on some part of a more complex system that just cannot be analyzed by symbolic execution. For example, this could be some native API. Let's say you have a symbolic execution tool that reasons about JavaScript code, but sometimes the code calls a native API that is implemented not in JavaScript but in, say, C++, and then compiled to native code. Then your tool that works for JavaScript just can't analyze this native API. An even more challenging problem is if your code is
interacting with the network, because then you're basically leaving the current computer and your behavior depends on some other computers. So you would in principle have to run symbolic execution across different computers, which is feasible in principle but pretty hard in practice.

Or you may have the problem that your program is accessing the file system, maybe reading something from it, and then the behavior depends on what was read. This is also the problem that is illustrated down here with this simple example, where we have some JavaScript code that uses the fs module to read from the file system. We are reading a particular file, here called foo.txt, and depending on the content of this file, we are entering this branch or not. If you just look at the symbolic execution idea that we've seen so far, it's not clear how a symbolic execution engine should reason about what is read from the file system.

One popular way of handling this challenge is to model the environment in a way that is somewhat representative of the real environment in which the program is executing, but that still allows a symbolic execution engine to reason about the code. One such approach is implemented in a tool called KLEE, which is one of the most popular symbolic execution tools out there and which models, in this case, the file system. It does so in one of two ways. In the first case, you're interacting with the file system and all arguments of the API calls that you're making are concrete.
So you basically know what their values are, and then the call can just be forwarded to the actual operating system. This is the first case mentioned here. For example, if we go back to the previous example and read from this file name, and the file name is a concrete value because it's given as a string literal, then the engine would just look up this file on the file system and return whatever this file contains.

The second case is where some of the arguments that go into these file system APIs are symbolic, so you do not really know, for example, which file is read. What KLEE does in this case is to model the call by emulating a file system that looks a bit like a real file system and does something that a real file system could do, but that is not the actual file system. It's just an imaginary system of symbolic files, which KLEE makes sure looks like a real file system, but which is not mapped down to the actual disk of your computer. The goal here is to explore all possible legal interactions that the program could have with the file system, by making this file system act sometimes like this and sometimes like that.

As a concrete example, KLEE is actually not implemented for JavaScript, but let's assume something like it were implemented for JavaScript. In this case, let's say we again read a file using the fs.readFileSync API. Then there would be some implementation of this method, for example in JavaScript, that doesn't really read from the actual file system but models the effects that doing so would have for some symbolic file name. And if you then read the same file again, it would return the same content, even though there actually isn't such a file on the actual file system.

So this gives you an idea of one way of addressing the challenge of a program that interacts with the environment. But as we've seen, this is only for the file system and
it's somewhat limited and, for example, doesn't address the problem of interacting with other native APIs. In the next video of this lecture, we'll look at an approach that addresses the last three problems here all at once, using a very simple idea, which is to mix symbolic execution and concrete execution, so that whenever the symbolic execution doesn't really know how to continue, we fall back on concrete execution, but then also go back to symbolic execution if we can. We'll see exactly how this combination of concrete and symbolic execution works in the next video of this lecture.

What you have already learned in this video is what challenges exist when you try to apply the idea of symbolic execution to real programs, and you've seen some of the approaches that are used to address these challenges, in particular the idea of not exploring the entire tree but heuristically selecting which branches are most interesting to explore next.

Thank you very much for listening, and see you next time.
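As an addendum to the file system discussion in this video, the two cases of the modeled file system can be sketched as follows. This is a minimal Python illustration of the idea and not KLEE's implementation or API: the names Symbolic, ModeledFileSystem, and read_file are hypothetical, and a symbolic file's content is represented by just another symbolic placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbolic:
    """Placeholder for a value the engine does not know concretely."""
    name: str


class ModeledFileSystem:
    """Hypothetical model of a file-reading API for a symbolic execution engine."""

    def __init__(self):
        # Content of the imaginary symbolic files, keyed by symbolic file name.
        self._symbolic_files = {}

    def read_file(self, path):
        if isinstance(path, str):
            # Case 1: the argument is concrete, so forward the call
            # to the actual file system of the operating system.
            with open(path, encoding="utf-8") as handle:
                return handle.read()
        # Case 2: the argument is symbolic, so answer from the imaginary
        # file system instead of touching the actual disk.
        if path not in self._symbolic_files:
            self._symbolic_files[path] = Symbolic(f"content_of_{path.name}")
        return self._symbolic_files[path]


fs = ModeledFileSystem()
unknown_path = Symbolic("p0")
first = fs.read_file(unknown_path)
second = fs.read_file(unknown_path)
print(first == second)  # True: reading the same symbolic file twice is consistent
```

Remembering the content per symbolic file name is what makes the imaginary file system behave like a real one: two reads of the same file give the same answer, even though no such file exists on disk.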