Hi, and welcome back to program analysis. We're still in the lecture on slicing, and this is video number three on this topic, where we will have a brief look at a technique called thin slicing. This is also a static approach to compute a slice, but one that aims at a smaller slice than the static slicing we've seen before, so that the slice is easier for a human to understand. The idea of thin slicing starts from the observation that static slicing techniques, such as the one from the previous video, often result in relatively large slices. In the worst case, a static slice can contain the entire program, and if you apply slicing to complex real-world programs, the slices are often too large for typical debugging or program understanding tasks. The key idea of thin slicing is that, instead of aiming at an executable program, it reduces the program in a way that is helpful for a human but does not necessarily yield an executable program. To do this, it heuristically focuses on those statements that are most commonly needed for, say, debugging, and then lets the user who looks at the slice increase the set of included statements on demand. So the user starts from a very small subset of the program and expands the slice only where needed to really understand what the program is doing. Here is how the thin slicing approach defines its slice. It relies on the notion of directly using a memory location: we say that a statement directly uses a particular memory location if it uses that memory location for some computation other than a pointer dereference.
So for example, if you have an expression where we look up the field F of X and then add it to some other variable Y, then X is only used for a pointer dereference, which means this expression is not directly using X. It is, however, directly using Y, because Y really contributes to the computation itself. Based on this notion of direct uses, thin slicing computes a dependency graph similar to the one from the previous lecture, but now it contains only data dependencies, so no control flow dependencies. And it does not include all data flow dependencies either, but only those that correspond to a direct use as defined above. Given this dependency graph, the rest of the approach is more or less the same as the original approach by Mark Weiser: the thin slice is computed as the set of all statements from which the criterion statement is reachable in this graph. So it's basically the same idea as before, just that the graph is less dense than in the classical static slicing approach. As an example of these ideas, let's look at this very small program. We create a new object and store a reference to it in variable X. Then we also store this reference in another variable Z, create another value which we store in Y, and then create another pointer to our object, called W, from X. In line five, we write something into the field F of W, and then we have a conditional that checks whether W and Z are the same. If they are, then down here we write Z.F into our variable V. Now let's assume that this last statement is our slicing criterion. To compute the slice, the first thing we need to do is to compute the dependency graph. To write it down, we start with the nodes, which correspond to the seven statements that we have here.
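Based on the description in the lecture, the example program might look like the following Python sketch. The variable names follow the lecture; the concrete value stored in Y is an illustrative assumption, since the lecture does not specify it.

```python
class Obj:
    """A simple heap object with a single field f."""
    pass

x = Obj()      # (1) create a new object and store a reference in x
z = x          # (2) z now points to the same object as x
y = 42         # (3) create some value and store it in y (value is illustrative)
w = x          # (4) w is another pointer to the object behind x
w.f = y        # (5) write y into field f of w
if w is z:     # (6) check whether w and z refer to the same object
    v = z.f    # (7) slicing criterion: read z.f into v
```

Since X, Z, and W all alias the same object, the read at statement seven observes exactly the value written at statement five, which is why the tricky data dependency from five to seven discussed below exists.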
I have of course thought beforehand about how to lay out those nodes so that the graph looks nice in the end, which happens to be like this. The next step is to introduce the edges into the graph. As we've seen in the definition of the graph used for thin slicing, the only dependencies that really matter here are the direct data dependencies. I'll also note down the other data dependencies, those that are ignored here, as well as the control flow dependencies, just so that you see what is not included in this dependency graph. Let's start with the direct data dependencies, those that really matter for thin slicing. We have one that starts at statement one, where we write into X, and goes to statement two, because X is used there. We have another one from this definition of X to statement four, because X is also used there. Then we have one for the definition of Z, which is used down here to compute the conditional at statement six; this corresponds to that edge. Then we have one from statement four, where we write into W, to statement six, because W is also used in this conditional. Next, we have one for the definition of Y, which is used down here at statement five to determine what to write into the field F of W. And finally, there is one that is a bit more tricky, which goes from five to seven. At five we write into the field F, and at seven we use this field F as the value to put into variable V. This is why we have this data dependency. If you look carefully, the base object is accessed through a different variable, but because the check here ensures that W and Z refer to the same object, we know that this is actually the same field F, and therefore we add this edge from five to seven. In addition to these direct data dependencies that are used for thin slicing, there are also some other data dependencies that are not used.
I put those here just so that you see what is not used. There are some edges that are data dependencies, but data dependencies used only for a pointer dereference, and by the definition of the graph used for thin slicing, those are ignored. There is one for the pointer created here, which points to the object that X also points to, and which is then used here as the base object for the field lookup. So we have this data dependency from four to five; it would be included in the full program dependency graph, of course, but it is excluded from the graph used for thin slicing. We have a similar case here, where Z also gets a pointer to this object, which is then used down here. Again, this is a data flow dependency, but one that is only used for dereferencing the object, and therefore it is not included in the graph. Finally, the full program dependency graph that we have seen in the previous video would also include control flow dependencies. In this example, we have one of them, but it is also ignored in the graph used for thin slicing: the edge from statement six to statement seven, because statement six obviously determines whether statement seven is actually executed. Given this dependency graph, we can now compute the slice. To show the advantage of thin slicing, let's first look at what the traditional slicing approach by Mark Weiser from the previous lecture would do here. If we compute this traditional static slice starting from the criterion at statement seven, we look at node seven and then check which other nodes can reach this node.
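The dependency graph described above can be written down explicitly. The following sketch encodes the three kinds of edges from the lecture and computes a slice by backward reachability, once over all edges, which corresponds to the traditional static slice, and once over the direct data dependency edges only, which corresponds to the thin slice. The encoding as Python sets is of course just one possible representation.

```python
# Edges point from a definition to a use: (source, target).
direct = {(1, 2), (1, 4), (2, 6), (4, 6), (3, 5), (5, 7)}  # direct data dependencies
base = {(4, 5), (2, 7)}   # data dependencies used only for a pointer dereference
control = {(6, 7)}        # control flow dependency: statement 6 decides whether 7 runs

def backward_slice(criterion, edges):
    """All statements that can reach the criterion along the given edges."""
    result = {criterion}
    worklist = [criterion]
    while worklist:
        node = worklist.pop()
        for src, dst in edges:
            if dst == node and src not in result:
                result.add(src)
                worklist.append(src)
    return result

thin = backward_slice(7, direct)                          # statements 3, 5, 7
traditional = backward_slice(7, direct | base | control)  # all seven statements
```

This reproduces the outcome discussed in the lecture: following only the direct edges backward from statement seven yields three statements, whereas following all edges yields the entire program.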
If you do this and go backward along all these edges, you see that all the statements are included in this traditional slice, which arguably is not very useful. For this very small program it is probably okay, but for a larger program, this very often includes way more statements than you want. In contrast, what does thin slicing do instead? It only includes those nodes that can reach node seven using the direct data dependency edges. And that means it only includes nodes three, five, and seven. So the thin slice for this program consists of this statement, this statement, and that statement. That gives you some idea of why variable V gets the value it gets, without including all the other nodes that are part of the traditional slice. Now, if these three statements are not enough for the developer to understand whatever he or she wants to understand, then thin slicing allows the slice to be expanded on demand. For example, suppose the developer wants to know why W and Z are actually aliased, because if you just look at the yellow statements, this is a question you would probably ask. The developer would then mark this one statement here as a new criterion, and starting from this criterion, more statements would be included in the slice. This is statement six. If we look at the graph and follow the direct data dependency edges backward, we would also include this, this, and this, which would then explain why W and Z are actually aliased. In this small example, this happens to include all statements, and you could say, well, it's just as bad as traditional slicing. But in practice, for larger programs, this is still a much smaller slice, and the expansion is done only on demand.
So only when the developer really wants to expand the slice. Now, it may happen that the thin slice computed this way is not enough for the developer to understand what he or she wants to understand, for example, when debugging a particular problem. The reason is that thin slices only include what you could call producer statements, but they do not contain what you might call explainer statements. Producer statements are those that tell you how a particular value has been computed, whereas explainer statements tell you why things happen. For example, explainer statements may tell you why two accesses to the heap read and write the same object, even though they use different variable names, or they may explain why a particular producer statement executes in the first place, that is, which other statements caused it to execute. The assumption of thin slicing is that most of these explainer statements are not useful for most tasks, so you should not include them from the very beginning; instead, the developer can expose explainer statements on demand by incrementally expanding the thin slice based on what he or she really wants to see. As an example of this expansion, let's consider one question that a developer might ask when seeing just the three yellow statements: why are these two variables, W and Z, actually aliased? If you just look at the three yellow statements, this may not be clear to you. The incremental expansion of the slice allows you to select some statement in the program as another criterion from which to start expanding the slice. Here, this would naturally be the statement where W and Z are shown to be aliases. Doing this means that we look at our dependency graph again and now go backward from statement six, following the direct data dependency edges.
This means that we also include four and two in our slice, and because two is included, we also include one. By doing this expansion, we end up with the full set of seven statements, which then also explains why W and Z are actually aliased. You might argue that we're back where we started, because now the whole program is included in the expanded thin slice. For this small program, that is of course correct, but in general, for more complex programs, the thin slice, even after expanding a few times, will still be smaller than the traditional static slice from the approach we've seen in the previous video. To evaluate whether this idea of thin slicing can actually help developers find bugs more quickly than traditional static slicing, the authors of the thin slicing paper did an experiment where they essentially simulated the effort a developer spends to find a bug using the traditional or the thin slicing approach. To do this, they took a set of known bugs that all crash the program and for which the root cause of the crash is known, where this root cause may be at a different location than where the program crashes. Finding the location of the root cause is the task the developer is supposed to do. They assume that, given a program or a slice of a program, the developer starts from the crash point, because this is what you typically know when a program crashes, and then performs a breadth-first search through those nodes of the program dependency graph that are in the slice. In this experiment, they counted how many statements the developer would inspect when using the full traditional static slice or the thin slice only.
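The on-demand expansion just described can be sketched as taking the union of the thin slice for the original criterion with a backward closure from the newly selected criterion, statement six here. The edge set below is a hypothetical encoding of the lecture's example, with edges pointing from a definition to a use.

```python
# Direct data dependency edges of the example, as (definition, use) pairs.
direct = {(1, 2), (1, 4), (2, 6), (4, 6), (3, 5), (5, 7)}

def backward_slice(criterion, edges):
    """All statements that can reach the criterion along the given edges."""
    result = {criterion}
    worklist = [criterion]
    while worklist:
        node = worklist.pop()
        for src, dst in edges:
            if dst == node and src not in result:
                result.add(src)
                worklist.append(src)
    return result

# Expand the original thin slice {3, 5, 7} with the closure from statement 6.
expanded = backward_slice(7, direct) | backward_slice(6, direct)
```

The closure from statement six pulls in statements four, two, and then one, so for this small example the expanded slice indeed covers all seven statements, matching the walkthrough above.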
The result they get is that if you use the thin slice, then on average you have to inspect 12 statements before you reach the one that is actually responsible for the crash. This is 3.3 times fewer than with the full traditional static slice, because the traditional slice simply includes more statements, and as a result, a developer would likely look at statements that are not needed before reaching the one that is really the root cause of the bug. All right, this is already the end of video number three, where we have looked at a variant of the static slicing approach from earlier, one that is not focused on producing an executable program, but on obtaining a small subset of the program that is useful for a developer trying to understand, for example, the root cause of a bug, or the behavior of a program more generally. In the remaining fourth video of this lecture, we will look not at another static slicing approach, but at a dynamic slicing approach, where the program is executed, and where, during the execution, some statements are relevant for the value of a variable at a particular location; the question is how to compute these relevant statements. Thank you very much for listening, and see you next time.