Hi, hello, and welcome back to the program analysis course. This is the lecture on program slicing, and in this second video we look at a particular kind of slicing approach, namely static slicing. These approaches look at the source code of the program and compute a slice without actually executing that code. The general idea of slicing was introduced by Mark Weiser in 1984, and he also proposed a static technique to compute a program slice. There are, of course, many different algorithms for computing a slice, and many of them are static. Here we focus on the one Mark Weiser introduced in 1984, which essentially reduces the problem of computing a slice to the problem of graph reachability on a program dependence graph. The approach builds a graph that models the source code of the program, then asks which nodes in this graph are reachable from which other nodes, and in doing so answers the slicing problem. So let's start by looking at what this program dependence graph actually is. A program dependence graph is a directed graph, meaning its edges have a direction, that represents the data and control flow dependencies between the statements of a program. The nodes in this graph represent either statements or predicate expressions, that is, the boolean predicates that decide whether we take one branch or another when the program contains some kind of branching construct. The edges can represent two things. First, they can represent data flow dependencies, where a value that is written somewhere is read somewhere else. There is one edge for each such relationship, which is called a definition-use pair, and we will define exactly what that is on the next slides.
Furthermore, control flow dependencies are also modeled as edges. We'll look at them in a few slides, but the basic idea is that a particular node is only reached because of a decision made at some other node, and this relationship is expressed as a control dependency through an edge in the program dependence graph. So let's first look at data flow dependencies in more detail. To do this, we first need to define what variable definitions and variable uses are. Consider a program represented as a control flow graph whose nodes are basic blocks, which I think we explained in the introduction lecture very early in this course. A variable definition for a particular variable v is a basic block that assigns something to v. Here v is not just a variable in the classical sense but any location that holds a value: a local or a global variable, but also the parameter of a function or a field of an object in an object-oriented language. Recall that a basic block is a consecutive list of statements that are always executed together. Any basic block that writes to, that is, assigns something to, this variable v is called a variable definition. The counterpart of variable definitions are variable uses. A variable use of v is a basic block that reads the value of v. This can happen in different ways: for example, in a condition that refers to the variable, in a computation where a more complex expression uses v, or when the value is written to the console. Any kind of read of the variable is a variable use. Now that we have labeled some nodes in our control flow graph as variable definitions and others as variable uses, we can define definition-clear paths.
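As a rough illustration of definitions and uses at the statement level, the sketch below uses Python's standard `ast` module to collect the names a single Python statement or predicate writes (`Store` context) and reads (`Load` context). The statement strings are an assumed Python rendering of the sum/product example used in this video, and `DefUseVisitor` and `defs_and_uses` are hypothetical names. It only handles plain variable names; fields, parameters, and augmented assignments are left out for brevity.

```python
import ast


class DefUseVisitor(ast.NodeVisitor):
    """Collects variable names written (defs) and read (uses) in a piece of Python code."""

    def __init__(self):
        self.defs, self.uses = set(), set()

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store):
            self.defs.add(node.id)
        elif isinstance(node.ctx, ast.Load):
            self.uses.add(node.id)

    def visit_Call(self, node):
        # Simplification: the callee (e.g. `print`) is treated as a function, not a
        # variable use, so only the call's arguments are visited.
        for arg in node.args:
            self.visit(arg)
        for kw in node.keywords:
            self.visit(kw)


def defs_and_uses(source):
    """Return (defs, uses) for one statement or predicate given as Python source.

    Limitation: in `x += 1` the target has Store context only, so the read of x is missed.
    """
    visitor = DefUseVisitor()
    visitor.visit(ast.parse(source))
    return visitor.defs, visitor.uses


# Assumed Python rendering of the running example; statement 5 is the loop predicate.
EXAMPLE = {
    1: "n = int(input())",
    2: "i = 1",
    3: "sum = 0",
    4: "prod = 1",
    5: "i <= n",
    6: "sum = sum + i",
    7: "prod = prod * i",
    8: "i = i + 1",
    9: "print(sum)",
    10: "print(prod)",
}

for label, stmt in EXAMPLE.items():
    defs, uses = defs_and_uses(stmt)
    print(label, sorted(defs), sorted(uses))
```

For instance, statement 6 yields `sum` as a definition and both `sum` and `i` as uses, which is exactly why it later forms def-use pairs with itself and with statement 9.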
These are paths through the control flow graph that start at a definition and end at a use, such that we know for sure that the use reads the value defined at this definition. More formally, a definition-clear path for some variable v is a path, that is, a sequence of nodes n1 to nk in our control flow graph, where n1 is a variable definition for v, nk is a variable use of v, and no node ni in between (1 < i < k) is also a variable definition for v. So we know for sure that the use at nk reads the value defined at n1, simply because there is no other definition in between. Note that the last node nk in the path may also be a variable definition: it may first use the value that was written and then define the variable again, which does not contradict the definition. What matters is that this reassignment to v happens after the use. This is a corner case, and we'll see examples where it matters. Also note that this notion of a path differs from the earlier definition of a path we've seen, in that it does not go from the entry of the method to its exit, but only from a variable definition to a variable use. Having defined definition-clear paths, we can now define definition-use pairs. A definition-use pair, very often called a du pair, for a particular variable v is a pair of nodes d and u such that there is a definition-clear path in the control flow graph that starts at d, may pass through some nodes in between, and ends at u. So each such pair means that a value is defined at node d and then used at node u. Let's now illustrate data flow dependencies using a concrete example.
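The definition of du pairs translates fairly directly into a search for definition-clear paths. The sketch below assumes a statement-level control flow graph of the sum/product example discussed next (statement 5 is the loop condition, with successors 6 and 9) and def/use sets read off that walkthrough; `du_pairs`, `CFG`, `DEFS`, and `USES` are hypothetical names. From each definition it walks forward and stops extending a path at any node that redefines the variable, but it still records a use at that node first, which is the corner case mentioned above.

```python
# Assumed statement-level CFG of the running example (5 is the while-loop test).
CFG = {1: [2], 2: [3], 3: [4], 4: [5], 5: [6, 9], 6: [7], 7: [8], 8: [5], 9: [10], 10: []}
# Variables defined and used at each statement, as in the lecture's walkthrough.
DEFS = {1: {"n"}, 2: {"i"}, 3: {"sum"}, 4: {"prod"}, 6: {"sum"}, 7: {"prod"}, 8: {"i"}}
USES = {5: {"i", "n"}, 6: {"sum", "i"}, 7: {"prod", "i"}, 8: {"i"}, 9: {"sum"}, 10: {"prod"}}


def du_pairs(cfg, defs, uses):
    """Return (d, u, v) triples such that a definition-clear path for v leads from d to u."""
    pairs = set()
    for d, variables in defs.items():
        for v in variables:
            # Visiting each node once suffices: whether a path may continue through a
            # node depends only on whether that node redefines v.
            visited, stack = set(), list(cfg[d])
            while stack:
                node = stack.pop()
                if node in visited:
                    continue
                visited.add(node)
                # The last node of the path may use v before redefining it.
                if v in uses.get(node, ()):
                    pairs.add((d, node, v))
                # A redefinition of v ends every definition-clear path through this node.
                if v not in defs.get(node, ()):
                    stack.extend(cfg[node])
    return pairs


for d, u, v in sorted(du_pairs(CFG, DEFS, USES)):
    print(f"{v}: {d} -> {u}")
```

Because the loop edge 8 -> 5 is part of the graph, the search also finds the "backward" pairs such as (6, 6) and (8, 5).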
Here you see the same code you've already seen earlier: we read some value that we store in n, and then compute the sum and the product based on this value n. The first thing I'm going to do is to label each statement in the code with a number. As you can see, I'm labeling individual statements, because we're going to compute the data flow dependencies for individual statements. We could also do this at the level of basic blocks, and the idea would be essentially the same; the only difference is whether the nodes of the control flow graph represent individual statements or basic blocks. Now we want to write down all pairs of statements where one defines something that is then used at the other. So I'll list all potential definition nodes on one axis and all potential use nodes on the other, with all statement numbers on both. The question is which node defines something that is read somewhere else. For example, the variable n, defined at statement 1, is used at statement 5, so there is a def-use relationship between statement 1 and statement 5, which I mark with a cross. We can do the same for statement 2, which defines the variable i. It is used at statements 5, 6, 7, and 8, so I put crosses for all four, because statement 2 defines something that is used at each of them. Moving down the code, statement 3 defines sum, and this value may be used at statement 6 and at statement 9, so let me put the corresponding crosses.
The idea is the same for the variable prod, which is defined at statement 4 and may be used at statement 7 and at statement 10. Next is statement 5, which combines i and n in a boolean expression, but the result is not written to any variable. So there is no assignment and no definition here, and nothing to do for statement 5. Statement 6 is different: it assigns a value to the variable sum, so we have a definition here that may be used at statement 6 itself when the loop takes another iteration, and also at statement 9. In one iteration sum is written, and in the next iteration the same variable is read again, so we have def-use pairs between 6 and 6 and between 6 and 9. Statement 7 is the same story with the numbers shifted by one: what is written at statement 7 may be used there again in the next iteration, or at statement 10 after the loop has exited, so we have def-use pairs between 7 and 7 and between 7 and 10. Next, statement 8 defines the variable i, which may then be used in the next iteration by the loop condition at 5, to compute the next value of sum at 6, to compute the next value of prod at 7, and to compute the next value of i itself at 8. So there are four def-use pairs: (8, 5), (8, 6), (8, 7), and (8, 8). As this example illustrates, def-use pairs can also point backward in the source code, because of loops and other constructs where a variable is written at one line and then used at a line above it.
Finally, we look at the remaining lines. Statement 9 only uses sum but assigns nothing, so there are no entries to make, and the same holds for statement 10. This table therefore contains all the data flow dependencies, that is, all the def-use pairs that exist in this little program. Now that you've seen what data flow dependencies are, let's have a look at control flow dependencies. To define them, we first need a helper concept: the post-dominator. Given a control flow graph, a node n2 post-dominates another node n1 if every path in the control flow graph that starts at n1 and ends at the exit node also contains n2. Intuitively, to get from n1 to the exit, we have to go through n2. If n2 is different from n1, we say that n2 strictly post-dominates n1. Let's again illustrate this using the same example. As before, you see the code here and every statement has a label, and the first thing we'll do is to write down all the strict post-dominators.
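Post-dominator sets can be computed with a simple fixed-point iteration: the post-dominators of a node are the node itself plus the intersection of its successors' post-dominators. The sketch below does this for an assumed statement-level control flow graph of the running example and, anticipating the control dependence definition given next in the lecture, also derives control dependences from the result. For that step it uses a standard equivalent form of the lecture's two conditions (n2 post-dominates some successor of n1, and n2 does not post-dominate n1) instead of enumerating paths. It assumes a single exit node that every other node can reach, and all names (`post_dominators`, `control_dependences`) are hypothetical.

```python
# Assumed statement-level CFG of the running example; 5 is the while-loop test.
CFG = {1: [2], 2: [3], 3: [4], 4: [5], 5: [6, 9], 6: [7], 7: [8], 8: [5], 9: [10], 10: []}
EXIT = 10


def post_dominators(cfg, exit_node):
    """Map each node to the set of nodes that post-dominate it (the node itself included)."""
    nodes = set(cfg)
    # Start from "everything post-dominates everything" and shrink to the fixed point.
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # Assumes every non-exit node has at least one successor.
            new = {n} | set.intersection(*(pdom[s] for s in cfg[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependences(cfg, pdom):
    """Return (n1, n2) pairs where n2 is control dependent on n1."""
    deps = set()
    for n1, succs in cfg.items():
        for s in succs:
            for n2 in pdom[s]:
                # Condition 1: n2 post-dominates a successor s of n1, so there is a path
                # n1 -> s -> ... -> n2 whose nodes after n1 are all post-dominated by n2.
                # Condition 2: n2 does not post-dominate n1 (this also rules out n2 == n1).
                if n2 not in pdom[n1]:
                    deps.add((n1, n2))
    return deps


pdom = post_dominators(CFG, EXIT)
strict = {n: sorted(s - {n}) for n, s in sorted(pdom.items())}
print(strict[1])  # [2, 3, 4, 5, 9, 10]
print(strict[6])  # [5, 7, 8, 9, 10]
print(sorted(control_dependences(CFG, pdom)))  # [(5, 6), (5, 7), (5, 8)]
```

The strict post-dominator sets should match the table built in the lecture, and the three control dependences on the loop condition fall out directly.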
I'm going to fill a table that indicates, for each pair of nodes, whether one is a post-dominator of the other, with node n1 on one axis and node n2 on the other. The definition of post-dominators talks about paths through the control flow graph, so to fully see why one node post-dominates another, you'll want to look at that graph. Since we've drawn many control flow graphs in this lecture already, I leave drawing it as a little exercise for you and instead just show you the post-dominators. Take node 1, the first statement. Every path that starts at node 1 and goes to the exit of the control flow graph must go through nodes 2, 3, 4, and 5. It does not have to go through nodes 6, 7, and 8, because it's not certain that we will enter the while loop, but it will definitely go through nodes 9 and 10. So nodes 2, 3, 4, 5, 9, and 10 are strict post-dominators of node 1. For nodes 2, 3, and 4 it is very similar, except that the nodes above them are of course not their post-dominators. For node 2 we have 3, 4, and 5, because they are definitely executed after 2 before reaching the exit, and also 9 and 10. Node 3 has 4, 5, 9, and 10, and for node 4 we know that 5 is definitely executed afterwards; we know nothing about 6, 7, and 8, but 9 and 10 will again definitely be executed afterwards. For node 5, the loop condition, we do not know whether the check is followed by the loop body, so 5 only has the post-dominators 9 and 10. Now let's look at the nodes inside the loop. Once we have reached node 6, we know for sure that 7 and 8 will also be executed, because we have entered the loop body and there's nothing like a break in between. So node 6 has 7 and 8 as strict post-dominators. We also know for sure that the program
will get back to checking the loop condition to decide whether to run another iteration, which means that 5 is also a strict post-dominator of 6. And of course, at the end we'll definitely execute 9 and 10, so they are strict post-dominators of 6 as well. Statements 7 and 8 look very similar. For 7, we know for sure that we'll get back to the loop condition, that 8 will be executed after 7, and that eventually 9 and 10 will be executed. For node 8, we also know that we'll get back to the loop condition and that 9 and 10 are executed eventually, and that's it for node 8. Finally, the last two statements: node 9 is always followed by 10, so it has one post-dominator, and node 10 has no post-dominator, because it's the last statement in the program. With the concept of post-dominators in hand, we can formally define what a control dependence is. Informally, given two nodes n1 and n2 in our control flow graph, a control dependence means that whether n2 is executed depends on a decision made at n1. For example, if you have an if statement and some statement that executes only if the condition evaluates to true, then that statement, n2, is control dependent on the conditional, n1. More formally, a node n2 is control dependent on a different node n1 if two conditions hold. First, there is a control flow path P that starts at n1 and ends at n2, such that n2 post-dominates every node in P excluding the first node, n1. Second, n2 does not post-dominate n1, which means it's not certain that n2 gets executed after n1; instead, n1 determines whether n2 is executed or not. Let's illustrate this again using our example, where, as a second step, we are going to write down the
control dependencies. There are three of them in this example. I'll first tell you what they are, and then we can go back to the definition to see why they follow from it. First, node 6 is control dependent on node 5, simply because whether 6 is executed depends on a decision made at 5. The same is true for 7 and 8: node 7 is also control dependent on node 5, and so is node 8. Let's check for at least one of the three that this matches our definition. Take the one in the middle, 7 being control dependent on 5, so n1 is node 5 and n2 is node 7. Now look at the two conditions. Is there a control flow path from node 5 to node 7 where n2 post-dominates every node in the path excluding n1 itself? Consider the path 5, 6, 7: node 7 trivially post-dominates itself, so in practice this means that 7 must post-dominate 6, and looking at the table above, this is indeed the case, because of the little cross we've made there. The second condition is that n2 does not post-dominate n1, meaning that 7 does not post-dominate 5. This is also true, because there is no cross in the table for that pair, simply because it's not certain that 7 gets executed just because 5 is executed. Good, so now you know exactly what data flow dependencies and control flow dependencies are. The reason we talk about them is that we want to represent all of this in a graph, the program dependence graph, where, as I said earlier, the nodes correspond to statements and conditionals, and the edges correspond to either data flow dependencies or control flow dependencies. Given such a graph and a slicing criterion for the program the graph describes, we can use this graph to compute
the slice. Let's say the slicing criterion consists of a node n in this graph and a set of variables V, which happens to be the set of all variables defined or used at statement n. To compute the slice for this slicing criterion, we have to find all statements on which n depends. Another way to put this is that we have to find all statements from which n is reachable, which means we are phrasing the problem of computing a slice as a graph reachability problem, a very well-known problem with many algorithms to solve it. Let's illustrate this idea one more time using our example from the beginning. I'm going to draw the program dependence graph and then, based on it, compute the slice for a particular slicing criterion. To draw the graph, we put one node for every statement. To make the drawing easier, it's a good idea to think a little about how to lay out the graph before drawing it. In this example, I'll put nodes 1, 2, 3, 4, 5, 9, and 10 up here, and nodes 6, 7, and 8 down here. Next, I add the two kinds of edges to the graph: data flow dependencies and control flow dependencies. Let's start with the data flow dependencies, where we essentially copy the matrix from earlier in this example into the graph. For example, n is defined at node 1 and used at node 5, so we have an edge from 1 to 5. Node 2 defines i, which is used at 5, 6, 7, and 8, so we have one edge like this and three more like that. For node 3, we have a use at 6 and a use at 9, so that's one edge and that's the other. Similarly, for node 4 we have a use at node 7 and another use at node 10. Node 5
doesn't define anything. For 6, 7, and 8, we always have edges back to the same node, because the value defined in one iteration may be used again in the next. But there are more dependencies here: whatever is written at line 6 is used at line 9, so we have an edge like this, and what is written at line 7 is used at line 10, so another edge like that. Then there is the definition of i at line 8, which is of course used at 8 again, so we already have that edge, but it's also used at 5, 6, and 7, so there's one edge like this, one like that, and another one like this. In addition to the data flow dependencies, we also need the control flow dependencies, which are the second kind of edge in our program dependence graph. As we saw when we worked out the control flow dependencies for this example, there are three of them: one from node 5 to node 6, another from 5 to 7, and finally one from 5 to 8. Now we can compute a slice based on this program dependence graph simply by looking at reachability in the graph. Say we care about the slice for statement 9 and a set of variables that includes sum. The slice then contains every node n for which there is a path from n to node 9, that is, everything that can reach node 9. Just by looking at the graph, we can figure out which nodes those are: nodes 1, 2, 3, 5, 6, 8, and 9. This corresponds exactly to the slice I showed you earlier, because these are the statements that are actually relevant for computing the value of sum at statement 9. All right, to double-check that you've understood this, here is a second small program on the slide, and your task is to compute its data
dependencies and control dependencies, then draw the program dependence graph, and then, based on that graph, compute the slice for statement 5, the last line, and the variable z, which will be some subset of the full program. To double-check that you've done the right thing, you may want to post your solution in Ilias and give the sum of the number of nodes, the number of edges, and the number of statements in the slice as a kind of checksum, so that other people can compare it with their own solution. And this is already the end of this second video in the lecture on slicing. You now hopefully know what static slicing is, and in particular you know the algorithm by Mark Weiser that uses graph reachability to compute the slice of a program. Thank you very much for listening, and see you next time.
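For reference, the reachability step described in this video fits in a few lines. The following is a minimal Python sketch, not a definitive implementation: it assumes the program dependence graph is given as (source, target) edges meaning that target depends on source, uses the def-use pairs and control dependences of the sum/product example from this video, and, as in the lecture, takes the criterion's variable set to be all variables at the criterion statement, so only the node matters. `backward_slice` is a hypothetical name.

```python
from collections import deque


def backward_slice(pdg_edges, criterion):
    """Return every node from which `criterion` is reachable in the PDG (itself included).

    pdg_edges: iterable of (source, target) pairs, where source -> target means
    "target depends on source" (data or control dependence).
    """
    # Index edges by their target so the graph can be walked backwards.
    preds = {}
    for src, dst in pdg_edges:
        preds.setdefault(dst, set()).add(src)

    in_slice = {criterion}
    worklist = deque([criterion])
    while worklist:
        node = worklist.popleft()
        for p in preds.get(node, ()):
            if p not in in_slice:
                in_slice.add(p)
                worklist.append(p)
    return in_slice


# Def-use pairs and control dependences of the running example (statements 1..10).
DATA_EDGES = [(1, 5), (2, 5), (2, 6), (2, 7), (2, 8), (3, 6), (3, 9), (4, 7), (4, 10),
              (6, 6), (6, 9), (7, 7), (7, 10), (8, 5), (8, 6), (8, 7), (8, 8)]
CONTROL_EDGES = [(5, 6), (5, 7), (5, 8)]
PDG = DATA_EDGES + CONTROL_EDGES

print(sorted(backward_slice(PDG, 9)))   # [1, 2, 3, 5, 6, 8, 9]
print(sorted(backward_slice(PDG, 10)))  # [1, 2, 4, 5, 7, 8, 10]
```

The worklist is a plain breadth-first search over reversed edges; any other graph traversal would do, since only the set of reachable nodes matters.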