Hi, and welcome back to the program analysis course. This is video number three of the lecture on data flow analysis. In this video, I want to provide more examples of data flow analyses that show how the general principle of data flow analysis can be used to define how a static analysis works. Here's an overview of the data flow analyses that you'll see in this course. In the first video of this lecture, I introduced data flow analysis through a concrete example, namely the available expressions analysis. In this video, we're going to look at three more examples of data flow analyses, called reaching definitions, very busy expressions, and live variables. If these names don't mean much to you at this point, don't worry; this is exactly what this lecture is going to explain. In addition to what you see here in the lecture, there will be another kind of data flow analysis, namely a taint analysis, which is what you'll implement in the course project. So everything you see in this lecture will hopefully help you better understand what you're supposed to do in the course project, which is also a data flow analysis. All right, so let's get started with another kind of data flow analysis, namely a reaching definitions analysis. The goal of this analysis is the following. For every point in the program, which again means before and after every statement, the analysis computes which assignments may have been made and may not have been overwritten yet. Basically, this tells us which variables have been written to and may still hold the value written at a particular location, because there may not yet have been a statement that overwrites them.
This is useful in various kinds of program analysis, and in particular, it is useful to compute a so-called data flow graph, which tells us what data may flow from one statement to another. In this data flow graph, we want an edge that says: what is written here may be used there, and this is exactly what a reaching definitions analysis computes. Note the careful use of the word "may" in this statement of the goal. The analysis is not about definite information that some assignment must reach a use of a variable; it's just about assignments that may have been made on some path leading to a point in the program and that may not have been overwritten. So a definition in the computed set is not guaranteed to actually reach this program point at runtime; it only may reach it. To make this more concrete, let's again look at an example. In this example, we write into two variables, x and y, and then have a while loop whose condition depends on the value of x. In the loop body, we write into y again and then into x. One of the reaching definitions in this piece of code is that the definition of x in the first line reaches the first statement inside the body of the while loop. The reason is that there is a path on which the assignment to x in the first line is still valid when we reach this statement, because there is no other assignment to x in between. Actually, if you look carefully at this piece of code, you'll see that every definition made in this piece of code may reach that first statement of the loop body.
All four writes that have a box around them here, namely the write into x, the write into y, and the two other writes, again to y and x, in the body of the while loop, may reach the first statement of the loop body, because for each of them there is a path from the assignment to this first statement without any other assignment to the same variable in between. In contrast, not every definition may reach the second statement of the while loop. In particular, look at the definition in the second line, where we write into y: there's no way that this write can reach the second statement in the loop, simply because the first statement in the loop always writes to y just before. So we know for sure that the definition in the second line does not make it beyond that assignment, and in particular it is not a reaching definition for the second statement in the loop body. Now that you have an intuition of what the reaching definitions analysis does, let's define the analysis. As we've seen in the second video of this lecture, you basically need to define six properties to specify how a data flow analysis computes, in this case, the reaching definition sets. The first of these six properties is the domain of the analysis, which tells us what the analysis propagates when it reasons about the behavior of the program. For our reaching definitions analysis, the domain is the set of all definitions in the code, where a definition basically means an assignment, so every statement that assigns something to a particular variable. Formally, we represent these definitions as pairs (v, s) of variables and statements. Here v is a particular variable, s is a particular statement, and the pair means that statement s defines variable v.
The second property we need to define is the direction of the analysis. Similar to the first example we've seen earlier, this is a forward analysis, so we propagate information in the direction of the control flow, as it would happen when the code is executed. Property number three is the meet operator, which tells us what happens when two flows of control merge. In contrast to the analysis we've seen earlier, we here use the union operator. This means that if there is a branching point with two branches, where one branch has some set of reaching definitions and the other branch has some other set, then at the merging point we take the union of these two sets and propagate that union further down. We do this because the analysis cares about definitions that may reach a program point. We do not want to guarantee that they reach it; we only care about whether there is a possibility that they do, and that's why we take the union of the two pieces of information. You could also define a similar analysis that takes the intersection, but that would be a different analysis; here we care about may information and therefore use the union. The next property to define is the transfer function, which tells us how the analysis state, that is, the set of reaching definitions, changes when reasoning about the execution of one particular statement. Similar to what we've seen earlier, the transfer function gives the state at the exit of a statement (RD here stands for reaching definitions) in terms of what we know about the analysis state at the entry of the statement and in terms of two helper functions, kill and gen.
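Written out as equations, the transfer function and the union-based meet operator described above might look as follows. The notation, in particular pred(s) for the control flow predecessors of statement s, is my own shorthand and may differ from the slides.

```latex
\begin{align*}
  RD_{exit}(s)  &= \big(RD_{entry}(s) \setminus kill(s)\big) \cup gen(s) \\
  RD_{entry}(s) &= \bigcup_{p \in pred(s)} RD_{exit}(p)
\end{align*}
```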
So the overall equation looks very similar to what you've seen for the available expressions analysis, except that it is now about reaching definitions, and the definitions of kill and gen are obviously different. So let's look at the definitions of these two helper functions. Gen creates new reaching definitions, as follows. If a statement s is an assignment to a variable v, then gen(s) contains the pair (v, s), telling us that this statement creates a new reaching definition that from now on we should propagate forward. Otherwise, if statement s is not an assignment, gen(s) is simply the empty set, because it does not generate anything new that the analysis should propagate. The kill function is defined as follows. If s is an assignment, which again is the only case that really matters for this analysis, and it assigns to a variable v, then kill(s) contains (v, s') for all statements s' that define the same variable v. For example, say we have an assignment to some variable x somewhere, and there are other statements that also define x. Then the kill set of the first statement includes the definitions of x at all those other statements, basically saying that these other assignments are no longer valid once we reach the statement that now writes to x. And again, if the statement is not an assignment, kill(s) is simply the empty set, because other statements do not matter for a reaching definitions analysis. The two remaining properties that we need to fully define this data flow analysis are the boundary condition and the initial values. So let's look at these two properties, starting with the boundary condition.
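Before turning to the boundary condition, here is a minimal Python sketch of gen and kill. It assumes a toy representation in which each statement is reduced to a label and the variable it assigns (None if it is not an assignment); the names `Stmt`, `gen`, and `kill` are hypothetical. Following the worked example later in this video, kill also includes the statement's own definition and the artificial "?" definition that marks a variable as still undefined.

```python
from typing import NamedTuple, Optional

# A definition is a pair (variable, statement label); label "?" means "not yet defined".
Definition = tuple[str, str]


class Stmt(NamedTuple):
    label: str
    assigned_var: Optional[str]  # None if the statement is not an assignment


def gen(s: Stmt) -> set[Definition]:
    """gen(s) = {(v, s)} if s assigns to v, else the empty set."""
    if s.assigned_var is None:
        return set()
    return {(s.assigned_var, s.label)}


def kill(s: Stmt, program: list[Stmt]) -> set[Definition]:
    """All definitions of the variable s assigns to, including s itself and
    the artificial '?' definition; empty if s is not an assignment."""
    v = s.assigned_var
    if v is None:
        return set()
    return {(v, t.label) for t in program if t.assigned_var == v} | {(v, "?")}


# Statements of the while-loop example, reduced to (label, assigned variable).
program = [Stmt("1", "x"), Stmt("2", "y"), Stmt("3", None), Stmt("4", "y"), Stmt("5", "x")]
print(gen(program[0]), kill(program[0], program))
```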
Because this is a forward analysis, the boundary condition defines what happens at the entry node of our control flow graph, and intuitively, at that point all variables are still undefined. We do not yet have a notation to say that a variable is undefined, because we always represent definitions by the name of the variable and the statement. To express that a variable is still undefined, we use a special artificial statement, simply called question mark, which tells us that a particular variable is not yet defined. Using this special statement, we can define the state of the analysis at the entry node as the set of pairs that combine every variable in our program with the special statement question mark, basically saying that all variables are still undefined. Finally, we also need to define the initial state at all the other nodes, and here we simply use the empty set, saying that before we know anything about them, these nodes do not have any reaching definitions. Let me illustrate the reaching definitions analysis using the example we've already seen. You see the example code here again. As usual, the first step of any data flow analysis is to write down the control flow graph. If you want to practice your control flow graph construction skills, you may want to pause the video here and try to draw it yourself; otherwise, just watch me. As usual, there's an entry node, which marks the beginning of this piece of code. Then we have the two assignments, one where we write 5 into x and one where we write 1 into y. Then we have the loop. At the beginning of the loop, we execute the condition that checks whether x is greater than 1.
This condition may be false. If you look at this code, it won't be when we first reach the loop, since x is 5 at that point, but statically we do not track such values, so we also consider the edge that leads directly to the exit node. If we enter the loop, there are two more assignments: one that assigns the result of x times y to y, and right after that, one that writes x minus 1 into x. Once we are done with the second statement in the loop body, we go back to the loop condition to decide whether to iterate once more. Now let's write down the results of the gen and kill functions, because these are the helper functions we need to compute first in order to perform the actual analysis. As for the first example in the first video, let's do this for each statement, labeling each statement with a number, and for each of them, write down gen(s) and kill(s). Let's start with the first statement. It performs an assignment, which means it produces a definition, namely the one where variable x is defined at statement one. This is represented as the pair (x, 1). The statement also kills something, because whenever you define a variable, you kill all assignments to this variable, no matter where they are in the program. This includes the assignment itself, so the statement kills itself; technically this makes no difference here, because gen adds that definition right back, but it doesn't hurt and keeps the definition of kill a little shorter. So it kills x as defined in statement one. It also kills the definition of x at statement five, so (x, 5).
And it also kills the definition of x at the artificial helper statement question mark that we introduced earlier, which basically removes the information that x is undefined. Statement two looks pretty similar, so if you want, try it yourself before watching me do it. Gen of statement two contains the pair (y, 2), because y is defined at statement two. And we kill all definitions of y, which in this case means those at statement two, at statement four, and at the artificial question mark statement, meaning that y is no longer undefined. Statement three does not contain any assignment, so both its gen set and its kill set are empty, which means nothing changes when we propagate information across statement three. Statement four is an assignment, so it generates the definition (y, 4), reflecting that statement four writes into variable y. Because it does this, it also kills all assignments to variable y, which is the same set we've already seen above: every place where y may be written, including the artificial question mark statement, is in the kill set. Likewise, statement five writes x, so (x, 5) is in its gen set, and its kill set contains all definitions of x, including the one at statement five itself. Next, we write down the data flow equations, which describe how the state at the entry and at the exit of our five statements is computed. For every statement, we write down RD entry and RD exit, where RD stands for reaching definitions. Let's get started with RD entry of statement one.
This is what we defined in the boundary condition, where we said that initially every variable is assumed to be defined at the artificial question mark statement, which means that these variables are not yet defined. The entry sets of the other statements, for example statement two, are always defined in terms of their incoming statements. If you look at the control flow graph, the only incoming statement of statement two is statement one, so RD entry of statement two equals RD exit of statement one. Now let's look at the entry set of statement three. Statement three has two incoming statements, namely statement two and statement five. So this is a merging point, and we need our meet operator, which in this case is the union. So RD entry of statement three is the union of RD exit of statement two and RD exit of statement five. Statements four and five have just one incoming statement each, so RD entry of four equals RD exit of three, and RD entry of five equals RD exit of four. Now that we have defined the entry sets, let's look at the exit sets, starting with RD exit of statement one. This is the result of applying the transfer function, which says that we should take what flows into this particular statement, in this case RD entry of one, and remove everything that is killed by this statement. So we look up the kill set of statement one, which we computed in the table: all the definitions of variable x. After removing everything in the kill set, we also need to add something, namely everything that is generated by this statement.
Going back to our table, statement one generates (x, 1), because it assigns to variable x, so we add the pair (x, 1) to the set we have so far. RD exit of statement two looks very similar, except that it is now about the definitions of variable y; I've written it down here, and you can confirm for yourself that it is correct. Next is RD exit of statement three, which is very simple: statement three does not contain an assignment, so its kill and gen sets are both empty, which means that RD exit of three is exactly the same as RD entry of three. Then we move on to RD exit of four, and this is the same story as for RD exit of one and two: we take the entry set of the statement, remove everything in the kill set, and then add everything in the gen set. Fast-forwarding a little, this is what we get. For every statement, we now know how to compute RD exit based on the RD entry of that statement and on the kill and gen sets we have computed for it. Once we have defined all these equations, the question is again: what are the concrete RD entry and RD exit sets? Right now, these equations just depend on each other. To find the concrete solution, we compute it by repeatedly substituting the equations into each other until nothing changes anymore. I'm going to show you part of the result of this computation and leave the rest as an exercise for you; feel free to, for example, exchange your solutions through the ILIAS forum so that you can check whether they're correct.
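The iterative computation described above can be sketched in a few lines of Python. This is a minimal sketch, not the course's reference implementation: statements are numbered one to five as in the control flow graph, the gen and kill sets are copied from the table above, and the names `PREDS`, `GEN`, `KILL`, and `reaching_definitions` are hypothetical. It simply re-evaluates all equations round-robin until no set changes.

```python
# Reaching definitions for the example:
# 1: x = 5;  2: y = 1;  3: while (x > 1) {  4: y = x * y;  5: x = x - 1;  }
# A definition is a pair (variable, statement); "?" marks "not yet defined".

PREDS = {1: [], 2: [1], 3: [2, 5], 4: [3], 5: [4]}  # control flow predecessors
VARS = {"x", "y"}

KILL_X = {("x", 1), ("x", 5), ("x", "?")}
KILL_Y = {("y", 2), ("y", 4), ("y", "?")}
GEN = {1: {("x", 1)}, 2: {("y", 2)}, 3: set(), 4: {("y", 4)}, 5: {("x", 5)}}
KILL = {1: KILL_X, 2: KILL_Y, 3: set(), 4: KILL_Y, 5: KILL_X}


def reaching_definitions():
    # Initial values: every set starts out empty.
    entry = {s: set() for s in PREDS}
    exit_ = {s: set() for s in PREDS}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for s in sorted(PREDS):  # forward analysis: visit in program order
            if PREDS[s]:
                # Meet operator: union over all predecessors.
                new_entry = set().union(*(exit_[p] for p in PREDS[s]))
            else:
                # Boundary condition: statement 1 follows the entry node,
                # where every variable is still undefined.
                new_entry = {(v, "?") for v in VARS}
            # Transfer function: (entry minus kill) union gen.
            new_exit = (new_entry - KILL[s]) | GEN[s]
            if new_entry != entry[s] or new_exit != exit_[s]:
                entry[s], exit_[s] = new_entry, new_exit
                changed = True
    return entry, exit_


entry, exit_ = reaching_definitions()
for s in sorted(PREDS):
    print(s, sorted(map(str, entry[s])), sorted(map(str, exit_[s])))
```

With these inputs, RD entry of statement three comes out as {(x, 1), (y, 2), (y, 4), (x, 5)} and RD exit of statement five as {(y, 4), (x, 5)}, which you can use to check your own solution of the exercise.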
What we get at the end is something like this, where for every statement we compute the final RD entry and RD exit sets. To give you part of this solution: the RD entry set of statement one is exactly what you see on the left, saying that at the entry of this very first statement, x and y are both undefined. At the exit of this statement, x is defined: x now has a reaching definition from statement one itself, whereas y still has the artificial reaching definition from the undefined question mark statement. Statement two is relatively similar, so I'm omitting it here; again, this is a little exercise for you to try at home. For statement three, let me just give you the entry set: it contains x defined at one, y defined at two, y defined at four, and x defined at five. The exit set of statement three I leave as an exercise, and the same for the entry and exit sets of statement four. For statement five, as a little hint, here is the exit set: y has a reaching definition from its assignment at statement four, and x is there because of the assignment at statement five. All right, so now you've seen two examples of data flow analysis, namely available expressions and reaching definitions. Next, we look at a third example, called the very busy expressions analysis. The goal of this analysis is to compute, for each program point, the set of expressions that must be very busy. Very busy means the following: on all future program paths, that is, on all paths that may happen after this program point, the expression will be used before any of the variables in the expression are redefined.
So essentially, at such a point in the program you know that the expression will certainly be used afterwards and that the values used in the expression will not change in the meantime. This is something a compiler can use for program optimization, in particular for an optimization called hoisting. Hoisting means that you take a computation that is, for example, inside a block and move it outside of the block; you hoist it out of the block, so the code precomputes something before it is actually needed. You can do this if you know that the expression is very busy, meaning that it will certainly be used, so the precomputation will not be wasted. Sometimes it's more efficient to compute something a little earlier, because then the following computation either does not have to wait for the value to be computed, or it doesn't have to compute the value repeatedly, because you compute it once at the beginning instead of repeatedly later. Let's look at a concrete example. Here we have an if-then-else statement where, depending on some condition, we either write values into x and y or write values into y and x. The interesting bit is that there are some very busy expressions here, namely b minus a and a minus b. The point where they are very busy is right here, where the arrow points: right before we enter either the then branch or the else branch. Already at this point, no matter which of the two branches we enter, we know that the two expressions b minus a and a minus b will be used and that none of the values used in these expressions, so neither a nor b, will be redefined in the meantime.
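To illustrate what hoisting could do with this example, here is a hand-written Python sketch. The function and temporary names are made up for illustration, and a real compiler would typically perform this transformation on its intermediate representation rather than on source code. Because b - a and a - b are very busy before the branch, computing them once up front never wastes work, and the results stay the same.

```python
def original(a: int, b: int) -> tuple[int, int]:
    if a > b:
        x = b - a
        y = a - b
    else:
        y = b - a
        x = a - b
    return x, y


def hoisted(a: int, b: int) -> tuple[int, int]:
    # Both expressions are very busy here: every path below uses them,
    # and neither a nor b is reassigned before those uses.
    t_b_minus_a = b - a
    t_a_minus_b = a - b
    if a > b:
        x = t_b_minus_a
        y = t_a_minus_b
    else:
        y = t_b_minus_a
        x = t_a_minus_b
    return x, y


# The transformation preserves behavior on a few sample inputs.
for a, b in [(3, 1), (1, 3), (2, 2)]:
    assert original(a, b) == hoisted(a, b)
```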
Now let's define this analysis, and as before we go through the six properties that define a data flow analysis. Property number one is again the domain of the analysis, that is, the kind of things the analysis reasons about. In this case, the domain is similar to the first analysis we looked at: the set of all non-trivial expressions that occur in our code. The direction of this analysis differs from the examples we've seen before, because this is a backward analysis: we start at the exit node of our control flow graph and propagate information backward, against the usual flow of control. Property number three is the meet operator, which tells us what happens when two control flows come together. Here we use the intersection operator, because we care about very busy expressions that must be used; we want to be sure that they are guaranteed to be used at some later point, and therefore we compute the intersection. Intuitively, if there is a branch somewhere and an expression is used in one branch but not in the other, then when we propagate information backward, as this analysis does, we only want to consider the expression very busy if it is actually used in both branches. The next property is the transfer function, where again we define what happens at one side of a statement as a function of what happens at the other side, but because this is a backward analysis, things are the other way around. Here we define what happens at the entry of a statement: VB, which stands for very busy, at the entry is defined as the analysis state at the exit of the statement, minus whatever gets killed by the statement, plus everything that gets generated by the statement.
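In equation form, the backward transfer function and the intersection-based meet could be written as follows, again using my own shorthand succ(s) for the control flow successors of statement s:

```latex
\begin{align*}
  VB_{entry}(s) &= \big(VB_{exit}(s) \setminus kill(s)\big) \cup gen(s) \\
  VB_{exit}(s)  &= \bigcap_{t \in succ(s)} VB_{entry}(t)
\end{align*}
```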
So it looks very similar to the transfer functions we've seen for forward analyses; the big difference is that entry and exit are reversed. The helper functions gen and kill have basically the same intuition as in the previous analyses: gen returns the things, in our case expressions, that are added when we propagate through a statement s, and kill returns the expressions that are removed when we propagate information through statement s. More precisely, gen(s) contains all expressions that appear in statement s. The intuition is that whenever an expression appears in a statement s, the expression is actually used in that statement, so when we go backward, it may be one of the very busy expressions. Kill(s) looks at assignments, because we want to kill any expressions that are no longer valid because one of the variables used in them is assigned to. So if statement s assigns to some variable x, then kill(s) includes all expressions in which x occurs. Otherwise, if the statement is not an assignment, kill(s) is simply the empty set. Finally, we again need to define the boundary condition and the initial values of intermediate nodes. Because this is a backward analysis, the boundary condition is about the final node: the analysis starts at the exit node of the control flow graph, so we need to define the VB exit set of the exit node. We define it as the empty set, because at the exit node there is no expression that we know will be used afterwards. For the intermediate nodes, we also don't know anything at the very beginning about expressions that will be used after them.
So we also start with the empty set, that is, no very busy expressions at all, as the initial state of all intermediate nodes. Let's illustrate this with our code example again, starting, as usual, by drawing the control flow graph. If you want to practice this part, feel free to pause the video and try it yourself. We start with an entry node, and somewhere there will be an exit node for the end of this piece of code. Because the example starts with an if, the first thing that is executed is the condition of the if, that is, the check whether a is greater than b. Then we have two branches, represented by two outgoing edges. In one of them, we assign b minus a to x, and after that, another assignment writes a minus b into y. Once this is done, we are done with the whole code and go to the exit node. In the other branch, the else branch, we also have two assignments, but they look a little different: here we assign b minus a to y, and after that we write a minus b into x. Again, once this is done, we reach the exit node. Given this control flow graph, the next step is again to compute the gen and kill sets that we use as helpers for the transfer functions. Let me label the statements, and then compute gen(s) and kill(s) for each of them. Let's start with statement one. Statement one is not using any expression and therefore does not generate anything, and because it does not assign to any variable, its kill set is also empty. Statement two does use an expression.
So it generates the expression b minus a. Statement two assigns to x, but x does not occur in any of the non-trivial expressions of this program, which are a greater than b, b minus a, and a minus b, so no expression is killed and the kill set is empty. Statement three also uses an expression, namely a minus b, so this expression is in its gen set. Again, it doesn't kill anything, because it writes neither to a nor to b. For statements four and five, things look very similar: statement four uses b minus a and doesn't kill anything, and statement five uses a minus b and doesn't kill anything either. The next step would be to write down the data flow equations again, but because we've done this twice now and you can probably do it yourself from the formulas on the previous slides, I'm not going to repeat it. Instead, I'll show you the solution of these data flow equations, where for each of our five statements we have the set VB entry of s and the set VB exit of s. In principle, once you've written down the data flow equations, it doesn't matter in which order you compute the concrete values of VB entry and VB exit. But because this data flow analysis is computed backward, I'm going to go backward through the code. So let's start with the exit sets of statements three and five, which are the two statements at the end of the respective branches.
These statements come just before the exit block, and because we start at the exit block with an empty set of very busy expressions, the exit sets of statements three and five are also empty; we're just propagating the empty set up the control flow graph. Looking at the gen and kill sets of statements three and five, we see that both add the expression a minus b to the set of very busy expressions, so the entry sets of statements five and three both contain a minus b. Next, we look at statements two and four. From the entry set of statement five, we know the exit set of statement four by simply copying it over, so it is again a minus b. Statement four itself adds another expression to what comes in: besides a minus b, it uses b minus a, which is in its gen set, so we add it to VB entry of statement four. We do the same for statement two: the analysis state at the entry of statement three is the single expression a minus b, which we propagate backward into the exit set of statement two. Statement two generates something on top of that, so now we have not just a minus b but also b minus a. Statement one is interesting, because here control flow merges if you look at it backward: it's actually a branch, but viewed backward, the flows merge here. So we need our meet operator, which says that we compute the intersection of the two incoming sets. These two sets come from statements two and four, and both contain the same expressions, a minus b and b minus a, so their intersection still contains these two expressions.
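The same backward computation can be sketched in Python. As before, this is only an illustrative sketch with hypothetical names: statements are numbered one to five as above, expressions are plain strings, and, following the gen sets above, statement one generates nothing. The initial sets are empty as defined in the lecture; since this control flow graph has no loops, iterating until nothing changes yields the unique solution.

```python
# Very busy expressions for the example:
# 1: if (a > b)  then { 2: x = b - a;  3: y = a - b; }
#                else { 4: y = b - a;  5: x = a - b; }
SUCCS = {1: [2, 4], 2: [3], 3: [], 4: [5], 5: []}  # empty list: next node is the exit node
USES = {1: set(), 2: {"b-a"}, 3: {"a-b"}, 4: {"b-a"}, 5: {"a-b"}}
ASSIGNS = {1: None, 2: "x", 3: "y", 4: "y", 5: "x"}
EXPR_VARS = {"b-a": {"a", "b"}, "a-b": {"a", "b"}}  # variables each expression reads


def gen(s):
    return USES[s]


def kill(s):
    # Kill every expression that reads the variable assigned by s.
    v = ASSIGNS[s]
    if v is None:
        return set()
    return {e for e, reads in EXPR_VARS.items() if v in reads}


def very_busy():
    entry = {s: set() for s in SUCCS}  # initial values: empty sets
    exit_ = {s: set() for s in SUCCS}
    changed = True
    while changed:
        changed = False
        for s in sorted(SUCCS, reverse=True):  # backward analysis: visit in reverse order
            if SUCCS[s]:
                # Meet operator: intersection over all successors.
                new_exit = set.intersection(*(entry[t] for t in SUCCS[s]))
            else:
                new_exit = set()  # boundary condition at the exit node
            new_entry = (new_exit - kill(s)) | gen(s)
            if new_entry != entry[s] or new_exit != exit_[s]:
                entry[s], exit_[s] = new_entry, new_exit
                changed = True
    return entry, exit_


entry, exit_ = very_busy()
print(sorted(entry[1]))  # ['a-b', 'b-a']
```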
And because statement one doesn't generate or kill anything, the entry of statement one has the same set, a minus b and b minus a. Cool. So now you've seen three examples of data flow analyses, and in the remainder of this video I want to show one more, which is called live variables analysis. So what's the idea of live variables analysis? For each statement, we want to compute the set of variables that may be live, where live means that a variable is used before being redefined. Essentially, we want to know whether the value that a variable holds at this statement may actually be used at some point after the statement we're looking at. This is a very useful analysis, for example, to identify dead code, that is, code that is useless and could be removed from a program without changing its semantics. This is useful for two concrete applications. One is bug detection. If you know that there's an assignment somewhere, but the variable that is assigned to can never be used anywhere afterwards, then the code is probably not what the developer intended. For example, maybe you've written to the wrong variable, or later on you're reading a different variable than the one you meant to read. And even if there's no bug, this is interesting for optimization: if you know for sure that some assignment is never used later on, you can remove this dead code and speed up the program. To do either, you first need to compute the set of live variables, and this is exactly what a live variables analysis does. So let's illustrate this idea using a concrete example again. In this example, we have a few variables, X, Y, and also Z, and then an if where we define some of them depending on some condition.
If you start with the first statement, we can say that X is not live after it. This means that there's no path the program can take after that statement where this assignment to X matters, because the next thing that happens to X is another assignment to X. There's no use of X in between, because the only statement in between doesn't read any variable, and in particular doesn't read X. In contrast, at the end of the third statement, both X and Y are live, which means that the values X and Y hold at the end of that statement may be used on some path that comes after it. In particular, they are used in the condition of the if, because we read X and Y to decide which of the two branches to take. So in terms of what a compiler could do, or what a programmer should do, with this code, we now know that the first statement, which assigns two to X, could easily be removed, because it isn't needed: it's dead code. Now that you've understood the intuition behind the analysis, let's define it, again by defining the six properties that make up a data flow analysis. First, the domain of the analysis is the set of all variables that occur somewhere in our code, because each of these variables could potentially be live. The analysis is a backward analysis, because intuitively we look at where variables are used and then propagate this information backwards to figure out at which points a variable is live. The meet operator here is union, and the reason is that we care about whether a variable may be used.
So if you have two branches, and a variable is used in one of them but not in the other, then when propagating backwards we want to keep the fact that this variable may be used in at least one of the two branches. Therefore, the meet operator here is union. Next, we define the transfer function, which again looks very similar to what we've seen earlier: the state on one side of a statement is defined in terms of the state on the other side of the statement, minus the kill set of the statement, plus the gen set of the statement. Because this is a backward analysis, note that what we compute here is the state at the entry of a statement, given the state at the exit of that statement. So we assume we know the state after the statement and then propagate backward through the statement to compute the state at its entry. The sets given as LV exit, and computed as LV entry, are subsets of the domain of the analysis, so they are sets of variables that are live at a given statement. This definition again relies on the two helper functions gen and kill. In this case, gen of S returns all variables that are used in statement S, because whenever a variable is used, then, looking backward, it may be live. Kill of S looks at assignments: if S is an assignment to a variable X, then it kills X, because the value X holds before the assignment can't be read by anything that comes after the assignment. If S is not an assignment, kill of S returns the empty set. Finally, we need to define the boundary condition and the initial values.
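Written as equations, the meet operator, the transfer function, and the two helper functions defined so far look as follows. This is a sketch in our own notation: succ(s) is an assumed name for the statements that directly follow s in the control flow graph, and the boundary condition and initial values are discussed next.

$$
\begin{aligned}
LV_{exit}(s) &= \bigcup_{s' \in succ(s)} LV_{entry}(s') \\
LV_{entry}(s) &= \big(LV_{exit}(s) \setminus kill(s)\big) \cup gen(s) \\
gen(s) &= \{\, v \mid v \text{ is used in } s \,\} \\
kill(s) &= \begin{cases} \{x\} & \text{if } s \text{ is an assignment } x = e \\ \emptyset & \text{otherwise} \end{cases}
\end{aligned}
$$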
Because this is a backward analysis, defining the boundary condition means defining the live variables at the exit of the final node. Here we say that this is the empty set, because we don't know anything about variables that are going to be used after the final node, so to the best of our knowledge there are no live variables at this point. The same holds for all intermediate nodes: we initially assume that they have no live variables, until we propagate information through them, after which they may of course have some. So now let's put all these definitions to use and apply them to the example we've seen before, computing the live variables for the statements in this code. I'd like you to actually do this yourself. So ideally, stop the video at this point and try to use the machinery you've seen so far in this lecture to compute the live variables before and after every statement in the code. Without going into all the details of how to compute it, because we've done this a couple of times now, let me just show you the solution you should have found. I'm numbering the statements in the order in which the lines appear in the code: one, two, and three are the first three statements before the if; number four is the condition of the if; number five is the assignment in the then branch; and six and seven are the two assignments in the else branch. For each of those, you should have computed LV entry and LV exit of the statement, and here is what you should have found. Let me start from the end, because this is a backward analysis.
At the exits of statements five and seven there are no live variables; that's by definition, since both flow directly into the final node. At the entry of statement seven, we have Z as a live variable, because statement seven uses this variable. We propagate this backwards, so the exit set of statement six is the same set containing Z. Because statement six writes to Z, it kills this variable, but at the same time it uses another one, namely Y, so we add Y to the live variable set at its entry. Things look similar on the then branch: at the exit of statement five we have the empty set, but statement five uses Y, so its entry set contains Y. Propagating further backward, we reach statement four, where we take the union of what comes from the two branches, which is the union of the set containing Y and the set containing Y, so just the set containing Y. Statement four uses another variable on top of Y, namely X, so at its entry we have X and Y. From statement four we propagate this further backward, so we have X and Y at the exit of statement three. Because statement three writes to X, it kills X, so only Y remains at its entry. This propagates to the exit of statement two, but statement two assigns to Y, so Y is also killed. So only the empty set is left here, and this is what is propagated further back, through statement one, until we reach the entry node. In particular, the exit set of statement one is empty, which confirms that the assignment to X in statement one is dead code. All right, so this brings me to the end of this video. You've now seen a lot more examples of data flow analyses, and I hope the overall idea of the data flow analysis framework has become a bit clearer.
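The live variables solution walked through above can be replayed with a small fixpoint loop. This is a minimal Python sketch, not the course's code: each statement is reduced to the variables it defines and uses, numbered one to seven as above. The walkthrough does not say which variables statements five and seven assign to, so the sketch leaves those targets out (a hypothetical simplification); since nothing is live after either statement, the computed sets do not depend on them. The last function applies the result to the dead-code idea: an assignment is flagged when its target is not live right after it.

```python
# Minimal sketch (not the course's code): live variables for the seven-statement
# example. Each statement is reduced to (variables it defines, variables it uses).
STMTS = {
    1: ({"x"}, set()),       # x = 2
    2: ({"y"}, set()),       # assigns y, reads no variable
    3: ({"x"}, set()),       # assigns x, reads no variable
    4: (set(), {"x", "y"}),  # condition of the if, reads x and y
    5: (set(), {"y"}),       # then branch: reads y; target not given, left out
    6: ({"z"}, {"y"}),       # else branch: writes z, reads y
    7: (set(), {"z"}),       # else branch: reads z; target not given, left out
}
# Control flow: 4 branches to 5 and 6; 5 and 7 flow into the exit node ([]).
SUCC = {1: [2], 2: [3], 3: [4], 4: [5, 6], 5: [], 6: [7], 7: []}


def gen(stmts, s):
    return stmts[s][1]  # variables used in s


def kill(stmts, s):
    return stmts[s][0]  # variable assigned by s, or nothing


def live_variables(stmts, succ):
    # Boundary condition and initial values: no live variables anywhere.
    entry = {s: set() for s in stmts}
    exit_ = {s: set() for s in stmts}
    changed = True
    while changed:
        changed = False
        for s in sorted(stmts, reverse=True):  # backward: later statements first
            # Meet: union over successors; an empty union models the exit node.
            new_exit = set().union(*(entry[t] for t in succ[s]))
            new_entry = (new_exit - kill(stmts, s)) | gen(stmts, s)
            if new_exit != exit_[s] or new_entry != entry[s]:
                entry[s], exit_[s] = new_entry, new_exit
                changed = True
    return entry, exit_


def dead_assignments(stmts, exit_):
    # An assignment is dead if its target is not live right after it.
    return [s for s in sorted(stmts)
            if stmts[s][0] and not (stmts[s][0] & exit_[s])]


entry, exit_ = live_variables(STMTS, SUCC)
for s in sorted(STMTS):
    print(s, "entry:", sorted(entry[s]), "exit:", sorted(exit_[s]))
print("dead assignments:", dead_assignments(STMTS, exit_))
```

The printed sets match the walkthrough, for example X and Y at the entry of statement four and the empty set after statement one, and the only assignment flagged as dead is statement one.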
In the remaining videos, we'll look a bit more into how these data flow equations, which you've now seen a couple of times, are actually solved, and how to extend this whole idea beyond the code of a single function to multiple functions. Thank you very much for listening, and see you next time.