Welcome to part 2 of the lecture on machine-independent optimizations; today we will continue our discussion on data flow analysis. We covered illustrations of code optimizations in the last part. To recap a bit, a data flow value for a program point represents an abstraction of the set of all possible program states that can be observed at that point. The issue is that when the program is executing, at every point in the program there may be several possible program states, and when we do a data flow analysis we are statically examining the program, so we compute an abstraction of this set, that is, an approximation of it. In other words, it is possible that this approximation includes certain program states which can never be reached at that point in an actual execution, but because of our static analysis this extra information will be included in the abstraction. The second important point is that the set of all possible data flow values is the domain for the application under consideration. For each application this domain is going to be different: for reaching definitions, for example, it would be the set of all subsets of definitions; for available expressions it would be the set of all subsets of expressions; for the live variables problem it would be the set of all subsets of variables; and so on. We also have two quantities, in and out. These are the data flow values before and after each statement, and as I mentioned, we are going to be interested in a larger grain size, that is, a basic block, rather than just a single statement. The data flow problem is to find a solution to a set of constraints on the in and out sets. So, what kind of constraints can we provide?
We provide two types of constraints: constraints based on the semantics of statements, which are called transfer functions, and constraints based on the flow of control; that is, if there is a join of different paths in a program, then what do we do about the computation of in and out? That is the question to be answered. A data flow schema consists of five components: the control flow graph itself; a direction of data flow, forward or backward; a set of data flow values; a confluence operator, usually set union or set intersection; and the transfer functions, which, as I said, will be provided for each basic block instead of for each statement. When we compute the estimates of data flow values, that is, the approximations or abstractions of data flow values, we want to make sure that these estimates are safe. Again, safety is with respect to a particular application; in our case we want to perform several optimizations, so safety is with respect to these optimizations. An estimate is safe or conservative if it never leads to a change in what the program computes after the change is carried out in the program by a transformation. In other words, if we perform an optimization, then the optimized program should produce the same output as the unoptimized version. That is precisely what we mean by safety. In doing the optimization we would have used a lot of data flow information, and we would have made decisions on transformations using that information. Such a decision is said to be safe if there is no change in what the program computes. This will become clear as we go on and take up more examples. These safe values may be either subsets or supersets of the actual values, depending on the application. So, let us get into the problem of reaching definitions analysis.
So, we need to define two quantities: what we mean by generation of a definition and what we mean by killing a definition. We kill a definition of a variable a if between two points along the path there is an assignment to a. Let us take up this example. Here we have a = b + c, and later in the same program we have another definition a = k - m. Obviously a = k - m and a = b + c are both definitions of a, and suppose these two are in the same basic block. In such a case it is never possible for the value of a computed by a = b + c to be available after the definition a = k - m; the later definition overtakes the previous one. In such a case we say that the second definition kills the first. Now, what is reachability, what is a reaching definition? A definition d reaches a point p if there is a path from the point immediately following d to p such that d is not killed along that path. There may be many paths from d to p, but we only require the existence of one such path. More than one path does not add to reachability, but if there is no path, then the definition does not reach. From d we must be able to go to point p along some path; that is all it really means. We must also distinguish between unambiguous and ambiguous definitions of a variable. Take a = b + c, and let all three entities a, b, and c be names. In such a case this is called an unambiguous definition, because we are explicitly defining the variable a.
Later in the program at some point we have a definition such as *p = d; this is called an ambiguous definition of a. The reason is that p may point to many variables: it may point to a, it may point to b, it may point to x. Therefore, we cannot concretely say that this statement is defining a particular variable. Of course, if we already know that p points only to x, then we can say this statement is a definition of x, but in general during static analysis we can only determine that p points to some set of variables; we can hardly say that p points to exactly one particular variable. So this is called an ambiguous definition, and because it does not definitely define a particular variable, we cannot say that it kills any previous definition. Ambiguous definitions do not kill any definitions. By contrast, the unambiguous definition a = k - m does kill the earlier unambiguous definition a = b + c. Again, the ambiguous definition *p = d may not kill the definition of a, because p may never point to a; in that case it is not a definition of a at all, so we do not want it to kill such definitions either. In the reaching definitions problem we compute supersets of definitions as conservative or safe values. As I said, we want to compute safe estimates, and in this case a safe estimate is a superset. What does it mean to compute a superset? It means it is safe to assume that a definition reaches a point even if it does not. This is what happens if we include more definitions in the set of reaching definitions than actually reach; there may be some definitions which do not reach, but it is ok to include them. That is our definition of safety here. For example, let us say we have: if a == b then a = 2 else if a == b then a = 4.
So, let us analyze what happens during execution. We compare a == b; if it was true, then we would have taken a = 2 and exited. Suppose a == b was false; then we would have taken the else part, and we are comparing a == b again. If it was false the first time, it has to be false here as well, because we have not assigned anything to either a or b. So the statement a = 4 will never be executed, and during actual execution this definition of a will never reach the point after the statement. But static analysis does not know that this branch will never be taken. In static analysis we are going to assume that both a = 2 and a = 4 reach the point after the semicolon. This is the conservative estimate: we know that during actual execution a = 4 will never reach this point, but in static analysis we do not track the value of the condition a == b from one test to the next, so we have to assume that both a = 2 and a = 4 reach that point. This is an example of a superset being computed. Now, here I present the data flow equations, or constraints as they are called, and I am going to explain these equations in great detail in the coming slides. The first constraint or equation says that in of a basic block, that is, the set of definitions reaching the entry point of a basic block, which is what in(B) is, will be the union of the out sets of all its predecessor blocks, where P is a predecessor. I will explain this very soon. out(B), the set of definitions reaching the exit point of a basic block, will be gen(B), the set of definitions generated by the basic block, union (in(B) minus kill(B)).
So, in(B) is what is obtained from the top; kill(B) is the set of definitions killed by the basic block. Remove from in(B) the set kill(B), and that will also come out of the basic block. in(B) is initialized to phi for all basic blocks. These are the constraints, these are the equations. In these equations gen and kill are constants: they will be computed only once and then never modified again, and let me go ahead with the definition of gen and kill before I explain these equations. If some definitions reach B1, then in(B1) is initialized to that particular set instead of phi. This is a forward flow problem; the nature of the flow is obtained from the equation for out. If the equation for out is in terms of in, then this is called a forward flow problem. If instead there were an equation for in in terms of out, then it would be a backward flow problem. The direction of flow of the data flow values does not imply that we must traverse the blocks in a particular order to compute the sets; any order is fine, and the final result does not depend on the order of traversal of the basic blocks either. I have still not told you how to compute the values for in and out. Typically, once we compute gen and kill, we compute the in and out values in a loop, again and again, making sure that we reach what is known as a fixed point. At the fixed point the values of in and out do not change again for any basic block. For the reaching definitions problem, the fixed point is guaranteed to be reached in a few iterations. Otherwise, formally, for a new data flow analysis problem one would have to prove that such a fixed point can indeed be reached, but that is not a part of our course; we are not going to discuss the theory of data flow analysis here.
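The two constraints just stated can be sketched with Python sets. The predecessor out sets and the gen/kill values below are made-up assumptions, purely to illustrate the shape of the equations:

```python
# Hypothetical values for one block B with two predecessors P1 and P2;
# each set holds definition names.
out_P1 = {"d1", "d2"}
out_P2 = {"d3"}
gen_B = {"d4"}
kill_B = {"d1"}

# in(B) = union of the out sets of all predecessors (confluence operator).
in_B = out_P1 | out_P2
# out(B) = gen(B) U (in(B) - kill(B)): definitions generated inside B,
# plus incoming definitions that B does not kill.
out_B = gen_B | (in_B - kill_B)
```

Running this gives in_B = {d1, d2, d3} and out_B = {d2, d3, d4}: d1 is killed inside B, while d4 is newly generated.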
So, just repeating: we compute the in and out sets again and again until a fixed point is reached. Let me define gen and kill before I go on to the explanation of the two data flow equations. gen(B) is the set of definitions inside the basic block that are visible immediately after the block; they are called the downwards exposed definitions. Let me explain that. Suppose a variable x has only one definition in the block; then obviously that definition is downwards exposed and will be visible immediately after the block. But suppose the variable x has two or more definitions in the block; then only the last definition of x is downwards exposed, and the others are not visible outside the block. This is what is important: we take the last definition of the variable. We can even compute gen(B) by doing a traversal from the end of the basic block to the beginning; no problem with that. kill(B) is the union of the definitions in all the basic blocks of the flow graph that are killed by the individual statements in B. In other words, if a variable x has a definition d_i in a basic block, then d_i kills all the definitions of the variable x in the whole program except, of course, d_i itself. My picture in the next slide will explain this better. Suppose we consider a basic block B; we want to compute the gen and kill sets for this particular basic block. As I said, gen and kill will be computed just once and then used again and again. The basic block has four definitions: d1, d2, d3, and d4. d1 defines a, d2 defines b, d3 defines c, and d4 defines a again, and in the other basic blocks of the program there are the other definitions d5, d6, d7, d8, d9, and d10. So the set of all definitions in the program is d1 to d10: four here and the other six in the other blocks.
So, this is our universal set of all the definitions. To compute gen, let us look at the various definitions. d2 and d3 compute b and c, and they are the only definitions of b and c in the basic block, so d2 and d3 will definitely be included in the gen set; they are generated by the basic block. Now, d1 is a definition of a and d4 is also a definition of a, but as is very obvious, the value computed by d1 will not be visible after the redefinition of a; this is just ordinary program sense. It is d4 which will be visible outside the basic block, not d1. So we include only d4 in the set of generated definitions of the basic block B. What about the kill set? Computing the kill set is a little tricky. What we need to do is consider each definition in the basic block, and then consider all the definitions of the same variable in the entire program. We have the definition d4 here, which is a definition of a, and we also have the definition d9 elsewhere, which is also a definition of a. Our definition of kill says: take d1; it kills both d4 and d9, so we include d4 and d9 in the kill set. Then d2 defines b, so it kills d5 in the other basic block, and d5 gets included in the kill set. Then we have d3, which defines c, so it kills d10, which is also included in the kill set. Now we come to d4, which is a definition of a; it kills d1 and d9, but d9 has already been included in the set, so only d1 gets added to the kill set. Why are we computing the kill set in this fashion, without even considering whether there is a control flow path from this basic block to any other basic block which defines that variable? The answer is simple: we are computing gen and kill before the reaching definitions problem is solved. To compute the exact kill set, we would in fact have to check whether there is a flow from one basic block to another, and so on.
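The gen and kill computation just described can be sketched as follows. The variables assigned to the definitions outside the block are assumptions chosen to match the discussion (d5 defines b, d9 defines a, d10 defines c; d6, d7, d8 are assumed to define unrelated variables, since the lecture does not name them):

```python
# Block B: four definitions in order; each entry is (definition, variable).
block = [("d1", "a"), ("d2", "b"), ("d3", "c"), ("d4", "a")]
# Variable defined by every definition in the whole program.
all_defs = dict(block)
all_defs.update({"d5": "b", "d6": "u", "d7": "v",
                 "d8": "w", "d9": "a", "d10": "c"})

# gen(B): traverse backwards so that only the last (downwards exposed)
# definition of each variable is kept.
gen, seen = set(), set()
for d, var in reversed(block):
    if var not in seen:
        gen.add(d)
        seen.add(var)

# kill(B): every definition in B kills all other definitions of the
# same variable anywhere in the program.
kill = set()
for d, var in block:
    kill |= {other for other, v in all_defs.items() if v == var and other != d}
```

This reproduces the sets from the slide: gen = {d2, d3, d4} and kill = {d1, d4, d5, d9, d10}.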
We may end up solving the reaching definitions problem, or a variant of it, before we compute the exact kill set; that is the first reason why we have such a big set of definitions in the kill set. The second reason is that even if we include many definitions which are not relevant to the basic block B in the kill set, it is not going to affect our computation of in and out. Because of this, it is immaterial whether the kill set is a superset of the actual definitions killed or the exact kill set; that really does not matter to us. I hope this clarifies the computation of gen and kill. Now let us move on to the data flow equations. We have two equations here. The first says that the in set of the basic block B is the union of the out sets of the predecessors of B. Here is the meaning: we want to compute the in set of the basic block B4; there are three predecessors to this basic block, B1, B2, and B3, so there are three sets, out(B1), out(B2), and out(B3), and in(B4) will be the union of out(B1), out(B2), and out(B3). This is actually quite intuitive. Suppose we consider the set of definitions reaching the exit of B1, that is, out(B1); from there to the entry of B4 there is nothing to stop the definitions from reaching. All the definitions which reach the end of B1 will take this edge and reach the entry of B4 as well, so all the definitions of out(B1) should be included in the in(B4) set, and the same is true for out(B2) and out(B3). That is the intuitive reason why we take a union here. The other reason is that in the reaching definitions problem we are not really worried about a definition reaching along all paths; we are interested in knowing whether the definition reaches along at least one path. It may reach along this path or this path or this path.
So, reaching along any one of the paths is fine for us; in fact, if a definition reaches along this path and does not reach along the other two, we are still happy. That is the reason why we take the union of all the definitions which reach this particular basic block. Then what about the computation of out? out(B) is computed as gen(B) union (in(B) minus kill(B)); let me explain why. We have the same basic block B4; there is a set in(B4) which has already been computed, let us say using the first equation. The gen and kill sets have already been computed, and we want to compute the out set. Here are some values for in, gen, and kill: let us say the set in(B4) is {p, q, z}, the gen set is {a, b}, and the kill set is {z}. We need to take the gen values and put them in the out set; that is very clear, because whatever is computed within the basic block B4 obviously, by definition, reaches the end of the basic block and will be in the out set. That is the explanation for including gen(B4) in the out set. The second part is the in part. There are lots of definitions reaching the entry, and a few of them get destroyed because of the kill set. Why? There are definitions coming in, and some of those variables will be redefined here, so the definitions corresponding to those variables in the in set will have to be removed. That is why, recalling the definition of kill, if z is defined here, then that definition kills all the definitions corresponding to z. I have not named the definitions here; let us treat the variables themselves as the definitions, just for the sake of the example. Strictly speaking I should have given the definitions numbers corresponding to these variables, but it helps understanding if we keep the variable names.
So, let us assume that there is one definition corresponding to z here, another definition corresponding to this z, another corresponding to a, one more corresponding to b, and so on. The out set will now contain the definitions corresponding to a and b; of course, it will also contain the definitions corresponding to p and q, because they are not killed, and the definitions corresponding to z will all be removed from the in set. That is the reason why the out set has only a, b, p, and q. This is how we compute the out set in terms of gen, kill, and in. Remember, if the kill set contains certain definitions, then they will be removed from the in set; that is what this really means. We have removed the definitions of z, kept the definitions of p and q, and included the definitions of a and b, so the out set contains a, b, p, and q. That is what this equation says: include the definitions of gen, take away the kill set from in, and include the rest in the out set of B4. Let us take a big example; this has been adapted from the book by Aho, Sethi, and Ullman. The first thing to do is to compute the gen and kill sets and then make the initializations appropriately. Let us do that. The first basic block has three definitions, d1, d2, d3. Computing gen here, we would include d1, d2, and d3 in the gen set; that is very obvious, because they are all visible at the end of the block. For the kill set of B1: if you consider the definition d1, it kills all the other definitions of i, so it kills d4 and it kills d7 as well; d4 and d7 are included in the kill set. The second definition, d2, corresponds to the variable j, so it kills the definition d5, which is also included in the kill set. The third definition corresponds to the variable a, so it kills the definition d6 involving a, which is also included in the kill set. This is the computation of gen and kill.
For the initialization, in(B1) is made phi, and in fact, in a practical implementation we can say that out(B) will be initialized to gen(B), because of the out equation: whatever else happens, we will always include at least gen(B) in any out computation. So we can initialize out(B1) to the gen set, d1 d2 d3 in this case. Coming to B2, this is a bit tricky. gen is easy: d4 and d5 are defined here. For kill, again look at the variable i and the definitions of i elsewhere: there is d1, and then we have d7; and corresponding to j we have d2. So all three are included in the kill set, the in set is phi, and the out set is initialized to {d4, d5}. You should observe here that the value of i used on the right-hand side of this assignment actually arrives from the old value of i defined in B1 or from the value defined in B4; the left-hand side is a new value which is computed, and the same is true for j: this j arrives from the earlier definition, and after it is computed it will also arrive via the loop. For the third basic block, gen is just {d6}, and kill corresponds to the other definition of a, which is d3; in will be phi and out will be {d6}. gen(B4) is {d7}, and for kill(B4) we look at the definitions of i: that would be d4, and we also have d1, so these are the two definitions which are killed. It is probably very clear that d1 can never actually reach this point: the reason is that i is redefined already in B2, and we must compulsorily take that path, so only d4 can perhaps reach this point; d1 never will. So d1 is not a relevant definition as far as this kill set is concerned, but as I said, since we cannot compute exact kill sets without solving the reaching definitions problem or a similar one, we take a superset, and it is not going to matter to us. So, now pass two.
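Collecting what we have so far, the gen and kill sets and the initializations for the four blocks can be written down concretely (block and definition names follow the lecture's flow graph):

```python
# gen/kill sets for the four-block example from the lecture.
gen = {"B1": {"d1", "d2", "d3"}, "B2": {"d4", "d5"},
       "B3": {"d6"},             "B4": {"d7"}}
kill = {"B1": {"d4", "d5", "d6", "d7"}, "B2": {"d1", "d2", "d7"},
        "B3": {"d3"},                   "B4": {"d1", "d4"}}

IN = {b: set() for b in gen}           # in(B) starts at phi for every block
OUT = {b: set(gen[b]) for b in gen}    # out(B) starts at gen(B)
```

These are exactly the values present before pass two begins.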
So, these were the old values of in and out; gen and kill do not change. The old values of in and out are here, and these are the new values. What do the new values require? The equations are here. The in set is the union of the out sets of the predecessors; in this case there is nothing coming into B1, so in(B1) will always remain phi. What about out(B1)? We have to include the part corresponding to gen, which is already included, that is d1 d2 d3, and the in set is phi, so there is nothing to take away; therefore we still have d1 d2 d3 as the out set here. So far, for B1, there is no change. But once we consider B2, we see changes. These are the old values of in and out. When we compute the new value of in for the basic block B2, we take the union of the two out sets, one coming from B1 and the other coming from B4. The out set of B1 is d1 d2 d3, and the out set of B4 is d7; therefore the in set of B2 is d1 d2 d3 d7, the union of the two. Remember, the old value was phi and the new value is different: even though nothing changed at B1, there has been a change here. What about the out value of B2? Obviously, as usual, the gen value d4 d5 gets included in the out set; then we take the in value and remove the kill value. If we do that, in(B2) is d1 d2 d3 d7 and kill(B2) is d1 d2 d7, so only d3 remains, and that gets included in out(B2). So d3 is still relevant, it goes through, whereas d1 and d2 are not relevant, because i and j are redefined here; that is quite intuitive. Then for the basic block B3, the old in value is phi and the old out value is d6, whereas now the in value will be the new out value of B2, which is d3 d4 d5.
So, the new in value of B3 becomes d3 d4 d5; it is different from before. The out value, of course, includes the gen value first, that is d6, and then we take the in value d3 d4 d5 and remove the kill, that is d3; so d4 and d5 will also be included in out, giving d4 d5 d6. This implies that d3 cannot flow through B3, which is very obvious because a is being redefined here, but d4 and d5 will obviously flow through B3. That is what out(B3) signifies: what comes out here is this definition and these two definitions. The last one is B4. If you consider B4, the old in value was phi as usual; now the in value of B4 is the union of the two out sets, one coming along this direction and the other coming along this direction. out(B3) is d4 d5 d6 and out(B2) is d3 d4 d5, so in(B4) will be d3 d4 d5 d6, the union of the two. What about the out set? It includes d7, because that is gen, and then we take in and remove the kill part. If you remove the kill part, d4 goes out; d1 is extra in the kill set, which is immaterial. So we include d3 d5 d6 and d7 as the out set. These are the four new values that have been computed for B1, B2, B3, and B4, and in doing so we have looked at the basic blocks in a particular order: B1, B2, then B3 and B4. Actually, we could have looked at the basic blocks in the reverse order as well: B4, B3, B2, and B1. Since we are going to iterate through the basic blocks in the control flow graph a number of times, and stop only when none of the in and out sets of any basic block change, the order in which we visit the basic blocks is immaterial: the final values do not depend on the order of traversal of the control flow graph.
The number of iterations required will definitely change depending on the order in which we visit them. One of the heuristics used is to visit the blocks in depth-first search order; we will discuss this a little later. Assuming that we use a depth-first search order, the iterations converge to the fixed point value very quickly. So, this is the final result. I would encourage you to verify it: look at this set of results, apply the same data flow equations once more, and check that you get these same values. After these values are reached, any further iterations will not produce any changes. These are the fixed point values, and they happen to be the reaching definitions for the various basic blocks. Let us just take a look at them. out(B1) is d1 d2 d3, which is very clear: all these definitions reach this point. out(B2) is d3 d4 d5 d6; d4 and d5 are very clear, and d3 is also very clear, it comes through from B1. And d6 is a value which is defined in B3; it goes through the loop and then comes back again. There are two definitions of a, one in B1 and one in B3, and neither overtakes or overrides the other: in the first iteration of the loop the first one is relevant, and from then on, whenever that path is taken, the a defined in B3 becomes relevant. So both the definitions of a reach this point. Then out(B3) for that basic block is d4 d5 d6; d6 is very clear, and d4 and d5 also come out. Finally, out(B4) is d3 d5 d6 d7; d7 is very clear, d6 is very clear, and of d4 and d5, d4 does not reach because i is redefined here, but d5 definitely reaches the end of the basic block. This is how the definitions reach the end of each basic block. Remember, even though we have no idea of the actual execution path, we take the definitions which come along all paths, take a union, and say that these are the definitions which reach this particular point.
So, this is a conservative estimate, because at execution time exactly one path will be taken, whereas during static analysis we consider all the paths. Now it is time to look at the algorithm in a program-like fashion. The algorithm is quite simple; as I said, it is an iterative algorithm. First the initialization: for each block B do in(B) = phi and out(B) = gen(B). That completes the initialization. Then, to detect whether there has been a change in the computed values, we use a flag: change = true initially, and then we loop with while change do. Inside the loop we set change to false, and then for each of the basic blocks we apply the data flow equations: in(B) is computed, then we store the old value of out and recompute the new value of out. If out(B) is not equal to the old out, that means out has changed, so we reset change to true and loop once more. We keep doing this until none of the out values change. The reason why we check only out and not in is that out depends on in: if there is a change in in, it will automatically be reflected in a change in out. If we had reversed the two equations, out first and then in, then we would have checked in here instead, and stored the old in instead of the old out. Of course, there is nothing wrong in storing both old in and old out and checking them, but that is not really necessary. Now, we still have to look at the data structures which are needed to store the various data flow values. The iteration goes on until there is no change, and once there is no change we quit. We actually use bit vectors, with one bit for each definition in the program. Let me show you an example. Here there are actually only seven definitions in the whole control flow graph, so a bit vector of size seven, with one bit for each definition, is sufficient to store any set of definitions.
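The iterative algorithm just described can be sketched in Python and run on the four-block example; the gen/kill sets and the edges of the flow graph are taken from the worked example above.

```python
gen = {"B1": {"d1", "d2", "d3"}, "B2": {"d4", "d5"},
       "B3": {"d6"},             "B4": {"d7"}}
kill = {"B1": {"d4", "d5", "d6", "d7"}, "B2": {"d1", "d2", "d7"},
        "B3": {"d3"},                   "B4": {"d1", "d4"}}
# Predecessors of each block: B1 -> B2, B2 -> {B3, B4}, B3 -> B4, B4 -> B2.
preds = {"B1": [], "B2": ["B1", "B4"], "B3": ["B2"], "B4": ["B2", "B3"]}

IN = {b: set() for b in gen}           # initialization: in(B) = phi
OUT = {b: set(gen[b]) for b in gen}    # initialization: out(B) = gen(B)

change = True
while change:                          # iterate until a fixed point
    change = False
    for b in gen:
        # in(B) = union of out(P) over all predecessors P of B
        IN[b] = set().union(*(OUT[p] for p in preds[b]))
        old_out = OUT[b]
        # out(B) = gen(B) U (in(B) - kill(B))
        OUT[b] = gen[b] | (IN[b] - kill[b])
        if OUT[b] != old_out:
            change = True
```

At the fixed point this yields out(B1) = {d1, d2, d3}, out(B2) = {d3, d4, d5, d6}, out(B3) = {d4, d5, d6}, and out(B4) = {d3, d5, d6, d7}, the values shown on the final-result slide.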
So, the first bit corresponds to d1, the second bit corresponds to d2, and so on; the position of the bit itself gives you the definition number. All our sets, gen, kill, in, and out, will also be seven-bit vectors. For example, the set gen(B1) is actually {d1, d2, d3}, so it has ones in the three places for d1 d2 d3 and zeros in the rest. The kill set has zeros for d1 d2 d3 and ones for d4 d5 d6 d7. Then in(B1) was initialized to phi, and remains phi as I said, so those are all zeros, and the out(B1) value d1 d2 d3 is represented here. This is how the data flow values are stored as bit vectors, and the same is true for the other basic blocks. The next data flow analysis problem that we are going to deal with is called available expressions computation. This problem is very important for performing the global common subexpression elimination optimization. Here, sets of expressions constitute the domain of data flow values; in the reaching definitions problem it was sets of definitions. This is a forward flow problem, and the confluence operator is intersection. By the way, the confluence operator in the reaching definitions problem was the union operator: there we had an equation for out in terms of in, which, as I said, determines the direction of the data flow, and the confluence operator, union, is used whenever we combine the values at a join point in the control flow graph. In the available expressions computation problem it is a forward flow with intersection as the confluence operator; in the previous case it was union, here it is intersection. This will become clear very soon.
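Going back to the bit vectors for a moment, the representation can be sketched with Python integers as seven-bit masks, where set union, intersection, and removal become the bitwise operators. Which end of the vector holds d1 is only a convention; here bit i-1 stands for d_i:

```python
def bits(*defs):
    """Pack definition names like 'd3' into a bit mask (d_i -> bit i-1)."""
    v = 0
    for d in defs:
        v |= 1 << (int(d[1:]) - 1)
    return v

gen_B1 = bits("d1", "d2", "d3")          # 0b0000111
kill_B1 = bits("d4", "d5", "d6", "d7")   # 0b1111000
in_B1 = 0                                # phi: all bits zero
# out(B) = gen(B) U (in(B) - kill(B)) as a single bitwise expression
out_B1 = gen_B1 | (in_B1 & ~kill_B1)
```

With in(B1) = phi, out(B1) comes out equal to gen(B1), exactly as on the slide.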
Now, the schema for the data flow analysis requires the equations, which we will provide very soon, and it also requires the gen and kill sets. Before that, let us define what exactly availability of an expression means. An expression x + y is said to be available at a point p if every path from the initial node of the control flow graph to p, not necessarily cycle free, evaluates the expression x + y, and after the last such evaluation prior to reaching p there are no subsequent assignments to either x or y. Note that the paths may go around cycles, but each must begin from the initial node. In the reaching definitions case we considered some path; here we must consider every path from the initial node to p. The important points are these: first, there must be a computation of x + y along every path from the initial node to p; second, after the last such computation, no modification of x or y should happen. If you modify either x or y or both, the value of x + y changes; therefore changes to x or y must not happen after the last computation of x + y, otherwise the value of x + y would have changed. So this is availability: most importantly, it must hold along all paths, and after the last computation of x + y there are no changes to either x or y.

What do we mean by killing an expression here? A block kills x + y if it assigns, or may assign, to either x or y or both, and does not subsequently recompute x + y. The point is this: suppose we have a computation of x + y and then we redefine x. If we recompute x + y afterwards, the block does not kill x + y; but if it simply modifies x and does not recompute x + y, then it kills x + y.
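The availability condition can be sketched as a small check over a single path of statements; availability at a point then means the check succeeds along every path from the initial node to that point. The tuple representation of statements below is an assumption for illustration:

```python
def available_along_path(path, expr):
    """Return True if `expr` is available at the end of this path.

    path: list of statements (target, operands), e.g. ("a", ("x", "y"))
          for a = x + y; operands are the variables the RHS uses.
    expr: the operand names of the expression, e.g. ("x", "y") for x + y.
    """
    available = False
    for target, operands in path:
        if set(operands) == set(expr):
            available = True       # the path evaluates x + y here
        if target in expr:
            available = False      # x or y redefined; killed unless recomputed
    return available

def available_at(paths, expr):
    # Available at a point means available along *every* path to it.
    return all(available_along_path(p, expr) for p in paths)
```

A path that redefines an operand and then recomputes the expression still leaves it available, matching the definition of killing above.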
Let me show you an example. Here is a computation of 4 * i, and here is another computation of 4 * i, and in between there are two possibilities. It is not that both blocks are present at the same time; I have just drawn the second block as an alternative, in dotted lines. So either this block is present or that block is present. If there is no assignment to i, obviously 4 * i will not get changed along this path at all, so it is available along this path and also along that path; it is available along all paths at this point. Now suppose we redefine i: we have computed 4 * i here, then redefined i, but we have also computed 4 * i again. In this case too 4 * i is available along this path, because we have recomputed it. Remember, we are not looking at the exact value of 4 * i here. What we are saying is that whichever path we take, we do not have to recompute 4 * i at this point; during common subexpression elimination we want to avoid the computation of 4 * i here by reusing this value or that value, and it really does not matter which. But suppose there was no recomputation of 4 * i, just a modification of i; then the expression 4 * i is not available along that path, and we say 4 * i is not available at this point.

What about the generation of x + y? A block generates x + y if it definitely evaluates x + y and does not subsequently redefine either x or y or both. This is very simple: we just compute x + y and then do not modify either x or y. Now, how do we compute the gen and kill sets, which we will call e_gen and e_kill?
Let us assume that the statements are of the form x = y + z; we are trying to compute the effect of each statement in the basic block. This must be done one statement at a time, and then the effect of the entire basic block is obtained at the end. For a point q, let us assume that e_gen is given; obviously, at the beginning of the basic block it will be empty. Let the e_gen for this point q be A, and let x = y + z be the statement. Now y + z gets computed, so it is generated by this statement, and we do A = A union {y + z}. Then x is being redefined, so all the expressions involving x have to be removed from A; that is the killing part, so A = A minus {all expressions involving x}. At this point, e_gen at p is the new value of A. This is how we compute e_gen at p, and if there is another statement after this one, its effect is computed in exactly the same way, because e_gen at p is now available.

How do we compute e_kill at q? Again, we are given e_kill at this point; let us say that is A, and x = y + z is the statement whose effect we have to compute. Now y + z is being computed here, so it is not killed; it must be removed from the kill set even if it was killed before, because it is now being recomputed. So we remove it from A. Then x is being redefined, and x kills all the expressions involving x, so they are all added to A; this is the complement of what we did for e_gen. So A = A union {all expressions involving x}, and at this point e_kill at p is the new value of A. This is how we compute the gen and kill sets for each of the statements in the basic block; they are all combined together and used. The set of all expressions appearing on the right-hand sides of assignments in the flow graph forms the universe of expressions.
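The statement-by-statement update just described might be sketched as follows; representing each expression as a frozenset of its operands is an assumption made for illustration:

```python
def e_gen_kill(block, all_exprs):
    """Compute e_gen and e_kill for a basic block, one statement at a time.

    block: list of statements (target, expr), where expr is a frozenset
           of operand names, e.g. ("x", frozenset({"y", "z"})) for x = y + z.
    all_exprs: the universal set of expressions in the flow graph.
    """
    e_gen, e_kill = set(), set()
    for target, expr in block:
        # The statement evaluates expr: add it to gen, and remove it
        # from kill even if an earlier statement had killed it.
        e_gen.add(expr)
        e_kill.discard(expr)
        # The statement redefines target: every expression involving
        # target is killed and can no longer remain in gen.
        involving = {e for e in all_exprs if target in e}
        e_gen -= involving
        e_kill |= involving
    return e_gen, e_kill
```

Note the order within one statement: the right-hand side is evaluated first, then the target is assigned, so a statement like x = x + y ends up killing x + y rather than generating it.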
So, this is our universal set of all the expressions. But then, what do we do with these? We used a bit vector representation for reaching definitions, and with bit vectors the union and intersection operations are trivial. In the case of expressions we cannot directly use a bit vector; what we do is store all the expressions in a hash table. We hash them, put them into the table, and the index in that hash table is used as the bit position. So we make a bit vector out of the index positions of the various expressions, and that is used as our bit vector for the various union and intersection operations. Let us stop here and continue the discussion of the available expressions problem in the next part of the lecture. Thank you.
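The hash-table indexing described here might be sketched in Python as follows, where a dictionary plays the role of the hash table and the insertion index becomes the bit position; the class and method names are illustrative:

```python
class ExprIndex:
    """Assign each distinct expression a bit position, so that sets of
    expressions can be manipulated as bit vectors."""

    def __init__(self):
        self.index = {}                 # expression -> bit position

    def bit(self, expr):
        if expr not in self.index:
            self.index[expr] = len(self.index)   # next free bit position
        return self.index[expr]

    def to_vector(self, exprs):
        """Encode a set of expressions as a bit vector (a Python int)."""
        v = 0
        for e in exprs:
            v |= 1 << self.bit(e)
        return v
```

Once every expression has a fixed bit position, union is again bitwise OR and intersection, the confluence operator for available expressions, is bitwise AND.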