Welcome to part 3 of the lecture on machine code generation. Today, we will continue with our discussion of the Sethi-Ullman algorithm for optimal code generation and also look at the dynamic programming and tree pattern matching based algorithms.

So, to do a bit of recap, the Sethi-Ullman algorithm is one of the provably optimal algorithms. In other words, it generates the shortest possible sequence of instructions for a given expression tree in a basic block. Of course, the limitation of this algorithm is that it works only for expression trees and not for DAGs. So, if a DAG is given to us, we have to break it into trees and apply the Sethi-Ullman algorithm to the individual trees.

The machine model is that all computations are carried out only in registers, and the instructions are of the form op R, R or op M, R. The point here is that the left operand should always be in a register. So, the algorithm computes the left subtree into a register and reuses it immediately. In an instruction of the form op M, R, the register is the left operand and the memory location is the right operand. It cannot be the other way; the left operand must always be in a register.

The algorithm has two phases: the labeling phase and the code generation phase. The labeling phase is quite simple, so let me do the recap with the example itself. Assume that this is the expression tree given. The operators in the nodes are not very important to us, so we have just named them n1, n2, n3, n4 and n5. The labeling algorithm labels a leaf which is the leftmost child of its parent as 1 and all other leaves as 0. So, the left leaf of n1 gets 1 and the right leaf gets 0. The same is true for the leaves of n2 and n4 as well. When we go to the internal nodes n1, n2 and n4, the left child has a label of 1 and the right child has a label of 0. Since the two labels are not equal, we take the max and assign it to the parent. So, n1 gets the label 1.
Similarly, n2 gets 1 and n4 also gets 1. Now, n3 is a parent whose children n1 and n2 have equal labels of 1. In that case, we increment the label value by 1 and take it as the label of the parent. So, n3 gets 2. Now, the children of n5 have the labels 2 and 1, so n5 gets the max of these two, which is 2.

How do we evaluate such a tree with two registers? There are two possibilities; let us examine both of them. One of them gives us the correct evaluation, the other one does not. Let me look at the incorrect evaluation first. Let us assume that we evaluate the subtree at n4 first and then the subtree at n3. To evaluate n4, we must obviously evaluate both its operands. The left operand must be in a register, so we load it into a register, R1 let us say, and then we evaluate the operator at n4, which permits the right operand to be a memory operand. The result will always be in a register, and that is R1. So, this evaluation is finished. The label said it can be done with one register, and we have satisfied that.

Now, remember that the result is held in the register R1. When we go to n3, the minreg value, that is the label, says 2. So, without two registers it will not be possible to evaluate this tree without storing intermediate values into memory locations. Let us see why. n1 is very similar to n4, so it can be evaluated into R0. But then we do not have R1 free; it holds the value of n4. So, unless we store R0 into a memory location, we cannot evaluate n2, and the algorithm does not generate such a store because there exists a better order in which we can do the evaluation. So, this evaluation order fails. Suppose instead we start with the evaluation of n3 and then go to n4.
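To make the labeling rule concrete, here is a minimal sketch in Python. It is only an illustration, not the lecture's own code: the names Node, assign_labels and example_tree are made up here, and the leaf names a to f are the operands used in the worked example of this lecture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str                         # operand name (leaf) or operator name (interior node)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: int = 0

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def assign_labels(node: Node, is_left_child: bool = True) -> int:
    """Label the tree bottom-up with the number of registers each subtree needs."""
    if node.is_leaf:
        # A leaf that is the left child of its parent gets 1, any other leaf gets 0.
        node.label = 1 if is_left_child else 0
    else:
        left = assign_labels(node.left, True)
        right = assign_labels(node.right, False)
        # Equal labels: one more register is needed; otherwise take the max.
        node.label = left + 1 if left == right else max(left, right)
    return node.label


def example_tree() -> Node:
    """The tree of the lecture: n5(n3(n1(a, b), n2(c, d)), n4(e, f))."""
    n1 = Node("n1", Node("a"), Node("b"))
    n2 = Node("n2", Node("c"), Node("d"))
    n4 = Node("n4", Node("e"), Node("f"))
    n3 = Node("n3", n1, n2)
    return Node("n5", n3, n4)


if __name__ == "__main__":
    root = example_tree()
    assign_labels(root)
    for n in (root.left.left, root.left.right, root.left, root.right, root):
        print(n.name, n.label)
```

Running it prints the label of each interior node, which can be compared with the values worked out by hand above.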
With n3 evaluated first and n4 after it, we have a happy situation. We evaluate n1 into R0 as before. Now, R1 is still free, so we can evaluate n2 into R1. This is the second register that we are using, and now n3 can be evaluated into R0 with its two operands being in registers. After that, the value of n3 is held in R0 and R1 is freed. R1 can be used to evaluate n4, and then we can evaluate n5. So, the basic principle in the algorithm is: at a particular node, first evaluate the subtree which requires more registers and then the subtree which requires fewer registers. But there are details, and we will come to those very soon.

Here is the procedure gencode, which is the main procedure to generate code for the subtree rooted at the node n. The procedure uses two stacks. One is rstack, which is the stack of registers R0 to R(r-1). So, there are r registers; the stack contains R0 at the top and R(r-1) at the bottom. tstack is a stack of temporaries. We generally assume that there are enough temporaries and that there is no scarcity of supply. That is the reason why the bottom of the stack is not mentioned; it would be a fairly large number. The top of the stack is T0. So, whenever we want a register, we take it from the stack of registers, and whenever we want a temporary, we take it from the stack of temporaries. But we must remember that we should use the top of the stacks first, both the register stack and the temporary stack, and only then go into the deeper parts of the stacks.

A call to gencode(n) generates code to evaluate the tree T rooted at the node n into the register which is at the top of rstack, and the rest of rstack remains in the same state as it was before the call.
This is a very important consideration in proving the correctness and optimality of the algorithm. We must make sure that the result is always in the register located at the top of rstack and that the rest of rstack is in exactly the same state as it was before the call to gencode was made. These are two very important things. Incidentally, somewhere in the middle of the algorithm we will also need to swap the top two registers of rstack. This becomes necessary to ensure that a node is evaluated into the same register as its left child. We will see this as we go on.

So, let us look at the details of the algorithm. There are five cases, 0 through 4.

Case 0: we are looking at a leaf node n. So, n is a leaf representing the operand name and is the leftmost child of its parent. This means it has a label of 1. If it were not the leftmost child, it would have had a label of 0. A label of 1 means you can evaluate it with one register, which is trivial. We just generate the load instruction load name, top(rstack). This implies that at the time of execution the value of the operand will be put into the register which is at the top of rstack. So, this is a trivial case.

Case 1 is similar, but here the right child is the leaf node. The left child n1 is the root of a subtree and the right child n2 of n is a leaf node. Because it is the right child, its label is 0. So, n is an interior node with operator op, left child n1 and right child n2, and the label of n2 is 0. Let name be the operand for n2; this is the memory location corresponding to n2. We can use the memory operand as it is, so we first generate the code for n1. So, let us finish the code generation for the tree rooted at n1.
That requires a certain number of registers, namely label(n1). Once that call is finished, gencode has printed out the code for the tree rooted at n1. Now, it is time to generate the code for n, and that is what we print: print op name, top(rstack). The top of rstack contains the result of the tree n1. That is the leftmost child, and we are now reusing its value immediately in the parent. The result will also be left in the same register as that of the leftmost child, that is, top(rstack). Remember, gencode(n1) makes sure that the register at the top of rstack contains the value of the tree at execution time. So, this is correct: op name, top(rstack) is an instruction which, when executed, will leave the result in the top of rstack again. So, for this entire tree we have satisfied the rules of the algorithm: the value of the entire tree is available in top(rstack), and the rest of rstack below the top is in the same condition as it was when we called the procedure.

The next case is case 2. Here both n1 and n2 are roots of subtrees. The tree n1 requires fewer than r registers, where r is the number of available registers, and the subtree rooted at n2 requires more than what n1 requires. It does not necessarily require more than r, but it definitely requires more than what n1 requires. Since both of them are subtrees, we have to generate code for both; we cannot use an operand as it is. So, the condition is 1 <= label(n1) < label(n2) and label(n1) < r. That is, n2 requires more registers than n1, which is what the labels mean, and n1 requires fewer than r registers.
So, when we evaluate n1 we do not require any stores into memory. We really cannot say much about n2, because the condition says label(n2) > label(n1); it does not say whether label(n2) is greater than or equal to r. Now, we are evaluating the right subtree first, and suppose we evaluate it as it is. Then the top of rstack will contain the result of n2. This would get us into trouble, because the left subtree must leave its result in the top of rstack so that it can be reused immediately by the root. To take care of this we do a swap on rstack. The swap function swaps the top two registers of the stack; it ensures that a node is evaluated into the same register as its left child.

Once it is swapped, the second register from the top becomes the topmost register. Now we call gencode on n2. So, the register which was second and is now at the top will contain the value of the entire tree n2. In other words, gencode generates the instructions which evaluate this tree and place its value in the top of rstack, which is the second register from the top before the swap operation. Now, we pop rstack and remove that register; we remember which register it is, call it R. Now we have the original top of rstack at the top. So, we call gencode(n1), and it leaves its value in the top of rstack. So, n1 is in the register at the top of rstack and R holds the result of n2. Now we are in a position to generate the code for the root n. So, we print op R, top(rstack). This is correct; the top of rstack right now is the original top of rstack before the swap. Now R is of no further use to us, because the top of rstack will contain the result of the entire tree after evaluation. So, we push R back onto rstack. The original top of rstack now becomes the second one.
So, we swap again, which brings it back to the top. So, the invariant of the algorithm remains true: the top of rstack now contains the value of the entire tree and the rest of rstack is in the same state as it was before the call. This is the reason we require a swap in this particular case. We are evaluating the right subtree first, and to make sure that the left subtree leaves its result at the top of rstack, we do a swap.

Then the next case, case 3. It is similar to the previous one: the right subtree requires fewer than r registers and the left subtree requires at least as many as the right subtree requires. So, label(n2) is less than r, and n2 can be evaluated with no intermediate stores, but there is no guarantee about n1. Now we have a very simple case. Because n1 requires at least as many registers as n2 according to the labels, we simply call gencode(n1); when that code is executed, the result will be left in the register at the top of rstack. We remove that register R by a pop operation and then call gencode(n2). So, the code for n2 will now be emitted. R holds the result of n1, and the top of rstack after this pop operation will hold the result of n2. Now, we generate the code for the root: print op top(rstack), R. The top of rstack is the register currently at the top, but R was at the top of rstack before the procedure was called. So, now R will hold the value of the entire subtree. Then we push R onto rstack. So, R becomes the top of rstack again, and thereby the invariant of the algorithm is maintained: R is the top of rstack and it contains the result of the entire tree.

Case 4 is the last case in this algorithm. Here both the subtrees n1 and n2 require r or more registers.
In such a case it is not possible to generate code without storing some of the intermediate results into memory. Since both of them require stores into memory, it may seem that it does not matter which one we take up first. But we need to leave the result of n1 in the top of rstack and then reuse it immediately for the parent, so obviously we must generate the code for n2 first. If we had generated code for n1 first, we would have had to put its result into a temporary and load it back from the temporary at the time of evaluating n. This would mean inefficiency. Whereas, if the result of n2 is kept in a temporary, it can be used as a memory operand in the instruction which evaluates n. This is the reason we generate the code for n2 first.

So, we call gencode(n2), then pop a temporary T from tstack and generate an instruction to place the value in top(rstack) into the temporary. Now all the registers are free again; otherwise we would have held back one of the registers, which is not correct. So, we generate code for n1 and then print op T, top(rstack). We now have the value of the entire tree in the register at the top of rstack. Of course, the temporary is not required any more, so we push it back onto tstack. So, again the invariant of the algorithm is satisfied. This is the detailed description of the algorithm.

Now let us look at two examples: one in which the number of registers is two and we do not require more than two registers, and the other in which we have one register but require more than one. The algorithm is applied starting from n5. The minreg values are all mentioned here: n1, n2 and n4 have 1, and n3 and n5 have 2. So, we start with n5 and we have two registers available. n4 requires one register and n3 requires two registers.
So, we call gencode on n3 first. Once we call gencode on n3, the task is to evaluate n1 and n2. Both have a requirement of one register, which is less than the number of available registers. So, we can simply go to the left subtree, and that requires loading from a memory location. So, load a, R0 and then op b, R0. The evaluation of n1 is completed at this stage. Then we go to n2 and generate the code for loading c into R1. R0 holds the value of n1 and R1 is still free, so we load c into R1 and then do op d, R1. That completes n2. We go to n3 and generate op R1, R0. That completes n3 as well, and now R1 is free. So, we can generate the code for n4, which requires loading e into R1. That is load e, R1, and then op f, R1 evaluates n4 and keeps it in R1. Then we generate the code for n5, which is op R1, R0. So, without storing any result into any temporary location we are able to generate the code for this tree. Writing op_n1 for the operator at n1 and so on, the code sequence is: load a, R0; op_n1 b, R0; load c, R1; op_n2 d, R1; op_n3 R1, R0; load e, R1; op_n4 f, R1; op_n5 R1, R0. This is the sequence of instructions for this particular tree.

Let us look at another example. The number of registers is just 1, whereas we require 2 registers at n3 and also at n5. When we start code generation at n5, n3 requires 2 and n4 requires 1, but both of them require at least as many registers as are available, which is 1. As I said, in this case, where both subtrees require r or more registers, we always go to the right subtree first. So, we go to n4, and this is as usual: load e and then the op. Those two are done, and now R0 contains the result of n4. But we must free it, otherwise we cannot evaluate n3 at all. There are no instructions which operate on two memory operands; that is the problem here.
So, we free this register by generating an instruction to move its value into a temporary, T0. Now, we go to n3, and there n1 and n2 both require r or more registers. So, again we are biased towards the right subtree. We go to n2: load c and then op_n2. So, two instructions are required to evaluate n2. Again we must free the register so that n1 can be evaluated. So, load R0, T1: we take another temporary T1 and move the result into T1. Now, we are ready to generate code for n1, which is very similar, a load and then an op. So, load a, R0 and op_n1 b, R0. That completes the evaluation of n1. Now the left subtree is in a register and the right subtree is in a temporary, so we can generate code for n3. That would be op_n3 T1, R0. Now the result is in the register R0 and this evaluation is completed; T1 can of course be released at this point. Now, we are ready to generate code for n5, which would be op_n5 T0, R0. That places the result again in R0 and releases the temporary. So, we have generated code for the entire tree and we also have the result in a register. Remember, this required storing intermediate results into temporary locations.

Now we move on to another optimal code generation algorithm. This is the dynamic programming based optimal code generation. Again, it caters only to trees; it cannot handle directed acyclic graphs. Unfortunately, optimal code generation for DAGs is NP-complete. So, no optimal algorithm which works in polynomial time is known. We will see that later. This dynamic programming based algorithm caters to a slightly broader class of machines. The first requirement is that there are r interchangeable registers, R0 through R(r-1). Remember, this is not a stack any more.
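Before going further with the dynamic programming algorithm, here is a sketch in Python of the gencode procedure with its five cases, run on the tree used in the two examples. This is an illustration under assumptions, not the lecture's own code: Node, assign_labels, gencode, compile_tree and example_tree are names invented here, the operator at node n1 is spelled op_n1 and so on, and the move into a temporary is printed as a load, following the instruction spoken in the lecture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str                         # operand name (leaf) or operator name (interior node)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: int = 0

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def assign_labels(n: Node, is_left: bool = True) -> int:
    if n.is_leaf:
        n.label = 1 if is_left else 0
    else:
        l, r = assign_labels(n.left, True), assign_labels(n.right, False)
        n.label = l + 1 if l == r else max(l, r)
    return n.label


def gencode(n: Node, rstack: List[str], tstack: List[str], r: int, code: List[str]) -> None:
    """Emit code leaving the value of n in rstack[-1] (the top); rest of rstack is unchanged."""
    if n.is_leaf:                                        # case 0: leftmost leaf
        code.append(f"load {n.name}, {rstack[-1]}")
        return
    n1, n2 = n.left, n.right
    if n2.label == 0:                                    # case 1: right child is a leaf
        gencode(n1, rstack, tstack, r, code)
        code.append(f"{n.name} {n2.name}, {rstack[-1]}")
    elif 1 <= n1.label < n2.label and n1.label < r:      # case 2: right subtree first
        rstack[-1], rstack[-2] = rstack[-2], rstack[-1]  # swap top two registers
        gencode(n2, rstack, tstack, r, code)
        reg = rstack.pop()                               # reg holds n2
        gencode(n1, rstack, tstack, r, code)
        code.append(f"{n.name} {reg}, {rstack[-1]}")
        rstack.append(reg)
        rstack[-1], rstack[-2] = rstack[-2], rstack[-1]  # restore the original order
    elif 1 <= n2.label <= n1.label and n2.label < r:     # case 3: left subtree first
        gencode(n1, rstack, tstack, r, code)
        reg = rstack.pop()                               # reg holds n1
        gencode(n2, rstack, tstack, r, code)
        code.append(f"{n.name} {rstack[-1]}, {reg}")
        rstack.append(reg)
    else:                                                # case 4: both need >= r registers
        gencode(n2, rstack, tstack, r, code)
        tmp = tstack.pop()
        code.append(f"load {rstack[-1]}, {tmp}")         # move register into a temporary
        gencode(n1, rstack, tstack, r, code)
        code.append(f"{n.name} {tmp}, {rstack[-1]}")
        tstack.append(tmp)


def compile_tree(root: Node, num_regs: int, num_temps: int = 8) -> List[str]:
    assign_labels(root)
    rstack = [f"R{i}" for i in reversed(range(num_regs))]    # R0 ends up on top
    tstack = [f"T{i}" for i in reversed(range(num_temps))]   # T0 ends up on top
    code: List[str] = []
    gencode(root, rstack, tstack, num_regs, code)
    return code


def example_tree() -> Node:
    n1 = Node("op_n1", Node("a"), Node("b"))
    n2 = Node("op_n2", Node("c"), Node("d"))
    n4 = Node("op_n4", Node("e"), Node("f"))
    return Node("op_n5", Node("op_n3", n1, n2), n4)


if __name__ == "__main__":
    for regs in (2, 1):
        print(f"-- {regs} register(s) --")
        print("\n".join(compile_tree(example_tree(), regs)))
```

With two registers the sketch should reproduce the eight-instruction sequence of the first example, and with one register the ten-instruction sequence of the second example, including the two moves into T0 and T1.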
It is just a set of registers R0 through R(r-1), and the requirement is that there is nothing like a special purpose register here. Any register can be used in any instruction. Then, the instructions are of the form Ri = E. Not much restriction is placed on the form of E, unlike in the case of the Sethi-Ullman algorithm. The only restriction is that if E involves registers, Ri must be one of them. So, the left side of the assignment is a register and the right hand side is an expression. If E uses registers, then Ri must be one of those registers. If all the operands in E are memory locations, then Ri can be any register. So, again the computation is performed in a register. Of course, we definitely require instructions of the form Mi = Rj to store values which are in registers into memory, but all other instructions are of this form.

So, here is a set of possible instructions. Assuming that there are many registers available, there is an instruction for each one of those registers. First, Ri = Mj, where Mj is any memory address. If we have two registers, then we would have the instructions R0 = Mj and R1 = Mj, and similarly for all the others. Another possible instruction is Ri = Ri op Rj. See that Ri is used in the expression here, so Ri must be on the left hand side also. Similarly, Ri = Ri op Mj, so one of the operands can be in memory. Of course, we may also have Ri = Mi op Mj, but in this example we do not have it, that is all. Then we have Ri = Rj, which is a copy instruction. Here j must be different from i. Ri = Ri is a useless instruction, whereas if i and j are different then it is a copy operation, and that is a useful instruction.
So, the store Mi = Rj and the copy Ri = Rj are the two variations of the general form. For them, the rule that if E involves registers then Ri must be one of them is not really satisfied. This is the format of instructions that we are going to assume in our examples, but any general instruction with more than one operator, or one operator with three operands, is also possible. Here we have two operands, but we can also have three operands. In other words, this is a very general purpose machine. We may even have complicated instructions such as multiply and accumulate. That would be something like two operations in the same instruction, but with a special operator for it, mac for example.

The algorithm is based on what is known as the principle of contiguous evaluation, which I am going to explain very soon, and it produces optimal code for trees. So, again, these are basic block level trees; no DAGs are permitted. It can be extended to include a different cost for each instruction. In the Sethi-Ullman algorithm there was no question of a cost for each instruction. The length of the sequence of instructions generated was the cost of the entire code. Whereas here we can attach a cost to each instruction. So, Ri = Ri op Mj could have a higher cost compared to Ri = Ri op Rj, simply because it involves a load from memory. The same is true for Mi = Rj; it involves a store into memory, whereas Ri = Rj is just a copy operation. So, each instruction could have a different cost.

We still have to explain what contiguous evaluation is. Consider a tree T in which there is an operator op at the root and two subtrees T1 and T2. How does one evaluate such a tree? Well, there are many possibilities.
Before evaluating op, it is certain that we must evaluate both T1 and T2. We can evaluate T1 first, then T2 and then op, or we could evaluate T2 first, then T1 and then op. These are two possible orders. But then, while evaluating T1, we could also evaluate a part of T1, then jump to T2 and evaluate a part of T2, continue with the evaluation of T1, then again hop to T2, and so on. Finally, once the evaluations of T1 and T2 are completed, we can evaluate the tree T. The principle of contiguous evaluation allows only two possible orders: evaluate T1 completely first, or evaluate T2 completely first. One of these must be done first. Assume that we have evaluated T1 first; then we must evaluate T2, and then we can evaluate op. Otherwise, we must do T2 completely, then T1 and then op.

Now, the finer details. When we are evaluating T1 and T2, there is a rule which says: first evaluate the subtrees of T that need to be evaluated into memory. There may be parts of T1 which require too many registers. So, what we do is evaluate that small subtree of T1 using all the registers and store it into a memory location. We do that for all parts of T1 which must be evaluated into memory, and the same is true for T2. After all these evaluations are done, we are left with parts of T1 and parts of T2 which can be evaluated using only the registers. In such a case, after the evaluations into memory are over, we must evaluate the entire T1 first, then T2 and then op, or the entire T2 first, then T1 and then op. This is called the contiguous evaluation order.

Why should we evaluate the subtrees of T that need to be evaluated into memory first and then the rest? The point is that when we have something to be evaluated into memory,
we might as well use all the registers available, store that result into memory, then go to some other part which requires evaluation into memory, use all the registers again and store it into memory, and so on and so forth. If we do not do this, and instead try evaluating other parts of the tree which require registers alone, we would be left with fewer registers to evaluate these small subtrees which have to be evaluated into memory. How does one decide whether to evaluate something into memory or using registers alone? That we will learn very soon, because there is going to be a cost attached to every type of evaluation.

So, what is not permitted in contiguous evaluation? Assume that everything can be evaluated using registers alone. Then we cannot evaluate parts of T1, then parts of T2, again parts of T1, again parts of T2, and so on. That is permitted only for the parts which need to be evaluated into memory. Otherwise, when we are using only registers, we must finish the evaluation of T1 if we start it, then do T2 and then op, or the other way: first T2, then T1, then op. These decisions are driven by cost considerations, and we will see how to do it very soon.

Why this contiguous evaluation? The most important property of contiguous evaluation is that it is optimal. So, the code which does contiguous evaluation is optimal code. In what sense? The code does not have a higher cost and does not require more registers than any other optimal evaluation. There may be many optimal evaluations possible using the same number of registers and having the same cost. What the contiguous evaluation theorem tells us is: generate code for contiguous evaluation, and it cannot be worse than any other optimal evaluation; it is as good.
So, the contiguous evaluation principle is very simple, and it allows us to generate code in a very simple fashion. Having decided that we generate code for T1 first, we do not have to jump to T2 at all. We can finish the code generation for T1, then go to T2 and generate the code for T2, and then generate the code for op. If the cost considerations had told us that we must do it the other way, we finish the code generation for T2, then T1 and then op. So, we do not have to consider the optimal evaluation orders which require parts of T1 to be evaluated first, then parts of T2, and so on. We can stick to contiguous evaluation orders and still be certain that we will generate optimal code.

So, let us look at the algorithm now. There are three steps in the algorithm. The first step says: compute, in a bottom-up manner, for each node n of the tree T, an array of costs C. This is probably the most important part of the algorithm. What does the array C contain? C[i] is the minimum cost of computing the complete subtree rooted at n, assuming i registers to be available. So, if there are three registers available, then C will have four locations: C[0] is the minimum cost of computing the entire subtree assuming 0 registers, C[1] is the cost with 1 register, C[2] contains the cost with 2 registers and C[3] contains the cost with 3 registers.

This is the way we compute C[i]: consider each machine instruction that matches at n, consider all possible contiguous evaluation orders at the node n using dynamic programming, and then add the cost of the instruction that matched at node n. So, we consider all possible orders, then take the minimum and assign it to C[i].
So, we assume that 1 register, 2 registers, 3 registers and so on, up to r registers, are available. Then, the cost of computing a tree into memory is the cost of computing the tree using all the registers plus 1, which is the store cost. So, we do not have to do anything special. We compute the cost using 1 register, 2 registers and so on up to r registers, and then fill in the cost of computing into memory as the cost of computing the tree using all registers plus 1. Once the store is over, all the registers can be freed.

Then the second step: using C, determine the subtrees that must be computed into memory. And the third step: traverse T and emit code, memory computations first and then the rest, in the order needed to obtain optimal cost. This is fairly straightforward, as I am going to explain now. The most important step is to compute C. A subtree is computed into memory only when that gives the lowest cost; otherwise we would compute it into registers rather than into memory. One trivial possibility here is the leaves, which are already in memory; the operands corresponding to the leaves will be in memory. So, computing something into memory in that case would be of cost 0, whereas computing it into a register means loading it into a register, which is of higher cost.

Here is an example. This is our tree. There is a plus, a minus, a star and a slash, and the operands a, b, c, d and e are in memory. As I already said, the cost of computing a into memory will be 0 because it is already a memory operand. The same is true for all the leaves b, c, d and e. Remember, the computation of C proceeds in a bottom-up fashion. So, we do the children first, then their parent, and so on, and finally the root.
That is the post-order traversal of the tree; that is the way we would do it. Let us assume that there are two registers. It is very elaborate to present the whole example, so let us take node 2 and beat it to death, finding out all the details regarding its evaluation. Node number 2 is the minus node, and in our instruction set the instructions which match at node number 2 would be Ri = Ri - M, with i equal to 0 and i equal to 1. That means the right operand can be in memory, but the left operand must be in a register. We do not have instructions with both operands in memory, whereas we definitely have instructions in which both operands are in registers: Ri = Ri - Rj, where i and j can each be either 0 or 1. So, both operands can be in registers, or the left can be in a register and the right in memory. These are the only two types of instructions which match at node number 2. Obviously, this is the same for the other operator nodes as well.

So, we are going to compute C; let us call it C2, so that we know it is the cost at node number 2. We compute it with one register and with two registers, and then compute the cost of computing it into memory as 1 plus C2[2], that is, a cost of 1 plus the cost of computing it with two registers.

Let us see how to compute node number 2 with one register. Since the left operand is always in a register, we must look at C4[1], that is node number 4 with one register, plus C5[0], node 5 with 0 registers, plus the cost of computing the root, that is 1. This is the only possibility. We cannot say C4[0], because there is no pattern which matches then, and once we have done C4[1] we are left with 0 registers. So, saying C5[1] has no meaning; it will always be C5[0]. Of course, the 1 is for the root itself. Node 4 is a leaf node and it is in memory, so C4[1] means loading it into a register, which requires cost 1, and that is trivial.
So, if an instruction of the type Ri = Mj has cost 1, then it would be 1. If it has a higher cost, then that is the number we put here. This is what I meant when I said that different instructions could have different costs as well. So, C4(1) in this case is 1; it could have been 2 or 3, depending on the machine. C5(0) is 0, because computing something into memory when it is already in memory obviously costs nothing, and we are assuming that minus has cost 1, so that is plus 1 again. So, the whole cost is 2. It is more interesting to look at C2(2); with two registers there are many possibilities. We have finished C2 with one register, and now we are looking at C2 with two registers. We could evaluate node 4 with two registers. Of course, evaluating a leaf with two registers has no real meaning; one register is enough and the other would just be left as it is. Whether we have one register or two, the value will be moved into only one register, so the cost for node 4 is always 1. After C4(2), we will be left with one register, the other one being used for node 4. So, C5 can be evaluated with one register, and then we have 1: C5(1) means move it into a register, and then 1 for the root, which is always common. The other possibility is C4 with two registers and C5 with zero registers; just because we have a free register, it does not mean we must use it. So, that is C4(2) plus C5(0) plus 1. These are the two possibilities with C4 using two registers. Now we look at the possibilities with C4 using one register: C4(1) plus C5(2) plus 1, then C4(1) plus C5(1) plus 1, and C4(1) plus C5(0) plus 1. These are all the possible ways of evaluating the tree with two registers. Now the costs. For the first one, C4(2) is 1, which is obvious, and C5(1) is also 1 because we move it into a register, plus 1 for the root: 1 plus 1 plus 1. For the second one, C4(2) is 1, C5(0) is 0 and the root is 1.
So, that is 1 plus 0 plus 1, which is 2. For the third one, this is 1, this is 1 and this is 1, so 3. For the fourth one, again 1, 1 and 1, which is 3. And for the last one, this is 1, this is 0 and this is 1, so 1 plus 0 plus 1, which is 2. The minimum of these is 2. So, the cost of evaluating node 2 with two registers is also 2: it was 2 with one register and it is 2 with two registers. That is very intuitive, because whether we have one register or two, node 4 requires only one register, and node 5 is best left in memory; there is no point in loading it into a register and then using it. So, the cost is 2 even with two registers. If we want to store the result back into memory, we need one extra unit of cost, so that becomes 3. The triple for this node, (C(0), C(1), C(2)), is therefore (3, 2, 2), and for the leaves it is (0, 1, 1), simply because a leaf is already in memory, so the memory cost is 0, and whether we use one register or two the cost is always 1. All the leaves have the same triple. Node 7 is very similar to node 2, so it also has the same triple, (3, 2, 2). Then let us look at node 3, which has the triple (5, 5, 4). Remember that the cost of its left subtree is (0, 1, 1), whereas the cost of its right subtree, node 7, is (3, 2, 2). Let us look at computing node 3 with two registers, which is the most complicated case. We could do the left subtree with 2 registers and the right with 0, left with 2 and right with 1, left with 1 and right with 0, left with 1 and right with 1, or left with 1 and right with 2. In all these cases we can compute the cost. For left with 2 and right with 0: the left is 1, node 7 with zero registers was 3, and the root is 1, so this is 5. For left with 2 and right with 1: the left is again just a load, so 1, the right with one register is 2, and then we have 1 for the root, so this is 4. Similarly, left with 1 and right with 0 means 1 plus 3 plus 1, the next one is 1 plus 2 plus 1, and so on.
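The enumeration of register splits for a single operator node can be written down as a short Python sketch. The names `splits` and `cost_vector` are hypothetical, unit instruction costs are assumed, and the rule for which splits are allowed (the left operand always in a register, and the two subtrees never both using all k registers) is a reading of the cases listed in the lecture rather than a quoted definition.

```python
from typing import List, Tuple


def splits(left: List[int], right: List[int], k: int) -> List[Tuple[int, int, int]]:
    """All (i, j, cost) ways to evaluate an operator node with k registers.

    left/right are the children's cost vectors (index 0 = memory).
    i = registers given to the left subtree (at least 1),
    j = registers given to the right subtree (0 = use it as a memory operand).
    """
    out = []
    for i in range(1, k + 1):
        for j in range(0, k + 1):
            # If both results sit in registers, the subtree done second
            # has at most k - 1 registers available.
            if j == 0 or min(i, j) < k:
                out.append((i, j, left[i] + right[j] + 1))   # +1 for the operator
    return out


def cost_vector(left: List[int], right: List[int], r: int) -> List[int]:
    """Cost vector [memory, 1 register, ..., r registers] of an operator node."""
    vec = [0] * (r + 1)
    for k in range(1, r + 1):
        vec[k] = min(cost for _, _, cost in splits(left, right, k))
    vec[0] = vec[r] + 1             # compute with all registers, then store
    return vec


leaf = [0, 1, 1]
node2 = cost_vector(leaf, leaf, 2)      # a - b
node7 = cost_vector(leaf, leaf, 2)      # d / e
node3 = cost_vector(leaf, node7, 2)     # c * (d / e)
print(splits(leaf, leaf, 2))
print(node2, node3)
```

With two registers, `splits` yields exactly the five cases discussed for node 2 and node 3, and with one register only the single case of left in a register and right in memory.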
The minimum value is 4. The cost of computing with one register is very simple: it is left with 1 and right with 0, so I have avoided writing it again, and that is 1 plus 3 plus 1, which is 5. The cost of computing into memory is the cost of computing with two registers, which is 4, plus 1, so 5. So, the triple turns out to be (5, 5, 4). It is more advantageous to evaluate the subtree at node 3 with two registers than with either one or zero registers. So, this is the entire tree, annotated with the triples. Finally, when we finish the root, we are left with the triple (8, 8, 7), and all the others have already been evaluated. Now, the way we generate code: we pick the component of the triple that has the minimum cost. In this case it happens to be the cost with two registers. While computing the triple, I must mention, we must also keep track of the instruction pattern that actually achieves that cost; otherwise it is not possible to generate code. For this particular cost of 7, the instruction pattern would have been Ri = Rj + Ri, something like that. We can instantiate it with any registers; assuming that there are two registers, we have instantiated it as R0 = R1 + R0. This is the pattern that matches here and it is the least cost pattern. It implies that the RST (right subtree) would be computed into R0 and the LST (left subtree) into R1. The cost information should also be borne in mind: for the LST, the least cost is with one register, and for the RST, the least cost is with two registers. That is what is mentioned here. Then we go down. For node 3, this is the subtree that requires the higher number of registers, two registers, so it is the one that should be dealt with first. Again, we compute its RST with two registers into R1, which is the least cost, and its LST into R0. Then we go down further.
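The remark about keeping track of the winning pattern can be illustrated with a sketch that annotates the whole tree bottom-up. It is only a sketch under assumptions: the `Node` fields `cost` and `pick`, the function `annotate`, and the tree shape (a - b) + c * (d / e) are not from the lecture; unit costs are assumed; and the recorded split (i, j) stands in for the instruction pattern, with j == 0 corresponding to Ri = Ri op M and j > 0 to Ri = Ri op Rj. Ties are broken by enumeration order, which is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # cost[0] = into memory, cost[k] = with k registers
    cost: List[int] = field(default_factory=list)
    # pick[k] = (i, j) split that achieved cost[k]
    pick: Dict[int, Tuple[int, int]] = field(default_factory=dict)


def annotate(n: Node, r: int) -> None:
    """Bottom-up pass: fill in cost vectors and remember the split behind each minimum."""
    if n.left is None or n.right is None:       # leaf, already in memory
        n.cost = [0] + [1] * r
        return
    annotate(n.left, r)
    annotate(n.right, r)
    n.cost = [0] * (r + 1)
    for k in range(1, r + 1):
        best = None
        for i in range(1, k + 1):               # left operand always in a register
            for j in range(0, k + 1):           # j == 0: right operand used from memory
                if j != 0 and min(i, j) >= k:   # both subtrees cannot use all k registers
                    continue
                c = n.left.cost[i] + n.right.cost[j] + 1
                if best is None or c < best:
                    best, n.pick[k] = c, (i, j)
        n.cost[k] = best
    n.cost[0] = n.cost[r] + 1                   # all registers, then one store


root = Node("+",
            Node("-", Node("a"), Node("b")),
            Node("*", Node("c"), Node("/", Node("d"), Node("e"))))
annotate(root, 2)
print(root.cost)        # cost vector at the root
print(root.pick[2])     # split chosen at the root with two registers
```

At the root the recorded split gives the left subtree one register and the right subtree two, which agrees with the walkthrough.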
For node 7, we compute the RST into memory, where it is already available, and the LST into R1. The code is generated in the other order. So, this would be R1 = d first, then R1 = R1 / e, then R0 = c, then R0 = R0 * R1, and so on. Once we have annotated the tree with all these instructions, code generation is very easy. Annotation requires a top-down traversal of the tree. Starting from the top, pick the minimum cost; that also tells you the pattern, and which particular registers are to be used for the LST and the RST. At the subtrees we again pick, for the number of registers available, the minimum cost of evaluating the subtree, and go on doing that for the rest of the tree. The basic idea is that, starting at the root, the subtree that requires more registers is always dealt with first. So, the information about which pattern matches at a node, and the number of registers required for that pattern, has to be kept track of. Remember that we have that information here: C4 with two registers, C5 with one register, and so on. We have one pattern for each of these possibilities, and that pattern and the number of registers required for the LST and the RST have to be maintained in separate data structures, so that we can cover the tree with the appropriate instructions during this pass. Once that is over, we can simply do a post-order traversal and generate the code itself. The code that is generated is listed here. Remember that this is not a plain left-to-right order; it is post-order all right, but where needed we will have to do the right subtree first and then the left subtree, and so on.
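The final emission pass can be sketched as follows. To keep the sketch short, the splits chosen by the top-down annotation are written in by hand from the walkthrough (root and node 3: LST with one register, RST with two; nodes 2 and 7: right operand taken from memory). The `Node` class, its `split` field and the function `emit` are hypothetical names, the register names are assumed, and subtrees that would have to be spilled to memory are deliberately not handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # (registers for LST, registers for RST); 0 for the RST means memory operand
    split: Tuple[int, int] = (1, 0)


def emit(n: Node, regs: List[str], out: List[str]) -> None:
    """Emit code for n using the registers in regs; the result ends up in regs[0]."""
    if n.left is None or n.right is None:               # leaf: load it from memory
        out.append(f"{regs[0]} = {n.label}")
        return
    i, j = n.split
    if j == 0:                                           # pattern Ri = Ri op M
        if n.right.left is not None:
            raise NotImplementedError("spilled subtrees are outside this sketch")
        emit(n.left, regs[:i], out)
        out.append(f"{regs[0]} = {regs[0]} {n.label} {n.right.label}")
        return
    if j > i:
        # RST needs more registers: evaluate it first, leaving its value in regs[1].
        emit(n.right, ([regs[1], regs[0]] + regs[2:])[:j], out)
        emit(n.left, ([regs[0]] + regs[2:])[:i], out)
    else:
        emit(n.left, regs[:i], out)
        emit(n.right, regs[1:][:j], out)
    out.append(f"{regs[0]} = {regs[0]} {n.label} {regs[1]}")   # pattern Ri = Ri op Rj


tree = Node("+",
            Node("-", Node("a"), Node("b")),
            Node("*", Node("c"), Node("/", Node("d"), Node("e")), split=(1, 2)),
            split=(1, 2))
code: List[str] = []
emit(tree, ["R1", "R0"], code)
print("\n".join(code))
```

The sketch emits seven instructions, matching the minimum cost of 7 at the root. Because the machine model leaves the result in the register of the left operand, the last instruction writes R1; which physical register ends up holding the result depends only on the order of the register list passed in.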
So, R0 = c and so on; we have to follow the order in which we have generated the code, and that is the order in which we finally emit the sequence. So, we will stop here and continue in the next lecture. Thank you.