So, this is a loop which just adds two vectors and puts the result in one of the vectors, and this is the MIPS translation. So, 0($a0) is the base of x and 400($a0) is the base of y. These two instructions load the values of x[i] and y[i] into two respective registers. This is the loop iterator, incremented by one, and this instruction adds the two values and puts the result in this register. This one could not be put here because of the load delay slot; it has to be delayed by one cycle. Then this one stores the value back to x[i], the same address, and this one compares $a1 against 100. If it has already reached 100, it sets $v0 to 1; this one checks whether $v0 is already 1, and if not, goes back and executes the loop again. And this is the branch delay slot, which is always executed; it increments the address by 4, because these are integer arrays, to take you to the next element of the array. Any question on this translation? Is it clear? So, with our model of execution, we are fetching at a certain rate, and the execution model is that whenever an instruction is ready to execute, I will execute it. I also lift the constraint that in a particular cycle you can execute only one instruction: you can execute an unlimited number of instructions in a cycle. So, the question is, what is the minimum number of cycles required to complete, let us say, one iteration? What will you do in the first cycle? Which instructions can you execute? You can send these two for execution because they are independent; you can also send this one, and this one, but there is a small problem: if you send this one, what kind of hazard do you violate between these two instructions? Write after read. So, we would violate a write-after-read dependence if we actually did that.
So, essentially, by executing these together, what I am doing is reordering this instruction ahead of this one; this instruction will effectively be placed here, and that would violate this particular dependence. How can I fix it? Any simple solution? I still want to execute this one in the first cycle; the slight problem is that it is connected to the loop iterator. So, how do you translate it then? You are saying I should change the target to something else, but then what do I do in the next iteration? I cannot use $a0 anymore, so my translation breaks. There is a very simple solution: can I change this offset to -4? It will have the same effect: I increment the pointer first, and then put a -4 here instead of 0. That allows me to execute these four instructions in the first cycle. Is that clear? What about the second cycle, what can I execute? If you want me to show you the pipeline: this is the first load, the second load, the addiu and the addiu. So, if I have a 5-stage pipe, what will I do in the second cycle? My second-cycle fetch will be here. Which instructions will be ready? I know that that one will be ready. What about this one? Will it be ready? That will be in the delay slot of the loads, so the dependents cannot issue. This one, of course, not, because it depends on this one; this one also depends on this one; this one depends on this one. So, I have to put a stall cycle; there is no other option, actually. After a one-cycle stall, what can I do? Now I can probably send this one. Why? Because it will pick up the value from the bypass. So, this is the addu. What else can I send? This one? No, because this one actually stores the value that I am computing in this very cycle.
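The pointer-increment trick described above can be checked in plain code: incrementing the base register first and then loading with a -4 offset touches exactly the same element as loading with offset 0 before the increment. Below is a minimal Python sketch of that equivalence; the function names and the word-indexed array are my own stand-ins for the MIPS behavior, not from the lecture.

```python
def load_then_increment(x, base):
    # original order: lw with offset 0, then addiu $a0, $a0, 4
    value = x[base // 4]
    base += 4
    return value, base

def increment_then_load(x, base):
    # reordered: addiu $a0, $a0, 4 first, then lw with offset -4
    base += 4
    value = x[(base - 4) // 4]
    return value, base

x = [10, 20, 30, 40]
# same element loaded, same final base address, so the reorder is safe
assert load_then_increment(x, 0) == increment_then_load(x, 0)
assert load_then_increment(x, 8) == increment_then_load(x, 8)
```

This is why the compiler (or a scheduler) can hoist the pointer update into the first cycle without breaking the next iteration.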
Oh wait, no, sorry, let me backtrack a little bit. Why do I not send this instruction in this cycle? It is just comparing $a1 against 100, and $a1 is being computed here. So, as such, if I fetch it here and execute it here, it will pick up the value from the bypass. What is the problem there? What is it called? Write after read. So, there are two hazards being violated here: one is write after read, the other is write after write; the same problem. $v0 is written here and read here, and if I hoist this instruction before it, I would be violating both dependencies. Exactly, yes, this one also has $v0 as a source. But this is somewhat unfortunate, and that is the important observation you should make: as such, these two instructions have nothing to do with this one, absolutely nothing. This is a completely independent operation; it just compares $a1 against 100 and does not use any result from these two instructions. So, what could be a solution? Use some other register instead of $v0, right. Use some register other than anything used here, so that you would not violate any dependence. These are often called name dependencies. They are not true dependencies; they are actually false dependencies. They arise only because the compiler happened to choose this register for allocating the result of this operation. We will come up with systematic solutions to get rid of name dependencies. The point is that name dependencies are false dependencies, and they should never hamper your ILP. So, let us for now assume that this is actually some other register. Then I should be allowed to issue it here without any problem. This is the slti. I do not think I can issue anything else in this cycle. So, in the next cycle I can, and I can send the branch also.
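The distinction between true and name dependences made above can be captured in a few lines: given two instructions as (source set, target) pairs, compare registers to classify the hazard, and see how renaming the target makes the false dependences vanish. This is only an illustrative sketch; the register names and the tuple encoding are my own choices.

```python
def hazards(first, second):
    """Classify dependences from `first` to the later instruction `second`.
    Each instruction is (set_of_source_regs, target_reg_or_None)."""
    found = set()
    if first[1] is not None and first[1] in second[0]:
        found.add("RAW")   # true (flow) dependence
    if second[1] is not None and second[1] in first[0]:
        found.add("WAR")   # anti dependence -- a name dependence
    if second[1] is not None and second[1] == first[1]:
        found.add("WAW")   # output dependence -- a name dependence
    return found

earlier = ({"$v0"}, "$v0")        # earlier instruction reads and writes $v0
slti = ({"$a1"}, "$v0")           # slti-style compare, target $v0
assert hazards(earlier, slti) == {"WAR", "WAW"}

# renaming the target to an unused register removes both false dependences
slti_renamed = ({"$a1"}, "$t9")
assert hazards(earlier, slti_renamed) == set()
```

Nothing about the comparison itself changed; only the name of the result register did, which is exactly why these are called name dependences.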
So, the addu, and this one, will pick up the values from the bypass; the bnez will pick up the value from the bypass. So, what are we left with? We are left with only this one; everything else is done, I believe: 1, 2, 3, 4, 5, 6, 7, 8, yes. We are only left with the store, which will now go into the branch delay slot. Is that clear? So, that is pretty much the best we can do, even if I give you the freedom of executing an unlimited number of instructions every cycle. The question is, how can we go beyond this, or is this the theoretical limit? What do you think? Given this particular loop, which executes 100 times, and we have looked at just one iteration, can we improve this any further? Any question on this schedule? Somebody was saying something. And why can you do that? Exactly, right, very good. So, loop iterations are independent: in this case, two different iterations compute on different data. So, we should be able to execute instructions from different iterations in parallel. The only question is how to do that systematically, because the biggest problem is the branches. Although here you can clearly see that this branch will be taken 99 of the 100 times and the last one will be not taken, the question is, how does the hardware figure out at run time that this is going to be the case? The hardware hits this particular branch and has to figure out: should I go down, or should I actually go back and execute this one? Exactly. The predictor will help you, but it is not really an oracle; it will not give you the correct answer all the time.
So, the point is that branches introduce a problem one has to recognize: they reduce the stretch of independent instructions. What a branch introduces is control dependence. There is no data dependence as such; all it does is make a bunch of instructions dependent on a certain branch. One solution somebody has already proposed: have branch predictors. But once in a while they will make mistakes, so you have to take corrective measures. The second solution is to unroll the loop. So, what if I change the loop to something like this: you go from 0 to 99 at a step of 2, and within an iteration you do two elements' worth of work. Instead of having 100 branch instructions, you now have 50 branch instructions. That definitely helps. Does everybody see why it helps? Because now I will have four load instructions which are completely independent; I have a lot of freedom in selecting my instructions. Is there a downside of doing this? I will let you see this one. Imagine that this is just doubled: every instruction will get replicated. Size of code, yes, that is one problem; your code size will double. By the way, this is called a loop with an unroll factor of 2, unrolled twice. If you move in a step of 4 and have four such bodies, the unroll factor will be 4, and so on and so forth. You can have an arbitrary unroll factor; if the unroll factor does not divide your loop trip count, you will have to have some code at the end of the loop to fix things up, that is, to execute the residual iterations. So, as she has suggested, this will definitely inflate your code size; that is a big problem. And when you duplicate these instructions to be able to execute them in parallel, they must have different targets; they cannot use the same target, clearly.
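The unroll-by-2 transformation plus the residual fix-up described above can be written out directly. The sketch below is in Python rather than MIPS, purely to make the structure visible; the two explicit statements in the main loop stand for the replicated loop body, and the trailing loop is the fix-up code for when the unroll factor does not divide the trip count.

```python
def vector_add(x, y):
    # baseline: one element per iteration, one branch per element
    for i in range(len(x)):
        x[i] = x[i] + y[i]
    return x

def vector_add_unrolled2(x, y):
    # unroll factor 2: step of 2, so half as many loop-back branches
    n = len(x)
    main = n - n % 2
    for i in range(0, main, 2):
        x[i] = x[i] + y[i]            # replicated body, copy 1
        x[i + 1] = x[i + 1] + y[i + 1]  # copy 2 -- independent of copy 1
    for i in range(main, n):          # residual iterations (fix-up code)
        x[i] = x[i] + y[i]
    return x

a, b = list(range(7)), [1] * 7
assert vector_add_unrolled2(a, b) == vector_add(list(range(7)), [1] * 7)
```

The two copies of the body read and write disjoint elements, which is exactly the extra independence a scheduler can exploit, at the cost of the doubled code size discussed above.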
So, that puts more pressure on your registers. Often you will see that when you start unrolling more, your false dependencies show up more and more, because the compiler is running out of registers, so it tries to reuse the same register over and over. That is one problem, and in fact, beyond a certain unroll factor, the compiler will have no register left to use. Essentially, what it will do is take register values and store them to memory, use those registers for some computation, and when a spilled value is needed, load it back from memory into some register. That increases your code size even further, and what is worse, it increases your number of memory operations. You will now have extra memory operations which are not due to the code itself; they are due to the shortage of registers. But in general, if you choose your unroll factor wisely, it exposes more ILP, especially in this type of loop where iterations are independent. So, this is just one technique, loop unrolling. It is also called a static technique, a compiler-driven one, because the compiler actually does this transformation. However, of course, not all loops are like this; they are not all so well behaved, with independent iterations. For those, you require more sophisticated solutions, and as I told you last time, if I get time I will get into this; otherwise not. So, I will just name a few things. This stands for very long instruction word computers. This is again a compiler technique, where the compiler analyzes your code and prepares long instruction packets. These are essentially combinations of multiple instructions. So, in a VLIW machine, one instruction would be, for example, a combination of four instructions that go in parallel. That is how the compiler prepares instruction packets, and that is where the name comes from: these are very long instruction word computers.
EPIC is very similar; it stands for explicitly parallel instruction computing. Here you do pretty much the same thing: the compiler prepares these explicitly parallel instruction packets. The only difference between the two is that in VLIW the packet length is usually constant; at design time you decide that one long instruction packet will have, say, four instructions. Which essentially means that if in a particular packet the compiler cannot find four parallel instructions, it puts no-ops in the empty slots to make sure that every packet is four instructions long. For example, a four-wide VLIW would be happy with this packet, but this packet will have three no-ops; this one will have only the slti and three no-ops. Similarly, this packet will have two instructions and two no-ops, and so on and so forth. EPIC allows you to terminate instruction packets early, so EPIC has variable-size instruction packets. Software pipelining and predication are again two techniques to expose more ILP. Predication is very much related to branches. This is not prediction; this is slightly different, predication. Essentially, the idea is that you convert control dependence to data dependence. You evaluate this branch condition, put it into a predicate register, and tag all these instructions here with the predicate value true, and all these with the predicate value false. Now, essentially, there is no control dependence anymore. All these instructions depend on this predicate register; similarly, all these instructions depend on this predicate register. These instructions will finally survive if the predicate register value turns out to be true, and these will survive if it turns out to be false. We have converted our control dependence into a data dependence.
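The if-conversion just described can be sketched in a few lines: instead of branching, evaluate the condition into a predicate, execute both arms, and keep only the result whose predicate tag is true. The threshold, operations, and function names below are illustrative, not from the lecture.

```python
def branching(a, b):
    # control dependence: which arm executes depends on the branch outcome
    if a < 100:
        return a + b
    else:
        return a - b

def predicated(a, b):
    # predication: the condition goes into a predicate "register", both arms
    # execute unconditionally, and the predicate selects which result survives
    p = a < 100          # slt-style predicate evaluation
    t_result = a + b     # instruction tagged predicate == True
    f_result = a - b     # instruction tagged predicate == False
    return t_result if p else f_result

for a, b in [(5, 3), (200, 3)]:
    assert branching(a, b) == predicated(a, b)
```

Note the trade-off the transcript goes on to mention: both arms are always executed, and the losing arm's work is cancelled once the predicate value is known.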
So, it will essentially look exactly the same: predicate registers can be bypassed to the instructions that require them, and so on and so forth. The only extra thing this requires is that you execute all these instructions, and some of them get cancelled later, when the predicate register value becomes available. So, again, as I told you, if I have time I will get into this, but this really belongs to a compiler course, so it is not really a priority here. So, what are the dynamic techniques? These are essentially hardware techniques. As I have already hinted, one important one is register renaming, where you try to remove the name dependencies we have already seen. Branch prediction we have talked about at length, so I will not get into it anymore. Out-of-order issue we will talk about; I have already shown you some flavor of it here: what is happening is that I am reordering instructions. Multiple issue is also shown there: I am issuing multiple instructions every cycle. And there are some more advanced techniques for exposing more ILP. So, this is going to be our agenda, essentially; this is what we will focus on. And you might ask: will I ever get the effect of loop unrolling in this particular loop? The answer is yes. Can somebody see how that is possible with dynamic techniques, so that the amount of ILP exposed here also gets exposed dynamically? How? Remember, the code that the machine sees is exactly this. What will happen? Exactly. So, I keep on fetching. When I hit this branch, I ask my predictor. In this case, the predictor will be fairly accurate; it will say, go back. So, you continue fetching. Essentially, what will happen is that at any point in time, your processor's instruction pool is going to have a large number of iterations already fetched. What matters now is that the hardware has to pick out the independent instructions.
So, automatically, it will pick up these two loads from one iteration and another two loads from the other iteration and execute them in parallel. It is going to have the same effect, but there is one condition: the predictor has to be accurate. If the predictor tells you something wrong, your instruction pool gets populated with useless instructions. After that, whatever you do does not matter; you are essentially doing wrong things. Eventually, your branch will resolve and you will get to know that your predictor said something wrong. You have to cancel everything, flush everything from the instruction pool, and start over again. So, you are going to get the same effect, pretty much, depending on how good your predictor is. The compiler removes that particular condition by analyzing the code and telling you that this code can be unrolled and you can get good ILP from it. So, to summarize what limits ILP, from whatever we have discussed here: one is data dependence, or true dependence. That was the reason we could not execute more than one instruction in this particular cycle, because whatever we had here was dependent on something still executing. This is often the primary concern, because these are the dependencies you have to obey; they are needed for correctness. You cannot violate any of these dependencies. This concerns the flow of data, the dependence between a producer instruction and a consumer instruction, and it may introduce read-after-write hazards and stalls. I say it may, because dependence is a property of the program, while a stall is caused by the pipeline organization. How many cycles you stall depends, for example, on how deep my pipe is. Certain dependencies may not cause any stall, simply because the distance between the two instructions is large enough that I can issue the consumer without any stall at all. You can see that here.
So, I could separate this particular instruction from this one so that there was no stall, just like that. Is this point clear to everybody? A dependence may not always cause a stall; that is very important to understand. And flow can happen through registers or through memory. In this particular example, we only see flow through registers. For example, the value flows from this instruction to this instruction through $v0; a value flows from this instruction to this instruction through $v1, and so on and so forth. But you can have data flow through memory also. How is that? Can somebody give an example? Exactly. You store something to memory, and later read that value from memory through a load instruction. That establishes a memory dependence, a dependence through memory. So, the question is how to discover memory dependence. Let me show you why this is not as easy as register dependence. How do you discover register dependence? Exactly: you just compare the registers, the source of one instruction against the target of another instruction. If they match, you know there is a dependence. Here, it is not so easy. Let us take a look at two instructions that are in a dependence. Let us say, as he has mentioned, we have a store which stores register R2 to address 100(R3). Then there are a bunch of instructions, and then a load which loads from this address into this register. It turns out that these two are actually the same address; it can happen. The problem is that by looking at these two instructions, there is no way to know that they are in a dependence. Is that clear to everybody? It is just that the values of R3 and R6 are such that these two addresses turn out to be the same. And there is nothing that stops the compiler from generating this code; in fact, the compiler does generate such code many, many times.
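The contrast above, register dependence visible from names alone versus memory dependence visible only after the effective addresses are computed, can be demonstrated concretely. In this sketch, the store offset 100 and registers R3/R6 follow the lecture's example; the load offset 50 and the register values are hypothetical numbers I chose so that the two addresses collide.

```python
def may_depend_at_decode(store_base_reg, load_base_reg):
    # register-style name comparison: for memory operands this is useless,
    # because different base registers can still produce the same address
    return store_base_reg == load_base_reg

def depend_after_address_compute(s_base, s_off, l_base, l_off, regs):
    # the dependence only becomes visible once both effective addresses exist
    return regs[s_base] + s_off == regs[l_base] + l_off

regs = {"R3": 60, "R6": 110}      # hypothetical run-time register values
# sw R2, 100(R3)  ...  lw R5, 50(R6): different names, same address 160
assert not may_depend_at_decode("R3", "R6")
assert depend_after_address_compute("R3", 100, "R6", 50, regs)
```

Name comparison says "independent", address comparison says "dependent", which is exactly why a load cannot safely bypass earlier stores until their addresses are known.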
Given the availability of registers, when the compiler is generating this particular load instruction, it may not use R3 again, because R3 may have been allocated to some other variable by then. So, the question is: how do you discover that two instructions are in a dependence? Because I may make a mistake; I may think it is probably safe to send these two instructions together, or even worse, I may execute the load before the store, which may end up with the load getting a wrong value. So, how do I discover this? What is the simplest solution? Reorder? What is that? No, let us not get there. You may know what the reorder buffer is, but I am not really asking for that solution. What I am asking is: give me a naive way of figuring out that these are dependent. What you are saying is that I can go ahead and execute the load before the store, and eventually I will be able to figure it out. Well, we will get to that solution. But how do you normally figure it out? Even in your case, suppose I have executed this load already; when the store executes, I figure out that there is a problem. How do I figure this out? By the addresses. So, the point is that before these instructions can go and access memory, the address has to be computed. When that is done, I am ready to figure out whether there is a memory dependence or not. The point is that I have to wait till then, whereas in the case of register dependence, even at decode time I know which instructions are dependent on each other. Here, I have to wait until the address gets computed; only then will I know whether this instruction depends on that one. That makes things a lot more complicated, because my instruction pool holds instructions which are already decoded.
So, my job was just to go through this pool and pick up independent instructions for execution. It seems that is not sufficient anymore, because of this memory dependence: from the decoded instructions alone, I cannot figure it out. I have to actually partially execute these instructions to compute the addresses, and only then will I know. That poses a real problem and clearly hinders your ILP, because essentially it says that if you do not have any other machinery, you have to be conservative: whenever you find a load, you say, well, I cannot execute it until all stores before it have been executed. That is correct; it guarantees correctness, but it also makes sure you do not get as much ILP as you would if you could execute the load earlier. We will come up with a more sophisticated solution which can get rid of this problem. But is this memory dependence problem clear to everybody? Any question on this? So, how do you get around data, or true, dependence? Well, you try to schedule as many independent instructions as possible; that is what we have been trying to do there, and we will see dynamic techniques to do that. The second type of limiter is name dependence. These are also called false dependences, as we said, because there is no data flow between the involved pair of instructions. There are two types of false dependence. One is called anti-dependence, which may cause a write-after-read hazard. This is just the opposite of flow dependence; that is why it is called anti-dependence. We can look at these two, actually: these two instructions are in an anti-dependence. This one is reading from $v0, and this one is writing to $v0. There is no data dependence as such between them; it is a totally false dependence, which arose only because the compiler ended up choosing $v0 as the target of this instruction.
And these are in an anti-dependence because there is a read before a write, just the opposite of a flow dependence. Does anybody see any other anti-dependence in this code? There are more pairs of instructions in anti-dependence. The 4th and the 6th, yes. Anything else? The first and the last: $a0 is read here and written here. So, the other type of false dependence is called output dependence, which may cause a write-after-write hazard. These are essentially dependences where two instructions write to the same register. For example, these two instructions are in an output dependence, and this again happens, unfortunately, because the compiler chose $v0 as the target of this instruction. Dynamic techniques are there to get rid of these, as he has suggested: just rename them, use some other register. But there has to be a systematic way of renaming, because you can see that if I change this register to something else, my iteration flow breaks immediately; the next iteration will have a problem. So, you have to be very systematic in renaming these registers, and we will talk about algorithms to do that. The solution for both is renaming. And again, you have to answer this question: can you have false dependences involving memory operations also? How is that possible? Can I have an anti-dependence involving two memory instructions? I can: I can switch these two, and there will be an anti-dependence. Or I can introduce another store, and if it turns out that these two address the same location, those two instructions are in an output dependence. So, you have to also do memory renaming to get rid of these dependences. We will answer all these questions. Any question? And the third thing that limits ILP is control dependence, which we have already discussed in great detail. Data flow alone is simply not sufficient for program correctness.
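Hunting for anti- and output dependences by hand, as the class just did, generalizes to a simple scan over the instruction list. The sketch below finds every such pair in a short, made-up sequence (the three instructions are illustrative, not the lecture's exact loop); each instruction is again a (source set, target) pair.

```python
def find_false_dependences(code):
    """Return every anti (read-then-write) and output (write-then-write)
    pair in a program-ordered list of (sources, target) instructions."""
    pairs = []
    for i, (srcs_i, tgt_i) in enumerate(code):
        for j in range(i + 1, len(code)):
            srcs_j, tgt_j = code[j]
            if tgt_j is not None and tgt_j in srcs_i:
                pairs.append(("anti", i, j))    # i reads what j later writes
            if tgt_j is not None and tgt_j == tgt_i:
                pairs.append(("output", i, j))  # both write the same register
    return pairs

code = [
    ({"$a1"}, "$v0"),   # 0: writes $v0
    ({"$v0"}, None),    # 1: reads $v0 (a branch-like consumer)
    ({"$a1"}, "$v0"),   # 2: writes $v0 again
]
deps = find_false_dependences(code)
assert ("anti", 1, 2) in deps     # read at 1, write at 2
assert ("output", 0, 2) in deps   # two writes of $v0
```

This is essentially what a renamer must neutralize: every pair reported here disappears once instruction 2 is given a fresh target name.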
Control flow must also be preserved, because it decides along which path you execute. The solution to that is branch prediction, which we have discussed. And what if you go wrong? We will discuss that; we have not discussed it in much detail. We will have to have systematic ways of recovering from mispredictions. Because now, what may happen, although in this particular case it may not show up, is that when you have two iterations clubbed together, a large number of instructions might have been issued on the wrong path by the time the branch finally resolves. This will especially happen if you have a very long pipe. So, there has to be a systematic way of removing these wrong-path instructions from the pipeline, and of fixing certain other related things: for example, you will now start renaming registers and so on, and whatever you have renamed on the wrong path will also have to be undone. We will see all these problems very soon. This one is not easy; that is the whole point. It is not as easy as what we have seen in a single-instruction-fetch, linear pipe. It is much more involved, and it gets complicated because of the various types of renaming that we will do. So, any question? Essentially, the ultimate limiter is this one. Assuming you have an oracle to decide your control dependence, and this one is false, and you have an unlimited number of registers to get rid of all name dependences, then the ultimate limiter is going to be true data dependence only, which will actually limit your ILP. Everything else will go away. And that is the goal of a processor designer: to live only with this and get rid of everything else. So, the goal is to design a good branch predictor, and the goal is to design good mechanisms for renaming. We will focus on these two things. Branch prediction we have already talked about.
So, I will not get into that again. Questions? Sure. So, that is the limiter, and the source is flow dependence. No, that is what I am saying: if you remove all of these, suppose you have an oracle predictor here, which is 100 percent accurate, and you have enough registers to have no false dependence, then what are you left with? Flow dependence. All stalls in your pipeline will come from only this, nothing else. That is what you are saying, probably, and that is what I am also saying: this is the final limiter, which you have to live with, because it actually guarantees the correctness of your program. But can we even remove a stall here? How? We have a pipeline. By making the pipeline shallower, you are saying. Yes, of course, sure, you can do that. But there are other trade-offs involved there: as you make the pipe shallower, you may lose frequency. So, that has to be looked into more carefully. What I am saying is that ultimately, this is what will limit your ILP. This is often called the data flow limit of a processor. So, any question on this? Of course, we will also talk about techniques which actually try to go beyond that limit. Can anybody guess what such a technique might be? So, I tell you that I have an oracle branch predictor, 100 percent accurate, and I have enough registers so that I have no false dependence. What else can I do to go beyond the data flow limit? The data flow limit is essentially this: the reason I could not issue this instruction in this cycle was that the value would not be available in time, which is why I have to delay this instruction by one cycle. Suppose I issue this instruction here, I fetch it here; by the time it reaches here, the value is not yet ready. What can I do? Come on, you should be able to answer this question now. What makes sense, other than stalling, of course?
Predict the value, exactly. Why can I not predict the value that it is going to produce from memory? We will talk about that a little bit, because it is a much harder problem than branch prediction. Yes, exactly: accuracy is going to be low. But it turns out that there are many programs which load constant values, so those are predictable. Some values are predictable, some are not, and it turns out that one of the most frequently loaded constants is 0. Anyway, that comes from program analysis. We will talk about how such a predictor looks, and what you can do when you make a misprediction: you have to have a fixing mechanism there also, just like for a branch predictor. That will allow you to go beyond this data flow limit: you can start predicting values correctly, and you can even execute a dependent instruction before the producer is done. That is possible. So, just to kick-start the process with a simple example: here is a code snippet which shows that data hazards may introduce unnecessary stalls. Let us try to understand that first. Here, I am talking about our traditional pipeline, which can fetch one instruction every cycle and take it through the pipeline; whatever we have seen so far, not a machine that can issue multiple things together. So, what is it doing here? The first instruction is a double-precision division operation which operates on F2 and F4 and produces a value in F0. The second one uses F0 and produces a value in F10, and this operation uses F8 and F14 and produces a value in F12, which is independent of both of these. Now, with whatever we have seen, what will happen? This second instruction will have to wait until the division operation completes and produces F0. And since we have a constraint that we go in order, one after another, this third instruction will have to wait as well.
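A minimal form of the value prediction idea mentioned above is a last-value predictor: per load PC, predict a repeat of the value seen last time, and default to 0 for cold entries, reflecting the observation that 0 is among the most frequently loaded constants. This is a sketch under my own simplifying assumptions (a plain dictionary table, no aliasing, no recovery logic), not a description of any real predictor.

```python
class LastValuePredictor:
    """Last-value load predictor sketch, keyed by the load's PC."""

    def __init__(self):
        self.table = {}

    def predict(self, pc):
        # cold entries guess 0, the most commonly loaded constant
        return self.table.get(pc, 0)

    def update(self, pc, actual):
        # returns whether the earlier prediction was correct; a real
        # pipeline would squash dependent work on a mispredict
        correct = self.table.get(pc, 0) == actual
        self.table[pc] = actual
        return correct

p = LastValuePredictor()
assert p.predict(0x400) == 0   # cold: guess 0
p.update(0x400, 7)
assert p.predict(0x400) == 7   # warm: repeat the last seen value
```

With such a predictor, a consumer can be issued speculatively before its producing load completes, which is exactly how one can exceed the data flow limit; the price is a branch-predictor-style recovery mechanism for mispredictions.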
It cannot be fetched, actually, or even if it is fetched, it has to wait somewhere in the pipeline, simply because this instruction cannot execute at this point. So, in-order issue and execution, and the precise exceptions we have talked about, actually disallow this instruction from overtaking this one, because if it overtakes, there are many dangers. One is that this earlier instruction may later raise an exception, some type of arithmetic exception, by which time this instruction has already completed. Then you have to fix things up; we have already talked about solutions for that. So, I want out-of-order execution while still maintaining precise exceptions. What this means is the following. This is a very generic pipeline that you will find today. The front end of the pipeline, fetch, decode, and possibly issue, is in order: it fetches instructions in order, decodes them in order, and issues them in order. By issuing I mean it puts the instructions in some queue. Then this middle part of the pipeline is completely out of order: it picks up ready instructions from the issue queues as and when they become ready; they are sent to the register file to read the register values, pick up values from the bypass, execute, look up memory, and put the result somewhere. They have to remember the result somewhere; let us for now assume that they remember it in the issue queue itself. The issue queue slot is maintained for this instruction, and the value comes back here; it is not yet written to the register file. Finally, the issue queue is drained in a FIFO manner, one entry after another; the values are transferred to the register file, and that is when the instructions execute this final phase. So, what I am doing is decoupling the pipeline of an instruction into three phases. This is one phase, which is completely in order.
This phase is totally out of order; we do not maintain any order at all here, and this phase happens sometime much later, when the instruction's turn comes to complete. So, this makes sure that you have no problem with precise exceptions at least. You can catch exceptions in the order the instructions go, and you can cancel instructions which are after the excepting instruction, but you still have to deal with these two types of hazards. But does everybody see that, or have I already solved them in some way? So, I said that the model is very simple: I fetch an instruction, I decode the instruction and I allocate an issue queue entry for that instruction, and this entry is maintained until the instruction is completely done. By done I mean it has written the register value back to the register file. And from this issue queue I pick ready instructions as and when they become ready; that may not be in FIFO order. They look up the register file, they execute, they look up memory, and the value is stored back into the instruction's issue queue slot. And finally, when an instruction is done and is at the head of the issue queue, it will move the value from the issue queue slot to the register file entry, and the issue queue entry will be freed at that point. So, I tell you that I have not only solved the precise exception problem, I have actually solved these two also. Why is that? I am somewhere doing a renaming, actually. What is the name space of these instructions now? The name space of the targets of these instructions. In the decode phase the name space was the register name space of course, whatever the register IDs are. Where do I store the value after the instruction has finished executing? The issue queue entry. So, that becomes a new name space, actually. So, two instructions with the same target register will now get two different issue queue slots and they will store their values in those two slots.
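The renaming effect can be shown in a few lines. This is a hypothetical sketch, not the lecture's exact structure: two writers of the same architectural register receive two different issue-queue slots, so the write-after-write conflict on the register name disappears.

```python
slots = []   # the issue queue: slot index is the new name for the value

def allocate(dst):
    """In-order allocation: one issue-queue slot per decoded instruction."""
    slots.append({"dst": dst, "value": None})
    return len(slots) - 1

s_old = allocate("$v0")    # older writer of $v0
s_new = allocate("$v0")    # younger writer of $v0: a *different* slot

slots[s_new]["value"] = 1  # the younger write can even finish first...
slots[s_old]["value"] = 0  # ...without clobbering it: distinct storage
```

The two values live in distinct slots, so completion order between them no longer matters.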
So, these two slots now serve as two different names for the same register. How do I guarantee true dependence in this particular model? The dependent instructions should also get the correct values; how do I make sure that happens? Who goes in order? Execution is not in order. Yes, but let us go back to this example. So, here, this instruction. Let us give them slots, issue queue slots. Suppose these are the slot IDs; actually the slot IDs will be in this order, I am sorry, yes, because the issue queue is assigned in order. So, let us see, what are the slot IDs? So, 0, 1, 2, 3, 4, 5, 6, 7. These are my issue queue slots taken by these instructions. So, this load instruction will complete and put its value in slot 0. This instruction will complete and put its value in slot 2. The question is this instruction: the slot 5 instruction must get the value from slot 2. How do I guarantee that this happens? So, that means when instruction 5 issues, it has to look up the entire queue. Why should it be at the end of the queue? I am really looking for dollar A1, the latest dollar A1; that is what I am looking for, actually, and it may not be at the end. So, I have to search back through the queue until I find the latest dollar A1. That is how I maintain true dependence in this particular model, and the other two dependences are already gone because I have renamed them away. So, for example, you can look here: we had a problem with these two instructions, an anti-dependence here. So, slot 4 here will pick up the value from slot 3, and this instruction here has already renamed dollar V0 to slot 5. So, that has decoupled this dollar V0 from that dollar V0. This dollar V0 really corresponds to slot 5 now; it is not dollar V0 anymore. Similarly, if you look at this one here, this is now really slot 7; it has nothing to do with this dollar V0. So, finally, of course, your queue will be drained in order. So, this instruction will first go and write to dollar V0.
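The backward search for the latest producer can be sketched as follows. The slot-to-register layout here is hypothetical (only slot 2 writing dollar A1 and slot 5 reading it are taken from the discussion; stores, which have no destination register, are marked None).

```python
def latest_producer(slots, consumer_slot, reg):
    """Scan backwards from just before the consumer's own slot for the
    most recent in-flight writer of reg."""
    for i in range(consumer_slot - 1, -1, -1):   # newest older entry first
        if slots[i]["dst"] == reg:
            return i
    return None    # no in-flight writer: read the architectural register file

# Slots 0..7 in allocation order; destination registers are illustrative.
slots = [{"dst": d} for d in
         ["$v0", "$v1", "$a1", "$t0", None, "$v0", None, "$a0"]]

print(latest_producer(slots, 5, "$a1"))   # the slot-5 instruction reads $a1
```

The scan must run from the consumer's slot backwards, not from the tail, so that it picks up the latest writer that is still older than the consumer.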
This one will write to dollar V1, this one will write to dollar A1, and so on and so forth. And when writing back, in fact, you can even nullify some of the writes. For example, you can say that if dollar V0 has already been consumed and overwritten by the time its turn comes, I may not even write it back; it is enough to write back this one, because this is the final dollar V0 that survives. But anyway, those are optimizations which are not really important here. Is the model clear to everybody, how I resolve name dependences? And this model will also resolve memory dependences the same way, because two memory operations here will get two different issue slots. Even if they access the same address, it does not really matter: finally, they will write to memory in order, but they can execute out of order without any problem. So, we will continue from here next time. We will try to concretize this model more.
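The write-back elision just mentioned can be sketched in a few lines; the entry layout and values are illustrative, and the sketch assumes the older value has already been consumed before the drain, as the lecture stipulates.

```python
def drain(entries, regfile, elide=True):
    """Drain finished entries in program order, optionally skipping a write
    when a younger drained entry overwrites the same register."""
    last = {e["dst"]: i for i, e in enumerate(entries)}  # last writer per reg
    writes = 0
    for i, e in enumerate(entries):
        if elide and last[e["dst"]] != i:
            continue                   # a younger write supersedes this one
        regfile[e["dst"]] = e["value"]
        writes += 1
    return writes

entries = [{"dst": "$v0", "value": 0},   # older $v0, already consumed
           {"dst": "$v1", "value": 4},
           {"dst": "$v0", "value": 1}]   # the final $v0 that survives

regs = {}
print(drain(entries, regs))   # only two register-file writes actually happen
```

The architectural state at the end is the same with or without elision; only redundant write ports and write energy are saved.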