is to have CPI less than 1. And the primary way to achieve that is to reorder instructions to reduce the number of stalls. So, here is a summary of the major elements of a dynamically scheduled pipeline. You need a good branch predictor to make sure that the later part of the pipeline works on useful instructions and to minimize wasted work. If you do not have a good predictor, you will be fetching wrong instructions most of the time and doing wasteful work, which will have to be thrown away later. The second element is multiple issue. You must fetch, decode, select, and issue multiple instructions in each cycle, and that is what is called a superscalar processor. The third element is the wake-up logic. This logic is responsible for waking up instructions that are waiting for values, so it preserves the data flow; the implementation of the wake-up logic is essentially a bank of comparators. The next one is the selection logic, which selects a subset of the ready instructions to execute. There are two possibilities for selection: you can select in order, or you can select out of order. We will look at both of these very soon; in fact, we have already looked at out-of-order selection. The extra requirement for out-of-order selection is a priority mechanism, often called a tie-breaker. For example, suppose you have designed your hardware such that in a cycle you can issue at most 10 instructions, but 20 instructions are ready. Out-of-order selection does not impose any constraint on which instructions you can pick; you can pick any 10 of these, so there has to be a tie-breaker policy. In-order selection, on the other hand, already gives you a restriction that you have to go in order.
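To make the tie-breaker concrete, here is a minimal sketch in Python of out-of-order selection with an age-based priority. The entry fields (`seq`, `ready`) and the function name are illustrative, not from any real design; real selection logic is combinational hardware, not a sorted list.

```python
# Sketch: out-of-order selection with an age-based tie-breaker.
# Each issue-queue entry is a dict with a sequence number (program order)
# and a ready flag; "oldest ready first" is the assumed priority policy.

def select_out_of_order(issue_queue, issue_width):
    """Pick up to issue_width ready entries; oldest-first breaks ties."""
    ready = [e for e in issue_queue if e["ready"]]
    ready.sort(key=lambda e: e["seq"])   # age-based priority
    return ready[:issue_width]

queue = [{"seq": i, "ready": r}
         for i, r in enumerate([True, False, True, True, True])]
picked = select_out_of_order(queue, issue_width=2)
# With an issue width of 2, the two oldest ready entries (seq 0 and 2)
# are chosen; program order does not constrain which entries are eligible.
```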
You cannot violate the order, which means that even if you have 20 ready instructions at different places in the queue, you may be able to select only a few of them. We will soon look at in-order selection; it is a restricted form of out-of-order selection, with much simpler logic but lower performance. Essentially, the selection logic obeys the issue constraints, such as the available register file ports, memory ports, and functional units. The example I just mentioned, of a maximum of 10 instructions, comes from these constraints, because issuing 10 instructions will possibly mean that your register file needs 20 read ports; the number of memory ports depends on how many of those 10 instructions can be memory operations, and so on and so forth. And finally, you have decoupled execution and commit. This is very different from what we have discussed for single-issue in-order pipelines: you execute an instruction, and only some time later do you bring the result back to the register file or memory. Essentially, this logic preserves in-order write-back and in-order store commit. So, these are the major elements of dynamic scheduling that we have discussed, which try to achieve the goal of making CPI as low as possible. At each stage, if you observe carefully, you are introducing certain constraints for implementation purposes. For example, here you are saying that I do not have an oracle predictor; I have a realistic predictor, which will make mistakes some of the time. That will increase the CPI by some amount. Here, you are saying that I cannot fetch, decode, select, and issue an unbounded number of instructions; I have some limit there, which will further increase your CPI. Here, you are saying that I can wake up an arbitrary number of instructions, but I cannot select all of them in each cycle, so there is also a limit imposed there.
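The port arithmetic mentioned above can be sketched as a back-of-the-envelope calculation, under the illustrative assumption that every instruction has at most two register sources and one destination (the function name is mine, not from the lecture).

```python
# Assumed operand shape: up to 2 register sources, 1 destination per
# instruction. Real ISAs and port-sharing schemes complicate this.
def register_file_ports(issue_width):
    read_ports = 2 * issue_width   # two source operands per instruction
    write_ports = issue_width      # one destination per instruction
    return read_ports, write_ports

# An issue width of 10 implies 20 read ports, matching the lecture's example.
```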
And here, of course, you are saying that I cannot commit an unbounded number of instructions; that also imposes some limit. So, all of these taken together, along with your data flow, will give you some CPI, which is hopefully better than what we have discussed so far and what you are using in your homework. Any question on this basic scheme of dynamic scheduling? This description abstracts away all the implementation details of how exactly you go and implement these things; those are also discussed. So, just to recap a little bit. When you are issuing multiple instructions in a cycle, ideally the CPI should go down by a factor equal to the minimum of the issue width and the commit width. That is, how many instructions can I issue in a cycle, and how many instructions can I commit in a cycle? The minimum of these should decide by how much my CPI goes down compared to a single-issue in-order pipeline. So, assume full renaming, that is, suppose you have no name dependences, no false dependences; you have only true data-flow dependences and control flow. Then data flow and control flow will be the only limits on the CPI. There could also be structural hazards, that is, insufficient functional units. And there are two possible selection algorithms: in-order selection and out-of-order selection. The latter we have already discussed. So, let us take a look at in-order selection. What really is in-order selection? This is the simplest possible design of a superscalar processor. By the way, the term superscalar really means that you are executing multiple instructions in a cycle. In this case, what you do is issue the instructions sequentially: you scan the issue queue and stop as soon as you come to an instruction which cannot be executed, which could be due to pending operands or insufficient functional units. So, here is an example. Let us see what is happening here. These two are independent instructions.
And let us assume that their operands, R5, R6, and R3, are already ready. This one depends on the add, so it is clear that it cannot execute together with it; that is pretty obvious. And these two are also independent. So, an out-of-order scheduler would pick these four instructions, all except the red one, and execute them in one cycle. An in-order scheduler would issue only the first two in this cycle, because it cannot violate the order. In the next cycle, it would probably pick up the remaining three and issue them. So, you cannot issue the last two even though they are independent of the first; that is what happens in in-order issue. And what do you gain by this simple design, in hardware terms? So, from this list, can you tell me what gets simplified? Does the wake-up logic get simplified? No, you still have to wake up the waiting instructions. What about the selection logic? Well, the selection logic is now the logic which picks instructions and stops at the first one that is not ready, so it is a simpler logic. What else? Do I still need in-order commit? The selection logic is already choosing in order, so can I update the registers immediately? Here, we were actually putting the results back into the queue and moving them to the register file later. Can I instead move R4 and R7 directly to the registers immediately? Is that possible? So, it seems that in-order issue should free me of this in-order commit logic, is that right? A student points out that the pipeline is not the same length for each instruction. Yes, you are right; if different instructions take different numbers of cycles, there will be problems, because essentially we are talking about an out-of-order completion problem. So, let us assume that there is a single pipeline, and every instruction goes through that pipeline.
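The in-order scan described above can be sketched as follows; as before, the entry fields and function name are illustrative, and the point is only the "stop at the first non-ready instruction" rule.

```python
def select_in_order(issue_queue, issue_width):
    """Scan in program order; stop at the first entry that is not ready."""
    picked = []
    for e in issue_queue:
        if not e["ready"] or len(picked) == issue_width:
            break                 # cannot look past a stalled instruction
        picked.append(e)
    return picked

queue = [{"seq": i, "ready": r}
         for i, r in enumerate([True, False, True, True, True])]
picked = select_in_order(queue, issue_width=4)
# Only seq 0 issues this cycle: the scan stops at the first non-ready
# entry (seq 1), even though seqs 2-4 are ready and independent.
```

Contrast this with out-of-order selection on the same queue, which would happily pick the later ready entries.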
No, let us not get into different execution units. You are saying that in that case we would have multi-cycle execution, where some instructions complete early and write early. Let us not assume that; we have a single simple pipeline, and every instruction goes through the same pipe. Another student raises the write-after-write problem: there could be two instructions writing to the same register, and the order of the writes could get changed. So, essentially, at the register file write port I should have some mechanism to maintain the order in which they write. For example, these two instructions could both be writing to R4; I must preserve the program order when they write to the register file, which makes sense. Anything else? Any other issue? Yes, we have branches. But remember that instructions execute in order, which means if there is a branch before an instruction, the branch has already executed, and I know its outcome by now. Now, you could say that I probably cannot execute anything together with a branch, and that is true, because anything that goes in parallel with a branch may produce a wrong value and pollute the register file. Any other issues? The stores and loads also need to be shown independent of each other. Sure; we follow the same protocol, and here things are much simpler because they go in order. But if there is a store followed by a load, and you cannot figure out whether the load is independent of the store, then you cannot execute them in parallel in the same cycle. So, we still have certain issue constraints coming out here.
We also have to maintain the register write order to handle write-after-write hazards. Any problem with write-after-read hazards? Can those create an issue? All right. So, the point here is that in-order issue is indeed simple. It pretty much relieves you of the extra pipe stage for in-order commit, provided you maintain the constraints that you have raised. And here is an example of out-of-order issue; we have already discussed this one also. The colors are hopefully clear; we have three colors here, and depending on your taste, I will call this particular color pink. The pink instructions can go in the first cycle because they are all independent, and you can figure out why the rest cannot go: this one depends on the load, this one depends on the add, this one is independent, this one depends on the XOR, this one is also independent, this one depends on the add, and this one depends on the shift; this one is independent. So, these three instructions are independent and can go in the first cycle. That is the basic idea of out-of-order issue: I do not care about the order. We have discussed it already. The blue ones can go in the second cycle, while the black ones still cannot go, because I am saying the load suffers a cache miss, which essentially means it is going to take a long time to complete. These three instructions are arithmetic and logic instructions; they complete in a cycle. So, their dependents can issue in the next cycle: this one can issue in the next cycle, and this one can issue in the next cycle. That is the second issue cycle. Then, in the third cycle, this one can go without any problem, because both its operands will be ready from these two; it needs R20 and R19, and they come from here and here. And finally, when the load completes, this one will go.
And in the next cycle, this one will go. So, that is out-of-order issue, and then my commit logic will guarantee that the instructions update the registers in the order specified in the program, in this particular program order. That is the essence of out-of-order issue with in-order commit. We also discussed the write-after-read problem present here, which you have to worry about. As you can see, we said that the shift can issue in the second cycle, but you have to be sure that it does not overwrite R19, which is consumed by the and instruction much later. In-order commit solves the problem, because it makes sure that R19 does not get overwritten until all the earlier instructions get a chance to read it. So, in-order commit will make sure that first R4 is written, then R5, then R10, then R26, then R20, then R27, then R19. There is no name dependence problem if you commit in order. But the downside is that you need space to buffer the results; you have to make sure that the new R19 is not made visible to the other instructions too early. Now, people often call dynamic scheduling with prediction a form of speculative execution, because you are combining two things. If you go back to this list, everything except the branch predictor can be used independent of what is happening in the fetcher. Whether you have a predictor or not, you can definitely use this part of the logic. Essentially you can say: I will keep fetching in order, with no branch predictor, and whenever I observe a branch, I will stop. I can still do all of this even without a branch predictor; nothing stops me from using all this logic. That is pure dynamic scheduling; there is no speculation. As soon as you introduce a branch predictor, you are bringing speculation into the pipeline.
You are now saying that there are certain instructions which are speculatively fetched, and which may actually be on the wrong path. Eventually the branch predictor's prediction will be tested, and I will get to know whether those instructions are correct or not. These two taken together are often called speculative execution; it is just a different name for what we have already discussed. So, you resolve control dependences by predicting, and you continue execution past predicted branches. That is, you not only fetch from the predicted path, but also execute those instructions. We have already discussed this as well. You buffer the results in a structure called a reorder buffer, or an active list, depending on the processor architecture. Active list is the term used by MIPS architectures; in most other processors, you will see the term reorder buffer. This is a separate structure which maintains the results of the instructions and the order of the instructions. It is allocated together with the issue queue entry. Earlier, we put everything into the issue queue entry; here I am just decoupling a few fields from the issue queue entry and putting them in a separate structure, the reorder buffer. So, it essentially decouples the value field from the issue queue. Typically, the issue queue is distributed among the different types of functional units; these distributed queues are known as reservation stations, which is also what we have discussed. This saves some of the comparisons, because an all-to-all comparison is actually not needed in all cases. And finally, you write or commit results to the register file or memory only when the instruction comes to the head of the ROB. ROB is the acronym used for reorder buffer. The reorder buffer is actually a FIFO queue: it maintains the instructions in order, it has a value field which stores the result produced by each instruction, and it is drained in order.
So, that is essentially what we have discussed. It is just that I am introducing a new structure called the ROB to decouple the value field from the issue queue, and distributing the issue queue across the functional units. So, you can say that the floating-point unit will have an issue queue, the load/store unit and the cache will have an issue queue, and the memory system will have an issue queue. Any question on this? So, now just one small element is left in this architecture, something called register renaming. We have already seen the essence of this: what we were doing was renaming registers into issue queue slots. Whenever an instruction produces a register, as in this example, R4 will be renamed to slot ID 0, R5 will be renamed to slot ID 1, and R10 will be renamed to slot ID 2. So, whenever slot ID 0 is done, whatever has a dependence on slot ID 0 will be woken up. Essentially, I could now rewrite the instructions as follows: I could say lw ID0, 0(R6); addi ID1, ID0, 0x20; and so on and so forth. I can actually translate the instructions into a different name space. Similarly, I could say and ID2, ID1, R19; xor ID3, R2, R7; sub ID4, ID3, etc.; whenever I get a new instruction, its target is assigned a new slot ID. Now, this is already renaming: I am translating the registers from one name space to another. The only thing to observe here is that it is a mix of two name spaces: some of the names are coming from the architectural registers, and some of the names are coming from my queue slot IDs. And there will be a table which maintains this mapping; for example, whenever you do this, you will record that R4 is now mapped to ID 0. That is how we have done it till now.
And here, since we have two name spaces, we had a problem: we need to figure out, whenever an instruction issues, where it should read its value from. There are two possible places it can read the value from. For example, this instruction will read one value from register R19 in the register file, and the other from a queue slot. Because of these two different sources of values, we had to do an extra wake-up at the time of commit. We have to let this instruction know: suppose this instruction's issue got delayed for some reason, whatever it may be, and by the time it issues, the producer has already written back. So, when the producer writes back, it has to tell this instruction that it should no longer take the value from slot ID 0; it should actually take the value from R4. The mapping has to be changed back. This becomes a big complication, and the primary reason for it is that we have two name spaces mixed up here. We want one name space, that is it; we want all values to come from a single name space. That is exactly what today's processors do, and it is called register renaming. So, let us try to define a few terms. Is the problem clear? Any question before I move on? Registers visible to the compiler are called logical or architectural registers. For example, there are 32 of them in MIPS, the ISA you are using for your homework, and 8 for x86. This is fixed by the instruction set architecture: how many registers are made visible to the compiler or the programmer. And then there are physical registers inside the processor. In today's processors they are much larger in number and not visible to the compiler. However, the requirement is that your physical register file should be at least as large as the logical register file, so that in the minimum case you can have a 1-to-1 mapping from each logical register to a physical register.
And that is what the processor you are using for your homework actually does: it has 32 logical registers and 32 physical registers, with a fixed 1-to-1 mapping from each logical register to a physical register. But today's processors actually have a much larger number of physical registers inside, which are not visible to the compiler. If you have that, then you have to establish the mapping, and the mapping actually changes over time: which logical register maps to which physical register keeps on changing. And that is exactly the algorithm that defines register renaming. The destination logical register of every instruction is assigned a new physical register. It is just like before, except that instead of assigning a queue slot ID, I pick a free physical register and give it to the instruction's target. Dependences are tracked entirely in the name space of physical registers; there is no queue slot ID name space any more. So, the MIPS R10000 has 32 logical and 64 physical registers. The Intel Pentium 4 has 8 logical and 128 physical registers. Today's Intel processors have many more. We will see exactly what it means to have more physical registers, why you want that, and so on. So, let us revisit our last example and see how renaming works. Let us assume that there are 64 physical registers, and these are the current mappings: R6 currently points to physical register P54, R19 points to P38, R2 points to P0, R7 to P20, R5 to P3, and so on. There will be a table which maintains these mappings. For MIPS, I will have a table with 32 entries, actually 31, because R0 is fixed; it is hardwired to 0. In this table, for example, the row for R2 will hold 0, which means that currently any instruction requiring R2 should read physical register P0, and so on. So, the first instruction comes. It has the target R4, so it should get a new register.
Which register should it get? There is a free list of registers; a bit vector is maintained to record which registers are free. I pick up the first register which is free; let us suppose that it is P15. So, R4 gets renamed to P15. Then the instruction finds that R6 is its source register, looks up the map table, finds that R6 is currently mapped to P54, and so that source gets renamed to P54. When the instruction finally issues and executes, it will go and read P54 to get the value and put the result in P15. The next instruction comes; it has R4 as a source, looks up the table, finds that R4 is renamed to P15, so its source changes to P15, and a new register is assigned for R5. And the process continues. Now, you can see that the registers which had a WAR problem get two different names, so that problem actually goes away. This instruction will read from P38, because at this point R19 was mapped to P38, and here this instruction will get a new register for R19. It cannot be P38 here. Why is that? It is not a correctness rule by itself; it is just that P38 is not available at this point, because it has already been allocated. We will soon see how to recycle the registers; there will be a mechanism that does exactly that. But currently the point is that whenever something gets renamed to a physical register, that physical register is occupied; a subsequent instruction cannot use it until it is freed. And we will see when the registers are freed. So, that is the basic algorithm; it is very simple. Yes, you assign each target a new register, and the new register is any one which is free out of the free registers. Now, does this example tell you that you probably need at least one more physical register than the number of logical registers? If you had exactly the same number, could I make any progress at all?
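The rename step just described can be sketched in a few lines; the initial mappings and the free list contents are taken from the lecture's example, while the function name and data structures are illustrative (real renamers are RAM/CAM structures, not Python dicts).

```python
# Map table and free list, seeded with the lecture's example mappings.
map_table = {"R6": "P54", "R19": "P38", "R2": "P0", "R7": "P20", "R5": "P3"}
free_list = ["P15", "P50", "P45"]   # physical registers currently free

def rename(dest, srcs):
    """Rename sources via the map table, then map dest to a fresh physical reg."""
    renamed_srcs = [map_table[s] for s in srcs]  # look up sources FIRST
    new_preg = free_list.pop(0)                  # grab the first free register
    map_table[dest] = new_preg                   # record the new mapping
    return new_preg, renamed_srcs

# lw R4, 0(R6): the target R4 gets P15; the source R6 reads P54.
d1, s1 = rename("R4", ["R6"])
# A later instruction with source R4: it now resolves to P15; its target R5
# gets the next free register, P50.
d2, s2 = rename("R5", ["R4"])
```

Note that sources are looked up before the destination mapping is updated; this ordering is what makes an instruction like `add R4, R4, R4` read the old mapping of R4 while writing a new one.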
Because at any point in time, all physical registers would be mapped to something; I get a new instruction, nothing is free, and I cannot rename it. So, to make progress, I need at least one extra physical register. We will later formalize how the number of physical registers relates to execution time and performance. Any question on that? All right. Branch instructions do not have a target, so they do not get a new register; they only rename their sources. So, R19 becomes P45, because P45 was assigned to R19 here, and R20 is taken from here, which is P8; R20 was renamed to P8. But there is no new register assigned to the branch; you just rename the sources. So, this now makes sure that everything is in a single name space. When an instruction issues, it knows where to read the value from; it reads the value from there, and that is it. Here is another example of how renaming solves the problem of write-after-write hazards. Here are two instructions that write to the same register R5. In the first instruction, R5 will get renamed to some register, say P50, and here it will get renamed to some other register, say P45, whatever is free. So, now they can actually execute concurrently; they write back to different registers. It is now safe to issue them in parallel, because they are actually independent. It is just that the compiler faced a shortage of architectural registers, and that is what created the write-after-write hazard. So, you could very well ask: the compiler was forced to do this, so why not expose all the registers to the compiler, instead of having this complicated hardware inside which does the renaming? If you gave the compiler all the registers, it would do a good job of register allocation. What do you think? Why is it done in this way? Is the question clear? So, what I am trying to say is: why are we doing the renaming?
Because we want to get rid of the name dependences; that is the only point. And why do we have the name dependences? Because the compiler was probably short of registers; that is why it had to introduce these name dependences. Like here, you could very well ask why the compiler chose R19 here. It could have chosen a different register, and the reason is that it probably did not have a free one; it figured out that R19 was available, just picked up R19, and now you get the hazard here. So, if you give it more registers, it should do a better job. Then why is renaming done in hardware? Why are Intel processors stuck with eight logical registers forever? You can expect that the compiler will definitely do a better job if you give it more registers. There will still be name dependences; it is not guaranteed that all of them will go away, but probably they will be so few in number that they do not matter any more. But it is never done this way. Why is that? Yes: the ISA would change because of adding new registers. Exactly; the instruction encodings would have to change. The binaries that are compiled today would not work tomorrow as soon as you do this. The ISA is going to change, so you would have to recompile all applications, which is possible in principle, but here is the problem: you have bought a Microsoft Office suite which was compiled for yesterday's 686, and Microsoft would have to issue new binaries to all its customers. It is a huge headache for the software industry. So, it is primarily a business reason why this is not done; theoretically, there is no problem, and we could do it. Just to hide this problem, processors do the renaming inside the pipeline. Although I must mention that AMD actually took an interesting step when they designed the Opteron: they introduced 16 logical registers.
And it was shown that compilers actually use them. Of course, this came with the rider that if you want to make use of the new registers, you have to recompile your binaries; your old binaries will continue to work, just with the smaller register set. All right. So, register renaming maintains a map table that records the logical-register-to-physical-register mapping. After an instruction is decoded, its logical register numbers are available. The renamer looks up the map table to find the mappings for the logical source registers of the instruction, assigns a free physical register to the destination logical register, and records the new mapping. That is the renaming procedure. So, how does the pipeline look? These are my pipeline stages; I am not saying that each stage is one cycle. Fetch, decode, then rename/allocate. In the rename/allocate stage, you rename your logical source registers and assign a new physical register to the destination. You allocate the ROB entry and the issue queue entry. Then you do selection and issue; whenever the instruction is ready, eventually it will be selected. Then you read the register file; when you read it, you read the physical source registers. Then you execute, look up memory if needed, and then retire, or commit. Let us also look at this last stage, because now the question is: what is this stage really doing? Previously, we were copying values from the ROB to the register file at commit. Now, that need is gone, because I can write the register immediately on execution: each instruction gets a different physical register, so there is no question of a write-after-write or write-after-read hazard. I can write the register immediately. So, fetch, decode, rename, and allocate are in-order stages.
Each of these handles multiple instructions every cycle, meaning you sequentially fetch a bunch of instructions, decode all of them, rename all of them, and allocate for all of them. Select, issue, register file read, execution, and memory lookup are out of order; you can execute instructions as and when they become ready, and you do not have to obey the program order. An instruction directly updates its destination physical register's contents on completion; it does not wait until commit. We will soon argue that this is correct, although intuitively it is already clear, because every instruction gets a new physical register: there is no question of a conflict, and there is also no question of a write-after-read hazard. So, out-of-order register update actually happens now, and we will have to argue why it is correct; we will come to that point. And retire is again in order. Retire is the technical term often used here: an instruction was born when it was fetched, and when it is finally done, it retires. Retire is in order, but multiple instructions may retire each cycle. Register write-back is not part of retire; it happens as part of the execution stage. When an instruction issues, it broadcasts its destination physical register ID to all instructions in the issue queue, and that is essentially the wake-up logic. Every instruction compares its source physical registers with the broadcast destination register. For example, here, when this instruction issues, it will broadcast P50 to all the instructions. This one will have a match and will know that it can probably issue in the next cycle; in this case, it probably cannot issue in the very next cycle because of the load interlock, so it may have to wait a few more cycles. Similarly, when this instruction issues, it will broadcast P62, and this instruction will match its source, wake up, and know that it can go in the next cycle.
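The broadcast-and-compare wake-up can be sketched as follows. Each waiting entry keeps (source physical register, ready bit) pairs; the loop plays the role of the bank of comparators mentioned earlier. The data layout and names are illustrative only.

```python
# Sketch of wake-up: the issuing instruction broadcasts its destination
# physical register ID; every waiting entry compares it against its sources.
def broadcast(issue_queue, dest_preg):
    for entry in issue_queue:
        for i, (src, ready) in enumerate(entry["srcs"]):
            if src == dest_preg:
                entry["srcs"][i] = (src, True)   # comparator match: operand ready
        entry["ready"] = all(r for _, r in entry["srcs"])

# One waiting instruction: needs P50 (pending) and P3 (already ready).
waiting = [{"srcs": [("P50", False), ("P3", True)], "ready": False}]
broadcast(waiting, "P50")
# The P50 source matches the broadcast, so the instruction becomes ready.
```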
So, that is all for wake-up. Any question? Now, with register renaming, what does speculative execution look like? Commit is much simpler, because there is no need to transfer values from the ROB to the register file; in fact, no computed value is stored in the ROB. The ROB now only stores a few state bits about each instruction and just maintains the order. Branches update the predictors, branch target buffers, and return address stacks when they commit. Killed instructions just drain out; these are the instructions on the wrong path, discovered later because of a misprediction. They just get marked as killed, and the ROB removes them from the processor. Also, stores write to memory at this time; the store queue, or speculative store buffer, holds the value. Remember that we said we distribute the issue queue across the functional units, so the memory subsystem gets two issue queues: one is the load queue, the other is the store queue. The load instructions go into the load queue, the store instructions go into the store queue, and the store queue also holds the value that each store needs to write. When a store comes to the head of the ROB, its value is moved from the store queue to memory. And finally, the ROB entry is recycled, so that it can be used again. Remember that issue queue entries are recycled when the instruction issues, so they are recycled much earlier than ROB entries. If you go back to the pipeline: a ROB entry is allocated in the allocate pipe stage and remains held until the instruction retires, which is a pretty long time. Whereas an issue queue entry is allocated in the allocate stage and gets freed as soon as the instruction issues, because there is no need for the entry any more once the instruction has issued. So, issue queue entries remain held for a shorter period, whereas ROB entries are held for a very long period. What is the implication of that?
Can you infer anything about the relative sizes of these two queues? ROB entries remain occupied for a longer period of time, and issue queue entries remain occupied for a shorter period of time. So, which one should be bigger, the ROB or the issue queue? The ROB. That's pretty obvious. Branch misprediction recovery needs to restore the register map. We have talked about this; this is the restore-a-checkpoint technique. Whenever a branch is renamed, you take a checkpoint of the map table, and if the branch is mispredicted, you just restore the map table, in addition to marking all the instructions after the branch as killed. So, branches, while going through renaming, checkpoint the register map. Remember that these are not the values; it is only the map that is checkpointed. Here, for example, when this branch instruction shows up, when it is renamed, it will only checkpoint the map, meaning that it remembers that at this point of time R19 was mapped to P45, R27 was mapped to P19, and so on and so forth. The value of P45 is not checkpointed; the value of P45 can be whatever. We will soon argue why this is correct, because it is not very obvious: we are not checkpointing the value of P45, only the map that R19 corresponds to P45. So, could it be that some instruction actually updates P45 in a wrong way? The observation to make here, about why it is correct, is this: suppose that this branch was actually mispredicted. Any instruction that comes after the branch will not be allocated P45; it will be allocated some other register. So, P45 can only be updated by this one instruction; there cannot be anything else updating P45. So, you do not need to checkpoint the values. You only need to checkpoint the map. Of course, we are continuously making the implicit assumption that there is an algorithm which recycles registers in a safe way. 
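The map-table checkpoint can be sketched like this. Note that only the logical-to-physical map is copied, never the register values. The register names follow the lecture's R19/P45 example; the helper functions are an assumed minimal model, not the actual hardware design.

```python
# Sketch of branch-misprediction recovery via map-table checkpoints.
# At rename time a branch snapshots the logical->physical map; on a
# misprediction the snapshot is restored. Values in the physical register
# file are never checkpointed.

map_table = {"R19": "P45", "R27": "P19"}
checkpoints = {}

def rename_branch(branch_id):
    checkpoints[branch_id] = dict(map_table)   # copy the map, not the values

def rename_dest(logical, new_physical):
    map_table[logical] = new_physical          # a later (wrong-path) rename

def recover(branch_id):
    map_table.clear()
    map_table.update(checkpoints.pop(branch_id))

rename_branch("br1")
rename_dest("R19", "P72")       # speculative instruction remaps R19
recover("br1")                  # misprediction: restore the checkpointed map
print(map_table["R19"])         # back to P45
```

Since each unresolved branch holds one checkpoint, the number of checkpoints the hardware provides bounds the number of outstanding branches, which is the point made next.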
We will come to that soon. The implication of this is that it also limits the number of outstanding branches, because it depends on how many checkpoints you can accommodate. If your processor can accommodate 100 checkpoints, then your processor can support 100 unresolved outstanding branches in flight. Usually the number is much smaller than 100, a few tens. Any questions? Okay. So, the last point: how do you recycle the registers? Any solution? When should I free a register? Okay, let us go back to this example. When should I free P15? When can another instruction use P15? When the add-immediate has finished? Do I know that there is no instruction in the future that will be needing P15? A student suggests: when there is no instruction writing to R4. Do I have any such situation here? Okay, I do not have a common target here. So, you are saying there is an instruction which writes to R4. When exactly? An instruction is not really a point event, right? It spans a certain number of cycles: it is fetched, it executes, it commits. So, when the instruction which writes to R4 reaches which point? Okay. So, essentially what he is saying is this: suppose an instruction here writes to R4. At that time, R4 will be given a new map, whichever register is free at that point. And he is saying that when this instruction finally commits, I am guaranteed that P15 can be recycled. Which makes sense, because from now on R4 has a new committed name, whatever was assigned here. So, the R4-to-P15 map must now expire, and I should be able to reuse P15 for some other logical register. Makes sense? Does everybody see that this is a conservative policy? I could have done better. Why? Because R4 may not be read again for a long while; after its last use we could just expire P15. 
We could then use P15 for other registers. Right. So, do you see any difficulty with that? Yes, that is the assumption. Why is it conservative? Do you see how to do better? Let me ask this question. I really cannot see the future; that is the problem. So, what do I need to determine? I need to determine whether, beyond a certain instruction, P15 is dead, that is, there won't be any instruction needing P15. That is all I need to determine. If I had a way to determine that, I could have freed it earlier. There are research papers which have looked at this problem; you can actually design predictors which predict the last touch to a certain register. Is it static information? Not entirely: starting from a point, you can follow different control paths, and on different control paths the register may become dead at different points. Of course, if there is enough storage, you can remember the whole thing: on this path the register becomes dead here, on that path somewhere else. Another suggestion from the class: use the wake-ups. When this instruction issues, it broadcasts P15, and all the instructions waiting in the issue queue compare against it, so you know which instructions were waiting for P15; when all of them have committed, you could free it. But how do you know there will not be any further need for P15? It is not about wake-ups: there may be some instruction which is not yet fetched that requires R4. So, as such, it is not an easy problem. What processors actually do is what was suggested earlier: we wait until R4 is overwritten, and when that overwriting instruction commits, I know that the new map has been committed. 
So, I can now free P15. Okay. Now, how do you really implement it? That is the question. And the difficulty is that when this instruction commits, you cannot simply look up the map table, find out what R4 maps to, and free that register. That would be wrong, because by the time this instruction commits, there may be many more instructions in the pipeline which have already overwritten R4, so R4 may now point to some totally different register. So, how do you really free P15? How do you find out that P15 is the one to be freed when this instruction that overwrites R4 finally commits? Exactly: you remember it in the ROB. When you rename this instruction, when you assign R4 a new map, you look up the map table and find that R4 currently maps to P15. You just remember P15 in your ROB entry, and when you finally commit, you know P15 is the one to free. So, that's the register recycling algorithm. And now it should be clear why I don't need to checkpoint values for branches: there is no way P45 is going to be freed and reassigned before this branch commits, because any instruction that overwrites the mapping to P45 has to come later, and it cannot commit before the branch. So, if the renamer runs out of physical registers, the pipeline stalls until at least one register is available. Physical registers must be recycled. When is it safe to free a physical register? We have just discussed it. There is one more interesting case, actually. Think about this instruction, this one. Here R21 gets mapped to P59, and the pipeline continues moving. Eventually, you figure out that this branch was mispredicted, which means this instruction should not have been fetched at all. So, now what do we do with P59? When does P59 get freed? 
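The recycling scheme just described can be sketched as follows: at rename, an instruction records the previous mapping of its destination in its ROB entry, and that old physical register is freed when the instruction commits; a squashed instruction instead gives back its own, never-committed destination register. The structure names and register numbers here follow the lecture's R4/P15/P59 example but are otherwise assumptions for illustration.

```python
# Sketch of physical-register recycling. Renaming a destination remembers
# the *previous* mapping in the ROB entry; commit frees that previous
# register. A killed instruction frees its own destination register.

map_table = {"R4": "P15"}
free_list = ["P20", "P59"]

def rename(logical):
    entry = {"logical": logical,
             "prev": map_table[logical],   # remember old map in the ROB entry
             "dest": free_list.pop(0)}
    map_table[logical] = entry["dest"]
    return entry

def commit(entry):
    free_list.append(entry["prev"])        # old name can never be read again

def squash(entry):
    free_list.append(entry["dest"])        # non-committed map, safe to reuse
    map_table[entry["logical"]] = entry["prev"]

e1 = rename("R4")          # R4: P15 -> P20; ROB entry remembers P15
commit(e1)                 # P15 is recycled only now
e2 = rename("R4")          # R4: P20 -> P59
squash(e2)                 # branch mispredicted: free P59, restore old map

print(map_table, free_list)
```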
As soon as this instruction comes to the head of the ROB, in the killed state of course, I can free P59 immediately, because it is a non-committed map. So, there is no problem in freeing P59 and reusing it: whatever value was put in P59 is garbage; I can ignore it completely, I can overwrite it. So, more physical registers means more in-flight instructions. That should be clear now: as soon as I run out of registers, my pipeline has to stall, so if I have more registers, I can keep on renaming and keep my pipeline moving. That also opens up the possibility of more parallelism, so the number of registers has a strong connection with how much ILP you can expose. But you cannot make the register file very big, because there are downsides. It takes more time to access as it gets bigger; there is a rule of thumb you have to remember: smaller is faster. Big structures are usually slower, so as you make it bigger and bigger, it's going to be slower and slower, and you start losing CPI in some other way. It also burns power, which is very important: large structures are usually not very energy efficient. Next question: is there a relationship between ROB size and register file size? Here, register file size means the number of physical registers. What do you think, should there be, or can there be, a relationship? A student says: the ROB size is less than or equal to the number of physical registers, because when we are using all the physical registers, we have a mapping for every one of them. So, he is saying the ROB size is less than or equal to the physical register file size, since all the in-flight destinations are distinct. Okay. So, whatever the register file size is. 
Yes, at most that many destinations can be live. Right. Okay. But wait: what if all the instructions have the same destination register? Then I would be renaming the same logical register again and again. What do we do then? You free the old mapping only at the time of commit, and the worst case is that none of them have committed yet. Okay. All right. Fine. So, what are you suggesting, less than or equal to what? You are right that with as many physical registers as I have, I should be able to rename that many instructions. So, which way does the inequality go, and is there an equality? Yes, there is an equal to, because if I have a bigger ROB than that, some of the slots will simply remain empty in the worst case; I cannot rename instructions to fill them. Now, can I tighten this a little bit? Suppose I tell you that I have n logical registers. What changes? It is not independent of n, is it? I have n logical registers, and you may be using two names for a single logical register. Suppose an add is using R4 as a source, and then a later load instruction is using R4 as a destination. Then we have two physical names live for R4 at the same time, because the old physical register will not be free until the overwriting instruction commits, and you don't know when that is going to be. But then, after the load instruction, would you use the same physical register again? No: there can be more instructions in flight with the same target. One instruction is using R4 as a source, so it holds one name, and another is using it as a destination, so that needs a second name. Which one comes first? 
The destination use comes after the source use, so we have to use a second name for it. Right. Okay. And we may require more than these two names: you can have yet another instruction that uses the same register as a target. We can free a name only when the overwriting instruction commits, and you don't know when that will be; by then, you may have 20 more in-flight instructions renaming the same register. So, what should the ROB size be? What I'm saying is that the minimum number of physical registers occupied at any point is going to be n, one for the committed value of each logical register, so the register file size should be greater than n. That we have already discussed: the register file size should be greater than or equal to n plus 1. So, I already have n physical registers occupied, and now I start fetching instructions. Can I say it like this: n registers are always occupied, so whatever I am left with is all I can allocate in the worst case. Of course, eventually registers will get freed, but I do not know when they will get freed. So, I should not have a ROB arbitrarily bigger than this; the size should