Last time we looked at the fetch and decode stages, and we also looked at the branch predictor. This processor uses a class of instructions called conditional instructions. We have not discussed these earlier, so I thought I would spend some time explaining what they are. The idea is that, where possible, we get rid of branches completely, because we have seen that branches cause problems in the pipeline unless you predict them. Take this piece of code, which says: if not a, then b = c. It gets translated to that instruction, cmovz, which stands for conditional move on zero. What it does is move the value in R3 into R2 if R1 is zero. So it converts the control dependence into a data dependence, because what we are now saying is that this instruction has two sources, R1 and R3. And we now have more time to get R1 ready: instead of needing it in the fetcher, we need it only in the execution stage.
What does that mean? Suppose we had a plain move instruction instead (of course, your MIPS assembler will possibly translate the move into an add with zero). So instead of this instruction we have a simple move which moves R3 to R2, but before it we have a branch instruction. If you do a direct MIPS translation, the code looks like this: branch if R1 is not equal to zero to a label, followed by the move into R2. What happens is that this branch gets fetched. Let us call it I1, let us call the move I2, and the timeline of I1 will look like this. To be able to fetch I2 as opposed to I3, we need the outcome of this instruction here. Whereas if you look at the conditional move, we can wait until the execution stage for R1 and R3 to be ready. Only the execution stage of that instruction will be stalled, and only if at least one of R1 and R3 is not ready.
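
To make the contrast concrete, here is a minimal Python sketch of the two code shapes just described. It models only the architectural result, not the pipeline timing. The function names `with_branch` and `with_cmovz` are made up for this illustration, and the register roles (R1 as the condition, R3 as the value moved into R2) follow the example above.

```python
def with_branch(r1, r2, r3):
    """Branch version: 'branch if R1 != 0 to label; move R2, R3; label:'."""
    if r1 != 0:
        # Branch taken: the move is skipped, so R2 keeps its old value.
        # The fetcher needs this outcome (or a prediction) to pick the next PC.
        return r2
    # Branch not taken: the move executes.
    return r3


def with_cmovz(r1, r2, r3):
    """Conditional-move version: R2 <- R3 if R1 == 0, else R2 is unchanged."""
    # No branch: R1 is just another source operand, needed only at execute.
    return r3 if r1 == 0 else r2
```

Both functions return the same new value of R2 for every input. The difference the lecture cares about is when R1 has to be known: at fetch for the branch, at execute for the conditional move.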
So that is the meaning of this statement, that we have more time to get R1 ready: instead of the fetcher, it is used in the execution stage. This is also known as if-conversion. That is the technical term used in the compiler community, and it is very useful in compiling this kind of code. For example, suppose you want to say y = abs(x), which is essentially: if x is less than 0, then y is minus x. It is very useful for getting rid of hard-to-predict branches. This one, for example, could be a very hard-to-predict branch, depending on how the value of x varies. However, eliminating branches that guard a large piece of code may require too many conditional moves, so that may not be feasible. This is normally tried on small conditional structures, and cmovz is supported by all processors today.
Now, the question is how it interacts with register renaming. To understand why this is a problem at all, note that we have not yet looked at the register renamer of this processor. So let us first see what the renamer in the R10000 is doing. Any questions on conditional instructions?
Renaming takes place in the second pipe stage. Decode and rename are essentially fused in a single pipe stage and, as we have discussed, every destination is assigned a new physical register from the free list. This is what we already covered when discussing register renaming. Whenever you get a new instruction, the destination of the instruction gets a new physical register. For example, here R3, R10, R4, R1, R3 all get new physical registers from a free list of registers. The sources are assigned the existing map. For that you have a table which maintains the current map from logical register to physical register. The map table is updated with the newly renamed destination. For every destination physical register a busy bit is set high to signify that the value in this register is not yet ready.
This bit is cleared after the instruction completes execution, but possibly before it commits. We have discussed this point as well: how exactly renaming interacts with this per-register bit. The integer and floating-point instructions are assigned registers from two separate free lists, because these are really disjoint register sets. The integer and FP register files are separate, and each has 64 registers. Notice that MIPS has 32 logical registers for each of these, so I have double the number of physical registers.
There is a slight complication involving mult and div. If you remember these two instructions, they actually have two destinations, because if you multiply two 32-bit values you produce a 64-bit result, and that is normally split into the HI and LO registers. Similarly, when you do a division, the quotient and remainder go to different registers: the quotient to LO and the remainder to HI. The question is how do I rename them, because I do not have the option of renaming two destinations for one instruction here. What MIPS does is, first of all, put a restriction. As we said, it renames four instructions in a cycle. Now there is a restriction on the position of a mult or div instruction in the bundle. Whenever it encounters a mult or div instruction in the rename bundle, it terminates the bundle there. It cannot rename any following instruction in that cycle. It breaks each of these down into two smaller instructions internally and renames them separately. So mult is broken down into two instructions, div is broken down into two instructions, and they go separately through the pipeline. However, their coupling is maintained throughout the pipeline. These two instructions are atomically coupled as they flow through the pipeline. Any question on this basic protocol? It is done this way because you have to rename two destinations, and here you do not have the option of doing that. The hardware allows you to rename only one destination per instruction every cycle.
So, from the free list. The way the free list is designed in the R10000, it can provide you only four registers every cycle: four new registers at most, not more than that. Why discuss this? The point is, what would the alternative be? You could make it arbitrarily large, so that each instruction can ask for any number of registers. Then your hardware becomes fairly complicated. That is the problem, if you think about it. We have not yet talked about the free list organization. We will talk about that soon, and you will see why there is a problem.
So what does the map table look like? It is a multiported RAM. We are talking about the table which maintains the logical-to-physical register mapping. If you have forgotten the structure of this table, it looks like this. If I have n logical registers it will have n entries, and if I have p physical registers each entry will be log p bits wide. The entry for a particular logical register x has the corresponding physical register number stored in it. It has 16 read ports and 4 write ports. The 4 write ports are probably understandable. Why is that? In a cycle you will update at most 4 entries of this table, so you require 4 write ports. Why do you have 16 read ports?
There are 4 stages in the pipeline which use this register. Oh really, which stages? There is a decode phase to assign these registers. So, 4 of them have...
How many read ports should it have? If I had left it blank, as x, it should be equal to the number of reads I can do in a cycle from this table. How many reads? Forget about this number 16. Give me some number that makes sense to you. Let us suppose I am renaming, and let us take an instruction from here. Let us say I am renaming an add instruction: add R1, R2, R3. So what do I read from the table to rename this particular instruction? Sorry?
Which entries am I going to read from the table for this instruction? R2 and R3, right. I will read R2 and R3 to get the corresponding physical registers. And I assign a new physical register to R1. So I will update R1 and I will read R2 and R3, and I have 4 such instructions. How many read ports do I require? 5? 8, OK. Is there anybody who thinks that we need more?
He is saying that you could have conditional add instructions, say caddz, which would have 3 sources. So I could have this kind of instruction, and that would bring it to 12, right. Now, MIPS does not have this instruction. MIPS has only one type of conditional instruction, which is conditional move. And conditional move instructions, as you can see, have these two sources. But I tell you that the conditional move instruction will still need to read R3 from the table as well. Why is that?
Sir, is that not there for all instructions, whether or not it is a conditional move? Yes. Other instructions, like a simple add, will also need to read the last mapping of the destination.
So he has raised a separate point. He says that in this instruction, leaving aside the conditional part of it, I also need to read the old map of R1. For example, suppose R1 was previously mapped to pk, and now I give it a new map, say pn. I need to read pk also. I cannot just overwrite pk with pn. Why is that? Why do I need to read the old map of the destination? Currently R1 is mapped to pk, before this destination is renamed. I ask the free list to give me one new register and it gives me pn. So now pk will be replaced by pn, but what he is saying is that I also need to remember pk. So I need to read it. I cannot just overwrite pk with pn. Why is that? So, when do I free pk? The instructions... Instructions do not have maps, registers have maps. For this particular register, and again an instruction like this...
So, like the last instruction to produce R1 got pk. Yes. Yes. So, until that instruction commits, pk... At that time I can free pk. So he is saying that there was an instruction which produced R1, and we mapped R1 to pk at that time. And when that instruction commits I can recycle pk and assign it to somebody else. If it is not used later. We cannot know whether it will get used in future, but if it is overwritten, that is, R1 is assigned to somebody else, then we can free it. So when does it get freed in this case? Here, when we assign R1 and then we assign... When you assign. There are two conditions: one, the previous uses of pk have committed... How do you know that? The instruction commits... What would be a very simple analysis?
You are right. So I should free it when the last instruction to use pk has committed. How do I know which is the last instruction? As I mentioned, once this instruction is inside the pipeline, I know that beyond this point pk is dead for sure, because R1 has a new incarnation. But this instruction has a finite life inside the pipeline. It is not a point event where it comes and goes. It starts here, goes through several pipe stages, and then completes. During this time there may still be some instruction in the pipeline which uses pk, one that is before this instruction. So when am I sure that pk will not be needed? When this instruction commits. Exactly, because when this instruction commits I know that everything before it has committed, and anything after it should not require pk because R1 now has a new incarnation. So it is very conservative, but it is correct. And this is the reason why this instruction needs to carry pk with it, so that when it finally commits it can recycle pk: it will move pk to the free list.
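
The argument above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the R10000 hardware: the class `Renamer`, its tiny register counts, and the identity initial mapping are all hypothetical. What it tries to show is that rename reads the old map of the destination before overwriting it, and that the old physical register goes back to the free list only when the overwriting instruction commits.

```python
from collections import deque


class Renamer:
    """Toy map table plus free list (sizes are illustrative assumptions)."""

    def __init__(self, num_logical=4, num_physical=8):
        # Assumed initial state: logical register i maps to physical i.
        self.map_table = list(range(num_logical))
        self.free_list = deque(range(num_logical, num_physical))

    def rename(self, dest, srcs):
        # Sources read the existing map.
        src_phys = [self.map_table[s] for s in srcs]
        # The old map of the destination must be read before it is lost.
        old_phys = self.map_table[dest]
        new_phys = self.free_list.popleft()
        self.map_table[dest] = new_phys
        # The instruction carries old_phys with it down the pipeline.
        return {"dest": new_phys, "srcs": src_phys, "old": old_phys}

    def commit(self, renamed):
        # Everything older has committed and everything younger sees the
        # new map, so the old physical register can be recycled now.
        self.free_list.append(renamed["old"])


r = Renamer()
i1 = r.rename(dest=1, srcs=[2, 3])   # like add R1, R2, R3
i2 = r.rename(dest=1, srcs=[1, 1])   # a later writer of R1 that reads R1
r.commit(i1)                         # frees the register R1 had before i1
r.commit(i2)                         # frees the register i1 had allocated
```

Freeing at commit of the overwriting instruction is conservative, as the lecture says: the old register may have been dead earlier, but this point is easy to detect and always safe.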
So that means I need four more read ports, because for every destination I need to read its old map as well. That makes it 12: I had two sources, which is 8, plus 4 is 12. Now, coming back to the previous question, I say that I need to read R3 here also, for a different reason from the one just mentioned. If you can answer that, it will make it 16, because in a bundle I can have four conditional move instructions. Think about how to implement conditional move. In front of every register you need a multiplexer, is that correct, to be able to implement conditional move? The multiplexer's select would be R1, and what are the inputs to the multiplexer? I need two inputs. What are they? R3 and R2, right. So I either write the same content back to the register or I write new content. That is what the conditional move means. So R3 acts as a source as well as the destination. I have to read R3. I may write R3 to R3 or I may write R2 to R3, depending on what R1 finally resolves to, and I do not know at this point what that is going to be. How else do you design the register file? Tell me what the hardware structure is. There is no other way than this, actually.
So R3 is being written by this instruction either way? Yes. R3 is always written. Even if the condition fails, R3 will be written, because this is how the hardware works. So you have to be prepared for the worst case, which is that a bundle contains four conditional move instructions. That is the worst possible. So R3 is the same register... Which one? R3, the old mapping of it. Yes, but how do you design the hardware? We looked at SRAM ports, right, how to design them. Yes, we have four separate ports. You are reading one register twice. Yes. That is just the way the hardware is given to you. So that is how you get 16 read ports. The third operand is a condition bit.
We will talk about this particular predicate bit when we talk about the register file. But keep in mind that this is also renamed. Even if it is a bit, it is still renamed. By bit I mean it is a single-bit value, but it actually occupies a full register, taken from the normal set of registers. Any questions here? 16 read ports, 4 write ports. And yes, if you did not have conditional move instructions you would need only 12 read ports. But because of this you have to have 4 extra read ports.
The renamer uses 24 5-bit comparators to resolve dependencies. What are these dependencies? Before trying to account for this particular number 24, can someone tell me what these dependencies are? Remember that we are currently in a piece of hardware which is trying to rename instructions in parallel by looking up this table. You have 16 read ports and 4 write ports, and you are actually renaming 4 instructions. And these 4 instructions may themselves be dependent on one another.
Suppose that we read all the input dependencies first... No, I did not tell you that. How is that? No, first of all, can you give me an example where there might be a problem? To get you started, let us suppose the first instruction of the bundle is this. Yes. So can you give me... Do you mind if I add to the same register? No. So let us take these 2 instructions. What is the problem? As you have mentioned, I cannot just go ahead and treat each instruction individually and say, well, for the first one I am going to read R2 and R3, and for the second I am going to read R1 and R1, because then the second instruction will get a wrong map. It should get the map that the first instruction assigns to R1. So this is the dependency we are talking about here. Now, can somebody account for the number 24? We have 4 instructions. Yes.
And there is one more problem that we can see in this particular example. You also have to make sure that this map, the one from the later instruction, finally survives in the slot of R1. This earlier one should not. So this order also has to be maintained. Even though you are renaming 4 things in parallel, there are certain order restrictions that have to be obeyed within the bundle, depending on the dependencies. These comparators are trying to figure out these dependencies. Can somebody explain that number 24?
Let us suppose that I have a bundle of k instructions. For the worst case we have 3 sources per instruction, since within a bundle I can have k conditional move instructions. So how many comparators do we have? Exactly: k(k-1)/2 per operand, and I have 3 operands. How do I get this? If I take the first target, I compare it with the next k-1 instructions. Take the second target, I compare it with the next k-2 instructions. And for each instruction I have 3 source operands. So if I plug in k = 4, I still do not get 24. I get slightly less than that: I get 18. What are the extra 6 comparators? That is the difference. I still need to make sure, in this case, that this map survives and not this one. So how many is that? 4 choose 2. So k(k-1)/2: the first target I compare with the next k-1 targets, the second with the next k-2, and so on. So the total is twice k(k-1). All right, clear?
Now the question is, how do I implement this? I know how many comparators I need to figure this out. You can guess that there will be certain bypass paths within the rename stage which pass this R1 map on to these later instructions, so that whatever they have read from the table finally gets overridden by the bypassed value.
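
The counting argument can be checked with a short Python sketch. The function name `comparator_count` and its parameters are assumptions made for illustration. It only adds up the pairwise comparisons described above: every earlier destination against each source of every later instruction, plus every earlier destination against every later destination.

```python
def comparator_count(bundle_size, sources_per_instr=3):
    """Worst-case number of register-number comparators in the rename stage."""
    # Number of (earlier, later) instruction pairs in the bundle.
    pairs = bundle_size * (bundle_size - 1) // 2
    # Each pair needs one comparison per source of the later instruction.
    source_checks = sources_per_instr * pairs
    # Each pair also needs one destination-vs-destination comparison,
    # so that the youngest writer's map is the one left in the table.
    dest_checks = pairs
    return source_checks + dest_checks


# With 4 instructions and 3 sources each: 18 source checks + 6 = 24,
# which equals 2 * k * (k - 1) for k = 4.
print(comparator_count(4))
```

Each comparator is 5 bits wide because it compares logical register numbers, and there are 32 logical registers.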
So essentially what I am saying is that the final map that this particular source register gets will be either whatever it has read from the map table or something coming from one of the earlier destinations in the bundle. So you will have a multiplexer with these two inputs, and its select will be given by one of these comparators, or maybe a combination of them.
How do you organize a free list? It is four-way banked, and each bank is an 8-entry FIFO. Each bank's read pointer gives you a free register. That is the simplest way to implement a free list that can give you four registers every cycle. You have four FIFOs. You consult the read pointer of each of the FIFOs, and whatever they point to is your free register.
After renaming we are almost ready to issue. What you do is assign an active list entry to each instruction. This is essentially the reorder buffer. In MIPS it is called the active list. The active list is a 32-entry FIFO queue which keeps track of all in-flight instructions, which can be at most 32, in order. Each entry contains various information about the allocated instruction, such as the physical destination register number, etcetera. How do you organize the active list? Is it a single FIFO? Should it be broken down into a bunch of FIFOs? How do you decide that? And how do you decide this one? Four-way banked 8-entry FIFOs: why not 8-way banked 4-entry FIFOs? Why not 16-way banked 2-entry FIFOs? 2-way banked 16-entry FIFOs? Why this one? Why not just one FIFO with 32 entries?
Well, that is an interesting question. Here we say that each register file has 64 registers. So would you expect your free list to have 64 entries? We have two separate free lists. Yes, they are separate. At any time you assign only from a bank of 32 entries... No, no, no, each one has 64 registers. 32 registers? So logically there are only 32? Yes. OK.
Maybe there are more physical registers, but logically there are only 32. OK, let me ask a general question. Say there are n logical registers and p physical registers. How big is my free list? n? n minus p? Which one is bigger? p is bigger. Would you like to revise it? p minus n. Why? It depends on how many registers are mapped at that moment. Anything else? It could be less than p minus n. What does that depend on? On the instructions in flight, sir. OK. So what does a free list entry contain? A free list entry contains an integer, the number of a physical register that can be used. So the question is, how will you size this free list? What is the maximum size? Why do you subtract n? Because there should be some initial map. So what is the largest size of the free list, and how do you prove that it is p minus n?
You are saying that at any point in time there are certain registers in the map. At most n? Exactly n? At least n, so at most p minus n are free. It is not always exactly p minus n that are free, because in the pipeline there could be multiple instructions that have mapped the same logical register. That is possible. So at a certain point in time there could be more than n physical registers in use, which means fewer than p minus n entries of the free list are occupied. But what is the worst case? That is what you design for. Are you assuming an initial mapping? When the machine boots up, something will be mapped. Why? Because there are p physical registers and n of them have already been mapped.
Can you prove that n will always be mapped? At least n. You are looking for the worst case all the time. That is what we need: the largest possible free list size. Sir, when you rename, you always rename to some other physical register. That is right. You change the entry to some other physical register. So we have replaced one with one. That is right. The number that are mapped remains at least n. All right. So essentially the argument is that at any point in time at least n registers remain mapped. If your pipeline is completely empty, so it is doing nothing, then you still have n logical registers mapped to some physical registers. So the largest possible free list size is never more than p minus n. That is what we need to size for. That explains why it is 32: because it is 64 minus 32.
Why is it organized like this? Two free lists? Yes, there are two free lists. I am talking about both of them here. Both are organized in the same way: one integer free list, one for floating point. Why is it organized like this? Four read ports? You require four reads at a time. Yes. So four-way banked: four banks, four parallel reads. Why can't I do that with a single free list? I can design my hardware with four read ports. I will keep four read pointers that move synchronously. I will just say, if I activate these four word lines, give me the contents. Now, what do I lose in that organization compared to this one? There I have a 32-entry RAM which is four-ported. Here I have four RAMs, each of which is single-ported. There is a huge difference in the area of these two designs. That is why it is organized like this. And why four? Because I need four registers every cycle, as I mentioned.
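
A minimal Python sketch of the banked organization follows, assuming the numbers from the lecture (32 free registers, four banks of eight). The class name `BankedFreeList`, the round-robin initial fill, and the pairing of rename slot i with bank i are all assumptions for illustration. The point it tries to capture is that each bank is a simple FIFO that hands out at most one register per cycle, so four banks give four registers per cycle without a four-ported RAM.

```python
from collections import deque


class BankedFreeList:
    """Free list split into independent FIFO banks (toy model)."""

    def __init__(self, free_regs, num_banks=4):
        self.banks = [deque() for _ in range(num_banks)]
        # Assumed initial fill: spread the free registers round-robin.
        for i, reg in enumerate(free_regs):
            self.banks[i % num_banks].append(reg)

    def allocate(self, count):
        """Give one register from each of the first `count` banks.

        Returns None if any needed bank is empty, which models a stall.
        """
        if count > len(self.banks):
            raise ValueError("at most one register per bank per cycle")
        if any(not self.banks[i] for i in range(count)):
            return None
        # One read per bank: each bank only needs a single read port.
        return [self.banks[i].popleft() for i in range(count)]

    def release(self, regs):
        """Return registers at commit, at most one per bank per cycle."""
        if len(regs) > len(self.banks):
            raise ValueError("at most one register per bank per cycle")
        for i, reg in enumerate(regs):
            self.banks[i].append(reg)


free_list = BankedFreeList(range(32, 64))   # 64 physical minus 32 logical
first_bundle = free_list.allocate(4)        # one register from each bank
```

A single 32-entry RAM with four read ports would behave the same from the outside. The banked version is chosen in the lecture because four single-ported RAMs cost far less area.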
This is not about writing four things. I need to read four things from this free list every cycle. Now, this works provided I also return four things to the list every cycle. So that also determines your commit width, because we return registers to the free list when instructions commit. Like I said, when this instruction commits, I will free the old map of R1. It now returns to the free list. So I will move the write pointer of each of these banks. At most four registers will return to the free list every cycle. I hope you remember how a FIFO works. It has a read pointer and a write pointer, at the head and at the tail.
Now, the active list. How do you organize this FIFO? What determines the organization? How many banks should I have here? Is it four banks with eight entries per bank? 16 banks with two entries per bank? Two banks with 16 entries per bank? How do you decide that? Where do you read the active list entry, in what pipe stage? Decode. Decode? What do you read from the active list entry in the decode stage? When do you write to the active list? At which stage? At the decode stage. Exactly. That is what this first sentence says: during the second stage, every instruction is assigned an active list entry. Which means I grab an active list entry and write to it. I fill in certain things in the active list entry. So how many active list entries do I need to write every cycle? How many instructions come into this stage every cycle? Four. Four instructions come in, so I need to write four things in a cycle. So I need four write ports. When do I need to read from the active list? When will the information that I put here be needed? For example, it says the physical destination register number is stored here. For example, the instruction would store its old map in the active list entry. That is one place to keep it. When do I go to read it? At commit. Exactly.
So when the instruction commits, I will require this particular number, so that I can free it. So the number of read ports is determined by the number of instructions that can commit per cycle. In MIPS that is also four. So that basically tells me that I should have four banks, each holding eight active list entries.
Do we read the active list at issue? No, we do not. We carry the relevant information along with us to the issue queue entry. The issue queue entry will store it. Whatever is needed to execute the instruction will be stored in that entry. The active list entry will also contain, for example, your branch outcome, so that you can update the branch predictor when the instruction commits. Anything that you write to this active list entry will be required at commit time. That is why you need it.
Also, each instruction is assigned to one of the three issue queues, depending on its type. The integer queue holds the integer ALU instructions. The floating-point queue holds the floating-point instructions. And the address queue holds the memory operations. So stage two may stall if the processor runs out of any of these resources: active list entries, physical registers, or issue queue slots.
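
As a closing illustration, here is a small Python sketch of the active list and of the stage-two stall condition. It is a simplification under stated assumptions: the names `ActiveList` and `stage_two_stalls` are hypothetical, banking is ignored, a single `queue_slots` number stands in for whichever of the three issue queues the bundle needs, and every instruction is assumed to need one physical register.

```python
from collections import deque


class ActiveList:
    """Toy in-order list of in-flight instructions (a reorder buffer)."""

    def __init__(self, size=32):
        self.size = size
        self.entries = deque()

    def has_room(self, count):
        return len(self.entries) + count <= self.size

    def allocate(self, old_phys):
        # Written in stage two: store what commit will need later,
        # here just the old physical register of the destination.
        entry = {"old_phys": old_phys, "done": False}
        self.entries.append(entry)
        return entry

    def commit(self, width=4):
        # Read at commit: retire finished instructions from the head,
        # strictly in order, and return the registers to be freed.
        freed = []
        while (self.entries and len(freed) < width
               and self.entries[0]["done"]):
            freed.append(self.entries.popleft()["old_phys"])
        return freed


def stage_two_stalls(active_list, free_regs, queue_slots, count=4):
    """True if a bundle of `count` instructions cannot enter stage two."""
    return (not active_list.has_room(count)
            or free_regs < count
            or queue_slots < count)
```

The write side (four allocations per cycle) and the read side (four commits per cycle) are what set the port requirements discussed above. In this sketch they show up as the `count` and `width` parameters.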