Let us look at the details of the algorithm we discussed yesterday. This is the code snippet we looked at yesterday, one iteration of the loop. I have also numbered the instructions as they will appear in the dynamic sequence: this is the first iteration, this is the second iteration, this is the third, and so on. Essentially, we are going to execute this on the model we discussed yesterday, which has a single issue queue. So this is my issue queue; I have not yet shown what it contains. The idea is that instructions will be fetched in the numbered order, decoded in that order, and allocated in the queue in that order. So instructions will sit in the queue exactly in program order, and then from the queue we will select instructions and execute them. Any question on yesterday's material, or any doubt about this model? Now, each queue entry will of course have a slot ID, which is the ID of the queue entry, a set of source registers, and at most one destination register. Take the first instruction: it is number one, source $a0, destination $v0. For execution purposes I will of course also have to remember the opcode, and any other operand, such as an immediate, will also be required; I am not showing them here. I will put the destination at the top. So that is my first entry: source one, source two, destination, and so on. That is the order in which instructions will be allocated in the queue. Now, when you allocate an instruction, you need to know whether it can be selected in the next cycle. So let me first tell you the remaining pipeline stages; that might help you visualize what is going to happen.
The next stage is called wake-up, then there is a select stage and an issue stage, and then the usual execute, memory, and write-back. The wake-up stage essentially wakes up any waiting instruction, that is, one waiting for its operands. When I allocate an instruction, I need to know whether it can be selected in the next cycle. Instructions that are already ready will actually skip the wake-up stage; there is no wake-up for them. So how do I know that? When I allocate a new instruction, how do I know if it can participate in selection in the next cycle? How do you figure this out? Suppose I am fetching in this order. The first instruction comes; the queue is empty, so nothing needs to be checked. I will put a ready bit here, and in this case the ready bit will be 1, meaning it can execute next cycle with no constraint on it. Now the second instruction: I take $a0 and compare it. Why? Because there is an instruction currently sitting in the queue with destination $v0. Will that be a correct algorithm all the time? Should I compare my sources with the destination registers of all the instructions before me? Does everybody see that? At any point in time, whenever I put an instruction in the queue, it needs to compare its sources with the destinations of all the instructions before it. Is there any way to optimize this? It is a lot of search, actually: for every instruction you have to do a linear search, and however you implement it, there will be a linear number of comparisons. Any way to optimize it? Remember, we discussed last time that as soon as an instruction executes, I may not write its result to the register file; I have to wait until its turn comes. For example, the seventh instruction may execute now, but it will not be able to write back until everything before it has written to the register file.
No, no, it is not a question of bypass. I cannot write it to the register file because there may be an exception that I have to handle. Suppose the seventh instruction writes to the file, and then the fifth one takes an exception; how do I recover the register file? [Student: on write-back, you make the valid bit 0.] Yes, so what he has suggested is that I prepare a table. What does the table store? It tells me, for each register... each register gets one entry, so with the 32 registers of MIPS this table has 32 entries. Is that what you are saying? A particular entry, say the entry for register 1, tells me the slot ID in the queue of the instruction that generated this register for the last time, the last producer of that register. So whenever an instruction shows up with that source register, I look up the table and immediately know whom I depend on, because all I care about is the last instruction to produce this value; I do not care about anybody before it. Is that correct? For example, here, this instruction cares only about this particular instruction; it does not care about anybody else in this sequence. So let us try to populate this table. The first instruction comes; the entry for $v0 gets populated with 1. We will also have a valid bit, a yes/no binary thing, which is normally 0; I will mark it 1. So I am assuming this is the entry for $v0, whatever that is. When I bring this first instruction in, it looks up the table, goes to the entry for $a0, and finds that it is invalid, which means it does not depend on anybody; so this instruction is also ready to execute. Next I bring in this one, and I also establish $v1's entry; suppose this is $v1, it becomes 2, valid. Then this instruction comes; it is $a1.
It requires $a1 and produces $a1. It looks up the entry for $a1; it is invalid, so it does not depend on anybody and is ready to go. Suppose this is my $a1 entry: I mark it 3 and make it valid, because this instruction is also producing $a1. The next instruction comes. It needs to look up two things, $v0 and $v1. It looks up $v0 and finds it depends on 1; it looks up $v1 and finds it depends on 2. That is what it marks here. So it is not ready to execute, and it has a dependence list. It can depend on at most two instructions, because it has two sources. What are the dependences? 1 and 2: it depends on those two slots. This instruction also produces $v0, and now something interesting happens: it overrides $v0. I am going to change that table entry, because the new incarnation of $v0 is now 4. Then this one comes. It also needs two things, $a0 and $v0. It looks up $a0, which is free, but $v0 comes from 4; so this one is also not ready to execute: $a0 is free, but it has a dependence on 4. And so on. As each instruction comes, you look up this table. A store does not produce any register value, so I do not change the table for a store. You look up the table, find out whom you depend on, and that is how you fill up the queue. Is this step clear? That is exactly what the allocate stage does: it picks the next free entry in the queue and fills in these fields, and that is it. Any question? We will come back to the wake-up stage very soon; let us move on to the select stage. The select stage picks up the entries with the ready bit on and selects a subset of those for execution. It can select any subset, for that matter; we discussed a couple of lectures ago that instructions which are ready can execute in any order.
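The allocate step just described can be sketched in a few lines of Python. This is a minimal illustration, not a hardware description: the producer table is a dictionary (a missing key plays the role of the invalid bit), and the register names and slot numbers are the ones from the example on the board.

```python
producer = {}   # register name -> slot ID of its last producer (valid entries only)
queue = []      # the issue queue: one entry per allocated instruction

def allocate(slot, srcs, dest):
    # Look up each source in the producer table; a hit means we must wait
    # on that slot, a miss means the value is already in the register file.
    deps = {producer[s] for s in srcs if s in producer}
    queue.append({"slot": slot, "deps": deps,
                  "ready": len(deps) == 0, "dest": dest})
    # The new instruction becomes the last producer of its destination.
    if dest is not None:
        producer[dest] = slot

# The sequence from the lecture: three ready instructions, then a dependent one.
allocate(1, ["$a0"], "$v0")
allocate(2, ["$a0"], "$v1")
allocate(3, ["$a1"], "$a1")          # reads $a1 before overwriting its entry
allocate(4, ["$v0", "$v1"], "$v0")   # waits on slots 1 and 2, overrides $v0
```

Note that instruction 3 reads the table before updating it, so a source and destination can be the same register without creating a self-dependence.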
Suppose that for this particular cycle it selects these three instructions: two loads and one add. For these three to execute together in this cycle, they will essentially move to the execution stage in the next cycle. Sorry, I have missed one stage: in this stage the selection hardware says that you can issue these three instructions in the next cycle, which means they will go and access the register file. These three instructions do not depend on anybody, so they will go and read their source registers: in this case $a0, $a0, and $a1. As soon as you say that, you implicitly say something about the register-file organization: the register file must be designed so that it can source three operands in a cycle, because in this case you need three. In fact, when you say you can issue three instructions every cycle, in the worst case you should be prepared to source six operands; here it just happened that these three instructions did not have a second source. Is that clear to everybody? Essentially, what I am saying is that how much you can issue in a cycle depends on the number of read ports in your register file: how many source operands you can fetch in a cycle. That is the register file. Now, we want to execute two loads and one add in a cycle. So what kind of functional units do I need for that? Sorry, louder? Three adders. Why three adders? Exactly: two for generating the addresses of the two loads, and one for actually performing the add. So I need three adders. That is another implicit assumption we make when we say we can execute these three in a cycle.
This puts one more constraint on the selection hardware: it must be aware of the functional units available in the machine; it cannot just pick an arbitrary subset. You can see the subset gradually getting restricted: first it depends on the number of read ports in the register file, how much you can read in a cycle, and now the mix of functional units also decides which subset you can actually issue. So you need three adders. What else do I need to execute those three instructions in a cycle? Yes, louder? Two memory ports. I should be able to do two load instructions in a cycle, so I need two data-memory ports. If I have three adders and two data-memory ports, then I am ready to send those three instructions to the execution units, and they can execute concurrently without any problem. What happens after they finish executing? Essentially, we turn their ready bits to 0, because they have executed and are no longer eligible to participate in future selection. Now the question arises: where do I put the values these three instructions produce? I cannot send them to the register file, because it may not be their turn yet. So we create one more field here, the value. Here it will store the value of the load; the values will be filled in here, here, and here. What else has to happen? You have to wake up the instructions that are waiting on me. How do I achieve that? Whenever an instruction finishes, it broadcasts its queue slot ID over all the entries. So whenever this load instruction completes, it sends its entry ID to everybody in the queue, and everybody else's job is to pick up this slot ID and compare it with these two dependence fields. If there is a match, one dependence has been resolved: that operand is ready.
For example, in this case, 1 will match this field, and essentially this dependence will go away: this entry is no longer dependent on that instruction. This instruction will complete, will broadcast 2, and this one will also go away. When both go away, the ready bit turns on, and this instruction is ready to execute: in the next cycle it will contend for selection. That is the wake-up cycle this instruction goes through, and you can see that an instruction can have at most two wake-up cycles: in one cycle one dependence may get resolved, in another cycle the other. Yes, I am coming to that: where does it read its values? In this case it needs $v0 and $v1, and these two slots will give it the register values. It is not going to read from the register file; it will read from the queue slots it depended on. Remember that the values are stored here; it was recording its dependences here. No, not yet: we read the value in this stage. Yes, I am coming to that. I am avoiding the hard corner cases for now; I am just describing the basic algorithm. So this one is now ready to go, this one is not yet ready, and there will be many other instructions which are ready that I have not shown; you can fill up the queue and see. In the next cycle, this instruction is eligible to participate in selection. Assume the selection hardware selects it. In the next cycle, this instruction has to read the register file. Now the question arises: the value may not be in the register file; it may be sitting here, in these two slots. So essentially you now interpret these two fields in a different way: they tell you where to get the value from.
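The wake-up broadcast just walked through can be sketched like this. Again a minimal model with assumed field names: a completing instruction sends its slot ID to every entry, each entry drops a matching dependence, and the ready bit turns on when the dependence set empties.

```python
def broadcast(queue, finished_slot):
    # Every queue entry compares the broadcast slot ID against its
    # (at most two) recorded dependences; a match resolves one operand.
    for entry in queue:
        if finished_slot in entry["deps"]:
            entry["deps"].discard(finished_slot)
            entry["ready"] = len(entry["deps"]) == 0

waiting = {"slot": 4, "deps": {1, 2}, "ready": False}
q = [waiting]
broadcast(q, 1)   # one dependence resolved; still waiting on slot 2
broadcast(q, 2)   # both resolved: eligible for selection next cycle
```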
In this case, you get the values from these two slots and go to the execution unit. Complete execution, broadcast your slot ID; now you will wake up this one, because it holds 4, and 4 will match. That instruction will issue, and finally the time will come when you want to write back. The write-back will happen exactly in this order. For example, the first instruction is eligible for write-back as soon as it completes, because it is at the head of the queue. When the time comes to write back, essentially it picks up the destination ID, picks up the value, and writes it to the register file, and this queue entry becomes free. If you design it as a circular queue, the head pointer moves, and that entry becomes eligible for allocation. All right, is the basic algorithm clear? Any question? One point to notice is that we have implicitly done renaming, because essentially we are using these queue entries as alternate names for the destination registers. For example, here I could execute two instructions with the same destination without any problem, because they hold their values in two different slots of the queue, and they will write back to the register file in this order; so I can execute them concurrently. Of course, in this case there was a dependence, which is why I could not execute them concurrently; but otherwise, if there were no dependence, I could. Also notice that I can resolve memory dependence, because of the store value here. What happens is that when this store instruction executes, its value still remains here; it does not go to memory yet, because it cannot. Only when the store comes to the head of the queue is the value moved to memory. Then the question arises: what does it mean to issue a store instruction?
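The in-order write-back from the head of the queue can be sketched as follows. This is a simplified model with assumed fields ("done" marks a completed instruction): only the head may retire, which is exactly what keeps the register file recoverable on an exception.

```python
def writeback(queue, regfile):
    # Retire only from the head: a completed instruction deeper in the
    # queue must wait until everything before it has written back.
    while queue and queue[0]["done"]:
        head = queue.pop(0)          # frees the slot for re-allocation
        if head["dest"] is not None:
            regfile[head["dest"]] = head["value"]

regfile = {}
q = [{"dest": "$v0", "value": 10, "done": True},
     {"dest": "$v1", "value": 20, "done": False}]   # still executing
writeback(q, regfile)   # only the head retires; $v1's producer must wait
```

A real design would use a circular buffer with head and tail pointers rather than `pop(0)`, but the ordering rule is the same.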
In this particular model, issuing a store actually does nothing; you cannot perform the store at this point. Why is that? There might be a load before it; what is the problem then? Yes, a later load may need the value; that is fine, that is one problem. For now, let us say that because of exceptions I cannot send it to memory; I am coming to your point very soon. The store cannot really modify memory yet, because an instruction before it in the queue may take an exception, and then you cannot modify memory until the store's turn comes. This also gives you memory renaming, essentially, because now I can have two store instructions in this queue going to the same address; they can sit in two different slots. When we introduce caches, we will see that it actually makes sense to issue a store early; there is some meaning to it. But for now there is no meaning: it does nothing, it just sits there with its value, and when it comes to the head, the value is sent to memory. Now, to what he mentioned, which we also discussed yesterday: suppose this particular entry is a store instruction. It writes to some address, which we do not know until the store instruction is issued and executes; in that pipe stage I actually get to know the address of the store. Now suppose our ninth instruction is a load, this one here, whose address we also do not know until it issues and executes. On the face of it, I could have issued the ninth instruction along with 1, 2, and 3, because it would be ready... oh sorry, it cannot, because it depends on this one. In this case there is a dependence, which is why it cannot; but if there were no dependence, then I could have issued 9 with 1, 2, and 3.
That means skipping over the store here, which may be a problem, as we discussed yesterday: if these two addresses are the same, the load will get a wrong value from memory, when it should be getting the value from this store. The problem is that I cannot really resolve this until both instructions have issued and executed; only then do I know the addresses. That is a big problem, and so often what you do is: you do not issue a load if there is a store before it in the queue. You wait until the fifth instruction has executed, so that you know its address. You can store the address in a field of the queue entry, and this load can compare against it and figure out whether it can issue or has to wait. Is the basic algorithm clear? This is called Tomasulo's algorithm, after Robert Tomasulo, the IBM engineer who came up with this systematic way of maximizing ILP for the IBM System/360. It has many different descriptions; this is just one of them. The book has a different description, which we will go through very soon. But the crux is this: you keep track of dependences, you have a table to know whom you depend on, and so on. Now, there are small issues left which can actually be troublesome and which you have to think about. The first question: currently $v0 is produced by 4, and suppose that for some reason this is the last instruction fetched; nothing beyond it is fetched. So this is the last instruction in the queue that produces $v0. Eventually the time will come when this instruction writes back, and at that point $v0 is no longer held by slot 4; it is in the register file. So at that time, I actually turn this valid bit off; it goes to 0.
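The conservative policy just described, a load waits until every earlier store in the queue has computed its address, can be sketched like this. The field names are assumed for illustration; an unknown address is modeled as `None`.

```python
def load_may_issue(queue, load_pos):
    # The load is held back while any earlier store's address is unknown,
    # since that store might write the very location the load reads.
    for entry in queue[:load_pos]:
        if entry["op"] == "store" and entry["addr"] is None:
            return False
    return True

q = [{"op": "store", "addr": 0x100},   # executed: address known
     {"op": "store", "addr": None},    # not yet executed
     {"op": "load",  "addr": None}]
blocked = not load_may_issue(q, 2)     # held back by the second store
q[1]["addr"] = 0x200                   # that store executes, address known
allowed = load_may_issue(q, 2)         # now the load may issue
```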
That is what he mentioned: when you write back to the register file, you mark this table entry as invalid, meaning $v0 is no longer in the queue; if you want to read $v0, it is available in the register file, and you can get the value from there. Any question? All right. The second question: we said that when you allocate, you read this table to find out whom you depend on, which slot is producing your sources. There may be a race going on. It may happen that one of the sources the instruction currently being allocated requires is being written back in the same cycle. That can happen, because many things happen in the same cycle: something is being allocated, something is executing, something is being written back. So an instruction requiring register r is being allocated in this cycle, and the last instruction to produce r is being written back in the same cycle. This instruction comes, looks up the table, finds the entry valid, thinks it depends on that slot, goes into the queue, and waits forever, because that wake-up is never going to come: the producer has already been written back. How do you resolve this? Is the race clear to everybody? In the next cycle, if you checked, this entry would of course be invalid; but in the cycle you are being allocated, there is a race going on: you are being allocated, and the same register is being written back. And remember how a pipeline works: in a particular pipeline stage, it samples all its inputs at the beginning of the cycle, works on them during the cycle, and makes its modifications at the end of the cycle. So at the beginning of the cycle it samples the table, finds the entry valid, and thinks it is supposed to get the value from 4. How do you solve this problem?
Timeout? You have to be very careful if you rely on a timeout, because memory instructions have non-deterministic delay; you do not know how long they will take. What if this register is being produced by a memory instruction, which may be a legitimate dependence? If you time out too early... all right. Yes, so that works: you are saying that you will periodically re-check this table entry. That will make forward progress eventually, but we may end up losing a lot of cycles. Any better solution? You are saying you want to phase the execution. No, see, the wake-up happens when an instruction completes execution; do you want to modify that? Before I go to his solution: what he suggested is a phased execution, where write-back happens in the first half of the cycle and allocation happens in the second half. Then the problem is solved, because by the time allocate runs, the table is up to date with this cycle's write-back, so I always get a consistent state in the table. That is one solution, but if I want a very high-frequency processor, I may not be able to fit my write-back in half a cycle. You want to wake up after this? "Before" and "after" are very fuzzy within a cycle: within a cycle there is no meaning of before and after; these are all concurrent, they can happen at any point in time. Yes, this one works: you divide your write-back into two parts, spread over two cycles. In the first cycle, you update the table: you mark this entry invalid. That is what this write-back stage does; if you want, you can also write the value to the register file. In the second cycle, you broadcast this slot ID again over the queue. So if somebody had a race in the previous cycle, that instruction will now wake up; and if somebody is being allocated in this cycle, it will see an invalid entry.
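The two-cycle write-back can be sketched as two separate state changes. This is a simplified model with assumed field names; in hardware the second broadcast carries the value on the bypass network, which is only noted in a comment here.

```python
def writeback_cycle1(table, reg, slot, regfile, value):
    # Cycle 1: retire the value to the register file and, if this slot is
    # still the last producer of reg, invalidate the producer-table entry.
    if table.get(reg) == slot:
        del table[reg]
    regfile[reg] = value

def writeback_cycle2(queue, slot):
    # Cycle 2: broadcast the slot ID once more, so anyone who was being
    # allocated during cycle 1 (and so recorded a stale dependence) wakes
    # up now. In hardware the value rides along on the bypass.
    for entry in queue:
        if slot in entry["deps"]:
            entry["deps"].discard(slot)
            entry["ready"] = len(entry["deps"]) == 0

table = {"$v0": 4}
regfile = {}
# An instruction allocated during cycle 1 raced and recorded slot 4:
racer = {"slot": 6, "deps": {4}, "ready": False}
writeback_cycle1(table, "$v0", 4, regfile, 99)
writeback_cycle2([racer], 4)   # the second broadcast rescues it
```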
So it will not wait; that takes care of the problem. Is it clear? We are splitting the write-back into two cycles, with only one broadcast per write-back. When this $v0 is written back, in the first cycle the WB stage resets this bit and also writes the value to the register file; in the second cycle it broadcasts 4, and along with it the value, because the value will also be needed. So anybody who missed that bit in the last cycle will wake up now, pick up the value from the bypass, and go. Clear, everybody? One more problem. We said the wake-up stage is needed to wake up instructions waiting on some dependence, so that they can participate in selection in the next cycle. You can have the same race there, between allocate and wake-up: you are being allocated, and your source register is being generated in this very cycle; you are being woken up. What is the downside if I miss this particular wake-up? There is a chance I will miss it, because I am being allocated in this cycle. What will happen is: I go and look up the table, the table says slot 4, so I record 4; but in this same cycle 4 is completing, and I miss that wake-up signal. So what do I do? I wait there until 4 is written back, when I will be woken up by that second broadcast we just put in. Sorry, no, the table entry is reset only on write-back. Yes, that is what I am saying: an instruction that needs $v0 is allocated in this cycle, the instruction producing $v0 is also completing in this cycle and broadcasting its slot ID, and this instruction is going to miss that signal because there is a race going on.
In the same cycle, you are being allocated and you are being woken up. Two cycles? No, the point is... maybe I should clarify. Let us go back to this. These are ready to go, and this one is not yet ready; assume this instruction is not yet allocated. Now, what is going to happen: $v0 is being produced by 1, and $v1 by 2. Instructions 1, 2, and 3 are issued and executing; they go through the memory stage and finish executing. Now they complete, so I broadcast 1, 2, and 3. When this broadcast happens, in the same cycle, instruction 4 is being allocated into the queue. It looks up the table, finds it depends on 1 and 2, goes into the queue, and sits there waiting for 1 and 2, which will never arrive, because 1, 2, and 3 were broadcast in this cycle. So this instruction, which could have executed in the next cycle, will now wait until those two are written back, when it will be woken up. Is the problem clear to everybody, or do you still not see it? What I am saying is that you are now depending on the write-back wake-up to know you are ready, instead of the completion wake-up, because you missed this signal. In the same cycle, 1, 2, and 3 are being broadcast, which all the queue instructions are supposed to compare against their dependence lists; but this instruction is not yet in the queue to compare against anything; it is still being allocated in this cycle. It will look up the table, correctly learn its dependences, mark 1 and 2, mark itself not ready, and wait until those two are written back and broadcast again, when it will finally wake up. So there will be some lost cycles which we actually do not need. Is it clear? How do you solve this? One more bit in the table? You want a ready bit in the table. All right, how does it help? When is it modified?
As soon as it issues, or when it completes execution? In the very same cycle? Execution may not be one cycle; it may be multiple cycles, so the last cycle. Then what is the difference? At the point I broadcast, the wake-up also goes out. Okay, I see. Yes, so you want to do this one cycle early. We will actually apply the same solution: we split the wake-up over two cycles, and we change the table entry here. It is a ready bit, which says whether this particular value is ready or not; it may not be ready in the register file, but it is ready in this particular slot. Whenever the instruction finishes executing (I may not be able to do it in the same cycle, so maybe the next cycle), I first change the table entry to say that, yes, this register is now ready, meaning the value is available. Then, in the next cycle, I do the wake-up broadcast. What happens is: if somebody misses the broadcast in one cycle, it is woken up in the next; and in the next cycle, anybody being allocated will see the correct state in the table. All of them can then participate in selection in the following cycle. So essentially I have introduced one more cycle here; I will call them wake-up one and wake-up two. I am splitting the wake-up into two state changes: there are two states associated with the wake-up now, just as with the write-back. So that gives you an 11-stage pipeline. This is pretty much a functional design; it is going to work. The only limit to your ILP is the length of the queue: how many instructions you can see at any point in time. If you can see more, you will discover more independent ready instructions, which can participate in selection. And of course, you are limited by your register-file ports.
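The ready bit added to the producer table can be sketched like this, a hypothetical model of the wake-up split: cycle one marks the value as available in its queue slot, so an instruction allocated in the racing cycle reads a consistent table; cycle two does the ordinary broadcast for entries already sitting in the queue.

```python
# Producer table entry: which slot produced the register, and whether
# that slot already holds the value (the new ready bit).
table = {"$v0": {"slot": 1, "ready": False}}

def wakeup_cycle1(table, reg):
    # Wake-up cycle 1: the value is now available in the producer's slot.
    table[reg]["ready"] = True

def source_dependence(table, reg):
    # Used by allocate: returns the slot to wait on, or None if no wait
    # is needed (entry invalid, or the value is already ready in a slot).
    entry = table.get(reg)
    if entry is None or entry["ready"]:
        return None
    return entry["slot"]

before = source_dependence(table, "$v0")   # would record a wait on slot 1
wakeup_cycle1(table, "$v0")
after = source_dependence(table, "$v0")    # no wait: no missed wake-up
```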
You are limited by the number of functional units: how much can be issued. Also, one thing we have not discussed: you are limited by the rate at which the queue can drain. And that depends on what? Write ports: the number of write ports in the register file. Because if I want to drain three entries every cycle, I had better have three write ports in my register file, since I may have to send three values to it. So if you have an unbounded queue, unbounded register-file ports, and unbounded functional units, you will be limited only by flow dependence; nothing else. Only data flow will be the limiter, and you will get all the other ILP possible in your program. And notice what is going to happen with this particular piece of code: your queue will actually fill up, even though the seventh instruction is a branch instruction, assuming the branch predictor, which is looked up here whenever the seventh instruction is fetched, tells you the right thing; it will keep filling the queue in the right direction with the right instructions. And then your selection hardware, although in this case it may not be possible because of this dependence (these two loads depend on this particular one, and you have various other dependent instructions), otherwise nothing stops you from picking up ready instructions from different iterations. Any question? Your book gives a slightly different description of this: it has a different organization of this queue, and it also maintains this table in a slightly different way. By the way, this design is actually never implemented in any processor in this exact form. Can anybody guess what the problem is? Here I have a single queue, right? Length of the queue? Yes, I want a very large queue. So what?
Can't I have one large queue? It will be bounded to some length; let's say 200 entries. Broadcast: how many comparisons? Right, so how many? Two times the number of entries: per entry I can do at most two comparisons, that is the worst case, multiplied by the number of entries. So in the worst case, in a 200-entry queue design, I will be making 400 comparisons every cycle. And to accommodate those 400 comparisons in my short cycle time, I will be doing all of them in parallel, which means I need 400 comparators in my design, which is probably out of the question. Right, but each entry can make only two comparisons, comparing with every broadcast ID. Yeah, exactly. So what is the number? Suppose I can write back W entries in a cycle: W times N times 2. Does everybody see why? Whenever I broadcast one slot ID, it is compared against both source tags of every entry; so N times W times 2. That is pretty large, actually; very large. So how do I solve this? I cannot get rid of the broadcast; that is fundamental to this design. And everybody needs to compare; that cannot be taken away. So can the number of comparators be reduced? There is more: other than this wake-up, there are other comparisons involving load and store operations, as I just mentioned. Suppose I have a constraint that a load instruction will not issue until all the stores before it have completed. So whenever a store is issued, it does not write the value to memory; it goes to the execution stage, computes the address, and stores it in a separate field of this particular entry. So whenever a load instruction wants to issue, it first checks that all the stores before it have completed.
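The comparator count worked out above is simple arithmetic, sketched here so the scaling is explicit: with N queue entries, at most two source tags per entry, and W broadcasts per cycle, wake-up needs W x N x 2 parallel comparators.

```python
def wakeup_comparators(n_entries, writeback_width):
    # Each of the W broadcast slot IDs is compared, in parallel, against
    # both source tags of every one of the N queue entries.
    return writeback_width * n_entries * 2

single = wakeup_comparators(200, 1)   # the 400 from the single-broadcast example
wide = wakeup_comparators(200, 4)     # a 4-wide write-back makes it far worse
```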
That is the first constraint. And let us suppose that, yes, they have all completed, but there are a bunch of stores before it which have not yet written to memory. What the load will do is: it issues, computes its address, and then broadcasts the address over all the queue entries, comparing against the stores that have completed. If anybody matches, the load cannot take the value from memory; it has to take the value from the store's entry, because the load is supposed to consume that value, not whatever is in memory. Is that clear to everybody? So that actually increases your number of comparators even further: these wake-up ones are queue-ID comparators, dependence comparators, and in addition to those you need address comparators. So what do you do? How do you reduce this overhead? The storage is not a problem; a 200-entry queue is actually okay. So you want a list of dependents here: instead of storing whom I depend on, you want to store whom I need to feed. But this list may be unbounded; so how do I size it? Because this has to be built in silicon, which is fixed at design time; you cannot grow it at runtime saying, oh, I need more. Yes, which would be gigantic. A hash table? Will that reduce your number of comparisons? It will reduce W. How? What is the organization of the table? What is my hash key? IDs? No, tell me the organization of one entry: what does it contain? All these things? The same? So then what is the difference? Okay, we will continue from here next time; I will stop here today. Based on the...? Can you say a little bit more? So you want to attach a list of dependents to a register, but the list of dependents is unbounded. Not unbounded, but... that is still large, right?
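The address broadcast for loads can be sketched as follows. This is an illustrative model with assumed fields: the issuing load compares its address against every completed, not-yet-retired store before it, and if any matches, it forwards the youngest matching store's value instead of reading memory.

```python
def load_value(queue, load_pos, load_addr, memory):
    # Scan completed earlier stores, youngest first, so the load sees
    # the most recent write to its address that is still in the queue.
    for entry in reversed(queue[:load_pos]):
        if (entry["op"] == "store" and entry["done"]
                and entry["addr"] == load_addr):
            return entry["value"]      # forward from the store's slot
    return memory[load_addr]           # no match: the memory value is current

memory = {0x100: 7}                    # stale value still in memory
q = [{"op": "store", "done": True, "addr": 0x100, "value": 42}]
forwarded = load_value(q, 1, 0x100, memory)      # from the store's entry
from_mem = load_value(q, 1, 0x104, {0x104: 5})   # no earlier store matches
```

In hardware, every one of these address comparisons happens in parallel, which is exactly the extra comparator cost being complained about above.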
Even so many fields; not too many. See, I am saying the storage is not actually a problem; the problem is the operations you are doing on this storage. You have gigantic caches on chip; storage is not a problem there either; the problem is the kind of operations you are doing. So we will continue from here next time. We will see how today's processors actually handle this problem. But keep in mind that this is the model; essentially we will now do small tweaks on it to make it implementable. All right?