So, today I thought I would walk you through an example of how a branch predictor works. This is the C code we are going to look at, and this is its 32-bit MIPS translation. After this there is the code that calls printf and so on, which I have skipped; it is not important here. To quickly go over the translation: $a1 holds x and $v1 holds i. This instruction is essentially computing i % 8: you AND with 7, and if the result is not 0 you know that i is not divisible by 8, because multiples of 8 have their last three bits 0. That is what this branch checks: if the result is not 0, it jumps ahead and skips over the increment of x. Remember that in most compiled code, the body under an if condition is executed only when the branch is not taken; if the branch is taken, you skip over it. So it skips over the increment if the result is not 0, and then here it increments i and compares against 1000: set-less-than-immediate sets $v0 to 1 if $v1 is less than 1000. If that is not 0, it loops back to the top, and in the delay slot it computes i AND 7 for the next iteration. So the main loop is essentially 401078 to 401097. After this, as I said, there will be a jump-and-link to printf, and you can also see the arguments being set up for printf — printf has two arguments here, one a constant format string, the other your x. Is the translation clear? So we have two branches of importance here, this one and this one. We want to know what the predictors will tell us when we run this code through them. We have talked about several; we will look at some of them today, run this code through them, and see how well each predicts. Any questions? So, first I am going to start with a bimodal predictor. What does it look like? It has a branch history table (BHT) of saturating counters, and we will look at three widths of that counter.
Two bits, one bit, and three bits, and we will see how the width affects the prediction accuracy. To start, let us assume two-bit counters, and assume we have 1024 entries in the table. We have two branches of importance here, and if you look at their addresses, they are actually pretty close: one ends in 7C, the other in 8C — these are hex program counters. You can easily see that with a 1024-entry BHT these two branches fall into different entries, because the hash function is a modulo-1024 on the PC: essentially we look at the last ten bits after dropping the bottom two, since every instruction address is a multiple of 4 and the last two bits are always 0. So the two branches land in different slots of the BHT — there is no aliasing. That is the first important observation, and it means we can look at the two branches in isolation and analyze them separately. So let us start with this one, the if-branch. What do we expect? This branch is going to be taken seven times out of every eight, and not taken once. The first time, for example, it is actually not taken, because i is 0. So let us write the actual outcomes down: the first time it is not taken, so I will write a 0. Now let us see what the predictor tells us. Assume the table is initialized to all 0s. The first time the branch comes, we allocate an entry — suppose this is the entry for this branch, initially 0 — and look up the predictor. What is the prediction algorithm? Not taken if the count is 0 or 1, taken if it is 2 or 3. So the first time, it predicts not taken, which turns out to be correct. So how do we update the counter?
It should be decremented, but there is nothing to decrement; it stays at 0. The next time the branch comes and you consult the predictor, it again says not taken, but this time the branch is actually taken, so the counter increments to 1. The next time, the prediction is still not taken, because the count is 1, which is less than 2; the actual outcome is taken again, so the counter goes to 2. From then on I predict taken, and the counter goes to 3. And then everything is fine for a while: I keep saying taken through the rest of the taken run, and the count saturates at 3 and stays there. Then the branch comes with i equal to 8. I predict taken, which is wrong, and the counter decrements to 2. So what will the prediction be after this? Still all taken, and the counter will sit at 3 most of the time, occasionally dipping to 2 and going back to 3. So what is the prediction accuracy — how many mispredictions? I mispredict whenever i is 8, 16, and so on, and I have two mispredictions at the beginning. You can calculate exactly what that comes to, but you can roughly expect 12.5 percent, because once out of every eight executions you make a misprediction; it is slightly off from that because of the start-up mispredictions. Any question on this? Is this clear? It is very basic; if you do not understand, please ask. So I will write: misprediction rate approximately 12.5 percent — although if I ask this in an exam, I will expect the exact answer, not this approximation. Now, what about the other branch, the loop branch? Suppose it gets a separate entry here, also initialized to 0. What happens the first time? What is the actual outcome sequence for this branch?
How many times is it taken — 999 or 1000? It is taken 999 times and not taken once, the last time, because that is when the comparison against 1000 finally fails. A student asks: is there a mistake in this translation? Shouldn't the check come before the loop body? Notice that the compiler has rotated the loop: the condition is tested at the bottom, so if the trip count were 0, the body would still execute once and the code would be wrong. But here the bound is the constant 1000, so the compiler can see that the loop executes at least once, and this bottom-test form is safe; had the bound been 0 or unknown, it would have generated the other piece of code we talked about, with the check first. So for this particular trip count the translation is fine, and the branch executes exactly 1000 times: 999 times taken, then not taken the last time. Now, we have given this branch its own counter. The first time the branch comes, we say not taken, which is wrong; the counter goes to 1. The next time we also say not taken, wrong again; the counter goes to 2. After that all our taken predictions are correct, until the last time, when we say taken once more, which is a mistake. So this is highly accurate: it gives you 997 correct predictions out of 1000, a misprediction rate of 0.3 percent. Is this clear? That is the accuracy of the bimodal predictor.
So it is not very good on branches like the if-branch, but it is highly accurate for loop branches. The next predictor we will look at is slightly more sophisticated: a SAg predictor, which has two tables — a first-level branch history table and a second-level pattern history table. Oh, sorry — one more thing before that. In the bimodal case, suppose I make the counters narrower: a one-bit counter. What happens then? Does it affect the accuracy on the loop branch? Not really. It starts at 0, and it actually makes slightly fewer mispredictions: with one bit, the rule is not taken if the count is 0, taken if it is 1, so instead of the two start-up mispredictions we made with two bits, this one makes just one misprediction at the start, and of course one at the end. What about the if-branch? What do you think? With the two-bit counter, the predicted pattern settled into all taken, which mispredicts only the single not-taken outcome in each group of eight — that is fine. With a one-bit counter, when the not-taken outcome comes, I mispredict it, the counter flips to 0, and then I mispredict the next taken outcome as well. So essentially, every time the outcome switches, I make two mispredictions instead of one, and the misprediction rate roughly doubles. Is that clear to everybody? The count goes back to 0, and I need to observe taken once more before I go back to predicting taken. So the two-bit counter was protecting against flipping to the other side. In most cases, having a wider counter helps: this is the hysteresis you are seeing — you do not change your prediction based on occasional deviations in the pattern.
These occasional 0s come in, but you do not change your prediction. Of course you make a mistake each time, but not too many — here, with one bit, you make two mistakes on every switch. What happens if I make the counter 3 bits? How does this change? Initially I make more mispredictions, because I predict taken only once the count reaches 4: with 3 bits, the counts 4, 5, 6, 7 — those above the midpoint — mean taken. So I require four taken observations to get to 4, and until then I mispredict. What about the loop branch? Any change there? Some extra initial mispredictions, but after that it behaves exactly the same. So essentially, making the counter wider means it takes longer to learn — longer to reach the steady state. Here, of course, we have branches with very steady patterns that are easy to learn, but real branches are not like this; they can have erratic patterns, and whenever the behavior changes, it takes much longer to adapt. Suppose you have a taken run and then a not-taken run: with a 3-bit counter you have to climb down from 7 to below the midpoint, and that takes time — that is the learning time of the predictor. So as you make your counters bigger, there is also a loss: a much longer learning time. Now we can talk about the SAg predictor. We will assume the second-level table still has 1024 entries of 2-bit counters, and the first-level table also has 1024 entries. How wide is the history? 10 bits, the log of 1024. So each branch has a 10-bit history, and I index the pattern history table with the history. Now the analysis is going to be a little more involved, because you cannot take the branches in isolation: their histories may alias in the shared second-level table.
If the two branches happen to have the same history value, they will try to update the same counter in the second-level table. So you have to take both branches together — essentially we have to simulate the code now. Nothing here is very involved; you just have to be a little careful. So let us see what happens. First, the same observation as before: these two branches get two different entries in the first-level branch history table, since it is a 1024-entry table with a modulo hash on the PC. So let us allocate two different history registers to them: this one for the if-branch, and this one for the loop branch. Everything is initialized to 0. The if-branch executes first: its history is 0, which indexes the first counter of the second-level table. That counter is 0, so I say not taken — correct, since i is 0. Then I shift in one bit of history, which is not taken, so nothing really changes: the history stays 0, and the counter also stays 0, because the branch was not taken. Then the loop branch shows up: its history is 0, indexing the same first counter, which is 0, so I say not taken — wrong, the branch is taken. I shift in a bit of history, which is taken, so its history becomes 1, and the counter gets incremented to 1. Next iteration the if-branch shows up, and this time it is taken. Its history is 0, so I look up that first entry and say not taken, but the branch is taken. I shift a 1 into its history, and the counter increases to 2 — because remember, we increment the counter indexed by the old history; the history register is updated afterwards. Now the loop branch comes: its history is 1, indexing the second entry of the table, which is 0, so I say not taken — wrong again.
So I shift in another bit of history, and its history becomes 1 1. Then the if-branch comes: its history is 1, that entry holds 1, I say not taken, its history becomes 1 1, and the counter increases to 2. Then the loop branch comes: its history indexes a new entry, the third one; I say not taken, it increments to 1, and another bit of history shifts in. Is everybody following what is happening? So how long is it going to take until I actually start making correct predictions? The loop branch is easier to analyze, so let us look at it first. Its history fills up with all 1s, because that is its outcome pattern; once the 10 bits are all 1s, it indexes the last entry of the pattern history table, and from that point on this branch only ever accesses that entry — nothing else touches it. So that counter gradually builds up and gets to 3; in fact, as soon as it crosses 2, I start making correct predictions. So how many mispredictions before that point? Ten mispredictions to fill up the history, and two more for the counter to cross the threshold: twelve mispredictions before I start predicting correctly. And of course there will be one more misprediction at the very end of the loop. What about the if-branch? This is slightly more difficult. If you slide a 10-bit window over its outcome pattern, how many different histories can you get? Quite a few during warm-up. So this branch will gradually accumulate history, index several different entries of the pattern history table, and it will take some time for all of them to learn to give correct predictions.
So, essentially, in steady state: the history just before the not-taken instance — seven 1s preceded by the 0 of the previous group — will learn to say not taken. The counter it indexes will remain at 0, because every time that history occurs, the next outcome is not taken. All the other histories this branch can have will learn to say taken. And since this branch never has the all-ones history — its pattern has a 0 in every group of eight — it never indexes the loop branch's counter. So there is no aliasing in steady state in the pattern history table between these two branches: the if-branch cycles through its own small set of entries, and the loop branch uses the one all-ones entry that the if-branch never activates. Is this clear? So can anybody guess what the accuracy will be for the if-branch — how long before it starts making only correct predictions? A student guesses about 25 mispredictions. How do you get that number? During warm-up, the partially filled histories — the ones still containing leading zeros — index entries that will never be used again in steady state, and each of those first visits costs a misprediction; then each steady-state history needs a couple of visits for its counter to cross the threshold. The point is that you make some constant number of mispredictions at the beginning, and then the predictor locks in: all predictions are correct after that. So the misprediction rate is going to be much smaller than the bimodal predictor's 12.5 percent.
One more predictor I want to look at: the GAg predictor. By the way, did you notice that I actually make more mispredictions for the loop branch in the SAg predictor than in the bimodal — 13 instead of 3? That is one drawback of these two-level predictors: they take more time to learn. While you are still accumulating history, until the history register fills up, you are in the learning phase. That is the downside of a longer history: it takes more time to reach the steady state of the predictor. So, GAg. Here I have a single 10-bit history which is global, shared by all branches, and it indexes a pattern history table of 1024 two-bit counters. Here also you have to trace both branches together, because they share one global history. So let us see what happens. Initially everything is 0. The first instance of the if-branch shows up: it takes the history, which is 0, uses it to index the table, hitting the first entry, which is 0, so I say not taken — correct. And since this branch is not taken, a 0 is shifted in and the history stays 0. Then the loop branch comes, and it is actually taken: the history is 0, I look up the same entry, say not taken — wrong — and update the history to 1, because this branch is taken. Then I come to the if-branch again, which is also taken: the history 1 indexes entry one, I say not taken, and I shift in one more bit. By the way, although I am drawing the shift on this side, the shifting really happens the other way: the new bit comes in at the LSB. Then the loop branch shows up, I index the third entry, which is 0, I say not taken — and so it continues. What happens now is that the 10 bits of history fill up with 1s much faster, because two taken bits are shifted in per iteration: after about five iterations of the loop, the history is all 1s.
But this is still not the steady-state history. When you get there, you index the last counter and say not taken — so far I have predicted not taken every time and have not yet predicted taken at all. Now what happens? This continues for a couple more iterations, and then, at i equal to 8, the if-branch is not taken, so a 0 gets shifted into the history among the 1s. So now tell me: how long will it take before the predictions become correct? Do you think it will be much worse than the SAg? What kinds of histories am I going to see here? A student answers: 11 patterns in steady state. How do you systematically derive that? You have to consider the two branches in sequence: take the if-branch outcome, then the loop-branch outcome, then the if-branch again, and so on. That gives you the full global bit stream — 0, 1, 1, 1, 1, and so on, with a 0 appearing once every sixteen bits — and then you slide a 10-bit window over it. The claim is 11 different histories: the 0 can sit in any of the 10 window positions, giving 10 histories, plus the all-ones history. And almost all of them will learn to predict correctly, because a history containing a 0 pins down the phase and has a unique next outcome.
A student points out: the 0 can only appear in alternating positions, because it always comes from the if-branch and two bits enter per iteration. That is right — so there are actually fewer distinct histories than 11; you can figure that out. But what I am trying to emphasize is this all-ones pattern. The problem with it is: what will be the next outcome? Suppose I have just seen ten 1s. Can you tell me what is going to come next? It can be a 0 or a 1 — both are possible. So the counter for the all-ones entry will not always give correct predictions: it saturates at 3, predicts taken, and whenever the next outcome is a 0 it mispredicts, dips to 2, and climbs back to 3. Every other history pattern has a unique next outcome, so those counters all learn to predict correctly; but the all-ones history keeps mispredicting. If you work this out, you will find that this predictor is actually worse than the SAg. What it says is that these two branches are not really globally correlated: when you merge their histories together, they interfere destructively in that last entry — after an all-ones history, the next outcome may be 0 or may be 1. So how do you solve this problem? You use the SAg predictor; there is really nothing to gain here from GAg. What I suggest is that you go back home and work out this example yourselves. Any questions? Okay. So, last time we discussed the hybrid predictor, where you combine a SAg and a GAg predictor, and we talked about why you might want to do that: GAg essentially gives you global correlation, and SAg tells you about local correlation.
And of course a large program will have a mix of both, and you want to be correct on both. So you have a selector that learns which component predictor is going to be good for which branch, and that is how you combine them. We also talked about how to update the components and the selector. Yes? Oh, I see — I thought I had mentioned that. This predictor was implemented in the Alpha 21264 microprocessor — Alpha processors were designed by Digital, later Compaq — and it was called the tournament predictor. That is just one example; today, in every Intel processor, you will find similar things. The parameters may be different and the components may be very different, but you will have multiple component predictors and a selector that chooses between them. Okay. So, to go back to the basic scheme of prediction that we discussed: we said that you look up the branch target buffer (BTB) in the fetch stage, with the instruction's program counter, and you look up the direction predictor after the instruction is decoded. That was the timeline for the BTB and the direction predictor lookups. So essentially you wait until the instruction is decoded before consulting the direction predictor, and the rationale was that it only makes sense to look up the direction predictor for conditional branches; for other instructions there is no point. That is why we waited until we knew the instruction was a conditional branch. The problem is that this can cost performance if your BTB has low accuracy, because in the very next cycle I am relying on the BTB's outcome to redirect my fetch — that is essentially what I am doing. Now you may say it is just one cycle, because in the next cycle the instruction is decoded and I get the direction predictor's outcome. But even losing one cycle per branch can be a lot.
For example, if you have a conditional branch that alternates, the BTB is always wrong, because the BTB effectively tells you what happened last time, and what happens next is exactly the opposite. So the question is: how do you fix this problem? How can I avoid losing that cycle? It seems very strange that I can do nothing: the BTB itself holds the target of my branch, and I have a direction predictor sitting in the next pipeline stage — why can't I just access it in this cycle? In fact, this is what most commercial processors do today: a fused BTB and direction predictor architecture. What you do is extend each BTB entry with a single bit that specifies whether the entry corresponds to a conditional branch or not. And remember that this is not going to change: an instruction either is a conditional branch or it is not — that is fixed forever. So the first time you encounter the instruction, once the branch is decoded, you store this bit in the BTB. From then on, whenever you fetch the instruction and look up the BTB, it tells you right away whether this is a conditional branch. So what you do is look up the BTB and the direction predictor in parallel with the instruction fetch, in the fetch stage itself. If the BTB lookup hits and indicates a conditional branch, you use the direction predictor's outcome to redirect fetch in the next cycle. For a conditional branch, the BTB entry always holds the target and is never invalidated unless the entry is replaced. So the first time you encounter the branch, you put its target in the BTB entry. See, for conditional branches the BTB entry now has a slightly different meaning: it is not telling you what happened last time; it is telling you that if you do take the branch, this is where you should go.
And on a BTB miss, of course, you have nothing to redirect with, but what you do is carry the direction predictor's outcome forward through the pipeline — it is a single bit, taken or not taken — and use it or discard it after the instruction is decoded, because at that point you know whether it is a conditional branch, and you can make use of the outcome then. Is the scheme clear? With this I have eliminated many of those single-cycle bubbles where the BTB alone does poorly, because I can use the direction predictor's outcome right away. Now, what is the downside? What am I losing — or is this idea so great that it costs nothing? A student suggests a problem with multiple branches mapping to the same entry — but no: the BTB is a tagged structure, so each conditional branch will, hopefully, get its own entry, and each entry tells me whether it holds a conditional branch. Let me restate what happens. I look up the BTB and the direction predictor in parallel. The BTB has a special bit that tells me whether this instruction is a conditional branch. If I hit in the BTB and that bit is set, I can use the direction predictor's outcome right away. If I hit in the BTB and the bit says it is not a conditional branch, then I use whatever the BTB tells me — what happened to this branch last time. If I miss in the BTB, I really do not know what this instruction is; it could be a conditional branch. So in that case I carry the direction predictor's single-bit outcome with me, and once the instruction is decoded, I either use it, if it is a conditional branch, or throw it away, if it is a normal instruction. A student asks about timing: yes, the lookups happen in parallel, so whichever is slower matters — there are three things now happening in parallel.
BTB lookup, direction predictor lookup, and the instruction cache access: whichever is slowest determines your cycle time, so this may or may not stretch the cycle. But there is a major downside to this scheme. What is it? One suggestion: the BTB entry tells me where to go, and if it is wrong — yes, that is true, but it is wrong only when my direction predictor is wrong; that is the predictor's problem, not this scheme's. Think about it differently: how many lookups do I now make in the direction predictor? Equal to what? The number of instructions — for every fetched instruction I look up the direction predictor. Why is that bad? It is not an area question: the predictor is a separate structure, not stored in registers or in the cache, and these are simple, fairly small tables; if I have some area budget, that is fine. And the extra storage in the BTB is just one bit per entry. Maybe you have forgotten the BTB organization, so let me redraw it. The BTB has a valid bit, a tag, and a target, and I am attaching one more bit, the conditional-branch bit. Every instruction looks up the BTB with its program counter; it may hit or miss depending on the tag, and if the entry holds a conditional branch, this bit is set — so I know whether it is a conditional branch or not.
And if it is, then I am also looking up the direction predictor in parallel — the PHT, or whatever structure is there — and I use that outcome. If I hit an entry where this bit is zero, I simply discard the direction predictor's lookup, whatever I got; it does not matter anymore. And if I miss in the BTB, I really do not know what this instruction is, so I carry the outcome forward; in the next stage the instruction is decoded, and then I either use it or discard it. So the storage cost is just this single bit: whatever was in the entry before is still there, plus one extra bit — that is the only overhead. And yes, we index from the PC: if you use p bits of the PC to index, you get 2-to-the-p entries; that has nothing to do with the width of each entry. So what do I lose? Ah, that is what I was waiting to hear: you lose energy — battery life — which is a very big deal. If you are running on a battery, you are expending a lot of power, because you are looking up a branch predictor for every single instruction. It is a huge wastage. So here is the trade-off. Without the fused BTB and direction predictor, each cycle consumes less power, but I lose cycles to BTB mispredictions; with it, each cycle may consume more power, but I may need fewer cycles. Overall energy is power per cycle times the number of cycles, so it can go either way, depending on how good the scheme actually is. If you can win back enough cycles that the total energy drops even though the power per cycle rises, this is a winning design; otherwise it is not a very good design, because today you simply cannot ignore energy. That is a very important point.
So keep in mind that here you are spending a lot of extra energy by looking up the branch predictor for every instruction; there has to be a trade-off. Often what processors do is put a very simple direction predictor alongside the BTB, to get an outcome that is at least better than the BTB alone, and keep the elaborate branch predictor in the decode stage, where it is looked up only for conditional branches. So this scheme may not be the best all the time; let us keep that in mind. Now, quickly: how do you decide the next PC in this scheme? There are now four signals coming in: kill, from the execute stage; BTB-kill, from the decode stage; BTB-miss, from the fetch stage; and is-conditional-branch, from the fetch stage — this last one is the bit coming out of the BTB, relevant only on a BTB hit. And there are five candidate next PCs: the target from the execute stage, the predicted PC from the decode stage, the BTB outcome for non-conditional branches, the direction-predicted outcome for conditional branches that hit in the BTB, and PC plus four. I have to select one of these based on those four signals. So what is the selection logic? If kill is one, select the target from execute — that is the golden rule, it is always correct, so it overrides everything. If kill is zero and BTB-kill is one, select the predicted PC from the decode stage. If kill is zero, BTB-kill is zero, and BTB-miss is one, you missed in the BTB and there is no redirection to do: you just select PC plus four, but you also carry the direction predictor's outcome along, because you really do not know yet what this instruction is. If kill is zero, BTB-kill is zero, BTB-miss is zero, and is-conditional-branch is one, you select based on the direction predictor's outcome. Remember that this outcome is not a target — it is not a PC, it is a single bit, taken or not taken.
So you combine that bit with the BTB entry's target to get the final next PC. And finally, if kill is zero, BTB-kill is zero, BTB-miss is zero, and is-conditional-branch is zero, you select the BTB outcome directly, because these are the non-conditional control-flow instructions: procedure calls, unconditional jumps, and so on. So that is your selection logic. Any questions? I think I will stop here. Next time we will look at some of the research material on branch prediction.