Let me recap where we finished last time. We talked about these two-level branch predictors, where we have a first-level table, which can have multiple entries or just a single entry, and a second-level table, which usually has multiple entries. We also came up with a taxonomy of branch predictors, where you call the first-level table global, per-set, or per-branch depending on how you organize it. That is, the first-level table can have just one single entry recording history; that is called a global table. If the first-level table is such that a set of branches shares one entry, that is called a per-set first-level table. And you can have a gigantic first-level table where the number of entries equals the number of branches; that is per-branch. Similarly, your second-level table could be global. Here global means all the table entries are shared by all the histories; there is no separate table per history, so every history indexes into one single shared table. In the same way you can have a per-set or a per-branch second-level table. We will get into these architectures today in more detail.

The next question is how we populate the first-level tables. There are two modes. One is static, which we do not update at run time: the compiler generates certain contents, and when the program starts we have some way of loading the compiler-generated contents into the history tables. Or the table can actually learn what is happening dynamically and keep getting updated as the program runs. We also talked about one special case, gshare, which we will discuss again today. And you can combine multiple of these predictors; we will see that too.

In general, our direction predictor can be seen as a function which takes your branch PC and the global history register and gives you an outcome, 0 or 1. The global history register is essentially encoding your history over a sliding window of the outcomes of all the branches. The way a combined predictor is implemented is that it can have multiple direction predictor components; here I am showing k of them. Each of these will give you a 0 or 1 outcome. The branch PC and the global history register also feed a selector which, based on the values of these two inputs, selects one of the k outcomes. So ultimately what you get is, of course, a 0 or 1 outcome. We will look today at what kinds of selectors you can design and how they actually work. The basic idea is that I have a bunch of direction predictors, and I have a selector which selects one of their predictions. We will not try to compute any function of these predictions as such; we will just select one of them. It is possible to actually design such functions, so that you attach weights to each of these predictions and come up with a function that combines all of them and gives you a binary outcome. That is possible too; we will not get into that.

If you open up one of these direction predictors, it looks exactly the same: it takes a branch PC and the GHR and gives you 0 or 1. The selector takes the k-dimensional vector of 0s and 1s coming out of these direction predictors; it also takes the branch PC of the branch you are trying to predict and the current content of the global history register. So what is inside each direction predictor? We talked about this last time. You have a first-level table, called a pattern history table.
It essentially stores your history patterns, and what you get out of it is a history. There are many ways of indexing this table; we will talk about that today. The second-level table, the branch history table, takes that history as input, and what comes out is a 0/1 outcome. So the pattern history table takes a branch PC and the global history register and gives you the branch PC and a history; the PC just passes through. The branch history table takes the PC and the history output by the first level, and gives you 0 or 1.

So what does the pattern history table look like? It is a table of histories. The notation says that each history is m bits: we keep a sliding m-bit window, looking at the last m outcomes of a branch. How many entries do I have? We have p such entries. Is this notation clear? So the pattern history table looks like this: there are p entries, each m bits wide, and each one stores some history. It may be the history of a particular branch, or the combined history of multiple branches. For example, if you are storing a global history, then it is essentially storing the history of all the branches: whenever a branch shows up, it records the outcome, shifting in one direction. Whenever we see a new outcome, we shift the register by one position and shift the new outcome into the vacated position. That is the pattern history table.

Similarly, the BHT is a table of saturating counters. Each one is q bits wide, and it has to have 2^m entries. Why is that? Why this relationship? Even if you have forgotten, can you re-derive it? Sorry? The total number of possible patterns, yes. Why is that important? Because that is what I use to index here: one of these history entries indexes into the BHT. The history is m bits long, so I need 2^m entries, each q bits wide. Is this anatomy clear to everybody? This is the very general two-level direction predictor, and what we are saying is that we will have many of these, and there will be a selector choosing one of the predictions. A sketch of this generic two-level structure follows.
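To make this anatomy concrete, here is a minimal C sketch of the generic two-level structure just described. All names and sizes here, M, P, Q, and the simple per-address indexing of the first level, are my own illustrative choices, not taken from any real machine:

```c
#include <stdint.h>

#define M 8                     /* history length in bits (m)           */
#define P 1024                  /* entries in the first-level PHT (p)   */
#define Q 2                     /* width of each saturating counter (q) */

typedef struct {
    uint16_t pht[P];            /* first level: m-bit history patterns  */
    uint8_t  bht[1u << M];      /* second level: 2^m q-bit counters     */
} TwoLevel;

/* Predict: a function of the PC picks a PHT entry; the m-bit history
   stored there indexes the BHT; the counter's upper half means taken. */
int predict(TwoLevel *dp, uint32_t pc) {
    uint16_t hist = dp->pht[(pc >> 2) % P] & ((1u << M) - 1);
    return dp->bht[hist] >= (1u << (Q - 1));    /* above midpoint => 1  */
}
```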
The bimodal predictor is a special case where the first-level table does not exist: the BHT takes the branch PC directly, the PHT is non-existent, and the BHT gives you the 0 or 1. Does anybody remember what the bimodal predictor actually does? OK, sorry, before that: I said these entries are counters. What are they counting? Counters count; what are these counters actually counting? How many times the pattern occurred, is it? So given a particular history entry here, I have a corresponding counter there. What is that counter counting? Someone says it is the number of times this particular pattern occurred; is that correct? How do I update that counter? Does anybody remember? Does it always increment? Always decrement? Somebody? If the branch is taken, we increment; otherwise we decrement. So now can you tell me, if I look at the content of this counter at any point in time, what does it tell me? It is telling me the difference between the number of times I have seen the history H followed by a 1 and the number of times I have seen H followed by a 0. In other words, it is approximating the probability of getting a 1 given H. Since it is a q-bit-wide saturating counter, it can only say something about roughly the last 2^q occurrences of H; it cannot tell you what happened before that. Out of the last 2^q occurrences of H, it tells me how many times 1 happened minus how many times 0 happened. Is that correct?

So the bimodal predictor does not have a PHT. What does it do then? It indexes the BHT with a function of the PC. What function of the PC makes sense? The last n bits? Yes: you remove the last 2 bits, because they are always 0, and then take the lower n bits of what remains, so essentially PC[n+1:2]. You just want to pick n bits. Is there any justification for picking the lower n bits? Does anybody have an answer? Why do I not pick the upper n bits of the PC? The upper bits will be the same for many instructions. Exactly. If I pick the upper n bits, there is a very high likelihood they are the same for two branches, whereas if I pick the lower ones, since they change the fastest, the chances are that two branches land in different entries. So essentially what am I saying? I am saying that within any window of 2^n contiguous instructions, whatever branches you give me, I can guarantee an alias-free mapping onto this table: there will be no collisions. That is the key attraction, and it essentially determines the size of the code region over which you can accurately predict branches. If I move the n bits toward the upper side of the PC, I gradually lose this property, and the chances grow that two branches collide on the table, which is not good.

So here, what this counter is counting is essentially: given this particular branch, how many times it has been taken versus how many times it has been not taken. Looking at the counter, I know in which direction the branch is biased. That is why it is called a bimodal counter: it works great for branches with bimodal behavior, either highly biased toward taken or highly biased toward not taken. If the branch sits somewhere in the middle, you are likely to mispredict. For example, if you have an alternating branch, which goes 1, 0, 1, 0, and so on, what will be the prediction accuracy for this branch? 50 percent? Is it 0 percent or 50 percent? It depends on the initial value. Say the counter is 2 bits and we initialize it to 0; then? And when do you get 0 percent? Oh, I am sorry, I forgot to tell you the prediction function. If my count is greater than or equal to 2^(q-1), I predict 1, else I predict 0; if I am above the midpoint I say taken, otherwise not taken. Why? Whenever I get a taken outcome I increment the count, so the upper half of the counter range means the branch has mostly been taken. Now, when do I get everything wrong on the alternating branch? If I initialize the counter to 2^(q-1) - 1, just below the midpoint, then the counter oscillates across the midpoint and I get every single prediction wrong. So depending on initialization, you may get 100 percent wrong or 50 percent wrong; the best this predictor can do on an alternating branch is 50 percent, and you can change which case you get by how you initialize. A sketch of the bimodal predictor is below.
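Here is the bimodal predictor as a small C sketch, matching what we just derived: no first-level table, just an array of q-bit saturating counters indexed by the lower n bits of the PC after dropping the two always-zero bits. The concrete sizes are illustrative only:

```c
#include <stdint.h>

#define Q    2                          /* counter width (q)             */
#define BITS 10                         /* lower n bits of PC used       */
#define N    (1u << BITS)

static uint8_t ctr[N];                  /* init to 0, or to the midpoint */

int bimodal_predict(uint32_t pc) {
    return ctr[(pc >> 2) & (N - 1)] >= (1u << (Q - 1));  /* midpoint test */
}

void bimodal_update(uint32_t pc, int taken) {
    uint8_t *c = &ctr[(pc >> 2) & (N - 1)];
    if (taken  && *c < (1u << Q) - 1) (*c)++;    /* saturate at 2^q - 1  */
    if (!taken && *c > 0)             (*c)--;    /* saturate at 0        */
}
```

Note how the initialization interacts with the alternating branch discussed above: starting at 0 gives 50 percent accuracy, while starting just below the midpoint gives 0 percent.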
So much for the bimodal predictor. Now let us look at the different two-level predictors, but before I go there, let us push on this alternating branch a little more. Suppose I take the same branch, with the same alternating behavior, to a two-level predictor. What happens now? Do I get anything better? Look at the first-level index function: although I wrote it as a function of branch PC and GHR, in most cases we will not use the GHR at all; it is a function of the PC only, and in this case it takes the lower log p bits of the PC after removing the last two bits. So what happens to our branch? Can you hope to improve here over the bimodal predictor? Forget about the first few outcomes; assume we are in steady state and the history is filled up with the pattern 0, 1, 0, 1. Remember that we are sliding an m-bit window over this stream. How many different m-bit patterns are there? Two. What are they? The one starting with 1 and the one starting with 0; let us suppose m is even, so the two possible windows are 0101...01 and 1010...10. First of all, both of these will reside in the same PHT entry, because this is a single branch: at any instant, we see either one pattern or the other sitting in that entry. But the two patterns index two different counters in the BHT. So what will be the steady-state contents of those two counters? The pattern ending in 1 is always followed by a 0, so its counter is decremented every time and saturates at 0. The pattern ending in 0 is always followed by a 1, so its counter is incremented every time and saturates at the maximum. So in steady state, once the counters have crossed the midpoint, this predictor gives you the correct prediction for this branch every single time. A toy simulation of exactly this behavior follows.
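Here is a toy C simulation of this argument; it is only a demonstration under the stated assumptions (a single branch, m = 4, 2-bit counters), not a model of real hardware. The two windows 0101 and 1010 train two different counters, and after warm-up every prediction is correct:

```c
#include <stdint.h>
#include <stdio.h>

#define M 4
int main(void) {
    uint8_t hist = 0;               /* m-bit local history of the branch */
    uint8_t bht[1 << M] = {0};      /* 2^m two-bit saturating counters   */
    int correct = 0, total = 0;

    for (int i = 0; i < 1000; i++) {
        int outcome = i & 1;                        /* 0,1,0,1,...       */
        int pred = bht[hist] >= 2;                  /* midpoint test     */
        if (i >= 100) { total++; correct += (pred == outcome); }
        if (outcome  && bht[hist] < 3) bht[hist]++; /* train the counter */
        if (!outcome && bht[hist] > 0) bht[hist]--;
        hist = ((hist << 1) | outcome) & ((1 << M) - 1);  /* shift in    */
    }
    printf("steady-state accuracy: %d/%d\n", correct, total);
    return 0;
}
```

After warm-up, hist alternates between 0101 and 1010; the counter for 0101 sits at 0 and the counter for 1010 sits at 3, so the program prints 900/900.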
Let me now walk through the taxonomy. The first one was PAp, which essentially means: a per-branch pattern history table and a per-branch BHT. Let us see what it looks like. The first-level table has m bits of history per entry and p entries, where small p equals the number of static branches in the program. Now, clearly, if you put yourself in the position of a processor designer, you have no idea what programs are going to run; probably many different programs will run on this processor. So it is actually not possible to estimate small p. So what will you do if you want to build a PAp predictor? What is small p? This is going to be a hard-wired structure; it is not as if you can figure out small p before running a program and grow the size of the predictor accordingly. So what do you do? What is the worst case? One entry for each instruction. Right, that is the best I can do: p equals 2 raised to the number of PC bits minus two, one entry for every possible word-aligned instruction address. That is my first-level table: each branch gets its own entry, which is what the P, for per-branch, stands for, and there can be no aliasing here. What does each entry point to? It points to its own BHT, which is q bits wide and 2^m entries tall. So whenever this branch PC shows up, you have p BHTs here, you look up the corresponding BHT, index it with this branch's particular history, and the BHT gives you the outcome.

So that is the size of your PAp predictor: the first-level table is p times m bits, each second-level table is q times 2^m bits, and you have p such tables. If you take capital N as the worst-case number of entries, which will be 2 to the power of the PC width minus two, the total is N(m + q 2^m) bits. A gigantic predictor, very much so. Is this size equation for PAp clear?

Now, PAs essentially tries to collapse some of these BHTs. It keeps the PHT unchanged, but there is one BHT for a set of branches. Previously we took the PC and used it directly to decide which BHT to access; now a bunch of branches will share the same BHT, so you apply a hash function to your PC to figure out which BHT to access. You get p/s BHTs, and that gives the total size of your PAs. Then PAg says: well, I just give you one BHT. Whatever history you have in the first level, you use it to index into the same table; there is only one BHT, a global BHT shared by everybody. The size expressions for this whole family are collected below.
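To keep the bookkeeping in one place, here is my consolidation of the storage costs in the lecture's symbols, p first-level entries, sets of s branches, m-bit histories, q-bit counters; the per-set and global first-level variants are discussed next:

```latex
\begin{align*}
\text{PAp:} &\quad p\,m + p \cdot q\,2^{m} \text{ bits}\\
\text{PAs:} &\quad p\,m + \tfrac{p}{s} \cdot q\,2^{m} \text{ bits}\\
\text{PAg:} &\quad p\,m + q\,2^{m} \text{ bits}\\
\text{SAg:} &\quad \tfrac{p}{s}\,m + q\,2^{m} \text{ bits}\\
\text{GAg:} &\quad m + q\,2^{m} \text{ bits}
\end{align*}
```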
So what are the tradeoffs here? With PAp you get complete isolation: a particular branch cannot learn from the behavior of other branches, which may be good or bad. It is good if you would otherwise have destructive interference between branches; it is bad if you have correlated branches, because then you will never be able to learn the correlation, which we saw in the last class. With PAs you start to have some amount of sharing of the second level, so two different branches may actually reinforce the learning of a particular counter; that can start happening now, and it gets reinforced even further with PAg. But remember the negative side: there may be destructive interference. Say there are two branches with different behaviors: one follows the history 10101 with a 1, while some other branch follows the same history 10101 with a 0. They will interfere in the same counter, because the counter for 10101 will sometimes get incremented and sometimes decremented. So there is a real chance of mispredictions.

Now, if you instead try to reduce the size of your PHT, you can do the same thing on the first level: instead of giving each branch one history entry, I can give a set of branches one history entry. That gives SAp, where the first-level table has one entry per set of branches while the second-level table is again per-branch; that is possible too, each first-level entry still indexing into a per-branch BHT, and you can write down this kind of size for the predictor. You can have SAs, where both the PHT and the BHT have one entry per set of branches, and you can have SAg, which is one of the most popular predictors, where you have one global BHT fed by a per-set PHT. It looks like this; the only thing is that p here is not really equal to the number of static branches, it is just some chosen number: a set of branches maps to one entry, and from that you can compute the size.

And then you can make the PHT global. What is a global PHT? It is just one entry, often called the global history register. So here small p equals 1. What would GAp be? You have an array of BHTs, each indexed by the GHR content, and you pick which BHT to use based on the branch PC, so each branch gets a separate BHT. Then you can try collapsing some of those BHTs and you get a GAs predictor. And finally you have the GAg predictor, again a very popular predictor, where there is a single BHT. Notice that once you make the PHT global, you barely use the branch PC: in GAp and GAs you still select the BHT based on the PC, but in GAg there is no use of the branch PC at all. So this predictor is very good at discovering correlations between different branches. We saw an example last time; we will go back and revisit it and see that a GAg predictor gives a very accurate prediction there. Any questions on this? Is it clear to everybody how I update the PHT and the BHT and how I actually make a prediction? Given a small piece of a program, you should be able to calculate the branch misprediction rate for a given predictor, I would assume.

There is a special predictor called gshare which is very similar to GAg, except the index function is changed a little bit. It does not directly use the GHR; it uses a function of the PC and the GHR. Essentially, you take an m-bit hash of the PC and XOR it with the GHR. I am sure you could use some other combining function, but the way gshare was proposed, it was exactly this: the GHR XORed with the branch PC. You shift out the last two bits of the PC and XOR the next m bits with the GHR. A sketch of this index function is below.
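A minimal C sketch of the gshare index computation, assuming a 12-bit GHR and 2-bit counters; both sizes are illustrative:

```c
#include <stdint.h>

#define HBITS 12                         /* GHR length (m)              */
static uint8_t  bht[1u << HBITS];        /* 2^m two-bit counters        */
static uint32_t ghr;                     /* global history register     */

int gshare_predict(uint32_t pc) {
    /* drop the two zero bits of the PC, then XOR with the history */
    uint32_t idx = ((pc >> 2) ^ ghr) & ((1u << HBITS) - 1);
    return bht[idx] >= 2;                /* upper half => predict taken */
}
```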
So how do you update the direction predictor? Remember that your branch gets executed in the X stage of the pipeline; before that, you do not know the correct outcome of the branch, so clearly you cannot update the predictor before you reach the X stage. You execute the branch, you now know the outcome, and you can go and update. First rule: the BHT is updated by indexing it with the old history, and by the old history I mean whatever first-level entry you got when you indexed into the predictor at the time of decode. Let us take an example. Suppose I have a gshare predictor. A branch comes along with some PC; I apply the hash function and it gives me a particular m-bit index. I use that to index into my BHT, which has 2^m entries, and based on that counter's content I gave you a prediction at the decode stage, when you looked up the predictor. Finally, I execute the branch in the X stage and figure out whether the prediction was correct or wrong. Now it is time to update the predictor, because that is how the predictor learns: we keep feeding it the correct outcomes.

So I am saying you update the BHT by indexing it with the history captured at prediction time. Why is that? Why can I not simply, at the X stage, look up the first-level table again, take whatever history I get, index the BHT with it, and increment if the branch is taken, decrement if not? Why do I have to remember my history entry? How can it change? An instruction which has just passed through may have changed it? Careful: you cannot change the predictor before the X stage. A newer instruction that came in merely looks up the table; it indexes in, gets a prediction, and does not modify anything. So does it make sense why I am talking about this problem, or do you think there is no problem? The same PC is getting looked up, yes, but remember that the branch predictor is not modified until the X stage; a lookup only reads, and reading makes no difference. So which instruction can be the culprit? A previous instruction, a previous branch? Yes: an older branch can modify the first-level entry. Right. If an older branch indexes into the same history entry, it will modify that history before our branch gets a chance to read it again at execute. So essentially what is happening is that in one and the same cycle, this branch is making a prediction while an older branch is updating the very same entry.
So depending on what happens first, you may or may not read the history you actually predicted with; the older branch's modification lands in the entry in the cycle after it completes. So you cannot rely on the current content of the first-level table at execute time: if you update the BHT with that, you will train the wrong entry. You have to capture the first-level contents at the moment you make the prediction, because what am I trying to do? Next time I encounter the same history, I should be able to give the right prediction for it. That is why I copy this history along with the branch instruction, let it go down the pipeline, and use it at execute to index the BHT and update the counter. Does this mean it is as if only this one branch owns that entry? No, it is a hash function; there will be collisions, but forget that for now: multiple branches may map to the same history entry, that is possible. Why? The table has just p entries, and p is not equal to the number of branches; my hash function takes the lower log p bits of the PC after removing the last two bits, so two branches that agree in those bits will collide. That is the reason why you have to carry the old history forward.

So you update the BHT and you modify the PHT. How do you update the BHT? You increment this saturating counter if the branch was taken, decrement it if not. How do you modify the PHT? You shift the history by one position and shift in the new outcome: 1 for taken, 0 for not taken. Now, the question is, why do I do it in this order, or does the order matter at all? Can I update the PHT first and then the BHT? Assume the branch's update is atomic with respect to other branches. So can I flip this order? You maintain the old history separately, so it should be fine. It should be fine, exactly: since I am using the old, carried-along history to index the BHT, it does not matter in which order I access the two tables; in fact, I will go ahead and update them in parallel to save time. So the old history is only used for indexing the BHT? Yes. But while updating the PHT you will not use the stored old history? No, of course not: I index into the table, and whatever is there, I shift out the oldest bit and the new history takes its place. So this order is not important. A sketch of this whole update path is below.
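Here is a C sketch of this update discipline for a GAg-style predictor. The history read at predict time travels with the branch in a BranchInfo record of my own invention; a real pipeline keeps one such record per in-flight branch:

```c
#include <stdint.h>

#define M 12
static uint8_t  bht[1u << M];      /* 2-bit counters                     */
static uint32_t ghr;               /* the single first-level entry       */

typedef struct { uint32_t hist; int pred; } BranchInfo;

/* Decode: read the history, make the prediction, and carry both along. */
BranchInfo predict_stage(void) {
    BranchInfo b;
    b.hist = ghr & ((1u << M) - 1);
    b.pred = bht[b.hist] >= 2;
    return b;                      /* travels down the pipeline          */
}

/* Execute: index the BHT with the OLD history, never the current GHR.  */
void execute_stage(BranchInfo b, int taken) {
    if (taken  && bht[b.hist] < 3) bht[b.hist]++;
    if (!taken && bht[b.hist] > 0) bht[b.hist]--;
    /* Shift in the real outcome; the order relative to the BHT update
       is immaterial because the old history was saved separately.      */
    ghr = ((ghr << 1) | (taken ? 1u : 0u)) & ((1u << M) - 1);
}
```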
The next question that arises is: should I update the predictor with the predicted outcome? That is, when I make a prediction here at decode, should I update the predictor at the same time, immediately? Is there any gain in doing that? Because it seems so wrong: I do not yet know whether the prediction is correct. So I am asking, instead of delaying the update to the X stage, can I update right here? The predictor thinks the branch is taken; can I take that predicted outcome and use it to change the history right away? One big problem is obvious: if the predictor is wrong, we have corrupted the state of the predictor with a wrong outcome. That is one problem. Is there an advantage of doing it? If I do not do it, is there a problem? The next instruction will benefit, you say? Why is that? Because if the next instruction is a branch mapping to the same entry, it may benefit from seeing the newer history, if there is a correlation between the two. Can you think of an example of this? To help you: suppose you have a very deep pipeline, so the time from making a prediction to updating the predictor is many cycles. And this is the same branch executing over and over, with several instructions in between: a for-loop body. Let us start. The first instance of the loop branch shows up; nothing has been recorded yet, the history is all 0s (I initialize everything with 0), so it indexes the first entry. The branch goes down the pipeline, and before it gets to the X stage I fetch the next loop iteration; because I am fetching through the loop, I may end up fetching many iterations even before the first loop branch executes, depending on how many pipe stages I have. The next instance shows up; it still sees the all-zeros history, and makes the same raw prediction. It is here that a speculative update would have helped, provided the speculated outcome was correct. That is a very common example, and I gave a very simple one; you can also have two different branches that are correlated with each other: you want to fold the first branch's outcome into the history even before the second branch predicts, so the second branch has a higher chance of being predicted correctly. But of course, all of this is true only if the speculative update was correct.

What happens if I do a wrong update? How do I fix the predictor? Finally I get to the X stage and figure out: oh, I have a misprediction. Now I have two things to do. One, of course, I have to squash the instructions that were fetched along the wrong path. Two, I have to go and repair the predictor. How do I do it? We know the old history, yes, but on the wrong path I might have speculatively updated many first-level entries. Shall I restore all of those one by one? Flush everything? How would you do that? Keep a checkpoint. Exactly: keep a copy of the history table, checkpointed on every branch; whichever branch goes wrong, you restore that branch's checkpoint. For the full table this is not done in any processor, because of the sheer volume of it: you would be checkpointing m times p bits, the table is large, and every branch needs its own checkpoint, because any branch can go wrong. But if you have a GAg predictor, you have just one first-level entry, the GHR, and that is easy to checkpoint. So in a GAg predictor you will often find that the GHR is updated speculatively, because it helps a lot: you copy the GHR, and it is just m bits. That is easy to copy. A sketch of this speculative update with checkpointing is below.
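A C sketch of the speculative GHR update with checkpointing; the function names are mine, and a real machine keeps one checkpoint per in-flight branch rather than a single variable:

```c
#include <stdint.h>

#define M 12
static uint32_t ghr;               /* speculatively maintained history  */

/* Predict: snapshot the old GHR, then shift in the PREDICTED outcome. */
uint32_t predict_and_speculate(int predicted_taken) {
    uint32_t checkpoint = ghr;     /* saved alongside the branch        */
    ghr = ((ghr << 1) | (predicted_taken ? 1u : 0u)) & ((1u << M) - 1);
    return checkpoint;
}

/* Misprediction: discard everything shifted in on the wrong path and
   redo the shift with the actual outcome. The BHT is left untouched.  */
void on_mispredict(uint32_t checkpoint, int actual_taken) {
    ghr = ((checkpoint << 1) | (actual_taken ? 1u : 0u)) & ((1u << M) - 1);
}
```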
What about the BHT? There are two questions there as well: in a speculatively updated predictor, do I update the BHT speculatively at all? Does it have an advantage? And if yes, how do I fix it after a wrong update? You might want to think about the easier one first. Suppose I do not speculatively update the BHT; do I lose anything? Assume you are in the steady state; do not worry about corner cases like cold start. In the steady state, what is the advantage of updating the BHT early? I have a taken prediction, and I go and increment the BHT counter right away. If I am correct, do I gain anything? I do not, right? My prediction still remains taken whether or not I increment; if I did not increment it, it would have predicted taken anyway. So it does not really buy me any extra accuracy to increment or decrement the BHT entry early. Same for a not-taken prediction: even if I do not decrement the entry, I still get a not-taken prediction next time. So the BHT is never updated speculatively; you do not do that, because it does not buy any extra accuracy. That gives you a rule of thumb: whenever there is a speculative update of a predictor, you only update the history, provided it is a single entry or maybe a small number of entries; the counters wait for the real outcome. Any question on the updating of the predictor? Is it clear? All right.

So now I want to talk about one hybrid predictor, where essentially we have two component predictors and there is a selector which selects one of them. Today you find these predictors in pretty much every processor. The goal of having a hybrid is that certain predictors address certain types of branches, and if you want to increase your coverage, you will probably be combining multiple different types of predictors. A very common example is combining an SAg predictor with a GAg predictor. Just to remind you what they look like: this is an SAg predictor, where you use the PC, or a hash function of the PC, to index the first-level table, and the history read out indexes a shared BHT. And the GAg predictor has a single first-level entry, which is the global history register, and that indexes into a BHT. As you can see, in the SAg predictor you only learn the correlation between different branches by accident, if they actually map to the same entry, and that learning may be either destructive or constructive; you really do not have control over it. These are often called local predictors: they are essentially designed to capture the pattern of a single branch. In GAg, on the other hand, you are mixing the history of all the branches, so it is going to capture global correlations; these are often called global predictors. So it makes sense to combine these two into a single predictor, so that you can predict globally correlated branches as well as locally patterned branches.

So this hybrid combines a local and a global predictor, with the parameters listed here. This is exactly what was implemented in the Alpha 21264 processor. The SAg has 1024 entries, that is small p, with 10 bits of history each, that is m, and its second level has 1024 entries, as it must, because that is 2 to the power of 10, each 3 bits wide, that is q. The GAg has a 12-bit global history register and a 4096-entry BHT, which is 2 to the power of 12, each entry being 2 bits. So whenever you have a branch, what will you do? You send the PC to the local predictor.
It will tell you something, 0 or 1. You also activate the global predictor; it takes the GHR and likewise tells you 0 or 1. Then something has to select one of these. So there is a meta predictor, the selector: a bimodal-style predictor with 4096 two-bit counters, indexed by the GHR. Its job is basically to tell you which component is likely to be correct. That is why it is called a meta predictor: it is predicting which predictor is correct. Note that you now need to carry two histories with the branch through the pipeline, because you have two things to update: a copy of the GHR, and also a copy of the local history entry.

When it comes to update time, the branch finally executes and you know the correct outcome. What update makes sense? Do you update both component predictors all the time? There are four situations: SAg correct and GAg wrong; GAg correct and SAg wrong; both correct; both wrong. Tell me what we do in each of the cases. You also have to figure out how to update the chooser, the selector; so there are three things to update. Let us take the first case: SAg correct, GAg wrong. We update the GAg and do not update the SAg at all? Won't you want to reinforce its learning? See, if you took the SAg in isolation, if you had only that predictor, wouldn't you update it when it is correct? You should, right? You always update the predictor. So: you always update the SAg component and the GAg component with the correct outcome. Always.

What about the selector? When both are outputting the same verdict, both correct or both wrong, we do not update the meta predictor; updating it then makes no sense, because there is nothing it could have selected better. So when SAg is correct and GAg is wrong, what do you do to the meta predictor? You update the counter, and the counter in question is the one the GHR points to, since the GHR is used to index into the chooser. When SAg is correct and GAg is wrong, I go and increment that counter; when GAg is correct and SAg is wrong, I decrement it. So what is the selection protocol now? Whenever I read the chooser counter, I compare it against the midpoint: if it is above the midpoint I choose the SAg, otherwise I choose the GAg. And when both are correct, we do not update the selector, because the selector really has nothing to learn in this case; and when both are wrong, then also there is nothing to do with the selector. OK? A sketch of this chooser logic is below.
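Finally, a C sketch of the chooser logic just described, loosely following the Alpha 21264 parameters from the lecture: 4096 two-bit counters indexed by the GHR, incrementing toward SAg, which is the convention we settled on above:

```c
#include <stdint.h>

#define CBITS 12
static uint8_t chooser[1u << CBITS];     /* 2-bit counters, GHR-indexed */

/* Train only when the two components disagree in correctness. */
void update_chooser(uint32_t ghr, int sag_correct, int gag_correct) {
    uint8_t *c = &chooser[ghr & ((1u << CBITS) - 1)];
    if (sag_correct == gag_correct) return;  /* both right or both wrong */
    if (sag_correct && *c < 3) (*c)++;       /* move toward choosing SAg */
    if (gag_correct && *c > 0) (*c)--;       /* move toward choosing GAg */
}

/* Final prediction: SAg above the midpoint, GAg below. */
int select_prediction(uint32_t ghr, int sag_pred, int gag_pred) {
    return chooser[ghr & ((1u << CBITS) - 1)] >= 2 ? sag_pred : gag_pred;
}
```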