The simplest type of dynamic branch predictor we can have is a one-bit branch predictor. This just keeps track of what this branch did the last time we saw it, and predicts that it will do the same thing again. So if the last time we ran this branch, we took the branch, then we would just predict that we'll take it again next time. If we didn't take this branch last time, then we'd just predict that we won't take it next time either. Alternatively, you can think of this as changing its prediction every time it mispredicts the result. You can think of this branch predictor as though it were represented by this type of finite state automaton, where we have a taken state and a not-taken state, and we just switch back and forth when we make the wrong prediction. This is a nice simple example of a local branch predictor because we're only looking at the local history. We're only looking at what this one instruction is doing. We're not looking at anything else. These actually work reasonably well, but they don't work well in a handful of cases. One of the most obvious cases where they don't work so well is when you have a nested loop. If you think about running your inner loop, you've been running this loop a whole bunch of times. It's been predicting that correctly. It gets to the end of the loop, and it's going to predict that you take the branch again. But this was our last time through the loop. We don't want to go run it again, so it predicted it wrong. So there's one misprediction. Then we go to the outer loop. It cycles through. It says, oh, hey, go run that inner loop again. So now we get to that branch again, and last time through we didn't take that branch. So it's going to predict that we don't take the branch again, even though we know we need to go run that inner loop again. So in this case, our one-bit branch predictor will make two mispredictions in a row.
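To make the nested-loop case concrete, here is a minimal Python sketch of a one-bit predictor. The function names, the loop sizes, and the choice to start out predicting taken are all assumptions made for illustration; the branch being modeled is the inner loop's branch, taken to repeat the loop and not taken on exit.

```python
def simulate_one_bit(outcomes, initial_prediction=True):
    """Count mispredictions of a one-bit predictor.

    outcomes: iterable of booleans, True meaning the branch was taken.
    initial_prediction: assumed starting state (hypothetical choice).
    """
    prediction = initial_prediction
    mispredictions = 0
    for taken in outcomes:
        if prediction != taken:
            mispredictions += 1
        # The single bit just remembers what the branch did last time.
        prediction = taken
    return mispredictions


def inner_loop_branch(outer_iterations, inner_iterations):
    """Outcomes of the inner loop's branch across a whole nested loop.

    Assumes the branch is taken to repeat the inner loop and not taken
    on the final pass through it.
    """
    outcomes = []
    for _ in range(outer_iterations):
        outcomes += [True] * (inner_iterations - 1) + [False]
    return outcomes


if __name__ == "__main__":
    trace = inner_loop_branch(outer_iterations=10, inner_iterations=10)
    print(simulate_one_bit(trace), "mispredictions out of", len(trace))
```

With these assumed sizes, the predictor misses once at the end of the first inner loop and then twice for every later pass (once re-entering the loop, once leaving it), for 19 misses out of 100 branches.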
But if we just move to a two-bit branch predictor, that will actually solve that problem for us and some others as well. So this time we're only going to change our prediction between taken and not taken if we make two mispredictions in a row. As long as this branch is normally either taken or not taken, it will stay in that kind of state. You can think of our nested loop again. When we're running our loop a whole bunch of times, we end up in this state because we keep taking the loop. When we get to the end of our inner loop, then we make a mistake. We predict that it's going to be taken, but it's not really, so we move over to this second taken state. Our outer loop says go back, run the inner loop again. Now we get back to the branch. It says, oh, we usually take this branch, so I'll predict that we take it again. This time it predicted that correctly, and we'd move back over to the first taken state, and we'd just sit there as we go through that loop again. It's still going to make a mistake when we get to the end of our loop, but that's only one misprediction compared to two. This method actually works pretty well. We get about 93% accuracy on a standard benchmark for this. So that's a lot better than the 50 to 60% we can get by having some sort of a static prediction method. That first one that we talked about is a one-bit branch predictor. It just has one bit to it, and that's it. We dedicate one bit to each of the branch instructions in our program, and we just let it keep track of whether we're taking this branch or not. The second one that we looked at is a two-bit branch predictor because we dedicate two bits to each branch instruction. In general, though, we could have as many bits as we'd like dedicated to this task, so these are just a couple of options in the general class of n-bit saturating predictors. We could have one bit, two bits, three bits, four bits, but we don't get a whole lot more improvement by adding more than two bits.
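The n-bit saturating idea can be sketched as a small counter that counts up on taken, counts down on not taken, and predicts taken whenever it sits in the upper half of its range. This is only one common way to encode the states; the class name, the helper, the starting state, and the 10-by-10 loop trace below are assumptions for illustration.

```python
class SaturatingCounter:
    """n-bit saturating counter for a single branch (illustrative sketch)."""

    def __init__(self, bits=2):
        self.max_state = (1 << bits) - 1   # e.g. 3 for two bits
        self.threshold = 1 << (bits - 1)   # upper half predicts taken
        self.state = self.max_state        # assumed start: strongly taken

    def predict(self):
        return self.state >= self.threshold

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at the ends.
        if taken:
            self.state = min(self.state + 1, self.max_state)
        else:
            self.state = max(self.state - 1, 0)


def count_mispredictions(outcomes, bits):
    """Run one counter over a sequence of outcomes (True = taken)."""
    counter = SaturatingCounter(bits)
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


if __name__ == "__main__":
    # Inner-loop branch of a 10 x 10 nested loop: taken 9 times, then not taken.
    trace = ([True] * 9 + [False]) * 10
    for bits in (1, 2, 3):
        print(bits, "bit(s):", count_mispredictions(trace, bits), "misses")
```

On this assumed trace the one-bit counter misses 19 times and the two-bit counter misses 10 times, one per exit from the inner loop.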
We get a little bit more improvement with three bits, and then even less with four. So generally, if we're going to build one of these, we're interested in two- or maybe three-bit branch predictors. Alternatively, we could look at how branches interact overall. We could keep track of the history of all of our branches and use that to make predictions. You can reasonably imagine that some branch results are correlated with neighboring results. Maybe if we just keep track of what all of our branches have been doing, we'll get slightly better results. But it turns out that this doesn't work really well on its own. It can, however, work really well if we do it in combination with some of the local history that we had before. So we can talk about an (m, n) branch predictor, where the (m, n) branch predictor allows us to select between 2^m different n-bit branch predictors based on our global history. So that first one-bit branch predictor is the same as having a (0, 1) branch predictor. We have no global history. We just have one bit that we're using for our local history. That two-bit branch predictor we looked at is the same as a (0, 2) branch predictor. Again, we have no global history that we're using, but we're using two bits for the local branch predictor. One of the more popular methods of implementing these is a (2, 2) branch predictor, which would give us four different two-bit branch predictors to choose from. We choose which of those four branch predictors we're going to use based on the last two branches that we've either taken or not taken. Turns out this works really well. You can get up to 96% accuracy. So that's certainly more than the 93% that we had before. But that's still four more percent that we'd like to improve. So we'll look at some other ways that we might be able to improve that later as well. However, we don't really want to provide a branch predictor for every address that could possibly contain a branch in our hardware.
That would be a whole lot of branch predictors, and we're not really likely to have that many branches in our code. So what we generally do is come up with a branch prediction table, and we'll index it based on our branch address. We'll pick one of those (m, n) branch predictors based on what the address of our branch instruction is. Hopefully we don't have any conflicts where two branch instructions map to the same branch predictor. But normally we don't have a whole lot of branches in our code, so that's not likely to happen. This will allow us to implement a reasonably complex branch predictor using just a limited amount of hardware space. We might only need 2 to 4K different elements in our branch prediction table, each one of which has one of those (m, n) branch predictors in it. And that 2 to 4K would be enough to satisfy our entire architecture. That would be enough to cover all the branches that we're likely to see at any given time.
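Putting the last two ideas together, here is a rough Python sketch of a branch prediction table whose entries are (m, n) predictors. Everything specific in it is an assumption for illustration: the class and function names, dropping the low two address bits (which presumes 4-byte instructions), the 4096-entry default, and starting every counter just on the taken side.

```python
class CorrelatingPredictor:
    """Table of (m, n) predictors indexed by branch address (sketch only)."""

    def __init__(self, m=2, n=2, table_entries=4096):
        self.history_mask = (1 << m) - 1
        self.max_state = (1 << n) - 1
        self.threshold = 1 << (n - 1)
        self.table_entries = table_entries
        self.history = 0  # last m branch outcomes, newest in the low bit
        # Each table entry holds 2**m n-bit counters, one per history pattern.
        self.table = [[self.threshold] * (1 << m) for _ in range(table_entries)]

    def _index(self, branch_address):
        # Assumed 4-byte instructions: drop the low two bits, then wrap.
        # Two branches can land on the same entry (a conflict).
        return (branch_address >> 2) % self.table_entries

    def predict(self, branch_address):
        counters = self.table[self._index(branch_address)]
        return counters[self.history] >= self.threshold

    def update(self, branch_address, taken):
        counters = self.table[self._index(branch_address)]
        state = counters[self.history]
        if taken:
            counters[self.history] = min(state + 1, self.max_state)
        else:
            counters[self.history] = max(state - 1, 0)
        # Shift the new outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def count_mispredictions(predictor, trace):
    """trace: list of (branch_address, taken) pairs."""
    misses = 0
    for address, taken in trace:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses


if __name__ == "__main__":
    # One branch that alternates taken / not taken, 100 times.
    trace = [(0x1000, i % 2 == 0) for i in range(100)]
    print("(0, 2):", count_mispredictions(CorrelatingPredictor(m=0, n=2), trace))
    print("(2, 2):", count_mispredictions(CorrelatingPredictor(m=2, n=2), trace))
```

On this assumed alternating trace, the (0, 2) configuration misses every not-taken outcome (50 of 100), while the (2, 2) configuration misses only once before the history starts selecting the right counter. The modulo in `_index` is also where the conflicts mentioned above come from: with 4096 entries, addresses `0x1000` and `0x5000` share an entry.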