This time we're going to be looking at perceptron branch predictors. So far we've looked at a series of saturating predictor algorithms, which take a set of n-bit saturating predictors and use those to decide whether we should take a branch or not. This time, though, we're going to replace a huge chunk of those n-bit saturating predictors with perceptrons.

With our (m,n) branch predictor, we had a huge table that was indexed by the address of our branch. Then we'd walk down that row and find one predictor based on whatever the global or local branch history was, pull out that one n-bit predictor, and use it to make our prediction. In this case, we can replace an entire row with a single perceptron. We still use the address of our branch, but that alone is enough to find the perceptron we're interested in; then we feed the branch history into the perceptron. So these work with pretty much any type of branch history we've got. If we want to use local or global history, either will be just fine. We take whatever branch history we have and give it to the perceptron, and it figures out how that data can be used to predict what this branch will do.

The perceptron is the earliest form of artificial neural network that we came up with. Perceptrons were invented back in 1962, and they're really, really simple. A perceptron is just a nice, simple summation equation: a weighted sum of all of the inputs. Each input x_i is multiplied by the weight of that input, which captures how important we think that piece of data is for telling us about our result. Add all of those up, and it gives us an answer.

Unfortunately, that simplicity also means perceptrons are really limited in how they can do classification. If I have a series of points and I'd like to classify them, say green ones fall in one category and pink ones fall in another, I need to be able to find a single straight line that separates the two categories. If we're capable of finding a straight line that accomplishes this task, then a perceptron will work just fine. If I've got something a bit more complicated, maybe I can shift my perceptron line so that it still fits. But if the data are too complicated, a perceptron is just not going to be able to find a single line that separates the pink dots from the green dots; I might want a curved decision boundary instead. All of these dots represent points in feature space, which is just what we get from those x_i's: what the features of one piece of data look like, how similar it is to some things, how distant it is from others.

Modern artificial neural network algorithms allow us to do much more powerful things, including finding nonlinear decision boundaries. But while these algorithms are much more powerful, they're also much more expensive. For our purposes, perceptrons work nicely, because it's easy to estimate a straight line. Estimating something curved would be a whole lot harder and might take several cycles to accomplish, and we like branch prediction algorithms that can be completed in a single cycle.

So we'll use our branch history as the input to our perceptron function. It might take the last n branches, either from global history or local history.
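As a concrete sketch of that summation, here's what computing the perceptron output might look like in C. Everything here is illustrative rather than from the lecture: the names, the 16-entry history, the 8-bit weights, and the bias weight w_0 (whose input is fixed at 1), which the standard perceptron predictor includes. Each history bit is encoded as +1 for taken and -1 for not taken, as described next.

```c
#include <stdint.h>

#define HISTORY_LEN 16  /* illustrative history length */

/* Perceptron output: y = w_0 + sum(w_i * x_i), where x_i is +1 for a
 * taken branch in the history and -1 for a not-taken one, and w_0 is
 * the bias weight with its input fixed at 1. */
int perceptron_output(const int8_t weights[HISTORY_LEN + 1],
                      const uint8_t history[HISTORY_LEN])
{
    int y = weights[0];                /* bias term */
    for (int i = 0; i < HISTORY_LEN; i++) {
        int x = history[i] ? 1 : -1;   /* taken -> +1, not taken -> -1 */
        y += weights[i + 1] * x;
    }
    return y;                          /* the sign gives the prediction */
}
```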
Use that history for all of the x_i's in our equation. If a branch was not taken, we put in a negative 1 for x_i; if it was taken, we put in a 1. To determine whether we should take our branch, we just look at the sign of y. If the sign is negative, we predict our branch is not taken; if it's positive, we predict it is taken. That's our prediction, and we can use that information to decide which of the two pieces of code we should go run.

Then once we've determined whether we should actually have run that instruction, we'd like to update our perceptron. We go back and check whether we made the right prediction. If we did, then we probably don't want to change things, unless we have some low output value, say because this was our first time seeing this branch and we're not terribly certain yet. In that case we still want to update our perceptron line so that it more closely matches the data we've seen. Once we've seen a lot of data, we won't move it around much, unless it's making the wrong prediction. If it is making the wrong prediction, then we need to update it regardless.

The update equation is actually pretty simple, and very similar to the summation we already have. We just walk down the list of weights and update each one based on whether the branch was actually taken, as well as whether the i-th oldest branch in the history was taken. Multiply those together to get t times x_i, and add that to whatever weight we had for the i-th oldest branch. Update all of the weights, store the list back to the perceptron table, and you're ready for the next branch that comes along. (There's a sketch of this update in code at the end of this section.)

These can be used in pretty much any of the types of predictors we've seen already, whether you're just looking at local history, global history, some combination of them, or some really complex scheme that takes into account all sorts of different conditions. You can use perceptrons in place of the n-bit branch predictors in all sorts of cases.

So far, perceptron branch predictors have worked really well. We've been able to get up to about 98% branch prediction accuracy, which is a mere 2% misprediction rate. These are some of the best predictors we've got, but they're not perfect. A lot of the improvements we've made to them have involved using them alongside other techniques, recognizing that because they can only classify with a linear decision boundary, they don't work well in cases where you can't find a single line to separate the two decision categories. So instead, we can combine them with other things we've seen before, even more complex options, and start to get really, really good performance out of our branch predictors. A 2% misprediction rate means you're only missing one branch out of 50. The other 49 branches run for just the cost of a regular instruction, with no delay associated with them.
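To make the prediction and update steps concrete, here's a minimal sketch in C continuing the example above. The details beyond the lecture are assumptions: the 8-bit saturating weights, and the training threshold of roughly 1.93 times the history length plus 14, which is the value suggested in Jiménez and Lin's original perceptron-predictor paper. Training happens on a misprediction, or whenever |y| is small enough that we're not yet confident.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>  /* abs */

#define HISTORY_LEN 16
#define THRESHOLD   ((int)(1.93 * HISTORY_LEN + 14))  /* training threshold (assumed) */
#define WEIGHT_MAX  127                                /* keep weights in a signed byte */
#define WEIGHT_MIN  (-128)

/* Predict from the sign of y: non-negative means predict taken. */
bool perceptron_predict(int y)
{
    return y >= 0;
}

/* Train after the branch resolves: t is +1 if the branch was actually
 * taken, -1 if not. Update on a misprediction, or when |y| is below
 * the confidence threshold: w_i += t * x_i, saturating each weight. */
void perceptron_update(int8_t weights[HISTORY_LEN + 1],
                       const uint8_t history[HISTORY_LEN],
                       int y, bool taken)
{
    int t = taken ? 1 : -1;
    if (perceptron_predict(y) != taken || abs(y) <= THRESHOLD) {
        for (int i = 0; i <= HISTORY_LEN; i++) {
            /* x_0 is the fixed bias input; the rest come from history. */
            int x = (i == 0) ? 1 : (history[i - 1] ? 1 : -1);
            int w = weights[i] + t * x;
            if (w > WEIGHT_MAX) w = WEIGHT_MAX;
            if (w < WEIGHT_MIN) w = WEIGHT_MIN;
            weights[i] = (int8_t)w;
        }
    }
}
```

The saturation step is why small weights work here: each weight only needs enough range to express how strongly one history bit correlates with the outcome, which keeps the whole table compact enough for hardware.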