Today I want to talk about hill climbing, which is a basic form of search, which is a basic thing that we use computers for. This is part of the class that we're working on in computational thinking and programming. So search is basically the idea of trial and error. Do something, see if it works; if it doesn't, do something else, until you find whatever works. Once you've found whatever works, you can kind of ignore all of the errors and just go straight to the solution in the future. So things get better. People used to say that computers can't do anything new, that computers can't create anything, because they can only do what you programmed into them. And that's true, but we can program them to search, so that by trying a bunch of combinations and seeing which ones work, according to however we defined working for them, they may well, and regularly do, find combinations of parameters that produce values that we ourselves didn't know. Search creates knowledge, and that's why it's worth understanding something about search in general: search in computers, search in people. So the way we look at it, search has two basic parts. It has the problem space, which is the thing that you're searching, and then it has a searcher, which is a method of moving through the problem space. So a problem space, for our purposes, is basically a function. You have some unknown function, you're giving it a bunch of inputs, and you get back a number, an evaluation, a score. And your job as a searcher is to find the combination of inputs that will produce the best score, and if not the best score, maybe some good-enough score, depending on the problem. So the way we characterize problem spaces is, number one, in terms of how many inputs to the function there are. Is there just one parameter that you can vary? That would be a one-dimensional search space. Are there two, three, four, or more dimensions that you can vary?
That's the dimensionality. On each dimension, then, there's a question of how many different values you might want to consider. Some things, like price, might be essentially continuous, or very fine-grained: a penny more, a penny less. Other dimensions, like red chili or green chili, are more discrete. There are only a couple of choices. So that goes to the granularity of the space. And finally, there's the ruggedness: if you've got some set of parameters, like I'm willing to pay $0.59 for some green chili, and you say, well, would you pay $0.60? Would you pay $0.59 for red? You're making small variations. How much is that going to change the answer? So if you can make small changes in the parameters and usually get small changes in the answer, that's a smooth problem space, a smooth landscape. If, on the other hand, making small changes in the parameters causes huge changes, cliffs and peaks in the function value that comes back, that's a very rugged landscape. Smooth landscapes are going to tend to be easier to search. Rugged landscapes are going to be harder. OK, how are we going to search it? The basic idea of search, or the basic idea of hill climbing, is this. You start with some set of inputs, whatever it's going to be, however many dimensions the function needs. You pick values for all of them, and then you try to tweak it: change one dimension by one unit. Up, down, up, down, up, down. And look to see if you can find an improvement. So that's what makes it climbing a hill. It's as if you've got a guy sitting somewhere in the space, and he's looking all around him and saying, well, that one's uphill, so I'll go there, and then I'll look all around again. Well, that's uphill, and so on. So hill climbing is an example of what's called a weak method, weak meaning it makes very few assumptions about the problem space.
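The tweak-one-dimension loop just described can be sketched in a few lines of Python. This is a minimal sketch, not the demo's actual code; the landscape function `f` and the 0-to-99 range are stand-ins for whatever problem you're searching.

```python
def steepest_ascent_1d(f, x, lo=0, hi=99):
    """Climb a 1-D landscape f over integer positions lo..hi.
    Each step evaluates both neighbors and takes the better one,
    but only if it improves on the current spot."""
    while True:
        neighbors = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbors, key=f)      # steepest ascent: most uphill neighbor
        if f(best) <= f(x):               # no neighbor is better: stop
            return x
        x = best

# On a smooth, single-peaked landscape the climber walks straight to the top.
smooth = lambda x: -(x - 42) ** 2
print(steepest_ascent_1d(smooth, 7))      # 42
```

Notice that the loop assumes nothing about `f` except that it can be evaluated, which is exactly what makes this a weak method.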
If we knew certain things about the problem space, if we knew, say, that the output was linearly related to the inputs, whatever that means, we could do much better. We could come up with very clever solution techniques that would essentially go right to the answer, or very quickly find the best answer. But a weak method doesn't make any assumptions. It doesn't assume that the landscape is smooth, for example. It might be rugged. So a weak method is something that we can apply to almost anything. That's good. The flip side of a weak method is that it doesn't actually guarantee to work. We'll see that in a second. So in order to do this idea of saying, here I am, that's got this score, here's a bunch of variations, the searcher has to evaluate the function at all of those neighboring spots. And those evaluations take some work; how much depends on what the function is. So typically, the cost of searching is going to be dominated by the cost of doing these evaluations of different variants. I mean, if evaluation costs were zero, well, we could just try all possible combinations and pick the best. But evaluation costs are not zero. It always takes at least a little bit of time. And in some cases, you may have to actually build models or do a physical experiment to determine how good a combination of inputs is. The evaluation cost could be very expensive. And then finally, we have to have a stopping criterion, an idea of when we're going to give up. Do we know that it's possible to reach a value of 100, so we're trying to find that? Are we just going to search for two days and then take the best one we've got? What's our rule? OK, that's search. Let's look at some examples. All right, here is a one-dimensional problem. So now, you're just varying the x, and you get a height back. And the job is to find the highest height. And here, we've got a guy. He's called the steepest ascent hill climber. And in fact, he's just done the job for us.
So if we start him over, all right. So he was stuck in some random position. If we do one evaluation, he didn't move. Now, he did. One, one. Why is he only moving every other evaluation? Because he's got to look to his right, look to his left, evaluate both of those, and decide which one is the best. They both might be uphill, in which case he wants to take the one that's more uphill: steepest ascent. So in fact, he's making a move every other evaluation here. And this works great. The flip side of making such good progress is that it's easy to mess up the steepest ascent hill climber. So if we put in a few little hooks, little nasty spots, and now we try to let this guy go, he climbs. He gets stuck on this little teeny peak. This little teeny peak is what's called a local maximum, local meaning every possibility one step away from where he is is worse or no better. So he just sits there. But it's not the global maximum. The global maximum, in this case, is our peak up here. It's the best possible function value that we could get, if we could, over all possible inputs to the function. So here's a little thing where people get the quiz question wrong: this point up here, there, I stuck him up there. Is that guy at a local maximum? Well, people would say no, he's at the global maximum. But that's the trick. The global maximum is also a local maximum. Local maximum means all the neighbors are worse or no better. That's true: left, right, both worse. It's a local maximum and the global maximum. All right. But this is a problem. Let's let this just go. Most of the time now, if we start the steepest ascent guy in a random place, he's not actually going to get to the global maximum if there are these little irregularities. And we can do better than this if we admit that maybe being absolutely strict about only going up is too strict. Maybe if we were willing to go down a little bit, we could then go up much further.
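The local-versus-global distinction is easy to state in code. Here's a sketch, with a made-up toothy landscape standing in for the one in the demo:

```python
def is_local_maximum(f, x, lo=0, hi=99):
    """True if no one-step neighbor of x is better. Note that the
    global maximum passes this test too -- that's the quiz trap."""
    return all(f(n) <= f(x) for n in (x - 1, x + 1) if lo <= n <= hi)

# A landscape with little teeth every 5 steps; the best tooth is at 50.
rugged = lambda x: -abs(x - 50) + (3 if x % 5 == 0 else 0)

print(is_local_maximum(rugged, 30))   # True: a little teeny peak
print(is_local_maximum(rugged, 50))   # True: the global maximum is local too
print(is_local_maximum(rugged, 31))   # False: there's an uphill neighbor
```

A strict uphill-only climber started near 30 halts there even though 50 is better; getting off that tooth would require taking a downhill step first.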
And that's what stochastic hill climbing does. Let's start them both again. All right. So stochastic hill climbing looks kind of stupid, because it's going down, it's going up. But the virtue of it is that it blows right through those little barbs in the landscape that catch the steepest ascent hill climber. It's totally willing to go down. And then, whoa, look at this. There's another place that it can go up. The flip side of that is that this thing doesn't ever seem to stop. It doesn't. It doesn't know that that spot in the middle is the global maximum. As far as it's concerned, there could be something else further away that's even better. There's a sort of combination of both of these guys called simulated annealing. Without going into the details, it starts out being very random, willing to go up and down and up and down. But it's got this parameter called T, which stands for temperature. And the temperature governs how willing it is to be crazy and random. When the temperature is high, it's very random. As the temperature gets lower, it gets more and more focused on improvements. So here, the temperature is down to 16, 15, 14. It's still pretty random at the moment, but it's starting to spend more and more time heading up. And the idea is, if we're lucky, it'll end up getting to the global maximum before the temperature gets so low that, essentially, it turns into the steepest ascent hill climber and will never go down again. So we're doing pretty well here. Yeah, it looks like annealing is going to kind of lock in on the maximum in this case. Now, this is only one kind of landscape. There are many more sorts of rugged landscapes, like this. That's a sort of weird one. I don't really know what that means. So steepest ascent did well there. Well, that's because we didn't have enough ruggedness. Who knows. Now we've got big broad areas where you'd have to go downhill a whole bunch of times to get anywhere you want. But this is just an example.
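One common way to write down the temperature idea is the standard Metropolis acceptance rule; this is a generic sketch of that rule, not necessarily the exact schedule the demo uses, and the starting temperature and step count are assumed parameters:

```python
import math
import random

def anneal(f, x, steps=5000, t0=20.0, lo=0, hi=99):
    """Always accept uphill moves; accept a downhill move with
    probability exp(delta / T). T starts high (nearly a random walk)
    and cools toward zero (nearly steepest ascent)."""
    best = x
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9            # simple linear cooling
        n = min(hi, max(lo, x + random.choice((-1, 1))))
        delta = f(n) - f(x)
        if delta >= 0 or random.random() < math.exp(delta / t):
            x = n
        if f(x) > f(best):                            # remember the best spot seen
            best = x
    return best

# The same toothy landscape: teeth every 5 steps, best tooth at 50.
rugged = lambda x: -abs(x - 50) + (3 if x % 5 == 0 else 0)
print(anneal(rugged, 5))   # often 50: it escapes the teeth while T is high
```

Because the run is random, the result isn't guaranteed; tracking `best` at least ensures we never report anything worse than where we started.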
It's a one-dimensional function. And if you're thinking these guys all look pretty stupid, well, you're right. I mean, in this particular case, there are actually only 100 different possible inputs to the function, because I divided it up that way: one to the left, one to the right, from 0 to 99. It's really easy just to try them all, and then you'll know the maximum for sure. But that's because we've just picked a simple example. Let's look at another one. Here's a two-dimensional example, where now we have an x and a y, and we get back a grayscale value, where white is good and black is bad in this particular case. We'll switch that later. And so here's the other thing. Now steepest ascent, 1, 2, 3, 4, 1, 2, 3, 4. Now steepest ascent has to do four evaluations before deciding where to go, because it goes up and down on x, up and down on y, before finding the best one. But once again, it finds the maximum in this unimodal function, a function that has only one peak, without a problem. In the same way, if we sort of carve out a moat around this thing, now this is no longer a, let's close it off there. Now it's no longer a unimodal function. It's a multimodal function, and in fact it has many local maxima. And usually steepest ascent is going to get screwed and hung up along the edge of the moat. Oh, that time it managed to get in. And in the same way, stochastic hill climbing or annealing will avoid that problem. These small problems, these small little dips, they'll just sort of blow through, and they'll very likely get to the maximum in the end, even though there are some traps along the way. But still, this is now something like, this might be 64 by 64. That's still small enough that we could really just try every single spot. And at this point, we really might want to, because since we're using a weak method, who knows: we could have a little secret crystal city off in the corner here that's super great.
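The two-dimensional version of the climber is the same loop with four neighbors instead of two. Again just a sketch, with a made-up unimodal "white blob" standing in for the demo's grayscale image:

```python
def steepest_ascent_2d(f, x, y, size=64):
    """Four evaluations per move: step +/-1 on x and on y,
    then take the best neighbor if it beats the current spot."""
    while True:
        candidates = [(nx, ny) for nx, ny in
                      ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                      if 0 <= nx < size and 0 <= ny < size]
        bx, by = max(candidates, key=lambda p: f(*p))
        if f(bx, by) <= f(x, y):          # every neighbor worse or no better
            return x, y
        x, y = bx, by

# One peak at (40, 20): unimodal, so the climber can't get hung up.
blob = lambda x, y: -((x - 40) ** 2 + (y - 20) ** 2)
print(steepest_ascent_2d(blob, 0, 0))     # (40, 20)
```

Carving a moat around the peak would make `blob` multimodal, and this same loop would park on the moat's edge. Note too that the climber only ever evaluates points along its path.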
And if a searcher never ever got there, never even evaluated anything in that corner, you would have no way of knowing that. But with only two dimensions, there's a lot that we don't see as far as what really makes search challenging. So for a final example, let's look at this. What the heck is this? This is a picture, but it's been hidden. There's a secret picture in there, and then there's a bunch of masks that have been laid on top of it. Each mask flips a bunch of bits in the picture. And there are 64 masks, and some subset of those have to be turned on to reveal the hidden picture. And what we have down here at the bottom, this white bar, is a bit vector, 64 bits in a row, where each one controls one of those masks. So if I turn this guy on, well, in this case, if I turn that guy on, that makes the score worse. That guy next to him, if I turn him on, it makes the score better. It goes from 274 to 277, and so on. So I can flip all of these bits one at a time, or whatever I want to do, to try to find a better score. And let's let the steepest ascent guy go now. In this case, the steepest ascent guy has to do 64 evaluations just to move one step. Let's speed it up. One step, one step. So, well, all right, but the score is going up, 344, 357, and so on. It's making progress. Unfortunately, 373, what's going to happen here? Still finding some improvement. We're doing good. Usually, the way I've designed this thing, steepest ascent eventually gets screwed. Eventually it ends up getting to some local peak, where turning off one mask makes this area better but makes some other area worse. So our evaluation function in this case is just checking, oh, there, now we're stuck. We got a score of 407, and that was the best we could do. Now all of the alternatives are red. They're all negative. All right. So let's throw the stochastic hill climber at it instead. All right. So we're starting at some random point now.
And the stochastic hill climber makes a lot more moves, because it doesn't need to check all 64 dimensions before deciding what to do. It tries flipping a single dimension, and that's either going to make the score better or make the score worse. And either way, it considers that. Even if it makes it worse, it might pick it, because that's what stochastic hill climbing does. It's biased towards picking the uphill moves, but it accepts downhill moves as well. And this thing may take a while. It might not even succeed in any time that we're willing to wait. We'll see. But this sort of thing, a high-dimensional problem with relatively coarse dimensions, uh-oh, stochastic hill climbing, looks like there's just one more mask it needs to flip, there it is. It solved the problem in this case. And yes, it's cats. We're still on the internet. All right, that's it.
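To recap that last demo in code: a stochastic hill climber on a 64-bit vector flips one random bit per step, one evaluation per move instead of steepest ascent's 64. The scoring function below is a stand-in (count of bits matching a hidden pattern), not the demo's actual mask arithmetic, and the 10% downhill-acceptance rate is an assumed parameter.

```python
import random

def stochastic_climb_bits(score, n_bits=64, steps=20000, p_down=0.1):
    """Flip one random bit; keep it if the score doesn't drop,
    and keep a harmful flip with probability p_down -- that's
    what lets the climber blow through local peaks."""
    bits = [random.randint(0, 1) for _ in range(n_bits)]
    current = best = score(bits)
    for _ in range(steps):
        i = random.randrange(n_bits)
        bits[i] ^= 1                      # try toggling one mask
        new = score(bits)
        if new >= current or random.random() < p_down:
            current = new                 # accept, even if downhill
            best = max(best, current)
        else:
            bits[i] ^= 1                  # reject: undo the flip
    return best

# Stand-in evaluation: how many bits agree with a hidden 64-bit pattern?
hidden = [random.randint(0, 1) for _ in range(64)]
score = lambda b: sum(x == h for x, h in zip(b, hidden))
print(stochastic_climb_bits(score))       # best score found, out of 64
```

Because matching the hidden pattern is a smooth landscape in Hamming distance, this climber very reliably gets close to the maximum; the occasional downhill acceptance is what would save it on a rugged one.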