Hello everyone, this is Alice Gao. In this video, I'm going to talk about simulated annealing. Previously, I talked about how adding random restarts can improve the properties of greedy descent. With random restarts, greedy descent will find the global optimum with a probability approaching one. Why is it a good idea to combine greedy descent with random restarts? Well, greedy descent focuses on optimization: it tries to improve the state as much as possible and as quickly as possible. Another name for this is exploitation. If we are already good at doing something, we should keep doing it. On the other hand, random restarts allow us to explore the search space. We can jump to a completely different region of the search space and hope to find a state with a lower cost. So, greedy descent with random restarts has good properties because it combines the ideas of exploration and exploitation.

Simulated annealing combines exploration and exploitation in another way. Annealing is a term from physics. You may have read about or seen the process of forging a sword or blowing glass: we start by heating the material up, then cool it down slowly to put it into a better shape and to make it strong. In the simulated annealing algorithm, we keep track of a temperature, initialize it to a large value, and decrease it slowly. At each step, we choose a random neighbor of the current state. Note that this is a random neighbor, not the best neighbor. There are now two cases to consider. Case number one: the neighbor is an improvement, that is, the neighbor has a lower cost than the current state. We will move to this neighbor, since we want to minimize the cost of the current state. Case number two: the neighbor is not an improvement. What happens in this case? Well, it is not a bad idea to move to such a neighbor some of the time. Remember that greedy descent has a bad property: it can get stuck at a local minimum.
If we're willing to move to a worse neighbor sometimes, this is one way we can get out of a local minimum. So, if the neighbor is not an improvement, we will move to it with some probability. This probability depends on two things: the current temperature and how much worse the neighbor is compared to the current state. If the random draw based on the probability decides that we should not move, then we'll stay put and choose another random neighbor.

How can we determine the probability of moving to a worse neighbor? Let A be the current state and let A' be a neighbor that's worse than the current state. Let ΔC represent the cost difference between A' and A. ΔC must be positive, since A' has a higher cost than A. T is the current temperature. We will move to A' with probability e^(-ΔC / T). This function is called the Gibbs distribution or the Boltzmann distribution, and it appears in several other fields, such as physics. It is also related to the sigmoid or logistic function, which you will encounter when learning about neural networks in machine learning.

Let's work on two questions, which will help us understand how the probability depends on the current temperature and the cost difference. Question number one: if the temperature decreases, how does the probability change? Question number two: if ΔC increases, in other words, the neighbor becomes worse, how does the probability change? Take some time to solve these questions yourself, then keep watching for the answers. The correct answer to question one is B: as the temperature decreases, the probability decreases. The correct answer to question number two is B as well: as ΔC increases, the probability decreases. Please watch a separate video for detailed explanations.

Here's the pseudocode for simulated annealing. Let's start by choosing an initial state randomly. We'll initialize the temperature to a large positive value.
Next, we'll get inside a loop. In the loop, we will decrease the temperature slowly. As long as the temperature is above some minimum value, say zero, we'll do the following. Choose a random neighbor. Calculate the cost difference between the neighbor and the current state. Decide whether we want to move to the neighbor or not: if the neighbor is an improvement, we will move to it for sure; otherwise, if the neighbor is worse than the current state, we'll move to it with some probability. When the temperature reaches the minimum value, we will stop and return the current state.

One thing you might be wondering is: how can I implement this probability? Here's a common trick. Draw a random number between zero and one. If this number is smaller than the probability p, it is a yes and we will move to the neighbor. Otherwise, it is a no and we'll stay put and choose another random neighbor.

The algorithm decreases the temperature at each step. How exactly do we do this? The way we decrease the temperature is called the annealing schedule. In practice, we want to decrease the temperature very, very slowly. If the temperature decreases slowly enough, then simulated annealing is guaranteed to find the global optimum with a probability approaching one. This is a great property, similar to what we have for greedy descent with random restarts. Finding a good annealing schedule is an art rather than a science, and it highly depends on the exact search problem that we're trying to solve. One of the most widely used annealing schedules is geometric cooling. Geometric cooling works by multiplying the temperature by a value that's less than one but very close to one. For example, we can multiply the temperature by 0.99 at each step. You can also use many other smooth functions as the annealing schedule, such as linear, logarithmic, or exponential functions.

On this last slide, I want to give you some intuition behind simulated annealing.
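Putting the pieces together, here is what the pseudocode above might look like in Python. This is a minimal sketch, not a definitive implementation: the helpers `random_state`, `neighbors`, and `cost` are hypothetical and problem-specific, and the default temperature values are arbitrary choices for illustration.

```python
import math
import random

def acceptance_probability(delta_c, t):
    """Probability of accepting a neighbor that is worse by delta_c > 0,
    using the Gibbs/Boltzmann form e^(-delta_c / t). It shrinks as the
    temperature t decreases (question 1) and as delta_c grows (question 2)."""
    return math.exp(-delta_c / t)

def simulated_annealing(random_state, neighbors, cost,
                        t_init=100.0, t_min=1e-3, cooling=0.99):
    state = random_state()     # choose an initial state randomly
    t = t_init                 # initialize the temperature to a large value
    while t > t_min:
        # Choose a random neighbor, not the best neighbor.
        neighbor = random.choice(neighbors(state))
        delta_c = cost(neighbor) - cost(state)
        if delta_c <= 0:
            state = neighbor   # an improvement: move for sure
        elif random.random() < acceptance_probability(delta_c, t):
            state = neighbor   # worse: move with probability e^(-delta_c / t)
        t *= cooling           # geometric cooling: multiply by a value close to 1
    return state
```

The comparison `random.random() < p` is exactly the trick described above: a uniform draw between zero and one comes out below p with probability p. As a sanity check on the two questions, `acceptance_probability(1.0, 10.0)` is about 0.90, while `acceptance_probability(1.0, 1.0)` drops to about 0.37, and `acceptance_probability(2.0, 1.0)` is smaller still.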
First, let me share an analogy which should help you understand simulated annealing. Imagine a big surface that we can hold and control. The surface has mountains and valleys. We'll drop a tennis ball onto the surface. Our goal is to get this tennis ball into the deepest valley on the surface. Unfortunately, we cannot touch the ball, but we can shake the surface however we like. So how do we do this? Well, gravity is our friend. Gravity pulls the ball downward at all times; this represents the force that's optimizing for us. However, the ball might land in a shallow valley at first and get stuck there. That's a local minimum. To avoid this, we'll start by shaking the surface really intensely. This resembles the case when we have a high temperature. The ball will bounce around a lot on the surface, and during this exploration phase, the ball may visit many different parts of the big surface. As time goes on, we'll slowly decrease the intensity at which we're shaking the surface. Hopefully, when the shaking stops, the ball will rest in one of the relatively deep valleys on the surface.

Now, let me get a bit philosophical. Simulated annealing is like life. It's a perfect example of the exploration versus exploitation trade-off. When the temperature is high, what does that represent? Well, you're young and energetic, and life is full of possibilities. In this phase, you're trying a lot of different things. What does trying a lot of different things mean? It means making suboptimal moves. At this stage, you're not optimizing that much. Instead, you're trying out as many things as you can. As you get older, you gradually start to optimize more for your goals and your dreams. Hopefully, you find something you're good at, something you enjoy and are passionate about. You gradually get into the exploitation phase, where you are trying to improve and optimize the current state of your life.
Simulated annealing is very much like life in the sense that in the earlier phase, we tend to explore more, whereas in the later phase, we tend to optimize more. This is the trade-off between exploration and exploitation. Here's something for you to think about: which phase of life are you in right now? Are you still exploring a lot, or are you mostly just optimizing? Here are my two cents. I believe that we should always keep a little bit of adventure in our hearts, because life is full of possibilities and has so many things to offer. I hope that no matter how old you are or which stage of life you're in, you always have the courage to explore and to try and learn new things.

That's everything on simulated annealing. Let me summarize. After watching this video, you should be able to do the following. Explain the procedure of the simulated annealing algorithm, in particular, how we decide whether to move to a random neighbor. Explain the properties of simulated annealing; for example, is it guaranteed to find a globally optimal solution? Explain at a high level how simulated annealing balances exploration and exploitation. That's everything for this video. Thank you very much for watching. I will see you in the next video. Bye for now.