Hello, everyone. This is Alice Gao. In this video, I'm going to explain some intuition behind gradient descent. When I first learned gradient descent, I found the formula to be quite unintuitive. Why do we have to calculate the partial derivative, and why do we have to decrease the variable by an amount proportional to the partial derivative? So I thought I should try to figure out an intuitive way to rederive these ideas so that I don't have to memorize them.

The way I figured out is that we can look at a very simple function, such as this quadratic function, and use it to get some intuition about why we want to change the input value in the negative direction of the partial derivative. So in the example here, I'm using the function y is equal to x squared. Let me plot the function and explain the intuition that I came up with.

All right, here's my function y equals x squared. It's not quite symmetric, but I can live with it. So we want to find the minimum of this function. We're going to start with some arbitrary value for x, call it x0, and then ask: to minimize this function, in what direction should we change the value of x, and by what amount?

First of all, let's look at direction. Well, the direction is going to depend on whether we start from the left-hand side of the minimum or the right-hand side of the minimum. Suppose we start from the right-hand side of the minimum, with this x0. At this point, the derivative is positive because the curve is increasing there. Now at that point, in order to minimize the function, we want to decrease the value of x. Right? So this means when the derivative is positive, we want to change x in the direction of the negative of the derivative: decrease it rather than increase it.

Let's try the other side. Suppose we start with an x0 value on the left-hand side of the minimum. At this point, the derivative is negative, the curve is decreasing, and how do we want to change x?
Well, we want to move x to the right. So we want to increase the value of x. So similarly, if the derivative is negative, we want to increase the value. Again, we want to change it in the direction of the negative of the derivative. Right? So I don't have a theoretical proof, but we can generalize both of these cases to a pattern. The pattern is: when we want to minimize the function, we want to change the input values in the direction of the negative of the partial derivative. So this is the intuition behind the direction.

Now, what about the amount? Let me discuss two cases again. Think about a place on the curve where the gradient is large, where the partial derivative is large. In that case, it means the curve is very steep there. Right? And because our function is continuous and smooth, if we're in a region where the curve is steep, we're likely far from the minimum. The function cannot suddenly change from a really steep curve to something that's flat, right? And when we're at a minimum, the function should be flat. So at such a place, we can afford to take a larger step because we're far away from the minimum. We are not afraid of missing it by taking a large step.

On the other hand, for the second case, suppose the gradient is small, or the partial derivative is small. What does that mean? That means that region of the curve is flat, so we're likely to be quite close to the minimum. In that case, we have to be careful. Right? We want to take smaller steps so that we don't overshoot and miss the minimum.

So this discussion hopefully gives you the idea that when the partial derivative is large, we want to take a larger step, and when the partial derivative is small, we want to take a smaller step. Therefore, it makes sense to take a step in proportion to the magnitude of the gradient or the partial derivative.
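To make the two ideas concrete, here is a minimal Python sketch, assuming the same function y equals x squared. The names `gradient_descent`, `learning_rate`, and `steps` are illustrative choices, and the proportionality constant 0.1 is an assumed value; the argument above only says the step should be proportional to the derivative, not what the constant should be.

```python
def f(x):
    """The example function, y = x squared."""
    return x ** 2


def df(x):
    """Derivative of f: positive to the right of the minimum, negative to the left."""
    return 2 * x


def gradient_descent(x0, learning_rate=0.1, steps=50):
    """Repeatedly move x against the sign of the derivative.

    learning_rate is the (assumed) proportionality constant: each step has
    size learning_rate * |df(x)|, so steep regions give large steps and
    flat regions give small ones.
    """
    x = x0
    history = [x]
    for _ in range(steps):
        x = x - learning_rate * df(x)  # negative direction of the derivative
        history.append(x)
    return history


# Start once on each side of the minimum at x = 0.
from_right = gradient_descent(3.0)   # derivative positive, so x decreases
from_left = gradient_descent(-3.0)   # derivative negative, so x increases

print("first steps from the right:", from_right[:4])
print("first steps from the left: ", from_left[:4])
print("final values:", from_right[-1], from_left[-1])
```

Printing the first few values shows both behaviors: x moves toward zero from either side, and each step is smaller than the one before because the derivative shrinks as the curve flattens. With a much larger constant the steps could overshoot the minimum, which is why the constant is kept small in this sketch.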
All right, I hope this intuition can help you remember how to derive the ideas of gradient descent if you have trouble memorizing them. I mean, memorizing something is never a good idea. It's better to understand it so you can always re-derive it. So next time you're stranded on a deserted island, you'll know how to derive gradient descent if you really need it urgently. Thank you for watching. I will see you in the next video. Bye for now.