 All right, we're back. How are you guys doing? Okay? All right. So actually we have some interesting stuff coming up this week. We'll talk about today, we'll talk about directional derivatives and then on Thursday we'll talk about maximum and minimum functions. So this, you know, we're getting to some more interesting parts of this course. So the first topic today is directional derivatives. Very lively today. That's good. So, but let's focus, okay? Directional derivatives. We have talked about partial derivatives, okay? If you have a function in two variables, it's natural to differentiate with respect to each of the two variables, X or Y, right? And we've done that and that's what we've called partial derivatives. We call them f sub X or also df dx and also f sub Y df dy. And now the question is, is there anything else that we can say about derivatives for functions in more than one variable like function in two variables? So to answer that, we have to remember what is the derivative? What does it represent? Okay? Derivative represents the rate of change. We can believe it, right? People seem to have forgotten that one, you know? But a year ago, it was all the rage. So derivative in general is the rate of change. What do I mean by this? Let's say you're crossing a street, okay? And you see that there's a car. You don't wanna be hit by a car. So you try to estimate how much time you have to cross the street, right? So what do you estimate? You don't estimate the distance, right? You estimate the distance divided by time. You estimate how quickly it's coming, right? Because so your, our brain is wired so that even without thinking about it, we are watching, you know, how it moves and what is the distance that it covers in a given, you know, short period of time. And that gives us an idea how fast it will be here. And we wanna make sure that it won't be here while we are crossing, right? So if it's going too fast, we'll wait. 
And some people will go anyway, but you shouldn't. So you wait, and if it's going slowly, then you cross, right? So what you're interested in is the rate of change. The rate of change of what? Of the distance. How quickly is it approaching you, the car in this particular case, right? And that's what we call the velocity, right? So the velocity is really the ratio between the distance traveled and the time that was required to travel that distance. You can measure it over one second or 10 seconds and so on. But if you want to estimate the instantaneous velocity, you wanna measure it over a very short period of time, right? One tenth of a second, one hundredth of a second. So what you really wanna do is take the limit, as the time interval goes to zero, of the ratio of the distance traveled to the time. And that's the derivative, right? So the derivative is the instantaneous rate of change at that very moment. That's why we studied derivatives in the first place, because it gives us an idea of the rate of change at a given moment. And also, as we've learned, this is the first step in doing linear approximation, when we have a complicated function and we try to approximate it by a linear function. When we look at that car, we are approximating its movement as a movement at constant speed, even though we realize that the guy actually can go faster and slower. That's what we call the first approximation. We look at the instantaneous speed of that car. In other words, we are approximating its movement by straight-line movement with constant velocity, okay? So that's why the first derivative is important. It gives us information about this first approximation. But the bottom line is that the derivative is the rate of change. So when we have a function in one variable, it's very clear what the derivative should be. 
There's only one choice. If you have a function in one variable, you can just take df dx, right? So that's the limit of the ratio of the increment of the function over the increment of the argument when the increment of the argument goes to zero. Questions? But if you have a function in two variables, there is no obvious choice. You can take the limit of the ratio with delta x or with delta y in the denominator, for example. And that's how we get the partial derivatives. In other words, in order to define a derivative, we have to say rate of change with respect to what? Right? There is a rate of change with respect to x. There is a rate of change with respect to y. And these are a priori two different rates of change. That's how we get these two different derivatives. But now let us think about it geometrically. So here's the xy plane. And here's our point. And we have a function in two variables which assigns to each point on this plane a certain number. We have a point x, y. This function assigns to it a value, which we call f of x, y. And so now we are trying to see how this function changes with respect to one of the two variables. And the way you can think about it geometrically, let me draw it like this. You can try to move away from this point along the x-axis, in the direction of x, right? And you can see how the function changes when you do that. So let's say this is x and this is x plus delta x. So then you would have f of x plus delta x, y minus f of x, y, divided by delta x. Compare this with the one-variable case. There I called the numerator delta f, because if you have a function in one variable there is no ambiguity. But now let's rewrite it in a more similar form: f of x plus delta x minus f of x, divided by delta x. So you see this ratio is very similar to this ratio. In the one-variable case you have a function which you evaluate at x plus delta x. 
And also evaluate at x, you take the difference divided by delta x. Here you only increase one of the variables, x by delta x. But you keep y fixed, which was of course the rule we followed when we calculate partial derivative, we said we fixed the y variable, right? We freeze the y variable and then we just change the x variable. So this is very similar, but we've made a choice. Geometrically the choice is moving along the x-axis. And then if we take the limit of this when delta x goes to zero, that's the first partial derivative, right? This is f sub x, df dx at this point, right? To get the second partial derivative we would go in this direction. So then this would be y and this would be y plus delta y. So we would write df dy as the limit when delta y goes to zero of f of x y plus delta y minus f of x y divided by delta y. So here we freeze the first variable, but we are changing the second variable. And we look at the rate of change as we do that. And algebraically it looks like that's all we can do because there are only two variables. So we can calculate the rate of change with respect to the first variable or the second. But if you look at it geometrically, which is why I drew this picture, you see that these two vectors are just two possible directions in which you can move away from this point, right? How about moving in this direction or in this direction or in this direction? Geometrically it seems clear that all of them, you know, are meaningful, have the right to exist and we should understand what happens in each of these directions. In other words, it is only an illusion that there are these two preferred directions, this one and this one. In fact, there are many more, okay? And just like we could calculate the rate of change in the direction of x or in the direction of y, we could and should also be able to calculate the rate of change in any direction. So of course, then you can ask why should we do that? Why is that interesting? 
That's the next question. In other words, what are the applications of that? We'll talk about this. But for now I just wanna set up the procedure and introduce this concept, and this is the concept of directional derivative. It's a rate of change in a more general direction than x and y. That's what it is. So how do we go about calculating this and what is the result? First of all, we have to choose a direction vector: rate of change with respect to what direction? And that's a vector on the plane, right? So that's going to be a vector of the form a, b, or if you want, a i plus b j. So this vector has two components along x and y: this would be a and this would be b. Now, if I take a proportional vector, a vector in the same direction but multiplied by a certain positive number, I would get the same direction, right? So in a way, we don't wanna consider all possible vectors. We really care about the direction rather than the vector itself. So what we should do is, amongst all vectors pointing in this direction, choose one, okay? And of course, there is a natural way to do that. We'll just choose a vector which has length one, or norm one. So let's assume that the norm, or length, whatever you like to call it, is one. What does it mean? It means that the square root of a squared plus b squared is one. For example, we can take the vector one, zero, which is i. That's this yellow vector which I drew at the beginning, which goes along the x-axis and corresponds to the direction along the x-axis, right? Or we can take zero, one, which is the vector j. Both of them are unit vectors, certainly. But these are not all possible direction vectors. We could take, for example, one over square root of two, one over square root of two. That would also have length one, and it will be a vector which makes an angle of 45 degrees with the x-axis. So it's a kind of bisector vector. 
It goes sort of in the middle between x and y. Or you can take square root of three over two, one-half. So that's a vector which has angle 30 degrees to the x-axis, right? And so on. There are many more. So let's suppose we've chosen such a vector. Let's call it u. Let's call this u, u for unit, unit vector. Unit meaning that it has length one or norm one. So now we would like to define, introduce a notion of rate of change, rate of change with respect to this vector. So the direction, and this is what we'll call directional derivative. Directional derivative. So maybe at this point, actually, I would like to adjust my notation to emphasize that I would like to choose a particular point, x zero, y zero. I haven't done this, I didn't do it earlier. But I think that it's better to do it because otherwise we'll be confused later on as to what stands for a variable and what stands for a particular value, a particular coordinate of a point. So likewise here I would have a y zero. And here I made a mistake, y zero. Okay, all right? So let's just do that to be sort of on a safe side. So directional derivative at the point x zero, y zero. At the point x zero, y zero. In the direction u, direction u is the rate of change, a directional derivative of f, of a function f. Is the rate of change, is the rate of change of f in the direction u. And it is defined in the same way as before. What I'm going to do now, I'm going to let my point move in the direction u, which means that I would say that x zero, so the vector x, let me draw it again so that this will be vector x, x zero, y zero. And this will be the direction vector AB, okay? So I would like to move in this direction, which means that I take the vector x zero, y zero, the position vector of my point and I add this unit vector with some multiple h. H will be a small number because I don't want to use epsilon, right? 
We use h, but you can use any letter you want. This h will play the role of delta x or delta y in the earlier formulas. But I don't want to use delta x or delta y because we are now not differentiating with respect to x or y. We are doing kind of a mixed derivative. That's why we are actually changing both x and y. So this is going to be x zero plus ha and y zero plus hb. You see, this unit vector has two components, a and b. So changing the original point in this direction by h means that we change x zero, the first coordinate, by ha, and the second coordinate, y zero, by hb. So these are going to be the arguments which I will substitute in f. So that's going to be f of x zero plus ha, y zero plus hb. That's a new point and that's the new value. Minus the old value at x zero, y zero. And then I will divide by h. So you see, if I chose a, b to be one, zero, the vector which goes in the x direction only, and I denoted h by delta x, then this formula would become this formula, because x zero would change by just delta x, which now I call h. Y zero will not change at all because the second component would be zero. So I would get exactly this formula, and we get df dx. If on the other hand I chose this to be zero, one and h to be delta y, then we would get df dy, right? For the same reason: in that case, if my direction vector is zero, one, x is not changing but y is changing by h, which we can then call delta y, and then we'll exactly recover this formula. So what I've written is a generalization of both formulas, where I'm changing both x and y at the same time, but in a controlled way, along a given vector. So the increment is not random; it depends on one parameter h and it looks like this: ha, hb. Geometrically that exactly means that we are moving away from this point in this direction. Okay, is that clear? Any questions? 
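To make the quotient just described concrete, here is a minimal numerical sketch in Python. The function f(x, y) = x²y, the point (1, 2), and the helper name `difference_quotient` are assumptions chosen purely for illustration; none of them comes from the lecture. The sketch evaluates the ratio (f(x0 + ha, y0 + hb) − f(x0, y0)) / h for a few shrinking values of h along the 45-degree unit vector.

```python
import math

def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 * y

def difference_quotient(func, x0, y0, a, b, h):
    """Ratio (func(x0 + h*a, y0 + h*b) - func(x0, y0)) / h.

    (a, b) is assumed to be a unit vector, and h a small nonzero number.
    """
    return (func(x0 + h * a, y0 + h * b) - func(x0, y0)) / h

x0, y0 = 1.0, 2.0
a, b = 1 / math.sqrt(2), 1 / math.sqrt(2)  # unit vector at 45 degrees

# The printed ratios should settle down as h shrinks.
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, difference_quotient(f, x0, y0, a, b, h))
```

Choosing (a, b) = (1, 0) or (0, 1) in the same helper reproduces the two partial-derivative quotients, which is the sense in which the new ratio generalizes both earlier formulas.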
Okay, so clearly then what we are calculating is the increment of the function in this direction and when we divide by the increment of along this path or along this line, h, we get the rate of change and for a given finite h, it's going to be an approximation to the actual rate of change but if we take the limit, provided that the limit exists of course, which we are assuming here, if we take the limit h goes to zero, we end up with the instantaneous rate of change at that very moment or more precisely at that very point, x zero, y zero and that's the directional derivative and we call it d sub u of f at x zero, y zero. That's a notation, by definition, this is the definition. So once again, examples are d for the vector i, for the vector i is just f sub x, x zero, y zero and for the vector j, it is the second partial derivative but this is a kind of a mixture. So now, what is it equal to? Can we calculate it by using rules similar to the rules we used for partial derivatives? For partial derivatives, we could calculate them rather easily, that's right. Okay, so here is a question, the question is, here's a question, what do this, what do the points of this form represent on the plane? Or this, more precisely, these points on the plane because I'm evaluating now my function f at points of this form, x zero plus h a, y zero plus h b. That's what I'm doing here and I'm subtracting. So it's natural to ask these points, where do they lie, these points? Well, I rewrite this as a sum of two vectors. One is the position vector of the initial point, x zero, y zero and the second is h times u, right? Let's separate two questions. First question is about what those points represent, right? So first of all, the set of points of this form where h is a real number is the line which passes through this point in the direction of this vector u. 
So this is something which I didn't even mention because I assume that this is somehow obvious, because this is exactly how we got parametric equations for lines a few weeks ago. There we also took the position vector of the initial point plus h times what we called v. So this u now plays the role of v. And instead of h we usually write t, but it doesn't matter, right? So this is precisely the form of the line which passes through this point in the direction of this vector u. Of course, we did it mostly in R3, in space, and now I'm doing it on the plane, but the result is similar. There are only two coordinates instead of three, but otherwise it's the same, okay? So it should be clear that all of these points lie on this line, the line obtained by moving away from this point in the direction of this vector, or the opposite direction for that matter, okay? So that's the first question. The second question is why do we take the limit when h goes to zero? For the same reason we took the limit when delta x goes to zero or delta y goes to zero. If we don't take the limit, we get an approximation to the rate of change. It's like your car is moving and you take the distance it has traveled over the period of one second and you divide by one, okay? What do you get? Well, even during this one second, it goes faster and slower. So it is an approximation to its velocity during that time. But if you do it over one hundredth of a second, it's going to be an even closer approximation, and so on. So if you're really interested in the instantaneous velocity of the car at that very moment, you have to take the limit. That's right, that's right. But I'm dividing, you see, I'm dividing distance by time. Think of h as time. The car is moving in this direction. 
The analogy with the car is much better for the function with one variable. But never mind, if you don't divide by h, you'll just get zero when h goes to zero, right? So this, the numerator becomes very small and denominator becomes very small. But you take the ratio and you capture the rate of change, right? The rate of change, it's not change. It's the rate of change that we should believe in right now, right? Because the rate of change means that you take the increment, the difference between the values of the function and you divide by the difference in the arguments. The point is that now we have two arguments. So we cannot just take this point minus this point, right? We have to take, we have to choose a particular direction and h is the, h is the, would then control the increment in that direction. Okay, so I hope it's clear now. Now, so this is a set up. This is a set up. This is the definition of the directional derivative, which is kind of, it's very geometric. But now I wanna get an algebraic answer for it because I don't wanna each time take the limit and try to figure out what this limit is equal to. I would like to have an efficient computational tool for this. When I get this problem on homework, you wanna, you're asked to calculate the directional derivative, you wanna get the answer very quickly, right? And to get the answer, we have to, to get the answer, we have to use a tool which is actually very useful on its own, on its own right, which is called chain rule. So to calculate it, we use a chain rule. Compute it, use chain rule. So what is a chain rule? So chain rule is something you've learned already, you've learned already in the case of a function one variable, which is that suppose you have a function one variable of f of x, but x is also a function of some variable t, right? And suppose you wanna calculate the derivative of, of f with respect to t. And so the result is that it's going to be df dx times dx dt. 
And now you can appreciate how easy it is to prove this formula, if you really understand what the differential is, which is what I explained a couple of weeks ago, right? So if the function is differentiable, this df makes perfect sense as a certain linear function. And you can apply to these expressions, like df, dx and dt, the same rules that we apply to everything. And so in particular, we can actually cancel out dx in the numerator and denominator. And that's what this formula really is. This formula is very elementary. It's just a statement that you can cancel out the dx's in these fractions. You see? So there's no mystery. I mean, the way it was presented before, maybe it was kind of mysterious. But in fact, there's no mystery. If you make sense out of the differential, if you explain what it is, and the function is differentiable, in other words, df actually can be defined, then you can actually apply this rule. And you can just cancel out this numerator and denominator under suitable conditions, right? So that's the chain rule for one variable. So now let's suppose that you have a function in two variables, f of x, y, okay? But x and y are in turn functions of another variable which we'll call h. So now I would like to calculate what is df dh. So let me draw it by the following diagram. So this is f; f is a function of x and y. And x is a function of h and y is a function of h. So then f becomes a function of h by composition, right? Because for each value of h, you will have some value of x and some value of y. And then you plug them into f and you get some value. So effectively, you get a function of h, right? So it's like a chain of two functions, or rather three functions. There's f, there is this function g1 and there is this function g2. And by substituting these functions g1 and g2 into f, you get a new function of h. 
And once you get a new function of h, you can ask what is the derivative of this function with respect to h? And of course, in all of this discussion, I'm assuming that this derivative exists, which is of course not guaranteed. Certain conditions have to be satisfied for the functions f, g1 and g2. But in all of our examples, those conditions will be satisfied. So I'm not focusing on those conditions too much. I'm just trying to explain the formula which we'll get once these conditions are satisfied. And so what is this formula? We have to compute df dh, right? How can we do this? Well, we have to remember the formula for df which we had two weeks ago. Remember that we had two possible notations for partial derivatives, df dx or f sub x. So I'm just using df dx here, but it's the same as f sub x. The formula was df equals df dx times dx plus df dy times dy. So that's df. Let's substitute it in this formula. We just need to divide this expression by dh. So we'll just divide it here and here, right? So what we'll get is df dx times dx dh, plus df dy times dy dh. See, so we've got a very simple formula for the derivative of f with respect to h. And it's very similar to the old formula. In the old formula, you differentiate f by x and then x by t. But now there are sort of two channels through which f could depend on h. The first channel is through x and the second channel is through y. So through x, it depends in this way: you have to take the partial derivative of f with respect to x and then the derivative of x with respect to h. And through the second channel, you get the second term. But it's very easy to remember if you draw a diagram like this. Note that in this formula, I cannot cancel these guys out, because while dx actually makes perfect sense as a differential, this curly df or curly dx doesn't make sense by itself, right? 
I think I explained this when we talked about partial derivatives. The notation is just for the ratio, and it is a partial derivative, but this and this by itself doesn't make sense. So I can't prove this formula by cancelling out things in numerators and denominators. Instead, I'm proving this formula by using the formula for the differential, and I'm just dividing this formula by dh. When I divide on the left-hand side, I get this expression. When I divide on the right-hand side, I replace dx by dx dh and I replace dy by dy dh. That's the formula I get, okay? So that's the formula for the derivative when you have this chain of functions. Why am I explaining all this? Because I claim that I can calculate the formula for the directional derivative by using precisely this rule, for the following reason. I can think of this function as a function which is obtained in this way. Namely, I will just say that I have a function f of x, y, and then I will have x is equal to x0 plus ha and y is equal to y0 plus hb, right? So this is what I called in this general setup g1 of h and g2 of h. They're very simple. I just take them from the formula for the directional derivative. So now I can apply this rule. So the point is that d sub u of f at x0, y0 is exactly df dh in which I then substitute h equals zero, right? Because what I'm doing is I'm just taking the composition of these functions, which is exactly what I have here, f in which I substitute x0 plus ah and y0 plus bh, and I take its derivative at zero. Taking the derivative at zero precisely means that I subtract the value at h equals zero, and the value at h equals zero is exactly f of x0, y0, which is what I have, and I divide by h and I take the limit. So this is just rewriting this ratio. This ratio can be considered as a derivative of this composition of functions at h equals zero. 
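Written out in symbols, the identification just described might look as follows. The name g for the composite function of h is introduced here only for convenience and does not appear in the lecture.

```latex
\[
g(h) = f(x_0 + ha,\; y_0 + hb), \qquad
D_{\mathbf{u}} f(x_0, y_0)
  = \lim_{h \to 0} \frac{f(x_0 + ha,\, y_0 + hb) - f(x_0, y_0)}{h}
  = \lim_{h \to 0} \frac{g(h) - g(0)}{h}
  = g'(0).
\]
```

Here g(0) = f(x_0, y_0) because both increments ha and hb vanish at h = 0, so the directional difference quotient is literally the one-variable difference quotient of g at zero.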
But now I can calculate this derivative by using the chain rule, right? So what do I get? I get just this formula. So the first term is df dx times dx dh. What is dx dh? a, that's right. This is a linear function in h, because x0 is fixed, right? x0 is a constant, a is a constant and h is a variable. I view this expression as a function of h. It's a linear function. We're used to this kind of thing. When we talk about equations for lines, we would have x0 plus ta, for example, right? Usually we would call our variables x, y, z or t. Now the fancy thing is that I'm using this letter h for the variable. But of course we realize that we can use any variable. The reason why I'm using h is because in some sense it's an auxiliary variable. So I don't want to mix it with the other variables we've used, because in some sense it plays a very special role here. And also in the book the letter h is used. So in other words, there are some reasons for using h which have nothing to do with the subject matter, right? It's just a matter of convenience or custom and so on. And if you view this as a function of h, it's just a linear function in h. And as such, its derivative is just a. It's just the coefficient in front of h. Let me just write it here: dx dh is a. And likewise dy dh is b. So therefore the second term is df dy times b. So finally, we get the formula for the directional derivative. The directional derivative with respect to a general unit vector u, which is a, b, is just f sub x of x0, y0 times a plus f sub y of x0, y0 times b. Just this formula. But now I switch back to the old notation, f sub x, f sub y. 
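As a sanity check on the formula just obtained, f sub x times a plus f sub y times b, the following Python sketch compares it with the raw difference quotient for a small h. The example function f(x, y) = x²y, its hand-computed partial derivatives, the sample point, and all helper names are assumptions made for illustration only.

```python
import math

def f(x, y):
    # Hypothetical example function, not taken from the lecture.
    return x**2 * y

def f_x(x, y):
    # Partial derivative of f with respect to x, computed by hand.
    return 2 * x * y

def f_y(x, y):
    # Partial derivative of f with respect to y, computed by hand.
    return x**2

def directional_derivative(x0, y0, a, b):
    # The chain-rule formula: f_x(x0, y0) * a + f_y(x0, y0) * b.
    return f_x(x0, y0) * a + f_y(x0, y0) * b

def difference_quotient(x0, y0, a, b, h):
    # The defining ratio, before taking the limit h -> 0.
    return (f(x0 + h * a, y0 + h * b) - f(x0, y0)) / h

x0, y0 = 1.0, 2.0
a, b = math.sqrt(3) / 2, 0.5  # unit vector at 30 degrees to the x-axis

print("formula: ", directional_derivative(x0, y0, a, b))
print("quotient:", difference_quotient(x0, y0, a, b, 1e-6))
```

The two printed numbers should agree to several decimal places; the small remaining gap comes from using a finite h instead of the limit.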
The reason why I used this notation for the chain rule is because if you write it in this way, it's easier to remember the formula, because it's almost as though you could actually cancel them out, even though you can't. But it's a kind of useful rule to just memorize the formula. But now it's more convenient to use the old notation because I wanted to emphasize that it looks very similar to the old formulas for partial derivatives, which I wrote here. So in fact, there is no mystery. The rate of change in the direction of a general vector is a combination of the rates of change in the x direction and the y direction. And in retrospect, what else could it be? I mean, if you look at this formula, you see that it's very natural. Suppose you already know the rate of change in the x direction and you know the rate of change in the y direction. What will be the rate of change sort of in the middle, if you go at 45 degrees to x and y? Well, clearly it will be a combination with equal coefficients, right? Both of them will contribute in the same way. The point is, which coefficients should we choose? Well, the coefficients we choose will be the components of the unit vector, so the square root of a squared plus b squared is one. That's how we agreed from the beginning to define direction by unit vectors. And so then we use those components as the weighting factors. Each of the two partial derivatives contributes to the rate of change with respect to the vector a, b. And they contribute with the weighting factors a and b. In other words, if a is greater than b, then the first partial derivative gets the larger weight, and the weights are exactly in the ratio of a to b. So that's the formula that we get. This is the formula for the directional derivative. Okay. And now we can first of all calculate directional derivatives in practical situations, okay? 
And we can also discuss the following question: in which direction u do we achieve the largest or the smallest rate of change? So the key to answering this last question is to understand that this looks like a dot product. And it is actually a dot product, right? Because we already have a and b as the components of a vector, namely the vector u, our unit vector. And then if we put the two partial derivatives together as components of another vector, okay, we can then see that this formula is actually nothing but the dot product between these two vectors, right? So let me take the two partial derivatives and put them together as components of a vector. A very natural thing to do. You'd think, why haven't we thought about this before? It's the most natural thing to do. Because we know that to get a vector, say on a plane, we need two components. And here there are two natural components because there are two partial derivatives, with respect to x and y. So why not put them together into one single vector? That's this vector. And then we have our vector u. If we take the dot product between them, we'll get precisely this expression, obviously. So this is a dot product between our original direction vector and some other vector. So it looks like this vector is important, okay? So in fact, we'll introduce notation and terminology for it. This vector is called the gradient vector, the gradient vector for the function f at the point x zero, y zero. And the notation for it is nabla f. And to emphasize that it's a vector, we put an arrow like this, okay? So this is the gradient vector. So what have we learned so far? We have learned that first of all, in addition to the two partial derivatives for functions in two variables that we have studied before, there are also derivatives with respect to a given direction vector, which we choose to be a unit vector u with components a and b. 
That this derivative is nothing but the rate of change in the direction u. It's given by this formula. It can also be thought of as a derivative of the function f with respect to some auxiliary variable h, if we write x and y as functions of h in this way. And finally, we have a formula for this directional derivative as a dot product between our original direction vector a, b and another vector which is determined by the function f and the point x zero, y zero. This vector has two components, which are nothing but the partial derivatives. This vector is called the gradient vector, okay? So you ask, what does this formula mean? What is the geometric significance of this vector? And how can we use this formula to answer various questions about the rate of change of our function, okay? And here we have to explain a very important geometric interpretation of this gradient vector. And the interpretation is that this is in fact a normal vector to the level curve of f at the point x zero, y zero, okay? So let me explain this. Bless you, exactly. I'm just thinking if this blackboard is big enough for me. Let me save it for something else. Let me use a bigger one. So first of all, let's talk about level curves. We have a function in two variables, which has a graph, okay? So for instance, let's say it's an elliptic paraboloid or something like this. Now, when I wanna draw it, to give the illusion of it being 3D instead of 2D, what I usually do is I draw the circles, right? To give the illusion of three-dimensionality. What are these circles? These are the level curves. These are the curves which we get by intersecting this graph with a plane parallel to the xy plane. So for instance, this is the intersection of the graph with a plane z equals some, let's call it z zero. 
So let's say our point, our point could be somewhere here. That's our point x zero and y zero. And this is the value, this is the value of the function at this point. So we'll call it z zero. So z zero is actually f of x zero y zero. But now we look at all points on the same graph which share this value of z. What does it mean? It means that we just cut it by a horizontal plane, which is at the level z zero. That's why we call it the level curve because this is a level, it corresponds to a fixed level of the function. Okay, so in other words, this is a curve which is given by the equation. The equation for this curve is f of xy is equal to z zero. Now, let's be a little bit more precise. The level curve actually lives in space, right? The level curve is in the three dimensional space. It does not belong to the xy plane usually, unless the level is zero. If the level is zero, then of course it will be part of the xy plane. What I've written here on the other hand is a curve on the xy plane, right? So to say that this is the equation for the level curve is not strictly speaking correct. More precisely, this is the equation of the curve which I obtained by dropping this level curve onto the xy plane. In other words, what I can do is I can just drop this, take the projection of this level curve onto the xy plane. So it has exactly the same shape. So in this case, let's say it's a circle, let's say, or maybe an ellipse if you want a more general elliptic paraboloid. And so this is going to be exactly the same circle of the same radius or the same ellipse, except that this guy was living at the level z zero and I just dropped it to level zero. So now it's on the xy plane. So often time when we talk about level curves, we will not distinguish between the level curves as the curve which lives on the graph and its projection onto the xy plane. Because in some sense, once you cut it by the plane parallel to the xy plane, it sort of lost its three-dimensionality. 
It's become an object which is better to think of as an object in the two-dimensional ambient space. So to be exactly correct, we would have to say that this curve, as living on the graph itself in the three-dimensional space, is defined by two equations. The first equation is z equals z zero. That defines the level. That's this plane by which we cut this graph, right? Plus the equation f of xy equals z zero. But oftentimes we will forget about this and just look at the projection, at the corresponding curve on the xy plane. So that's the level curve. And now what I'm claiming, except the way I drew it is not very good because it would have to go through the point x zero, y zero. My picture is very misleading, in fact, right? So it has to be more like this. The question is what do we want? Something that looks more beautiful or something that is correct. So I'm afraid we have to do the correct one instead of the beautiful one. So maybe let me just erase it and do a good job. Like this. Something like this. As long as I don't make any sharp corners, I think. Is it okay? Okay? So yeah, so this is very important, of course, that this curve has to go through the point x zero, y zero because this was the original point on the level curve. That's why we talk about the level curve to begin with, because of this point. We look at all other points which share the same value of the function. Okay. And so now what I'm claiming is that if you draw the gradient vector, nabla f, starting at this point, then it will be exactly perpendicular to this level curve. Which is to say that it's perpendicular to the tangent line at this point. And that's the geometrical significance of this gradient vector. So now we're getting into a kind of confusing territory because now we are going to have an interplay between different objects, between tangent lines, tangent planes, and various normal vectors. 
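The perpendicularity claim just made can be checked numerically. This is only a minimal sketch under an assumption: it takes f of x, y to be x squared plus y squared, a paraboloid chosen here for illustration (the lecture does not fix a formula), so that the level curve through any point is a circle centered at the origin. The function names are hypothetical helpers, not anything from the course.

```python
def f(x, y):
    # Hypothetical example function (a circular paraboloid); an assumption, not from the lecture.
    return x**2 + y**2

def gradient(x, y):
    # Partial derivatives computed by hand for this particular f: f_x = 2x, f_y = 2y.
    return (2 * x, 2 * y)

def level_curve_tangent(x0, y0):
    # The level curve through (x0, y0) is the circle x^2 + y^2 = f(x0, y0).
    # Rotating the radius vector (x0, y0) by 90 degrees gives a tangent direction.
    return (-y0, x0)

x0, y0 = 1.0, 2.0
g = gradient(x0, y0)
t = level_curve_tangent(x0, y0)

# Dot product of the gradient with the tangent direction of the level curve.
dot = g[0] * t[0] + g[1] * t[1]
print(dot)  # zero means the gradient is perpendicular to the level curve here
```

For this choice of f the dot product vanishes at every point, not just at the one tried here, because the gradient points along the radius of the circle.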
So I want to sort of set it all straight for you. So this is the first result that I'm going to emphasize, which is that this gradient vector that we discussed, with components f sub x and f sub y, is normal to this curve. It's perpendicular, which is what I wrote here. Normal vector to the level curve of f at x zero y zero given by the equation f of x y equals z zero. So this is the first fact. Now I want to relate this fact to another fact which we realized two weeks ago about tangent planes to the graph. And that fact is the following, that the vector f sub x at x zero y zero, f sub y at x zero y zero and negative one is a normal vector to the tangent plane, the tangent plane to the graph of f at the point x zero y zero. And I want to explain that these two things are very closely related to each other. And in fact, one implies the other. So you see, this is a very important point to understand. We look at this graph and we have this point on the graph. Because the graph itself is two dimensional, there is a tangent plane to the graph, which I'm not going to try to draw, but it is like this, right? Tangent plane to the graph. So in fact, let me use this as a model. Let me use this as a model for it, okay? In other words, I will not have the paraboloid. You'll have to imagine the paraboloid somewhere here, right? But think of this as a tangent plane, which is missing on that picture because it would be too messy to make. This is a tangent plane, okay? And the paraboloid goes like this. Remember, I brought a vase last time or whenever, two weeks ago. So there is this paraboloid which goes like this. And this plane touches that paraboloid at one point, let's say at this point. So that's fact number one. Fact number two: the paraboloid has a level curve which passes through this point, which would be a circle going like this. The levels will be parallel to the floor, right? 
So the level, it's just going to be the plane which is parallel to the floor, which goes through this point. It will cut the paraboloid over a circle. It will make a circle out of the paraboloid. And therefore, we will get a curve on the graph, this circle, okay? Now, this plane touches the whole graph. And in particular, it touches the level curve. But inside this plane, I will have a line which is specifically tangent to the level curve. So this is a tangent line to the level curve. You see what I mean? On this picture, I drew this tangent line downstairs because the other thing I did is I dropped the level curve onto the XY plane and then I drew the tangent line, this tangent line. But now let me draw this tangent line right here. So this white tangent line is that tangent line over there, okay? I hope this makes sense. Okay. But now the point is that this tangent line belongs to the tangent plane. You see that there are two tangent things. There is a tangent plane to the whole graph and there is a tangent line to just the level curve. But the level curve is part of the graph. So its tangent line has to be part of the tangent thing to everything, which is the tangent plane. That's why I drew that line on this tangent plane. And now let's talk about normals. The normal vector to the entire tangent plane we discussed last time or whenever, two weeks ago. And that's the vector which has F sub X, F sub Y and negative one, right? We used that normal vector to write down the equation of the tangent plane, which was like Z minus Z zero is equal to FX and so on, right? So this we knew from before. That's the normal vector. But now I wanna talk about the normal vector to just the tangent line to the level curve, which is not the same thing, you see, because this vector has three components because it's a normal vector to a plane in space, right? It has three components. But now I wanna talk about the normal vector to the curve. 
And again, this curve can be thought of in two ways, it has sort of two different incarnations. One is the curve which lives on the graph itself. So it's elevated over the XY plane. And then there is this tangent line. So really this tangent line, my picture is terrible because these two have to be parallel of course, right? My worst drawing so far. Imagine that these two are parallel, okay? It looks like non-Euclidean geometry, the parallel lines intersect. They shouldn't, so. But anyway, I don't wanna waste more time redrawing. So let's just trust my word, they are parallel to each other. Even though it doesn't look this way. So I think the most confusing aspect of this whole story, the story is incredibly elementary, but the reason I'm explaining it to you is because in my experience, many people find this confusing. And I think the main reason why people find it confusing is because there are two ways to think about the level curve. One is to think about the level curve as being part of the graph, which in some sense is the right way because that's how it was defined. It was defined by intersecting the graph with a plane. But then we realize that once you do that, the Z variable effectively disappears. So we might as well think of it as a curve on the XY plane. Even better, let's think of this plane as being the XY plane. And then actually we don't have to move it back and forth. Just think of this as being a plane with two coordinates X and Y and no Z coordinate, okay? So as such, this curve has a tangent line, okay? And the question is, what is the equation of this tangent line? That's what I'm getting at. So the equation of the tangent line we can obtain very easily from the equation of the tangent plane. So let me remind you of the equation for the tangent plane. The equation for the tangent plane is like this. 
Z minus Z zero is equal to F sub X X zero Y zero times X minus X zero plus F sub Y, right? That's the equation, which we have discussed 20 times and it wasn't even on the midterm, right? Of course, saying that that's the equation of the tangent plane is precisely saying that this is a normal vector, right? Because these are the components. F sub X, F sub Y and negative one. Why negative one? Because if you take this to the right hand side, you will see that Z minus Z zero appears with coefficient negative one, right? That's the negative, that's this negative one. This negative one is here because we have this term Z minus Z zero. Okay, that's the equation of the tangent plane. And now I would like to write down the equation for the tangent line to the level curve inside this plane. What should I do for this? Well, on the level curve, Z is equal to Z zero. That's the definition of the level curve. On the level curve, Z is just identically equal to Z zero. Everywhere on this level curve, Z is equal to Z zero, right? So this has to drop out. This has to disappear. And then we'll get the equation of the part which is tangent to just the level curve. So therefore, the equation of the tangent line of tangent line to the level curve is F sub X of X zero, Y zero times X minus X zero plus F sub Y of X zero, Y zero times Y minus Y zero is zero. That's the equation of the tangent. So let me remind you that these are numbers, right? These are the numbers, two numbers. And these are the variables. So we have X minus X zero, Y minus Y zero. And what are these two numbers? These two numbers are precisely the components of the gradient vector. In other words, you see in the equation of the tangent plane, we have the two partial derivatives. But we also have this negative one which comes because of this term Z minus Z zero, right? But once we look, when we focus on the level curve, this Z minus Z zero disappears. 
So the only information which is left is the two partial derivatives, FX and FY. And we get the equation, in terms of the two partial derivatives, for the tangent line to this level curve, okay? Any questions about this? Yes, that's right. It's on the XY plane. Exactly. So I want to do the analysis of everything related to the level curve on the XY plane. Strictly speaking, each time I would have to add the equation Z equals Z zero to indicate that actually this plane is elevated to that level Z zero. But usually we don't do that. We just kind of skip that, okay? But this has to be understood. You either have to think that you are here, but then each time you have to add the equation Z equals Z zero. Or you could just say, okay, we are just thinking about dropping everything onto the XY plane, okay? But now this equation is very suggestive because this equation is just like the equations of planes that we are used to writing, right? So I claim that this equation precisely means that this is a normal vector to this line. So therefore the normal vector to this tangent line, which is illustrated by this picture, this yellow vector, is the gradient vector, right? It is perpendicular to the tangent line, or normal to the tangent line. But to say normal to the tangent line is the same as saying normal to the curve itself, right? We don't distinguish between being normal to the curve and being normal to the tangent line. It's the same thing because it's something about just a very small neighborhood. So a linear approximation is good enough for this. Why is that a normal vector? That's actually something we should have talked about before, but we didn't. So let me, this is sort of a digression. What is a general equation of a line on the plane? So actually let me do it as a kind of analogy. Okay? So here we'll look at planes in R3 and here we will talk about lines in R2. 
What is the equation? What is a general equation of a plane in R3? This we know very well: a times x minus x zero plus b times y minus y zero plus c times z minus z zero equals zero, okay? Let me emphasize this a, b, and c. What are these a, b, and c? a, b, and c are components of the normal vector. Normal vector to this plane. For a good reason, which we explained when we talked about planes. Now I want to express lines in R2 in the same way. And I claim that a line, a general line in R2, can be written exactly the same way, except I don't have the last term because there's no z. So a times x minus x zero plus b times y minus y zero equals zero. I claim that just like this is an equation of a plane in R3, this is an equation of a line in R2, for the same reason. It's actually simpler because it's a line in two-dimensional space as opposed to a plane. But we never talked about this, somehow. Because, well, usually we talk about lines as given by the equation y equals kx plus b or something. But actually that's not the most general equation because what if you don't have a coefficient in front of y? It doesn't take care of the vertical lines. So this is the most general formula for a line in R2, which allows vertical and horizontal lines, whatever. And, of course, if you believe in this formula, you've got to believe in this formula because, first of all, you get this formula by setting z equals z zero. This line is a level curve of this plane at z equals z zero. So that's obvious. That's obviously the case. But then if you believe in this, you also have to believe the fact that if you take these two coefficients a and b, this is a normal vector to this line, okay? a, b, c was a normal vector to the plane, a and b is a normal vector to the line. But now my line here, my tangent line, which I'm trying to explain is very important, is given by this formula, which is exactly like this formula where this is a and this is b, right? 
If so, then the corresponding vector, a, b, is a normal vector to that line. And that's the gradient. So I have proved that, this statement one, that the gradient is a normal vector to the level curve. Okay? So there are two important vectors which show up when you have a graph of a function. One is a normal to the plane, to the entire tangent plane. And the other one is normal to the tangent line, to the level curve, which is a vector that is parallel to the floor, parallel to the x, y plane. And there's a very simple connection between them. The first of them has three components because it's a vector in three space, which are fx, fy and negative one. And the second one is obtained from this one by chopping off this negative one. And when you do that, what's left is just, this one is going to, oh no, it's better, okay? All right. Hopefully it will survive the next five minutes. Okay, so let me repeat this. For our worldwide viewing audience, we shouldn't forget them. This negative one is important, okay? The negative one is because the equation of the plane had, if we put z minus z zero to the same side of the equation, it will appear with coefficient negative one. And that's exactly this negative one over there. And that's the z component of the normal vector. So you see, the point is that the normal vector to the graph is like this, it's tilted. It's not like this, it's not parallel to the floor. It's tilted because the plane is tilted. But the normal vector to this line is going to be inside the same plane, which is a plane that is parallel to the floor. So therefore it will be like this. What we do, we lose that negative one. Well, here it appears like it's positive one. But the reason why it's, it's because in fact it is like this. It was like this. If you think about the tangent, the normal vector is being like this, then you will see what I mean. Because think about that picture. So this is, I feel like I'm waving my hands too much. 
But if you think about the normal vector as being on this side, which you can't see, but you can imagine, then you will see that its z component is negative one. Okay? All right, now. So now hopefully we understand the geometry of this whole thing. So let's do a small exercise. Suppose that f of x, y is x cubed minus four x squared y plus y squared. And the first question is to compute the directional derivative with respect to the vector u, which is three-fifths and four-fifths, at the point zero, negative one. So see, these are x zero and y zero. And these are a and b, let me call them. These are not the same a and b though. Maybe not, let me not use a and b now. Because at the beginning I used a and b as components of the vector u, but in my later discussion I was using a and b as components of the normal vector. So let's just not, okay? So what do we need to do for this? Well, we need to find the gradient vector because the directional derivative of f at this point is the dot product of the gradient at this point and the vector u. So the vector u we are given, so we need to calculate the gradient vector. And the gradient vector consists of the two partial derivatives, right? So we need to calculate f sub x is equal to three x squared minus eight y and f sub y is minus four x squared plus two y and then we calculate them at this point. So here we get eight and here we get, yes? That's right, it should be eight x y, not eight y, thank you. I'm glad somebody's paying attention, that's good. So f sub x is three x squared minus eight x y, and this is zero actually here, right? This is zero, and the second one I think I got, did I get it right? I got it right, so that's going to be negative two, right? So the dot product is going to be between zero, negative two and this one, which is three-fifths and four-fifths. And by the way, note that this is indeed a unit vector. If you take this squared plus this squared, you'll get one, right? 
So now we take the dot product and the dot product is minus eight over five. So that's the answer for the rate of change or directional derivative in this direction. Now another question which you will also encounter is the following: in which direction does the directional derivative take the maximum possible value, okay? And for this you have to find the u for which this dot product will give you the maximum possible value, right? You see, you cannot change this vector. This is a gradient vector. Once your point is given, this vector is fixed. But the direction, you can travel in any direction you want. So you can ask, given this vector, which vector u will give you the maximum possible dot product, okay? And of course we know that the dot product is equal to the product of the lengths times the cosine of the angle between the two vectors. So to maximize the dot product, we have to maximize the cosine, and to maximize the cosine means the cosine is equal to one, or negative one if you think in terms of the absolute values, which gives the smallest rate of change. And so that means that the direction vector has to be the unit vector in the direction of the gradient, okay? So that's how you solve this second part, okay? I'm out of time, so we'll continue on Thursday.
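The exercise above can be replayed in a few lines of code. This is a minimal sketch, not part of the lecture: the function names are hypothetical, the partial derivatives are typed in by hand from the formulas worked out on the board, and a central difference along u is added only as a sanity check on the dot product formula.

```python
import math

def f(x, y):
    # The function from the exercise: x^3 - 4 x^2 y + y^2.
    return x**3 - 4 * x**2 * y + y**2

def grad_f(x, y):
    # Partial derivatives computed by hand: f_x = 3x^2 - 8xy, f_y = -4x^2 + 2y.
    return (3 * x**2 - 8 * x * y, -4 * x**2 + 2 * y)

def directional_derivative(x0, y0, u):
    # Dot product of the gradient at (x0, y0) with the unit vector u.
    gx, gy = grad_f(x0, y0)
    return gx * u[0] + gy * u[1]

def finite_difference(x0, y0, u, h=1e-6):
    # Central difference of f along u, used only to cross-check the formula.
    forward = f(x0 + h * u[0], y0 + h * u[1])
    backward = f(x0 - h * u[0], y0 - h * u[1])
    return (forward - backward) / (2 * h)

def steepest_direction(x0, y0):
    # Unit vector in the direction of the gradient; assumes the gradient is not zero.
    gx, gy = grad_f(x0, y0)
    norm = math.hypot(gx, gy)
    return (gx / norm, gy / norm)

u = (3 / 5, 4 / 5)
print(grad_f(0, -1))                     # gradient at the point (0, -1)
print(directional_derivative(0, -1, u))  # should agree with minus eight over five
print(finite_difference(0, -1, u))       # numerical cross-check of the same value
print(steepest_direction(0, -1))         # direction of the largest rate of change
```

At the point zero, negative one the gradient is zero, negative two, so the direction of the largest rate of change is straight down the y axis, and the largest rate itself is the length of the gradient, which is two.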