Good morning. In the last lecture, we completed our module on linear algebra. In the present lecture, we start the small module on calculus. This will have three lessons: one on topics in multivariate calculus and then two lessons on vector calculus. So, in this lesson on topics in multivariate calculus, we will briefly summarize those topics which are likely to be confused or misused inadvertently. The first issue is derivatives in multidimensional spaces. First of all, if we have a scalar function f of a vector variable, its first order derivative is the gradient, which has the following components: del f by del x 1, del f by del x 2 and so on. So, the n partial derivatives with respect to the n variables x 1, x 2 up to x n form a vector, a column vector, that is, the transpose of this row vector; that is called the gradient. And in what sense is it the first order derivative of the function f? The sense is this: the first order differential change in the function value is the product of the gradient and the first order change in the value of x. So, this will always be our notion of a first order derivative: the first order derivative is something which, multiplied with a differential change in the independent variable, produces the corresponding differential change in the function, the dependent variable. So, in this sense the gradient vector forms the first order derivative of the scalar function f of a vector variable. Now, if for a multivariate function we want to find the rate of change in a particular direction, the scalar value of that rate of change is called the directional derivative.
And the definition is like this: from the current point x, in the direction d, we move a little step alpha and consider the change of the function value between the original point and the changed point, which is the numerator, and then divide that by the little step alpha that we took. If we take the limit of this quotient as alpha tends to 0, then what we get is the rate of change of the function along that direction; that is called the directional derivative. Keep in mind that the vector d need not be a unit vector, though quite often if we use a unit vector there, then the meaning of the step size alpha becomes more appropriate. However, it is not necessary that the vector d is a unit vector; it can be any vector for that matter. In particular, you should sometime verify that these relationships always hold: if we take the directional derivative of the function in a coordinate direction e j, the j-th coordinate direction, then that turns out to be the same as the ordinary partial derivative with respect to x j, the j-th variable. And then you can also verify that the directional derivative with respect to a direction d turns out to be equal to the inner product of the vector d and the gradient vector grad f. In particular, there is another important relationship: if you take g hat as the unit vector along the gradient vector itself, that is, if grad f points in some direction and g hat is the unit vector in that direction, and you find the directional derivative with respect to this unit vector, then the value you get is the same as the magnitude of the gradient of f. These relationships you should work out and verify. The points to note are the following: among all unit vectors taken as directions, the rate of change of a function in a direction is the same as the component of its gradient along that direction.
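To make the relationship between the limit definition and the inner-product formula concrete, here is a small numeric sketch. The function f and the point are my own illustrative choices, not from the slides; the direction d is deliberately not a unit vector.

```python
import numpy as np

def f(x):
    # a sample scalar function of a vector variable
    return x[0]**2 + 3.0 * x[0] * x[1]

def grad_f(x):
    # analytic gradient: (del f by del x1, del f by del x2)
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

def directional_derivative(f, x, d, alpha=1e-6):
    # limit definition: (f(x + alpha d) - f(x)) / alpha for a small step alpha
    return (f(x + alpha * d) - f(x)) / alpha

x = np.array([1.0, 2.0])
d = np.array([0.6, 0.8])   # any direction; need not be a unit vector

fd = directional_derivative(f, x, d)
exact = grad_f(x) @ d      # inner product of d with grad f
print(fd, exact)           # the two values agree to finite-difference accuracy
```

The same check with d taken as a coordinate direction e j reproduces the partial derivative with respect to x j.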
If you take this vector as the gradient and you want to find its component along a direction, and suppose that direction is a unit vector, then the component you work out turns out to be the directional derivative of the function f in that direction. This is one important point to note, and the second point to note is that the rate of change along the direction of the gradient is the greatest; that is, among all unit vectors taken as directions, the directional derivative in the direction of the gradient is the maximum. Now, this is for the first order derivative. When we go to find the second order derivative, we should again apply the same notion of a derivative: the second order derivative should be a quantity which, when multiplied with delta x, gives us the small change in the first order derivative, the gradient. With that understanding, the role of the second order derivative is played by the matrix known as the Hessian, which is the n by n matrix formed by the second order partial derivatives of the function f with respect to x 1, x 2 etcetera. So, the diagonal entries are the direct second derivatives with respect to the individual variables, del 2 f by del x 1 square, del 2 f by del x 2 square and so on, and the off-diagonal elements are of the form del 2 f by del x i del x j; this is a symmetric matrix. Now, in what sense is this matrix the second derivative of the function? The sense is this: the small change in the gradient between x and x plus delta x, that is, the small change in the first derivative, is roughly equal to this matrix into delta x.
So, this is the role of the second derivative that this matrix plays. You can multiply this complete matrix with the vector delta x and see that what you get turns out to be the small change in the gradient vector, which is a column vector. Now, so far we have considered the function f to be a scalar function of a vector variable x; we can also consider a vector function of a vector variable. So, suppose we have a vector function of m components. Here the function itself is a vector, and the independent variable is a vector variable; that means the variable x is a vector which has elements x 1, x 2, x 3 up to x n. Then, if you work out the first order derivative of this vector function with respect to the vector variable, you again get a matrix, and that matrix is called the Jacobian. You find that the Jacobian is given by this expression. Each member del h by del x 1, del h by del x 2 and so on is a column vector, because the function h itself is a column vector. So, in this matrix you will find that there are n columns and m rows; h is an m-component vector, so there will be m rows. This will be a column vector, similarly this will be another column vector, and so on; there will be n such columns. And this matrix, when multiplied with delta x having members delta x 1, delta x 2, delta x 3 etcetera, will produce delta h, which in its rows has delta h 1, delta h 2 and so on. So, this is called the Jacobian of the vector function h. In that way, you can say that the Hessian turns out to be the Jacobian of the gradient, because the gradient is a vector function of x, and its derivative, the n by n matrix, is the Hessian, the second order derivative. Now, in order to see how we get the gradient and Hessian of some very simple functions, let us see some examples.
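The claim that the Hessian is the Jacobian of the gradient can be checked numerically. In this sketch the function f(x) = x1 squared times x2 plus x2 cubed is my own example; its gradient is coded analytically, its Jacobian is taken by finite differences, and the result is compared to the analytic Hessian.

```python
import numpy as np

def grad_f(x):
    # gradient of f(x) = x1^2 x2 + x2^3 (illustrative example)
    return np.array([2*x[0]*x[1], x[0]**2 + 3*x[1]**2])

def jacobian_fd(g, x, eps=1e-6):
    # finite-difference Jacobian of a vector function g at x, column by column
    m, n = len(g(x)), len(x)
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[:, j] = (g(x + e) - g(x)) / eps
    return J

x = np.array([1.0, 2.0])
H = jacobian_fd(grad_f, x)                  # Jacobian of the gradient
H_exact = np.array([[2*x[1], 2*x[0]],
                    [2*x[0], 6*x[1]]])      # analytic Hessian of f
print(H)                                    # approximately equals H_exact
```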
We will try to find the derivatives of a few simple functions: this is one, and this is another. Now, this is a scalar function of x, where x is a vector variable, and we will consider its derivatives with respect to x and with respect to y; in particular, we will consider the special case in which y is the same as x. First, this is a scalar function, so we can find its gradient with respect to x. First of all, let us verify that these two expressions are actually the same. How do we do that? We open them up: a is a column vector with components a 1, a 2 up to a n, and x is a column vector. Since a itself is a column vector, a transpose becomes a row vector, and as we open out a transpose x we get a 1 x 1 plus a 2 x 2 and so on. If we work out x transpose a, then here we have x 1, x 2, x 3, x 4 etcetera and there we have a 1, a 2, a 3, a 4 etcetera, so the product is the same thing. That is why a transpose x and x transpose a are actually the same. Now, to find the gradient, let us first find the partial derivative with respect to the i-th variable. With respect to the i-th variable, all the terms go to 0 except the term a i x i, and the derivative of that is a i. So, as we take del by del x 1 we get a 1, del by del x 2 we get a 2 and so on. When we frame the complete gradient, we get a 1, a 2, a 3, a 4 etcetera, which is the vector a itself. So, the gradient of a transpose x or x transpose a is simply the vector a. Now, suppose we want to find the gradient of x transpose A y; we can denote it either way, and we are looking for the gradient of this function.
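The result that the gradient of a transpose x is simply a can be verified by finite differences; the random vectors below are my own illustrative data.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4)
x = rng.standard_normal(4)

def f(x):
    return a @ x              # a^T x, the same scalar as x^T a

def grad_fd(f, x, eps=1e-6):
    # finite-difference gradient, component by component
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x)) / eps
    return g

g = grad_fd(f, x)
print(g, a)                   # the gradient matches the vector a
```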
Now, consider that x is an n-dimensional vector; A y must also be an n-dimensional vector, otherwise the multiplication x transpose A y will not make sense. So, if in place of the vector a in the previous result we use A y, then directly from that expression we get the gradient: it is A y. Now note that the function x transpose A y is actually a function of two vector variables if you consider y also as a variable; this derivative is its gradient with respect to the variable x. With respect to the variable y also we can find the derivative. When we want to do that, we note that x transpose A y is a scalar. Now, a scalar is a 1 by 1 matrix, and its transpose is that scalar itself, so we can replace the expression with its transpose. If we do so, we get y transpose A transpose x, and now we have a similar situation: earlier we were finding the derivative with respect to x, we had x transpose something, and the derivative turned out to be that something; here we are finding the derivative with respect to the variable vector y and the function is y transpose something, so the derivative is that something, namely A transpose x. Now consider the special case where x is equal to y. Then, when we want to differentiate, we find that the derivative has two components: one considering the first x as variable and treating the second as constant, and the other in which we differentiate the second x treating the first as constant. In the first case, where the second x is considered constant, the derivative is A x; in the second case, where the second x is differentiated keeping the first constant, the derivative is A transpose x. That means we have the gradient as A plus A transpose, the whole thing into x.
Note that the matrix A plus A transpose is symmetric irrespective of what A is. Typically, in this kind of function, which is called a quadratic form and which we have encountered earlier also, A is taken as symmetric; but even if A is not originally symmetric, this sum is symmetric anyway. Now note that this gradient is a vector function; if we differentiate it, we get the Hessian of the original function, the second order derivative, and that turns out to be A plus A transpose, because a small change delta x here produces a change of this matrix into delta x in the gradient. So, this is the Hessian; in the case of symmetric A it is twice A, because A and A transpose are the same. Now, the second important issue that we explore in this lesson is Taylor's formula, Taylor's theorem and the Taylor series. Let us try to motivate the discussion through a very practical issue. We consider three trains; all of them start from one station at a particular time and reach another station at another time. So, the same distance from the first station to the second is covered by all these trains in the same duration. One of the trains goes at a constant speed; this is its time versus distance graph. It has constant speed, and speed is the rate of change of displacement, so constant speed means a constant slope; this is the first train. Then there is a train which initially goes a little slower than the first train: train 2 starts with a lower initial speed, that is, a little lower slope, but it reaches the destination at the same time. If initially it was a little slow but it reaches the final destination at the same time as train 1, then somewhere else it must have made up.
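The quadratic-form results above, the gradient A plus A transpose into x and the symmetry of that matrix, can be sketched numerically; the random non-symmetric A below is my own illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))   # deliberately non-symmetric
x = rng.standard_normal(3)

def f(x):
    return x @ A @ x              # the quadratic form x^T A x

def grad_fd(f, x, eps=1e-6):
    # finite-difference gradient of a scalar function
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x)) / eps
    return g

g = grad_fd(f, x)
g_exact = (A + A.T) @ x           # gradient of the quadratic form
H = A + A.T                       # Hessian; symmetric whatever A is
print(np.allclose(g, g_exact, atol=1e-3), np.allclose(H, H.T))
```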
So, if initially it was slow but it finally made up, then somewhere train 2 must have moved faster than train 1, that is, with a higher slope. Now, if initially it was slow and somewhere in between it became faster than the first train, then in going from slow to fast, at some point of time its speed must have equalled the speed of the first train, whose speed is constant. That means that between the initial time and the final time there must be some time when the speed of the second train is exactly the same as the constant speed of the first train. So, wherever that happens, there is some point where the slope of this graph is the same as the first train's, that is, the tangent is parallel. Similarly, if there is a train 3 which initially was moving very fast but reached the destination station at the same time, then somewhere it must have turned from faster to slower, so there must be some time when its slope was the same as the first train's; the tangent somewhere must have been parallel. Now, this is necessary because the speed cannot change suddenly, because the speed is a continuous function of time. If that were not so, if the speed could change suddenly, this would not be necessary: if the graph could suddenly turn its direction, then it would be possible to have a point where there is a sudden change and there is no tangent.
There would be one tangent on one side and a different tangent on the other. But if the first order derivative is continuous, that is, the speed is continuous, then it becomes necessary that at some point of time in between there is a slope which is the same as the average slope. In mathematical terms, this means that if the function is continuous and its first order derivative is continuous, then between the initial and final points there must be some time, say t 0, such that at t 0 the slope is equal to the average slope; this is Lagrange's mean value theorem. So, Lagrange's mean value theorem says that if the function between these two points is continuous and its derivative is also continuous, then there must be some point in this interval where the first order derivative is the same as the average rate of change. That is the statement here if we consider only up to first order: f of x plus delta x is equal to f of x plus the derivative, evaluated at some point x c in between, which I represented as t 0 earlier, times delta x. So, Lagrange's mean value theorem is for the first order derivative. If the function is n times differentiable, then we can go on extending that: we include the first order change, second order change, third order change up to the (n minus 1)-th order change, and the final mean value term we write in this form, with the n-th derivative in Taylor's formula evaluated at some point x c in the interval. The way we have written it, t 0 is some value between t i and t f; we could equivalently have said that t 0 is equal to t i plus some parameter into t f minus t i, where that parameter lies between 0 and 1. So, with terms up to the n-th order derivative included, the last term is known as the remainder term, and this is Taylor's mean value theorem.
Taylor's mean value theorem basically assures us of such a value x c. Now, if there is a function which is infinitely many times differentiable, then we can go on postponing this remainder term and we get an infinite series; that is the Taylor series, and it goes on. This is for a scalar variable. Now, what is its analogue for a vector variable, for a multivariate function? For a multivariate function you will find that the f of x term is the same; the f prime x into delta x term becomes delta x transpose into the gradient; the second order term is taken as half into delta x transpose, the Hessian, into delta x; and so on. From the third order term onwards, the terms become very complicated and cannot be written in the form of matrix multiplications, and that is why most of the sensible analysis goes only up to second order. So, what is written here is the second order truncated Taylor series, that is, we have truncated it at the second order term. This expression is going to be very useful for many of our analyses later. Another important issue that we will need quite often is the chain rule and change of variables. We know that for a scalar function of a vector variable we can change x 1, x 2, x 3, x 4 etcetera independently, and if we make several such changes, then this quantity is called the total differential, in which the small changes d x 1, d x 2 etcetera effect the individual changes in the function value.
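To see how much the second order term improves the approximation, here is a small check on a sample smooth function of my own choosing (e to the x1 times sine of x2), comparing the first order and second order truncations at a small step.

```python
import numpy as np

def f(x):
    return np.exp(x[0]) * np.sin(x[1])    # sample smooth function

def grad(x):
    return np.array([np.exp(x[0]) * np.sin(x[1]),
                     np.exp(x[0]) * np.cos(x[1])])

def hess(x):
    return np.array([[np.exp(x[0]) * np.sin(x[1]),  np.exp(x[0]) * np.cos(x[1])],
                     [np.exp(x[0]) * np.cos(x[1]), -np.exp(x[0]) * np.sin(x[1])]])

x  = np.array([0.3, 0.7])
dx = np.array([0.01, -0.02])

# second order truncated Taylor series
taylor2 = f(x) + grad(x) @ dx + 0.5 * dx @ hess(x) @ dx
err1 = abs(f(x + dx) - (f(x) + grad(x) @ dx))   # first order truncation error
err2 = abs(f(x + dx) - taylor2)                 # second order truncation error
print(err1, err2)    # err2 is much smaller than err1
```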
Now, if we divide this whole thing by d t and take the limit as d t tends to 0, in the sense that the variable vector x itself is a function of another parameter t, then what we get is the total derivative d f by d t, which is actually the ordinary differential coefficient of f with respect to t, and it turns out to be grad f transpose into d x by d t. So, this is the way we differentiate a scalar function f of a scalar variable t when the description of the function is available not directly through t but through a vector variable x: f is a function of x, which is a vector, and x itself is a vector function of the scalar variable t. In that case, the chain rule, which in ordinary calculus would read d f by d t equals d f by d x into d x by d t, has this multivariate analogue: gradient of f transpose into d x by d t, where x is a vector while t as well as f are scalars. Many situations arise in which the function f is expressed as a function of t and x, f of t and x, in which x itself is a function of t. In that case the total derivative of f with respect to t should include the contribution through the direct dependence on t and also the contribution through the dependence on t via the vector variable x. Then we have this term as well as the partial derivative with respect to t. So, the total derivative d f by d t turns out to be the partial derivative with respect to t, considering the entire x as constant, plus a separate component which contributes the part of the derivative due to the dependence on t through x. That is what this expression gives. Now, it may happen that f is a vector function of v and x, in which v is a vector variable and x is another vector variable which is again a function of v.
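The total-derivative formula can be checked on a case where the answer is known in closed form. In this sketch of my own construction, f(x) = x1 x2 and x(t) = (cos t, sin t), so f as a function of t is cos t sin t and its derivative is cos 2t.

```python
import numpy as np

def x_of_t(t):
    # x is a vector function of the scalar parameter t
    return np.array([np.cos(t), np.sin(t)])

def grad_f(x):
    return np.array([x[1], x[0]])          # gradient of f(x) = x1 x2

def dx_dt(t):
    return np.array([-np.sin(t), np.cos(t)])

t = 0.8
total = grad_f(x_of_t(t)) @ dx_dt(t)       # grad f transpose into dx/dt
exact = np.cos(2 * t)                      # d/dt (cos t sin t) = cos 2t
print(total, exact)                        # the two agree
```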
So, in a manner similar to the way we worked this out, we find that the derivative of f with respect to v i has two parts: one by considering v i alone as the variable, with the other components of v as well as x kept constant; and one in which f is differentiated with respect to x and then multiplied with the derivative of x with respect to v i. This is in the same sense as before, and when we assemble such partial derivatives together, we get the derivative with respect to v, that is, the full gradient of f with respect to v. While using this kind of expression, you should exercise caution about the transposes and the order in which the matrices and vectors are multiplied. Quite often the sizes of the matrices and vectors give you a quick check, but sometimes they may not; however, if you always try to see what each quantity means, what you get if you write out the components of those matrices clearly, and what the chain rule means in the multivariate context, you will find that the confusion gets removed. In order to see how this is used, let us consider a small example. We have a function of two vector variables x and w: a two-dimensional vector x, with components x 1, x 2, and a three-dimensional vector w, with components w 1, w 2, w 3, and the function is as given. Now, the vector w itself is a function of x, given like this. From here, if we want to find the gradient with respect to x, we can talk of two such gradients. One is the partial gradient with respect to x, evaluated keeping w constant. If we do that, then the partial gradient keeping w constant will be w 2 cos x 1 and minus w 3 sin x 2, and that is it.
If we similarly construct the gradient with respect to the variable vector w, keeping x constant, then we find the derivative with respect to w 1 is 1, the derivative with respect to w 2 is sin x 1 and the derivative with respect to w 3 is cos x 2. Now, if we differentiate the relationship which gives w as a vector function of x, we get the Jacobian del w by del x, which is a 3 by 2 matrix: the first column holds the derivatives of w 1, w 2, w 3 with respect to x 1, and the second column their derivatives with respect to x 2. Now, the chain rule formula applied to this particular problem gives us grad f, the total gradient of f, which accounts for the rate of change due to the direct change in x as well as the change through the change in w caused by the change in x. That is the direct change in f due to x, plus the Jacobian transpose multiplied with the partial gradient with respect to w. When we construct this, see what it means: the Jacobian contains del w 1 by del x 1, del w 2 by del x 1, del w 3 by del x 1, and similarly the derivatives with respect to x 2. Its transpose has those three entries in its first row. So, in the first element of the result we have del f by del x 1 plus the first row of J transpose multiplied with the partial gradient with respect to w: del w 1 by del x 1 into del f by del w 1, plus del w 2 by del x 1 into del f by del w 2, plus del w 3 by del x 1 into del f by del w 3. That is the first element. Similarly, there is a second element in which, in place of x 1, we have x 2.
Now, note that the first term is the direct variation with respect to x 1, and the rest is the variation with respect to x 1 through the variation of w: del f by del w 1 into del w 1 by del x 1, and so on. So, the variations in w 1, w 2, w 3 due to a change in x 1 account for small changes in f through those terms, and the direct dependence on x 1 is accounted for in the first term. You can expand this and find the gradient, and then try to do the same thing all over again by first substituting the w values to express the function in terms of x 1 and x 2 only, and then finding the derivative directly. Through both methods you should find the final derivatives and see that they match. Now we proceed to another important issue that concerns us quite often. For that, let us consider a vector of m plus n dimensions and a function h of x which is an m-dimensional vector function; the variable x is of m plus n dimensions while the function is of m dimensions only. If we partition this vector into two parts, n variables in z and the remaining m variables in w, then the relationship becomes h of z and w, and if we equate this to the zero vector, that means h of z and w equal to 0. Now, this gives us m equations, since the function h of x has m components; so this equal to 0 gives us m equations in all these variables z and w. Note that z has n variables, w has m variables, and there are m equations. Now, if we prescribe the values of z, the n variables listed in z, then it becomes m equations in the m variables in w. With m equations in the same number of variables, can we solve them, and say that for every set of values of z we can work out w? If we could, then we would basically have a straightforward function w of z.
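The two-route check suggested above, the chain rule versus direct substitution, can be sketched in code. The transcript does not reproduce the slide's exact w(x), so the w(x) below, namely (x1 x2, x1 squared, x2 squared), is a hypothetical stand-in; the function f = w1 + w2 sin x1 + w3 cos x2 is inferred from the partial gradients stated above.

```python
import numpy as np

# f(x, w) = w1 + w2 sin x1 + w3 cos x2  (matches the partial gradients above)
# w(x) = (x1*x2, x1**2, x2**2) is an assumed, illustrative choice.

def f_direct(x):
    # route 1: substitute w(x) first, so f depends on x only
    w = np.array([x[0]*x[1], x[0]**2, x[1]**2])
    return w[0] + w[1]*np.sin(x[0]) + w[2]*np.cos(x[1])

def total_grad(x):
    # route 2: partial gradients plus Jacobian transpose, the chain rule
    w = np.array([x[0]*x[1], x[0]**2, x[1]**2])
    gx = np.array([w[1]*np.cos(x[0]), -w[2]*np.sin(x[1])])   # del f/del x, w fixed
    gw = np.array([1.0, np.sin(x[0]), np.cos(x[1])])         # del f/del w
    J = np.array([[x[1],   x[0]],
                  [2*x[0], 0.0 ],
                  [0.0,    2*x[1]]])                         # del w/del x
    return gx + J.T @ gw

x = np.array([0.5, 1.2])
eps = 1e-6
fd = np.array([(f_direct(x + np.array([eps, 0])) - f_direct(x)) / eps,
               (f_direct(x + np.array([0, eps])) - f_direct(x)) / eps])
print(total_grad(x), fd)    # both routes give the same gradient
```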
This is the question: can we work out the function w of z, that is, by prescribing z 1, z 2 up to z n, can we determine the remaining variables w 1 to w m from these m equations? In general, for non-linear problems, we cannot do this over the entire domain through a single closed form expression; but if we ask for something less, that is, if we have one valid pair z and w which satisfies the equations, then in the immediate neighbourhood, can we form a first order approximation? This is possible under a certain condition. How? For that, we consider the derivative of h, which is a vector function of two vector variables. We differentiate with respect to z, because z is prescribed and plays the role of the independent variable while w plays the role of the dependent variable. The derivative with respect to z has two parts, one direct and the second through w: del h by del z, the direct derivative, plus del h by del w into del w by del z; and since h remains equal to 0, this total derivative is zero. Now note that h is of size m and w is also of size m, so del h by del w is a square matrix, and we can talk of finding its inverse. Taking del h by del z to the other side and pre-multiplying with the inverse of del h by del w, we get del w by del z, provided that matrix is invertible. If we can do this, then the first order change in w can be found: for a change in z, the corresponding change in w, that is, w 1 minus w, is given by del w by del z into that change in z. So, this is a first order approximation of the function w of z around the point pair z and w.
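A minimal sketch of this first order relation, in the simplest case m = n = 1 of my own choosing: the implicit relation z squared plus w squared equal to 1, where the explicit solution w of z is known, so the implicit formula can be compared against it.

```python
import numpy as np

# implicit relation h(z, w) = z^2 + w^2 - 1 = 0  (toy scalar case, m = n = 1)
z, w = 0.6, 0.8              # a valid pair: h(z, w) = 0

dh_dz = 2 * z                # del h by del z
dh_dw = 2 * w                # del h by del w; invertible since w != 0
dw_dz = -dh_dz / dh_dw       # del w by del z = -(del h/del w)^-1 (del h/del z)

# compare with the explicit solution w(z) = sqrt(1 - z^2)
eps = 1e-6
dw_dz_fd = (np.sqrt(1 - (z + eps)**2) - np.sqrt(1 - z**2)) / eps
print(dw_dz, dw_dz_fd)       # both approximately -0.75
```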
Now, that is why this local neighbourhood description is possible if the Jacobian del h by del w is non-singular, that is, invertible. That is the condition, and this result is known as the implicit function theorem, which is going to be very useful in many of our applications where, at one point, we find a locally valid first order approximation and continue with it as the local description of the function. So, this is possible when that Jacobian matrix is invertible, non-singular. After this, we have a few more small issues to look into quickly. For a multiple integral, when you conduct a change of variables, say the original integral is in terms of x, y, z and we take x, y, z as functions of three new variables u, v, w, then first of all we need to transform the domain from A to A bar, where A bar is the corresponding domain in the u v w space; then, in place of x, y and z, we put x, y and z in terms of u, v, w; and in place of d x d y d z we have the determinant of the Jacobian into d u d v d w. So, this Jacobian determinant is the element that transforms a volume in the u v w space to the corresponding volume in the x y z space. As an example, consider the small 2 by 2 case. Quite often, for evaluating a double integral, we transform from rectangular to polar coordinates; then, apart from transforming the limits of integration, we put r cos theta and r sin theta in place of x and y, and in place of d x d y we use r d r d theta. What is this r doing here? Note that if you take x, y as a vector function of r and theta, then its Jacobian is a 2 by 2 matrix with del x by del r and del x by del theta in the first row, and del y by del r and del y by del theta in the second.
So, if you take the Jacobian determinant, you find cos theta into r cos theta minus sin theta into minus r sin theta, that is, r cos square theta plus r sin square theta; since cos square theta plus sin square theta is 1, the Jacobian determinant is r. That is why, in the transformation, the area element d x d y becomes r d r d theta. The same thing you can find geometrically also: in the rectangular coordinate system, a small area element is d x into d y. In polar coordinates, the typical area element is different: at the point (r, theta), as the angle changes through d theta, the corresponding arc length is r d theta, while the radial change is d r. So, the elemental area is d r into r d theta. Now we continue to our next important issue, which is going to have a lot of use in the vector calculus segment. If we have a differential quantity, we can ask the question: does there exist a function f of x for which this is the differential? That is, can we talk of a function f for which d f turns out to be this, or equivalently, a function whose gradient is the vector function p of x, with components p 1 of x, p 2 of x and so on? If the answer is yes, then we say that this particular differential is a perfect differential, or an exact differential, and it can be integrated to find f; not every differential is exact. The last point that we consider in this lecture is the formula for differentiation under the integral sign. It is useful for differentiating a function which is available in the form of an integral: the integrand is a function of x and t, the integral is with respect to t, and the limits of the integral are functions of x.
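Before moving on, the claim above that the polar-coordinate Jacobian determinant equals r can be checked numerically; the particular point (r, theta) is my own illustrative choice.

```python
import numpy as np

def xy(r, theta):
    # map from polar to rectangular coordinates
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def jacobian_fd(r, theta, eps=1e-7):
    # 2 by 2 Jacobian of (x, y) with respect to (r, theta), by finite differences
    col_r  = (xy(r + eps, theta) - xy(r, theta)) / eps
    col_th = (xy(r, theta + eps) - xy(r, theta)) / eps
    return np.column_stack([col_r, col_th])

r, theta = 2.5, 0.9
J = jacobian_fd(r, theta)
detJ = np.linalg.det(J)
print(detJ, r)    # the determinant equals r, so dx dy = r dr dtheta
```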
So, if we try to differentiate this, considering it as a function of x, u and v, with u and v themselves being functions of x, then the straightforward chain rule gives us this expression for the derivative. The first term is the direct derivative with respect to x, which does not consider the dependence on x through the limits of integration; that is simply the term in which the derivative is taken inside the integral sign, but that is only the first term. In the other two terms, d u by d x and d v by d x are derivatives of u and v, which are known functions of x, but we still need the two partial derivatives with respect to u and v. For that, we consider a function capital F whose derivative with respect to t is our small f; then, putting del F by del t in place of f in the integrand, the integral is capital F of x, t evaluated at v minus its value at u. Now, this whole thing is the corresponding function of x, u and v. If we differentiate it partially with respect to u and with respect to v, we get the two partial derivatives we need: the partial derivative with respect to u is minus del F by del u, that is, minus f evaluated at t equal to u, and similarly for v. So, from here we get these two expressions; inserting them for the partial derivatives, we get the complete expression. This is called the Leibniz rule. The special case is of course the one in which u and v, the limits of integration, are constants, independent of x; in that case only the first term remains, and the other two terms are not there.
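The Leibniz rule can be checked on an integral whose closed form is known. In this sketch of my own construction, the integrand is f(x, t) = x t with limits u(x) = x and v(x) = x squared, so the integral is (x to the 5 minus x cubed) over 2 and its derivative is (5 x to the 4 minus 3 x squared) over 2.

```python
import numpy as np

# phi(x) = integral from u(x) = x to v(x) = x^2 of f(x, t) = x t dt
# closed form: phi(x) = (x^5 - x^3)/2, so phi'(x) = (5 x^4 - 3 x^2)/2

def dphi_dx_leibniz(x):
    # term 1: integral of del f by del x = t over [u, v], by the trapezoidal rule
    t = np.linspace(x, x**2, 10001)
    integral = np.sum(0.5 * (t[1:] + t[:-1]) * np.diff(t))
    # terms 2 and 3: f(x, v) dv/dx - f(x, u) du/dx
    boundary = (x * x**2) * (2 * x) - (x * x) * 1.0
    return integral + boundary

x = 1.7
exact = (5 * x**4 - 3 * x**2) / 2
print(dphi_dx_leibniz(x), exact)    # the Leibniz rule reproduces phi'(x)
```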
So, this is called the Leibniz rule, and it is used for differentiating under the integral sign. There is another small topic in the lesson which we will omit because it is quite straightforward, but I advise you to go through it in the textbook and be conversant with it, because it will be useful in many of the applications that we consider later in the course: these are the derivative formulas for scalar functions, and their vector extensions, the gradients and Hessians, can be worked out in the same manner. So, in this lesson these are the important issues that we have considered: derivatives of multivariate functions and the sense in which they are derivatives, partial and total gradients, implicit functions, and the Leibniz rule. These are topics that we will be using again and again in the rest of the course. The necessary exercises that you must complete to develop the essential level of understanding are listed here. Thank you.