In the previous two modules we reviewed the fundamental concepts from finite dimensional vector spaces and then the various properties of matrices one would normally come across in the analysis of data assimilation algorithms. In this lecture we are going to provide a broad overview of the fundamental tools we need from multivariate calculus. The reason for including this is as follows: almost all students who take a BS degree in the basic sciences or engineering, in any part of the world, have done the equivalent of Calculus I through IV; they have been introduced to univariate calculus, that is, differentiation, integration, and differential equations, all in one variable. But when you formulate a problem in a data assimilation framework, the problem has to be formulated using multivariate analysis. The state of a system is defined by a vector x; the dynamics of the system could be linear or nonlinear; for a linear system a matrix defines the state transition. So we need to be very familiar with multivariate analysis. The tools for multivariate analysis include an understanding of finite dimensional vector spaces, a thorough understanding of matrices and their properties, and a good facility with multivariate calculus, which is an extension of the ordinary univariate calculus that anybody with a BS degree should be familiar with. So our goal is to provide you a broad-based introduction to the fundamental concepts from multivariate analysis.

So let us start from fundamentals, with the notion of a function. To define a function we need two objects: a set A and a set B. If F is a function from set A to set B, we call A the domain of the function and B the range of the function. So I need a domain, I need a range, and then I need the function itself. What is a function? A function is simply an association of points in the domain with points in the range. By definition F has to be defined for every member of the domain; you cannot leave anybody out. The values of F need not take all the values in B, and that is where the distinction between various types of functions comes in. By definition a function is also single-valued. What do we mean by single-valued? If you think of a function as a black box, for every input x there is a unique output F(x): a single value. What is the difference between a single-valued function and a multivalued function? For a single-valued function each x gives one value; for a multivalued function a single x can give, say, three values. So while in principle one can consider both single-valued and multivalued functions, in mathematics we exclude multivalued functions from consideration. So when a mathematician says "let F be a function", he already has at the back of his mind a domain, a range (the range is also called the codomain), and an association between points in the domain and points in the codomain; single-valued means the output is unique. That is the broadest possible way one can define functions.
Now, just as there are special classes of matrices, there are special classes of functions. A function F is called one-to-one, or injective, if x ≠ y implies F(x) ≠ F(y); that means distinct points in the domain are mapped into distinct points in the range. F is called onto, or surjective, if F maps A onto the complete set B. F is called bijective if it is both injective and surjective (one-to-one and onto). Let us give some examples of functions: F(x) = |x|, x², sin x, eˣ. These are all functions, and the classifications apply to them; for instance y = x² is injective on [0, ∞) but not on all of R, since x and −x give the same value. We will more often be interested in bijective functions, because it is for these functions we can have inverses: if F is a function from A to B, then F⁻¹ is a function from B back to A, the inverse function. In order that the inverse be defined we need a further constraint: F must be one-to-one, injective. These facts all come essentially from the basic definitions of functions.

So now I am going to talk about other classifications of functions. First, F is a scalar-valued function of a scalar. What does that mean? Here is a black box F; x goes in, F(x) comes out; x is a scalar and F(x) is a scalar. That means the domain is R and the codomain is R. What are examples of scalar-valued functions of a scalar? F(x) = x, log x, 2ˣ, eˣ. The input is a scalar, the output is a scalar; you can think of F as a transformation, as a black box: something goes in, something comes out.

Second, F is a scalar-valued function of a vector. In this case x belongs to Rⁿ but F(x) belongs to R; F converts a vector into a scalar. Such a function is also called a functional; we have already seen the notion of a functional when we talked about vector spaces. What are examples of scalar-valued functions of a vector? Given a vector x, the norm of x: a norm associates a number with every vector, so a norm is a scalar-valued function of a vector. A quadratic form in x is a scalar-valued function. The inner product of x with a fixed vector a is a scalar-valued function. In all of these the input is a vector, the output is a number.

But in dynamical systems theory, as well as in data assimilation, we are going to be interested in a third class of functions, called vector-valued functions of a vector. What does this mean? Here I have a box F; x goes in, F(x) comes out; x is a vector and F(x) is also a vector. This is called a vector-valued function of a vector. In general F is also called a map; "map" is a technical term used in dynamical systems theory, so when someone says "let F be a map", they mean a vector-valued function of a vector: the input is a vector, the output is a vector. In general the two sizes need not be the same: the input could be an n-vector and the output an m-vector. I want you to see the difference: the input is a vector, the output is a vector, and the vectors could be of the same size or of different sizes.
So let us work an example. Let n be 3, meaning the input vectors are of size 3, and let m be 2, so the output vectors are of size 2. Let x = (x1, x2, x3)ᵀ; then F(x) = (F1(x), F2(x))ᵀ. What is F1(x)? Take F1(x) = x1² + x2² + x3². What is F2(x)? Take F2(x) = x1 x2 x3. So you give x1, x2, x3 and you get the 2-vector given by these formulas; that is what is called a map, or a vector-valued function of a vector.

C[a,b] denotes the set of all continuous functions defined over the interval [a,b]; that is a huge set, an infinite set. Cᵏ[a,b] is the set of all continuous functions with continuous derivatives of order up to k. If I say C[a,b], a continuous function need not be differentiable; but in the second set, Cᵏ[a,b], the function must not only be differentiable, it must be differentiable up to order k. So what does this mean? I have C, I have C¹, I have C², up to Cᵏ. The set of continuous functions is the largest; the set of continuously differentiable functions is smaller; the set of twice differentiable functions is smaller still. So you can think of a chain of subsets: C¹ is a subset of C, C² is a subset of C¹, which is a subset of C. I am putting ever greater conditions on the behaviour of the function. So functions come in various shapes and forms: continuous functions, differentiable functions, functions with derivatives of all orders up to an integer k. If k is infinite, we are in the realm of functions with derivatives of all orders; analytic functions are of this kind. For example, polynomials are analytic functions, they have derivatives of all orders; exponential functions are analytic, they have derivatives of all orders; and so on.

With that as background, I am now going to introduce various concepts that we need in data assimilation algorithms, especially optimisation algorithms, beginning with the notion of the gradient of a function. In this particular case we are concerned with a scalar-valued function of a vector, the second type above. So what is the starting point? Let f be a scalar-valued function of a vector, and let x and z be two vectors in Rⁿ. We say f is differentiable at the point x if and only if there exists a vector u such that

f(x + z) − f(x) = ⟨u, z⟩ + h.o.t.,

where this z and that z are the same; so the definition is contingent on the existence of such a vector u. The first term on the right is an inner product, and h.o.t. means higher-order terms in z. And what is the property of the higher-order terms? The ratio of the higher-order terms to the norm of z goes to 0 as z goes to 0; that is the limit condition. Such a u is called the gradient of f(x) with respect to x. This is the most general definition of a gradient. Algorithmically, this gradient can be computed as the vector of partial derivatives (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn)ᵀ; it is an n-vector, and it is denoted by the inverted delta (nabla) with subscript: ∇f(x). We use the term "derivative" for the univariate case and "gradient" for the multivariate case. So even though f is a scalar-valued function of a vector, its gradient is a vector in Rⁿ. I want you to see the importance of the introduction of vectors and matrices: you cannot do multivariate calculus well unless you understand finite dimensional vector spaces and matrices very well. So what is the gradient in simple terms? The gradient is simply the vector of partial derivatives; that is the simplest way to describe it. This operator, the inverted del ∇, is called the gradient operator.
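To make the definition concrete, here is a minimal numerical sketch, added for illustration and not part of the lecture. It takes f(x) = x1² + x2² + x3², the first component of the example map above, whose gradient is 2x, and checks that the higher-order terms divided by ‖z‖ do go to 0 as z shrinks.

```python
import numpy as np

# f(x) = x1^2 + x2^2 + x3^2, with gradient grad f(x) = 2x.  We check the
# definition of the gradient: the remainder f(x+z) - f(x) - <u, z> must
# vanish faster than ||z||, i.e. the printed ratio must tend to zero.
f = lambda x: np.sum(x**2)
grad_f = lambda x: 2.0 * x

x = np.array([1.0, -2.0, 0.5])
d = np.array([0.3, 0.1, -0.2])                        # a fixed direction
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    z = t * d
    hot = f(x + z) - f(x) - grad_f(x) @ z             # higher-order terms
    print(t, hot / np.linalg.norm(z))                 # ratio -> 0 as z -> 0
```

For this quadratic f the higher-order term is exactly ‖z‖², so the printed ratio falls linearly with t, which is precisely the behaviour the definition demands.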
The gradient operator has lots of interesting properties. Let f be a scalar-valued function and g be a scalar-valued function. The gradient of the sum is the sum of the gradients, ∇(f + g) = ∇f + ∇g; the gradient is additive, it has the additivity property as an operator. The gradient of a constant times a function is the constant times the gradient of the function, ∇(cf) = c∇f; that is called the homogeneity property. The gradient of a product obeys ∇(fg) = f∇g + g∇f, which is called the product rule and is the extension of the product rule from univariate calculus. We already know d(uv)/dx = u dv/dx + (du/dx)v; we call this the product rule, and this is its analog from univariate to multivariate.

In multivariate calculus we are also interested in another concept, the directional derivative. f is given, a scalar-valued function of a vector, and I pre-specify a direction z; I would like to compute how the function varies in this direction. That is called the directional derivative, denoted f′(x; z): the directional derivative of f at x along the direction z. It is given by the inner product of the gradient with z, f′(x; z) = ⟨∇f(x), z⟩. By the Cauchy–Schwarz inequality this inner product equals the norm of the gradient times the norm of z times cos θ, where θ is the angle between z and the gradient; you can readily see that. We already saw the Cauchy–Schwarz inequality in the previous lectures, so this is essentially an application of it, and it tells you how the magnitude of the directional derivative can be obtained from these norms and the angle.

Now we are going to talk about a closely related concept. Until now we assumed x is a variable in itself, but here x is not a free variable: x is a function of another variable. So x(t) is a vector and each component is a function: x1(t) is a scalar-valued function of t, x2(t) likewise, up to xn(t). I have a vector function; each component is a function of t. So let us talk about the possible cases: f(x), where x is a simple variable; f(x(t)), where f is a function of x and x is a function of t, so it is a function of a function and f depends on t only implicitly; and f(t, x(t)), where f depends on t both explicitly and implicitly. So: no dependence on t, implicit dependence, and implicit plus explicit dependence. In these cases we should be able to carry out the computation of the derivative of f with respect to t. The derivative of f with respect to t is the total derivative:

df/dt = (∂f/∂x1)(dx1/dt) + (∂f/∂x2)(dx2/dt) + … + (∂f/∂xn)(dxn/dt),

and so on and so forth; this is called the total derivative of f with respect to t, by the chain rule. (In the explicit case an additional ∂f/∂t term appears.) So the chain rule, the additive rule, the homogeneity rule, the product rule that we learnt in basic calculus all carry over; there is nothing new, but the old concepts take a new form when you go from univariate to multivariate. That is the idea.
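Here is a similar sketch, again an editor's illustration rather than the lecturer's, for the directional derivative and the total derivative; the symmetric matrix A and the curve x(t) = (cos t, sin t)ᵀ are hand-picked for the demonstration.

```python
import numpy as np

# Directional derivative f'(x; z) = <grad f(x), z>, checked against the
# one-sided finite difference (f(x + t z) - f(x)) / t for small t.
A = np.array([[2.0, 1.0], [1.0, 3.0]])       # symmetric
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x                     # gradient of (1/2) x^T A x

x = np.array([1.0, -1.0])
z = np.array([0.6, 0.8])
t = 1e-6
print("analytic  :", grad_f(x) @ z)
print("numerical :", (f(x + t * z) - f(x)) / t)

# Total derivative by the chain rule: with x(t) = (cos t, sin t)^T,
# df/dt = sum_i (df/dx_i)(dx_i/dt) = <grad f(x(t)), x'(t)>.
t0, h = 0.7, 1e-6
xt  = lambda s: np.array([np.cos(s), np.sin(s)])
dxt = lambda s: np.array([-np.sin(s), np.cos(s)])
print("chain rule:", grad_f(xt(t0)) @ dxt(t0))
print("numerical :", (f(xt(t0 + h)) - f(xt(t0))) / h)
```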
Next is the notion of the second derivative. If f is a function of a scalar, the second derivative is simple: we have f, df/dx, d²f/dx², and we are done. But when f is a scalar-valued function of a vector, x is not one variable; there are n variables, x1 to xn. The gradient is a vector, and I can consider the second derivative matrix. Look at the first row of this matrix: the second partial of f with respect to x1, the mixed second partial with respect to x1 and x2, up to the mixed second partial with respect to x1 and xn; and likewise for each row. Such a matrix is well defined, and it has a special name: it is called the Hessian of f, ∇²f(x) = [∂²f/∂xᵢ∂xⱼ]. So you can see matrices arise very naturally. Not only that: we know from basic calculus that the mixed partial derivatives are essentially the same, ∂²f/∂x∂y = ∂²f/∂y∂x; if the partial derivatives are continuous, the mixed partial derivatives are equal, and the order in which you compute them is immaterial. Given that, this matrix is a symmetric matrix. So n-by-n symmetric matrices arise naturally when you consider the second derivative matrix, the Hessian, of a scalar-valued function of a vector. Where do symmetric matrices come from? They come from various directions, and one of the simplest ways in which symmetric matrices arise is by computing the second partial derivative matrix of a scalar-valued function of a vector; this matrix is symmetric because the mixed partial derivatives are the same, which is what I told you a minute ago. So this is the representation of the second derivative matrix for a scalar-valued function of a vector.

Now we are going to move to the next level: a vector-valued function of a vector. So let f be a function from Rⁿ to Rᵐ, and keep this picture at the back of your mind: what goes in is x, what comes out is f(x); x belongs to Rⁿ, f(x) belongs to Rᵐ. So f(x) = (f1(x), f2(x), …, fm(x))ᵀ and x = (x1, x2, …, xn)ᵀ: there are m functions, each of which is a function of n variables. I hope that is clear. Now what can I do? I can take f1 and compute the partial derivatives of f1; I can take f2 and compute the partial derivatives of f2; up to fm. The partial derivatives of one component, written as a column, form its gradient; written as a row, they form the transpose of its gradient. So the first row of our matrix is the transpose of the gradient of f1, and the last row is the transpose of the gradient of fm. So what do we do? We go component by component; for each component we compute the gradient, which is a column vector; we transpose it to get a row vector; and we stack these rows. There is one row for each component of f, and there are m components, so there are m rows; there are n variables, so there are n columns; so this matrix is an m-by-n matrix. Therefore if you have a vector-valued function of a vector from Rⁿ to Rᵐ, this is the collective first derivative matrix for the entire function, and in general it is a rectangular matrix. This matrix is given a special name: it is called the Jacobian. The Jacobian of f is defined only for a vector-valued function of a vector; the Hessian, for a scalar-valued function of a vector; the gradient, for a scalar-valued function of a vector. So these are the various quantities associated with functions in terms of their derivatives.
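The symmetry of the Hessian can also be seen numerically. In the sketch below (an editor's illustration, for one hand-picked f) the Hessian is built row by row by differencing an analytic gradient written out by hand, so nothing in the construction forces symmetry; the equality of mixed partials produces it.

```python
import numpy as np

# Hessian built by differencing the analytic gradient of
# f(x) = x1^2 x2 + sin(x1 x3): row i holds the i-th partials of the
# gradient components.  Equality of mixed partials makes the result
# symmetric up to finite-difference error.
f_grad = lambda x: np.array([
    2*x[0]*x[1] + x[2]*np.cos(x[0]*x[2]),   # df/dx1
    x[0]**2,                                # df/dx2
    x[0]*np.cos(x[0]*x[2]),                 # df/dx3
])

def hessian(grad, x, h=1e-6):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = h
        H[i, :] = (grad(x + e) - grad(x - e)) / (2*h)
    return H

x = np.array([0.5, 1.0, -0.3])
H = hessian(f_grad, x)
print(np.max(np.abs(H - H.T)))   # tiny: symmetric up to rounding error
```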
Now I am going to give some examples. Let a be a constant vector and x be a vector; I pick a in Rⁿ and keep it fixed, and I define f(x) = ⟨a, x⟩ = aᵀx = Σᵢ₌₁ⁿ aᵢxᵢ. It is a function of x, and it is a scalar-valued function of x because it is an inner product: the output is a scalar, the input is a vector, and a is a common vector that transforms every input vector. This is simply a linear function, because the right-hand side is linear in each component of x. What is the gradient of f? The partial of f with respect to x1 is a1, the partial with respect to x2 is a2, and the partial with respect to xn is an; so the gradient is equal to a. So we have enunciated the first rule of multivariate calculus: if f(x) = aᵀx, then the gradient of f is a. This is very similar to what the univariate calculus person does: d(eˣ)/dx = eˣ, d(sin x)/dx = cos x; we develop a table of differential coefficients of various standard functions. In data assimilation we need such a table, and this is the first entry in the table: if f(x) = aᵀx, the gradient of f is a.

Now let us compute the gradient of xᵀAx, a quadratic function. Yesterday we noted that for quadratic forms we need consider only symmetric matrices, so let A be a symmetric matrix and f(x) = xᵀAx. In two variables, f(x) = a x1² + b x1x2 + c x2². Let me compute the gradient of this f(x): the partial of f with respect to x1 and the partial with respect to x2. A simple calculation shows the resulting vector can be rearranged as 2 times the matrix times the vector, that is, 2Ax. Therefore we have the second entry in our table: if f(x) = xᵀAx, its gradient is 2Ax. Then, combining entries one and two: if f(x) = ½ xᵀAx − bᵀx, the gradient is Ax − b. Anybody who has done 3D-Var should immediately recognize that these terms occur very naturally in 3D-Var. If you read 3D-Var without recognizing that these are all tools from multivariate calculus, you will have trouble; once you know this, 3D-Var becomes a simple exercise, and that is the reason why I believe it is necessary to understand these basic concepts before you start on data assimilation algorithms.

My examples continue; now I am getting a little more sophisticated. x is a vector and h(x) is another vector, so h is a function; in this case h is a function from Rⁿ to Rᵐ. So h is an m-vector with components h1, h2, …, hm, each of which is a function of the n variables x1 to xn. Let us fix a vector a and define f(x) = aᵀh(x) = h(x)ᵀa. This is simply the analog of what we did in the previous case, aᵀx, with x replaced by a general nonlinear function h(x). What is the gradient of this f? The gradient of f is given by the transpose of the Jacobian of h times a: ∇f(x) = J_h(x)ᵀa (you will remember we have just talked about the Jacobian). This is how you compute the gradient of such a function. In my regular class I would derive these things; in this lecture we may not have time to derive all of them, but it is very necessary that each of these examples produces that "aha" in your mind, so that you develop the independence and dexterity to make these calculations and manipulations.
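Here is a hedged numerical check of the table entries so far; the random vectors, the symmetric A, and the toy nonlinear h: R⁴ → R³ below are all invented for the demonstration.

```python
import numpy as np

# Numerical check of the first entries in the multivariate "table of
# derivatives":
#   grad(a^T x)               = a
#   grad(x^T A x)             = 2 A x        (A symmetric)
#   grad(1/2 x^T A x - b^T x) = A x - b
#   grad(a^T h(x))            = J_h(x)^T a   (h nonlinear)
def num_grad(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2*h)
    return g

rng = np.random.default_rng(0)
n = 4
a, b, x = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
M = rng.normal(size=(n, n)); A = (M + M.T) / 2        # symmetric A

print(np.allclose(num_grad(lambda v: a @ v, x), a))
print(np.allclose(num_grad(lambda v: v @ A @ v, x), 2 * A @ x))
print(np.allclose(num_grad(lambda v: 0.5 * v @ A @ v - b @ v, x), A @ x - b))

# A toy nonlinear h: R^4 -> R^3 with a hand-written Jacobian.
h  = lambda v: np.array([v[0]**2, np.sin(v[1]), v[2] * v[3]])
Jh = lambda v: np.array([[2*v[0], 0.0,          0.0,  0.0 ],
                         [0.0,    np.cos(v[1]), 0.0,  0.0 ],
                         [0.0,    0.0,          v[3], v[2]]])
c = rng.normal(size=3)
print(np.allclose(num_grad(lambda v: c @ h(v), x), Jh(x).T @ c))
```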
Now let us extend it even further. I am going to take the same h, and now consider hᵀAh: where earlier we considered xᵀAx, here we consider f(x) = h(x)ᵀA h(x). This again is something you come across very often in 3D-Var, especially with respect to the nonlinear observation operator; h is generally used to denote the nonlinear observation operator, and I am using the same kind of notation here. In this case, what is the formula for the gradient of f? The gradient of f is 2 times the transpose of the Jacobian of h times A times h(x): ∇f(x) = 2 J_h(x)ᵀA h(x), for symmetric A. These are all the nuts and bolts. Yesterday someone was asking after class who the people are who develop algorithms for data assimilation: those who understand, and have good mathematical skills, are the ones who are going to invent new algorithms. There are two ways: one is to use somebody else's algorithm, the other is to invent your own. To invent your own algorithms you have to develop all these kinds of mathematical skills, and that is one of the underlying purposes of doing this preview of so many different tools.

Now consider the next case: h(x) is a composite function, a function of a function, h(x) = g(f(x)), which we denote h = g ∘ f: to x you apply f first and then g. In terms of a picture, there is Rᵐ, there is Rⁿ, there is Rᵈ; f takes you from Rᵐ to Rⁿ, g takes you from Rⁿ to Rᵈ, and h is in fact a bridge that goes directly from Rᵐ to Rᵈ. So h must be related to f and g in this way, and this tells you the relation between h, f, and g. What is the Jacobian of h? The Jacobian of h is simply the product of the Jacobian of g evaluated at f(x) and the Jacobian of f evaluated at x: J_h(x) = J_g(f(x)) J_f(x). So this is a kind of chain rule for Jacobians; this is again a fundamental result, and these are all important things that we will apply when we talk about algorithms.
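A quick illustrative sketch of this chain rule (an editor's addition; the maps f: R² → R³ and g: R³ → R² below are toys invented for the purpose): the product J_g(f(x)) J_f(x) is compared against a finite-difference Jacobian of the composite.

```python
import numpy as np

# Chain rule for Jacobians: if h = g o f, then J_h(x) = J_g(f(x)) J_f(x).
f  = lambda x: np.array([x[0]*x[1], np.sin(x[0]), x[1]**2])
Jf = lambda x: np.array([[x[1],         x[0]  ],
                         [np.cos(x[0]), 0.0   ],
                         [0.0,          2*x[1]]])
g  = lambda y: np.array([y[0] + y[1]*y[2], np.exp(y[0])])
Jg = lambda y: np.array([[1.0,           y[2], y[1]],
                         [np.exp(y[0]),  0.0,  0.0 ]])

def num_jac(F, x, h=1e-6):
    cols = [(F(x + h*e) - F(x - h*e)) / (2*h) for e in np.eye(x.size)]
    return np.column_stack(cols)

x = np.array([0.4, -1.2])
h_comp = lambda x: g(f(x))
print(np.allclose(Jg(f(x)) @ Jf(x), num_jac(h_comp, x)))   # True
```

Note that the Jacobian of g is evaluated at f(x), not at x; that is the one place where the multivariate chain rule is easy to get wrong.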
The next topic in multivariate calculus is the notion of a Taylor series expansion: the Taylor series for a scalar-valued function of a scalar, for a scalar-valued function of a vector, and for a vector-valued function of a vector; again, there are three layers of Taylor series which are important. So let x and z be real numbers, and consider f(x + z). What is the basic idea? The basic idea is as follows. If I have a domain and a point x, and I know f(x) and the derivatives of f, all at the point x; and if I am given a point very close by, x + z, with z small; how can I infer the value of the function at x + z, given the value of the function and its derivatives at x? That is the question Taylor answered. The value of the function at a neighbouring point is given by the value of the function at the point, plus the first derivative times z (you can think of z as a perturbation), plus one half the second derivative times the square of the perturbation, and so on, the kth derivative term involving the kth power of the perturbation:

f(x + z) = f(x) + f′(x) z + (1/2!) f″(x) z² + … + (1/k!) f⁽ᵏ⁾(x) zᵏ + …

This is the Taylor series; in a small neighbourhood, under appropriate conditions, this series converges; this is one of the fundamental theorems in calculus. It is an infinite series, and by truncating it at the kth degree term in z we get the kth order approximation. Normally we do not use k greater than 1 or 2; we talk about the first order approximation and the second order approximation.

So that is the general rule with respect to approximations: in analysis, either you compute exactly, and there are not too many things we can compute exactly in life, or you approximate; approximation is the order of the day. A Taylor series based approximation is often a very useful approximation computationally, so the Taylor series plays an absolutely fundamental role in computation.

Now we are going to consider the next class of functions: functions which are scalar-valued, where the input is a vector and the output is a scalar. In this case x is a vector and z is a vector, z ∈ Rⁿ. Here again I have a point x and a perturbation z, giving the neighbouring point x + z. If I know the value of the function at x, its gradient, and its Hessian (the Hessian is the second derivative matrix), I can approximate the value of the function at x + z by the relation

f(x + z) ≈ f(x) + ⟨∇f(x), z⟩ + ½ zᵀ∇²f(x) z.

This is called the second order approximation. We also know that for a scalar-valued function the gradient and the Jacobian are related by a transpose, J_f(x) = ∇f(x)ᵀ, so I can rewrite the first order term using the Jacobian, f(x) + J_f(x) z + …; it is this form we will use in our analysis. So this is the second order Taylor series of a scalar-valued function of a vector.

Now I am going to extend it further: the second order Taylor series for a vector-valued function of a vector; you can see there are many intricacies here. So what is f(x)? You stack f1, f2, …, fm; each component is independent. So if I am concerned with the second order expansion of f(x), what do we do? You compute the second order expansion of f1, of f2, …, of fm, and stack them together; that is it, very simple. Once you know how to compute the second order expansion for a scalar-valued function of a vector, you have conquered the Taylor series expansion for the vector-valued function of a vector, because a vector-valued function is simply a collection of m independent scalar functions; whatever you do for one does not affect the others, you do the same thing for each, expand each to second order, and collect the terms. With that in mind, the second order Taylor series expansion is given by

f(x + z) ≈ f(x) + J_f(x) z + ½ ( zᵀ∇²f1(x) z, zᵀ∇²f2(x) z, …, zᵀ∇²fm(x) z )ᵀ.

The first correction is the Jacobian term; the second order term is a little more complex. Look at it: f(x) is a vector, J_f(x) z is a matrix times a vector, which is another vector, and then we have one half of a vector, and this last vector is designed as follows: I take f1 and the Hessian of f1, giving the quadratic form zᵀ∇²f1(x) z; the Hessian of f2 gives the quadratic form zᵀ∇²f2(x) z; up to the Hessian of fm; and I stack them all together. So you get the Taylor series expansion simply by concatenating, putting together, the Taylor series expansions of each component. Please remember these middle matrices are all Hessians, and each entry is a quadratic form. So you can see the quadratic form occurs in many different ways; one of the natural places quadratic forms arise is in the second order Taylor series expansion of scalar-valued and vector-valued functions of a vector.
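The following sketch (illustrative choices of f, x, and direction, not from the lecture) shows the first and second order Taylor approximations in action: each time the perturbation shrinks by a factor of 10, the first order error falls by roughly 100 and the second order error by roughly 1000.

```python
import numpy as np

# Second-order Taylor approximation of a scalar-valued function of a
# vector: f(x+z) ~ f(x) + <grad f, z> + (1/2) z^T H z.  For f = e^x1 cos x2
# the gradient and Hessian are written out analytically below.
f      = lambda x: np.exp(x[0]) * np.cos(x[1])
grad_f = lambda x: np.array([ np.exp(x[0])*np.cos(x[1]),
                             -np.exp(x[0])*np.sin(x[1])])
hess_f = lambda x: np.array([[ np.exp(x[0])*np.cos(x[1]), -np.exp(x[0])*np.sin(x[1])],
                             [-np.exp(x[0])*np.sin(x[1]), -np.exp(x[0])*np.cos(x[1])]])

x = np.array([0.3, 0.7])
d = np.array([1.0, -0.5])
for t in [1e-1, 1e-2, 1e-3]:
    z = t * d
    first  = f(x) + grad_f(x) @ z
    second = first + 0.5 * z @ hess_f(x) @ z
    print(t, abs(f(x+z) - first), abs(f(x+z) - second))
```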
So that is where these things come into play. It is unfortunate that once you finish a BS you go into specialised disciplines and masters programs: electrical engineering, mechanical engineering, meteorology, oceanography and so on. When you take oceanography or meteorology, for example, they run you through lots of dynamics, which is very necessary. Many meteorology courses are very strong on models, and some meteorology programs are very strong on the collection of data, but there are hardly any programs where much emphasis is given to data assimilation. Models are necessary, data are necessary, but data assimilation is something beyond both: in my view data assimilation is an engineering discipline sitting inside the science of prediction. The aim of this course is to bring out the mathematical underpinnings of this engineering discipline called data assimilation. Why do I call it an engineering discipline? Engineering always concentrates on developing a product. What is the product? The forecast. The development of a forecast product is, in my view, a branch of engineering: the product is for public consumption, and I would like to create a good quality product by doing good quality engineering, which is called data assimilation.

The next concept is the notion of variation. In 3D-Var, the "Var" refers to variational; in 4D-Var, likewise. The notions of variational calculus, the first variation and the second variation, are fundamental to the development of many of the underlying algorithms, and I would like to highlight some of the fundamental properties of the first and second variation within the context of multivariate calculus. So let x be a vector and let δx be another vector with small components; we call this a perturbation vector, or a small increment. From x we move to x + δx; correspondingly f(x) moves to f(x + δx). When x changes, the value of f also changes: a change in x induces a change in f, and the change in f is called δf. So what is δf? δf is the resulting change in the value of f(x) induced by the increment δx in x. Think of the black box again: that is f; if you give it x it gives you f(x); if you give it x + δx it gives you f(x + δx). But I would like to express f(x + δx) in terms of f(x) itself; I would like to compute it, approximately, from quantities at x, and that is where the notion of the induced variation comes in. δf is the difference between the new value and the old value: δx is the input perturbation, and δf is called the induced perturbation.

Now, if f is a smooth function (smooth means it is differentiable up to order 2, that is, C²; you remember the notion of C² and Cᵏ functions), then I can compute the increment δf to second order accuracy. To do a Taylor series expansion of a function it must not only be continuous but also differentiable at least once; if a function is k times differentiable I can consider the kth order Taylor series expansion. So I am assuming, at a bare minimum, that the function is in C². If the function is in C², then f(x + δx), the actual value of the function at the new point, is approximately equal to the value at the old point plus two corrections: this is called the first order correction, and this is called the second order correction.
This first order correction is denoted by delta F the second order correction is denoted by delta which is within bracket 2 of F. So this is called the first variation this is called the second variation likewise I can consider the kth variation the larger the order of variation I can add more accurate the value becomes if you chop off at any level it is only an approximation that is why approximation symbols are important in here. What is delta F delta F is simply the inner product of the gradient with delta X what is the second derivative it is a quadratic form delta X transpose this looks like X transpose AX what is this this is the Hessian please remember this is the Hessian this is X transpose AX. So that is called the second variation term so I am I have given you the definition of first variation second variation the first variation is linear in delta X second variation is quadratic in delta X therefore when you are talking about very variational methods 3d var 40 var we are interested in computing the increment suffered by the output resulting from increments the input and that is where the notion of first variation second variation comes into being and these are essentially the so so the variational calculus within this setup is derived out of the fundamental concept that underlie Taylor series expansion here we are concerned with second up to second order in principle I can also go up to kth order. So given this now I am going to give you formulas for computing the first variation much like I gave you formulas for computing the derivatives so these are the tables of variational calculus just like tables of multivariate calculus tables of univariate calculus. So what are the differential coefficient of standard functions what are the differential coefficient of standard function multivariate calculus what are all the first variation formulas for our various cases and that is what I am trying to so these again help you to develop that skill to compute all these quantities which are fundamental to applications. So let us fx be look at this now X is in Rn sorry that is correct X is in Rn f is a vector of size m so f is m vector n I am sorry f is m vector x is n vector I am now going to be concerned with the first variation of delta f. So the first variation of f is simply the first variation f1 f2 fm f1 f2 fm are all independent compute the first variation f1 compute the first variation f2 stack them all together you get a you get the first variation of that first variation f1 is simply the inner product of gradient of f1 with delta x gradient of f2 with gradient of f1 with so you get the formula and this delta x is a common factor and the resulting one is a matrix it can be very easily verified this is the Jacobian times this so the first variation of a function is related to the matrix vector product the matrix being the Jacobian the vector being the increment. 
So δf = J_f(x) δx is a beautiful formula that we will use repeatedly in the derivations of 3D-Var and 4D-Var; that is the reason why they are called variational methods. Here are some examples. Notice that I am using the same examples throughout: to compute the gradient, to compute the Hessian, to compute the first variation, to compute the second variation. The reason I keep the same examples is so that you can see the interrelations between the gradients and the variations; I think it is the ability to knit together a picture of how these relations are built that is fundamental to a thorough understanding of what we are planning to achieve in this course. So if f(x) = aᵀx, a scalar-valued function of a vector, the first variation is simply the inner product of a with δx: δf = ⟨a, δx⟩. Again, this is the first formula in variational calculus. If f(x) = ½ xᵀAx, then δf = ⟨Ax, δx⟩; here A is symmetric, obviously, because it defines a quadratic form. Now the third example should be very familiar to all those who have done 3D-Var: z is the observation, h(x) is the model prediction, z − h(x) is the error, and f(x) = ½ ‖z − h(x)‖² is the sum of the squared errors. This is the function we are so often minimizing in least squares methods: given z and given h, we would like to minimize this with respect to x; this function is the cost function of the least squares problem. Now, to compute the solution of the least squares problem I need to compute the first variation and the gradient, and the formula is: the first variation of f is given by the inner product of −J_h(x)ᵀ(z − h(x)) with δx. Again, these are simple exercises; I have not gone into the derivation of each of these things. I am trying to hit the various important themes, and you have to fill in the blanks for a thorough understanding of all of them. The aim of a course like this is not to provide all the details (we would not be able to accomplish much if that were the case); the aim is to lay out the important concepts and to show how things are knitted together. Once you have developed the bigger picture, you can dig deeply into each of these topics, and I would like you to do that digging as you go through the modules.

With this we come to the end of this part. I have given several exercise problems; these exercises are essentially extensions of the concepts we have talked about. Look at the first problem as an example: it asks you to compute the first order and second order Taylor series, not for an arbitrary h but for a special h whose form is given in a specific way; these are concrete examples, and if you work them you will have that final "aha". Again: compute the first variations, and verify the different formulas we have talked about; doing this longhand, with pencil and paper, will help complete the picture. What is the standard reference for multivariate calculus? My favourite is a slightly older book, but it is still a classic in my mind: Apostol, Mathematical Analysis. I have a copy of it, and whenever I get into difficulty, which I do very often, I fall back on Apostol; it is a beautifully written book covering multivariate calculus. With that we conclude this discussion and overview of the concepts and properties from multivariate calculus that are often used in data assimilation. Thank you.
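(To round off the module, here is one last illustrative sketch, an editor's addition rather than part of the lecture: the first variation of the least-squares cost above, with a toy observation operator h: R² → R² invented for the purpose, checked against the actually induced change in the cost.)

```python
import numpy as np

# First variation of the least-squares cost f(x) = 1/2 ||z - h(x)||^2
# (a 3D-Var-style cost without the background term):
#   delta f = < -J_h(x)^T (z - h(x)), delta x >,
# so the gradient of the cost is -J_h(x)^T (z - h(x)).
h  = lambda x: np.array([x[0]**2, x[0]*x[1]])   # toy observation operator
Jh = lambda x: np.array([[2*x[0], 0.0 ],
                         [x[1],   x[0]]])
z  = np.array([1.0, 0.5])                       # "observation"

cost = lambda x: 0.5 * np.sum((z - h(x))**2)
grad = lambda x: -Jh(x).T @ (z - h(x))

x, t = np.array([0.8, 1.1]), 1e-7
dx = np.array([0.3, -0.4])
print("first variation:", grad(x) @ (t*dx))
print("induced change :", cost(x + t*dx) - cost(x))
```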