Criteria for deciding whether a point is a maximum or a minimum: how do you qualify a point to be a maximum or a minimum? The problem at hand was minimizing a scalar function from m dimensions to one dimension, φ: Rᵐ → R, and we wanted to identify which point in the space qualifies to be an optimum point. We said that the point at which the first derivative, the gradient, goes to zero is a stationary point. To be very precise, it could be a maximum, it could be a minimum, it could be a saddle point; we do not know. So to qualify it further we have to look at the second derivative, the Hessian matrix: its definiteness gives us a way to qualify the point as a minimum or a maximum. The Hessian should be strictly positive definite for the point to be a minimum, and strictly negative definite for it to be a maximum.

Let me summarize what we looked at. We are looking at necessary and sufficient conditions for optimality. We have this function φ(x) from Rᵐ to R, and we said that φ is twice differentiable; x belongs to Rᵐ, the m-dimensional space. If we want to check whether a point x = x̄ is a stationary point, then the gradient of φ with respect to x at x = x̄, the vector [∂φ/∂x₁, ∂φ/∂x₂, ..., ∂φ/∂xₘ]ᵀ, should be equal to 0. This necessary condition was proved by showing that if you take the gradient to be non-zero, it contradicts the fact that x̄ is a local minimum. We are talking only about a local minimum here; this is a condition to be satisfied at a local minimum, because all the arguments about a point being a local minimum or a local maximum, that is, a stationary point, were made using the Taylor series expansion.
The Taylor series expansion holds only locally; it does not hold everywhere globally. So this necessary condition is a local condition; just remember that. Given an objective function, the first thing to do is to compute its gradient and set it equal to zero. If you can find such a point, it qualifies as a stationary point, a point where the gradient is zero. Whether the stationary point further qualifies to be an optimum, that is, a maximum or a minimum, depends on the Hessian matrix, which by the way is also computed at x = x̄. If we approximate φ locally using the Taylor series,

φ(x̄ + Δx) ≈ φ(x̄) + ∇φ(x̄)ᵀ Δx + (1/2) Δxᵀ [∇²φ(x̄)] Δx.

We have said x̄ is a stationary point, so the gradient term is zero, and in a small neighbourhood of x̄ the difference φ(x̄ + Δx) − φ(x̄) is governed by the quadratic term Δxᵀ [∇²φ(x̄)] Δx; this term governs the local behaviour in the neighbourhood of x̄.

Now, if x̄ is a minimum, what should happen when I move away from it? The value should increase: this difference should always be positive, whichever direction I try to move away from x̄. That will be possible only when this Hessian matrix is positive definite. What is the definition of positive definiteness? xᵀAx > 0 for any non-zero x. So in any direction I try to move, this difference is always positive, and x̄ is a minimum. Whatever you can visualize in two or three dimensions, the same thing holds in m dimensions: think of being at the lowest point in a valley; whichever way you try to move, the height will only increase, it will not go further down, if you are at a minimum. That is the logic, simple logic; positive definite matrices are the mathematical quantification of that simple fact. Just remember that simple picture and correlate it with this mathematical analysis, then you will understand it better.

Now the other way round: if x̄ is a maximum, you are at a peak, and whichever way you move, the height will decrease. So if I make any movement Δx from x̄, this difference should be negative, and, very importantly, for any Δx. That will happen if this Hessian matrix is negative definite: then I am sure that whichever way I move, the difference is going to be negative, which means x̄ is a local peak. Of course, a global optimum must also satisfy this local condition; however, this analysis does not tell us how to reach the global optimum. It just tells us, given a point where the derivative is zero, how to qualify it locally as a minimum or a maximum.
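As a small illustration of this classification, here is a minimal sketch in MATLAB/Octave (the two-variable function and the stationary point are my own made-up example, not from the lecture) of checking the eigenvalues of the Hessian; the eigenvalue test itself is touched on again later in the lecture:

```matlab
% Classify a stationary point of phi(x) = x1^2 + 3*x2^2 (hypothetical example).
% The gradient is [2*x1; 6*x2], which vanishes at xbar = [0; 0].
xbar = [0; 0];
H = [2 0; 0 6];          % Hessian of phi (constant for this quadratic)
lam = eig(H);            % signs of eigenvalues decide definiteness
if all(lam > 0)
    disp('Hessian positive definite: xbar is a local minimum');
elseif all(lam < 0)
    disp('Hessian negative definite: xbar is a local maximum');
else
    disp('Indefinite Hessian: xbar is a saddle point');
end
```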
Of course, this condition can be checked only for an objective function which is twice differentiable; you cannot check it if your objective function is not differentiable or not twice differentiable, so that is very, very critical. Now, what if the Hessian is neither positive definite nor negative definite? Then the point is neither a minimum nor a maximum. In terms of a hill, it could be something like a step where the gradient becomes zero, or what you know as a saddle point, or rather the multi-dimensional extension of a saddle point. You may have a minimum with respect to one variable but not with respect to the other variables; there need not be one unique point where the gradient goes to zero. A saddle point is qualified by looking at the property of this same local matrix. The nice thing is that the definiteness of this particular matrix allows us to find out whether a particular point is a maximum, a minimum, or a saddle point.

Now let us come back to our problem of fitting Cp versus temperature. I am going to generalize this set of equations and then solve it. We had the values Cp₁, Cp₂, ..., Cpₙ, a large number of values collected at different temperatures T₁, T₂, ..., Tₙ. I am just rewriting those equations in matrix form; earlier I had put them one below the other as single equations, and I am now writing them as one matrix equation, which will help me solve the problem very easily. So the rows of the coefficient matrix are [1, T₁, T₁²], [1, T₂, T₂²], and so on; just have a look at those earlier equations, I am revisiting them in a different form. Everyone with me on this? How many unknowns? n + 3, the three coefficients plus the n errors, while the number of equations we have right now is only n. So the first thing we need to do is to define an objective function, and that objective function should be twice differentiable. Then we should minimize it and check whether the Hessian is a positive definite or negative definite matrix; then we will be able to qualify whether we have reached a minimum or not.

Now, I am going to call the vector of coefficients x, and the vector of errors capital E. Here x is three-dimensional, but in general x will be m-dimensional: suppose you fit a cubic equation, there will be 4 parameters, and so on; a higher-order polynomial gives you more parameters. I am going to call the coefficient matrix A. Let me make one slight change in notation and call the parameter vector θ instead of x. I am calling it θ because, if you see the literature on parameter estimation, a parameter vector is typically called θ; nothing particular about this otherwise. Also, in the notes I have developed everything with θ as the notation, so when you read the notes it will be easier for you to follow.
This particular vector of measured values I am going to call capital U. So we have U, A, θ, and E, and my large number of linear equations are these. The entries T₁, T₁², T₂, T₂², ... are known: I have taken measurements of temperature, so T₁, T₂, ..., Tₙ are known and I can compute their squares; these are just columns of a matrix filled with numbers. So this is a linear matrix equation, and my system reduces to

U = Aθ + E,

where E is the vector of errors. Now I need to define an objective function. This objective function should be differentiable; I should take its first derivative and set it equal to zero, that is, take the derivative with respect to a, b, c. I am now moving from this specific problem to the general problem where θ could be m-dimensional, and A will be an n × m matrix; here m is 3, so A is n × 3. It is a tall matrix: n could be 100 while you have only 3 parameters to be estimated.

So I am going to define φ(θ) = EᵀE, which is nothing but e₁² + e₂² + ... + eₙ², the sum of the squares of the errors. What are e₁ to eₙ? Errors in modelling: we are developing a correlation model of Cp versus temperature, and it is not an exact fit, it is an approximate correlation; e₁ to eₙ are the errors. I want to develop a model that minimizes the sum of the squares of the errors. This is a simple idea which you have probably heard of or done in your undergraduate studies. We usually fit a line; I do not know whether in undergraduate studies you are taught to find the best, least squares fit, but many times in experiments we fit a line just by observing the trend visually, without doing all this business; if you ask Excel or MATLAB, though, they will do this for you.

So what is this objective? φ(θ) = (U − Aθ)ᵀ(U − Aθ), since E = U − Aθ. Is this a scalar function? Yes: this is simply ‖E‖₂², the sum of the squares of the elements of E, and a norm is a scalar function. It is also a non-negative function; well, an objective function need not always be non-negative, but in this case it turns out to be. The smallest value EᵀE can take is 0, which would mean all the errors are zero; whether that is possible for this particular problem is a different story, but the smallest value it can take is 0.

So what I want to find is ∂φ/∂θ, the vector [∂φ/∂θ₁, ..., ∂φ/∂θₘ]ᵀ, set equal to the zero vector. Now I need to know how to differentiate a function of a vector which maps into a scalar. We know how to differentiate with respect to a scalar value, but how do I differentiate with respect to a vector? I am just going to state the rules; you can derive them, they are not so difficult, you just need to be a little patient and do your algebra correctly. So I am digressing a little bit, because I want to differentiate this EᵀE with respect to θ and set it equal to zero.
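To make the setup concrete, here is a minimal sketch in MATLAB/Octave (the Cp–T data points are hypothetical, invented purely for illustration) of building the tall matrix A and evaluating the sum-of-squares objective φ(θ) = EᵀE at a trial θ:

```matlab
% Hypothetical Cp vs T data (n = 5 measurements, made up for illustration)
T  = [300; 350; 400; 450; 500];           % temperatures, known
U  = [29.1; 29.8; 30.6; 31.5; 32.4];      % measured Cp values
A  = [ones(size(T)), T, T.^2];            % rows [1, Ti, Ti^2]; n x 3, tall
theta = [25; 0.01; 1e-5];                 % a trial parameter vector [a; b; c]
E   = U - A*theta;                        % vector of modelling errors
phi = E'*E;                               % objective: sum of squared errors
```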
Setting that gradient equal to zero will give me m additional equations: we already have n equations and we need m more, and here m happens to be 3. Those equations will be obtained when I do this differentiation, but before that I should know how to differentiate a scalar function of a vector. Let us say I have a function f = xᵀBy, where B is a matrix and x and y are vectors; f is a scalar function, you can see that. How do I differentiate? The rule is ∂f/∂x = By, differentiating this scalar function with respect to the first argument x, and ∂f/∂y = Bᵀx. A straightforward corollary of this is that ∂/∂x (xᵀBx) = 2Bx if B is a symmetric matrix. Well, if B is not symmetric, what will it be? (B + Bᵀ)x.

So we can use these rules to differentiate our function with respect to θ and come up with a solution. Can you expand this and tell me the result? Expanding the four terms,

φ = UᵀU − UᵀAθ − θᵀAᵀU + θᵀAᵀAθ;

U is a vector, Aθ is a vector, and I am just expanding simple matrix multiplications. Now, U is a vector of known values, a constant vector, so in ∂φ/∂θ the first term contributes 0. What about the other terms? Apply the rules correctly, and

∂φ/∂θ = −2AᵀU + 2AᵀAθ.

I do not know whether you all agree with me on why I am writing 2 times here. This AᵀA is a special matrix. What kind of matrix is it, and what is its size? It will be m × m; in this case a 3 × 3 matrix. Can I say that AᵀA is always a symmetric matrix? Take the transpose of AᵀA and you get back AᵀA: (AᵀA)ᵀ = AᵀA. So whatever this matrix is, even if it comes from a thousand equations in 3 variables and is filled with all kinds of strange numbers, AᵀA will always be a symmetric matrix, and that is why I can write 2AᵀAθ. In fact, we will go on to show that if the columns of A are linearly independent, then this matrix is also positive definite. This very nice matrix naturally appears here; we do not have to make any effort, it just pops out of the equations.
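These vector-differentiation rules are easy to check numerically. Here is a small sketch in MATLAB/Octave (my own illustration, not from the lecture) comparing the formula ∂/∂x (xᵀBx) = (B + Bᵀ)x against a finite-difference approximation of the gradient:

```matlab
% Check d/dx (x'*B*x) = (B + B')*x numerically for a non-symmetric B
B = [1 2; 0 3];                       % deliberately non-symmetric
x = [0.5; -1.0];
g_formula = (B + B')*x;               % analytical gradient from the rule
h = 1e-6;  g_fd = zeros(2,1);
for k = 1:2
    e = zeros(2,1);  e(k) = h;        % perturb the k-th component
    g_fd(k) = ((x+e)'*B*(x+e) - (x-e)'*B*(x-e)) / (2*h);
end
disp([g_formula, g_fd])               % the two columns should match
```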
So now the task is almost done: what do I get if I set this gradient equal to zero? How do I compute θ? Setting the gradient equal to zero gives us the equation

AᵀA θ = AᵀU.

What is the size of AᵀA? It is m × m, so if m is 3, this is a 3 × 3 matrix. And what is the dimension of AᵀU? It is (m × n)(n × 1) = m × 1. So finally you are getting m equations, the 3 equations we wanted in this particular case, in how many unknowns? m unknowns, since θ is m × 1. I can solve this problem. Now, if the columns of A are linearly independent (just think about what I am saying, I am not going to prove it here), then this matrix is invertible; not only that, it will always be positive definite.

Before we proceed further and I show why it is positive definite, what is the second derivative of φ? It is ∂²φ/∂θ² = 2AᵀA. So if this turns out to be a positive definite matrix, then we are done: we have reached the minimum. In fact, since this is a linear problem, you can show that this minimum is not just local; for a linear-in-parameters model we reach the global minimum.

Now my task is to show that AᵀA is a symmetric positive definite matrix, which is very straightforward. Assume that the columns of A are linearly independent. What does that mean? When will you get Ax = 0? If the columns are linearly independent, which vector x will give you the zero vector? Only the zero vector: saying Ax = 0 is the same as saying x = 0. Now, what is the definition of positive definiteness? xᵀ(AᵀA)x should be greater than 0 for x ≠ 0. I am just going to write xᵀ(AᵀA)x as (Ax)ᵀ(Ax), clubbing x with A. And what is (Ax)ᵀ(Ax)? It is ‖Ax‖₂², the square of the norm. When will this be 0? There are two situations: if A had linearly dependent columns, there might be a non-zero x giving Ax = 0; but we have said the columns of A are linearly independent, so for any non-zero x, the vector Ax is non-zero, and the square of its norm is always a positive number. So whatever non-zero x vector you give me, xᵀ(AᵀA)x > 0: AᵀA is a symmetric positive definite matrix, that very nice matrix which we keep studying in linear algebra; here it just pops out naturally as part of the development. I have shown that for any x ≠ 0 you always get a positive number, so this satisfies the definition of positive definiteness.
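Putting the pieces together, here is a minimal sketch in MATLAB/Octave (reusing the hypothetical Cp–T data from the earlier sketch) that forms the normal equations AᵀAθ = AᵀU, checks positive definiteness through the eigenvalues, and solves for θ:

```matlab
% Least squares fit of Cp = a + b*T + c*T^2 (hypothetical data as before)
T = [300; 350; 400; 450; 500];
U = [29.1; 29.8; 30.6; 31.5; 32.4];
A = [ones(size(T)), T, T.^2];

M = A'*A;                 % m x m, symmetric by construction
disp(eig(M))              % all eigenvalues positive => positive definite
theta = M \ (A'*U);       % solve the normal equations A'A*theta = A'*U
% equivalently: theta = A\U, which MATLAB solves in the least squares sense
```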
There are also algebraic conditions, like all eigenvalues being positive and so on; let us not worry about those right now, we will visit them a little later. Here, using the basic definition of positive definiteness, I am showing that AᵀA is always positive definite, which means not only that I have reached a stationary point, I have reached the minimum; and since this is a linear model, there is only one minimum. If I had to do all this using the one-norm of E or the infinity-norm of E, it would not be possible: the one-norm is not differentiable, and neither is the infinity-norm, so I cannot use this nice theory. There are problems if I use some other norms; the two-norm is very, very nice, you can differentiate and get this result.

Now, I am going to relate all this to projections. I will show that this is nothing but projecting a vector onto a subspace, the geometric interpretation of what is happening here; that is what I am going to do next. Why are we able to relate it to projections? Because, as I said, in an inner product space, in a Hilbert space, with an inner product a norm comes free, you get the two-norm, and you get angles: you can talk about orthogonality, you can talk about projections, you can generalize the ideas you know from three-dimensional school geometry. So AᵀA is a symmetric positive definite matrix, we have reached the global minimum, and this is the solution.

Any such problem I can solve this way. Now, do I have to stick to polynomials? I really do not. I can talk about any function approximation. I looked at Cp as a function of temperature; that was one specific example. For any function which is linear in the parameters, I can estimate the parameters by this approach. Let us say I am approximating a function u(z); I know its values at z₁, z₂, ..., zₙ, and these values are u₁, u₂, ..., uₙ: this is my data set. I do not want to fit a polynomial; let us say I want to fit sines and cosines: you are perfectly allowed to do that. So I want to fit a function, say û(z), which is θ₁f₁(z) + θ₂f₂(z) + ..., and of course there will be an error here; û is my approximation. (This need not be a function of only one variable; it could be a function of multiple variables, but let us take first the situation of a function of one variable.) A polynomial was one type of approximation: I had chosen f₁ to be 1, f₂ to be z, the next ones z², z³, and so on. I could instead have chosen the first to be sin z, the second cos z, the third sin 2z, the fourth cos 2z, and so on; or I could have chosen some other functions, a general polynomial, a shifted polynomial; it is up to me what to choose. I can apply the same theory, which is very nice, because I know these functions: I have chosen them, and I can evaluate them at any particular point.
Instead of being abstract, let me put it in a concrete form:

u = θ₁ sin z + θ₂ cos z + θ₃ sin 2z + θ₄ cos 2z + e.

Let us say this is my approximation function. What do I do? The same thing which I did earlier. I write down u₁, u₂, ..., uₙ, the values which I know at n different points; the first row of the matrix is [sin z₁, cos z₁, sin 2z₁, cos 2z₁], the second row is [sin z₂, cos z₂, sin 2z₂, cos 2z₂], and finally [sin zₙ, cos zₙ, sin 2zₙ, cos 2zₙ]. So I write this tall matrix. How many parameters? Four: θ₁, θ₂, θ₃, θ₄; plus I have the errors e₁ to eₙ. It is the same equation. The basis functions here could be any complicated functions; mind you, this particular model is still linear in the parameters. We formally defined a linear function some time back: f(αx + βy) = αf(x) + βf(y); you should check that this holds exactly here, in the parameters. Any linear-in-parameters model you can handle this way.

So what do I have here? This is my U vector, this is my A matrix, this is my θ, and this is my E. How do I get the least squares estimate of θ? We write it θ̂_LS, theta hat least squares; it is an estimate of θ. There are different ways of getting an estimate: if you were to use the one-norm instead of the two-norm, you would get a different θ; so I am calling this the θ obtained through least squares of errors. That is why

θ̂_LS = (AᵀA)⁻¹ AᵀU.

We have shown that AᵀA is a positive definite symmetric matrix; a positive definite matrix has no zero eigenvalue, so it must be invertible, and I can write the inverse. In fact, the matrix (AᵀA)⁻¹Aᵀ is called the pseudo-inverse of A. Why pseudo-inverse? A is a non-square matrix, n × m; I cannot invert it in the normal sense of an n × n square matrix. But when do you call something an inverse? You multiply the matrix by its inverse and you should get the identity matrix. Take (AᵀA)⁻¹Aᵀ and post-multiply it (not pre-multiply) by A: what will happen? You get the identity matrix. That is why (AᵀA)⁻¹Aᵀ is called the pseudo-inverse of A. In MATLAB there is a function called pinv, the pseudo-inverse; inv can be used for square matrices, pinv can be used for non-square matrices. Doing this in MATLAB takes two minutes: formulate the matrix, and pinv(A)*U gives you the least squares θ in a fraction of a second. But you should know the theory behind this; that is very important.

So, is this clear? I do not have to have polynomials; I can have any complicated functions. I will give you an example from chemical engineering. And sometimes you may not begin with a linear-in-parameters equation, but you might be able to come to one once you do a transformation. Say I want to fit data: I am carrying out some reaction and I want to estimate the kinetics of the reaction. So I have the rate expression

−r_A = k₀ e^(−E/RT) C_Aⁿ.

This is my model.
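Here is a minimal sketch in MATLAB/Octave of fitting the sine–cosine model above with pinv (the sample points and the synthetic noisy data are invented for illustration):

```matlab
% Fit u(z) ~ th1*sin(z) + th2*cos(z) + th3*sin(2z) + th4*cos(2z)
z = linspace(0, 2*pi, 50)';                    % hypothetical sample points
u = 2*sin(z) - 0.5*cos(2*z) + 0.1*randn(50,1); % synthetic noisy data
A = [sin(z), cos(z), sin(2*z), cos(2*z)];      % n x 4 basis-function matrix
theta_ls = pinv(A) * u;                        % least squares estimate
```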
Actually, I should write this model with an error term. We do not usually write it, but this will not be an exact model; we are proposing it, it is an approximate model, so strictly speaking we should write an error, say ε (do not confuse it with the e here, which is the exponential). This is my model and I want to get least squares estimates. What are the unknown parameters? k₀, E (R is known), and n. What is the data that you have? C_A and T, and the rate. I can do a log transformation of this model:

ln(−r_A) = ln k₀ − E/(RT) + n ln C_A,

which I can write as ln k₀ + (−E/R)(1/T) + n ln C_A. I have collected data for the rate at different concentrations and different temperatures, and I want a least squares estimate. Is this a linear-in-parameters model? What are the parameters? ln k₀, E/R (or E itself, since you know R and can take it to the other side), and n.

So what will I get in the A matrix here? The first column is all ones. The second column is −1/T₁, −1/T₂, ..., −1/Tₙ; this column is coming because of the variable 1/T. The third column is ln C_A1, ln C_A2, ..., ln C_An; actually I should write ln, natural log. The parameter vector θ is [ln k₀, E/R, n]ᵀ. And what should be on the left-hand side? ln(−r_A1), ln(−r_A2), ..., ln(−r_An); remember, the minus sign here is a notation for the rate of consumption, we are not taking the log of a negative number. So this is the equation that you get. The U vector is known, because you have estimates of the rates at different temperatures and concentrations; the A matrix is known, because the temperatures and concentrations are known to you; and θ is what I want to estimate.

What is the least squares solution? θ̂ = (AᵀA)⁻¹AᵀU. Remember this formula, it is very, very important: (AᵀA)⁻¹Aᵀ is called the pseudo-inverse of A; AᵀA is always a square matrix, and if the columns of A are linearly independent, it is always invertible. And what you get here, in the least squares sense, is the minimum: there cannot be any other value of θ which will give you a smaller sum of the squares of the errors; for a linear-in-parameters model, you cannot get a model which gives a smaller value than this optimum.

So this is my least squares problem, and it is used in many ways; in the assignment I will give you other problems from chemical engineering where you do least squares fitting to estimate the parameters. I will call this a linearization step: you do a transformation and convert the model. Originally the model is not linear in the parameters: E, n, and k₀ multiply each other inside nonlinear functions, and the fundamental definition, that f(αx + βy) is the same as αf(x) + βf(y), will not be satisfied for this function; but it will be satisfied for the transformed function.
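A minimal sketch of this linearization-plus-least-squares step in MATLAB/Octave (the temperature, concentration, and rate data below are invented purely for illustration):

```matlab
% Estimate ln(k0), E/R and n from ln(-rA) = ln(k0) - (E/R)*(1/T) + n*ln(CA)
T  = [300; 320; 340; 360; 380];             % temperatures, K (hypothetical)
CA = [1.0; 0.8; 0.6; 0.5; 0.4];             % concentrations (hypothetical)
rA = [0.010; 0.021; 0.038; 0.060; 0.085];   % measured -rA values (hypothetical)
A  = [ones(size(T)), -1./T, log(CA)];       % columns: 1, -1/T, ln(CA)
U  = log(rA);                               % left-hand side: ln(-rA)
theta = (A'*A) \ (A'*U);                    % theta = [ln(k0); E/R; n]
k0 = exp(theta(1));  EoverR = theta(2);  n = theta(3);
```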
Now, it is not always possible to get such a transformation, but let me show you one more model in chemical engineering where you can do it and estimate the parameters. One more classic model, correct me if I am wrong, is the heat transfer correlation

Nu = α₁ Re^α₂ Pr^α₃.

This is not a linear-in-parameters model. What are the known quantities here? I would have data for the Nusselt number, the Reynolds number, and the Prandtl number for different flows, and I want to estimate α₁, α₂, α₃. It is not linear in the parameters, but a simple log transformation gives

ln Nu = ln α₁ + α₂ ln Re + α₃ ln Pr,

and this transformed model is linear in the parameters. I can use the least squares estimation technique to estimate ln α₁, α₂, α₃; if I get ln α₁, I can get α₁, not an issue. So this is a simple transformation. What is the mass transfer correlation? The Sherwood number: Sh equals some β₁ times the Reynolds number raised to one power times the Schmidt number raised to another; well, for me it is almost 20 years now, you are fresh, so you can correct me, but one of those exponents is something like 0.63. You may also have a viscosity correction, (μ/μₛ) raised to something, as a fourth parameter. So you may have four parameters, but once you do this transformation it is a linear-in-parameters model and you can use least squares parameter estimation: the columns will be different, but the same idea works, and you get least squares estimates of α₁, α₂, α₃, α₄ just by using (AᵀA)⁻¹Aᵀ.

So polynomial fitting is not the only thing. We started with polynomial fitting because we wanted to do something with discretizing a boundary value problem or a partial differential equation, but least squares estimation goes much, much beyond all this. It was discovered by Gauss, the famous mathematician. In the history of mathematics, Gauss is called the prince of mathematics, and the work he did has grown into major fields: least squares estimation, Gaussian densities, Gaussian quadrature, and so many other things; I do not know what we would be doing if Gauss had not discovered them. Gauss was a child prodigy; he discovered many things at a very early age. He started looking at least squares estimation, I think, because of the problem of fitting an orbit to data obtained from the motion of planets around the sun, the problem being discussed at that time. What is the best fit? It does not happen to be a circle, it is actually an ellipse, and because the data has errors, you have to do a best fit to get the correct model. So what started about 150 or 200 years back now spans everything: these tools are used in image reconstruction, they are used in soft sensing, just name it; least squares estimation, the work by Gauss, forms the foundation. So we are here today because of the prince of mathematics, Gauss.
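Going back to the Nusselt correlation above, the same log-transform fit in a short MATLAB/Octave sketch (the Re, Pr, and Nu values are invented for illustration):

```matlab
% Estimate a1, a2, a3 in Nu = a1 * Re^a2 * Pr^a3 via log transformation
Re = [1e4; 2e4; 5e4; 1e5; 2e5];          % hypothetical Reynolds numbers
Pr = [0.7; 0.7; 2.0; 2.0; 5.0];          % hypothetical Prandtl numbers
Nu = [32; 55; 175; 305; 760];            % hypothetical measured Nusselt numbers
A  = [ones(size(Re)), log(Re), log(Pr)]; % columns: 1, ln(Re), ln(Pr)
theta = pinv(A) * log(Nu);               % theta = [ln(a1); a2; a3]
a1 = exp(theta(1));  a2 = theta(2);  a3 = theta(3);
```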