Before the mid-sem we were looking at problem discretization, and towards the end we saw how to discretize a boundary value problem using the method of least squares; then we moved on to the Galerkin method, and I suppose you learnt more about the Galerkin method while preparing for the mid-sem. Now, before I move on to numerical tools, I want to finish off one remaining piece of least square estimation. We talked about model building using the method of least squares, and I gave several examples along the way, though I did not formally introduce model building. I am going to do that a little bit today, and then I am going to discuss a method called the Gauss-Newton method. The Gauss-Newton method is a cross between Newton's method, also called the Newton-Raphson method, and least square estimation. This method is used when the parameters in the model appear nonlinearly and there is no transformation by which you can linearize the model. If you have such a model you cannot use linear least squares; you have to use something else, and that is the Gauss-Newton method. So today I am going to spend a little time on model building first and then move on to the Gauss-Newton method. Function approximation in engineering is very common, and that is what I will concentrate on. I am not restricting myself to polynomial approximations; I want to talk about function approximation in general. Here we are talking about a model of the form y = f(x, θ) + e, where f is some function, y is typically a scalar, x in general is an n-dimensional vector, θ is an m-dimensional parameter vector, and e is the approximation error.
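To make this general form concrete, here is a minimal sketch; the particular f, the data, and all parameter values below are entirely hypothetical illustrations, not anything from the lecture:

```python
import numpy as np

# A minimal sketch of the general model y = f(x, theta) + e.
# The particular f, the data, and the parameter values are all hypothetical.
def f(x, theta):
    # an example two-parameter model: theta[0] * exp(theta[1] * x)
    return theta[0] * np.exp(theta[1] * x)

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])            # independent-variable data
theta_true = np.array([1.5, -0.8])
noise = np.array([0.01, -0.02, 0.0, 0.015, -0.01])  # made-up approximation error
y = f(x, theta_true) + noise                        # "measured" dependent variable

theta_guess = np.array([1.4, -0.7])
e = y - f(x, theta_guess)    # error left over for this particular guess of theta
print(e)
```

For the true parameters the leftover error reduces to the added noise; for any other θ it also picks up the model mismatch.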
Well, when I am writing this model I am already making some assumptions, and to begin with I want to free you from one of them: let us not worry for now about exactly where the error appears; we will look at the error more systematically later. So y is the dependent variable, x is a vector of independent variables, θ holds the parameters in the model, and I want to fit a model of this form. I have given you examples. The first example was the heat capacity correlation cp = a + bT + cT² + dT³; in this example y corresponds to cp and x corresponds to T. The second example was a Nusselt number correlation of the form Nu = a Pr^α Re^β (μ/μs)^γ; in this case y corresponds to the Nusselt number, while x corresponds to the three variables Prandtl number, Reynolds number, and μ/μs. You have similar correlations for other heat transfer groups, and for the Sherwood number in mass transfer.
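The first example can be written as a stacked linear least squares problem: one equation per data point, four unknowns. A minimal sketch, where the "true" coefficients used to manufacture the data are purely illustrative:

```python
import numpy as np

# Sketch: fitting cp = a + b*T + c*T^2 + d*T^3 by linear least squares.
# The coefficient values and the temperature range are hypothetical.
T = np.linspace(3.0, 6.0, 40)          # scaled temperature (e.g. T/100) for conditioning
a, b, c, d = 25.0, 4.0, -0.5, 0.03     # assumed "true" coefficients
cp = a + b*T + c*T**2 + d*T**3         # noise-free data for illustration

# one row of [1, T, T^2, T^3] per data point: many equations, four unknowns
A = np.column_stack([np.ones_like(T), T, T**2, T**3])
theta, *_ = np.linalg.lstsq(A, cp, rcond=None)
print(theta)                           # recovers (a, b, c, d)
```

The same machinery fits the Nusselt correlation once it has been log-transformed into a linear-in-parameter form.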
Now, the nice thing about these two particular models is the following. Models in engineering, and I am talking about function approximations here, use different kinds of functions: here you are using a power-law function for approximation, here a polynomial function. The polynomial is the simplest one, but you do not always use polynomials; all kinds of other functions are used. The problem is: given data of cp and temperature you want to estimate a, b, c, d, or, in the second case, given data of Nusselt number, Prandtl number, Reynolds number and μ/μs you want to estimate a, α, β, γ. What was nice about these two models was that you could transform them to a linear-in-parameter form. The cp model is already linear in parameters. When you say something is linear, you have to remember the basic definition: a function g(θ) is linear if g(αθ⁽¹⁾ + βθ⁽²⁾) = αg(θ⁽¹⁾) + βg(θ⁽²⁾). When I say the cp model is linear in parameters, the parameter vector is θ = (a, b, c, d), and this definition holds with respect to θ: if I take two parameter vectors (a, b, c, d) and (a′, b′, c′, d′), the model evaluated at a combination of the two can be written as the same combination of the two functions, which means a, b, c, d appear in a linear manner. If you try to apply this definition to the Nusselt correlation it will not hold; the parameters α, β, γ obviously do not appear in a linear manner, so it is not a linear-in-parameter function.

But the saving grace there was that I could do a transformation which made the model look linear in parameters: log Nu = log a + α log Pr + β log Re + γ log(μ/μs). With respect to this transformed model, y corresponds to log Nu and θ corresponds to (log a, α, β, γ); with these parameters and the transformed model we had a linear-in-parameter model and could use linear least squares. But the problem of estimating model parameters appears in many engineering problems where the model is not transformable. Let me give you simple examples from what you know in chemical engineering: two examples where you cannot do any linearizing transformation.

One of them is a friction factor correlation. Remember, all these are correlations, function approximations; many times they are not derived from fundamental physics alone. There is some fundamental physics, of course, when you write the dimensionless groups, but the coefficients α, β, γ are found from data, so I would call them semi-empirical, or grey-box, models: you arrive at the dimensionless groups from physics, but the correlation coefficients come from data. The friction factor correlation has the implicit form 1/√f = α log(Re √f) + β, and this is a very funny thing, because I want to estimate α and β: if I call y = 1/√f and x = Re, my θ is (α, β), and I want to fit this correlation between friction factor and Reynolds number. Given data of friction factor versus Reynolds number, there is no linearizing transformation for this model, because y appears on both sides: f sits inside the logarithm on the right-hand side as well. So you cannot do this in a simple way; you have to think of something else.

The other correlation you are very familiar with is the Redlich-Kwong equation of state: P = RT/(V − b) − a/(√T V(V + b)), and I hope I have it right. You do not know a and b; you are given PVT data for a particular substance (R of course you know), and you want to estimate a and b from that data. This is a classic problem in thermodynamics: find a and b by least squares. Now mind you, when you write that the left-hand side equals the right-hand side, these are all approximate correlations; in reality the two sides are not equal. Strictly, the correct way of writing is that the estimate of P equals the right-hand side; it is not the true P. Nevertheless, while using such models in engineering we do not usually get into these niceties of statistics; we assume the correlation is exact, do not worry about the errors in the approximation, and take care of them by some kind of overdesign, so that the modelling errors are somehow covered. When you use a heat transfer coefficient estimated from a Nusselt number correlation, or a mass transfer coefficient, these are approximations, not exact values, and that is the reason we do some overdesign to get over the problem. So now the problem is: how do
you estimate the model parameters? Well, I would still like to keep my paradigm of least square estimation and get a least square estimate of θ nevertheless. A third example is the Antoine correlation, log P_sat = A − B/(T + C). The Antoine correlation again is not exact: for a particular substance you go to Perry's handbook and look up A, B, C; these are fitted coefficients, not a true relationship. It is a correlation, a function approximation that correlates vapor pressure with temperature. So in all these models you have a problem.

I would classify the models into two classes. One is the general case y = f(x, θ); if θ appears in the model in a nonlinear way, this is the only way I can write it. If, by some transformation or otherwise, θ appears linearly, then you can write y = θ₁f₁(x) + θ₂f₂(x) + … + θ_m f_m(x). The second form is the linear-in-parameter model; the first is the general nonlinear-in-parameter model. When you had a linear-in-parameter model you had a nice way of obtaining the least square estimates of θ₁ to θ_m. What was the method? For a linear-in-parameter model, from the collected data y₁, y₂, … up to y_N (say we have collected N data points) we could write one linear equation per data point: a large number of equations in a small number of parameters, and we could solve that system. In doing so, we assigned the error in the approximation entirely to an error in the equation.

But what if your x itself is wrong? I am collecting temperature measurements from a plant, and my temperature measurements have measurement errors: when I collect temperature measurements I never get the exact value of what is there in the system; I always pick up an error, for a variety of reasons. If an electronic system is collecting the data, noise can enter because of some fluctuation in the circuit, a loose contact, and so on. The temperature inside my system may be 100 degrees and I might get 100.1 or 99.9. I know the boiling point of water: dipping a thermometer in boiling water in Bombay, it should read 100, but I never get 100; I get 100.5, 100.2. In fact, if you log the data on a computer you will see it fluctuating around 100; you never get exactly 100, even though you expect it. So when I am developing a model of this form, where my x is T, the temperature itself could be wrong; likewise there could be errors in the measurements of P and V.

So there are actually three kinds of errors, and in this one lecture I cannot cover how to deal with all of them; it is a very complex business, and separating these errors to get a correct model would probably require almost half a course. The simplest one is the so-called equation error. The three possible sources of error are these. First, your y measurements could be wrong. Second, your x, the inputs to the model, could be wrongly measured: in PVT data the pressure sensor can give slightly wrong data, the volume estimates could be wrong, the temperature measurements could be wrong. You see the trouble: when you propose a model, you are saying that the true pressure is related to the true volume and the true temperature; you say the PVT relationship holds for the true values.
But when you measure there are errors, you do not know the truth, and each of your 100 measurements has an error. How do you then arrive at the correct model? In terms of statistics it is not an easy problem to solve. Those are the two possible sources of error in the measurements themselves. The third is that the expression you are fitting is itself an approximation. For cp versus temperature I may choose to fit a line, a parabola, or a cubic, depending on the range I am covering: if the temperature range is large I may want a cubic, while for a small range a linear equation might do. So there is an error committed when you propose a model, because these are approximate correlations, and there are errors committed in the measurements.

For now I am going to blame everything on the equation error; I am not going to worry too much about the errors in the measurements. Let us say we try to remove the measurement errors by taking repeated measurements and averaging, and we do some kind of compensation. Let us assume this for the time being, because the modelling problem when you blame everything on the equation error is much easier than the modelling problem when you want to consider all three errors. In the notes I have briefly discussed what is done when all three errors are present, but solving that problem can become fairly complex.

Now my aim is this: if I have a nonlinear-in-parameter model, can I somehow extend what we could do in the linear case with the method of least squares? How did the method of least squares go? I call the vector of measurements capital Y, this matrix is my A matrix, this is my θ vector, and this is my error vector. Then we had a very nice way of generating the least square estimates; we looked at its interpretation from the geometric viewpoint and derived the result from the algebra: the minimum over θ of eᵀe can be found analytically, and we got θ_LS = (AᵀA)⁻¹AᵀY. Everyone with me on this? We got these least square estimates because that particular model was linear in parameters, for the special case where all the errors were blamed on the equation; such errors are called equation errors. Now, for the general model, which is nonlinear in parameters, can I do something so that I can still use this kind of expression to arrive at a solution? I am removing from consideration all the models that are transformable (models like the Nusselt correlation are handled by linearizing transformations) and talking about models like the PVT relationship, or the implicit friction factor correlation, which you cannot transform; there is no way to do it. How do you now come up with a way of using this (AᵀA)⁻¹AᵀY business and still estimate the parameters θ? The method I am going to derive, called the Gauss-Newton method, plays that trick.

Well, as I told you, there are very limited options when it comes to approximating something; we have looked at two options previously. One option was interpolation, the other was Taylor series approximation; in fact, the Taylor series approximation gave rise to Newton's method, if you remember. I am going to use the same trick: use a Taylor series approximation to transform this model, which is nonlinear in parameters, into a locally linear-in-parameter model, form an iteration scheme that lets me use something like the linear least squares formula, and then use it for parameter estimation. So let us see how we do this.

I have this nonlinear-in-parameter model, and first of all I have collected data: the dependent variable data y₁, y₂, y₃, … up to y_N, and correspondingly x⁽¹⁾, x⁽²⁾, …, x⁽N⁾. Here x could be a vector in general, which is why I am giving it a superscript; this is the notation we have been using throughout the course. I want to fit a functional form y = f(x, θ) + e. Before we go to this method of successive least squares, the Gauss-Newton method, let us first look at the raw problem. I will call this an equation error model. Why an equation error model? Because I am not saying there is any error in the measurements of x or y; I am saying there is an overall error, blamed on the equation, which is a combination of all possible errors that can occur in the approximation, lumped into the single additive term e.

Now this is a nonlinear-in-parameter model, and what least square problem arises from it when I want to estimate θ? In the linear case, eᵀe was Σᵢ eᵢ² with i going from 1 to N, and in this case my problem is very similar. From the data I get the equations y₁ = f(x⁽¹⁾, θ) + e₁, y₂ = f(x⁽²⁾, θ) + e₂, and likewise up to y_N = f(x⁽N⁾, θ) + e_N; I am just substituting the data points into my equation form, so I get N equations, a large number of equations. How many unknowns? θ belongs to R^m, so there are only m parameters to be estimated, typically 3 or 4, while you may have 100 or 200 data points: a large number of equations in only a few unknowns. How do I estimate θ? I define the error eᵢ = yᵢ − f(x⁽ⁱ⁾, θ). If I somehow guess a value of θ, I can compute the error, because x is known to me and y is known to me. For example, if you are fitting this relationship within some homologous series, you can take the A, B, C values for the previous compound as a good guess; the next compound may not be too far away. With a guess, you can compute eᵢ.

The way I want to estimate θ is again by least squares: minimize with respect to θ the quantity Σᵢ eᵢ², i going from 1 to N. Let us call this objective function φ. How do you solve this problem, and where will you get the optimum? What is the necessary condition for optimality? ∂φ/∂θ = 0. There are m parameters, so differentiating φ with respect to θ and setting the derivatives to zero gives m equations in m unknowns, and typically these m equations will be nonlinear in θ. I could solve them by Newton-Raphson; that could be a way to go. You could just hand this problem to a Newton-Raphson solver, which keeps guessing and differentiating at every iteration: a very painful task. Instead, I can do the same thing with a small modification, which is called the Gauss-Newton method, and that is what I am now going to derive.

What I am going to do is come up with an iterative procedure to solve the same problem, but with a little modification: I am going to linearize the model, and that will help me solve the problem in a slightly different way. Let us say θ̄ is my guess solution. I am going to linearize the model in the neighbourhood of θ̄ using a Taylor series expansion: y = f(x, θ̄) + (∂f/∂θ₁)Δθ₁ + … + (∂f/∂θ_m)Δθ_m + ε. The e earlier was defined when the model was exact; now I am also approximating the model, so I replace it by the term ε. Just remember that the data x is given to me and y is known to me; if x is known and I have a guess θ̄, the value f(x, θ̄) is computable, a known value. And what is Δθ₁? Δθ₁ is θ₁, which is unknown, minus θ̄₁, the
perturbation from the guess solution. Likewise, Δθ_m is θ_m − θ̄_m. Partial derivatives appear here, and when you do a linearization, at which point do you compute the partial derivatives? At the known value: each partial derivative is computed at θ = θ̄, so these partial derivatives are known to me. Now look at this linearized model: you can view it as a local transformation of the model, in which the transformed model looks like a linear-in-parameter model. Why? Because with respect to Δθ₁, Δθ₂, …, Δθ_m it is linear in parameters. The original model is not linear in parameters; the model transformed through the Taylor series approximation is. So instead of solving for θ directly, I could choose to solve for Δθ. If I solve for Δθ, I can recover a new θ by adding Δθ to θ̄; I get a new guess, and I go on doing this, like in Newton-Raphson.

The next step after linearization is to write this equation for the linearized model at each data point: y₁ = f(x⁽¹⁾, θ̄) + (∂f(x⁽¹⁾, θ̄)/∂θ₁)Δθ₁ + … + (∂f(x⁽¹⁾, θ̄)/∂θ_m)Δθ_m + ε₁, then y₂ = f(x⁽²⁾, θ̄) + (∂f(x⁽²⁾, θ̄)/∂θ₁)Δθ₁ + … + (∂f(x⁽²⁾, θ̄)/∂θ_m)Δθ_m + ε₂, and likewise down to the N-th equation, ending in (∂f(x⁽N⁾, θ̄)/∂θ_m)Δθ_m + ε_N. The notation becomes a little complex, but if you understand the concept it is not at all difficult. So I have N equations, obtained by a linearizing transformation through the Taylor series: at each data point you linearize the nonlinear equation, ignoring the terms of second order and higher, and get a transformed linear equation. With respect to Δθ₁, Δθ₂, …, Δθ_m these equations are linear in parameters, because the partial derivatives are computed at a fixed point and hence known to you; f(x⁽ⁱ⁾, θ̄) is known to you, and yᵢ is known to you.

So now I can apply the (AᵀA)⁻¹ business, but I have to do it iteratively. How do we set it up? Watch carefully what I am doing. On the left-hand side I collect y₁ − f(x⁽¹⁾, θ̄), y₂ − f(x⁽²⁾, θ̄), …, y_N − f(x⁽N⁾, θ̄). On the right-hand side we create the matrix whose (i, j) entry is ∂f(x⁽ⁱ⁾, θ̄)/∂θ_j, multiplying the vector of Δθ's, plus of course all the errors ε₁, ε₂, …, ε_N which come along. Let me call this matrix A(θ̄), because θ̄ is a guess: the matrix is a function of your guess. In this equation, y₁ to y_N are known and θ̄ is a guess, so the f part is known, and hence the left-hand vector is known. Let us call that vector Δy, call the matrix A(θ̄), and call the unknown vector Δθ. Then we know how to get least square estimates of Δθ, don't we? The model we have finally got is Δy = A(θ̄)Δθ + Ε, where Ε (a capital epsilon, to distinguish it from the individual small ε's) is the error vector. What I would like to do now is minimize ε₁² + … + ε_N² with respect to Δθ, and this problem can be solved analytically: Δθ_LS = (A(θ̄)ᵀA(θ̄))⁻¹A(θ̄)ᵀΔy. This is my solution.

And what do I do with this solution? I generate an iteration: θ_new = θ̄ + Δθ_LS. What do I do with the new θ? I put it back: I re-linearize around the new θ, and keep doing this. When will you stop? The vector Δy should become as small as possible; in other words, you want φ = Σᵢ eᵢ², i going from 1 to N, to become smaller than a certain pre-specified value. You wanted to minimize the error in the original model; this is a transformed problem, and minimizing the error in the transformed problem does not by itself mean minimizing the error in the original problem, so your termination criterion is based on the original errors. And the original errors can be calculated: as she has rightly pointed out, the vector Δy gives you exactly the original errors for every guess, so if that vector becomes small you are done. So you calculate iteratively, just like in the Newton-Raphson method, or Newton's method. Why do we call this the Gauss-Newton method? Neither Gauss nor Newton invented it, but the word Newton comes in because you are doing a Taylor series approximation and locally linearizing.
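The whole iteration just described (linearize, solve the linear least squares problem for Δθ, update, repeat, terminate on the original errors) can be sketched in a few lines. The model, data, starting guess, and tolerance below are all made-up illustrations, not from the lecture:

```python
import numpy as np

# Gauss-Newton sketch for the hypothetical model f(x, theta) = theta1*(1 - exp(-theta2*x)).
def f(x, theta):
    return theta[0] * (1.0 - np.exp(-theta[1] * x))

def jac(x, theta):
    # A(theta_bar): column j holds df/dtheta_j evaluated at the current guess
    e = np.exp(-theta[1] * x)
    return np.column_stack([1.0 - e, theta[0] * x * e])

x = np.linspace(0.1, 5.0, 30)
theta_true = np.array([2.0, 1.3])
y = f(x, theta_true)                    # noise-free data for illustration

theta = np.array([1.0, 1.0])            # initial guess theta^0
for k in range(50):
    dy = y - f(x, theta)                # original-model equation errors (Delta y)
    if dy @ dy < 1e-16:                 # terminate on the sum of squared original errors
        break
    A = jac(x, theta)
    dtheta = np.linalg.solve(A.T @ A, A.T @ dy)   # (A^T A) dtheta = A^T dy
    theta = theta + dtheta              # theta_new = theta_bar + dtheta_LS
print(theta)                            # converges to approximately [2.0, 1.3]
```

Note that the termination test uses the original errors Δy, not the linearized ones, exactly as discussed above.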
Why Gauss? Because we are using linear least squares. The names of these two giants are merged to give this method its name, the Gauss-Newton method. If I write a formal algorithm, then instead of θ̄ I will say that you start with a guess θ⁰, get θ¹ from θ⁰, θ² from θ¹, and so on. My iterative algorithm reads θ^(k+1) = θ^k + Δθ^k, where Δθ^k is the solution of A(θ^k)ᵀA(θ^k)Δθ^k = A(θ^k)ᵀΔy^k; I write Δy^k because the right-hand side also depends on the current guess. The index k runs over 0, 1, 2, …, and you stop only when the termination criterion is met. This is the Gauss-Newton method. It can be used to estimate the parameters of any nonlinear-in-parameter model (you do not even have to treat the linear-in-parameter case separately), and just like Newton-Raphson you use it iteratively to come up with an estimate of the parameters.

Remember, though, that whether you get a meaningful solution or not will depend on your initial guess, so this θ⁰ is most important. You have to give a good guess, and that will come only from your engineering knowledge: you use your engineering, physics, chemistry background to come up with a good guess. If you give a good guess, this method will converge to the solution. It is an iterative procedure; unlike the earlier case, where you could find the global solution in one shot, that is not possible here. You may not get a global solution; depending on your guess you might go towards a local minimum. You are minimizing the problem iteratively by formulating a linearized problem that looks like linear least squares, and then solving a sequence of linear least squares problems. See what is done in Newton's method: a nonlinear problem is solved by a sequence of linear algebraic equations, and you hope that the sequence converges to the true solution. What is done here: you form a sequence of linear least squares problems, which you hope converges to the true optimum solution.

So this was the last in the series of lectures on approximations, and with it we come to the close of this module on problem discretization and problem transformation. From the next class onwards we will start looking at the tools. The tools will be solving Ax = b, nonlinear algebraic equations, and ODE initial value problems; the fourth tool is of course stochastic tools, which I am not going to discuss in this class. So I hope to cover three tools post mid-sem. What you will see is that there is a lot more to solving Ax = b than what you have done in your undergraduate work; I will be spending almost three weeks on just how to solve Ax = b. You already have a taste of large problems: 10,000 equations in 10,000 unknowns is not uncommon when you are solving partial differential equations, so you had better have some good schemes for solving Ax = b, otherwise the computation time can become exceedingly large. So we will look at solving Ax = b, then at solving nonlinear algebraic equations, and then move on to ODE initial value problems; that is where we will close. The tools will be covered post mid-sem, and that will complete the entire course.