Okay, so let us have a short recap of what we have done. We have been looking at these ARX models: AR stands for auto-regressive, X stands for exogenous inputs, external inputs. This terminology comes from time series modeling; as I said, time series modeling is used everywhere, not just in control. For us, an exogenous input is a measured, known, manipulated input. We looked at this example; I will quickly go over it again and then relate it to something we have been studying, correlations. We started with this model: y(k), the measurement at instant k, is a function of past measurements y(k-1), y(k-2), and also a function of past inputs. A dynamical system has memory, and that is very clear to see here: what happens now is a function of what has happened in the past and of the new inputs you give to the system. Here we wanted e(k) to be white noise. The problem was estimating a1, a2, b1, b2 from data: we have data for the outputs y and for the inputs u, and we want to estimate the model parameters a1, a2, b1 and b2. So we wrote down these equations, a large number of them, and collected them into matrix form; there was an error when I presented this slide last time, which I have corrected. So now I have this matrix equation: omega times theta plus E, where E is the error vector, equal to Y, the measurements stacked in time. I have taken a large number of measurements; in this particular case there are 250, and in a real problem I would take 1000, 2000, 10,000 depending on how many are required. I am going to talk today about the relevance of these large numbers, why we need a large number of measurements.
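As a concrete sketch of this setup, the following Python code (assuming NumPy is available; the second-order model, parameter values, and noise level are illustrative, not the lecture's actual data set) stacks the ARX equations into the matrix form Y = omega theta + E and recovers the parameters by least squares:

```python
import numpy as np

def build_arx2(y, u):
    """Stack the ARX(2) equations y(k) = -a1*y(k-1) - a2*y(k-2)
    + b1*u(k-1) + b2*u(k-2) + e(k) into Y = Omega @ theta + E form.
    The row for instant k is [-y(k-1), -y(k-2), u(k-1), u(k-2)]."""
    N = len(y)
    Omega = np.column_stack([-y[1:N-1], -y[0:N-2], u[1:N-1], u[0:N-2]])
    return Omega, y[2:N]

# Simulate a hypothetical second-order process with known parameters ...
rng = np.random.default_rng(0)
a1, a2, b1, b2 = -1.2, 0.5, 1.0, 0.4      # illustrative "true" values
N = 5000
u = rng.standard_normal(N)                 # known input sequence
y = np.zeros(N)
for k in range(2, N):
    y[k] = (-a1*y[k-1] - a2*y[k-2] + b1*u[k-1] + b2*u[k-2]
            + 0.05*rng.standard_normal())  # small white noise e(k)

# ... and recover them: theta = (Omega^T Omega)^{-1} Omega^T Y,
# computed via lstsq, which is numerically safer than an explicit inverse.
Omega, Y = build_arx2(y, u)
theta_hat, *_ = np.linalg.lstsq(Omega, Y, rcond=None)
print(theta_hat)  # close to [-1.2, 0.5, 1.0, 0.4]
```

With little noise and plenty of data the estimates land close to the simulated values; how close, and how that depends on the number of samples, is exactly what the rest of the lecture quantifies.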
So we wrote this matrix equation: the vector Y equals omega times theta plus the error vector, and then we developed the least squares estimate of the parameters theta. In this particular case the model is linear in the parameters, as I explained in my last lecture, so you can find the optimum, the global optimum, analytically. The analytical solution is obtained by setting the gradient to zero: E transpose E is nothing but the sum of squared errors, and setting its derivative with respect to the parameters to zero gives a simple matrix equation that can be solved directly. You assume here that the matrix omega has full column rank; if it does, you can invert omega transpose omega and obtain the least squares estimates. A sufficient condition for a minimum is that the second derivative, the Hessian, is positive definite, and here the Hessian, proportional to omega transpose omega, is always positive definite, so we are guaranteed to reach the minimum; those of you who know the necessary and sufficient conditions for optimality will see that since omega transpose omega is positive definite you reach the global minimum. Then using MATLAB I actually constructed this omega matrix and obtained this particular model. We said this model was okay, but the error was not white, and we wanted the error sequence to be white noise. We checked the correlations, the autocorrelation of the error, and found that the error is autocorrelated, which means the model is not acceptable: in an ARX model the error, y minus y predicted, should be white noise, completely free of any remaining signal. We also found that there is a correlation
between the residuals and the inputs, so this model is not acceptable. We moved ahead and developed higher order models: third, fourth, fifth order. I realized that beyond sixth order there is not much gain in increasing the order, so I stopped at sixth order, and for the sixth order model the innovations, or errors, are almost a white noise sequence; there is hardly any correlation between the inputs and the innovations (these errors are also called innovations). So the sixth order ARX model is acceptable. I want the innovation to be white noise because I want to remove anything that is like a signal, meaning anything that is correlated with the past. White noise has no relation with past or future; it is, you could say, the dirt in the signal, and once the residual is white there is nothing left that you could call a signal: all important dynamics of the process are captured. And this has two parts: when I developed this model, the a and b parameters simultaneously captured the deterministic as well as the stochastic component; everything that is signal is captured. So I talked about the model structure, but now I am going to come back and talk about something else: the properties of least squares estimation. These properties are deeply related to how you plan an experiment; you will see a lot of interplay between your understanding of the mathematics behind this and the practice, and I am going to explain that over the next few slides. First, recall that we began by talking about autocorrelation and cross correlation: we talked about
autocorrelation within a signal, a stochastic process, and cross correlation between two signals. When I developed the ARX model I suddenly did something else and never mentioned autocorrelations or cross correlations, so a natural question is whether there is some relationship between that autocorrelation and cross correlation business and ARX models. I spent almost three hours on autocorrelations and cross correlations, and they seem to have suddenly disappeared; but it is not so, it is all hidden there, and I will show you where it appears. Look at this matrix; maybe you can try this in your notebook. What is this omega matrix? Go back and check: its first column is minus y(2), minus y(3), minus y(4), and so on. What is the dimension of omega? It has four columns and N minus 2 rows, so it is a tall matrix: about 250 rows and only four columns. What should be the dimension of omega transpose omega? It should be 4 by 4. Now imagine the first element of that product: it is the transpose of the first column times the first column, so the (1,1) element will be y(2) squared plus y(3) squared plus y(4) squared, and so on. The next element on the first row is the product of the first column with the second column. If I sit down, am a little patient, and work out these multiplications, I will get this 4 by 4 matrix; you can see here that the element (1,1) is
y(2) squared plus y(3) squared plus y(4) squared, up to y(N-1) squared, and likewise I have worked out all possible products of the columns to get this 4 by 4 matrix. Does this look similar to something? We talked about estimates of autocorrelation and cross correlation; do these elements look like those estimates? What is missing is the factor 1 over N. Just recall autocorrelation within a signal and cross correlation between two signals. What are the two signals here? y(k) and u(k): y(k) are the measurements, u(k) are the inputs. Earlier I called the two signals w and v; here let us call v the input u and w the output y. If you compare this matrix with that earlier slide and look carefully at all 16 terms, you will see it is very, very close to the definition of autocorrelation and cross correlation, except that the 1 over N (or 1 over N minus 1, whichever is the relevant number) is missing. So what I can do is multiply this matrix by 1 over N and take the limit as N goes to infinity, and you will see that this matrix actually consists of autocorrelations and cross correlations. Look carefully; is everyone convinced about this slide? All I am doing is taking this matrix, multiplying by 1 over N, and letting N become large. For large N, say N equal to 10,000, this matrix approaches the lag 0 autocorrelation of y, the lag 1 autocorrelation of y, the lag 1 and lag 2 cross correlations between y and u, and so on. One more thing you will notice:
this matrix is symmetric, always symmetric. So in a 4 by 4 symmetric matrix, how many independent elements are there? Not 4: the upper triangle has 6 elements plus the 4 on the diagonal, so 10 independent elements. So first of all, autocorrelation and cross correlation very much appear when you identify this model. Next, you can also show that omega transpose Y, pre-multiplied by 1 over N, asymptotically tends to a vector of autocorrelations at lags 1 and 2 and cross correlations at lags 2 and 3. Why do the lag 0 terms appear on the diagonal? Because you get sums of y(k) squared over slightly shifted ranges, one with k from 2 to N-1, one shifted by a step; when you let N go to infinity, it is only a time shift and both tend to the variance, sigma y squared. So now I rewrite the least squares estimate: theta hat equals omega transpose omega inverse times omega transpose Y, and I have just inserted this 1 over N in both factors. That gives me a completely different interpretation, an interpretation from the viewpoint of autocorrelation and cross correlation: the coefficients of the model I am getting actually come from, are deeply connected to, autocorrelations and cross correlations. Everything I talked about for two stochastic processes is at work here; that is what I want to show. We have actually used all those concepts while deriving this model. Now the next fundamental question. Let us say there is a "true process", in quotes, which exactly behaves like this. When I am using a method to extract parameters from data,
the question I should ask is: is this method correct, will it give me the correct parameters? I have a system, I am collecting data, and I am using least squares to estimate the parameters; what is the guarantee that the estimates are close to the truth? Now, the real world is actually nonlinear, and when you develop a model for a real system, an ARX model is an approximation; forget about all that for the moment. Say I write a program in MATLAB and create data using this ARX equation. I created the data, I give it to you, and I ask you to construct the model; I tell you it is a second order process, the white noise is like this, and so on. I give you the data and tell you there are four parameters. The question is: will least squares give you the correct, the true values? This is a valid question, because if my method does not give me the true values, it is useless; if it gives wrong values I get a wrong model, and a wrong model cannot be used for control. So the first thing is to check whether the method itself makes sense, to assess whether this method is good or bad. If I do an ideal-world computer experiment in which I generate y from some known inputs u and a known white noise sequence, and there are some true parameter values, say 0.9, 0.1, 0.5, 0.4, will least squares estimation give me back those true values? If it does not, this method is useless. So we have to ask this first question. In statistical terms: will this method give me unbiased estimates of the parameters? What does that mean, an unbiased estimate? When I am getting an estimate, we said that
an estimate is a random number, so I am never going to get exactly the true value. What, then, is the meaning of an unbiased estimate? What I want to know is this: I have collected 250 data points now; if I collect more and more, 1000, 10,000, 50,000, 1 million, as N tends to infinity, will the estimates tend to the truth? If so, I am guaranteed that the method is correct. That is the first fundamental question I am going to ask. Again, let me remind you, right now we are in an ideal world where we have created the data: we know e(k) is white noise, we know the true parameters; I mean, I know them, and I have given you data and asked you to estimate the model parameters by least squares. If you want 10,000 points I will give you 10,000; if you want 1 million data points I will give you 1 million. The question is whether your estimate will tend to the truth. Some preparation for answering this: my true process, computer-simulated for now, behaves according to this matrix equation, where Y holds all the measurements, omega holds the past measurements y and inputs u, theta true holds the true parameter values, and E is the white noise sequence we have introduced into the system. What is e? A zero mean Gaussian white noise sequence with variance lambda squared. So this capital E vector consists of zero mean white noise values, and the expected value of E is 0. Now, if you take E times E transpose, will it be a vector, a matrix, a scalar? E transpose E will be a scalar;
E E transpose will be a matrix. Let me just move here and show you: this is my E vector, and if I take E E transpose I will get e1 squared, e1 e2, e1 e3, e2 e1, and so on down to e3 squared; for this three-element example it is a 3 by 3 matrix. Do not confuse this with E transpose E, which is e1 squared plus e2 squared plus e3 squared: E transpose E is the inner product, E E transpose is the outer product. Now, if this is zero mean white noise, what is the expected value of e1 squared? Lambda squared. Of e3 squared? Lambda squared. Of e1 e2? Zero, and likewise e1 e3 and so on. That is what I have written here: the expected value of E E transpose is nothing but lambda squared times the identity matrix. This is just my preparation for deriving the properties. To keep the notation simple, I define the matrix omega transpose omega inverse times omega transpose as omega dagger. This is only a notation, because later I will get products of these matrices and it becomes very cumbersome to write: omega dagger is nothing but omega transpose omega inverse into omega transpose. What happens if I post-multiply omega dagger by omega? I get the identity matrix, so omega dagger is actually called the left inverse of omega. Remember omega is a tall matrix, with fewer columns than rows, so it does not have an ordinary inverse; a matrix like this has either a left inverse or a right inverse, and this is a left inverse. So this is just notation; let us move ahead. My model is Y equals omega times theta true plus E; the data is generated
by this equation. If I take the expectation of Y, I get the expectation of omega theta true plus E; the expectation of E is 0, so the expectation of Y is nothing but omega times theta true. How does this help us? What is theta hat? Theta hat is omega dagger times Y, and Y is omega theta true plus E. (I have dropped the LS subscript to keep the notation simple; it is understood that this is the least squares estimate. In this course we will mostly talk about least squares, not the one norm or infinity norm.) Now, omega dagger times omega is the identity matrix, so theta hat, the least squares estimate, is nothing but theta true plus omega dagger times E. Now I take the expected value of theta hat. What does that mean? Suppose I do hundreds of numerical experiments, and from each experiment I estimate a theta hat. I can do that: I can generate data on the computer, give each one of you one data set from the same stochastic process, and each of you produces a theta hat; then I take the mean of all of them, and that mean should tend to the truth. That is what I mean here. If I take the expected value of theta hat, I get theta true plus omega dagger times the expected value of E, and the expected value of E is 0. So this simple analysis tells me that the least squares estimate will give me the true values if I collect a large number of data points: the estimate will tend to the truth, at least in the ideal world where we know the true parameters.
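The thought experiment just described, many numerical experiments whose averaged estimates home in on the truth, can be sketched directly (Python with NumPy; the process, its parameter values, and the noise level are illustrative assumptions, not the lecture's data):

```python
import numpy as np

rng = np.random.default_rng(42)
theta_true = np.array([-0.9, 0.2, 1.0, 0.5])   # illustrative [a1, a2, b1, b2]
a1, a2, b1, b2 = theta_true
N, n_runs = 500, 300                           # points per experiment, experiments
estimates = np.zeros((n_runs, 4))

for run in range(n_runs):
    # Each "student" gets a fresh realization of the same stochastic process.
    u = rng.standard_normal(N)
    e = 0.3 * rng.standard_normal(N)           # white noise, lambda = 0.3
    y = np.zeros(N)
    for k in range(2, N):
        y[k] = -a1*y[k-1] - a2*y[k-2] + b1*u[k-1] + b2*u[k-2] + e[k]
    Omega = np.column_stack([-y[1:N-1], -y[0:N-2], u[1:N-1], u[0:N-2]])
    estimates[run], *_ = np.linalg.lstsq(Omega, y[2:N], rcond=None)

# Individual estimates scatter around theta_true, but their mean homes in on it.
spread = np.abs(estimates - theta_true).max()
bias = np.abs(estimates.mean(axis=0) - theta_true).max()
print(bias < spread / 5)  # averaging shrinks the error: prints True
```

Any single run gives parameters that are off by a random amount; the average over many runs sits much closer to the simulated truth, which is the unbiasedness property just derived.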
This is the question we are asking about the goodness of the estimation method itself. Is the idea clear? Any doubts at this point? Yes: for now, assume that you know the order perfectly. I am giving you the information that this data is collected from a stochastic process of known order, but you do not know a1, a2, b1, b2; you only have data for y and u. The question is: from the data, will you get the truth? If even that does not happen, you should change the method. So now I am guaranteed that even though I will not get the truth from one experiment, or ten experiments, if I conduct a large number of experiments I will tend to the truth; that much is guaranteed, so this is a good method, if I have the patience to collect vast amounts of data. Now, in reality, collecting vast data has other implications. You know this mathematical truth, that if you collect a lot of data you will get a good model; but to collect that data you have to perturb your system, your plant, for a longer time, and that means disturbing production. If you disturb production, your company will say: to get your model you are disturbing my plant. So you have to convince your management, and as a control engineer you have to strike a balance: I need a significant number of data points, but at the same time I should not cause too much economic loss to my company. This understanding is deeply related to how you conduct an experiment; this analysis is not just mathematical analysis, it
tells you that there is a trade-off: if you collect a bigger data set, perturbing the plant for a longer time, you can develop a good model, but at the same time a longer experiment means a loss of production, so you have to strike a balance. Re-identification, when to identify, how long to perturb: these are golden questions, and if you know how to answer them, as a modeler you will be paid very highly. So one thing I am guaranteed: if I collect a sufficiently large number of data points, the estimates will approach the truth, at least for the least squares method applied to ARX models (you can show this for the other models as well). The second thing: suppose I am not able to collect millions of data points; I am going to collect a thousand, or 250, some small number. Then I know my estimates are not equal to the truth, they are slightly away from it, and I would like to know how far away: what is the confidence interval, what is the error bound on the parameter estimates? So in statistical terms, can we estimate confidence intervals on the parameter estimates? The way to do this is to find the covariance. We found the mean: we showed that the mean of the estimated parameters tends to the truth. The second moment is next: I am going to estimate the covariance of the estimated parameters, which will help me put bounds on the estimates. So I want to find the covariance matrix, the expected value of theta hat minus theta true times theta hat minus theta true transpose, and the diagonal elements
of this matrix will tell me the individual variances of each parameter estimate. There is some algebra here which I leave to you to work out; it looks complex, but it is simple matrix multiplication. What is theta hat minus theta true? It is omega dagger times Y minus theta true, and with a little algebra you will realize this is nothing but what we had before: theta hat equals theta true plus omega dagger E, so theta hat minus theta true is omega dagger E; it is the same thing written here. So this vector times its own transpose gives omega dagger times E E transpose times omega dagger transpose. If I take expectations, and note that once you have the data, omega dagger is a constant matrix, the covariance of theta hat becomes omega dagger times the expectation of E E transpose times omega dagger transpose, and the expectation of E E transpose is lambda squared times the identity, which we have already worked out. Since lambda squared is a scalar I can move it out, the identity drops away, and it boils down to lambda squared times omega dagger times omega dagger transpose; you can then show that omega dagger times omega dagger transpose is nothing but omega transpose omega inverse (if you are not convinced, go back and work out this identity). So the point is this last equation: the covariance of theta hat is lambda squared times omega transpose omega inverse. There is one trouble here: it involves the variance of e, the white noise. If I had given you the variance of e, say 1, or 0.1, whatever, you would be able to compute this covariance, because I have
given you data for y and u, and from the data you can form omega transpose omega, this 4 by 4 matrix, and invert it; by the way, this matrix is nothing but all those autocorrelations and cross correlations we looked at. So given lambda squared you can compute those covariances. In reality we do not know lambda squared, so we need to estimate it as well. How? I am going to compute e hat, the estimate of e, which is y measured minus omega times theta hat. This gives me e hat, an estimate of e, not the true e; you would only get the true e if you collected an unlimited number of data points, letting N tend to infinity, which is not going to happen. From the estimated e vector I can compute its variance, and this is the expression for it. Just go back and check: there is a small error on the slide, where I have written d it should be p; please correct it in your notes. If there are p parameters, to get an unbiased estimate you have to divide by N minus p rather than N; why the unbiased estimate requires 1 over N minus p, you can look up in your basic statistics book, I am not going to explain that here. So I can now compute the covariance estimate using this expression, and on the right hand side everything is known to you: the omega matrix is known, it consists of data; omega transpose omega inverse is known; the innovations e hat you have calculated, and from them the variance estimate that you are multiplying by. All that I have done is replaced lambda squared by
lambda hat squared, its estimate, given by this expression. Now I want you to interpret this expression; somebody should think about it, not just look at it. Why am I showing it? How will you reduce the covariance? Increase the number of samples, in relation to the number of parameters. If you have a model with 2 parameters, perhaps 30 samples will do, but if you have a model with 10 parameters you should use a larger number of samples in relation to those 10 parameters: more parameters require more samples. It also tells you that the variance of the estimates becomes smaller and smaller as you take more data points; the parameter estimates become sharper and sharper. But as I said, there is a trade-off between getting good parameters and perturbing the plant for a longer time. This is a very crucial decision when you actually perturb a plant: you should know the mathematics, how long to perturb, when to stop. You have to weigh your need for more data as a modeler against your company's need not to waste product or disturb the plant for too long; you have to strike a balance, and to strike that balance you have to understand this equation for the variance and the mean.
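To make these expressions concrete, here is a sketch (Python with NumPy; the simulated process, the noise level lambda, and the sample sizes are all illustrative assumptions) that computes lambda hat squared and the covariance estimate lambda hat squared times omega transpose omega inverse, and shows the resulting confidence intervals tightening roughly as one over the square root of N:

```python
import numpy as np

def arx2_fit_with_ci(y, u, z=1.96):
    """Least squares ARX(2) fit with approximate 95% confidence intervals
    theta_i +/- z*sigma_i, following the lecture's formulas:
      lambda_hat^2 = e_hat^T e_hat / (rows - p)
      Cov(theta_hat) = lambda_hat^2 * (Omega^T Omega)^{-1}"""
    N = len(y)
    Omega = np.column_stack([-y[1:N-1], -y[0:N-2], u[1:N-1], u[0:N-2]])
    Y = y[2:N]
    theta, *_ = np.linalg.lstsq(Omega, Y, rcond=None)
    e_hat = Y - Omega @ theta                     # innovations estimate
    lam2_hat = (e_hat @ e_hat) / (len(Y) - Omega.shape[1])
    sigma = np.sqrt(np.diag(lam2_hat * np.linalg.inv(Omega.T @ Omega)))
    return theta, theta - z*sigma, theta + z*sigma

def simulate(N, rng):
    """Illustrative second-order process driven by white noise of std 0.3."""
    u = rng.standard_normal(N)
    y = np.zeros(N)
    for k in range(2, N):
        y[k] = (0.8*y[k-1] - 0.15*y[k-2] + u[k-1] + 0.5*u[k-2]
                + 0.3*rng.standard_normal())
    return y, u

rng = np.random.default_rng(7)
_, lo_s, hi_s = arx2_fit_with_ci(*simulate(500, rng))    # short experiment
_, lo_l, hi_l = arx2_fit_with_ci(*simulate(8000, rng))   # 16x longer experiment
ratio = (hi_s - lo_s).mean() / (hi_l - lo_l).mean()
print(ratio)  # roughly sqrt(16) = 4: widths shrink as 1/sqrt(N)
```

Sixteen times the data buys only a factor-of-four sharper interval, which is the 1 over N decay of the covariance in practice, and exactly the trade-off against plant perturbation time the lecture describes.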
So basically, if I increase the number of data points I can get better and better estimates. Now, we developed two models. Remember the output error model: it gave me a good estimate of the deterministic component using only four parameters, a1, a2, b1, b2. For the sixth order ARX model, how many parameters did we require? Twelve, 6 plus 6, so almost three times as many when I went from OE to ARX. Correspondingly, roughly speaking, if I take 100 data points for the output error model, I would need about 300 data points to get a similar quality ARX model. So the ARX model bridges that gap and gives me a good model, but there is a cost; it is not free, there are no free lunches: you want a good model, you pay for it by perturbing the plant for more time. So now my search is for a model which has fewer parameters but will also give me a good noise model. What is the significance of fewer parameters? I have to perturb for a shorter time, so I can save my company's money. The ARX model is good; the trouble is that it needs a large number of parameters. In real industrial problems, an ARX model would require something of the order of 80 to 100 parameters to get a good model between one input and one output. It is very easy to develop, just least squares, but a large number of parameters means you need a larger data set. And if you happen to use ARX models without knowing these fundamentals, that is, if you conduct an experiment for a short period of time and fit a high order ARX model, you have committed large errors in your model; please understand this. Unless you know this mathematics you will not be able to appreciate it, because MATLAB will give you some numbers;
if you have to estimate 100 parameters and you give only 200 data points, MATLAB will still give you numbers, but they may be meaningless because the errors will be large; to reduce the errors you have to increase the number of data points. So the first property is that the covariance decays as 1 over N, which means the standard deviation of each parameter decays as 1 over the square root of N; that is the first lesson to learn (when N becomes large you can neglect the p appearing in N minus p). The variance is also governed by what is called the signal to noise ratio, and this idea is going to follow you if you continue into system identification. Which parts of this expression speak about the signal power? One is this N; what about omega transpose omega? How can you reduce the variance: can you make omega transpose omega inverse as small as possible? There are two ways of reducing the variance; I have to play with the terms in this equation to get a good model. One possibility is to increase N. The other: this expression is something like lambda squared divided by a signal term, and omega transpose omega brings in the information about the signal spectrum and signal variance. I can make this matrix strong; we will talk a little later about exactly what that means, but you choose the perturbation so that this matrix, relative to lambda squared, becomes large. So I can play with a larger data set, or I can play with the perturbation to modify the properties of omega transpose omega, that is, change the signal to noise ratio, or the noise to signal ratio:
here, if you take λ² as the noise and ΩᵀΩ as the signal component, then the combination is a noise-to-signal ratio. Now, in a real system, when you have to identify a model from data, putting together the experiment plan becomes a piece of art. You have to be an artist: know your maths, know the physics of the problem, know the priorities of the company or the plant you are handling, and then gently introduce perturbations to find out what you need. It is like a doctor: say you have some stomach problem, and nowadays the doctor puts in an endoscope. It is painful for you, but he has to take a judgement call on whether, and how much, to perturb you to get the information. Depending on the patient's condition, you judge how much to perturb. Unless you perturb the system you cannot model it. So you have to perturb the plant, and that has implications: how long do you perturb, how strongly do you perturb? All these things are very critical.

Then there is the underlying maths, and unless you understand it you cannot sensibly use the MATLAB package to develop these models. What you will realize (I am going to give a demo next week) is that after all this theory, MATLAB will produce these models in a fraction of a second: you just click "I want an ARX model" and it flashes the model at you. But unless you understand what is going on when you develop an ARX model, you will not collect a proper data set, your model will be bad, and then MATLAB is useless to you.

Now there is a tension. To get a good noise-to-signal ratio you have to increase the signal component, which means introducing larger perturbations. Larger perturbations disturb the plant more, and a more disturbed plant means more disturbance to production. So again it is a cascaded situation: to get a good model you
would like to perturb more, but perturbing more disturbs the plant too much. So you have to understand the maths and strike a balance; managing a system identification exercise is a very tricky affair.

I have given here some expressions for estimating the confidence bounds. If you assume the noise is Gaussian, you can go back and check how to compute the confidence intervals; this part you can find in any statistics book, and I have already shown it on another slide. If you assume each estimate is normally distributed, you take the diagonal elements of the matrix P, where P is λ̂²(ΩᵀΩ)⁻¹, and from those you can compute confidence intervals for each parameter. MATLAB will give you all of this: I will show you that MATLAB reports confidence bounds and an estimate of the variance for each parameter. When MATLAB gives you an ARX model, it can also tell you that for a given parameter the possible error is of the order of, say, 0.06, something like plus or minus one sigma, so you can judge the error committed while estimating that parameter. These estimates are available from MATLAB; you have to know the theory, and you should know how to use it.

Let us move on. So far so good: after developing the ARX model we have a model consisting of two components. One is the deterministic component; I call it deterministic because it arises from the manipulated variables, which are known to us. The other is the stochastic component, and mind you, this model
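has to be examined carefully. Before that, a quick aside on the confidence bounds just described: the recipe is simply P = λ̂²(ΩᵀΩ)⁻¹, take the diagonal, and attach plus-or-minus a few sigma. A minimal sketch, with a made-up static regression as the data (the function name and example numbers are mine, not MATLAB's):

```python
# Sketch: ~95% confidence intervals for least-squares parameter estimates,
# assuming Gaussian noise. Uses P = lam2_hat * inv(Omega^T Omega).
import numpy as np

def ls_with_confidence(Omega, y, z=1.96):
    """Return theta_hat and the +/- half-width of each ~95% interval."""
    theta = np.linalg.solve(Omega.T @ Omega, Omega.T @ y)
    resid = y - Omega @ theta
    N, p = Omega.shape
    lam2_hat = resid @ resid / (N - p)             # noise variance estimate
    P = lam2_hat * np.linalg.inv(Omega.T @ Omega)  # parameter covariance
    return theta, z * np.sqrt(np.diag(P))

# Hypothetical use: a two-parameter regression with true theta = (2, -1)
rng = np.random.default_rng(0)
Omega = rng.standard_normal((500, 2))
y = Omega @ np.array([2.0, -1.0]) + 0.1 * rng.standard_normal(500)
theta, half = ls_with_confidence(Omega, y)
print(theta, half)   # estimates near (2, -1), with narrow intervals
```

With that aside done, the model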
has to be looked at as a combination of two things: this filter, and this innovation sequence; the two are jointly estimated by our model. The effect of unmeasured disturbances has been modelled as white noise passing through a transfer function. This is something I discussed at the end of my last lecture, so I will go over it quickly. In my last lecture I had jumped ahead and not covered the properties of the estimates; having covered those properties now, I am coming back to this model.

If I set u equal to zero, I get this model, and I showed you that it is nothing but the so-called autoregressive process we considered earlier. Is everyone with me on this? What I am showing you is that the ARX model gives me a noise model which is exactly an autoregressive process. The autoregressive process is a very nice, simple way of modelling unknown stochastic disturbances arising from unknown sources. We have played a trick: we do not know what the source is, so we take the disturbance v(k) to be generated by a white noise source e(k) passing through 1/A(q). Please remember that this e(k) is an impostor; it is not the truth. It is a model for explaining the relationships within the unknown disturbance, something that behaves similarly to it; there is no real source called e(k) in the real system. You will see this when you do the exercises. If the roots of A(q) are inside the unit circle, then you can actually expand 1/A(q) and obtain a moving average process. This I also talked about last time: we can convert the auto
regressive process into a moving average process, and the moving average process into an autoregressive process; we can go back and forth. These are just two different ways of expressing the same thing, and I can move from one form to the other.

By the way, I have uploaded the new tutorial sheet, and your mid-sem is going to be based largely on it, so please start working on it. Unless you work on the tutorial sheet you will not understand what is happening in class. We will have some sessions this week and next week, other than the lectures, to discuss the tutorial problems; I will announce the details at the end of the class.

So I can move between the moving average representation and the autoregressive representation, and I can also combine the two into one model. This is called an ARMA model: an autoregressive moving average process. It has autoregressive terms, coming from past v(k), and moving average terms, coming from past e(k). It is more convenient to use this model because you need fewer parameters. Of course there is a price to pay: with a more parsimonious form, the parameter estimation is more difficult; and wherever estimation is easy, you need a large data set, with its own price in lost production. But this combined model, autoregressive and moving average together, gives me the power of both ideas: models that require fewer parameters and still capture the disturbance behaviour, and I can still move between one form and the other. Let me just show you an example of how the noise will look. I have fabricated an ARMA
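process in the computer; here is a quick numerical sketch of that idea (the four coefficients below are my own illustrative choices, not the ones from the lecture):

```python
# Sketch: simulate an ARMA(2,2) process driven by zero-mean, unit-variance
# Gaussian white noise and check that the output is strongly autocorrelated
# (colored) while the driving noise is not.
import numpy as np

rng = np.random.default_rng(2)
N = 2000
e = rng.standard_normal(N)      # white noise input
v = np.zeros(N)
for k in range(2, N):
    # v(k) = 1.2 v(k-1) - 0.5 v(k-2) + e(k) + 0.4 e(k-1) + 0.3 e(k-2)
    # (AR poles chosen inside the unit circle, so the process is stationary)
    v[k] = 1.2*v[k-1] - 0.5*v[k-2] + e[k] + 0.4*e[k-1] + 0.3*e[k-2]

def acf(x, lag):
    """Sample autocorrelation at a given lag."""
    x = x - x.mean()
    return (x[:-lag] @ x[lag:]) / (x @ x)

print([round(acf(v, L), 2) for L in (1, 2, 3)])   # clearly nonzero: colored
print(round(acf(e, 1), 2))                        # near zero: white
```

So, again: I fabricated an ARMA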
process in the computer, driven by a Gaussian white noise signal with mean zero and variance one. How will the output look? It will look like this; I am just giving you a visual feel for it. And what is the autocorrelation of this signal? It is a colored noise, strongly autocorrelated. So an ARMA process can be used to model colored noise. Why ARMA? Because if I use only AR or only MA, the number of parameters I need becomes large. What I can do here with only four parameters (1, 2, 3, 4; if you do a long division, and I will come to that, you can convert this into an MA-only or AR-only model) might require of the order of twenty parameters in the pure form to get a good disturbance model. So why do I want four parameters instead of twenty? Because then the required data length N is small. For the time being you have to trust me on this; when you solve the exercises you will see it. I have given exercises in which you convert from one form to the other, and you will find that when you convert, say, ARMA to AR or ARMA to MA, the number of parameters to be estimated becomes large. More parameters means a longer perturbation time, and a longer perturbation time means loss of production. So we want models with few parameters that are still able to capture the noise properties; that is why I go for the ARMA model.

Now I am going to introduce a more complex form called ARMAX. What is ARMAX? AR stands for autoregressive, MA for moving average, and X for exogenous input. So it has terms coming from the past exogenous inputs u, past autoregression, and a moving average part, and e(k) has to
be a white noise sequence. In transfer function form it will look like this. What has changed from the ARX model? For ARX, the C polynomial is equal to 1; in this model it is not. Then you might ask: why am I forcing the same denominator polynomial A on both parts? I need not; I can have a model with B/A for the exogenous input and some C/D for the noise. You can do that, no problem, and that particular model is named after Box and Jenkins. There is a very famous book on time series modeling by Box and Jenkins; anyone interested in pursuing this seriously should probably own it, it is like the Bible of time series modeling. The Box-Jenkins model is the most general model one can think of: if you cross-multiply, you can convert this general form into an ARMAX model, and if you take the special case C = 1 and D = A, it becomes the ARX model. But the more general the form, the more parameters there are, and the more difficult it is to estimate them; the moment we go to these general forms, estimation becomes harder.

So now I do not have to identify only an ARX model or an output error model; I can develop an ARMAX model, and I am going to show you how. What is my problem? I have the collected data y(0) to y(N) and the input sequence u(0) to u(N). I choose some parameterization, say A is second order, B is second order, C is whatever, and I want to estimate the coefficients of A(q), B(q), C(q) from the data. How do you do this? We are going to use the least-squares paradigm: the sum of squared errors, here the sum of squared e(k), should be minimized with respect to the parameters; the same idea we used for
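ARX: minimize the sum of squared innovations with respect to the parameters. To keep the model zoo straight, here is the family in one place (this summary is mine, written in the lecture's notation, with A, B, C, D polynomials in q⁻¹):

```latex
% Model family, in the notation of the lecture:
\begin{align*}
\text{ARX:}          \quad & A(q)\,y_k = B(q)\,u_k + e_k \\
\text{ARMAX:}        \quad & A(q)\,y_k = B(q)\,u_k + C(q)\,e_k \\
\text{Box--Jenkins:} \quad & y_k = \frac{B(q)}{A(q)}\,u_k + \frac{C(q)}{D(q)}\,e_k
\end{align*}
% Setting C = 1 and D = A in the Box--Jenkins form recovers the ARX model.
```

In every case the estimation criterion is the same one we used for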
ARX. Life is more complex here, though. The big problem is that e(k) is not known, and yet we require that e(k) be a white noise sequence. I have this model, and I can convert it into a difference equation; all of you know how to do that. In this difference equation I do not know a1, a2, b1, b2, and I also do not know e(k), c1 e(k-1), c2 e(k-2): all these quantities are unknown to me. What is known to me is y(k), y(k-1), y(k-2) and the past inputs u(k-2), u(k-3). These extra terms, e(k-1) and e(k-2), did not appear in the ARX model, but they appear now, and that is the trouble. How do I estimate this model when only y(k) and u(k) are known and e(k) is not? This model is what is called nonlinear in the parameters: you do not know e(k) and you do not know c1, so you have to guess both, and their product enters the equation. Since the model is not linear in the parameters, it is difficult to estimate: you cannot use linear least squares; the (ΩᵀΩ)⁻¹ business unfortunately does not work here. You cannot compute the solution analytically; you have to use nonlinear optimization tools.

Now let me go back and tell you my aim for the next step. This model is expressed in terms of u and e, but e is not known to me, so I want to convert it into a more convenient form. Compare with the ARX model: there, y(k) = (B/A) u(k) + (1/A) e(k), and the nice thing was that I could rewrite it as A y(k) = B u(k) + e(k). Only e(k) appeared as the error; in the regression, apart from that, only known things were there; there were
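no unknown quantities in the regression. For the ARMAX case, the nonlinear prediction-error idea can be sketched numerically. Below is a toy version: a first-order model with assumed coefficients, with scipy's general-purpose optimizer standing in for whatever nonlinear solver MATLAB uses internally.

```python
# Sketch: estimate a first-order ARMAX model by minimizing the sum of
# squared reconstructed innovations. Coefficients and the use of
# scipy.optimize are illustrative assumptions, not the lecture's code.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
N = 1000
A1, B1, C1 = 0.7, 1.0, 0.5               # "true" parameters (assumed)
u = rng.standard_normal(N)
e = 0.3 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = A1*y[k-1] + B1*u[k-1] + e[k] + C1*e[k-1]

def sse(theta):
    """Sum of squared innovations. Note eps(k) depends on eps(k-1) through c1:
    the criterion is nonlinear in the parameters, so there is no closed form."""
    a1, b1, c1 = theta
    eps = np.zeros(N)
    for k in range(1, N):
        eps[k] = y[k] - a1*y[k-1] - b1*u[k-1] - c1*eps[k-1]
    return eps @ eps

res = minimize(sse, x0=np.zeros(3), method="Nelder-Mead")
print(res.x)   # should land close to (0.7, 1.0, 0.5)
```

Back to the main line of reasoning: in the ARX regression there were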
no unknown quantities. Can I convert the ARMAX model into something similar, having only e(k) as the unknown? Can you think of how? Right: multiply through by A/C. I get (A/C) y(k) = (B/C) u(k) + e(k). This looks qualitatively similar to the ARX case, except that this factor 1/C appears. What was C in the ARX model? One, so life was simple; but now C is not equal to 1, and one more thing becomes crucial here: you get this 1/C, so you need the inverse of the polynomial C, and that inverse must be stable. If the inverse of C is not stable, we have trouble.

Let us take a particular example. Can you expand 1/(1 - 0.9 q⁻¹) by long division? And take a second case, 1/(1 - 1.1 q⁻¹). For the first case: divide 1 by 1 - 0.9 q⁻¹. The first term is 1; multiplying back and subtracting, I am left with 0.9 q⁻¹. The next term is 0.9 q⁻¹; subtracting 0.9 q⁻¹ - 0.81 q⁻² leaves 0.81 q⁻². The next term is 0.81 q⁻²; subtracting 0.81 q⁻² - 0.729 q⁻³ leaves 0.729 q⁻³, and so on. So the coefficients are 1, 0.9, 0.81, 0.729, ...: they keep reducing. After some time I can truncate the series; I do not have to worry about the higher order terms because they become smaller and smaller. Now contrast this with the other situation. Let us say
you divide 1 by 1 - 1.1 q⁻¹. The first term is 1; subtracting leaves 1.1 q⁻¹. The next term is 1.1 q⁻¹; subtracting leaves 1.1² q⁻² = 1.21 q⁻², and so on. What happens to these coefficients? They keep growing; after some time they go off to infinity, and that causes trouble.

So what is critical when you do noise modeling? I am taking a shortcut here: there is a fundamental result called the spectral factorization theorem. I am not stating the theorem formally; I am giving you its essence in simple words. If you want to model a stationary process as colored noise, the noise model must be both stable and inversely stable, that is, invertible. Which means: if I am constructing a noise model C/A, all the roots of C should be inside the unit circle and all the roots of A should be inside the unit circle. If this condition is not satisfied we cannot develop a model, because if the noise model is not invertible, the terms in the expansion start growing and cause trouble in the parameter estimation. So it is very important that the noise model be invertible: I should be able to go from e to v and from v to e without any problem, so H should be stable and H⁻¹ should be stable. This is the fundamental requirement of stochastic time series modeling when you fit ARMAX or Box-Jenkins models; if it does not hold, you cannot develop the models.

Let me show one quick example. This is an ARMA model with H given by (1 + 0.5 q⁻¹)/(1 - 0.8 q⁻¹): it has a zero at -0.5 and a pole at 0.8. If you do the long division you get the terms shown here, and then what
you can see is that the coefficients start diminishing fast, so you can approximate the series by its first few terms. That is very crucial when it comes to noise modeling. We will go over this again in the next class and connect it to estimation of the model parameters. The crucial property of the ARMAX and Box-Jenkins noise models is that they must be stable and inversely stable; without that you cannot proceed. Please go ahead and look at the notes: it is not enough to just listen to this, you have to go back and start solving the problems, okay.
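The long division the lecture walks through can be mechanized in a few lines. A sketch (the helper function is my own, reproducing the three cases discussed above):

```python
# Sketch: series expansion ("long division") of num(q) / (1 - pole * q^-1).
# A pole inside the unit circle gives decaying terms (safe to truncate);
# a pole outside gives terms that blow up (the inverse is unstable).

def expand(num, pole, n_terms=8):
    """Coefficients h_k of num(q) / (1 - pole * q^-1) as a series in q^-1."""
    h = []
    for k in range(n_terms):
        c_k = num[k] if k < len(num) else 0.0
        h.append(c_k + (pole * h[k-1] if k > 0 else 0.0))
    return h

print([round(x, 3) for x in expand([1.0], 0.9)])       # 1, 0.9, 0.81, 0.729, ...
print([round(x, 3) for x in expand([1.0], 1.1)])       # 1, 1.1, 1.21, 1.331, ...
print([round(x, 3) for x in expand([1.0, 0.5], 0.8)])  # 1, 1.3, 1.04, 0.832, ...
```

The last line is the lecture's H = (1 + 0.5 q⁻¹)/(1 - 0.8 q⁻¹): the terms diminish quickly, so truncating after a few of them is harmless.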