Now, let us take a specific special case. The dynamics are of the form x_{k+1} = A_k x_k + B_k u_k + w_k, defined for k = 0, ..., N-1. We want to minimize, over all policies, a cost function of the following form: the terminal cost is quadratic in the terminal state, x_N^T Q_N x_N, and the stage-wise costs are also quadratic, x_k^T Q_k x_k + u_k^T R_k u_k. The matrices here are of appropriate dimensions: Q_N, Q_k, R_k are square matrices, and A_k, B_k are of appropriate dimensions so that the state dynamics are well defined. We assume that each Q_k is symmetric and positive semidefinite, for all k, and that each R_k is symmetric and positive definite. There are no constraints on the control actions; the u_k are unconstrained, since constraints would make the problem considerably more complicated. We assume that the disturbances, or noise, w_k are independent. We are not assuming that they have any particular distribution such as Gaussian; they are just independent. But we will assume that they have been centered, meaning the mean is 0, and that their variance is finite, i.e. the second moment is finite. This is a very popular formulation of the problem.
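To make this setup concrete, here is a minimal sketch, not from the lecture: the matrices A, B, Q, R, Q_N, the horizon, the noise scale, and the zero policy are all illustrative choices of my own. It simulates the dynamics x_{k+1} = A x_k + B u_k + w_k and accumulates the quadratic cost along one trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5                                     # horizon (illustrative choice)
A = np.array([[1.0, 0.1], [0.0, 1.0]])    # A_k, taken time-invariant here
B = np.array([[0.0], [0.1]])              # B_k
Q = np.eye(2)                             # stage weight Q_k (symmetric PSD)
R = np.eye(1)                             # control weight R_k (symmetric PD)
QN = np.eye(2)                            # terminal weight Q_N

def rollout(policy, x0):
    """Simulate x_{k+1} = A x_k + B u_k + w_k and accumulate the quadratic cost."""
    x, cost = x0, 0.0
    for k in range(N):
        u = policy(k, x)
        cost += x @ Q @ x + u @ R @ u     # stage cost x^T Q x + u^T R u
        w = 0.01 * rng.standard_normal(2)  # zero-mean noise with finite variance
        x = A @ x + B @ u + w
    return cost + x @ QN @ x              # add terminal cost x_N^T Q_N x_N

c = rollout(lambda k, x: np.zeros(1), np.array([1.0, 0.0]))
```

Any policy, here the do-nothing policy, can be plugged in; later in the lecture the optimal one turns out to be linear in the state.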
These problems arise all the time in control theory, in inventory management, and so on. A quadratic cost is often very reasonable because many costs can be modeled this way: one is trying to minimize the distance from a particular target, or the deviation from a certain nominal value, etcetera, and the errors one wants to minimize often end up taking a quadratic form. That is one of the motivations for this particular problem. There are many generalizations of this problem as well, but I am not looking at any of those; let us look at the simplest one. Much more general problems can be considered: for example, the cost can also contain cross terms involving both x_k and u_k. We ignore such terms here for simplicity; including them does not add much to the problem. Now let us apply the dynamic programming algorithm. The approach is to start from the last time instant, where we define the value function to be simply the terminal cost: J_N(x_N) = x_N^T Q_N x_N. This is the initialization of the dynamic programming algorithm. The dynamic programming equation at each time step k is then

J_k(x_k) = min over u_k of E[ x_k^T Q_k x_k + u_k^T R_k u_k + J_{k+1}(A_k x_k + B_k u_k + w_k) ],

where inside J_{k+1} I have substituted x_{k+1} = A_k x_k + B_k u_k + w_k from the state equation.
By itself this does not help us much, because we do not know the form of J_{k+1}, so let us work it out more concretely. These calculations may seem a little routine or mundane at first, but here is what I want you to observe: it will turn out that J_{k+1} is quadratic, and if J_{k+1} is a quadratic function, then the entire expression inside the minimization becomes a quadratic function of u_k. Moreover, it is a convex quadratic, because our R_k are symmetric positive definite and so on; all of that will ensure we are minimizing a strictly convex quadratic function of u_k. Once that holds, the minimizer is obtained simply by setting the derivative of that quadratic equal to 0, and on solving that equation the optimal u_k^* turns out to be a linear function of x_k. When J_{k+1} is quadratic, the only cross terms between x_k and u_k come from the J_{k+1} term, and those involve u_k to degree 1 and x_k to degree 1; putting that together, u_k^* is a linear function of x_k. Let us illustrate this for k = N-1. For k = N-1 we have

J_{N-1}(x_{N-1}) = min over u_{N-1} of E[ x_{N-1}^T Q_{N-1} x_{N-1} + u_{N-1}^T R_{N-1} u_{N-1} + J_N(x_N) ].
Now remember, J_N is already a quadratic function. So for k = N-1 we already have what we wanted, namely that J_{k+1} is a quadratic function. All we need to do is substitute for x_N using the state equation, x_N = A_{N-1} x_{N-1} + B_{N-1} u_{N-1} + w_{N-1}, which gives the term

(A_{N-1} x_{N-1} + B_{N-1} u_{N-1} + w_{N-1})^T Q_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1} + w_{N-1}),

and the expectation is over this entire expression. Remember that x_{N-1} here is simply a nominal state: it is not a random variable, not the true state realized during the problem, but just any candidate state we are starting off with, and we are finding u_{N-1} as a function of that x_{N-1}. Now let us simplify a few things. The last term above can be expanded; when we expand it out, among other terms we get a term of the form

w_{N-1}^T Q_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1}).

What can we say about this particular term?
Notice that, as I said, x_{N-1} is simply some constant or deterministic nominal value, and u_{N-1} is also deterministic: it is simply a function of x_{N-1}. As a result, the only randomness here is in w_{N-1}. So when we take the expectation of this cross term, the expectation is necessarily 0, because we assumed that all the w's have mean 0. So this term is 0. What we are left with, essentially, are the following kinds of terms: a term quadratic in (A_{N-1} x_{N-1} + B_{N-1} u_{N-1}) via Q_N, and a term in which w_{N-1} multiplies w_{N-1} via Q_N. Writing this out, we have

J_{N-1}(x_{N-1}) = min over u_{N-1} of E[ x_{N-1}^T Q_{N-1} x_{N-1} + u_{N-1}^T R_{N-1} u_{N-1} + u_{N-1}^T B_{N-1}^T Q_N B_{N-1} u_{N-1} + 2 x_{N-1}^T A_{N-1}^T Q_N B_{N-1} u_{N-1} + x_{N-1}^T A_{N-1}^T Q_N A_{N-1} x_{N-1} + w_{N-1}^T Q_N w_{N-1} ].

Let me run you through what these terms are. The last term is the quadratic w_{N-1}^T Q_N w_{N-1},
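The key simplification above is that the cross term w_{N-1}^T Q_N (A x_{N-1} + B u_{N-1}) has zero expectation because w has mean 0. Here is a quick Monte Carlo sanity check of that claim, on illustrative matrices and vectors of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.2], [0.0, 1.0]])   # illustrative A_{N-1}
B = np.array([[0.0], [1.0]])             # illustrative B_{N-1}
Q = np.diag([2.0, 1.0])                  # illustrative Q_N
x = np.array([1.0, -1.0])                # a fixed nominal state x_{N-1}
u = np.array([0.5])                      # a fixed (deterministic) control

# v is deterministic; the only randomness is in w.
v = A @ x + B @ u
samples = rng.standard_normal((200000, 2))   # many zero-mean draws of w
cross = samples @ (Q @ v)                    # w^T Q v for each draw
mean_cross = cross.mean()                    # should be close to 0
```

With 200000 draws the sample mean should be within a few thousandths of 0, consistent with E[w^T Q v] = 0 whenever E[w] = 0 and v is deterministic.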
and the other three new terms come from expanding (A_{N-1} x_{N-1} + B_{N-1} u_{N-1})^T Q_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1}): you get a quadratic term in u_{N-1}, a quadratic term in x_{N-1}, and a cross term that is linear in u_{N-1} and linear in x_{N-1}. So what is the lesson here? Look at what inside the bracket is deterministic and what is random. The only random quantity is w_{N-1}; that is because x_{N-1} is fixed and u_{N-1} is deterministic as a function of x_{N-1}. So I can move the expectation inward so that it applies only to the term w_{N-1}^T Q_N w_{N-1}. This is great, because that term also has nothing to do with u_{N-1}: it is just an additive constant that we have picked up. What else is a constant here? The term x_{N-1}^T Q_{N-1} x_{N-1} is a constant as far as the minimization is concerned, because it depends only on x_{N-1}, and the same goes for x_{N-1}^T A_{N-1}^T Q_N A_{N-1} x_{N-1}; neither affects the minimization.
I am therefore left with only three terms that actually affect my minimization: u_{N-1}^T R_{N-1} u_{N-1}, u_{N-1}^T B_{N-1}^T Q_N B_{N-1} u_{N-1}, and the cross term 2 x_{N-1}^T A_{N-1}^T Q_N B_{N-1} u_{N-1}. There is something to be noted here: u_{N-1} appears in a quadratic form in the first two terms, while the third is a bilinear form, linear in u_{N-1} as well as in x_{N-1}, but as far as the minimization is concerned it is just linear in u_{N-1}. So this is a quadratic function of u_{N-1}. And what kind of quadratic? The Hessian has two contributions: R_{N-1}, which is symmetric positive definite, and B_{N-1}^T Q_N B_{N-1}, which you can easily verify is positive semidefinite. Consequently, what we are minimizing is a strictly convex quadratic function of u_{N-1}, so its solution can simply be obtained by setting the derivative equal to 0. Setting the gradient with respect to u_{N-1} to 0, we get the equation

(R_{N-1} + B_{N-1}^T Q_N B_{N-1}) u_{N-1} = -B_{N-1}^T Q_N A_{N-1} x_{N-1},

or in other words

u_{N-1}^* = -(R_{N-1} + B_{N-1}^T Q_N B_{N-1})^{-1} B_{N-1}^T Q_N A_{N-1} x_{N-1}.

You can see what we have got here: u_{N-1}^* is a linear function.
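This one-step minimization can be checked numerically. Below is a sketch, with illustrative matrices of my own choosing, that computes u_{N-1}^* from the gradient condition and lets you compare its objective value against other controls:

```python
import numpy as np

# Illustrative matrices (not from the lecture), satisfying the assumptions.
A = np.array([[1.0, 0.3], [0.0, 1.0]])   # A_{N-1}
B = np.array([[0.0], [0.5]])             # B_{N-1}
Qn = np.eye(2)                           # Q_N, symmetric PSD
R = np.array([[1.0]])                    # R_{N-1}, symmetric PD
x = np.array([2.0, -1.0])                # a nominal state x_{N-1}

def u_terms(u):
    """The u-dependent part of J_{N-1}: u^T R u + (A x + B u)^T Q_N (A x + B u)."""
    y = A @ x + B @ u
    return u @ R @ u + y @ Qn @ y

# Setting the gradient to zero: (R + B^T Q_N B) u* = -B^T Q_N A x.
u_star = -np.linalg.solve(R + B.T @ Qn @ B, B.T @ Qn @ A @ x)
```

Since the objective is a strictly convex quadratic in u, u_star should achieve a strictly lower value of u_terms than any other control, which is easy to spot-check by evaluating u_terms at perturbed points.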
It is a linear function of x_{N-1}. What this means is that the optimal policy at time step N-1 takes the form of some matrix multiplying the state at that time. Now, if you substitute this back in, you can also evaluate the value function J_{N-1} as a function of x_{N-1}; it turns out to take the form

J_{N-1}(x_{N-1}) = x_{N-1}^T K_{N-1} x_{N-1} + E[ w_{N-1}^T Q_N w_{N-1} ].

Where did that last trailing term come from? It is nothing but the additive constant we picked up above. The matrix K_{N-1} is what gets constructed from the quadratic terms in x_{N-1} together with the optimal value obtained after substituting u_{N-1} = u_{N-1}^*, that being a function of x_{N-1}. Since u_{N-1}^* is a linear function of x_{N-1}, all these terms combine to give another quadratic in x_{N-1}. You can check that the expression is

K_{N-1} = A_{N-1}^T ( Q_N - Q_N B_{N-1} (B_{N-1}^T Q_N B_{N-1} + R_{N-1})^{-1} B_{N-1}^T Q_N ) A_{N-1} + Q_{N-1}.

But what is the lesson here? The lesson is that the value function at time step N-1 has also turned out to be a quadratic function.
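The formula for K_{N-1} can be verified numerically: plugging u_{N-1}^* back into the stage problem should give exactly x_{N-1}^T K_{N-1} x_{N-1} (the noise term aside). A sketch, again with illustrative matrices of my own choosing:

```python
import numpy as np

A = np.array([[1.0, 0.3], [0.0, 1.0]])   # A_{N-1}
B = np.array([[0.0], [0.5]])             # B_{N-1}
Qn = np.eye(2)                           # Q_N
Qm = np.diag([1.0, 2.0])                 # Q_{N-1}
R = np.array([[1.0]])                    # R_{N-1}

# One backward step:
# K_{N-1} = A^T (Q_N - Q_N B (B^T Q_N B + R)^{-1} B^T Q_N) A + Q_{N-1}
G = np.linalg.solve(B.T @ Qn @ B + R, B.T @ Qn)   # (B^T Q_N B + R)^{-1} B^T Q_N
K = A.T @ (Qn - Qn @ B @ G) @ A + Qm

# Check against direct substitution of u* into the stage problem.
x = np.array([1.5, -0.5])
u_star = -np.linalg.solve(B.T @ Qn @ B + R, B.T @ Qn @ A @ x)
y = A @ x + B @ u_star
direct = x @ Qm @ x + u_star @ R @ u_star + y @ Qn @ y
via_K = x @ K @ x
```

The two numbers `direct` and `via_K` should agree to floating-point precision, and K should come out symmetric, as a value-function Hessian must.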
What did we conclude in this step? If the value function at time N was quadratic, then in computing the value function at time N-1, all we had to do was set a derivative equal to 0, and we got a linear policy. Moreover, the value function at time N-1 is itself turning out to be a quadratic. As a result, if I now put this back into step N-2, I can repeat similar calculations: the value function at time N-2 would again become quadratic, and the policy at time N-2 would be linear in the state at time N-2, which is x_{N-2}. So u_{N-2}^* would also depend linearly on x_{N-2}, and this goes on: I can recursively work back all the way to time step 0, at each stage solving some quadratic optimization. It is of course true that the Hessians of these quadratics and the coefficients involved continue to get more complicated, but that is much better than facing a non-quadratic optimization. The beauty of this particular problem structure is this inherent invariance: when you start with a quadratic cost and linear dynamics, a lot of nice coincidences come into play in such a way that the complexity of the problem does not change as you go backwards through the dynamic programming algorithm. You continue to get quadratic value functions, quadratic optimizations, and linear policies. This helps you very elegantly find the optimal cost function, the optimal value, and the optimal policy. For completeness, let me state this here.
Hence J_{N-2} will also be quadratic and u_{N-2}^* will be linear, i.e. u_{N-2}^* = L_{N-2} x_{N-2} for some matrix L_{N-2}; I will give you the formula for L_{N-2} as well. And this will continue: for each k, J_k will be quadratic and u_k^* = L_k x_k will be the optimal policy. What is this L_k? It is given by

L_k = -(B_k^T K_{k+1} B_k + R_k)^{-1} B_k^T K_{k+1} A_k

(note that K here is a capital K and the subscript k is lowercase). And what are these K_k? At time N we have K_N = Q_N, and the K at every earlier time can be computed recursively as

K_k = A_k^T ( K_{k+1} - K_{k+1} B_k (B_k^T K_{k+1} B_k + R_k)^{-1} B_k^T K_{k+1} ) A_k + Q_k.

And what would be the optimal cost? The optimal cost is

J_0(x_0) = x_0^T K_0 x_0 + sum over k = 0, ..., N-1 of E[ w_k^T K_{k+1} w_k ].

The first term is the cost due to your initial state, and the remaining terms are the ones you were picking up at each step: recall we picked up one quadratic, variance-like term in the w's at each step, and those are exactly the terms that accumulate. So the optimal cost is also quadratic as a function of the initial state. Several nice things happen in this problem: the optimal policy turns out to be linear, so the optimal policy is simply to apply L_k to x_k, just the application of a matrix to the state.
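Putting the whole recursion together, here is a sketch that computes the gains L_k and matrices K_k backwards from K_N = Q_N, and then the optimal cost J_0(x_0) = x_0^T K_0 x_0 + Σ_k E[w_k^T K_{k+1} w_k], using E[w^T K w] = trace(K W) for noise covariance W. The time-invariant matrices, horizon, and covariance below are illustrative assumptions of mine, not values from the lecture:

```python
import numpy as np

def lqr_backward(A, B, Q, R, QN, N):
    """Finite-horizon LQR backward recursion.
    Returns gains L[0..N-1] and value matrices K[0..N].
    Time-invariant A, B, Q, R are assumed purely for brevity."""
    K = [None] * (N + 1)
    L = [None] * N
    K[N] = QN                                       # K_N = Q_N
    for k in range(N - 1, -1, -1):
        H = B.T @ K[k + 1] @ B + R
        L[k] = -np.linalg.solve(H, B.T @ K[k + 1] @ A)           # gain L_k
        K[k] = A.T @ K[k + 1] @ A + A.T @ K[k + 1] @ B @ L[k] + Q  # Riccati step
    return L, K

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, QN, N = np.eye(2), np.eye(1), np.eye(2), 10
L, K = lqr_backward(A, B, Q, R, QN, N)

W = 0.01 * np.eye(2)                # assumed noise covariance E[w w^T]
x0 = np.array([1.0, 0.0])
J0 = x0 @ K[0] @ x0 + sum(np.trace(K[k + 1] @ W) for k in range(N))
```

Note the form of the Riccati step: A^T K_{k+1} A + A^T K_{k+1} B L_k + Q is algebraically the same as the bracketed expression above, once L_k is substituted in.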
The matrix L_k itself can be computed through the matrices K_k, and the recursion for K_k is what is called the Riccati equation; through that equation you can recursively compute the K_k's and substitute them back in to get your optimal policy. Now, there are a few other things to note here, not related so much to optimization as to this structure. Notice that L_k has no dependence on the w's: the way the noise appears in the problem, it somehow drops out of a lot of the calculations and only keeps adding up as an additional term in the cost. If the noise were deterministic, meaning the variance were 0, that last term would vanish, and what you would be left with is simply x_0^T K_0 x_0, where K_0 would still have to be calculated recursively through the Riccati equation. But look at the amazing result here: the optimal action to apply is simply L_k x_k, which means that whatever the state, you just apply this particular matrix, and this matrix has no dependence on the noise or on the variance of the noise. The noise makes no appearance in the calculation of this matrix. If you look at the K_k recursion, the only things that appear are the R's, the Q's, the A's, and the B's; the w's make no appearance whatsoever. This means that the Riccati equation is the same regardless of whether or not you have noise in the system, which in turn means that your feedback, the optimal policy you have to choose, is also the same regardless of whether you had noise in the system or not. This is again an amazing miracle that comes about due to this particular problem structure.
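Both facts can be seen at once in a small experiment: the Riccati recursion never touches the noise, and the noise only adds the offset Σ_k σ² K_{k+1} to the cost. Here is a self-contained Monte Carlo sketch on a scalar toy system of my own choosing (all numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
# Scalar toy system so the recursion is easy to follow by hand.
a, b, q, r, qN, N, sigma = 1.0, 1.0, 1.0, 1.0, 1.0, 3, 0.1

# Backward Riccati recursion: note that sigma never appears in it.
K = [0.0] * (N + 1)
L = [0.0] * N
K[N] = qN
for k in range(N - 1, -1, -1):
    L[k] = -(b * K[k + 1] * a) / (b * K[k + 1] * b + r)
    K[k] = a * K[k + 1] * a + a * K[k + 1] * b * L[k] + q

x0 = 1.0
# Predicted optimal cost: x0^2 K_0 + sigma^2 * sum_k K_{k+1}.
predicted = K[0] * x0**2 + sigma**2 * sum(K[k + 1] for k in range(N))

# Monte Carlo: apply the SAME gains L_k in the noisy system.
runs, total = 50000, 0.0
for _ in range(runs):
    x, cost = x0, 0.0
    for k in range(N):
        u = L[k] * x                       # noise-free policy, noisy state
        cost += q * x * x + r * u * u
        x = a * x + b * u + sigma * rng.standard_normal()
    total += cost + qN * x * x
avg = total / runs
```

The averaged simulated cost should match the predicted value closely, confirming that the noise shifts the cost by the variance term but leaves the policy unchanged.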
It is effectively telling you that you can pretend there was no noise whatsoever and continue to apply the same policy that you would have applied in the absence of noise, but applied, of course, on the state that actually gets realized in the noisy system. You do not necessarily take the same actions, because the state itself would be different, but the plan, the strategy, the policy, does not change due to the presence of the noise. The only way the noise affects things is that it keeps adding a little offset term to your optimal cost, which depends on the variance of the noise. It adds an error, or an offset, due to noise in some sense, but it does not affect how you are going to steer your system. Because of all of these reasons, this particular problem structure is a pet problem, a favorite problem, across many disciplines: operations research, control, and so on; many people have applied it. In fact, much deeper coincidences come into play in more general problems, again with quadratic costs and so on, but that is a subject for later. With this I also end the content of this course. This essentially gives you a way to relate deterministic optimization and dynamic optimization. There are other ways of relating deterministic and dynamic optimization as well; for instance, there is something called the minimum principle for dynamic control or dynamic optimization problems, which can be obtained through KKT conditions. That is beyond the purview of this course, but it is something you can look up.
All my credentials are on the NPTEL website; if you would like to get in touch with me about any problem you are working on, or would like to talk about anything, feel free to do so. You can also look me up on the internet. This brings us to the end of our course on optimization. We have covered a vast domain: we started from unconstrained optimization, went on to constrained optimization and KKT conditions, and finally ended with dynamic optimization. But the subject itself is much larger than all this; it is an extremely evolved and well-developed subject. If there is anything you would like to talk to me more about, feel free to get in touch. You can simply search for my name, Ankur Kulkarni, on Google and my webpage should show up; there you will find all the ways of contacting me. You can also email me at kulkarni.kuru at iitb.ac.in. If there is anything you would like to discuss, I will be happy to respond; do get in touch. Thank you for this course.