In this lecture we will look at a particular type of stochastic control problem that has been studied in depth and analyzed in many different ways: one in which the dynamics of the system take a linear form and the cost, or loss function, takes a quadratic form. This is popularly known as the linear system with quadratic cost problem. This setting has built into it a number of beautiful coincidences, and what I hope to give you today is a glimpse of how these pieces fit together so elegantly that we can eventually solve for the optimal policy in closed form. A linear system has dynamics of the following form:

    x_{k+1} = A_k x_k + B_k u_k + w_k,   k = 0, 1, ..., N-1.

Here A_k and B_k are matrices known to us, and (w_k) is a sequence of disturbances. We will assume the w_k are independent random variables with mean zero, and that they have finite second moments, i.e., E[w_k w_k^T] is a matrix in which all entries are finite. The problem is to minimize a cost given as follows:

    E[ x_N^T Q_N x_N + sum_{k=0}^{N-1} ( x_k^T Q_k x_k + u_k^T R_k u_k ) ],

where the first term is the terminal cost and the summation collects the stage-wise costs. The minimization is over all policies; in this problem we will be looking for Markov policies.
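As a sketch of this setup, here is a minimal simulation of the linear dynamics and the accumulated quadratic cost. The matrices, horizon, initial state, and the zero policy below are illustrative choices, not part of the lecture.

```python
import numpy as np

# Illustrative problem data (not from the lecture): a 2-dimensional state,
# scalar control, time-invariant matrices, horizon N = 5.
rng = np.random.default_rng(0)
N = 5
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # A_k
B = np.array([[0.0], [0.1]])             # B_k
Q = np.eye(2)                            # Q_k, positive semidefinite
R = np.array([[1.0]])                    # R_k, positive definite
QN = np.eye(2)                           # terminal cost matrix Q_N

def rollout_cost(x0, policy, noise_scale=0.0):
    """Simulate x_{k+1} = A x_k + B u_k + w_k and accumulate the quadratic cost."""
    x, cost = x0, 0.0
    for k in range(N):
        u = policy(k, x)
        cost += float(x @ Q @ x) + float(u @ R @ u)   # stage cost x^T Q x + u^T R u
        w = noise_scale * rng.standard_normal(2)      # disturbance w_k, mean zero
        x = A @ x + B @ u + w
    return cost + float(x @ QN @ x)                   # terminal cost x_N^T Q_N x_N

zero_policy = lambda k, x: np.zeros(1)
c = rollout_cost(np.array([1.0, 0.0]), zero_policy)   # noise-free rollout
```

With zero noise and zero control the state here stays at [1, 0], so the cost is just the five stage costs plus the terminal cost.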
The x_k's and u_k's are vectors of appropriate dimension, while A_k, B_k, Q_k and R_k are matrices. We assume nothing specific about A_k and B_k, but we will assume that each R_k is positive definite and each Q_k is positive semidefinite. As I said, the w_k's are assumed independent, with mean zero and finite variance. This problem, in which the system evolves according to a linear equation and the cost is quadratic, is a very popular form, and it is often called the linear quadratic regulator. The reason for the name is that, in a particular form, it can be thought of as a problem of regulating a system, that is, of following a particular trajectory. For example, one may be given a reference trajectory x̄_0, x̄_1, ..., x̄_N, and one wants to stay as close to that trajectory as possible while at the same time economizing the energy needed to stay close to it. In that case the cost we incur takes the form

    (x_N - x̄_N)^T Q_N (x_N - x̄_N) + sum_{k=0}^{N-1} [ (x_k - x̄_k)^T Q_k (x_k - x̄_k) + u_k^T R_k u_k ].

Notice the form of this cost: the first term is a terminal cost, as above, that captures how far x_N is from the given reference trajectory.
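To make the regulation interpretation concrete: the tracking penalty is just the ordinary quadratic penalty applied to the deviation e_k = x_k - x̄_k. A small sketch with made-up numbers:

```python
import numpy as np

# Illustrative values only: two states, two reference points, one weight matrix.
Q = np.diag([2.0, 1.0])
xs   = [np.array([1.0, -1.0]), np.array([0.5, 0.25])]   # states x_0, x_1
xbar = [np.array([1.0,  0.0]), np.array([0.0, 0.0])]    # reference xbar_0, xbar_1

# Tracking penalty (x_k - xbar_k)^T Q (x_k - xbar_k) ...
track = sum(float((x - r) @ Q @ (x - r)) for x, r in zip(xs, xbar))

# ... equals the plain penalty e_k^T Q e_k on the deviation e_k = x_k - xbar_k.
devs = [x - r for x, r in zip(xs, xbar)]
plain = sum(float(e @ Q @ e) for e in devs)
```

So the regulation problem is the standard problem written in the deviation variable.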
Each term (x_k - x̄_k)^T Q_k (x_k - x̄_k) is a skewed distance between x_k and x̄_k, and u_k^T R_k u_k is the cost of the energy we spend in applying the control. Together, these make up the total cost we incur in regulating the given system toward the trajectory x̄_0, ..., x̄_N. Now what we will do is solve this problem, once again using the dynamic programming equation. So the agenda for now is to apply dynamic programming to solve the above problem, and by "above problem" I mean not this specific regulation form but the general form written out earlier, the one in the red box. The dynamic programming equation first asks us to set up J_N. We write, for every x_N,

    J_N(x_N) = x_N^T Q_N x_N,

which is the terminal cost. Remember this must be written for all x_N, that is, for every value of the state: for all vectors x_N the function J_N is defined in this quadratic way. That was the case t = N; let me now use the index t rather than k and move to t = N-1. For t = N-1, J_{N-1}(x_{N-1}) is the minimum, over actions, of the expected stage-wise cost plus the cost-to-go.
Now the cost-to-go at time N-1 is simply J_N(x_N), so we write

    J_{N-1}(x_{N-1}) = min_{u_{N-1}} E[ x_{N-1}^T Q_{N-1} x_{N-1} + u_{N-1}^T R_{N-1} u_{N-1} + J_N(x_N) ].

But x_N itself can be written through the dynamics, with k = N-1: x_N = A_{N-1} x_{N-1} + B_{N-1} u_{N-1} + w_{N-1}, so the last term is J_N(A_{N-1} x_{N-1} + B_{N-1} u_{N-1} + w_{N-1}). Expanding this out, and using the fact that J_N of any x_N equals x_N^T Q_N x_N, we get the long expression

    J_{N-1}(x_{N-1}) = min_{u_{N-1}} E[ x_{N-1}^T Q_{N-1} x_{N-1} + u_{N-1}^T R_{N-1} u_{N-1}
        + (A_{N-1} x_{N-1} + B_{N-1} u_{N-1} + w_{N-1})^T Q_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1} + w_{N-1}) ].

Once we minimize this over u_{N-1}, what we get is our value function, or cost-to-go, J_{N-1} as a function of x_{N-1}. Our task now is to first identify, as we have done before, what is random here and what depends on u_{N-1}. Let us look at the expectation and see which of these terms are actually random. Notice that x_{N-1} here denotes an arbitrary particular state at time N-1.
For us x_{N-1} is actually deterministic, since this is being written for all x_{N-1}, just as the equation above was written for all x_N. In other words, x_{N-1} is simply a parameter, not a random variable. And u_{N-1}, being a function of x_{N-1}, is also not random. As a consequence we are left with only one term that is actually random, namely w_{N-1}, but w_{N-1} appears inside the quadratic. So we will expand the quadratic in order to pull out the terms that depend on w_{N-1}. Writing this out more explicitly, we pull the expectation inside so that it applies only to the quadratic term, and write the remaining two terms as they are:

    J_{N-1}(x_{N-1}) = min_{u_{N-1}} { x_{N-1}^T Q_{N-1} x_{N-1} + u_{N-1}^T R_{N-1} u_{N-1}
        + E[ (A_{N-1} x_{N-1} + B_{N-1} u_{N-1} + w_{N-1})^T Q_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1} + w_{N-1}) ] }.

Now consider what this quadratic amounts to. First, it has terms that do not depend on w_{N-1} at all: these arise when A_{N-1} x_{N-1} + B_{N-1} u_{N-1} multiplies itself. Since those terms involve only x_{N-1} and u_{N-1}, which are not random, they come out of the expectation.
In addition we have a term in which w_{N-1} appears linearly, namely w_{N-1}^T Q_N times the term involving x_{N-1} and u_{N-1} (its transpose also appears, and since Q_N is symmetric the two cross terms are equal, hence the factor 2 below). A third term involves w_{N-1} quadratically. Writing these out:

    J_{N-1}(x_{N-1}) = min_{u_{N-1}} { x_{N-1}^T Q_{N-1} x_{N-1} + u_{N-1}^T R_{N-1} u_{N-1}
        + (A_{N-1} x_{N-1} + B_{N-1} u_{N-1})^T Q_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1})
        + 2 E[ w_{N-1}^T Q_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1}) ]
        + E[ w_{N-1}^T Q_N w_{N-1} ] }.

Let us look at these terms a little more closely. The linear-in-w term multiplies only deterministic quantities, so its expectation is really E[w_{N-1}]^T Q_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1}), and since we assumed the w's all have mean zero, this term equals zero. What we are therefore left with, from the expectation, is only the third term, E[w_{N-1}^T Q_N w_{N-1}], which depends only on w_{N-1} and has no dependence on u_{N-1} at all. Now look at the expression that remains: the constant term does not depend on u_{N-1}, while the terms that do depend on u_{N-1} do so quadratically, both through u_{N-1}^T R_{N-1} u_{N-1} and through the expanded quadratic in A_{N-1} x_{N-1} + B_{N-1} u_{N-1}.
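The vanishing of the linear-in-w term, and the resulting split of the expectation, can be checked exactly with a zero-mean two-point disturbance, since the expectation can then be enumerated. The matrix Q_N, the vector v (standing in for A_{N-1} x_{N-1} + B_{N-1} u_{N-1}), and the offset d below are illustrative.

```python
import numpy as np

# Check: E[(v+w)^T Q_N (v+w)] = v^T Q_N v + E[w^T Q_N w] when E[w] = 0.
# Illustrative data; w takes the values +d and -d with probability 1/2 each,
# so the expectation is an exact two-term average.
QN = np.array([[2.0, 0.5], [0.5, 1.0]])
v = np.array([1.0, -2.0])       # plays the role of A x + B u (deterministic)
d = np.array([0.3, 0.4])
ws = [d, -d]                    # two-point, mean-zero disturbance

lhs = np.mean([float((v + w) @ QN @ (v + w)) for w in ws])
cross = np.mean([float(w @ QN @ v) for w in ws])   # the linear-in-w term
rhs = float(v @ QN @ v) + np.mean([float(w @ QN @ w) for w in ws])
```

The cross term averages to zero exactly, so the left and right sides agree, which is the cancellation used in the derivation.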
As a result, when we minimize this over u_{N-1} we get a linear equation for u_{N-1}. Essentially, we find it by differentiating the entire expression and setting the gradient with respect to u_{N-1} equal to zero. Doing so, the optimal u_{N-1}, call it u*_{N-1}, must satisfy

    (R_{N-1} + B_{N-1}^T Q_N B_{N-1}) u_{N-1} = - B_{N-1}^T Q_N A_{N-1} x_{N-1}.

In other words, u*_{N-1} has the following form:

    u*_{N-1} = - (R_{N-1} + B_{N-1}^T Q_N B_{N-1})^{-1} B_{N-1}^T Q_N A_{N-1} x_{N-1}.

(The inverse exists because R_{N-1} is positive definite and B_{N-1}^T Q_N B_{N-1} is positive semidefinite.) Notice that, as a result of the dynamic programming theorem, we have now found the optimal decision rule at time N-1: the optimal thing to do is to apply a linear function of the state x_{N-1}, whose coefficients are given by this complicated matrix. Let us write it as u*_{N-1} = L_{N-1} x_{N-1}, where L_{N-1} = - (R_{N-1} + B_{N-1}^T Q_N B_{N-1})^{-1} B_{N-1}^T Q_N A_{N-1}.
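The gain formula can be sketched numerically. The matrices below are illustrative, and we verify that the resulting u*_{N-1} satisfies the first-order condition derived above at a sample state.

```python
import numpy as np

# Illustrative one-step data (not from the lecture).
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # A_{N-1}
B = np.array([[0.0], [1.0]])             # B_{N-1}
QN = np.eye(2)                           # Q_N
R = np.array([[1.0]])                    # R_{N-1}

# L_{N-1} = -(R + B^T Q_N B)^{-1} B^T Q_N A, computed via a linear solve.
L = -np.linalg.solve(R + B.T @ QN @ B, B.T @ QN @ A)

# First-order condition: (R + B^T Q_N B) u* + B^T Q_N A x = 0 at a sample state.
x = np.array([2.0, -1.0])
u_star = L @ x
resid = np.abs((R + B.T @ QN @ B) @ u_star + B.T @ QN @ A @ x).max()
```

The solve-based form avoids an explicit matrix inverse, which is the usual numerical practice.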
But this only gives us the optimal policy; let us substitute this optimal u_{N-1} back into the expression above. From there we can conclude that

    J_{N-1}(x_{N-1}) = x_{N-1}^T K_{N-1} x_{N-1} + E[ w_{N-1}^T Q_N w_{N-1} ].

Now what is this K_{N-1}? Notice that u*_{N-1} is given as a linear function of x_{N-1}, some matrix times x_{N-1}. Once I substitute that in, the resulting expression has x_{N-1} appearing in a quadratic form, u_{N-1} appearing in a quadratic form, and u_{N-1} appearing in a bilinear form with x_{N-1}; so the result is quadratic in x_{N-1}, of the form x_{N-1}^T K_{N-1} x_{N-1}, where K_{N-1} is some matrix. There is also the trailing term, the constant left behind when we took the expectation, which appears here. For completeness, let me write out what K_{N-1} is:

    K_{N-1} = A_{N-1}^T ( Q_N - Q_N B_{N-1} (B_{N-1}^T Q_N B_{N-1} + R_{N-1})^{-1} B_{N-1}^T Q_N ) A_{N-1} + Q_{N-1}.

If you stare at this a bit you will realize that this matrix is actually symmetric. In fact it is not only symmetric, it is positive semidefinite, and as a result the value function we get here is also a convex function.
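One backward step of this recursion can be sketched as follows; the matrices are illustrative placeholders, and the code checks the symmetry and positive semidefiniteness claimed above.

```python
import numpy as np

# Illustrative one-step data (not from the lecture).
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # A_{N-1}
B = np.array([[0.0], [1.0]])             # B_{N-1}
QN = np.eye(2)                           # Q_N
Qprev = np.eye(2)                        # Q_{N-1}
R = np.array([[1.0]])                    # R_{N-1}

# K_{N-1} = A^T (Q_N - Q_N B (B^T Q_N B + R)^{-1} B^T Q_N) A + Q_{N-1}
inner = QN - QN @ B @ np.linalg.solve(B.T @ QN @ B + R, B.T @ QN)
K = A.T @ inner @ A + Qprev

sym_err = np.abs(K - K.T).max()     # should be ~0: K is symmetric
eigs = np.linalg.eigvalsh(K)        # should be nonnegative: K is PSD
```

For this data the step gives K = [[2, 1], [1, 2.5]], a symmetric matrix with strictly positive eigenvalues.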
So J_{N-1}, as a function of x_{N-1}, is in fact a convex function. Remember that we started off assuming the Q's were all positive semidefinite and the R's positive definite, so the terminal cost, and hence the value function at time N, was a convex function. What we are getting now is that the cost-to-go at time N-1 is also convex; moreover, it is not just convex but a quadratic function of x_{N-1}. What we are therefore seeing is that much of the structure of the problem is retained when we go from step N to step N-1. In the next lecture we will exploit this to find the optimal policy and an expression for the optimal cost.
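The convexity claim can be sketched via the midpoint inequality for a quadratic value function J(x) = x^T K x + c with K positive semidefinite. The matrix K and constant c below are illustrative stand-ins for K_{N-1} and E[w_{N-1}^T Q_N w_{N-1}].

```python
import numpy as np

# Illustrative stand-ins: K for K_{N-1} (symmetric PSD), c for E[w^T Q_N w].
K = np.array([[2.0, 1.0], [1.0, 2.5]])
c = 0.46

J = lambda x: float(x @ K @ x) + c   # quadratic value function J_{N-1}

# Midpoint convexity check: J((x+y)/2) <= (J(x) + J(y))/2.
x, y = np.array([1.0, -1.0]), np.array([-2.0, 0.5])
mid = J(0.5 * (x + y))
avg = 0.5 * (J(x) + J(y))
gap = avg - mid                      # nonnegative when K is PSD
```

The nonnegative gap is exactly the convexity of the cost-to-go that carries over from step N to step N-1.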