Welcome back. In the previous lecture we saw an interesting result about the Witsenhausen problem (see the previous lecture for the reformulation we are working with): if the first controller F is linear, then the optimal controller at the second stage is also linear. So if your first controller is chosen to be linear, the optimal controller at the second stage is also linear. As I mentioned at the end of the previous lecture, this does not mean that a linear F is itself the optimal choice of F; that is not being claimed here. We are only positing that F is linear and asking what the optimal G is, and it turns out that this optimal G is also a linear function. What this says is that the best response G to a linear F is also linear.

Now let us see if the reverse is true. Instead of positing that F is linear, suppose I fix G to be linear; that is, suppose I take G(t) = μt. We are saying nothing about F, which could be anything, and we ask: what is the optimal F in response to this G? Let us write out J(F, G) again:

J(F, G) = E[k²(X − F(X))² + (F(X) − μ(F(X) + V))²],

where the second term arises because G is applied to the information available to it, namely F(X) + V. Now let us look at this expression more closely: each term is quadratic in F(X).
The first term is a perfect square in X − F(X), and the second can also be rewritten as a perfect square, since μ is just multiplying its argument:

F(X) − μ(F(X) + V) = (1 − μ)F(X) − μV,

so the second term is ((1 − μ)F(X) − μV)². So what we have is a perfect square plus another perfect square, both quadratic in F(X). Consequently we can complete the square and write the cost in the following form:

J(F, G) = E[K′(AX + BV − F(X))² + (additional terms)],

where K′, A and B are constants (explicitly, K′ = k² + (1 − μ)², A = k²/K′ and B = μ(1 − μ)/K′), and the additional terms depend on X and V but not on F. This is our usual technique of completing the square, which you learnt in high school. Since the leftover part does not depend on F, when we minimize over F we are effectively just minimizing E[(AX + BV − F(X))²]. This is where things get interesting. Once again, remember that X and V are independent Gaussians. F is chosen as a function of the Gaussian X, and what we want to estimate, from the information present in X, is the quantity AX + BV.
So what we have here is the estimation of one Gaussian from another, where the two are jointly Gaussian. This is again a mean-square estimation problem, and its solution is the conditional expectation. Because the underlying variables are jointly Gaussian, the conditional expectation is linear in the information. Therefore the optimal F(X) is the conditional expectation E[AX + BV | X], which is linear in X; in fact, since V is independent of X and has zero mean, E[AX + BV | X] = AX, so the optimal F is of the form λX. What this is telling us is that the best response F to a linear G is also linear.

So let me write this down: the best response G to a linear F is linear, and similarly the best response F to a linear G is linear. What does this mean? It is now tempting to think that F and G therefore have to be linear; after all, if I choose F to be linear the optimal G is linear, and if I choose G to be linear the optimal F is linear. Therefore it seems like the pair (linear, linear) is the optimal choice. But this is not actually the case. There are many problems with two variables in which the optimal choice of one given the other is of a certain form, and the optimal choice of the other given the first is also of that form, yet the pair put together is not the globally optimal choice when we look over all possible pairs. This phenomenon, where linear is optimal in response to linear, means the pair is what is called person-by-person optimal.
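This estimation step can be checked numerically. The sketch below uses arbitrary illustrative values of A, B and σ (they are not the lecture's constants) and fits the best affine estimator of AX + BV from samples of X, confirming that the slope is A and the intercept is 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
sigma = 2.0        # std of X; illustrative value
A, B = 0.7, 0.3    # stand-ins for the completed-square constants

X = rng.normal(0.0, sigma, n)
V = rng.normal(0.0, 1.0, n)   # V ~ N(0, 1), independent of X
target = A * X + B * V        # the quantity to be estimated from X

# Best affine estimator of target given X (minimum mean-square error):
# slope = Cov(target, X) / Var(X); intercept makes the error zero-mean.
slope = np.cov(target, X, bias=True)[0, 1] / np.var(X)
intercept = target.mean() - slope * X.mean()

print(slope, intercept)   # slope ≈ A = 0.7, intercept ≈ 0
```

The fitted slope matching A reflects E[AX + BV | X] = AX: the independent zero-mean term BV contributes nothing to the conditional expectation.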
There is a term for this, which I will explain in more detail later: such a pair is called person-by-person optimal, and in general that is not the same as globally optimal, also called team optimal, meaning optimal among all possible pairs. In fact, showing that the two differ here is one of Witsenhausen's results: he shows that the person-by-person optimal solution is different from the team optimal one. Now let us review how exactly Witsenhausen showed that this particular linear pair is not actually optimal.

To begin with, we have so far only shown that if I choose f or g to be linear, the optimal choice for the other one is also linear. That does not yet tell us what the optimal pair of linear controllers itself is. What one can do in such a case is a so-called mountain-climbing procedure: fix one controller to be linear and find the optimal response to it; then fix that response and find the optimal response to it; and so on, back and forth. What Witsenhausen actually does in his paper is first find an expression for the optimal cost achievable under linear controllers: if f and g are linear, what is the optimal cost we can get?
If f and g are linear, we can write f(x) = λx and g(y) = μy, and then J(f, g) becomes a function J(λ, μ) of λ and μ. Fixing λ, we can compute the optimal μ very easily as a function of λ. For this, let us give the problem some more concrete form: assume X is Gaussian with variance σ² and V is Gaussian with variance 1. In this case the optimal μ is given by

μ(λ) = σ²λ²/(1 + σ²λ²).

We can plug this into our expression for J, so that J(λ, μ(λ)) becomes a function of λ alone, which we can then optimize: we find the optimal λ by setting the partial derivative with respect to λ to 0. This itself turns out to be a fairly complicated calculation; once you set the derivative equal to 0, it turns out that many λ's solve the equation, and you have to work through cases and sub-cases to decide what the optimal value of λ is in each case. The important thing is that if k² is small enough, for instance if k² < 1/4, he finds what the optimal λ's are, and from there what the optimal affine cost is.
Denote by J*_a the optimal affine cost,

J*_a = min over (λ, μ) of J(λ, μ) = min over λ of J(λ, μ(λ)).

Witsenhausen shows that J*_a = 1 − k² for k² < 1/4. So if k² < 1/4, the optimal affine cost is 1 − k². What this means is that if I make k smaller and smaller, J*_a approaches 1: as k tends to 0, J*_a tends to 1. It is in general less than 1, but as k becomes smaller it grows and approaches 1. We are not actually taking the limit here, but this is the indicative trend: when k is small the optimal affine cost is 1 − k², which gets closer and closer to 1 as k tends to 0.

Now let us come back to the counterexample aspect of the Witsenhausen problem. Witsenhausen wanted to make the point that linear controllers need not be optimal, and he has now found a regime, namely k² < 1/4, in which he has computed the optimal cost achievable by linear controllers. So this is the optimal cost from linear controllers, applicable only in that particular regime, namely when k² < 1/4.
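As a sanity check on the affine analysis, here is a small sketch of the mountain-climbing procedure mentioned earlier, alternating the two best responses. The closed form for the best λ given μ is my own derivation from the quadratic cost (it is not stated in the lecture), and I assume the scaling k²σ² = 1 that the lecture uses later for the non-linear policy:

```python
import numpy as np

k = 0.1
sigma = 1.0 / k          # assumed scaling: k²·σ² = 1
k2, s2 = k**2, sigma**2

# Affine cost with f(x) = λx, g(y) = μy, X ~ N(0, σ²), V ~ N(0, 1):
# J(λ, μ) = k²σ²(1-λ)² + σ²λ²(1-μ)² + μ²
def J(lam, mu):
    return k2 * s2 * (1 - lam)**2 + s2 * lam**2 * (1 - mu)**2 + mu**2

mu_star = lambda lam: s2 * lam**2 / (1 + s2 * lam**2)   # best μ for fixed λ (from the lecture)
lam_star = lambda mu: k2 / (k2 + (1 - mu)**2)           # best λ for fixed μ (set ∂J/∂λ = 0)

# Mountain climbing: alternate best responses until the pair settles.
lam = 1.0
for _ in range(200):
    mu = mu_star(lam)
    lam = lam_star(mu)

print(lam, mu, J(lam, mu))   # cost ≈ 1 - k² = 0.99 for this small k
```

A brute-force grid search over λ of J(λ, μ(λ)) can be used to confirm that this fixed point is also the global minimum, consistent with J*_a = 1 − k² in this regime.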
So what do we get from this? Since we now know the optimal affine cost, at least in a certain regime, what Witsenhausen tries to do is see whether this cost can be beaten: can you get a cost better than 1 − k² in this particular regime? He then does something that nobody had ever seen before: he constructs a very strange-looking non-linear control policy, computes an upper bound on its cost, and eventually shows that this upper bound does not exceed 0.5, whereas, as k goes to 0, the best affine cost is close to 1. That is basically what he shows. So let us see what his non-linear policy actually is.

As I said, this is a completely novel-looking policy; nobody had ever seen anything like it used in LQG problems. He takes f(x) = σ·sgn(x), where σ (recall σ² is the variance of X) multiplies the signum of x. What is sgn(x)? The signum function simply performs what is called a two-point quantization: if x ≥ 0 it gives you the value 1, and if x < 0 it gives you the value −1. The function g is taken to be g(y) = σ·tanh(σy).
Obviously this is completely new; we never encountered any policy that looks like this in an LQG problem, where the best we had seen was linear. Here there is a terrible non-linearity in f, which is not even continuous, and g is this strange hyperbolic tangent, which again is very difficult to see where it comes from. Nonetheless, what he shows is the following: if you take k²σ² = 1, then under this particular policy

J(f, g) ≤ 2(1 − √(2/π)) + √(2π) · (1/k²) · φ(1/k),

where φ is simply the standard Gaussian PDF with mean 0 and variance 1, φ(t) = (1/√(2π)) e^(−t²/2). So if k and σ are chosen such that k²σ² = 1, then J(f, g) can be upper-bounded by this quantity.

Now let us see what happens as you let k go to 0. As k → 0, the quantity (1/k²)·φ(1/k) also goes to 0. The reason is that the Gaussian PDF decays faster than exponentially, while outside it you have only a polynomial factor: the numerator goes to 0 faster than the denominator, and as a result the product (1/k²)·φ(1/k) goes to 0. This means that for k small enough (still keeping k²σ² = 1), in the bound on J(f, g),
the second term as good as vanishes, and we get J(f, g) ≤ 2(1 − √(2/π)), and this particular quantity evaluates to about 0.404. Even if I take k not equal to 0 but close to 0, what we can say is that for k small enough this is certainly at most 0.5. Whereas, if you remember, I just said that the optimal affine cost was approaching 1. So what does this mean? What this shows is that we have found a policy which is non-linear and whose performance is strictly better than the performance of the optimal linear policy: for k small enough, J(f, g) < J*_a. What has this taught us? That linear policies are not optimal. This is why this is a counterexample in stochastic control. What Witsenhausen has basically shown is that you can take a linear system, quadratic cost, linear observations and Gaussian noise, and yet it can happen that a non-linear policy outperforms the best linear policy. The reason this has happened is that he has departed from the earlier assumption, namely the assumption of a classical information structure, and gone to a non-classical information structure.
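The gap between the two kinds of policy can be seen in a quick Monte Carlo experiment; k = 0.1 is an arbitrary "small k" choice here, with σ = 1/k so that k²σ² = 1 as in the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
k = 0.1
sigma = 1.0 / k                        # k²σ² = 1, the regime considered above

X = rng.normal(0.0, sigma, n)
V = rng.normal(0.0, 1.0, n)

F = sigma * np.sign(X)                 # two-point quantizer f(x) = σ·sgn(x)
G = sigma * np.tanh(sigma * (F + V))   # g(y) = σ·tanh(σ·y)

J_nl = np.mean(k**2 * (X - F)**2 + (F - G)**2)
print(J_nl)   # ≈ 2(1 - √(2/π)) ≈ 0.404, far below the affine optimum 1 - k² = 0.99
```

The quantization term alone contributes essentially all of the cost; the estimation term is negligible because F(X) + V = ±σ + V sits about σ = 10 noise standard deviations away from 0, so the tanh saturates and g recovers F(X) almost exactly.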
In other words, all the earlier results we had about linear-quadratic-Gaussian problems, where the optimal controller was linear and was a superposition of the deterministic controller and the Kalman filter for state estimation, are true only when you have the classical information structure. Once you move to a non-classical information structure they need not be true: non-classical information structure problems may admit non-linear optimal policies.

The other important lesson, which I have been talking about throughout, is the issue of the dual effect and how the dual effect makes it very hard for us to compute the optimal controller. The strange policy that Witsenhausen exhibited is by no means guaranteed to be optimal. There may very well be improvements one can make on top of this policy to get an even lower cost than Witsenhausen got. As of today (this recording is being made in February 2022), we still do not know what the optimal controller is for Witsenhausen's problem. It is a deceptively simple-looking problem, seemingly a toy problem that one could solve, but it hides in it an enormous amount of complexity which people have been struggling against for more than 50 years now. That a simple problem like this has remained unsolved tells us what role the dual effect has in stochastic control problems and how it makes reasoning about these problems, or computing their solutions, an extremely complicated task.
This, therefore, is the state of the art: we know that linear controllers are not optimal, so the optimal controller has to be sought in the class of non-linear controllers, but we do not yet know what the form of the optimal non-linear controller actually is.