You had a good question about how to find the initial condition x hat; I don't know if you looked at it some more or thought about it some more. Okay, so I looked at the original paper, the Ghahramani and Hinton paper, and what they do in that paper is use a form of the Kalman filter that runs both forwards and backwards. Basically, the way we run the Kalman filter is to make predictions about the next state: we use the state update equation to say that if we know x hat at time n, we can make a prediction at time n plus 1, and then we compare it to our observation and we correct it. We can also write that equation so that, given x at time n plus 1, we can make predictions about time n, because we can simply rearrange the state update as x(n) = A^{-1} (x(n+1) - B u(n) - w(n)), where w is the noise. In a scenario where you are making predictions about things going forward, it's just a Kalman filter; if you take the Kalman filter estimates and then run them backwards, it's called smoothing. That was an extension made to Kalman's algorithm about ten years after his paper, published in the 70s. So what Ghahramani and Hinton do in their paper is say: get your x hat estimate first through the forward Kalman pass, then run it backwards, so that you now take into account the observations you are making in the future. That way you get a somewhat better estimate of what x hat is. And this way, if you have x hat of 1, which we assume we do, we also get x hat of 0, because we just run it backwards: we can form the estimate of x at time 0 if we know x at time 1, which by our assumption we have. And so we form the log of the complete likelihood, and we said that if you look at the log of the complete likelihood, it has a Gaussian centered on our x hat at time 0. Well, what is x hat at time 0? Obviously the best estimate of it, whatever it's going to be, is set just like in the Kalman filter: x hat at time 0 is going to be A^{-1} times x hat at time 1. That's its expected value, right? So that's the approach. And then to find its variance, you multiply both sides of the top equation by their transpose, so you get the expected value of x times x transpose. Does that mean it's called backwards prediction? No, it's smoothing, in the sense that in the Kalman filter what you're doing is making a prediction about the future given all the past observations. When you run it backwards, you're making a prediction about now given all the future observations. Does that make sense? So it's smoothing in the sense that your initial prediction depended on all the things in the past. And in engineering you assume you have all the data, and if you have all the data, then what you do is just run time backwards, and that's called smoothing. Okay. Thank you. So, so far what we've been doing is using the state estimation approach and the system identification approach to make estimates of state and to make predictions about how our inputs to the world are going to change those states. If we send an input, it's going to change the state of the system that we're interested in. But all we're doing is making predictions, right?
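Before we leave state estimation: here is a minimal sketch of that backward step in code. It assumes the linear-Gaussian model x(n+1) = A x(n) + B u(n) + w(n) with w ~ N(0, Q) from earlier lectures; the function name is mine, and it makes the simplification of treating the noise as independent of the forward estimate, which a full Rauch-Tung-Striebel smoother would account for.

```python
import numpy as np

# A minimal sketch of the backward prediction step, assuming the linear
# model x[n+1] = A @ x[n] + B @ u[n] + w[n], with w ~ N(0, Q). Inverting
# the update gives x[n] = inv(A) @ (x[n+1] - B @ u[n] - w[n]).
def backward_predict(x1_hat, P1, A, B, u0, Q):
    """Propagate the estimate at time 1 back to time 0."""
    A_inv = np.linalg.inv(A)
    x0_hat = A_inv @ (x1_hat - B @ u0)     # E[x0] = inv(A) (x1_hat - B u0)
    # Covariance, treating w as independent of the time-1 estimate; a full
    # Rauch-Tung-Striebel smoother accounts for their correlation.
    P0 = A_inv @ (P1 + Q) @ A_inv.T
    return x0_hat, P0
```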
So we're not actually doing anything, in the sense that we're not generating a command to do anything. We're just saying that if such a command were generated, here's what's going to happen. But how do you generate good commands? How do you generate commands that get you the things you want? In control theory, approaches have been developed to understand how one controls things: if we want to move a tractor from here to there, if we want to move a rocket from where it's sitting on the platform to the moon, is there a way for us to generate commands to it in order to achieve something? And in the last 10 or 12 years, the question has been: does that framework help us understand how the brain controls our movements? This is the idea that there are states that we want to achieve, because those are where the rewards or the valuable things are, and then there are actions that you're going to produce that carry certain costs for you. And the question is, what's the optimum behavior? In a sense, in biology the concept of optimum behavior is natural, because you would imagine that evolution has made it so that perhaps the way we walk, the way we swim, the way we run, the way we behave is in some way making our fitness best, whatever the measure of fitness is. Now, in Darwinian terms, fitness is your ability to live long enough so you can do biology. Whether that's the only cost is unclear. In engineering, what has taken place is the concept of describing behavior and control in terms of costs and rewards. We're going to assume that when you generate an action, it's because you have a goal. You have some goal in mind: you want to drive the state of whatever is being controlled by your actions so that it reaches some place, some region, where things are good. So you come to class today, you sit here, because it's better than being in the rain. This state is better than that state. And maybe you'll also learn something. So you made a value judgment about a state: you said, if I go to the learning theory class today, that's a more valuable state than going wherever else I could be. Okay. But now the question is, all right, how do I achieve that state? You have to do something with your body: get out of the chair, get out of bed, whatever your state was before, in order to come here. So you incur a cost for that. So there's some notion of good, and there is some effort that you have to spend to get to it. At the end of the day, when you sit in class having achieved that state after spending a certain amount of energy to get here, the question is: was the way you performed that action as good as it could have been? This framework of thinking about actions in terms of costs and rewards is the basis for the way we think about how individuals make movements and how they make decisions. And it's a broad framework, because it has the potential to help us understand not just how disease affects the way you move, but also whether there is a link between a disease's effect on your movements and its effect on how you make decisions.
So for example, you may go to, say, the pharmacy, and you look at the line of people waiting to get their prescriptions, and when you look at that line you say to yourself, I think it's going to take me 20 minutes, and you make a decision: is the passage of time worth it for you to wait, or is there something better to do during that time? Say that if you have to wait 20 minutes, you're going to miss the beginning of a test in some class. Well, you're probably not going to wait. If, on the other hand, nothing's happening today and you're as bored as could be, you might as well wait in line to get the prescription. So you made a decision between this and that, and our life is full of such decisions. You make decisions about what time to get up, whether you should invest your money in this stock or that stock, whether you should react patiently or impulsively to stimuli around you, and all of these are decisions that weigh costs and rewards. There are certain costs to doing something, in the sense that you're going to invest some effort, you're going to invest some money, whatever the mechanism is, and by doing so you try to acquire a good state. So in the framework of applying mathematics, what economists, psychologists, and control theorists have done is describe control in terms of mathematical costs. We're going to do the same, and we're going to show that by thinking about control of simple things, the simplest of all movements perhaps, in terms of these costs, we can understand why the nervous system moves the way it does, and then ask: well, what does this say about the representation of these costs in our brain? So the typical formulation goes like this. We have a state that is going to be controlled by us. We have an object here, something, maybe our body or something like that, and it has a state; let's say x is the state. You give it some input u, and that changes the state. You make an observation through your sensory system, and that gives you y. And when you make that observation, you now combine it with something: you say, all right, this was my observation, and this was my model. You combine these, and this gives you a posterior, x hat of n plus 1 given n plus 1. And this goes into something that's going to be our controller, which says: all right, this is the motor command that I'm going to produce next. So far, what we've been talking about is essentially this part: we have a model of the world, there's the real world out there, you combine these things, and that gives you a posterior. This is the estimate of the world now. But the objective remains to generate actions, to do something. Who cares if you're a superb predictor? That just makes you a soothsayer, right? People will come to you and pay you to read the tarot cards, the tea leaves, and you can predict the future for them. But you can't actually do anything until you generate the commands yourself. What we're going to be talking about is this part and this view. And the idea is as follows. We're still going to have a system. Now, we'll make it more complicated, but that's the basic system.
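Before we add the cost, here is a rough sketch of that estimate-then-act loop on the board, as code. The names `world`, `model`, and `policy` are hypothetical placeholders of mine: a plant, a Kalman filter with predict and update steps, and a feedback controller.

```python
# A rough sketch of the estimate-then-act loop: observe, combine the
# model's prediction with the observation into a posterior, then hand the
# posterior to a controller. All three objects are placeholders.
def control_loop(x_hat, world, model, policy, n_steps):
    for _ in range(n_steps):
        u = policy(x_hat)                  # best command given current belief
        y = world.step(u)                  # the real world changes, emits y
        x_prior = model.predict(x_hat, u)  # x_hat[n+1 | n] from the model
        x_hat = model.update(x_prior, y)   # posterior x_hat[n+1 | n+1]
    return x_hat
```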
But in addition to that, there's going to be a cost that depends on our actions. So there's going to be some function of state: some location, some state, that is more valuable than other states. For example, when I'm reaching, it makes sense for my hand to get to the cup of coffee over here; if it ends up over there, that's not as good as getting to the cup of coffee. So there's some spatial representation of goodness of state: this function, call it g of x, measures how good it is to be at that location. Then there's f here, some function of effort. As you generate your commands, it just makes sense to be as lazy as possible, so this function f of u is going to grow in some way, maybe quadratically, maybe in some other way; it grows as a function of u. So it's good to be as lazy as possible. Okay, but we're not quite done with this, because we also have to have some concept of time. Why is that? Why is it not enough to arrive at a good state with minimum effort? Why do we also need some measure of time? Because if I were to ask you to consider ten dollars today versus a thousand dollars at the end of the year, which one would you prefer? Ten dollars today or a thousand dollars at the end of the year, which one do you want? Yeah, most people would take the thousand, because a thousand dollars to them is somehow valued more, despite the fact that they have to wait a year. So what I'm asking you to do is think about a thousand dollars at the end of the year discounted to today. How much does it get discounted? Is it still more than ten? If it is, then you pick it. So let me show you what I just asked you to do. The reward that I'm trying to get to, this state, is going to have value, right? And this value isn't constant; it depends on when you get it. So this axis is time, and this is subjective value. When you reach to this cup that has coffee in it, there's a cup at that location, and I have to get to it, and there's some effort that I'm going to spend to get there. But if that cup has value for me, that value is higher if I get there sooner rather than later. So the state's value gets discounted the longer it takes me to get to it. Okay? So these are the costs that we're going to be talking about. You perform actions to achieve success. But it turns out that that success is better if it comes sooner rather than later, and that success is not as valuable if you have to break your back to get it than if you could just walk to it. So the effort you spend discounts your success, and time discounts your success. What you want is to generate the command u. So what we want to do is minimize some kind of a cost, and when we solve this problem we're going to end up with what's called a feedback component. What we want is a policy. And where does this estimate of state come from? Well, what I want to get at is as follows. I'm going to give you a cost that says here's where the good stuff is, and I want you to produce an action that allows you to get to the good stuff, given a cost that has a cost of time and a cost of effort. You're going to find a policy u.
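In symbols, one hedged way to write the three ingredients just described — goodness of state, effort, and temporal discounting — is the following. The notation g, f, and the discount alpha are mine, and the discount could be exponential or hyperbolic; the lecture has not committed to a form.

```latex
% Value of an action plan: the goodness of the final state, discounted by
% the time T it takes to get there, minus the accumulated effort.
J = \alpha(T)\, g(x_T) \;-\; \sum_{n=0}^{T} f(u_n),
\qquad \text{e.g. } \alpha(T) = \frac{1}{1 + \beta T}, \quad f(u) = u^\top u .
```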
And the constraint equation is the state update equation, because that's the thing you control: you generate actions, and they change the state. So that's the machine that I want you to control. You want to minimize the cost given a constraint; the constraint is the state update equation, and the cost is the cost. And at the end, what you're going to end up with is a feedback controller. So what do I mean by a feedback controller? A feedback controller is something that doesn't compute all the commands from now until forever. What it does is ask: what's your best estimate of where you are? What's your estimate of x? Where are you? And given this cost that you have, it generates the best command it possibly can to get you there. And then, on the next iteration, it asks: where are you now? And it generates the best command it can to get you there. So you have a terminal state that you want to get to. It's the idea that, you know, you want to drive to Las Vegas. There you go: you start out in Baltimore. You don't plan out the entire trajectory, because who knows, the road may be closed, there may be bad weather, you may get a call from a friend in Cleveland asking you to stay. So you have a policy with a cost: it says, I need to be there at this time, I'm willing to spend this much money, and here's my current plan of how to get there. As you actually drive, you evaluate where you are with respect to where you want to go, and then you adjust your actions. That's a feedback control policy. It's similar to life, in the sense that you get up in the morning and ask, okay, what's happening in the near term? What sort of things are happening today? I'm going to choose my actions based on that. But I have this long-term goal of, you know, graduating, getting a job, those kinds of things. So today's actions are somewhat vaguely related to that eventual goal, but they're mostly related to the current state that you're in, the things around you. That's feedback control. You don't come to college and plan out four years, six years of life in front of you. You just have some rough goal of where you want to go, and then you react to the things around you. Okay. So, like everything else we've done in this class, what we're going to focus on are things that we can write mathematically. We're not going to talk about how to retire to the Bahamas; what we're going to talk about is how to move simple systems so that they minimize a cost. But the principle is that even simple movements, we think, can be understood in terms of minimizing these costs. All right. So that's what we're going toward. And the way we're going to figure it out is using what's called the Bellman equation. There are two fundamental ideas in this class: one is the Kalman filter, the second is the Bellman equation. The Bellman equation is how we're going to minimize this cost function, and it allows us to solve the feedback control problem. But, you know, you could go take the optimal control class in mechanical engineering and they'll teach you that. So why are you in this class? Because this class is concerned with biology. So we're not going to be talking about just the Bellman equation. What I'm interested in is: how would the brain solve this thing? Is there a way it could actually be implemented in anything associated with biology?
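As a preview of where this is headed, one common discrete-time form of the Bellman equation looks like the following. The notation is mine, not the lecture's, and we will derive it properly later: the value of a state is the best you can do now plus the value of the state you land in.

```latex
% Preview of the Bellman equation (notation mine; derived later):
% \ell(x, u) is the one-step cost, f(x, u) is the state update.
V(x) \;=\; \min_{u}\; \Big[ \ell(x, u) \;+\; V\big(f(x, u)\big) \Big]
```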
And so, that's the notion. Any questions? All right. So things are pretty vague, and that's the way they are at the beginning; hopefully we'll fill in the parts. So, today, what we're going to do is begin the path by talking about how to find the optimum u, but in open-loop control. I've spent too much time talking and I'm not writing much. In the open-loop scenario, what happens is that we're going to have a cost function for a single step. We're just going to find the u that minimizes J, and this is going to be done not over some passage of time, but for a single step. And so there's the J that we're considering, and we'll see how to minimize it. It's going to be very simple: there's only going to be one command that minimizes it. Now, to motivate this, what I want to show you is data from experiments in which humans as well as monkeys were asked to move a joystick; what I'm talking about today is a lecture on one chapter. So let me show you the results, and then you can just find the optimum set of commands that produce the resulting action. If you were to take a muscle, there's a scalar quantity, an activation, and there's a pulling direction associated with the force that's going to be produced. So if you were to take a circle to represent the various muscles that act on, say, the wrist: there's some muscle that, if you activate it, pulls here; there's another muscle that, when activated, pulls here; maybe one here, one here, and one here. So these are the muscles that you have, and these are their pulling directions. The experiment that I want to show you establishes something interesting, and it's the following: if you look at how humans, monkeys, and other animals activate their muscles, what you find is that if you plot activation as a function of the direction in which force is being generated — so imagine the goal is to produce force here, this is my goal force f_g, and I move this goal all around the circle and I measure the activation of this one muscle, the muscle with its pulling direction there — what I find is that if the goal is here, this muscle is activated a lot; if the goal gets closer to the pulling direction, it gets activated a little bit more; and as the goal force gets farther from the pulling direction of this muscle, it gets activated less. So what you see is a tuning function. If you plot the angle of f_g, call it theta, the angle from the x-axis, against the activation u_i of muscle i, what you find is that there's some direction in which it's activated a lot, and then activation falls off. There's some width to it. The experiment that I want to show you, by using a measure of cost, helps us understand why this tuning function has the width it does. Why isn't it like this? Why isn't it like this? To help you understand this, I want to give you a sense of what it's like to have many actors that can contribute to an end result. There are many, many muscles here, and you could make it so that only one of those muscles produces all the force you need, or you could spread the force across the various muscles. So the idea is: if you have two people that can pull on a rope, you could make it so that all the force is produced by one of those people, or you could make it so that half the force is produced by A and half the force is produced by B.
In the two conditions, what's the difference? Well, if I say there's a cost, and the cost is the sum of the squared force produced by each person, then if I have two people each producing half of the force, 0.5 squared times 2 is smaller than 1 squared. Let me write this down. Suppose I have two muscles, with forces f1 and f2, and f1 plus f2 has to equal 1. You can choose f1 and f2 any way you like. You could say f1 is equal to 1 and f2 is equal to 0. Now, say I have a cost, and that cost is f1 squared plus f2 squared. In this case, my cost is 1. I could also have a scenario where f1 is 0.5 and f2 is 0.5. That's fine, right? I still produce a force of 1. Now, in this case, the cost is 0.5 squared plus 0.5 squared, which is what? 0.5. So this policy costs less than that policy. So what do you do? You share the activity among the various actors. So the tuning function we're going to see reflects sharing. And when we have a cost function that is squared like this — the sum of f_i to the power m, with m equal to 2 — it generates policies that share things between actors, because it costs you a lot to have winner-take-all. The larger this m is, the more sharing is going to take place. As m approaches 1, you can have winner-take-all. In that case it doesn't matter to me: if my cost is J equal to the sum of f_i to the power 1, then this cost is the same here as it is there. It makes no difference; this policy is not better than that one if m is equal to 1. But this policy is better than that one if m is equal to 2. So by penalizing effort to the power of 2, you are imposing a sharing policy, so that there's no winner-take-all: everybody's going to pitch in. Let me take a couple of questions. So, is m the parameter we're trying to find? Yeah, I'm going to show you that with actual data: if we have a model that says we have a goal force to produce, and we can biologically measure the pulling directions of the muscles, then from how the nervous system allocates activity to the muscles we can find the m. Are we assuming those tuning functions are symmetric? We make no assumptions about their shapes. We're going to say: for the m that you choose, what is the tuning function? In fact, we're going to see they can be asymmetric, because it really depends on the distribution of pulling directions. If that distribution is skewed, with most of the muscles over to the left, the tuning will not be symmetric. Any other questions? So we have: the force in muscle i is f_i = u_i p_i. The force that it produces is the activation times this vector that tells me the pulling direction. This p_i is just a 2 by 1 vector, the force is a 2 by 1 vector, and the activation is a scalar. Now we have a set of activations u, which is just u1 through un, for however many muscles we have, and we have a matrix P, which is p1 through pn. So we can write the total force that the system is producing as F equals P times u. Now what we're going to do is define the cost, and that cost is the difference between the goal force and what you actually produced, plus a penalty on the commands. You want to generate your forces so that you arrive at the goal that you want, and you want to do it with as little effort as possible, as the sketch below shows.
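Here is a small numerical sketch of this sharing argument: minimize the squared distance to the goal force plus a penalty lambda times the sum of u_i to the power m, over non-negative activations. The eight evenly spaced pulling directions and the weight lambda are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the sharing argument. Pulling directions and the weight lam
# are invented for illustration; activations are constrained to u >= 0.
th = np.linspace(0, 2 * np.pi, 8, endpoint=False)  # 8 muscles around a circle
P = np.vstack([np.cos(th), np.sin(th)])             # 2 x 8 pulling directions
Fg = np.array([1.0, 0.0])                           # goal force along x-axis
lam = 0.1

def cost(u, m):
    return np.sum((Fg - P @ u) ** 2) + lam * np.sum(u ** m)

for m in (1.0, 2.0, 4.0):
    res = minimize(cost, x0=np.full(8, 0.1), args=(m,),
                   bounds=[(0, None)] * 8, method="L-BFGS-B")
    print(f"m={m}: u = {np.round(res.x, 3)}")
# With m = 1 activity concentrates on the muscle(s) nearest the goal
# (winner-take-all); with m = 2 it spreads across neighbors, and the
# tuning broadens further as m grows.
```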
We want to generate the commands so that you produce the goal: activate these muscles so that they generate the total force that you want, but do it in such a way that we penalize the amount of command you produce with a cost that depends on u raised to the power m. And what I want to show you is, one, how we minimize this cost, and two, what the result looks like depending on this power m. It turns out that what m is doing, as I've described here, is imposing a cost such that if u becomes large for one particular muscle, it costs you a lot. So when m is equal to 2, we get a scenario where, with the preferred direction — the angle of p_i — at some location, the activity for that muscle is maximum in that direction and falls off. When m is equal to something less than 2, it doesn't look like this: it becomes narrower, and the smaller m gets, the narrower it becomes. As m gets larger than 2, it becomes broader. And, you know, this is nonlinear stuff, so you have to use a search mechanism to find u; you can't write a closed-form equation for u, you simply have to search the space to find the u that minimizes the cost. But the effect of parametrically penalizing the command is that you get greater sharing when the penalty power is large and less sharing when it's smaller. I should say, this is not drawn quite right: it's like a distribution, the peak grows larger as the width becomes smaller. Okay. So this is really the beginning of the idea that by looking at real behavior and measuring activity, we can potentially recover these cost functions. Now, when this paper was published — it was, I think, 2004 or something like that — they measured this activity and argued that one could describe the cost function from it. The problem with it was as follows, and I want to describe it to you, because it is not quite right to write such a deterministic equation. Really what we have is that the force in the muscle is u_i times p_i, but there is also noise associated with this force: f_i = u_i (1 + k_i epsilon_i) p_i, where epsilon_i is a random variable with mean zero and variance 1. So what did I write? I wrote that the force the muscle produces depends on its activation, but also on a random variable that is multiplied by something with a gain of k_i. And what does that mean for the statistics of this force? What's the mean? The expected value of f_i is just u_i p_i. But the variance of f_i is u_i squared k_i squared times p_i p_i transpose. So this says that the variance of my force depends on the input u: as u increases, the variance of the force increases quadratically, because the relationship between the variance and u depends on u squared. So why is that important? Because if force depends not just on the input but also on the noise associated with the way that input generates force, then our cost function is not a deterministic value; it's a random variable. So what we need to do is minimize not J, but the expected value of J.
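Written out, the noise model just described is the following, with epsilon_i standard normal; the notation is mine.

```latex
% Signal-dependent noise model, \epsilon_i \sim \mathcal{N}(0, 1):
f_i = u_i\,(1 + k_i\,\epsilon_i)\,p_i,
\qquad
\mathbb{E}[f_i] = u_i\,p_i,
\qquad
\mathrm{Var}(f_i) = u_i^2\,k_i^2\,p_i\,p_i^\top .
```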
So what I want to show you is that this kind of signal-dependent noise arises in biological systems. What that means is that if you plot force against time and you ask someone to produce some amount of force, you get a little bit of variability; as the force increases, you get more variability, and more still. So you notice that the variance of the signal changes with the mean. This is a critical idea for biological systems, because it's very important to know what the noise structure is: as you remember, when we were doing the Kalman filter, we assumed the noise was Gaussian, with a mean that was independent of its variance. Here we're dealing with systems that have a mean but a variance that depends on it; that's the way most realistic systems work. So what's the consequence of this? Well, the consequence is that when we compute the expected value of our cost, we are naturally squaring the u as a component of the cost. Meaning: just by trying to be at the target — to produce the force that you asked me to, represented by the expected value of that force — I am implicitly penalizing the squared command. Why? Because when I take the expected value of a term that says be accurate, and I have commands that increase the variance of my force, then the larger those commands are, the less accurate I'm going to be. So effectively I'm going to have an equation with an effort term associated with my motor commands. So let me show you. As a preliminary, I need to show you how to find expected values of squared random variables, because our J is going to have squared random variables in it. If x is a scalar random variable and we want the expected value of x squared, how do we do that? Well, the variance of x is the expected value of x squared minus the square of the expected value of x. Therefore the expected value of x squared is the variance of x plus the square of the mean of x. Now what if we have the expected value of the scalar quantity x transpose x, where x is a vector? The reason this is important for us to consider is that this quantity J is also a scalar quantity, and it's going to have random variables u in it that get multiplied by themselves, so this is a useful thing for us to work out. So, how do we approach it? The variance of x is the expected value of x x transpose, minus the expected value of x times the expected value of x, transposed. Now take one more step: x transpose x is a scalar, and it's the same as the trace of x x transpose. Taking the trace and then the expectation is the same as taking the expectation and then the trace, because the trace just sums the diagonal terms of x x transpose. So the expected value of x transpose x is the trace of the expected value of x x transpose, which gives us our solution: the expected value of x transpose x, a scalar quantity, is the mean transpose times the mean, plus the trace of the variance — also a scalar quantity. Okay. And if we have a term of the form expected value of x transpose A x, with some matrix A in between, then that's going to be equal to the trace of A times the variance, plus the expected value of x transpose times A times the expected value of x.
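For reference, here are the two identities we just derived, for a random vector x with mean mu and variance Sigma.

```latex
% Identities for a random vector x, \mu = \mathbb{E}[x], \Sigma = \mathrm{Var}(x):
\mathbb{E}[x^\top x] = \operatorname{tr}(\Sigma) + \mu^\top \mu,
\qquad
\mathbb{E}[x^\top A\, x] = \operatorname{tr}(A\,\Sigma) + \mu^\top A\,\mu .
```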
So these are very important equations for us, because we're going to use them a lot: what we're going to be doing is find the expected value of that J, and to do so we're going to have to take derivatives with respect to u, which means we have to be able to expand the expectation into its mean and variance terms. All right, so let's go back to our basic problem and minimize our cost. Suppose I have the following cost that I want to minimize; it says: minimize the difference between a goal force and the individual forces that are produced by the various muscles. Suppose we have a very simple cost like this, and the individual forces f_i are equal to u_i p_i. What we're going to do is find the u_i — u_i is our activation in each muscle — that minimize this cost. To do that, we want the expected value of J. The expected value of J, a scalar quantity, is going to be the expected value of the term, squared, plus the trace of the variance. This term here, the trace of the variance — the variance of f_g minus the sum of the f_i — is just the variance of the sum of the f_i. And the variance of each f_i is u_i squared k_i squared p_i p_i transpose, so the trace here is just going to be the sum of u_i squared k_i squared p_i transpose p_i. So the objective here is for me to show you that I started with this cost function, which says only: find the commands that sum together to produce the goal, and the closer you are to the goal the better. But if the forces that I generate are signal-dependent, then the commands u_i that minimize the expected cost are the ones with which I get as close as possible to the goal while also minimizing another cost, one that is associated with the squared commands. How do we, inside the trace, go from the variance of f_g minus the sum of the f_i to this? Well, f_g is not random; the only random variables are the f_i, the sum of the f_i. And they're independent, so the variance of the sum is the sum of the variances. Okay. So, if you have signal-dependent noise in your system, then by trying to reduce the cost associated with bringing the sum of your actions to the goal, you are implicitly minimizing a cost that says: keep your activations as small as possible. It's a squared cost that you get automatically. You see, the expected value of J includes accuracy plus effort. Does that make sense? So, one of my former students, Jörn Diedrichsen, after he got his faculty position at University College London, did a neat experiment to test whether this kind of J captures the way individuals control force. He said: take two fingers, left hand and right hand, and push on force transducers. So you have a force transducer on the left and a force transducer on the right; you generate some force on the left and some force on the right, and what matters is the displacement of a cursor, x, which depends on u_l plus u_r. So your goal is to displace this cursor, and you can do it by pushing entirely with your left hand, entirely with your right hand, or any split of the two, however you want. And you just solve this problem: how much force do you use with your left and your right hand?
And what's nice is that, you know, some fingers, like our index finger, probably have low noise, whereas the little finger may have a lot of noise; we're not so good at controlling it. So k — that signal-dependent noise gain on the f_i there — is going to differ for different fingers. And maybe the noise would also differ between your left hand and your right hand. So he wanted to know: when you are splitting this force between the right and the left hand, what's the policy that you're using? How are you solving this problem? So, let me show you the approach. What we have is x_i, the cursor displacement produced by input u for the left or right finger, is going to be normal with mean u_i. Let me put this on the right: x_l is normal with mean u_l and a variance of k_l squared u_l squared. So x is the displacement of the cursor, u_l is the force, and this is the variance of that force. The variance of the force depends on the input u squared and on the noise gain associated with that finger, which is k_l. Okay. Just to give you a sense of what this means: if you were to look at the standard deviation of force as a function of mean force, you would have a line with slope k. So the standard deviation of the noise grows as a function of the mean value of the force, and the slope is k. So the variance is quadratic, the standard deviation is linear. So, the optimal control problem is as follows. What we want to do is minimize the difference between the displacement and some goal. The expected value of this is what we want to minimize; the u that minimizes it, u star, is what we want. Before I do it — and it's going to be very simple — I just want to step back a little so that we get a sense of what's going on. These control problems that we're considering have very simple relationships between the input u and some force. There are no dynamics yet, in the sense that we'll want later. Right now, we just have this very simple relationship: action, consequence. In a real system, action has dynamics: when you generate u, there's a dynamical system with a hidden state x that evolves in time. We're not considering that; we're just considering that u generates a force, and that force has some variance. So it's an open-loop, single-time-step scenario. You want to minimize this cost. What is this cost? The cost is the difference between what our cursor does, x_l plus x_r, and where the goal is. The displacement of the cursor produced by the left and right forces should be such that we get as close to the target as possible. To do that well, you want to find the expected value of that random variable. The expected value of J is going to be equal to the expected value of x_l plus x_r minus g, squared, plus the trace of the variance of x_l plus x_r. What are x_l and x_r? These are our random variables. So the expected value part is going to be u_l plus u_r minus g, squared. This term here is the trace of the variance: the variance of x_l is k_l squared u_l squared, and the variance of x_r is k_r squared u_r squared. The trace here is trivial because these are scalars, so I'm just going to drop it. So now I find the optimum command. Let's take the derivative with respect to u_l: we get 2 times u_l plus u_r minus g, plus 2 k_l squared u_l. I set that equal to zero. Then similarly, we can do it for u_r.
From the u_r equation, the command for the right hand is going to be g minus u_l, divided by 1 plus k_r squared. What we just did is say: you have a goal force to produce, and you should split it in a way that depends on the noise. And now what I can do is take u_r — here's the equation for u_r, the optimum u_r star — put it in here, and solve for u_l. And what I get is the following: u_l star divided by u_l star plus u_r star equals k_r squared over k_l squared plus k_r squared. So this means: the larger the noise associated with your right finger, the larger should be the force produced by the left hand. So, you know, if you have two fingers and one is really noisy, you should give more of the command to the other one. That's what the equation says. So, if we have such a cost function, where all that matters is to generate a force that gets us as close as possible to the target, and we minimize its expected value, then the activation, the amount of force we produce with each finger, should be related to the noise in each finger. That was the theoretical prediction; there's a numerical check of it below. So he measured the noise in each finger: he asked the subjects to produce force at different levels — force one, force two, force three, force four — measured the standard deviation of force at each level, fit the slope k, and found the noise for the left finger and for the right finger. Does that make sense? Then what he did is he said: okay, now you're given a chance to activate each finger however you want. Do you produce force with your left hand and right hand in a way that minimizes the cost that depends on these noises? Do you allocate force across the fingers that way? And he found the answer was no. It wasn't as simple as that. So then he said: all right, what else matters? It's not just these noises; it's not just that cost. So he said: what if I add to this another kind of cost? This is the cost that he started with; give it a weight of lambda one. Then he added another term, with weight lambda two, in which we explicitly penalize effort: u_l squared plus u_r squared. See, the first cost already penalizes effort in a particular way: it penalizes effort by the amount of noise in each finger, right? So what if we also penalize effort independent of noise? That's what the second term does. And then he added a third term that said: well, maybe what really matters about these u's isn't their absolute size, but their size with respect to the maximum voluntary force — the maximum voluntary u_l, the maximum voluntary u_r. Force by itself maybe doesn't matter so much; maybe what matters is how much force that finger can produce maximally. So rather than the absolute value of force, think of it as a fraction of the maximum voluntary force. So he fitted a function that looked like this, and then he was able to explain the results very well. It seemed like what really mattered was not just the noise, but also the effort being spent — not in terms of the absolute value of the force, but in terms of how that force compared to the maximum force. And then things worked out.
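For completeness, here is a numerical check of the closed-form split derived above; the values of g, k_l, and k_r are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Check the two-finger solution numerically. g, kl, kr are made-up values.
g, kl, kr = 10.0, 0.2, 0.4   # goal and signal-dependent noise gains

def expected_cost(u):
    ul, ur = u
    return (ul + ur - g) ** 2 + kl**2 * ul**2 + kr**2 * ur**2

ul, ur = minimize(expected_cost, x0=[1.0, 1.0]).x
print(ul / (ul + ur))                  # numerical share of the left hand
print(kr**2 / (kl**2 + kr**2))         # predicted share k_r^2 / (k_l^2 + k_r^2)
# Both come out to 0.8: the noisier right finger (kr > kl) gets the
# smaller share of the total command.
```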
So it feels like the noise doesn't matter at all? Yeah, in fact, in this case it looked like the noise didn't matter much. What really mattered was the effort — the u term — normalized to the maximum voluntary force. Is the maximum voluntary force related to the noise? Like, I think of my pinky, which is noisy and is also the weakest. Right. No, I don't know if there was any relationship between k and maximum voluntary force. Usually noise has little to do with the size of the muscle; it has to do with the neural control of that muscle, meaning how much brain space you allocate to it. So, you may have a very delicate muscle that you can control really well, or you can have the opposite. It just depends on what you use it for. So, our thumb, for example, can produce a huge amount of force, but I don't know that we can just associate its noise with that. Yeah. Our vocal cords, for example, are very precise: the u is very small, and the noise is very small. So that has to do with use, right, and how much we practice. So, he measured these things and came up with estimates of these lambdas, and in the scenario where force was normalized by maximum voluntary force, he got a good fit. Any questions? So, just to summarize the basic ideas: what we're moving toward is thinking about the control of action in terms of a cost function. So far we've been describing this cost function in terms of an effort cost and a term that says the state you want to arrive at is some distance away from where you are — it's good to be at a particular state, and you have to find commands that get you there. And we'll add a cost on time, so that you want to get there as quickly as possible while spending as little effort as possible. And to do all this, first we're going to derive the Bellman equation, and then we'll see how it might be realized. Okay, see you Wednesday.