Good to see you guys. Gentlemen, it's all gentlemen today. So, what did you guys think about the Bellman equation on Monday? Change of life? Why not? It's an amazing thing. Let me ask if you have any questions about it. Here's one: what's this value function thing that we've been talking about? What does it actually mean? Can I give you an intuition about what it is? I think I can, and I'll do that today. The idea behind the value function is that it represents the cost, under the optimal policy, of finding yourself at some state. What is the best way you can perform, given that you find yourself at this particular state? That best is the value of that state. Think of the value of a position on a chess board: you have a state on the chess board; how do you define a value associated with it? Well, in the game we played on Monday, the value associated with any of those points was the minimum cost you could incur from that point if you were to perform the best actions possible. So the value function of a state represents the accumulation of the costs you will incur under the optimal setting; it's as good as you could do if you find yourself at that state. That's what the value function is. What we're going to use the value function for, in the context of control today, is the Bellman equation. The Bellman equation lets us solve all kinds of problems in which there's something called a cost per step; call it alpha. For today, we're going to take this cost per step to be a quadratic function; suppose we have it like this. Now, I did not include a cost of time in that step, but you could just as well put one in. The reason I left it out is that that cost usually depends on neither u nor x, and since we're going to be taking derivatives of these things with respect to u and x, the cost of time doesn't really matter there. Where it matters is in comparing policies after we've come up with optimal ones: once we have this policy, we know it's the best policy for this particular duration P, and the cost of time plays a role when we compare different durations, because it lets us ask, okay, what's the best movement I could make across all possible durations? But for now, let's put the cost of time aside and concentrate on the basic notion of a cost per step, which has a cost of state and a cost of action. Now, imagine we are at the final time point P, and we ask: what's the optimal policy at x_P? It's the policy that minimizes this cost, alpha_P evaluated at x_P (I should close my parenthesis here). So we evaluate our cost at the last time point for some x_P and ask for the minimum, and as you can see in this scenario, the argument u that minimizes it is u = 0: if you just set u equal to zero, that gives you the lowest cost per step. So what's the value associated with that policy? It's just alpha_P at x_P with the optimal policy executed at that location. Tell me what my cost is going to be from x_P, assuming I did the optimal thing: that's my value, right? Yeah, yep, x and u, and u is the unknown. Right? Does that make sense?
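Since the board itself isn't captured in the transcript, here is the quadratic cost per step the lecture appears to be working with, written out; T_k and L are the state-cost and action-cost matrices that get named explicitly later in the lecture.

```latex
% Quadratic cost per step (reconstruction of the board; T_k penalizes state,
% L penalizes action):
\[
  \alpha_k(x_k, u_k) \;=\; x_k^\top T_k\, x_k \;+\; u_k^\top L\, u_k
\]
% At the final time point P the action cannot reduce the state term, so
\[
  \pi^*(x_P) = \arg\min_{u_P} \alpha_P(x_P, u_P) = 0,
  \qquad
  V(x_P) = x_P^\top T_P\, x_P .
\]
```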
I'm sorry, the nomenclature is probably not the best; I hope you get the intuition of what I'm speaking of, even though mathematically I may not be writing it as elegantly as I could. So, what's the policy that I need at x_{P-1}? It's the one that minimizes the cost per step at time point P-1 plus the value associated with the optimal policy. That means that if I were to execute my optimal policy, I would generate from x_{P-1} some command u, call it pi(x_{P-1}); that's the motor command I produce, and it takes me to some location x_P. And, of course, this is this, right? That's the value of where I end up. So what we do is start at the end: we assign a cost per step, we find the optimal policy, and we then get a value. That value is critical for the next step back, because it lets us ask: what's the best thing I could do to minimize the cost per step plus the value of wherever this step takes me? So what is this value function? It ends up accumulating all of the costs that you're going to incur from a given state, assuming you perform the best actions possible. It's the value of that state under the best scenario. Another way to think about it is as the minimum cost to go: if you're at some state, the costs that accumulate in the future depend on where you're starting and the actions you're going to perform, and the minimum of that, the minimum cost to go, is the value function. That's the value of that state. All right, let's do some examples. Any questions before I start? Yeah: so you keep doing this until you get to x_1; is the pi star of x_1 something you can actually write algebraically? Algebraically, we'll write it; you'll see, we'll get a recipe. We'll be able to get a recipe in scenarios where we can write these as functions. As you will see, in the case where our cost per step is quadratic, the value function is also going to be quadratic, and in that case we can take derivatives and write the relationship between pi, the action you want to perform, and the state. We're going to end up with scenarios where the optimal policy looks like this: some matrix, which I'll show you, that multiplies your state and gives you the best action you can take. Yeah: so is this kind of like a discretized Euler-Lagrange equation? Because it looks like you have a functional over u as a function of time, and you're finding the particular one that traverses the space in a minimal way, like an action functional that takes the entire path to a scalar cost. The value function is a scalar function associated with the state, so in that sense it's like a functional. It's a special function, though, because it's not a function that just says how good this state is; it says how good this state is given that you can perform the absolute optimal things from here on, not just at this state but at every state it's going to take you to. Right?
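A compact way to write the backward recursion just described in words (my transcription into symbols; f stands for whatever the state update equation is):

```latex
\[
  V(x_k) \;=\; \min_{u_k}\Big[\, \alpha_k(x_k, u_k) \;+\; V(x_{k+1}) \,\Big],
  \qquad x_{k+1} = f(x_k, u_k),
\]
% starting from V(x_P) = min_u alpha_P(x_P, u) and stepping backward to k = 1;
% the minimizing u_k at each state is the optimal policy pi*(x_k).
```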
And so, I hope you begin to see that this is how machines learn to assign value to states so they can play games, just like our game on Monday. What we were doing there is called dynamic programming: we began at the end, we assigned a value to each state, and we found the best action for that state. What we're going to do in optimal control is a special form of this, where the costs are quadratic. The value functions, as I will show you, end up being quadratic, and then the policy becomes a linear function of the state: it becomes a feedback controller, where u is given by some matrix G times the state. So what we're going to do today is find out how to represent the value function in closed form, as a quadratic function of x, and how to take the derivative of this sum with respect to u and thereby get what's called G. The two things we're going to find are: the value at any given time, as a quadratic function of x, and the gain of my feedback controller, which tells me how to produce actions given a particular state. How is this related to all the stuff we learned before? Well, x here is state, right? You have to tell me where you are in order for me to produce the best actions. That is going to come from the Kalman filter: your prior belief about how your previous actions have changed your state. So this x is going to be replaced by an x-hat that tells you where you think you are, and the gain tells you the best action to produce at that state, in order to minimize some overall cost that accumulates from beginning to end. The way we're going to work it is to begin at the end: we find the best action, we find the value function associated with that best action and that state, then we do it again and again. What we'll see is that this forms a recipe, and the recipe is a bunch of matrix equations for finding V and G. So we're going back in time now, to the early 60s, to see how this is done. The course we're on goes from having no noise in our states to having Gaussian, non-signal-dependent noise in our states; we'll do that today. Then on Monday I'll show you the advances made in the last decade, where signal-dependent noise was introduced into the state, and we'll see how that changes the policies and the feedback gains. But today we start with no noise in the state update equation, so we have our simple state equation without noise: x_{k+1} = A x_k + B u_k.
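As a preview of where this recipe lands, here is a minimal sketch in code of the backward pass the lecture is about to derive, assuming the dynamics x_{k+1} = A x_k + B u_k and the quadratic cost per step above; the function and variable names are mine, not the lecture's.

```python
import numpy as np

def lqr_backward_pass(A, B, T_list, L):
    """Finite-horizon LQR recipe.

    Dynamics:   x_{k+1} = A x_k + B u_k
    Cost/step:  alpha_k = x_k' T_k x_k + u_k' L u_k
    Returns gains G_k and value matrices W_k, with
    policy u_k* = -G_k x_k and value V_k(x) = x' W_k x.
    """
    P = len(T_list) - 1            # index of the final time point
    n, m = B.shape
    W = [None] * (P + 1)
    G = [None] * (P + 1)
    W[P] = T_list[P]               # V(x_P) = x_P' T_P x_P
    G[P] = np.zeros((m, n))        # at the last step, do nothing
    for k in range(P - 1, -1, -1):
        # gain from minimizing alpha_k + V_{k+1}(A x + B u) over u
        G[k] = np.linalg.solve(L + B.T @ W[k + 1] @ B,
                               B.T @ W[k + 1] @ A)
        # value recursion (Riccati form; the simplification is completed below)
        W[k] = T_list[k] + A.T @ W[k + 1] @ (A - B @ G[k])
    return G, W
```

A controller built from this issues u_k = -G[k] @ x_k at every step; in the full problem, x_k gets replaced by the Kalman-filter estimate, as noted above.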
So let's do the Bellman equation for this problem, assuming we have this cost per step. Begin at time point P: suppose alpha_P is x_P^T T_P x_P + u_P^T L u_P, and we find the optimal action here. Is pi*(x_P) going to be equal to zero? Yes: not going to do anything; u = 0 is the u that minimizes this function. And so the value associated with this at x_P is x_P^T T_P x_P. Now let's move to step P-1. We incur two kinds of costs: a cost associated with our step, and the value function at the state that step takes us to; so I need to find the minimum of the sum of these two with respect to u, and this u is at time point P-1, right, because I'm at time point P-1 now. I want the derivative with respect to u_{P-1}. So my cost looks like this: alpha at time point P-1, plus the value of the optimal policy at time point P given that I was at P-1 and produced the motor command u_{P-1}. Let me just write it out. What's x_P? x_P is A x_{P-1} + B u_{P-1}, so the value term is (A x_{P-1} + B u_{P-1})^T T_P (A x_{P-1} + B u_{P-1}). And this guy here, alpha_{P-1}, is x_{P-1}^T T_{P-1} x_{P-1} + u_{P-1}^T L u_{P-1}. So this is alpha at P-1, and this is the value function of the optimal policy at x_P. Is that clear? I want to take the derivative of that function with respect to u_{P-1} and set it equal to zero; that gives my optimal policy. Before I do, I'm going to introduce a matrix that will be useful for us: I'm going to represent the value function at time point P not with this T_P but as x_P^T W_P x_P, where obviously in this case W_P = T_P. The reason is that when I minimize this function with respect to u, I get my optimal action u*, and when I then compute the value of the state at this time point, I get another quadratic function of x, another W, but now at time point P-1. That's where we're going. All right, so let me minimize with respect to u_{P-1}. These are all scalar quantities, so from the value term I get 2 B^T T_P (A x_{P-1} + B u_{P-1}), and from the cost per step I get 2 L u_{P-1}. Notice the result has to be a column vector: when I take the derivative of a scalar with respect to u, I'm taking the derivative with respect to u_1, u_2, u_3, all of those, so it's a column, and for it to be a column the x has to sit on the right, as in B^T T_P A x_{P-1}. Set the whole thing equal to zero and I get (L + B^T T_P B) u_{P-1} = -B^T T_P A x_{P-1}, which means u*_{P-1} = -(L + B^T T_P B)^{-1} B^T T_P A x_{P-1}. So I have u* as a function of x, and this matrix we're going to call G at time point P-1: u*_{P-1} = -G_{P-1} x_{P-1}. That's the best action I can produce. Go ahead. Is G_P zero? Yes, good: G_P is zero, W_P is T_P, and G_{P-1} is this matrix. What's W at P-1?
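Written out cleanly, the minimization just performed (a reconstruction of the board work, with W_P = T_P):

```latex
\[
  J(u_{P-1}) = x_{P-1}^\top T_{P-1}\, x_{P-1} + u_{P-1}^\top L\, u_{P-1}
  + (A x_{P-1} + B u_{P-1})^\top W_P\, (A x_{P-1} + B u_{P-1})
\]
\[
  \frac{\partial J}{\partial u_{P-1}}
  = 2 L u_{P-1} + 2 B^\top W_P (A x_{P-1} + B u_{P-1}) = 0
\]
\[
  \Rightarrow\quad
  u^*_{P-1} = -\underbrace{(L + B^\top W_P B)^{-1} B^\top W_P A}_{\;G_{P-1}}\, x_{P-1}
\]
```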
So what I need to do now is find the value function associated with pi* at x_{P-1}. What is that? This was the policy I just found, right? The value at x_{P-1} is alpha at P-1, evaluated at x_{P-1} and the optimal policy there, plus the value, under the optimal policy, of the x_P you end up at, given that you started at x_{P-1} and issued the optimal command. We could write this as an expected value, but since these are deterministic, we can just write it directly. All right, so this is the function I just wrote, with u replaced by -G_{P-1} x_{P-1}. It's going to be equal to the following: x_{P-1}^T A^T T_P A x_{P-1}; minus two times the cross term, x_{P-1}^T G_{P-1}^T (that's the u_{P-1} part) B^T T_P A x_{P-1}; plus, since we have a quadratic now, x_{P-1}^T G_{P-1}^T B^T T_P B G_{P-1} x_{P-1}; plus x_{P-1}^T T_{P-1} x_{P-1}; and, I guess I'm just writing it at the end here, plus x_{P-1}^T G_{P-1}^T L G_{P-1} x_{P-1}. If you look at this equation now, you see that it equals x_{P-1}^T times a new matrix, W_{P-1}, times x_{P-1}: it's quadratic in x, yeah? Okay, let's simplify it a little so we get a recipe for what W_{P-1} is. What does it represent in terms of W_P? Well, W_P is T_P, so I'm going to write W here. The things that simplify have to do with the fact that there's an L here and a B^T W_P B there, and this plus this, you notice, is the (L + B^T W_P B) that sits inside G, right?
So that's in G; G has this component inside it, so I can probably simplify things, and you can see there's a G here as well, so things get a little easier if we put these together. Let's bring together this term and this term and see what we get: I have x_{P-1}^T G_{P-1}^T (L + B^T W_P B) G_{P-1} x_{P-1}, and G has the inverse of that same quantity inside it, so it cancels, and this becomes x_{P-1}^T G_{P-1}^T B^T W_P A x_{P-1}. Let's see if I got that right; yeah, good. Let's call this W. So we can get rid of this term and this term. All right, let me show you that we can also work on this cross term; let me write it out. This term is a scalar, so I can write it as A^T W_P B G_{P-1} (did I miss a transpose here? one of them must be a transpose), and when I do that, I have x_{P-1}^T A^T W_P A x_{P-1}, minus two times x_{P-1}^T A^T W_P B G_{P-1} x_{P-1}, plus x_{P-1}^T G_{P-1}^T B^T W_P A x_{P-1}. Sorry, this isn't here; yeah, go ahead, yes. Now, I think there's one more simplification, but I don't know where it is; anyway, it doesn't really matter, because you see that I've written this in terms of a new W. So here's a new W, and this is now W_{P-1}, written in terms of W_P and everything else that I know; I probably have a typo here someplace that I don't see. The point being: I took my cost function at the last time point, I wrote my optimal policy, with G_P equal to zero, and a value function for that policy in terms of x and W_P. Then I found the optimal policy for time point P-1, which also has a G in it, and that G is written in terms of the value function at time P, that is, in terms of W_P. So I can write the optimal policy in terms of the W associated with my value function at time point P, and write the value of that policy as a new W_{P-1}, in terms of everything I know, which is W_P and G_{P-1}. So basically I have a recipe now: for whatever time point k I want, if I know the W for time point k+1, I can tell you the G for that time point, and if I know the G for that time point, I can compute the value function, the W, for that state and time point. Yeah, what does it mean? Let's see if I can give you an intuition. Let's begin with L, the cost of generating a command. You can see that as that cost goes up, you're going to generate smaller commands; the gain is, loosely speaking, inversely proportional to L. More than that I'm not sure I can tell you, because what this is doing, of course, is generating commands in a way that minimizes the sum of the cost of the command plus the value of the state it takes you to; that's what u* is. So it's balancing how big the command is against how good the state is that the command takes you to, which is going to be x_{P+1}.
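For the record, the remaining simplification the lecture is looking for seems to be this: the collapsed quadratic-in-G term and the -2 cross term differ only by a factor, so they combine, leaving the standard Riccati recursion. This is my reconstruction, not something on the board:

```latex
% Since (L + B'W_P B) G_{P-1} = B' W_P A, the quadratic-in-G terms collapse:
\[
  G_{P-1}^\top (L + B^\top W_P B)\, G_{P-1} \;=\; G_{P-1}^\top B^\top W_P A
  \;=\; A^\top W_P B\, G_{P-1}\ \text{(as a quadratic form)},
\]
\[
  \Rightarrow\quad
  W_{P-1} \;=\; T_{P-1} + A^\top W_P A - A^\top W_P B\, G_{P-1}
          \;=\; T_{P-1} + A^\top W_P\,(A - B\, G_{P-1}) .
\]
```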
Is it going to take me to a good state? The best state is the one that has my goal sitting at the bottom of it, right? So I want to get there, but to get there I might have to generate a lot of commands; I have to balance the desire to get there against how big a command I need. I don't know if that makes sense. Why the negative sign in front of G? Oh, it just happens that there are a whole lot more positives here than negatives, so we might as well define G to be positive; it's arbitrary, right? I could have made G the negative quantity, in which case this would be a positive, and I think all that would change would be the sign here; everything else would remain the same, because these terms are all quadratic in G, so it doesn't matter whether G is positive or negative. In feedback control, what's done here is a form of negative feedback: it just says how you produce actions, and in this case the sign is arbitrary; I just put a negative out front and called the matrix G, that's all. You see where that came from, right? It came from here. All right, so we end up with a recipe; that's the point I wanted to make here. The idea: you begin with the cost at the end, you find the policy that minimizes that cost, and you assign a value function to that policy. Next, you move to the step behind and say: all right, what's the cost per step, and where is it going to take me? The place it takes me has a value, and what I want to minimize is the sum of the cost of the current step plus the value of where it takes me; that gives me the motor command that's best to produce at the current time point. The rest of it is manipulating that equation and adding noise to it. So let's do it for the simplest case: x_{k+1} = A x_k + B u_k + epsilon_x, where epsilon_x has variance Q, and an observation y_k = C x_k + epsilon_y, where epsilon_y has variance R. Suppose our cost per step alpha_k is defined in terms of x. I'm going to compare that with the condition where the cost per step is defined in terms of y, and here's what happens: in the y condition, the noise influences the cost per step itself, because y is a random variable; whereas up there, in the x condition, the noise affects the value function, because the value function becomes probabilistic. The state you're going to go to, given that you're at this state and generate this motor command u, is no longer a single state; it's a state with some uncertainty associated with it. So then we have to talk about not just the value but the expected value of the value function. That's the change the noise produces: this value function here, the value under the optimal policy, if I generate the optimal motor command I'm going to go to potentially many places, not just one, because the state update equation has variance Q in it. That makes the value function of the optimal policy no longer deterministic; it's a stochastic thing, so what I need to minimize is the expected value of this sum. And the expected value, as you'll see, is not going to be a problem for us; that's the only difference that's going to happen.
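For reference, the noisy setup being introduced, in the Gaussian, signal-independent form the lecture described earlier (R is the observation-noise variance that gets named a little later):

```latex
\[
  x_{k+1} = A x_k + B u_k + \varepsilon_x, \qquad \varepsilon_x \sim \mathcal{N}(0, Q),
\]
\[
  y_k = C x_k + \varepsilon_y, \qquad \varepsilon_y \sim \mathcal{N}(0, R).
\]
```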
So let's do it. Again, at time point P, I minimize the cost per step, and that gives me the policy of zero: u*_P = -G_P x_P, where G_P = 0; that's the thing that minimizes alpha_P. So the value associated with that step, V(x_P), is just x_P^T T_P x_P. Now, the value of the policy at x_P, given that I'm at x_{P-1} and have done some action u_{P-1}: why do I need that? Because that's what I need right there, via pi*. What is this value function? It's (A x_{P-1} + B u_{P-1} + epsilon_x)^T T_P (A x_{P-1} + B u_{P-1} + epsilon_x); that's the state I'm going to go to, wrapped in T_P. So when I want to find the argmin over u of alpha_{P-1} plus V of pi* at x_P, given that I'm at x_{P-1} and did the optimal thing, what I'm going to minimize is the expected value of this function. So we multiply this out, and we get everything we had before, plus two new terms, or sorry, three new terms, associated with epsilon_x: epsilon_x^T T_P A x_{P-1}, plus epsilon_x^T T_P B u_{P-1}, plus epsilon_x^T T_P epsilon_x. The expected values of the first two are zero, but this last one is not going to have a zero expected value, right? So the expected value of this function is everything I had before on that side, plus the expected value of this. What is it?
Okay, so let's suppose I have this: what's the expected value of x^T A x, where x is a random variable? It's the expected value of x, transposed, times A times the expected value of x, plus the trace of A times the variance of x. Okay, so if that's the case, then, since the expected value of epsilon_x is zero, the first term is zero; the second term is not, and it's the trace of T_P times Q, the variance of epsilon_x. So the value function is a stochastic random variable, and its expected value, in the case we're talking about, is different from before, because it has this trace of T_P times the noise variance in it. What this means is that when I take the derivative of the sum, alpha plus the value function, that derivative is no different than before, because the new piece has no derivative with respect to u: you can see that tr(T_P Q) has no u in it; the noise is independent of u. What we care about is the derivative of the cost per step plus the value function, and the expected value of the value function has no added elements that have u in them. It has added something, this trace of Q term, but there's no u in it. So when you've added noise to the state equation and the observation equation, you have not changed the policy, because the derivative of that cost function with respect to u hasn't changed; the expected value has other terms in it, but those terms don't depend on u. The kinds of problems we've been solving are called linear quadratic regulator, LQR, problems, and LQR problems have the characteristic that the classic kind of noise, where the variance is independent of the mean, the kind you typically see in these linear systems, doesn't change the feedback controller. It changes the value function, the scalar tacked on at the end of it, but who cares: that scalar doesn't change the commands we're going to produce, because the expected value doesn't have an additional term that depends on u.
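The quadratic-form identity just used, E[x' A x] = E[x]' A E[x] + tr(A Sigma), is easy to check numerically; here is a small sketch (purely illustrative, the matrices are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
A = A.T @ A                       # a symmetric cost-like matrix, e.g. T_P
Sigma = np.diag([0.5, 1.0, 2.0])  # noise covariance, e.g. Q

# Monte Carlo estimate of E[eps' A eps] for zero-mean eps ~ N(0, Sigma)
eps = rng.multivariate_normal(np.zeros(n), Sigma, size=200_000)
mc = np.mean(np.einsum('ij,jk,ik->i', eps, A, eps))

print(mc, np.trace(A @ Sigma))    # the two numbers should nearly agree
```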
Now let me do this other case for you: suppose the cost per step depends on y, not x. What is y? y_k = C x_k + epsilon_y, so my cost per step is alpha_k = y_k^T T_k y_k + u_k^T L u_k, and it's a random variable, right, because it has epsilon in it. The expected value of this is x_k^T C^T T_k C x_k, plus a cross term whose expected value is zero, plus, on the other hand, the trace of T_k times the variance of epsilon_y, which is R, plus u_k^T L u_k. So this function is a little bit different from before: instead of just x^T on each side, you have x^T C^T; but that's fine, it's not a big deal, you just have one more matrix in there. So why don't we do this problem. At time point P, my optimal policy at state x_P is going to be zero, because u*_P = -G_P x_P where G_P = 0. The value associated with this at x_P is alpha at time point P, which is a random variable; its expected value, E[V(x_P)], equals x_P^T C^T T_P C x_P, which is x_P^T W_P x_P, where W_P = C^T T_P C. Okay, so now at time point P-1 you want to minimize the expected value of the cost at our current time step: the expected value of alpha at time point P-1, plus the expected value of the value function under the optimal policy, which takes me to some state x_P given that I'm at state x_{P-1} and generate some command u_{P-1}. This is going to be x_{P-1}^T C^T T_{P-1} C x_{P-1} plus u_{P-1}^T L u_{P-1}, plus (A x_{P-1} + B u_{P-1} + epsilon_x)^T W_P (A x_{P-1} + B u_{P-1} + epsilon_x); that's the value function associated with the state I'm going to, right: it's x_P^T W_P x_P, with x_P written in terms of x_{P-1} and u_{P-1} again. So what I need to do is take the expected value of this, sum it up, and take the derivative with respect to u. You can see that this expected value is going to have a nonzero term in it associated with epsilon_x^T W_P epsilon_x; everything else involving epsilon_x is zero, and that term has no influence on the G we computed here. It's going to be the same G as before; G_{P-1} ends up looking exactly like that, well, not exactly like that, because I now have this C in there as well. Why don't we do it, so we can see what it looks like. What I need is the derivative of this function with respect to u_{P-1}, and the only thing I have that I didn't have before is the epsilon_x term, the only random variable in the function, whose expected value has no influence on my derivative with respect to u, so I can just ignore it. I get 2 L u_{P-1}, that's this term, plus 2 B^T W_P (A x_{P-1} + B u_{P-1}), and setting that to zero, we find a linear function of x again. I think that's identical to what we had before, isn't it? It's just that where there was a T, there is now a W, that's all. The C makes no difference for us: we just absorb it into W, and we get the optimal policy for time point P-1 in terms of x and W. Okay, so now you put that into the value function, and that gives you the value associated with that state under the optimal policy. Okay, so let me just summarize the way it works. Our objective was to find the best actions so that at the end of all of our steps we minimize the sum of the cost per step; that cost per step is defined up there. The way we do it is to begin at the end. We find the best action; that's our policy, and that action is defined for every state. Then, if we do that best action, we assign a value to that state; that value tells us how good that state is, assuming that we do the best thing possible there. Then we go a step back and ask: what's the best policy for any state? We define it by saying: you're going to incur a cost at the state you're at, for the action you perform, plus the value of the state that action takes you to; and the best action is the one that minimizes the cost per step plus the value of the state you go to.
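In symbols, the only change in the observation-cost case is where C enters; this is my summary of the last few steps, not board text:

```latex
\[
  \mathbb{E}[\alpha_k] = x_k^\top C^\top T_k C\, x_k + \operatorname{tr}(T_k R) + u_k^\top L\, u_k,
  \qquad
  W_P = C^\top T_P C,
\]
\[
  G_{P-1} = (L + B^\top W_P B)^{-1} B^\top W_P A
  \quad\text{exactly as before, with } C \text{ absorbed into } W_P .
\]
```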
And that value function, what it means is that the value of a state is basically the cost that you will incur from that state if you follow the best policy possible. That's the Bellman equation. First finding the best path for a given number of time steps, and then finding the best number of time steps: is that usually two separate optimizations? Yes, yes, because the time component, the cost of time, is something that typically doesn't depend on the command; if it did, we would have to put it in here. Our problem is that we don't know the length of time that would be optimal, because to compute the cost we have to know the actions. So we begin by assuming that we have a certain amount of time, and for that amount of time we find the best policy; if I do that, then the total over the entire duration is some number, the optimal cost, and it has some minimum. Then what I can do is change my time to be a little bit longer and ask: do I get better in terms of my total accumulated cost, or do I get worse? Is there a relationship between the two? It depends on how time affects this accumulation of the cost, and I don't know, in general, how T_k changes with duration. But more importantly, it depends on the cost of time itself, and that's actually something that's not really well studied, to be honest with you. If there's a cost of time, it's going to depend on the index k, so let me call it J(k). If J(k) is a constant b added at every step, whether at the beginning or at the end, then the cost of time accumulates as a function of duration: cost of time is linear now. That's one kind of cost. But the kinds of costs of time we've been talking about look like this: they're hyperbolic. Let me give you a sense of why the cost of time should be that way, rather than linear, or quadratic, which is what you might find in control theory books. Think about biological systems. I want you to imagine a scenario where you compare waiting 100 days versus 101 days. Suppose you have to wait for something, 100 days versus 101 days: for most people, 100 and 101 days are about the same. On the other hand, compare waiting one day to two days: for most people, the difference between one and two is a lot more than the difference between 100 and 101. Right? So time is not measured linearly in your head; the cost of time is not linear, and it's not quadratic either. If you had a quadratic measure of time, the difference between 100 and 101 would be huge, whereas the difference between one and two would be tiny: 101 squared minus 100 squared is much bigger than 2 squared minus 1 squared. So that's very clear: your brain has a cost of time that increases much more slowly than linearly, and the function looks something like this. So adding the cost-of-time component seems like, in terms of brain computations, it makes the idea of movement planning incredibly difficult, because now you actually have to compute a trajectory for every single possible duration to find an optimum in that whole space.
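The waiting-time comparison is easy to make concrete; here is a tiny sketch contrasting linear, quadratic, and a hyperbolic-style cost of time. The specific saturating form in the last row is just an illustrative choice of mine, not one given in the lecture:

```python
# Increment in cost of time for waiting one extra day, under three candidate
# cost-of-time functions. Human judgments behave like the hyperbolic row:
# the 1 -> 2 day difference feels large, the 100 -> 101 day difference tiny.
costs = {
    "linear":     lambda t: t,
    "quadratic":  lambda t: t ** 2,
    "hyperbolic": lambda t: t / (1.0 + t),   # illustrative saturating form
}
for name, J in costs.items():
    print(f"{name:10s}  J(2)-J(1) = {J(2) - J(1):8.4f}   "
          f"J(101)-J(100) = {J(101) - J(100):8.4f}")
```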
Exactly, exactly. But if you think about physics (you asked this question earlier), suppose there is a cost associated with action: how is it done there? F = ma, where does that come from? Well, it comes from minimizing a cost: the integral of what's called the Lagrangian, which is the difference between kinetic and potential energy. But, you know, objects in space move; do they compute that cost before they move? No. The reason that equation is important is not that it gives us any insight into how physical systems do it; it's a way to describe, at the macro scale, the way things move. In the same way, this framework is probably quite far from describing how biology makes you move, but it's a way to describe the movements of biological systems, just as F = ma is a description of how planets move around each other without telling us much about how, at some deeper level, gravity is generated. They're not computing anything that looks like that function, but it's a way to describe the motion. That's the way I think about it. All right, good. See you guys Monday.