Okay, so we'll get started. Welcome to the last of this series of talks. Today is a little bit different, as I mentioned: we'll start with more material on the tablet, and then for the second part I want to switch into a kind of mini seminar on recent work from my group. First, a recap of where we were. We had a bunch of noisy measurements of a system, and we wanted to infer the state of the system at the present time, in order to be able to control it, for example. And we came up with a solution called the Kalman filter for linear dynamics, where we made use of the observer structure we've been talking about: we define a dynamical system that runs in parallel with the physical system (we assume that we know the model, and that's important), and then we couple the two via the difference between the actual observations and the predicted observations, something we're calling the innovations. Based on those innovations, we can construct feedback that synchronizes the model with the physical system, and then we can read the state off the synchronized model. And this is something that works quite well. What I want to do for the first part is to go back and more or less re-derive the Kalman filter, but from a more general point of view, via Bayesian analysis. First of all, that gives a deeper understanding of where the Kalman filter comes from, why it has the structure it does, and it also points the way to generalizing it to more complicated situations. Okay, so let's get started. The topic is the Bayesian version, or formulation, of state estimation. State estimation is the general problem of estimating a hidden, sometimes called latent, state from some kind of observations. And I should say this comes up in many, many contexts, not only ones having to do with physics or dynamical systems. Some of the techniques that I'll talk about were developed for speech recognition by computers: you have the audio signal, and the hidden states are the words being spoken. And it's used all over machine learning. Okay. So, one change in perspective: what we got to last time with the Kalman filter was a somewhat complicated set of equations describing the means and covariances of the states we were trying to estimate. And I think I didn't emphasize the covariance part enough, because one of the nice things about the Kalman filter is that it's not only estimating the state, it's estimating the confidence one has in that state, via the covariance matrix. These are the P matrices that I kept talking about; they tell you how certain you are about the estimates. That kind of check on your estimates is always nice to have: not just a number, but a number with an uncertainty attached. And the change in perspective is to go from that language of means and covariances to the language of probability density functions. If you have a Gaussian distribution, it's characterized by a mean and a covariance, but more generally you could have some funky distribution with a different shape, and that's where we'll be going today. Okay, so let's start with a toy model of just one noisy measurement. This is the kind of measurement relationship that we've been talking about in time, but let's imagine that one measurement is all we have. So we observe y and we want to infer x.
One of the things the Bayesian formulation gives us (and I'm assuming people have seen some of the Bayesian formulas; I think others talked about this in the course) is a way to incorporate prior knowledge about the variable you're trying to estimate. There's a whole story about different interpretations of probability, frequencies versus degrees of belief, which I'll set aside. In this toy example, our prior knowledge might be that the x we're trying to measure is on average zero, with an uncertainty sigma_x, described by a Gaussian distribution. The measurement noise is also Gaussian, and Bayes' theorem tells us how to deal with this situation. Remember that the joint probability of x and y can be factored either as the conditional probability of x given y times p(y), or as p(y|x) times p(x). Rearranging gives Bayes' theorem, which in this case says that p(x|y), what we know about x after we incorporate y, is the likelihood p(y|x) times the prior p(x), divided by a normalizing constant p(y). That constant is called the evidence, but often it's computed just to make sure the posterior is a bona fide probability density function that integrates to one. Okay. So, neglecting that normalization for the moment, our inference is p(x|y): we have the thing we're interested in, x, we make a measurement y, and this tells us what we now know. And it relates that to things we can calculate: p(y|x), the likelihood, which says, given that the system is actually at some definite position x, how the observations y are spread about it; and our prior p(x), which is what we know about x without incorporating this measurement y. In this particular case, both are Gaussian distributions. The likelihood is just our noise distribution: remember, y minus x is the noise zeta, so this describes how y is distributed about x. And then there's our Gaussian prior. Again neglecting normalization constants, the posterior is just the product of the two Gaussians, which is the exponential of the sum of the two exponents; you can combine them and complete the square. What you find is another Gaussian distribution. It has a nonzero mean (shown in blue on the slide) and a new variance, sigma_0 squared, related to the original variances as follows. The posterior mean is sigma_x squared over (sigma_x squared plus sigma_zeta squared), times y. So it's somewhere in between where you would put the state without a measurement, which is around zero, and the measurement itself. Because we have some prior information about what x should be, we don't just naively trust the measurement, which would amount to using the likelihood alone; we correct it by the prior knowledge we have about what x should be. I have to say, it's one of those things where the mathematics is very simple, but the ideas behind it take some thinking to absorb.
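To make the algebra concrete, here is the standard result written out (a reconstruction from the definitions above, in the lecture's notation, with sigma_0 the posterior width):

$$
p(x \mid y) \;\propto\; \exp\!\left[-\frac{(y-x)^2}{2\sigma_\zeta^2}\right]\exp\!\left[-\frac{x^2}{2\sigma_x^2}\right]
\quad\Longrightarrow\quad
x \mid y \;\sim\; \mathcal{N}\!\left(\frac{\sigma_x^2}{\sigma_x^2+\sigma_\zeta^2}\,y,\;\sigma_0^2\right),
\qquad
\frac{1}{\sigma_0^2} \;=\; \frac{1}{\sigma_x^2} + \frac{1}{\sigma_\zeta^2}.
$$

The harmonic-sum rule for the posterior variance is the "narrowing" property discussed next.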
Okay, so any questions on this? [Question: how would you determine the likelihood and the prior?] Right, we're starting by assuming that these are things we've already determined. How do you determine the noise? In the lab, if you're trying to measure a variable quantity, you can fix the quantity and do repeated measurements: look at the histogram, see if it's a Gaussian, and find the standard deviation. The prior knowledge on x could come from many things. It could come from theory, or from a mix of theory and experiment. For example, if I have a particle trapped in optical tweezers, then from theory we expect a Gaussian distribution about the center of the trap. Of course, do we know where the center of the trap is? That part might be a little more empirical. Here we're just sweeping those questions under the rug and working in terms of distributions that we've somehow already determined. But the interesting thing is that the posterior is actually narrower than either of the two ingredients: it obeys the relationship 1/sigma_0^2 = 1/sigma_x^2 + 1/sigma_zeta^2. Notice that if one of them is huge, if one has extreme uncertainty, you're still limited by the other one; you're sort of saved by the other one. And in the case where the two are equal, the variance goes down by a factor of two. But the posterior is always at least somewhat narrower than either. Okay, so this is a very simple test case, but it's one to keep in mind as things get a little more complicated. The situation I want to think about now is a dynamical system, which, at least in discrete time, evolves between time steps k and k+1. So there are x's that evolve according to some internal dynamics, and at each time point k, k+1, and so forth, there's an observation y associated with it: y_k, y_{k+1}. We can give this a very simple graphical structure. If the x's are some kind of Markov process (for example, if the x's were discrete, we would say we have a hidden Markov model, in the sense that the x's form a Markov chain and we observe the y's), then we might want to infer the x's given the sequence of y's. And of course the y's by themselves are non-Markovian. Here it's slightly different: the x's are a dynamical system, the discretized dynamics of a continuous dynamical system, punctuated by instantaneous observations. That's just one setting; there are of course cases where the observations are also some kind of continuous process, like if you have an analog needle, and then the formalism gets a little more complicated. So let's describe the situation. If we start with the state x_1 at the beginning, initially distributed according to some probability density function, then the kinds of dynamics we'll be thinking about are that x_{k+1} is drawn from some probability density p(x_{k+1} | x_k); in this formalism the dynamics is a conditional probability, which we'll talk about more. And the observation y_k, given that the system is in some state x_k, is drawn from p(y_k | x_k). That is the statement of Markovian dynamics.
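In this hidden-Markov structure, the joint density of all states and observations factorizes as follows (a standard identity; x^K and y^K denote the whole sequences, the notation introduced just below):

$$
p(x^K, y^K) \;=\; p(x_1)\,\prod_{k=1}^{K-1} p(x_{k+1}\mid x_k)\;\prod_{k=1}^{K} p(y_k \mid x_k).
$$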
Okay, so my notation for a sequence here might be different from what some books use, but I kind of like it. I'll write x^K (x with an upper index K) for the set x_1, x_2, x_3, all the way up to x_K; it's just the collection of x's up to x_K. Likewise, y^K is the collection of y's up to y_K. Now, in general the dynamics could be history dependent: the state at time k+1 could depend on all of the x's before it, and perhaps even on the y's, if there's any feedback. And we'll say no: in fact, if we're given x_k, the state at time k, then we know at least the range of outcomes at k+1, and it doesn't matter what x_{k-1} was, or any of the previous observations, and so forth. Likewise, the observations could in principle depend on prior states and all of the prior observations, and again we assume they don't. These two assumptions basically mean that there's no memory and no dynamics in the observation process. And physically, this is an assumption. In the kinds of experiments I'll be talking about in a moment, there's a particle moving around in a fluid, and we make observations by shining light on it: not exactly taking a picture, but recording the position with an electronic detector in a circuit. And in some sense we're just saying that whatever dynamics are happening in the detection are fast enough that it gives an instantaneous measurement. [Question: and when you measure, you're assuming you're not altering the system?] Yes, we're assuming the measurement doesn't disturb the system. Sometimes people think that disturbance is what defines quantum mechanical measurement, but that's not true; there are plenty of classical measurements that disturb the system as well. For example, on a larger scale, in geophysics people set off explosions: sound waves go down and reflect off layers. Or you could imagine a little robot that goes in search of something, and when it finds it, it explodes. It destroys the system, but you know where the thing was. So there's nothing uniquely quantum mechanical about measurements disturbing the system. Okay. So, we take p(x_{k+1}, x_k | y^k) and decompose it as p(x_{k+1} | x_k) times p(x_k | y^k). The first factor should in general also be a function of the y's, but because the dynamics are Markovian, it isn't. And now we make an assumption about the dynamics: x_{k+1} is some nonlinear function f of x_k, the input u_k, and also a noise process nu_k. That's something we add to the description. And we can express p(x_{k+1} | x_k) in terms of this function as follows. We marginalize: we introduce the noise nu_k and integrate over it. This is something we're always allowed to do; it's one of the standard tricks in this business, where you introduce a joint probability and recover what you want by integrating over the extra variable. Then we decompose the joint into conditional probabilities: p(x_{k+1} | x_k, nu_k) times p(nu_k | x_k), and since the noise doesn't depend on x_k, that last factor is just p(nu_k). That's how we go from here to here.
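Written out (anticipating the delta-function step explained next), the marginalization reads:

$$
p(x_{k+1}\mid x_k) \;=\; \int d\nu_k\; p(x_{k+1}\mid x_k,\nu_k)\,p(\nu_k)
\;=\; \int d\nu_k\; \delta\!\big(x_{k+1}-f(x_k,u_k,\nu_k)\big)\,p(\nu_k),
$$

and in the additive-noise case, $x_{k+1}=f(x_k,u_k)+\nu_k$, this collapses to $p_\nu\!\big(x_{k+1}-f(x_k,u_k)\big)$.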
This factor here, p(x_{k+1} | x_k, nu_k), the next state given the current state and the noise, is deterministic. The dynamics as a whole are stochastic, because they depend on this random variable nu_k, which will be something like the thermal noise in the systems we were talking about before. But if somebody tells you the value of the thermal noise, then x_{k+1} is a deterministic function of that noise. So in this relation, that factor is actually a delta-function distribution: delta of x_{k+1} minus f(x_k, nu_k). And the remaining factor is just p(nu_k). So p(x_{k+1} | x_k) is just the noise distribution, evaluated at the value of nu_k that solves the relationship: the probability of reaching x_{k+1} from x_k is the probability of the noise value that would produce that x_{k+1}. And in the simple case of additive noise, this is just the noise distribution shifted by the deterministic part. So that's the first step, the prediction; it tells us, in a case where we know the dynamics, how to connect this formalism to things we can actually compute. The second step of the story is the update. Here we use Bayes' rule: we turn the prediction around. The notation here is a little compact: y^{k+1} is the whole collection y_{k+1}, y_k, y_{k-1}, all the way down to y_1. But I'm going to be interested in the single value, lowercase y_{k+1}; that's the single measurement at time k+1. Upper-index y^k is the collection of measurements going from now back into the past. So I isolate in the likelihood just the single measurement, p(y_{k+1} | x_{k+1}); really this is being written as (y_{k+1}, y^k), I guess. Doing that, there's also a normalization, and then there's the prior. Now, this looks like a big mess, except we remember that the observations only depend on the current state, so all the conditioning on past y's drops out of the likelihood. So: I have a prior estimate of the state given all the past measurements, and I'm decomposing the inference into two steps. Because this is a recursive argument, at time k I have an estimate of x at time k, given the y's from time k back into the past. And now I do two things, using that estimate and knowing the dynamics. First, I ask: how does that x_k evolve to x_{k+1} in the absence of any new measurement? It's just probabilistic dynamics: you have a state x_k, and you evolve it to x_{k+1} somehow; we just talked about how that might look. That's step one. Then, at time k+1, all of a sudden another measurement comes in. And now all I'm doing, with the fancy notation, is updating the probability density for x_{k+1} (the prediction, given all the y's up to k) with this new piece of information. That's all. Just two steps. But you see that once we've done that, we have an estimate of x at time k+1 given all the y's up to time k+1, and so we've completed the recursion.
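So the recursion consists of exactly the two steps just described:

$$
\text{(predict)}\qquad p(x_{k+1}\mid y^{k}) \;=\; \int dx_k\; p(x_{k+1}\mid x_k)\,p(x_k\mid y^{k}),
$$

$$
\text{(update)}\qquad p(x_{k+1}\mid y^{k+1}) \;\propto\; p(y_{k+1}\mid x_{k+1})\;p(x_{k+1}\mid y^{k}).
$$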
[Question: Maybe this is a naive question, but couldn't you just take the last measurement you made and evolve the dynamics, say by solving the Fokker-Planck equation you have for the stochastic differential equation?] Well, that's essentially what we're doing, just carefully. The y's themselves don't really have Markovian dynamics: y_k does not tell you how to get to y_{k+1}; for that you need a bunch of past measurements. [But we're trying to determine the distribution of the states; when you measure, aren't you measuring the state?] It's a noisy measurement, and it's not the whole state: it's only one noisy number, and you may have an n-dimensional state. So it's not telling you the state. As we've seen, you can only learn the state by incorporating past measurements. We've now approached this problem several ways: first with the observer system, two deterministic classical systems that just evolve and synchronize; then the Kalman version, which was an observer in discrete or continuous time, but with a rational way to choose the feedback coupling between the two. Now we're giving it a Bayesian spin. [A pause to sort out the screen sharing.] Right. So, we use Bayes' rule for just the last measurement's likelihood, together with the fact that the observation is independent of the past y's. And then we add the normalization constant: the normalization is just the integral over all possible states x_{k+1} of the numerator, and you can write it as p(y_{k+1} | y^k). Okay, so if we put it all together, we get this collection of equations, which we can use to estimate x_{k+1} given all the y's, starting from the previous estimate. We start from p(x_k | y^k). We use the Markovian dynamics rule to evolve x_k to x_{k+1}, integrating over all the possible x_k. And we use Bayes to update from conditioning on y^k to conditioning on y^{k+1}: the prediction serves as the prior, and then we have the likelihood. The relationships we'll be using are x_{k+1} = f(x_k, u_k, nu_k), and y_k some other function of the state and the measurement noise. And actually, it's a lot simpler if f has additive noise structure, as I said. The noise can even be multiplicative, but if it adds onto the deterministic structure, that simplifies solving for nu in terms of the x's.
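The normalization constant (the evidence) mentioned a moment ago, written out:

$$
p(y_{k+1}\mid y^{k}) \;=\; \int dx_{k+1}\; p(y_{k+1}\mid x_{k+1})\,p(x_{k+1}\mid y^{k}).
$$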
In a case like this, you can actually solve the continuous-time dynamics with the Fokker-Planck equation for the distribution. This is a separate story in statistical physics, but if you have a stochastic system described by a probability density p(x), subject to known dynamics and known diffusive noise processes, then we know how to describe the evolution of that density. So the prediction can come either from discretizing the dynamics and evolving them according to some discrete jump rule, or, perhaps more carefully, from integrating the Fokker-Planck equation in continuous time from time k to k+1. And you get the effects you'd expect: the deterministic parts move the density. For example, a constant force will just drift a probability density in one direction; if the force is space dependent, different parts of the density drift at different rates, so you can get some stretching. But that's all determined by the deterministic part of the dynamics. And then the noise terms add a diffusive influence, so distributions tend to spread; again, the noise could be state dependent, so the rate of diffusion depends on where the system is. When you have multiplicative noise there are all sorts of subtleties that people like Rafael understand better than I do, and, okay, we're not going to focus on that complicated situation. So, the next step is to connect this to what we did yesterday with the Kalman filter, and I'll do it in a somewhat approximate way, because the full story is a lot of boring algebra; this is just to give a sense. The specific model we use for the evolution is now a linear one. The observations are linear observations, and we still have our noise terms in the two places. Quantities like p(x_{k+1} | y^k), in the language we had yesterday, are Gaussian distributions. What we'll be leaning on in the Kalman filter is a closure property of Gaussians; it's a fairly clear statement: if you have a Gaussian random variable and you add a number to it, or scale it, it's still Gaussian, just with a shifted mean and a changed variance. That's, in a sense, all that happens here: if nu_k is Gaussian, u_k is deterministic, and x_k is Gaussian, then x_{k+1} is also Gaussian; the probability density stays Gaussian as it passes through this linear relationship, and the same goes for the observations. So the special feature of linear dynamics and Gaussian noise is that the probability densities describing all of these quantities always stay Gaussian. And we know that a Gaussian is characterized by just a mean and a covariance matrix, so that's all we need to track. In the language we had yesterday, p(x_{k+1} | y^k) would be a Gaussian for the variable x_{k+1} with mean x-hat_{k+1}-minus and covariance P_{k+1}-minus, where the minus denotes the prediction quantities. The observation density would be a Gaussian in y_{k+1} minus C x_{k+1}, with the measurement-noise covariance, and so on. So all of the quantities that enter can be assigned the names we had yesterday.
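The closure property being used, stated compactly (a standard fact):

$$
x \sim \mathcal{N}(\mu, P) \quad\Longrightarrow\quad A x + b \;\sim\; \mathcal{N}\!\left(A\mu + b,\; A P A^{\mathsf{T}}\right),
$$

so a linear step $x_{k+1} = A x_k + B u_k + \nu_k$ with Gaussian $\nu_k$ maps Gaussian densities to Gaussian densities.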
The step that I'm not going to attempt to do here is to stick all these quantities into the recursion and show that the same relations come out. There's also a standard result on conditioning Gaussians: if you have a joint multivariate Gaussian distribution and you condition on some of the variables, the conditional distribution is again Gaussian, and so forth. So you always get Gaussian distributions, but the calculation of the means and covariances is a little bit complicated. I guess at this point I'll just say that when you do it, you in fact reproduce the Kalman filter. But it's a calculation you really have to go through on your own, preferably with the help of something like Mathematica or Maple, to convince yourself that it really works out. There's a lot of completing the square in it, and tricks like that. [Question: so this only works when you have nice linear dynamics?] No, you need both features. For the Kalman filter to work, for this to reduce to something easily solvable, the dynamics have to be linear and all the noise distributions have to be Gaussian; in particular, the initial uncertainty in the state should be Gaussian. You start with Gaussian distributions, and the linearity means that everything is only ever multiplied and translated, so it stays Gaussian. The Bayes filter that we've just been describing is a general set of equations; whether you can actually solve those general equations is another story. You can solve them, and you get the Kalman filter, under the condition that the dynamics are linear and the noises are Gaussian. [Question: the way we introduced the Kalman filter, with the update defined through the innovations, the update rule seemed like a very smart choice, but also somewhat arbitrary. Can we prove that this is the proper way of doing the update, and not some different function of the innovations?] Well, in this picture, this is just using Bayes. These equations are definitions, in some sense. But when you stick them into these formulas and work out the relationships among them, you see that the innovation update just comes out of the algebra. So the structure that we assumed yesterday in a somewhat arbitrary way is exactly what comes out of this derivation. Historically, the Kalman filter came first, and the Bayesian derivation actually not very long after, it seems to me. But now you can essentially prove that this is the right thing. It's similar to the LQR calculation for regulating something, where there was a step in which I assumed the input should be some matrix times the state. Another way to see it: characterize the noise by its autocorrelations, and then characterize the filter by asking whether the innovations are independent. The idea is that your predictions should have extracted every possible piece of information about the system, so that you're predicting it up to the measurement noise, in some sense. And if there are any correlations between the innovations, or any offset, if your predicted observations are always too high, or always correlated in time, that means something is wrong: there's some piece of information about the system that you haven't extracted, and your predictions are systematically not as good as they should be.
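For concreteness, here is a minimal sketch in Python of one cycle of the resulting recursion (the names A, C, Q, R, x_hat, P follow yesterday's conventions; this is the standard textbook form, not the lecture's own code):

```python
import numpy as np

def kalman_step(x_hat, P, y, A, C, Q, R):
    """One predict/update cycle of the linear-Gaussian (Kalman) filter.
    x_hat, P : posterior mean and covariance at time k
    y        : new measurement at time k+1
    A, C     : dynamics and observation matrices
    Q, R     : process- and measurement-noise covariances
    """
    # Predict: push the Gaussian through the linear dynamics.
    x_pred = A @ x_hat                       # "minus" (prediction) mean
    P_pred = A @ P @ A.T + Q                 # "minus" (prediction) covariance

    # Update: condition the Gaussian on the new measurement.
    innovation = y - C @ x_pred              # the innovations
    S = C @ P_pred @ C.T + R                 # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_hat)) - K @ C) @ P_pred
    return x_new, P_new
```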
[A pause to get the meeting recording started.] [Question: when you're doing this update process, you're supposed to do it instantaneously, right? While you're observing the process, you measure, you calculate something, and you predict, all at the same time?] Yes, this is still a somewhat idealized way of proceeding, where the measurement transfers to the computer, the calculation happens, and it all fits inside the k to k+1 interval. I'm about to talk about an experiment where that's not quite true. All right. Okay, we're pretty much getting to the end of what I wanted to say here, so, given the time, let me just make two quick points. One: why do we characterize the estimate by x-hat, a single number, and specifically the conditional mean? You can show that this is, in a sense, the value that minimizes a natural cost. The idea is that if we're interested in some p(x|y), which we're estimating using Bayes' relationship, then we can form as a cost function the expected squared deviation of x from the representative value x-hat that we're choosing; minimizing E[(x - x-hat)^2 | y] over x-hat gives exactly x-hat = E[x|y], the conditional mean. Of course, this cost function is a little bit arbitrary. We could, for example, use the absolute value, an L1-type cost, and then you would get the median instead. So the choice is a little arbitrary, but given the squared error, the conditional mean is what you use. The other comment is that this Bayesian formalism gives a way to go beyond linear dynamics and Gaussian noise. The question is what to do in practice. There are perturbative approaches, which tweak things a little but keep the structure of a Kalman filter with a bit of extra stuff. And then there are strongly nonlinear dynamics and strongly non-Gaussian noise, where you might resort to some kind of numerical solution: you can try to solve the filter equations on a grid, or expand over basis functions; it's also possible to use Monte Carlo representations. Of the perturbative approaches, the basic one is called the extended Kalman filter: you use the nonlinear rule to propagate the mean, which is a bit of an approximation, but then you assume the density is Gaussian at each step and characterize it by just its mean and covariance, even though that's not really true. If the system isn't too far from linear, it's not too bad a thing to do. The ensemble method is more like a sampling method, where you construct empirical covariances from samples of the true distribution.
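A minimal sketch of the extended-Kalman-filter idea just described: propagate the mean through the nonlinear map, but keep a Gaussian picture by linearizing at each step. Here f, h and their Jacobians F_jac, H_jac are user-supplied placeholders (illustrative, not the lecture's code):

```python
import numpy as np

def ekf_step(x_hat, P, y, f, F_jac, h, H_jac, Q, R):
    """One extended-Kalman-filter cycle: nonlinear dynamics f and
    observation h, with local linearizations F_jac(x), H_jac(x)."""
    # Predict: nonlinear propagation of the mean ...
    x_pred = f(x_hat)
    # ... but Gaussian (linearized) propagation of the covariance.
    F = F_jac(x_hat)
    P_pred = F @ P @ F.T + Q

    # Update: same algebra as the linear filter, with H = dh/dx.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred
    return x_new, P_new
```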
So there's a whole zoo of approaches, and different approaches work better for different kinds of problems; different fields have different kinds of problems. If you come from meteorology, for example, the ensemble Kalman filter, which is very popular in geophysics, is good for high-dimensional state spaces. The extended Kalman filter was used in the space program in the 1960s; it was the basis for navigation of satellites and the Apollo missions and things like that. But we won't need any of that fancy stuff. As I said, the goal was to show that the Kalman filter, although it has a seemingly arbitrary structure, is really just the expression of Bayesian reasoning in the context of these dynamical systems. Okay, so I was going to stop there and change gears into seminar mode. But are there any questions on this before we switch? [Question: what is the difference between this and applying optimal control theory with Bayesian information?] Remember, these are separate stories. Optimal control is how you pick the input u: given an initial state x, how do I choose the u that gets me somewhere in the future? State estimation is: given the observations in the past, how do I reconstruct the state at the present time? Or, and we'll do this in a moment, you can also use it to predict into the future, by a series of prediction and update steps. [A pause to switch to slides over Zoom; the display is a bit scrunched, but we'll make do.] Okay. So, I want to tell you a little bit about what we've been doing in my group on a related problem, which is trying to make a Maxwell demon. It's work done mostly by Tushar Saha, who just finished his PhD; this is largely his thesis work. It builds on previous work by a former PhD student of mine from Belgrade, who is currently working in Vancouver. And it was done in collaboration with the very talented theorist David Sivak, a professor at Simon Fraser; Carl, who was a postdoc with me for about a year; Jannik, who has been around for about three years; and Joseph, a master's student with David who is now a PhD student at Stanford. Okay. So, to go back to the beginning of the story, with the Maxwell demon thought experiment, which I think most of you have seen: this is the famous thought experiment (a thought experiment, not an experiment) that was trying to probe the nature of the second law. There's the whole cartoon of the demon sorting molecules of different velocities into two compartments; if it develops a temperature difference, that difference can be used to extract work. So this is nice, but not so practical, at least done this way.
So in 1929, Leo Szilard proposed a nice simplification: it's sufficient to look at one particle in a box. You can insert a partition, and now the particle is either on one side or the other, and if you make a measurement to learn which, you can attach a kind of pulley and let this one-particle gas exert force as it expands. It does work lifting up a mass, and the particle stays at the same average energy, because the walls are at some temperature; it's connected to a thermal bath. And when you do this, you can show, as Szilard did, that you can extract up to kT ln 2 of work per cycle. Once the piston is over here, the cycle is complete, and you can go back and insert a partition again. So this is closer to what we're going to do, but still slightly different. We have a kind of modern version of this that has actually been built, and I want to explore what this new type of engine, which extracts work from the bath, can actually do. If we have time and stamina, I might talk a little about the costs of erasure and Landauer's principle and so forth, but the part I really want to focus on is this first part. Okay, so imagine a particle that has mass, it's heavy, there's gravity, and it's hanging from a support via a spring. Because it's small, it's fluctuating up and down with an amplitude that we can measure; we can see its motion. So here's the equilibrium, and it's going up and down. We set a threshold, and when it fluctuates up to this threshold, there's a demon, which is measuring the position, that notices it has hit the threshold and then raises the support by just the right amount to do no work on the system. Intuitively: here's the support, and if the particle fluctuates up and compresses the spring with a certain energy, we can move the support to the place where the spring is stretched with just the same energy. The energy is unchanged, we're not doing any work, but the new equilibrium is higher. So the particle settles there, and eventually fluctuates up to a new threshold, and goes up and up and up. And it kind of seems like we're raising this particle, lifting a heavy mass, without doing any work on it, because it's just the thermal fluctuations that lifted it, and we respond in a way that does no work against the fluctuations. So it looks like we're extracting energy from the bath and storing it as work. And then repeat. And we have an experimental realization of this using optical tweezers, which are a sharply focused laser beam producing a potential that is like a spring in all three directions. If we have a horizontal beam and a heavy particle, the particle sags in the beam, and that's like the spring. So we watch for the upward fluctuations, and then we raise the beam. That carries out the operations I just talked about. Okay, maybe first let me just say that I won't say too much about the experiment itself, but just what's going on. It's a fairly standard home-built optical tweezers setup. We have the trapping laser, and something called an acousto-optic deflector that can rapidly change the beam position; that controls the position of the trap, and there's gravity pulling the particle down. We have a separate laser that we use to measure the position, and we can control its intensity.
The light scatters off the bead and is reported (sorry, this part of the figure got a little messed up) onto a quadrant photodiode, and then somewhere off to the side there's a controller that takes the information from the observations and decides when and how to move the trap. As a technical point, the loop rate here, going from detection to some calculation to the decision of whether to move the support, is 50 kilohertz, so 50,000 times a second. That's actually very fast for a standard computer, so you can't really do this on a standard computer; you need specialized hardware to carry out this control. [Question: and the controller also moves the position of the light, right?] Yes. This acousto-optic deflector: it's a bit of a story how it works, but basically you give it a signal and it changes the direction of the beam, which changes the height of the trap. And here's a little snapshot of a piece of the experiment showing the sample in the center: there's an objective to focus the laser beam to create the trap, and then there's another objective, with the other laser, to observe it. Okay, so that's the experiment. Now, the first question people ask is: you're raising this thing up without doing any work; doesn't this violate the second law of thermodynamics? What's going on? There's a long story, which I won't get into for the moment, about the cost of measurement. The answer, basically, is that carrying out these measurements, and the computations associated with them, also implies some work, and in the case of a single temperature, that work always exceeds whatever you're getting from the extraction. So you can't win this way. There is a caveat: if somehow the bath is at a higher temperature than the measuring device, then you have two temperatures involved, and then you can net some work, because the thermal fluctuations you're harvesting are larger than the ones corrupting the measurement. In principle that could give you something, but if everything is at one temperature, the best you could do is break even, and in practice you won't even break even. But it still might be interesting to understand what this engine can do, and that was our starting point. One of the main messages I want to get across is that these are engines that, at their own scale, can do significant things. The thought experiment Maxwell proposed seemed to involve vanishingly small amounts of energy, but that was relative to the large system being imagined; here we have a small system whose fluctuation energies are actually significant at its scale. You'll see that. Okay, so here is our feedback algorithm. It's a straightforward application of feedback: we have some threshold, and, ignoring measurement noise for a while, if the position of the particle relative to the trap (the displacement from the trap) exceeds some threshold amount, then we raise the trap by some amount; otherwise, we just wait.
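Here is a minimal simulation sketch in Python of this ratchet rule (an overdamped bead in a harmonic trap, in units where the trap relaxation time is one; gravity, which just shifts the equilibrium sag and hence the threshold convention, and the one-step response delay of the real experiment are both omitted; all parameter values are illustrative, not the experimental ones):

```python
import numpy as np

rng = np.random.default_rng(0)

dt      = 0.01   # sampling period, in units of the trap relaxation time
sigma   = 1.0    # equilibrium width of the bead's thermal fluctuations
x_thr   = 0.0    # ratchet threshold, relative to the trap center
alpha   = 1.9    # feedback gain; alpha = 2 is the naive no-work choice
n_steps = 100_000

x, trap = 0.0, 0.0                     # bead position and trap center
for k in range(n_steps):
    # Overdamped dynamics: relax toward the trap center, plus thermal kicks.
    x += -(x - trap) * dt + sigma * np.sqrt(2 * dt) * rng.standard_normal()
    # Ratchet rule: if the bead fluctuates above threshold, raise the trap
    # by alpha times the excursion (alpha = 2 mirrors the spring extension).
    if x - trap > x_thr:
        trap += alpha * (x - trap)

print(f"mean climb velocity ~ {trap / (n_steps * dt):.3f} sigma per relaxation time")
```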
So this is a kind of nonlinear feedback, because this condition here can be written as a step function, and any rule with multiple branches like this is automatically nonlinear. Okay. Here's how we implement it. We have a bead, and a gain alpha that says how much to raise the trap. The loose criterion is that we want to raise it so that we do no work on the particle; we'll see how that's met in a moment. The result is a little dance between the particle position and the trap position. Here's the particle position, and every time it goes up enough to hit the threshold, we raise the trap; that's the black line. And if it happens to go down and take a long excursion downward, we do nothing for a long time, until it starts to go up again and we raise the trap. So it has this ratcheting motion. Okay, so the first task is to figure out what value of alpha should be used so that no work is done. The naive answer: if the particle has fluctuated up to here, we displace the trap so that the potential energy is just matched on the other side. In our notation this corresponds to alpha equals two, basically one to get to the particle and one more to get to the mirror-image point. So alpha equals two would be the naive choice, but we can just measure what value actually works. And we find that, in typical conditions, it's not two but more like one and a half. One immediate reason is the kind of timing issue that Miguel has been alluding to: there's a time delay between the time you measure and the time you can respond and move the trap. It's very small, but it's not zero; in fact, in this experiment we set it to one sampling period, so we actually observe at time k and move at time k+1. One of the effects is that in the time it takes to respond, the particle is sliding back down, so you shouldn't move as far; you should move to here instead. That turns out to explain part of what's going on, but not all of it; we'll see the rest in a little bit. Okay, so we empirically tune alpha to the value that makes the trap work zero, and then we have this ratcheting going up like this. We were then interested in how to maximize the power being extracted. There's work extracted in raising a weight, and if it's on average going up at some constant velocity, there's a power being extracted. So we'd like to maximize this, and we play around with parameters. One is the sampling frequency: how often do we make these measurements, this 50,000 times a second, relative to a natural time scale of the system? For optical tweezers, that natural scale is the corner frequency of the trap. Basically, you've got a big particle moving in a viscous fluid, so it's overdamped, and there's a time scale on which it relaxes: displaced from equilibrium, it relaxes back exponentially, with a time constant set by things like the viscosity of the fluid, the size of the particle, the strength of the laser, and so forth. It can be measured, and it defines a time scale, or a frequency scale. So you can ask how the frequency of measurements compares to this trap frequency, and that's what we did here: this axis is the sampling frequency in those units.
And what you see is that initially the rate of free-energy extraction, loosely the power, goes up with the sampling frequency, linearly in fact, and it saturates once you get well past one, where one means the sampling frequency equals the trap corner frequency. By the time you've hit 10 or 100, it's pretty well flat. What this means is that, out of all the fluctuations (there are 10^23 water molecules in this bath, and correspondingly many modes of fluctuation), the ones useful for energy extraction are only the low-frequency ones that the trap sees. The trap can only respond to perturbations at or below the trap frequency; the ones above are just averaged out, don't move the bead, and can't be exploited to get work. So most of the modes in the bath are, in some sense, irrelevant, and there are a few low-frequency modes producing the fluctuations that we take advantage of. Okay. So the first lesson is: sample faster than the trap frequency if you can, but there's a point of diminishing returns. The next parameter is the threshold for moving the trap. What we see here is that the power we get keeps increasing as the threshold decreases, saturating as the threshold goes to zero. So, in some sense, every fluctuation in which the spring starts to be compressed is one you want to take advantage of. Now, one subtlety is that the bead naturally hangs, because of its weight, below the trap equilibrium, so the threshold is measured relative to that sag. When the bead fluctuates up, it first has to travel some distance to cancel the gravitational sag; gravity naturally stretches the spring, and the spring isn't compressed until after that point. The moment it starts to be compressed, that's when you want to ratchet. [Question: sorry, John, you were saying the free-energy rate changes with the sampling frequency and the threshold, right?] Yes: the sampling frequency we want to make fast, and the threshold we want to set to zero, where zero means what I was just saying: every upward fluctuation past the sag point triggers a ratchet. The exception is when the bead makes a big excursion downward and spends a long time stretched and has to come back up; but from the moment the spring starts to be compressed, every fluctuation is banked. Okay. You can also do this with different laser powers, and you see that the stronger the laser, the stiffer the spring constant, and the more you get. This is a kind of physical limitation, and I'd say it's analogous to the materials question when you build an ordinary engine: are you making it out of plastic, out of metal, out of tungsten, something special? The material property here is the spring: the stiffer the spring, in some sense, the better. In our case a stiffer spring means a bigger laser, so there's only so far you can go; but to the extent that you can, more is better. You can also do this with different bead sizes: this is a one-and-a-half-micron bead, this is a three-micron bead, and a five-micron bead. And in terms of the maximum power, we get to roughly a thousand kT per second. In fact, and this is significant, that's comparable to what a molecular motor inside a cell
would be putting out in terms of power. So these engines, on their own scale, produce a significant amount of power. I also asked: if I just care about the speed, how fast the bead climbs, what should I do? It turns out that bigger beads were nice for power extraction because they're heavier, but a smaller bead goes faster. You can see that here: this is the big bead, five microns, then three, one and a half, even half a micron here. The smaller the bead, the less drag there is, and the faster it climbs. There's a straight line here; and by the way, this experiment can pull against gravity, going up, but it can also go horizontally, so we can distinguish what's from gravity and what's from the ratchet itself. The fastest case is about 20 microns per second. So again, at their own scale, the micron scale, these engines are competitive with bacteria. [Question: is this an average of the position?] Yes, these are long-time averages; obviously the instantaneous velocity fluctuates, but this is the average displacement over a long period, over many ratchet events. Okay. And then there's a nice theoretical analysis of this, which I will mostly not go through, but you can treat it as a first-passage problem. Say I start from minus x_T and ask when the particle first reaches plus x_T, where x_T is the threshold: how long does it take to first get there? Because when it comes up there, we ratchet. So it maps onto a first-passage-time problem. And there's a classic result, almost a hundred years old, for the mean first-passage time, going back to Pontryagin, actually the same person optimal control comes from, doing this work some 25 years earlier. So we can do this calculation and even get an analytic expression for what the velocity should be. And using this theory, it tells you the right variables to plot, so you can take all of the experimental results I was showing, and more, for velocity and power extraction, and collapse all of these curves onto one. Here we use a reduced length, the buoyant mass times gravity over the spring constant, which is an equilibrium length scale; the time scale is the relaxation time of the fluctuations; and kT is the natural energy scale. Those variables collapse all of these separate curves onto one. Okay, so now we know something about how to maximize the performance.
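For reference, here is the classic first-passage result being invoked, in one standard form (a textbook expression, not taken from the slides): for an overdamped particle with diffusion constant D in a potential V(x), with a reflecting boundary far to the left, the mean first-passage time from a to b > a is

$$
T(a \to b) \;=\; \frac{1}{D}\int_a^b dy\; e^{V(y)/k_B T}\int_{-\infty}^{y} dz\; e^{-V(z)/k_B T},
$$

and the average climb velocity is then roughly the ratchet step per event divided by this time, evaluated for the harmonic-plus-gravity potential.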
We were then interested in the effects of noise on the measurement. So far we've assumed that the measurements are sufficiently precise that we don't worry about the difference between the true and measured values. But if we go back to our setup, with the detection laser and so forth, we can certainly use this device to reduce the intensity of the detection laser and use just a very dim beam, and the amount of measurement noise increases as the intensity of the laser goes down; that's fairly clear, and it traces to shot noise. And why might you want to do this? There are some practical reasons: if you were working, instead of with light scattering, with fluorescent molecules, they can photobleach, and in general organisms don't like being blasted with intense lasers. So you'd want to minimize the intensity of your illumination. And anyway, so there are reasons you might actually not want too precise a measurement. Also, when we start calculating efficiencies, there are information costs for very precise measurements, but I'm not going to go there for now. Okay. So now we vary the noise, or what I'll call the signal-to-noise ratio: the signal here is the amount the bead naturally moves, its equilibrium standard deviation sigma, and the noise is the standard deviation of the measurement uncertainty. Their ratio is what I'll call the signal-to-noise ratio. And the observation we made is that this information engine stops working when the signal-to-noise ratio gets too bad, when the noise gets too high. So let's trace through that. If the signal-to-noise ratio is relatively high, we have the situation we described before, where we tune the value of alpha (naively we thought it should be two; we found somewhere around 1.5) so that as we move the trap up each time, we do, on average, no work on the system. We adjust the value of alpha and empirically measure the work the trap is doing. With low noise this is not a problem, but we can now do it as a function of the signal-to-noise ratio. And what we find is that as we go to higher and higher signal-to-noise ratio, alpha goes not to two, but to about 1.8. Okay, so that's one puzzle. But perhaps more relevant here is that at low signal-to-noise ratio it goes down. I don't know if you can see this very well, but I was using filled circles here, and over here these are all hollow circles. What the hollow circles mean is that we actually weren't able to find any nonzero value of alpha that would make the trap work equal to zero. Of course, if you set alpha equal to zero, then you're not ratcheting at all: you're not moving the trap, so you're not doing any work; that's always an option. The question is whether you can set alpha to some nonzero value at which you can measure a crossover between doing negative work and doing positive work. And that value got so small that we couldn't. So the effect seems to just go away. Now, the fact that alpha doesn't go up to two at high signal-to-noise ratio is traceable to the delay in the system; there are actually two reasons alpha is not two, and one of them is the delay I talked about: you see the particle, and you respond a short time afterwards. But clearly that's not the biggest part of the story down here; somehow the noise is causing problems as well. And so we want to understand what's going on with the measurement noise. [Question: isn't there noise from thermal fluctuations too?] Yes, of course there are always thermal fluctuations, but as I said, those are our signal; here I mean the noise on the measurement. Okay, so one of the questions about what's going on here is: is alpha just very, very small, so that we didn't have the experimental resolution to measure it, or is it actually zero? I want to claim that we can show that it's actually zero, and that there's a genuine phase transition between this engine working and not working. And the way we do this is to measure the power, the work done per unit time by the trap, as a function of alpha.
Okay, so this is the work done per unit time by the trap, as a function of alpha, for very, very small values of alpha, small enough that Tushar was able to measure it for various signal-to-noise ratios; the signal-to-noise ratio is encoded in the gray level, from about 0.4 here to 6 here. What you see is that when the signal-to-noise ratio is high, this curve goes negative, and then, as we'll see in a moment, it comes back up and becomes positive; where it crosses zero is the operating point we take. But you can see that here, for example, it never becomes negative; it just goes up. And if you take the linear coefficient of the curve coming into zero and plot that slope versus the signal-to-noise ratio, you can see that there's a transition at a finite signal-to-noise ratio, where the slope goes from negative to positive. When the slope is negative, the curve dips down and will eventually come back up, and we can define a nonzero value of alpha that makes the trap work zero. If it's positive, then clearly you can't: the work is zero only at alpha equals zero, which means you're not raising the trap at all. And the fact that this happens at a finite signal-to-noise ratio tells us that this is actually a phase transition: two qualitatively different regimes separated by non-analytic behavior. So what does this mean for the gravitational energy extraction? We measure it as a function of the signal-to-noise ratio. This line is the value we can calculate neglecting any noise. At high signal-to-noise ratio we match it, but then it starts to go down, and at a certain point, again shown with hollow symbols, we basically just set alpha to zero, and what's left is just zero plus fluctuations. And whereas this might look like a smooth decrease, we can actually see from simulations that Joseph did that there really is a non-analytic feature here, where the curve comes in with a finite slope. ... And I think I've just lost the connection here; it's frozen. Let's see, I'll have to try reconnecting.