Thank you, thank you, Mathew. So you can hear me in the back? Okay, I sometimes talk a little fast. I'll try and slow down, but I think that's okay in Italy. I don't know, maybe. I want to give a quick overview. As you heard, I studied physics. I haven't studied any biology, but I do biology. So I'm hoping that I can get at least some of you excited about questions in biology. So just so I know who I'm talking to, how many of you have had some exposure to biology after school? In college or something like that? Okay. And how many of you are actually working on problems that are connected or weakly connected to biology? Oh, great, okay, fine. So there's some people. And the rest of you are doing physics, yeah? Anybody here who's not studied physics at all? Mathematics, okay, that's even better. Okay, good. I won't spend time asking you all for your names, but I hope I'll sort of pick it up as I go along. So maybe the first time you ask me a question, you can tell me your name. And maybe I'll remember, maybe I won't. So we'll see. So I picked something ambitious to do over the next two weeks. I realize that you're getting a lot of information. I saw the lectures in the morning. And so I'm telling you right now, you're not supposed to learn everything that I'm putting forward here. And I'll certainly adjust my pace to keep up with what people ask me. If I'm going too fast, you can tell me that. And I hope just to give you a flavor of the kinds of tools and techniques and questions that are open in this big area of noise, or randomness, or stochasticity in biology. I've given some reading. It's on the website, it's online. The reading is just three papers. One of them is a recent paper of mine. The other two are papers by Bill Bialek, who has lectured from this very stage many times. These papers are things for you to read. Again, I don't expect you to follow the whole thing. It's just to give you a flavor, to inspire you to see that there is cutting-edge work looking at living systems and applying methods from statistical physics, stochastic processes, and so on. Okay, and the entire series of lectures is going to be broken into two pieces, roughly the first week and the second week. And they split according to something like this diagram. If it's too small, I can make the font size bigger. But generally, you can imagine a physical system. Any physical system: it could be a cell, could be an electronic component, could be a black hole, I don't know. And you can think of the inputs as parameters or things that we want to control from the outside. Think of it like an engineer using a machine. And the fact is that in most physical systems, even if the input is held steady, the output is going to be fluctuating somehow. And these fluctuations are unavoidable. They arise because of the molecular, the discrete nature of matter. And biology actually presents a very nice place to see the roles of these statistical fluctuations. And the reason is very, very simple. It's because cells are small. An E. coli cell is about one micron on each side. And if you're one micron on each side, then if you have some concentration of a molecule, it turns out that if you have a nanomolar concentration of some molecular species, of course you all know what moles are, right? So if you have a nanomolar concentration of something, that means there's literally one molecule of that in that cell on average. If you have a micromolar, there's only about a thousand copies of that molecule. And at these numbers, statistical fluctuations are very, very important.
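For concreteness, here is the arithmetic behind that claim, taking the E. coli volume to be roughly one cubic micron, which is one femtoliter (round numbers chosen purely for illustration):

```latex
N = c \, V \, N_A
  = \left(10^{-9}\,\tfrac{\text{mol}}{\text{L}}\right)
    \times \left(10^{-15}\,\text{L}\right)
    \times \left(6.0 \times 10^{23}\,\text{mol}^{-1}\right)
  \approx 0.6 \ \text{molecules}.
```

A micromolar concentration scales this up by a factor of a thousand, to several hundred copies, and at counts like these the relative fluctuations, which go like one over the square root of the count, are a sizable fraction of the mean.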
They turn out to actually influence what cells do. And cells, in a way, have grown and evolved to deal with these fluctuations. So the two pieces of my lecture series will be, one, the origin of noise, or randomness. So where does it come from? What is the molecular basis of it? What are the mathematical descriptions? I'll show you some tricks about how you can use the kinds of mathematical tools you know to predict what kind of noise you'll get and what the consequences of noise are. So that's one piece. That's sort of at this end. And the second piece, which is sort of at this end, is looking at extracting information. So when you have a system and it is noisy and it is fluctuating, but it is a system where you're reading the output and you're trying to sort of get a handle on what the input is, how do you use techniques like averaging or buffering or redundancy, things like that, okay? So in particular, I hope by the end of the first week, in the noise component, I'm going to familiarize you with, and actually make you experts on, the use of something called the chemical master equation, which is a sort of stochastic equation that describes fluctuating molecular species in a tube, for example. And by the end of the second week, if we're lucky and if I manage to get through everything, I'm going to give you a preview of Shannon's channel capacity theorem, which basically tells you how much extra information you have to pad onto a signal in order that your partner somewhere else in the world would be able to extract the information in a loss-free manner. How many of you are familiar with, or at least have done some sort of course in, stochastic processes? Not statistical physics, okay, very good. So that's excellent. Information theory, how many of you have done a course? Okay, not bad. So for some of you, this might be stuff you already know, but that's okay. I'm going to try and pitch it at such a basic level that these are just looking over systems you already thought you knew and gaining new intuition about them, okay? So that's where we're going to be. So what's a stochastic process? The word stochastic is unusual. You don't find anybody on the street using the word stochastic. They usually use the word random, and these are not quite the same thing. But for me, the quintessential representative of a stochastic process is a graph of this type, right? There'll be time on this axis. There's some variable on that axis, some x of t. You start somewhere at t equals zero, and then you get some sort of curve, like that. This could be an observation of, let's say, the temperature in this room or out there. It could be the observation of the position or the velocity of a Brownian particle. It could be the number of molecules of a particular type in a bacterial cell. It could be any number of things. The x need not be one-dimensional. There could be many x's that you measure as a function of time. So this is sort of an extension of your standard view of dynamical systems. You have a system. It's described by some state variable, in this case x. The state variable changes over time. And if I just pick one of those variables, I can literally plot it on a graph like this. Now this on its own doesn't mean that this is a stochastic process. I don't know that this is a stochastic process. The only reason I'm saying it's stochastic is because it looks very squiggly, but that's not the definition.
Stochastic really means what? It means if I do the same thing again, from exactly the same initial condition, I'm going to get some other curve. So the point is not the squiggliness of the curve, but the irreproducibility of the outcome. That's what makes it stochastic. I can run the same thing again and again and again, lots of runs like this starting from exactly the same initial condition, and I'm going to get lots and lots of curves. And it's the mathematical structure of curves like this that we're going to be discussing over the next one week, the entire way in which one describes these things. Just to say that this kind of mathematics is, you know, not that old. It's only been 100 years or so since people got interested in systems like this, and really much of 20th century physics is defined by a deep understanding of processes like this, right? And I'm sure you all know this. Okay, so what can we do when we have curves like this? The problem with curves like this is that our usual mathematical tools, which are based on the standard calculus that everybody learns, do not very obviously apply to systems like this. Okay, there's no equation for any one of these curves. I can't write it. I certainly cannot write it down analytically. I can't write a formula for it. The only thing I can hope to do is to provide a recipe for it, okay? And I'm going to tell you today what kind of recipe we're going to use. Oh, I forgot to mention. So there is homework associated with these two weeks. The homework, I believe, is already put up in the link against my lectures. There are seven problems in the homework, and these problems generally track the lectures that I intend to give, and at the end of today's or tomorrow's lecture you should already be able to start working on these problems and just keep up with me as we go along, okay? At the end of this there is an exam. The exam is much easier than the homeworks, which is good. And we're going to figure out how I'm going to work with you guys and with the TAs to help you sort through all the homeworks, okay? And then there's just the exam. Okay, so looking at curves like this, what's the first thing that we should do to characterize them, right? If I wanted to tell somebody somewhere else some general properties of what's going on here, do I need to transmit to them an entire series of, say, a thousand experiments that all started from some initial condition? With what level of discreteness do I transmit it? How much data do I need? So let me ask you guys. Suppose you wanted to convey this, and you don't know anything about the system. You don't know anything about the system. You just have access to a box, right? And you have access to the output, and you're able to, in some sense, recreate the initial condition fairly reproducibly and run the system, okay? What do I need to convey to somebody somewhere else in order for them to understand, or to have a fairly reasonable representation of, what I have here? Let me, yeah. Which distribution? The moments of this? You've already gone straight for the thing, right? The moments of which distribution? So there's some distribution. Good, good, I like it. Which distribution are you talking about here? At any time, very good. Let's pick a time right there, and let's stop the experiment at that time and just take everything out of the oven and measure it, and there'll be some distribution, and this will be at some time. So that's one thing you could do. You could give that, okay?
So you've already said something about picking a time, and this is also very important. It's also very standard: if you're going to represent not a stochastic system but a traditional continuous deterministic dynamical system, you have to pick some times at which you're going to transmit, right? So typically you're going to have to discretize, just because you can't send infinite information. You're going to pick a set of discrete times. They could be evenly spaced. They could be unevenly spaced. They could be highly bundled in a time window that you're very interested in, whatever it is. You pick a series of times, right? t1, t2, and so on. That's one thing you can do. You would also have to, in principle, discretize the values, because presumably you're transmitting them by a computer and there'll be some precision with which you're going to transmit the actual state of the system, right? And these levels of precision can be very, very high, much higher than the amount of information you actually need. So as far as this series of lectures is concerned, my stochastic process could be continuous in time. It could be discrete in time. It could be continuous in space. It could be discrete in space. However, for all practical purposes, you're going to have to find a way to deal with discretized time, and essentially space, for the purposes of transmitting information. Now, once I've decided to do some sort of discretization, I pick double precision arithmetic for storing x and I pick some set of times I'm interested in, then all I have to do is to transmit to somebody a list: t1 and x1, t2 and x2, and so on, to represent one of these curves. Let's say that one, right? And the same for each one of these curves. That's one thing I could do. And I could do the same thing a thousand times. That's a perfectly reasonable, complete representation of what happened in the stochastic process, yeah? It's overkill. There's too much information there. In a sense, you don't want to send all this information, right? So as somebody said, oh, you didn't tell me your name. Nikhil, okay. So as Nikhil said, one other thing you could do is pick a time, and at that time, send what? You send a sort of binned histogram of the values this curve has achieved, up and down, right? So that's something you could do. It's, in some sense, just one of these times with all the x's. That's one thing you could do. You could also send just some moment of this distribution. You could just send the mean of x at time t star. Or you could send the variance of x at time t star, and so on. So this is the stochastic process, assuming you're measuring it. Now I'm going to flip the problem around. How would you model this stochastic process and generate a single one of these curves as one outcome of the underlying system, right? So this is actually the point of the lectures I'm going to give. And this is the thing which is not at all very clear when you read a stochastic processes textbook, because it launches into all kinds of crazy integrals and proofs about convergence and so on without actually getting to the heart of the matter. Suppose I didn't have a system that I was observing, no real physical system that I'm observing. All I have is a computer. And on my computer I want to simulate the behavior of some system under some assumed dynamical laws. How would I generate it?
Now what would I need to do this, which I wouldn't need if it was a deterministic standard dynamical system with some sort of function, right? Suppose I have a standard dynamical system where x-dot is some function of x and t, right? If that's what I have and I have a computer, I could easily generate a bunch of curves, because I know x at t equals zero and I can do any kind of numerical integration that I want, yeah? That's very easy. Everybody knows how to do this. Apart from the computational machinery that I needed to do the solution to such an equation, what do I need to solve this? What do I need to make a curve? A special new ingredient. Please say your name first. Random numbers, that's what you need, okay? Now, how many random numbers do you need? How many random numbers do you need to make a curve like that? One random number for each of the increments. For the increments, okay? So now you're going to discretize time and sort of jump each step. So the same way that you would integrate little units using, let's say, something silly like Euler's method, right? In the same way you could add increments, but the increments would each time involve a draw from a random number generator, okay? Good. How many kinds of random number generators will you need? I'm going to make it very expensive, yeah? One. Yeah, you can convert anything to anything else. Please tell me your name, please. Daniel. Daniel. Daniel, and... Francesca. Francesca, okay. So this is the magic of it, okay? So the entire theory of stochastic processes is sort of wonderful. It takes a huge amount of mathematical machinery, either a dynamical-system type of equation or partial differential equations of this type, okay? All this good old, simple calculus stuff that you guys learned and you learned how to solve, which I assume you're all very good at. And it adds a single extra ingredient, which is the ability to draw a random number, and not a large number of those, but just one very simple one. In particular, all your computers are very good at generating uniform random numbers between zero and one. So let's start with that. And so, taking everything else you knew and just that single random number generator, okay, you're going to be able to generate curves like this. Now, think about that. It's a very non-trivial statement. And if you want to grasp how complicated what I just said is, let's step back a little bit. Suppose I wanted to generate any old distribution in one dimension from a random number, right? So let's say this is x and there's some distribution which, let's say, looks like that. Let's say that's a normalized probability distribution. I give you a normalized probability distribution. I give you a formula for it or I give it to you in some numerical format, it doesn't matter. And I give you a uniform random number generator, which in fact is this generator. This is x. This is u of x. And this is the PDF of that. That's a uniform random number generator. How do I get this from that? There are many ways to do it, many ways to do it. And I know you all know how to do it. So I just want you guys to explore what you already knew about how to massage one random number generator to give you random numbers that obey some other distribution. Let's call this one y, in fact, to keep them apart. This is y. There are a few problems with this. And let's work out all the problems before we go along. So what does a uniform random number generator actually do? Forget how it works. But what does the user manual say?
If I call a uniform random number generator, rand, on your computer. If I just say x equals rand, what do you get as the output? Same probability, but what do you actually get? What will x be? It won't be a real number. I mean, what will it really be? Real numbers are not real, right? So what will it really be when you run a random number generator? It'll be some fraction, right? It'll be, let's say, a double-precision number, right? It'll be some number of bits, right? That's the amount of memory your computer has. So it's not going to be a real number. Well, it is a real number, but in particular it's a rational number. In particular, it's a decimal expansion up to some point, right? So if you keep on running this, you're going to get some n-bit number, let's say a 16-bit number or a 32-bit number or whatever it is. So what the random number generator actually gives you is all the integers from zero to two to the n minus one, divided by two to the n. It could give you zero. It could give you one. You have to really look at the manual for your random number generator to see what it does. I also urge you guys to go look and understand how these things work. I know many of you do, okay? They're not really random. They're pseudo-random. That's not the point I'm making. So the first thing is that they're discrete, but there are enough of them that you can pretend they're continuous, and that's perfectly fine. So now you get some x, okay? And from this x, you want to generate another series of numbers y which, approximately, approximately to the extent of this discretization, which is sometimes important but otherwise mostly irrelevant, takes this distribution and massages it to give this distribution. So I know you've seen how this is done. So can I have two or three different strategies for converting the output of a uniform random number generator x into a number y, such that if I keep on repeating this process and I make a histogram, a frequency count, of y, it'll look like the curve I want? There are a couple of ways to do this. The cumulative method, how? So the point here is, I'm not very good at very quickly taking integrals, but let's see. So if I take the integral of this curve, okay, this is y, and this is the cumulative distribution function of y, right? This will look like so, something like that. What I'd like to do is this: if some number x is drawn uniformly between zero and one, yeah, then I read it off against this curve, I find the value of y at which the cumulative distribution equals x. Each value of x gives some value of y, and because the slope of this curve is the density, the steep regions of the cumulative curve collect more of the x's, so you can actually get, by formula, some formula of x which gives you a number y, and if you plot those values of y, you get this distribution. Give me an even simpler way, one random number for y, one for x. The other way to do it is to make a little box around here and draw two random numbers, I never said you had to draw one random number, so you draw two random numbers, and the first random number drops you somewhere in this space, and the second random number drops you somewhere in that space, and if the coordinate of where you land is outside this curve, then you reject it. If it is inside this curve, you accept it, but you only return the first number of that process, and so on and so forth. So this is another way, and there could be a third way. There are many, many ways to massage random numbers to convert between distributions, okay?
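Both strategies fit in a few lines of code. Here is a minimal sketch of the two methods just described, the cumulative (inverse-transform) method and the rejection method, for an assumed toy density p(y) = 2y on [0, 1], chosen only because its CDF, F(y) = y², inverts by hand:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_inverse_cdf(n):
    """Cumulative method: y = F^{-1}(x) with x uniform on [0, 1].
    For p(y) = 2y the CDF is F(y) = y^2, so F^{-1}(x) = sqrt(x)."""
    x = rng.uniform(0.0, 1.0, size=n)
    return np.sqrt(x)

def sample_rejection(n):
    """Rejection method: drop a point uniformly in a bounding box and
    keep the horizontal coordinate only if the point lands under p(y)."""
    samples = []
    p_max = 2.0  # p(y) never exceeds 2 on [0, 1]
    while len(samples) < n:
        y = rng.uniform(0.0, 1.0)    # first draw: candidate value
        u = rng.uniform(0.0, p_max)  # second draw: height in the box
        if u < 2.0 * y:              # point is under the curve: accept
            samples.append(y)
    return np.array(samples)

# Histograms of both outputs approximate the same triangular density.
```

The rejection version wastes the draws it discards, half of them for this density, but it works even when the CDF has no convenient inverse.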
So now that you're all experts on random numbers in one dimension, I'm going to ask you a slightly more difficult question. Suppose I had a joint distribution of two numbers y and z, for which, let's say, these are the contours, and so what I'm drawing is a contour map of some probability p of y and z, which is correctly normalized. So I have a two-dimensional distribution, and these are obviously not two independent random numbers. These are dependent random numbers, and now I'm asking a different question: I want to generate y and z with the correct joint distribution. How do I do it? You can draw a three-dimensional box, for example. So you could take y and z and make a three-dimensional box. And another way to think about this is, if you've already solved the single-dimension random number generator problem, then there's a fairly systematic way to expand any joint distribution in any number of dimensions, right? In particular, suppose I have some p of x1, x2, x3, up to xn. I have some joint distribution of n numbers, okay? I can always expand that in the following way, written out below. So this is one of those standard identities in probability theory, yeah, very simple. It's a sort of chain expansion of a joint probability distribution, okay? For the first part, you just choose to order the variables in some order, in this case there's an obvious order, x1 through xn, and then you marginalize. So the first factor is just the marginal distribution p of x1. This is what you get if you integrate out all the other variables. And this you can easily do with a random number generator, right, because I've already told you how to generate a random number from a distribution in one variable, yeah? And the next factor is the same kind of thing, because once I've run that random number generator, however you managed to do it, and got the value of x1, if I plug that into the next factor, then p of x2 given x1 is itself just a distribution in one dimension, yeah? I can use a random number generator in one dimension to generate from it. Once I have x1 and x2, I can plug them into the next factor. Again, I have to generate only one random number, and so on. This is sort of related to what somebody said about adding increments to the process. You generate a number, see where you are, then generate one more, then see where you are, and then generate one more. Now, you've all seen this in probability theory. And this very naturally collapses onto the problem that we are going to address over here, where these are now not n variables like temperature and pressure and so on, different state variables of the same system, but in fact they are the values of the same state variable at n different points in time, yeah? And we're going to learn how to generate those, okay? So this is straightforward, easy, okay? Any questions about this so far? I know it's all just review, okay? Now, oh, I didn't want to erase that. I know all of you have also seen the definition of something called a Markov process. Now, in terms of the definition of a Markov process, what huge simplification do I obtain over here? The simplification I obtain is that, except for the first value, all the other factors just collapse onto distributions conditioned on the single previous value. Okay, so that is what you get if it's Markov. How many of you have not seen the definition of a Markov process before? You have, okay, very good. It's actually not trivial, it's not trivial at all. And I'm going to explain why in a second. Is this an approximation? Is it an assumption?
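Written out in symbols, the chain expansion on the board is

```latex
p(x_1, x_2, \ldots, x_n)
  = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2)
    \cdots p(x_n \mid x_1, \ldots, x_{n-1}),
```

and the Markov simplification being asked about keeps, in every factor, only the most recent value:

```latex
p(x_1, x_2, \ldots, x_n)
  = p(x_1) \prod_{k=2}^{n} p(x_k \mid x_{k-1}).
```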
Is it an identity? What is this? So the first one is an identity. It is absolutely true. It is just math, right? What is the second one? It's an assumption, okay? Is it not an approximation? Okay, it's an assumption, okay? And if it's an assumption, you have to check whether you're correct. How do you know you're correct in making the assumption? See, if it's an approximation, I know that there are ways I can systematically make the approximation better. But if this is an assumption, either it's correct or it's not; either it works or it doesn't work, right? So it's a little difficult to use. So is this an assumption that I can use practically? How do I actually use this assumption? What does this assumption say? It says something that has to do with this picture. I want you now to unpack it, unpack it for me. What does this assumption say? Assuming that those x's are actually these x's, and that joint probability distribution is actually the joint distribution of the values that I sent to my friend at n points in time, at some n sampling times for the stochastic process. Right? It says the next state only depends on the current state, and, by implication, the whole future only depends on some current state that one has access to. And not just that the future depends only on the current state, but, another way to say it, that once you know the current state, the past is effectively erased. Right? It doesn't mean the past doesn't influence the future, obviously. The past does influence the future, but only through the present. It's like a funnel, right? So the whole point of a Markov process is that it has this funnel-like behavior. Now, that word funnel is meant to be a visual analogy. So if I wanted to make a picture that captured that visual analogy from this curve, or from this collection of curves, how would I do it? I want to prove or verify that this collection of curves actually satisfies the Markov assumption. How would I do it? Not easy. Think about it for a second. Okay, you want to go all out and check, yeah? I want a visual analogy. He said Chapman-Kolmogorov, and that's correct. That's basically a rather difficult integral identity that these curves have to satisfy. Now visually, how would I show it? Yes? Okay, what would you see? You want to look at the correlations and see if this is independent of previous variables by feeding them in. Okay, and that's a kind of regression, right? You could do a kind of regression. You could feed in a bunch of past variables and test whether the prediction still depends on them. Right, so these are all absolutely correct mathematical ways. Now I want a very compelling visual method. Yes? I would take different curves. Yes? So this, what he's talking about, is a bit like the rejection method for probability distributions that I discussed earlier. Remember, in the rejection method, you generate two random numbers, but you reject the outcome of that experiment if the second random number lies above the curve, right? In the same way, I have some method to generate these curves. I either have the real physical system or I have a simulation of it, right, whatever. I can generate a huge number, millions, of these curves, right? And I can always say that at some time, let's say t-bar, I reject the curve unless it passes very, very close to a point I'm interested in, right? And when I do that, what I'm going to get is a whole bunch of curves, same axes.
The curves over here are going to be a subset of the curves over here, right? Because I've rejected a huge number of curves. I've rejected all curves that don't go through this hole. So what you'll have is something like so, which by definition, of course, is a funnel. By definition it is a funnel. So what we need to find is what the distribution of these things looks like. So now what do we need to do? Everybody understands what I've got so far? What I've got so far is I've sort of put a little hole here, and I've got a big block, and I'm only letting through the curves that pass through it. You know, I know somebody who does this with birds. He does this experiment where he puts a small hole and he sends a bunch of birds, and they sort of go through the hole and come out the other side, right? So this is a real experiment you can do. So I filter by some particular value of x at this particular time t-bar. Now this by definition is a funnel, right? So it doesn't prove anything about Markov. It doesn't prove anything about Markov because obviously all my curves will start from here and go outside. So now how do I prove it's Markov? Excellent. So now what I want to do is do this simulation, but do two separate simulations: one set of simulations where you start with this value of x, and another set of simulations where I start with some other value of x, still going through this hole, yeah? And the whole point of Markov is that the distribution I get at this time, and any time in the future, conditioned on sending it through this hole at time t-bar, is independent of anything I do up here, right? So the Markov assumption is not simply that the future depends only on this time, but it's actually about this very counterintuitive behavior where you have a dynamical system, you're able to change its initial condition here, you're holding this point fixed, and nothing after that point changes, okay? Now this turns out to be a very difficult assumption to satisfy in general if you just wrote down a conditional probability distribution. It doesn't just happen spontaneously, okay? So the Markov assumption is a very difficult one, and maybe if we have time, we have tutorials, right? So maybe if we have time, I'll work out one case, the simplest example of a Markov process, which we'll discuss extensively: diffusion. Even for diffusion, if you don't take the right point of view, proving this turns out to be moderately challenging. So let's hold any questions about Markov. Now, the idea of distributions has already been mentioned. When I first learned about stochastic processes, the whole idea was merely to run the system until it reached, let's say, equilibrium, to run it long enough that it lost track of its initial condition or some such thing. And the whole idea was just to get the shape of the steady-state distribution, assuming that there was a steady state, okay? And it is true that this is a very useful thing to know in many stochastic processes, assuming they've reached an equilibrium. And in fact, the whole of statistical physics basically ignores all these curves. It just ignores everything that's going on in the left side of this plot. And the whole of statistical physics, magically, directly gets you to the distribution, okay? And this should really worry you, okay? The idea that you can be taught statistical physics without being taught stochastic processes, how is that possible? The reason it's possible is because there is a well-defined equilibrium distribution.
Given sufficient time, from any initial condition, you get there. This turns out to be an assumption, not a proof; but nevertheless, if the equilibrium exists, it has a formula, and the formula is the well-known one, a Boltzmann or some other distribution. So when you learn statistical physics, you just learn how to generate these distributions. And the confusion here, then, is that the whole theory of stochastic processes gets converted to merely a theory of probability distributions. Whereas stochastic processes is in fact a theory of joint probability distributions of a whole bunch of observations of a single curve. So the theory of stochastic processes is like a huge-dimensional version of it, right? This is statistical mechanics; this whole thing is stochastic processes. There's a big difference. If you just took a course in stat mech, for example, you would not automatically know how to generate the curves that reach the equilibrium and have the correct equilibrium distribution. OK, now, I want to make a simulator. The simulator I want to make is basically one which generates these curves, these points, correctly. Assuming the Markov assumption is correct, and we just discussed how to check it, it's very straightforward. I just need to know how to make leaps in time. If I want to generate a value x2 at time t2, all I have to do is know the value x1 at time t1. These t2s and t1s are merely labels. They're just the labels that you and your friend agreed would be the times where you wanted to sample the stochastic process. This conditional probability distribution is the thing now. This probability distribution says: if I gave you time t1, and somehow I knew the value x1 at time t1, because you already got that far, I now have some description, perhaps even an analytic formula, perhaps a numerical formula, for the distribution of x2 at label time t2. What does that formula look like? How do I find such a formula? If you can't write it analytically, how do you find it numerically? So now I'm going to get into a little bit of history and tell you one of the first cases where this kind of thing was actually solved, and it's Einstein's solution for diffusion. Let me just pause here and ask if there are any questions. So where are we so far? We've made quite a lot of progress. Your understanding of a stochastic process is a squiggly curve, but a squiggly curve where it has been sampled at some set of discrete times. And you want to give your friend the output of these samples. Oh, here's another question I have. This is interesting. So suppose my sample times are rather far apart, widely spaced. The stochastic process is not doing nothing between observations. It's doing something. Just because you happen to open the box and look at it at some discrete times doesn't mean it's not doing anything in those intervening periods. Now, if this was a deterministic dynamical system, then even while you're not watching the system, you still have to integrate the curve in order to find out what happens at the other end. In other words, you end up knowing all the things it did between the previous sampling time t and the next sampling time t plus 1. Is that clear? So if I'm simulating a deterministic dynamical system on my computer, and I think of it as a box, and the thing is running, and I'm only interested in time t5, and I measure it at time t4, I have no choice but to also calculate its value at all intervening times, up to some level of precision. Is that clear to everyone? So it's overkill.
There's a lot of redundancy involved over there. There's a magical thing that happens with stochastic processes where, if I'm interested in the state of the system at time t5 and I measure it at time t4, I can directly get its value at time t5 without calculating its entire series of values at all intervening times. So this is something very interesting and unusual that makes this field different from the usual. If I threw a ball in the air, and it was following Newtonian mechanics, and I wanted to see where it was in 30 seconds, I'd have to calculate the whole parabola. Well, for a ball I have a formula for that; if it was more complicated than a ball, I'd have to integrate the whole thing. But for a stochastic process, somehow I don't have to calculate all the intervening values. Now, someone tell me how it's possible to know where a system is getting to without knowing all the intervening steps. Oh, if you're lucky and it has some sort of scaling law, you can sort of guess at it, but that's not quite the reason. When I say I don't know all its intervening values, what do I mean? Do we need to know? Does it depend on the type of fluctuation? No, it doesn't depend on that at all. It's because of this telescopic expansion of probability distributions, right? What you do need to know is the conditional probability that you're going to be somewhere at time t5, given you were at t4 and all previous times. If it's Markov, you need to know the conditional probability distribution of being at t5, knowing its value at some t4, which could be very far back in time, or it could be right now. You just need to know those two things. Something in the black box is allowing you to propagate forward in time. So what is that thing? Yeah, good? OK, so that's one assumption. Sorry, I didn't mention it, but thank you for mentioning it. So there is this label t, which says that the rules of the game could change with some fixed lab time. You can also have that in a stochastic process, where the rules of the system actually change over time. It's not important here whether it's homogeneous or inhomogeneous. But it's something slightly more subtle. See what I'm saying? If I had a deterministic dynamical system, where x-dot is f of x, or f of x and t, then I would numerically integrate the value of x. And in order to find the value at t5, I start at the value at t4, and I do a complete numerical integration. And once I finish that full numerical integration, you would also be able to tell me, for free, the position of the ball at all intervening times. For free. For a stochastic process, you don't have to do all that. Somehow, if I make the measurement at t4, or I know where it is, for any stochastic process, I do something, and I propagate in the box, and I get the value at t5. And then if you ask me for the value at some intervening time, I'll say, wait, I have to go calculate that again, because I didn't waste my time calculating it before; you didn't tell me you wanted that information. So for a stochastic process, unlike for a deterministic dynamical system, where the discretization is implied, you discretize based on the numerical precision that you want and the accuracy of your ODE solver or PDE solver, and you run it, and then you can give all the intervening states for free. For a stochastic process, your friend asks you just for these times, and you give them the answer just for those times. And you don't need to find out where the system was anywhere in the middle.
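As a preview of where this is going, here is what that black-box propagation looks like in code for the simplest Markov process, pure diffusion, whose propagator (derived later in this lecture) is a Gaussian with variance 2D(t' − t). The value of D and the times are made-up illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 0.5  # diffusion coefficient, an assumed toy value

def propagate(x, t_from, t_to):
    """One draw from the Markov propagator p(x', t_to | x, t_from) for
    pure diffusion: Gaussian, mean x, variance 2*D*(t_to - t_from).
    Note that no intervening time point is ever computed."""
    return x + rng.normal(0.0, np.sqrt(2.0 * D * (t_to - t_from)))

x4 = 0.3                      # the value measured at t4
x5 = propagate(x4, 4.0, 5.0)  # jump straight to t5 in a single draw

# Taking many small steps in between gives statistically identical answers,
# because independent Gaussian increments with variances 2*D*dt add up to
# one Gaussian with variance 2*D*(t5 - t4). That consistency under
# composition is exactly the Chapman-Kolmogorov identity mentioned earlier.
```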
So somehow, the propagation between observations doesn't involve the use of random numbers, is what I'm saying. So remember what I said. To generate one of these squiggly curves, you need all the old tools of calculus plus a random number generator. But you only need the random number generator at the points of observation. You don't need it in the intervening region. In the intervening region, you're very happy to live in your standard, old world of, typically, partial differential equations. And I'll explain how that works also. So we'll just pick one example of that. Just some notation, because I always forget to do this, but let me do it. So capital letters, for me, are the names of the variables. A name is something like "the temperature in this room." Little letters are the values. So whenever I write something like this, and sometimes my little x's look like big X's and so on, you have to forgive me for this, but I'm sure you can keep it straight in your mind. A thing like this is actually a sentence in language. Big X equals little x means, in your mind, you should read: the temperature in this room is equal to 25 degrees. Or: the person in this room is equal to some name from the list of names. So this is which variable I'm talking about, and this is a value sampled from a list of all possible values you could have. Sometimes I'll omit the big X when it's obvious, and then it's just a function of little x. And all the functions I'm writing are just functions of the values. And sometimes I put the big X as a subscript, p sub X of x, which means something like the probability that big X is equal to little x. These are all just notational things. There's a reason we keep all these things separate, because sometimes these equations depend on keeping these things separate in your mind. Usually you don't have to worry about it. Fine. So let's talk about diffusion. I know you've seen this before, but I'm going to tell you something maybe you didn't know. So diffusion is a process of random motion. Brownian motion was a sort of well-worked-out example of this. And it was in 1905 that Einstein figured out a decent mathematical description of what's going on, which is still the one that we use today. So before we get to that, just some basic statistics, very, very simple stuff. If you have a variable y which is the sum of n random variables x, let's assume for the moment that these x's are i.i.d. Everybody knows i.i.d.: independent, identically distributed. And you all know what's going on over here, right? The mean value of y, oh, and just a few things first. The sample mean of x, by the way, is this thing. It's called the sample mean of x. The sample mean of x is a sort of really strange thing that you do, where you add a bunch of individual numbers and you divide by the total number of numbers in that list. When somebody told you to do this for the first time, you should have really asked why you have a definition of something like this. Anyway, that's called the sample mean of x. There's also something you could write down like this, right? There's an expectation value of x, right? And the whole point is that this thing often converges, right? And if it does converge, it has a value which can be calculated from the frequency distribution of x, which is known. And usually this thing is written as a sum over the possible values of x, of x times p of x. So these are two different sums. This sum is summed over n quantities, right? These are the n samples that you did.
It could be 1,000, it could be a million, whatever it is. The other sum is a sum over the different possible values of x, yeah? And you should find it very easy to switch between these two different representations of a sample. By the way, there's also this quantity, which is by definition this quantity, right? And that's called the variance. And again, this is a very, very strange quantity. Why would you subtract the mean, which has to be calculated ahead of time, from each observation, and then square them and add them up? It's a really strange recipe. I hope you remember the first time you learned how to do this in school. I certainly do. It seemed like the most absurd thing, yeah? And I'll tell you that this is not the only way to calculate something like the dispersion of x. There are many, many other ways of doing it, yeah? But this is the first way you learn in statistics, right? It's a very, very unusual thing. Okay, so this is the variance, and this is the expectation value. How would you calculate the sample variance? How do you calculate the sample standard deviation? You all remember this from school, right? I give you a list of n numbers. How do you calculate the sample variance? So I find the sample mean first, and then I subtract the sample mean from every one of these values. Then I square all the values, I add them all up, and then I? Divide by n minus one. Okay, everybody knows n minus one? How many people have not seen n minus one? Okay, so the sample variance is actually this quantity. It's not divided by n, it's divided by n minus one. Okay, so why is it n minus one? Because this would be a biased estimator for that thing if it was n, right? So this is all just to say that there are a lot of these tiger traps in statistics if you're not watching carefully. And the assumption, when you first learn statistics, that all these things are as easy as you thought they were, well, they're not. Not very important for us. For myself, expectation values and variances are all we need. Okay. This will also be written as sigma x squared sometimes. This will be written as mu sub x throughout this class. And then there are also other moments of the distribution you can calculate, moments of x to the power of k, which is done in the usual way. This is the sum of p of x times x to the power of k, right? Any function of x, you can also do the same thing. And you should just commit to memory, if you haven't already done so, that sigma x squared is this quantity. Yeah, totally obvious stuff. Everybody's happy with this? Okay, the variance is the mean of the square minus the square of the mean. We're going to be using all of this. So now suppose x has some distribution. And let's assume for the moment that, well, it doesn't matter, it has that distribution. This distribution has some mean. It has some standard deviation and some variance. I'll be loose with my notation. I might write sigma squared where I show you sigma on the graph, but you know what I mean. And we know some simple things, right? From this entire thing, we know that the expectation value of y is n times the expectation value of x. How do I prove that? I prove it by taking the sample mean of y, which turns out to be a sum of n sample means of x, and then taking the limit as the number of samples goes to infinity, right? We also know that sigma y squared is n times sigma x squared. And this is the first unusual thing you'll learn in statistics.
Means add, but standard deviations don't add. Variances add, right? It's a sort of striking result. Everybody knows how to prove this. It's very simple. You just write down the variance, and then what you find is you get square terms here, x1 squared plus x2 squared plus x3 squared and so on, and then you get a bunch of cross terms, two x1 x2, two x2 x3, or rather two times x1 minus its average, times x2 minus its average, and so on. And since those variables are independent and identically distributed, they're uncorrelated, so the cross terms all cancel out, right? And therefore, all you're left with is this. Does anybody want me to actually go through the derivation? No, cool. Okay, so now I'm going to assume that there's a particle, and the particle is going to be observed at certain instances of time. I'm only watching its position at certain instances of time. And the x's are actually, as somebody said, the increments to the position from one time point to the next, yeah? And y is the cumulative amount the thing has moved, yeah? So I'm going to use this simple formula as our first example of generating a curve like this. So in particular, I'm going to have a curve which is uniformly sampled in time, right? The separations between the time points are all just tau. And I want to generate a curve that starts at some point. Right, so I start off at some point, why not? I know the distribution of x. I use a random number generator to generate some x, which is x1, and I move y up by that value. Let's say y starts at zero, I move y up, this is x1, and then maybe it moves down by x2, which could be negative, then it moves up by x3, which could be positive, and so on. So this is a very simple random walk. I know you've all seen it before, but it's the first time really that we've generated a curve rather than a simple draw from a random number generator. Okay, and I just want you to internalize the difference between what we just did and what we had done before. I'm not plotting the value of x at each point in time. I'm plotting the value of y at each point in time. And the value of y at each point in time is given by the sum of all the previous x's, yeah? So there's a correlation in time, and that's the difference between just a joint distribution and the nature of a stochastic process, which has some notion of causality. What's the distribution of y after n steps? So after n steps, it's easy. Let's say n is large. If n is large, y is a sum of a large number of identical random variables, therefore it's a Gaussian. So it's a Gaussian, and the mean of y is n times the mean of x, and the variance of y is n times the variance of x. Now let's assume the mean of x is zero for now. Let's say the mean of x is zero. So in the end, what you get after a huge number of steps, if I had to plot the distribution like that, you'd actually get something like, let's write down the formula. Well, there's no mean, right? But sigma y squared is actually n sigma x squared, and n is what? n is the number of steps we did. So n is obviously t divided by tau. Right, tau is the little discretization step, and t is the total time we had, so t over tau is the number of steps that we had, times sigma x squared. And I'm going to call the quantity sigma x squared over tau, for some reason, 2D, and you know why: because that's the diffusion coefficient.
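Here is a minimal simulation of exactly this construction, with an assumed increment distribution that is deliberately non-Gaussian (uniform), to make the point that only its mean and variance matter:

```python
import numpy as np

rng = np.random.default_rng(2)

n_steps, n_walkers, tau = 1000, 5000, 0.01
# Increments x_i: i.i.d., mean zero, finite variance. Uniform on
# [-0.5, 0.5] is deliberately not Gaussian; the result doesn't care.
x = rng.uniform(-0.5, 0.5, size=(n_walkers, n_steps))
y = np.cumsum(x, axis=1)   # y after k steps is x_1 + ... + x_k: the curve

var_x = x.var()            # sigma_x^2 of a single increment
var_y = y[:, -1].var()     # spread of the walkers after n steps

print(var_y / (n_steps * var_x))  # ~1.0: variances add, sigma_y^2 = n sigma_x^2
print(var_x / (2.0 * tau))        # the diffusion coefficient D = sigma_x^2 / (2 tau)
```

Each row of y is one squiggly curve; a histogram of its final column is the Gaussian we're about to write down.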
Then what I find is that the distribution of y at time t is p(y, t) = (1 / sqrt(4 pi D t)) e^(-y^2 / 4Dt). Okay, so very simple. Any questions about this? So what did I use here? I used the idea of summing variances. I also used the central limit theorem, which is a pretty heavy-duty theorem. I haven't proved it here. If you've not seen the proof, it's not so obvious, but I used the central limit theorem to show that after n steps, I get a Gaussian, right? And this is the position of the random walker after n steps, yeah? And the most important thing I get over there is the variance of y after n steps, which is, in this case, like so. Do we need, not sigma x, as we are sending tau to... Yes, absolutely, sigma x is defined for that tau. Sigma x, very good. Sigma x is defined for that tau. There's some natural tau in your system, and sigma x is defined for that. Now, this is straightforward, right? But this is not what Einstein actually did in his paper. Yeah, yeah, go on, go on. So that's the rest of the course: what happens when tau goes to zero and sigma x behaves in a funny way. Okay, so thanks for the question. I'll get to that. You should have lots of questions now, right? So already, we've done some discretization. I've assumed that between discretization points, there's some kind of jump probability, right? I haven't assumed anything about the jump probability other than the fact that its mean is zero and that its variance is finite. If its variance is finite and its mean is zero, then the variance divided by tau, times a half, is something called D. Now, why did we call that D? So let's walk through this in a different way. So instead of calling this p of x, I'm going to use the notation that was there in Einstein's book. Okay, you're going to kill me, but let me just change the names of the variables on you, okay? I'm going to call this delta, and we can call the other thing y, it doesn't matter. This is why chalk is much better than paper. I'm just going to call it delta, because x is usually just the position; delta means increment, yeah? Okay, so we're going to approach this in a completely different way. This result is obvious, right? You can actually get it without using the central limit theorem. How would you do it otherwise? What you would do is just write down the answer as a binomial distribution after some number n of kicks, let's say if the kicks were just plus and minus one. That's one way, right? But this actually works for any distribution of delta. Okay, now let's do this differently. What I did here was jump straight from the definition of the stochastic process to the distribution at a certain point in time. Okay, I don't want to do that. Instead, I want to have a way to find that black-box propagator, right? I want to be able to get from any point to any other point in time using a certain type of partial differential equation, yeah? And we're going to learn a lot more about how to do this, but the way Einstein did it is the following, right? He said the following, and his reasoning was very interesting. He wasn't dealing with a single curve, okay? He was interested in the problem of diffusion, and in the real physical problem of diffusion, there are 10 to the 23 little molecules floating around in the system. It sort of comes pre-packaged with a large number, okay? So he says, let's just look at the distribution of a large number n of particles over these many states, right? It's not the probability distribution of a single particle.
These are two subtly different things, right? So, the probability density of n particles over many, many states, yeah? And he says, if f is the number of particles per unit volume at some position, right, between x and x plus dx, right, then the density of particles at time t plus tau will be given by some sort of integral over all possible values of delta of where you start from: f(x, t + tau) = integral of f(x − delta, t) phi(delta) d delta. And let me read out what this equation means, yeah? This means that the chance that the particle is at x at time t plus tau is all the ways the particle could have been at x minus delta at time t, times the chance that the jump size was delta, summed over all possible values of the jump delta, right? So this is a sort of propagator equation. Are there any questions about this? You can think of this as the kernel, right? So this allows you to jump forward in time, assuming, as somebody said, you already know the shape of this jump distribution. Now what we're going to do is expand, and I'm going to give a disclaimer here. Everything I'm doing here is numerically legitimate. It does converge, but I'm not going to give you any proofs for all these things, okay? I'm just going to do the physics thing and just expand. So the one mathematician in the crowd will have to live with these approximations. How do I expand this? f of x, the value, the number of particles at position x or near position x, at time t plus tau. I want to do a Taylor expansion of this. How do I do it? The usual way, right? It's f(x, t) plus (df/dt) times tau, evaluated at t. Is that fine? How do I expand the other side? f of x minus delta, let me erase some of this. Let me erase all of it. How do I expand f of x minus delta? In the usual way. So f(x − delta, t) is f(x, t), minus delta times df/dx at x, plus one half delta squared times d²f/dx² at x, plus higher-order terms, and that's this thing, okay? So if I plug that in here, and this is the kind of stuff that you should always be worried about, but if I take this derivative and I plug it under this integral, in a completely uncontrolled approximation, it actually works out, right? So then what I get is: I get f of x times the integral of phi(delta) d delta, that's this guy, minus df/dx times the integral of delta phi(delta) d delta, plus one half d²f/dx² times the integral of delta squared phi(delta) d delta, plus higher terms, yeah? Now, this is just one of those formal manipulations, and I don't want you guys to get distracted by formal manipulations, but I just want you to stare at this and see what's going on here. These integrals are over the jump size, okay? It says: where I am now is where I was, plus exactly the right size of jump I need to get to where I am now. The integrals are not over x, they're not over t, nothing like that, right? The jump is the thing I'm integrating over. The f of x and t is a constant. f of x and t doesn't look like a constant, because it looks like a function of x and t, but it's a constant with respect to the variable we're integrating over, right? Similarly, df/dx at x is a constant, d²f/dx² at x is a constant. So that's the first thing. Secondly, this particular integral is trivial. What is it? What does it evaluate to? The f comes out, and the integral is just a normalization condition for my probability distribution, right? So this is just f, yeah? The next integral is not so obviously trivial, but if you stare at it a little bit, it becomes obvious. It's zero. Why is it zero?
Because we're assuming that the mean value of the jump is zero, right? So this is zero by assumption. The last integral is what we previously called the variance of x, which is in fact the variance of delta, right? With a half in front of it and a d²f/dx², right? The whole thing together drops out, and I know you can't see down here, right? So let me, where should I put this? Let me put this. Well, I'll put it up here. It's sufficiently important, I'll put it up here. Finally, what equation do I get? This f cancels with that f, which is a trick you'll see happening quite often, and you get the following equation: df/dt = D d²f/dx², where D is the variance of delta divided by two tau. Yeah, which is exactly what we had earlier. I erased that Gaussian, yeah? But it's exactly what we had earlier, just a slightly different way of thinking about things. But let me recap what I did. I had a random walk. My random walk has jumps of some size, which are independent, identically distributed, with a mean of zero. Before, I went directly and used the central limit theorem to calculate the solution, and I did that. Instead, I can use this more subtle approach, where I literally calculate the propagator, what happens from one time point to the next. And what happens from one time point to the next is actually described by a partial differential equation. And which equation is that? It's the diffusion equation. And that's why Einstein called that thing 2D, right? And you already know that the Gaussian is a solution to this equation, right? Everybody, and if you don't know that, you can just plug it in and check. The Gaussian that I had previously, one over the square root of two pi sigma squared, times e to the minus y squared over two sigma squared, is the solution to this equation with the correct value, sigma squared equals 2Dt. The jumps? All we need for this thing to work is that the mean of the jump distribution is zero and the variance is finite. The other moments? No, we haven't assumed it's Gaussian. We assumed this is some funny curve, yeah? So you're asking about the nature of the controlled approximation, right? So it turns out that the other moments don't matter. They get killed when you do a certain kind of discretization, and I'll get to that later in the course. If it was Gaussian, fine, but I haven't assumed the Gaussianity of this. In fact, it's very unlikely to be Gaussian, okay? Fine. Now, why is the second term zero? Because the second term is the mean of delta, which by assumption is zero. I've just done all that to make sure that you guys are now up to speed on the diffusion equation and on probabilities and so on. Now I'm going to show you something magical. So here's an experiment you can do. You can actually go out, walk down the sea front over there, and you can pull on one of the boats at the dockyard, right? These boats have these little ropes, and you can stand at the sea front. We can do this experiment tomorrow, right? You tug, and the boat's there in the water, assuming the water's very still and so on, and it just obeys a certain viscous law, and you apply a constant force on the boat, right? This is also equivalent to having an electric motor that's applying a constant torque and having a fan start to spin up, right? In either case, what happens to the velocity of the boat, or the angular velocity of the fan, as a function of time? Is the experiment clear? The boat's in the water. I'm holding a rope, I'm walking along, pulling with a constant force. What happens to the velocity of the boat, or the velocity of the fan?
It starts at zero, and I'm applying a constant force, so naively the acceleration is constant and the velocity grows forever, yeah? But it doesn't run off toward the speed of light — so what happens? No, it doesn't fluctuate either. What actually happens, if you measure the angular velocity of the fan or the velocity of the boat as a function of time, is this: you get an exponential adjustment and then a steady-state velocity. So now I'm going to ask you a school physics question. In school physics, they kept focusing on converting one type of energy to another, right? They kept going on about this. So when I start this process, and I have my battery running the electric fan, what kind of energy is being converted to what kind of energy? The potential energy of the battery is being converted to the kinetic energy of the fan. That's what we learned in school. What we didn't learn in school is what happens to this machine later. At very long times, what kind of machine is this? It's converting the potential energy of the battery into what? Okay — so this is the problem I have. When you say heat, you're right, of course. Eventually the fan is running at a constant velocity, the energy of the battery is going into the fan, and the fan, whose average velocity is no longer changing, is just making heat in the room. I agree with that. But the problem I have is that heat is one of the most loaded and subtle concepts in all of physics, with huge implications. So if you give heat as the answer, you'd better really know what that answer implies, okay? So I don't want heat as an answer. I want an answer you could have given me in school the day you learned F = ma — and I'm sure heat was not one of those things, right? And again, think about the boat. I'm pulling the boat, and eventually I'm applying a constant force and the boat is moving at constant velocity. So how come the boat is not accelerating? If I'm applying a constant force, who is accelerating here? I need to move the water, okay — so it could be that the entire body of water eventually starts accelerating, but that's not what happens. It's not that the sea gets faster and faster, yeah? So — right, good. Now the answer is friction. He said heat, you said friction; these are very different explanations for the same thing, but fine. Is friction a fundamental property of physics? Which law of physics, which fundamental force, is applying the friction? Okay, there are molecules — good. So what if I were to tell you that if I apply a force F, and the boat has a mass M, it is accelerating at the rate F over M all the time? What would you say? You would say I was crazy. But think about it for a second. I'm applying a force F, I know Newton's laws are absolute, and I know there's no magic in the world. So if I'm pulling the boat with a force F, the thing must be accelerating at F over M all the time. How come I don't see that? It must be — so how come I don't see it? And I'm not even talking about a model; I'm talking about real life, if you could go out and make the measurement — or with the fan, where you really could do the experiment. And the reason is — okay, you're right. You're all right.
I know you're all right, and you all know this. You know that the reason the boat is not accelerating is that it's hitting a bunch of molecules — molecules whose combined unproductive energy is what we call heat — and the fact that the boat is not accelerating, on average, is what we summarize with the English word friction. I know you know all this. But if I actually watched the acceleration of the boat, or of the fan, very, very closely, what would I see? So I'm going to zoom in and really expand this, and look for intervals of time. And if I look at sufficiently small, but finite, non-zero intervals of time, I will see the boat — okay, let's call this V — accelerating at a constant rate as a function of time. After all, you will find intervals of time when the boat is not colliding with any molecules, because molecules are not everywhere. That's the whole point: molecules are not a continuum. So if you go to very small, but finite, stretches of time, the boat must be accelerating, because there are no other forces here — just the force you're applying. Is everybody happy with this? Now if this is what a little stretch of that curve looks like, how do we then get a flat line? Because the next time the boat hits a molecule, very suddenly its velocity drops, right? Then it accelerates at exactly the same rate. Then maybe its velocity increases in a collision, then it accelerates at the same rate again, and so on. This is really what's going on when you apply a force to a physical system that happens to be interacting with a bunch of molecules. There are no forces of nature other than the ones we know, and at these scales the only ones we have to worry about are basically gravity and electromagnetism, both of which are what let you apply forces in the first place. But if you zoomed in, you would actually see something like this going on, and in the intervening time, when the boat is not hitting any molecule, there is no way it can do anything but obey Newton's second law. It's just not possible for it to do anything else. Now, this is fascinating. What it implies, among other things, is that if there were no discreteness in the system, your experience of pulling the boat and having it respond to your movements would be very different. Another way to put it: how is the force from your hand being transmitted to the boat? It turns out it's being transmitted in little jerks. You didn't know it, but that's how it's actually transmitted through the rope — not by a constant tension. If you zoomed in, you would see a little fluttering; you would have to. Now, Einstein and Langevin and all these other guys immediately jumped to the next absolutely profound insight. We know that when the boat is pulled with a constant force, because Newton's second law must hold and because molecules exist, there must be wiggliness, squiggliness, in the velocity. We know that; it's absolute. So then they said: well, what happens if you're not applying a force at all? The molecules must still be doing something. These are the steady-state equilibrium fluctuations of any physical system — what happens at any finite temperature, the intrinsic randomness of any physical system, okay?
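Here is a toy cartoon of exactly this picture — a sketch of my own with illustrative numbers, not a quantitative model of a boat. Newton's second law holds exactly at every instant; "friction" only appears because molecular kicks, arriving at random times, are on average biased against the motion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Between collisions the boat obeys pure F = ma; at Poisson-distributed
# collision times a molecule delivers a momentum kick whose average
# opposes the current velocity. That bias is all "friction" is.
m, F = 1.0, 1.0        # arbitrary illustrative values
rate = 200.0           # molecular collisions per unit time
kick = 0.02            # strength of a single molecular kick

dt, T = 1e-4, 5.0
v = 0.0
vs = []
for step in range(int(T / dt)):
    v += (F / m) * dt                    # pure Newton between collisions
    for _ in range(rng.poisson(rate * dt)):
        # A moving boat gets hit harder from the front: random part
        # plus a part biased against v.
        v += kick * (rng.normal() - v)
    vs.append(v)

vs = np.array(vs)
# Coarse-grained, the trace follows v(t) = (F/gamma)(1 - exp(-gamma*t/m)),
# with an effective drag gamma = rate * kick.
print("late-time mean velocity :", vs[len(vs) // 2 :].mean())
print("F / (rate * kick)       :", F / (rate * kick))
```

Plot vs and you see the lecture's picture at both scales: jagged ballistic segments up close, a smooth exponential approach to a terminal velocity F/γ from afar, with γ emerging from the collision statistics rather than from any fundamental force.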
These fluctuations are a funny thing: you never see them directly — well, you do at these levels of measurement — but you experience them as friction. The very existence of friction implies there must be fluctuations, because friction is nothing but random molecules colliding with your system and, in very short intervals of time, taking away momentum. If it's happening up there at high force, it must be happening down here at zero force, okay? That was the major insight. Are there any questions about this? I'm going to go forward. The completely mathematical and hard-to-follow version of this, which you might have learned in statistical physics, is called the fluctuation-dissipation theorem. The fluctuation-dissipation theorem says there is some squiggliness in equilibrium, and that squiggliness in equilibrium must be related to the thing we experience as friction; they must have something to do with each other, okay? Any questions about this before I show you how to make that relationship concrete? Good. And these days we have enough precision that you can actually do these measurements: you can see the little acceleration of a system within the mean free path, before it collides with the next molecule. This is something that's absolutely true, and those are perfect straight lines. No — here we're assuming that there is a fluid, but no convection. We're assuming the fluid is composed of a bunch of molecules that, to a first approximation, form an ideal gas: molecules moving in random directions and having momentum-conserving collisions with the boat, yeah? A question here? Ah — this is the boat. What happened here is that the boat is moving and suddenly a molecule hits it in the opposite direction, so the boat loses momentum over a very short time. And in this other case, the boat was hit in the same direction it was moving, so it suddenly gained momentum over a very short time, okay? Is that fine? So down to very short times — until you get to zero — these approximations are fantastic, in the large-molecule-number limit. Okay. So what I'm trying to tell you is this: everything you learned about heat and friction and viscosity and all that good stuff — fluctuations, equilibrium, everything — is absolutely true and correct, and all the derivations you learned are true and correct, but the implications for how the world works are even more profound than you might have been taught. It amounts to a completely different way of thinking about the world, not simply heat, diffusion, and so on. So: now that we know that the force we experience at macroscopic scales as friction must — because we believe in Newton's laws — be the result of a tiny amount of jiggling at high force, we can predict that at zero force the same amount of jiggling must be there. Why the same amount? Well, this is a linear approximation: we're assuming the velocity is not so high that I have to account for new physics. So however much jiggling there is up here, the same amount should be down here. Yes? Oh — okay, so let's do it. Let's write down the equation and see if what you're saying is true, okay?
I'm going to write down an equation for the velocity of the boat — and when I say boat, obviously, I mean a Brownian particle, right? So: m dv/dt equals some force. And we know that the entire business of viscosity or friction can, phenomenologically, be accounted for by a single parameter characterizing the fluid: a viscous drag parameter γ, times v. So m dv/dt = F_ext − γv. We know all this, right? This equation is sufficient for me to calculate that curve: the steady-state velocity over here is F_ext over γ. This is the entire theory of 19th-century physics put into a single equation. And those guys in the 19th century were really, really good at partial differential equations, so they would be upset at me just writing γ here. What would they want? They'd want some η, six, π, something — Stokes' drag for a sphere — and the way they calculate that is by solving some crazy boundary-value problem around the sphere. You know all this stuff. But anyway, there's some constant here, and if you don't like calculating the constant, you can measure it: at higher external force, the steady-state velocity is proportionately higher, it's perfectly linear, and the slope of that straight line gives you γ. But we now know that there must also be a fluctuating force, η — a force that is, in a sense, kind of small. You can't really see it; and it's not so much that it's small as that it reverses so fast that it averages itself out, yeah? We're going to write down some properties of this η. Among other things, the mean value of η — the expectation value of η at some fixed time — is zero: ⟨η(t)⟩ = 0. This is a strange, unusual kind of equation to write down. What it means is: if I literally ran the same experiment with the same boat many times and looked at the same time point, the mean value of that forcing term would be zero. Okay? But I don't know much else about it, and I want to calculate some consequences. So let me expand this: m d²x/dt² = F_ext − γ dx/dt + η(t). Remember, the equation without η is perfectly fine on its own. However, it is not consistent with my view of the world, which is based on Newton's laws and gravity and electromagnetism. It can't work. So to make it work, there must be some fluctuating term; that's why I add the η. Once I've added the η, I say: now let me move to the case where there's no external force. If F_ext is zero, I get m d²x/dt² = −γ dx/dt + η(t). This is my zero-force stochastic equation. So how do you solve this thing? It would be easy if I had a formula for η. But the whole point of what I'm going to teach you over the next few days is that η is one of those mathematical objects that has no formula. It cannot be written as a formula; it can't even be written as a function. I can only give you a recipe for it, and the only way to actually use that recipe is with a bunch of random number generators, okay? So η is a mathematical object unlike anything you've ever encountered — in fact, much stranger than most other kinds of math you will ever learn. And hopefully tomorrow or the day after, I'll show you some really strange things about η. Among other strange things, it requires you to confront square root of dt as opposed to dt. Calculus is dt; stochastic processes are square root of dt. That's how mind-blowing this stuff is. And for small things, the square root of a thing is bigger than the thing itself. So be careful with that.
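You can already see the √dt strangeness numerically. A minimal sketch, assuming the standard recipe in which a Brownian increment over a step dt is a zero-mean Gaussian of standard deviation √dt (diffusion constant set to 1 for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Halving dt shrinks the typical increment by sqrt(2), not 2, so the
# implied "velocity" dx/dt diverges as dt -> 0. This is why eta has no
# formula and cannot be an ordinary, differentiable function.
for dt in [1e-1, 1e-2, 1e-3, 1e-4]:
    dx = np.sqrt(dt) * rng.normal(size=100_000)
    print(f"dt = {dt:.0e}: typical |dx| = {np.abs(dx).mean():.4f}, "
          f"typical |dx/dt| = {np.abs(dx / dt).mean():8.1f}")
```

The first column shrinks like √dt while the second grows like 1/√dt: the trajectory is continuous, but its derivative — the thing η is supposed to drive — blows up.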
Okay — yes, I'll get to that later; I'll cover it, you don't have to follow it now, I'll spend a whole lecture on it. And yes, this whole business is about differentiating Brownian motion, absolutely. So, the point of this exercise: the previous time we looked at diffusion — the Einstein version — we derived a partial differential equation. And a partial differential equation is one of those well-known things; if you lived in the 19th century, you'd be solving this stuff in your sleep. When you get ∂f/∂t = D ∂²f/∂x², you can easily solve it, and more complicated ones too. But this doesn't look like a partial differential equation. What I'm going to show you is that it is actually a partial differential equation in disguise. An equation like this, if η were a traditional function, would mean something — you could just integrate it and find the answer. But an equation like this, where η is this fluctuating force, has no meaning — or at least its meaning is open for interpretation, and it's up to us to decide what we want that symbol to mean. It's not at all obvious, just because you can write such an equation down, what it means. For me, an equation's meaning is obvious if I can put it into a computer and figure out the outcome. So far, it's not at all obvious how to put this into a computer and figure out the outcome. It's pretty dangerous. Now, I have about 10 minutes left — more or less? Okay, so I'm just going to finish this and give you a teaser for next time, which is, I guess, Wednesday. There is a way to solve this; let me try not to make a mess of it. It turns out that if you multiply this equation by x — you don't have to take this down; you'll find this stuff online, it doesn't really matter — and then do this crazy thing where you take the expectation value... Remember what taking the expectation value means: I'm doing exactly what I did in that earlier figure. I run the same process many, many, many times, take sample averages, and if I run it sufficiently many times, the sample average converges to the expectation value, right? So I get the following equation — and where did the η go? You can just take my word for it; it's a couple of extra minutes of calculation. Multiply by x, take the expectation value, use the chain rule and the product rule from calculus, run the whole thing through, and you get the answer — changing dx/dt to v and back as needed. It doesn't really matter.
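For reference, here is that computation written out — a standard textbook route, not a transcription of the board — using the two physical inputs justified just below (the noise doesn't care where the particle is, and equipartition):

```latex
% Langevin's trick: multiply  m\ddot{x} = -\gamma\dot{x} + \eta(t)  by x,
% average over runs, and use  x\ddot{x} = \frac{d}{dt}(x\dot{x}) - \dot{x}^2 :
\[
  m\,\frac{d}{dt}\langle x\dot{x}\rangle \;-\; m\langle \dot{x}^{2}\rangle
  \;=\; -\,\gamma\,\langle x\dot{x}\rangle \;+\; \langle x\,\eta\rangle .
\]
% Two physical inputs close the equation:
%   (i)  \langle x\eta\rangle = 0            (noise independent of position),
%   (ii) m\langle\dot{x}^{2}\rangle = k_{B}T (equipartition).
% With u \equiv \langle x\dot{x}\rangle = \tfrac{1}{2}\,\frac{d}{dt}\langle x^{2}\rangle :
\[
  m\,\dot{u} + \gamma\,u = k_{B}T
  \;\;\Longrightarrow\;\;
  u(t) = \frac{k_{B}T}{\gamma} + C\,e^{-\gamma t/m}
  \;\;\Longrightarrow\;\;
  \frac{d}{dt}\langle x^{2}\rangle \;\longrightarrow\; \frac{2k_{B}T}{\gamma}
  \quad (t \gg m/\gamma).
\]
```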
Okay, now, this is the implication of that equation under the assumption that ⟨xη⟩ is equal to zero, okay? That little assumption is where all the implications of the funny term are hidden — because, as I said, we know almost nothing about η, and if we knew literally nothing, we couldn't calculate anything. So at least let's use something like this: the noise is as likely to be high or as low independent of the value of x. So let me wrap up now. This is the equation you get if you take seriously the idea of pulling the boat through the water, take seriously friction and Newton's second law, write down the full equation — which must have a fluctuating term — and then examine its implications when no external force is applied. And when you get this, we already know a few things about the answer. We know that m⟨v²⟩ must equal kT: for a Brownian particle or for a boat, it doesn't matter — the mean-square velocity in the absence of an external force must obey the equipartition theorem. That's a different part of 19th-century physics, but the two must give the same answer, yeah? The remaining term is the rate of change of the rate of change of ⟨x²⟩, and it turns out that entire quantity is small — not just small, but its contribution to the equation decays to zero. So in the end you're left with a rather simple equation, which says that d/dt of this funny thing, ⟨x²⟩, equals 2kT/γ. Can everybody see what's at the bottom here? You can get this by actually solving the differential equation: it's a second-order linear differential equation, it has a homogeneous piece and an inhomogeneous piece, the homogeneous piece decays exponentially to zero, and eventually you get this answer. All extremely standard stuff; the mode of calculation is not important. What is important is this: the equation predicts that the expectation value of x² increases at a constant rate with time. So if I rewrite it: ⟨x²(t)⟩ = ⟨x²(0)⟩ + (2kT/γ)·t — the variance at time t equals the variance at time zero plus 2kT/γ times t, yeah? So the variance of x is 2Dt, where D is now kT/γ. So we've just derived D in three different ways. What are the three ways? Let's recount. The first way: I had a particle, I knew it was making jumps of size Δ, I knew the Δs were independent and identically distributed, I added them all up, and I used the central limit theorem to find that the answer is a Gaussian whose variance increases linearly in time. The second way — convolution, thank you — I expanded the distribution of many particles in a Taylor series, integrated the expanded form up to second order, and found that the distribution must obey a partial differential equation, the well-known diffusion equation, whose coefficient on the ∂²f/∂x² term is D. The third way: we wrote down this funny equation — essentially your standard Newton's second law plus a really strange extra term — and, using some tricks, honestly, we solved it. We didn't solve it fully; we can never solve it fully, because every time we run it, η is different. All we can do is solve for some of its moments. We did that, and by the end we found that the variance of x must increase linearly. Why? Because the velocity is never zero — it's either positive or negative, never zero — and that is also consistent with the diffusion equation. So these are three completely consistent, independent ways of deriving the same thing.
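And here is how you do put the zero-force equation into a computer — a minimal sketch, assuming the standard Euler–Maruyama recipe in which η contributes, per step, one fresh Gaussian whose variance is 2γkT·dt (the noise strength consistent with equipartition); all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Euler-Maruyama for  m dv = -gamma*v dt + sqrt(2*gamma*kT) dW :
# the "recipe plus random number generator" definition of eta.
m, gamma, kT = 1.0, 2.0, 1.0               # arbitrary illustrative values
dt, n_steps, n_walkers = 1e-3, 20_000, 2_000

v = np.zeros(n_walkers)
x = np.zeros(n_walkers)
for _ in range(n_steps):
    noise = np.sqrt(2.0 * gamma * kT * dt) * rng.normal(size=n_walkers)
    v += (-gamma * v * dt + noise) / m
    x += v * dt

t = n_steps * dt
print("m<v^2>  (equipartition says kT = 1.0):", m * (v**2).mean())
print("<x^2>/t (prediction 2kT/gamma = 1.0) :", (x**2).mean() / t)
```

Both printed numbers land near 1.0: equipartition comes out of the dynamics rather than being put in by hand at the level of v, and the variance of x grows as 2Dt with D = kT/γ — the Einstein relation, verified by brute force.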
And notice, by the way, that these are based on completely distinct physical measurements. One is based on observing, say, a solution of potassium permanganate and watching how fast the cloud spreads with time. Another is based on measuring the viscous drag on a ball as it falls through water, and so on. The fact that they all give the same answer — that you measure the same D in all these independent ways, one of which requires independent knowledge of the temperature — the fact that you get the same diffusion coefficient whether you measure it through viscosity, or position fluctuations, or velocity fluctuations, literally was the first true evidence that molecules existed. People sort of knew it — the whole kinetic theory of gases is based on discrete molecules, right? — but truly, until 1905, there were many people who simply thought molecules were a convenient description of the problem, not the truth. So what we have now is a very compelling story, which says friction exists because molecules are bumping into you. On average, if you tend to move in a certain direction, you get more bumps against you from that direction — which is why this term carries a minus sign. And how strong is that term? You know exactly how strong it's going to be, because it's given by the temperature divided by the diffusion coefficient: γ = kT/D. So the friction term is totally fake. It doesn't come from any fundamental law of nature; it comes from microscopic diffusion and temperature. I'll stop here. Tomorrow I'm not teaching, but on Wednesday we're going to take our first look at a real biological process, okay? We'll take the implications of what we did today and figure out the limits on how well a cell can make a measurement. Who are the TAs? Are the TAs in the room? Ah — could I meet the TAs now? Yeah. Hello — what's your name? Claudio, hello. Luigi, nice to meet you; Luigi, we've been in touch. Hi. I have two questions. — You have two questions? Okay.