It's a fairly simple exam compared to the material that I covered during the lectures on the last day. There's also a homework which is already posted, and you should be able to do each problem as each lecture progresses, but the idea is that we try and solve the homework collectively during the tutorials. So in the tutorials we'll do one or two problems on the material that's been covered in the lectures so far. Once we've all had a collective attempt at solving these problems, I'll release the solutions. One important thing is that many of the homeworks actually require you to write a little program or a little numerical solver, a numerical simulation. So there's only so much you can get by participating collectively; most of it you'll get by doing it yourself. We won't grade it, but in your spare time do try and attempt the numerical parts of the homework. So last time I sort of just jumped in; this is what happens in the first lecture of any course, you have to just jump in and try many, many pieces just to show you the landscape that you're going to confront. But now I'll focus down. Just to explain my philosophy here: because we're looking at biology, experiments are very important, and one should always focus on what can be measured and how it can be compared to the kinds of calculations we do. Just keep that in mind. The second thing is that understanding what's happening in a biological experiment often doesn't require any sophisticated mathematics, right? The sophistication of the math is not the point; the point is to gain some understanding of the system. In fact, the simpler the collection of equations you have that can capture what's happening in the data, the more clearly you'll understand what's going on. And thirdly, based on that same philosophy, I think it's very important that everybody looks at how to implement things numerically, which is how I'm going to teach over these two weeks.
So if you've seen these things done analytically, you've seen them done using very nice compressed notation. Those notations are so powerful that they actually hide a lot of really important stuff under the hood. So if you can understand it numerically and you can simulate it, to me, that's a real step in understanding. So just keep all these things in mind while we proceed. Okay, so last time we had no biology, if you noticed. So this time I'm going to jump into the biology, at least start jumping into it. And we are already equipped to answer quite an interesting question, which is the question of how a cell can make an inference about something in its surroundings. Okay, how many of you have read the famous paper by Berg and Purcell from 1977? If you haven't, it's a wonderful thing to read, Berg and Purcell, right? And they basically ask the question: you have a bacterial cell, there's some sort of stuff outside, and the cell is trying to read information about what's outside and then react accordingly. This is something that cells have to do all the time. And they were asking for the physical limits of how accurately a cell could infer something about its surroundings. Now we already have enough math to answer this question, okay? I'm going to unroll the chain of logic in this paper. The paper is actually quite complicated, but the key ideas are quite simple. So remember last time we started off with this very simple thing: if you have a variable y, which is the sum of a bunch of n iid variables x, independent, identically distributed. This is one of the most important equations in statistics, in the sense that finding out how sums of random variables behave gets you quite far, okay? So for the first half of today's lecture, the first 45 minutes or so, I'm only going to make use of the properties of this kind of summation, yeah? I'm going to make use of it in three completely different ways and you'll see how that works out.
Often, one also chooses to define another variable, which I'll call z, which is y divided by n, okay? And so this stands in for the sample mean of n observations of x, yeah? And we already worked out last time that the mean of y, the expectation value of y, is n times the expectation value of x. And similarly, if you just look at this, then the expectation value of z is simply the expectation value of x. But the variance of y is n times the variance of x. And this is because uncorrelated variables add up plus and minus. And therefore the variance of z is very important. The variance of z will be given by 1 over n squared times the variance of y, because variance lives in the square; it's the square of something. So it'll be the variance of x divided by n, yeah? And when you take the square root of this, the right side, sigma of x divided by the square root of n, has its own name: it's called the standard error of the mean. It's a sort of terrible term that they use in statistics. That's the first application of this equation. Keep these things in mind. The other thing we did was we looked at diffusion. And in fact, we looked at a case of diffusion which was discrete in time. Every time step tau, you had a particle which had an increment in position. And there was some probability distribution that said how big that increment was in the plus or minus direction. On average, the expectation value of the increment is zero. And then we worked out that this thing gives a Gaussian distribution over time. But that's not so important. What's important for now is: what is the variance of the position of a particle that is given by the summation of n random stochastic increments? This is the random walk, right?
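Since the point of this course is to check everything numerically, here is a minimal sketch that verifies these relations by simulation. The exponential distribution for x is an arbitrary choice; any distribution with finite variance works the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 20000
var_x = 1.0                      # variance of x ~ Exponential(1)

# Each row is one realization of n iid observations of x
x = rng.exponential(1.0, size=(trials, n))
y = x.sum(axis=1)                # y = x_1 + ... + x_n
z = y / n                        # z = y / n, the sample mean

print(y.var())                   # ≈ n * var_x  (variance of the sum)
print(z.var())                   # ≈ var_x / n  (variance of the sample mean)
print(z.std())                   # ≈ sigma_x / sqrt(n), the standard error of the mean
```

Swapping the exponential for any other distribution changes sigma x but not the 1 over square root of n scaling.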
The second application of this equation is if you imagine that your random walk probability distribution is in fact probability one-half of minus delta and probability one-half of plus delta. In other words, you're taking a step of size delta in the plus direction or a step of size delta in the minus direction, depending on whether the coin came up heads or tails. This is actually just a simplified version of what we had last time. Last time we had a distribution which had some other shape, right? Now what's actually important is merely the variance of this distribution. These steps are the little x's, right? Then each step takes a time tau, so after n steps the total time would be n tau, and the total displacement would be zero on average. But the variance, which is the important thing, would be given by: the variance of y is again n times the variance of x, where n is the total time T divided by the little time step tau, and the variance of x is in fact delta squared. So the variance of y is T over tau times delta squared, okay? And typically this is written as a diffusion relation, where delta squared over tau is taken as 2D, so the variance is 2DT, okay? Fine, so that's the second use of this equation. The third use of this equation is actually something rather interesting and important, and we're going to keep on using it again and again and again. Imagine a process which has some sort of density. Well, for the moment imagine this axis is actually time. And imagine that you have an interval of time of total length capital T. And now imagine that this is broken up into n little time increments, n blocks. And now what you're doing is actually running a Bernoulli process, which we discussed last time anyway. A Bernoulli process is just the toss of a coin. With a probability p, you get heads. With a probability 1 minus p, you get tails. Or with a probability p, you get plus delta.
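A minimal simulation of this coin-flip walk, with arbitrary choices for delta and tau, shows the variance growing as 2DT:

```python
import numpy as np

rng = np.random.default_rng(1)
delta, tau = 0.5, 0.01           # step size and time step (arbitrary values)
n, walkers = 1000, 10000         # n steps per walker, total time T = n * tau

# Each step is +delta or -delta with probability one-half each
steps = rng.choice([-delta, delta], size=(walkers, n))
y = steps.sum(axis=1)            # displacement after n steps

T = n * tau
D = delta**2 / (2 * tau)         # identify delta^2 / tau with 2D
print(y.mean())                  # ≈ 0 on average
print(y.var())                   # ≈ n * delta^2 = 2 * D * T
```

Halving tau while keeping delta fixed doubles D, which is why the continuum limit has to shrink delta and tau together.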
Or with probability 1 minus p, minus delta, whatever. In this case, I'm saying with a probability p, you have a 1 in each of these intervals. And with probability 1 minus p, you have a 0 in each of these intervals, okay? So then there'll be some number of these intervals which have 1s, and all the other ones will have 0s. And a relevant question that you might want to ask is: how many total 1s are you going to get here, okay? So we already know then, if you have a binomial distribution and you use the two events, with probability p you get 1, with probability 1 minus p you get 0, then this is x, right? And you're adding up a bunch of x's. How many x's are you adding up? You're adding up n x's, right? So the mean, the expectation value of x, is simply p. It's p times 1 plus 1 minus p times 0. The expectation value of x squared is also p, because it's p times 1 squared plus 1 minus p times 0 squared. And therefore, the variance of x is the mean of the square minus the square of the mean, right? These are just very, very simple calculations. So it's p minus p squared, which is p times 1 minus p, okay? Simple stuff. So this is a standard Bernoulli process, yeah? The variance of the total number of events after n of these will be n times the variance of each one, n p times 1 minus p, okay? Now, what we're going to do now, and in fact for many other examples later in the course, and you've all seen this before: we're going to take the limit where you actually have a very large n. So you split up into very, very small intervals, right? But you still want a finite number of events, right? So in this case, y is the sum of x's, same as this, right? The expectation value of y, mu y, is n times p, which is n times the expectation value of x. And the variance of y is n p times 1 minus p. What we're going to do is to take the limit where n is very large, and n times p actually reaches some finite value, okay? So in fact, p becomes very small, right?
And in this limit, n very large, p very small, but np is constant, some constant value. And in this limit, the variance n p times 1 minus p also goes to np, and they're both finite, right? So this is a situation where the total number of 1s that you get on this interval has some distribution. And we know what that distribution is: it's called the Poisson distribution. That's not the important part. The point is that the mean of that distribution is some number, and the variance of that distribution is the same number. And this is unusual, because the mean and the variance, if this had been a dimensionful thing, should have had different units, right? The important thing to keep in mind is that this is a dimensionless thing. We're asking how many of something. That's why we don't have to worry about the sort of division by n squared and things like that. So here's a case where we're using the same equation, and this is called the law of rare events, right: if we have a system where there are many, many, many tiny increments, and in each one a given thing is highly unlikely to happen, then over a finite interval it's going to happen some number of times. The total number of times it happens has some mean, and the variance of the number of times it happens, if you repeat the experiment, is that same number, okay? And that's the third use of this equation. I'm just going to write down: this is called the Poisson process. We'll get to it later, right? But just for the purposes of today's lecture, all you need to know is this equation I promised. So for a Poisson process, the variance, sigma y squared, is equal to the mean, mu y. Any questions about this? It's just a limit of a binomial distribution. So those are the three separate applications of this equation. The first application is the definition of this thing, which is called the standard error of the mean.
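This limit can be checked directly: holding np fixed while n grows, the binomial's variance np(1 minus p) approaches its mean np. A sketch, where the value lambda equals 4 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 4.0                        # the fixed value of n * p (arbitrary)
trials = 100000

for n in (10, 100, 10000):       # n large, p = lam / n small
    p = lam / n
    y = rng.binomial(n, p, size=trials)
    # variance n p (1 - p) creeps up toward the mean n p as n grows
    print(n, round(y.mean(), 3), round(y.var(), 3))
```

By n = 10000 the sampled mean and variance are indistinguishable, which is the Poisson signature.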
The second application is to understand that during diffusion, the standard deviation increases as the square root of time, right? So this is diffusion. The third application of the same equation is the fact that for a Poisson distribution, the variance is equal to the mean. Three completely different interpretations of exactly the same equation, which is why I say it's one of the most important things that you should completely internalize. Any questions? Okay, so here's the biology question. So I'm a cell, and there's some stuff in the outside world. You can imagine the cell sitting in a tube, and the tube is full of a liquid, and there's some chemical dissolved in the liquid. And the cell is trying to estimate the concentration of that chemical. And the way it estimates the concentration of a chemical is the way all cells actually interact with their environment: there are so-called receptors, which are little proteins with a very fixed shape and size on the surface of the cell. And these proteins have binding sites; there's a thermodynamic affinity to bind a particular chemical in the environment. And having bound that chemical, something happens to the shape of the receptor, which then transmits information to the inside of the cell. This is how information gets from the outside of the membrane to the inside. And the question we're asking is: because the cell is measuring something through a molecule, a molecular receptor, how accurately or inaccurately is it able to sense what's happening outside? If you were only to look through a very small patch of the window here and wanted to estimate something about the nature of the environment outside, you would be quite restricted in your abilities. So a cell in that sense is quite restricted in its abilities, because the membrane of a cell is largely impermeable to any spontaneous motion of things from outside to inside. And that's by design.
A cell can't afford to just equilibrate with its environment. So that's the question. But before I address that question, I'm going to ask a much simpler one. How many of you have ever been in a chemistry lab? Okay, so you have, right? It's been a long time. But have you ever measured the pH of something using one of the modern digital pH meters? Anybody done that? Yeah, some of you have done that, right? So it's quite easy. You make a buffer. You have this little rod that you stick in there, which is an electrode. You have a digital readout. And the digital readout will measure the pH of the buffer that you've made. Now, what does the output of that machine look like? It looks like so. By the way, you'll find that every lecture I'm going to draw one of these squiggly lines, right? But it's going to look like this, right? This is whatever the readout is; let's call that x, right? So the digital readout on the machine, in modern-day machines, will have some number of decimal places and there'll be some readout of the pH. So in practice, this will have some amount of discretization, up to the precision of the display. But nevertheless, for the moment, just think of it as a big squiggly curve. Now, the question is, what is the pH in that tube? And if somebody asked that question, how would you use this squiggly line to give an answer? Average, average over? Average over time, okay. So then, in fact, suppose you're looking at it and your eye has a certain refresh rate, or you're measuring with a camera and you're refreshing at 60 hertz or whatever. So even time will be discrete. So this is not really a squiggly line. It's a bunch of points, a bunch of points at discrete intervals of time and at discrete y values, fine. So you want to average over time. And that's a very reasonable thing to do, right? Well, the question is why? Why is that a reasonable thing to do? So what is one trying to capture about this curve?
You're trying to capture its central tendency, right? You're trying to somehow say that it's, you know, fluctuating around some sort of average, right? That's the idea. Okay, you want to average over time? That's fine. Any guess about what time interval you should use? Nanoseconds or seconds or hours? How much time should you average over? Depends, sure, of course, sure. Yeah, so it totally depends on the nature and the specific characteristics of the system, right? So think about this. Suppose I literally gave you this piece of equipment and you got to play with it at home and then come back and do the measurement. What characterization would you do of that piece of equipment and that buffer, for example, to pick how to average over time? Now, you don't want to average at nanosecond intervals. It's totally useless. So what should you do? It's a fairly straightforward exercise. Suppose I zoomed into this curve literally at the nanosecond level, right? What do you think that curve would look like? I mean, assuming I was able to sample from that machine at such scales, and it was really measuring what it thought was the pH in the beaker, right? So here's the beaker, here's the rod, here's the digital display, and it's reading some sort of pH. If I was really looking at that display at very, very short time intervals, it's not going to change, okay? Because everything in the system is slow. It's a physical system. At sufficiently short time intervals, nothing is going to change, right? The display is not going to refresh. So that's the point: at very short time intervals, nothing changes. In other words, the autocorrelation time of the signal is some finite value, right? If you look at time intervals far below the autocorrelation time, you're basically seeing the same measurement over and over again. So what you really want to do is to measure something beyond the autocorrelation time.
So there will be some time lag beyond which the system actually relaxes back, okay? That's one thing to keep in mind, and I want you to think about it. If you get a bunch of signals, you can numerically get the autocorrelation, at least to a reasonable order of magnitude, right? And that should be quite straightforward: you just take the time series and compute the standard autocorrelation, the expectation value of x(t) times x(t plus tau), over many measurements, okay? That's one thing, okay? So assuming that you had some tau that you picked, right, it's totally unnecessary to start averaging quantities below that tau, because they'll all be the same. You get a thousand of the same number, then you get a thousand of a different number, then a thousand of another number. You might as well just average at that time interval. So let's break this up into a bunch of similarly sized time intervals, right? And then do the usual thing. Let me call this interval tau sub a, for the autocorrelation time. There'll be n of these, and the total time we'll just call big T. So T is n times tau sub a. That's the total amount of time you want to measure, okay? So why is it useful to do this time average? Why not just pick the first number you see and report that? It is a reasonable thing to do, right? I mean, when you're driving your car, you're looking at the speedometer and saying, that's the speed I'm driving, right? Maybe that system is doing some time averaging, but not in any visible way. So why not just take the instant measurement, the first time you got it, right? I just measure it, and I say, this is the pH. I record it in my lab notebook and move on. Why waste the rest of my day? Good. So suppose I just measured the value of x at time t1, yeah?
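Here is a sketch of estimating the autocorrelation time from a time series. The signal is synthetic: an AR(1) process standing in for the meter noise, with parameters chosen arbitrarily so that the true correlation time is known to be about 100 samples.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic meter readout: correlated noise around a true value x0.
# For an AR(1) process, the autocorrelation time is -1/log(phi) samples.
x0, phi, N = 7.0, 0.99, 200000
noise = np.zeros(N)
eps = rng.normal(size=N)
for t in range(1, N):
    noise[t] = phi * noise[t - 1] + eps[t]
x = x0 + noise

# Autocorrelation C(tau) = <dx(t) dx(t+tau)>, normalized so C(0) = 1
dx = x - x.mean()
lags = np.arange(500)
C = np.array([np.mean(dx[:N - lag] * dx[lag:]) for lag in lags])
C /= C[0]

# Crude estimate: the first lag where the correlation falls below 1/e
tau_a = int(lags[np.argmax(C < 1 / np.e)])
print(tau_a)
```

Averaging on a grid finer than tau_a buys you nothing; the n in the earlier formulas should count intervals of roughly this length.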
The distribution of x at time t1 has some sort of variance, and it might be sitting around some value x naught, right? So there'll be some variance, sigma squared, right? And the idea is that the variance becomes smaller. That's true. The variance becomes smaller for precisely these reasons: by averaging a bunch of, by assumption, independent (that's why you have to go beyond the autocorrelation time), identically distributed (that's an assumption) variables, you're hoping that the variance of the thing you report, the average, becomes smaller. By the way, obviously I shouldn't be measuring in nanoseconds, right? But maybe you're very conservative. You're like, even though the autocorrelation time is like one millisecond, I'm going to wait one hour between measurements, right? Because you're very, very conservative. That's not a good idea either, right? Because by that time, stuff would have evaporated from the tube and other things would have changed. So in real practical terms, you're limited on the short side by the autocorrelation time, below which you're free to sample but it's totally unnecessary, and on the long side by actual changes in the physics of what's going on. So you don't have that much room to play with in real measurements, right? If you actually had to go out and do this measurement, you would think about these things quite a lot. Okay, fine. So instead of x1, we're going to report z, which is 1 over n times x1 plus x2 plus up to xn. Very good. So let's assume that I take a whole day to make this series of measurements, right? Or rather, I spend a whole hour of my day to make this series of measurements. And then I record n of these values, right? And I divide by n, and I record in my lab notebook that day, here's the value. And I hand that value over to whoever's asking me and say, this is the pH.
So let's say, fine, but I really want to know how far off this pH is from what will happen if you do the same experiment tomorrow. We know that the variance of the original variable is very, very high; the spread could be, say, 10% of the actual value. How would you determine the variance of this average empirically, not by formula? So the question is clear. What I've got is a pH meter. I've decided what the autocorrelation time is. I've decided I have one hour to spare. And so I find whatever value of n there is. I make that number of measurements. I divide by n. I record it in my book. And now I want to know, for that variable, how far off it will be if I measure the same thing tomorrow. So the answer is obvious. I have to repeat the experiment tomorrow, right? And I have to repeat it the day after tomorrow and the day after that and so on. So now I'm going to measure this value z on day one. And I'm going to measure it on day two. I'm going to measure it for a whole year. I'm going to waste a whole year of my life measuring the pH of this buffer. I mean, not the same tube; I'm going to prepare the tube again as closely as I can to the original setting. And I'm going to keep on making these measurements. I'll make n measurements on day one, and I'll do the same thing on day two, day three, for a whole year. And I take all these and make a plot of the distribution. And the idea that somebody suggested is that this histogram, because it's a real histogram made out of a list of numbers, is somehow going to be narrower than the original one, yeah? And so at the end of a whole year I can say, look, empirically, this is the variance I get, right? And so you should be satisfied as long as this variance is below the precision you wanted, say 1% or whatever it is. So the magic of statistics says that I don't have to waste a whole year doing this nonsense, right?
Somehow, and this has always been very mysterious to me, but somehow literally on day one, you can make one measurement of z and then guess what the variance would have been had you repeated this measurement over 365 days of the year and plotted the variance of that thing, okay? So how do you do that? The answer is precisely this equation, okay? So this is, in fact, an extremely huge time saver, right? And it seems almost unbelievable, right? To get the variance of a bunch of measurements done over a whole year empirically, I literally have to do all of them; I have to make a histogram, write it down, see how wide it is. But somehow, by making a single day's measurement, I have the audacity to say I know what the variance is going to be at the end of the year, yeah? But this is true and this is how it works. And the standard deviation over here is precisely going to be the original one divided by the square root of n, not the square root of 365, right? 365 has nothing to do with it. It's the square root of how many measurements you did to do the averaging on the first day. Are there any questions about this? So this is literally how a good experimentalist does or plans her experiments to get a certain degree of precision. If they need 1% precision, they'll find some value of n that works. If they need 0.1% precision, it'll have to be a bigger value of n, and in fact it'll have to be 100 times bigger, because the precision is only increasing as the square root of n. So to buy higher precision is a very costly exercise, always. So this is how you do it in the lab. Now what I'm saying is that the cell is pretty much doing exactly the same thing. What is it doing? The cell has some sort of sensor. The sensor is sitting in a fluid, like so. And the fluid is jiggling around, and there are particles that are bumping into the cell, and so on and so forth.
And through this entire exercise, the cell needs to do something like this: average out the fluctuations in what it's trying to measure, so it gets a better, more reliable, more accurate estimate of what's going on. That's the biology question. This is really what cells do. So I'm going to pause and ask physically: why is this curve, in the case I'm talking about, the pH meter and so on, why is this curve even fluctuating? Why isn't it flat? Why is it a squiggly curve? Okay, you're appealing to Boltzmann. Okay, that's fine. Can you give me a precise mechanism? Why is this curve fluctuating? One reason the curve is fluctuating is because the electronics are at finite temperature; the digital display is going. Is that the kind of thing you mean? Yeah. Okay, so it's a finite-temperature effect. Okay, thermal fluctuations, okay. So I'm going to argue that we can make machines, and if I wanted to, I could spend a huge amount of money and buy a helium tank and cool this thing down, and really get all that electronic noise out of the way. I'll still get fluctuations like this. Why is that? Quantum, who said that? Not quantum, no. Quantum is not the issue here. At the scale of a pH meter, we're not going to be measuring quantum effects. So I'm saying that thermal fluctuations of the machine we can ignore, and other sorts of vagaries. Let's assume we have a really good machine. A really good machine. It's still going to fluctuate. Why? It's not because of temperature, and it's not because of quantum mechanics, yes? Finite size of the system. Finite size of the system, very good. It has something to do with the fact that we alluded to yesterday, which is that the system is made up of a bunch of discrete particles. It's the same reason that we see diffusion happening, because a different number of particles are hitting from the left and from the right, as we discussed yesterday.
And that reason indirectly depends on temperature, but temperature is not the point. The point is, if you zoomed in to the tip of that device, that device has no global way to sense everything. What is pH? pH is the concentration of protons, the concentration of free protons. This device doesn't magically have access to all the free protons in the whole tube. If it did, and assuming there's no evaporation and the gases are staying inside the tube, the total number of free protons in there should be reasonably fixed, modulo the ionization and deionization of water. But that's not what this little electrode is doing. It's only able to sense the protons in a certain volume. And the thing is, protons are particles; they're molecules, they're atoms, so they can come and go out of the range of sensing of this device. And they're all going in random directions. So literally, the number of protons this thing thinks there are in the tube keeps on changing. That's why there are fluctuations, and those fluctuations you cannot avoid, even if you buy a billion-dollar machine. So I want you to use the same reasoning now to look at the cell. And one of the beauties of this paper, which is one of the first that attempted something like this, though now it's reasonably fashionable to talk about stuff like this, is that they asked, ostensibly, a very difficult question. They said, how does a cell sense something about the environment? And when somebody asks a question like that, there's no way to go forward. It's so vague. It depends so much on the nature of the system. So you have to add some caveats to that. And they did; if you read this paper, you'll see they said: if a cell could do the best possible thing it can, given our knowledge of the laws of physics and chemistry and so on, how well can it measure the environment? And that places an upper bound on the level of accuracy the cell can achieve. So that's what we're trying to do.
So now, here's the real question, the question I've posed to you. Is it a research question? It's not, and that's the point: the real question is very system-specific, so all we're doing is calculating an upper bound. Every real cell has to do just as well or worse. How do we know? That's the calculation I'm doing now. So what do we need to know? What numerical quantities do we need to characterize? We have to do a calculation; you can't just talk about this entity. So what numerical quantities do we need to characterize about the cell so that I can start to calculate this? One obvious thing is that the thing we're trying to measure, in the giant tube in which a microscopic cell is sitting, is sitting at some concentration c. And for the purposes of this calculation, I'm going to assume c is measured in number of molecules per unit volume. Not in moles or grams per liter; it's measured in number of molecules per unit volume. So that's one thing. That's a physical quantity. What other physical quantities do I need to know before I can answer this question about how precisely a cell can measure this concentration? Anything else? If you want, I'll add it to the list. What other information do you need to go ahead with this calculation? Yes? Cell surface area. Cell surface area. So there's some surface area, big A. And the cell surface area must set an absolute limit to what it can sense about the outside; surely a cell with twice as much area has twice as many ways to interface with the environment. That's good. Anything else? Binding affinity. Ah, okay. So now he said binding affinity. So let me zoom in on this patch of the cell membrane. The cell membrane is a lipid bilayer that has these head groups and tail groups. And this is the inside of the cell; this is the outside of the cell. And there'll be some sort of protein embedded in this lipid bilayer. And there won't be just one.
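As a sanity check on what "number of molecules per unit volume" means at cellular scales, here is the conversion for a typical laboratory concentration. The 1 micromolar figure is just an illustrative choice.

```python
# Convert a molar concentration to molecules per cubic micron,
# the natural units for a micron-sized cell.
N_A = 6.022e23              # Avogadro's number, molecules per mole
liters_per_um3 = 1e-15      # 1 cubic micron = 1e-15 liters

c_molar = 1e-6              # 1 micromolar (illustrative value)
c = c_molar * N_A * liters_per_um3
print(c)                    # ≈ 602 molecules per cubic micron
```

So at micromolar concentrations, a volume the size of a bacterium holds only a few hundred molecules of the chemical being sensed.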
The total surface area of the cell will be covered in little proteins that are sort of swimming around in the two-dimensional membrane. It's not a rigid membrane; it's like a liquid locally. So this protein is going to do the job of actually doing the sensing. It'll have some little notch somewhere, and that little notch will be just the right shape to pick up this particle. And when that particle is bound to this receptor, in real receptors, what happens is it triggers a change in the shape of the intracellular domain of the receptor. And that change in shape can lead to a bunch of downstream consequences; for example, the cell can run away, things like that. Now the problem is, as you said, this process is extremely complicated. This molecule is going to bind to the receptor, but it's going to bind stochastically. There are going to be binding events, unbinding events. Once it's bound, the receptor has to change shape; maybe there's some sluggish time scale on which that happens. We don't know what that is. Even if it's bound, maybe the receptor makes a mistake and doesn't actually transmit the information to the inside of the cell. And the beauty of this paper is they said, let's ignore all those sorts of measurement artifacts. In the same way that I said you could buy a pH meter so precise, and operating at such low temperatures, that thermal fluctuations in the meter can be disregarded (an obvious fantasy), here, let's take all the details of the receptor out by assuming it is doing the best it possibly can, given the amount of available information. This takes the problem away from calculating fluctuations of the receptor, binding and unbinding affinities, transmission of information. It takes all that away. And it reduces the problem to just the question of: how wide an area or volume of the whole tube is a single receptor on the surface of a cell able to actually access?
That's the problem that turns out to be the relevant one. So now suppose I look at this receptor. Let me draw it a little more carefully — let's assume it's a sort of cylindrical receptor here, with some binding site. So what's important: just as the surface area of the cell is important, it's not the whole cell that's actually sensing anything; it's only the parts of the cell that actually have receptors on them. Now a cell might have 1,000 copies, or 50 copies, or maybe one copy of the receptor. For the moment, let's do the calculation for a single receptor on the cell surface; it can be extrapolated in the obvious way. So here we have a single receptor with a length scale, little a. The total surface area of the cell is big A. So you have a single receptor with some binding length scale, little a, and this receptor is sitting in a bath of all these little molecules. The molecules are bumping on and off the receptor. The receptor has its own internal machinery — in fact, internal machinery that can even burn energy, so it's not even restricted by detailed balance or any of these other things you worry about. So think of this receptor as literally a fancy machine you bought from some shop; it's doing all these little calculations for you. Now the question is: which set of molecules in the environment can this receptor possibly be sensitive to? This is actually a very complicated diffusion calculation, and if you read the paper, half the paper sets up this very complicated calculation. I'm just going to tell you the answer. And the answer turns out to be that if you have a receptor of length scale a, then there is a hemisphere of about the same size such that the particles within that hemisphere are more or less guaranteed to bump into the receptor before they get away, whereas the particles outside that hemisphere are more or less guaranteed never to bump into the receptor.
Now this is actually a subtle thing. This is the little bit of magic I want you to take my word for — otherwise you can read literally 15 pages of this paper where they calculate it. But if you're willing to take my word for it, the intuition is fairly clear. The reason is that diffusion is a slow process; it takes a long time to get anywhere. So if you happen to be close enough, the chances are you're still going to be hanging around long enough to hit the receptor — and hopefully somewhere in its internal register the receptor says, I picked up another copy of this. And by the same token, if you're way over there, it's really hard to imagine that molecule making its way here rather than into the entire rest of the space where it could go. That's the intuition; I haven't proved this, I just want you to take my word for it. So now we're going to make some assumptions about how the receptor is working — we're going to have to. If you gave me a real receptor, say some sort of hormone-binding protein, I could make measurements using concepts like mutual information, and ask how accurately the state of the receptor captures the number of molecules in this little neighborhood. I'm going to assume that amount of information is perfect: that somehow, because the receptor has internal states and is keeping track and so on and so forth, it's able to instantaneously know the total number of molecules in this little volume, and it knows that at all times. Any real receptor can do no better than this, because these are the only molecules that can interact with this little object anyway — the rest of them might as well not exist. So we're going to assume an ultra-perfect receptor that has access to all the available information. If that's the case, where's the noise coming from? The answer, of course, is that the noise comes from the fact that these molecules are not sitting around.
They're moving in and out of the measurement volume, and because of that, their numbers are fluctuating. So what we're going to do is just go through the calculation step by step, using just these three results, and work out the physical limits of signaling. Any questions? Yes? Is the cell fixed? Well, no, the cell is not fixed — the cell will be moving. But for the moment, we're going to ignore all such complications; they can only make the situation worse. But I'd be interested in hearing if you think some of these assumptions are too conservative, in the sense: could the cell actually do better? The question is whether the calculation I'm doing is an upper bound, and the argument is that it is. So stepping back now, and making a connection with the physical model I had earlier: I'm going to assume that the cell has a single receptor on its surface. The single receptor has a length scale of little a — you could think of it as the diameter of the receptor. And this receptor, even if it's doing the best it can, is only aware of the total number of molecules in this little volume. And this is going to be a plot of the total number of molecules the receptor has measured in that little volume as a function of time. Now we know that the best thing to do when confronted with this kind of squiggly, jiggly process is some sort of time averaging. So the next question arises: what is the autocorrelation function — what is the autocorrelation time — for this system? For that, we need to introduce yet another physical quantity. Which quantity is that? Good — temperature, but via something else. Mean velocity, or diffusion? It's diffusion, because it has to be random movement in space. So temperature, velocity, mean free paths, collisions — all of this together, like we worked out last time, collapses into a single phenomenological variable called the diffusion coefficient. It doesn't matter how the diffusion coefficient arises.
Of course, it arises due to temperature — that was Einstein's famous result — but it doesn't matter how it arises. You can measure diffusion coefficients; it's very, very easy. So we need to know the diffusion coefficient. And that's the diffusion coefficient of what? Of these little molecules. The diffusion coefficient is something like the step size squared divided by the time between jumps (that's a different tau from the one below, sorry). So the diffusion coefficient is what we need in order to calculate the correlation time. Now I'm going to ask you to step in and fill in that calculation. The correlation time means: below the correlation time, the molecules in this little hemisphere are the exact same ones, so the number is not going to change; for times much longer than the correlation time, all those molecules will have exited. So what is the correlation time for a little hemisphere of this size, order of magnitude? All I have to do is wait for every one of these molecules to diffuse a distance a — once a molecule has diffused a distance a, it's already gone beyond the capture radius of this little receptor. So in this particular case, the autocorrelation time is the amount of time it takes diffusion to push the particles a distance a. Barring factors of pi and two and so on, D times that time must be equal to a squared — to travel a distance a takes time going as the distance squared; I've just squared both sides of the square-root-of-time law. So tau_a will be a squared over D. Remember, a squared is the square of the characteristic size of this receptor — roughly its surface area. And, sorry, to avoid confusion: little a is a length, big A is an area. So this is the autocorrelation time. It means that for time intervals below that value, the cell doesn't have to repeatedly average anything.
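To see the scale of this correlation time, here is a one-line sketch of tau_a = a squared over D, plugged with the receptor size and diffusion coefficient that come up later in the lecture (the promoter example); treat the numbers as illustrative.

```python
def correlation_time(a, D):
    """Time for diffusion to carry a molecule a distance a: tau_a = a**2 / D,
    dropping factors of 2 and pi."""
    return a**2 / D

# Illustrative numbers from the promoter example later in the lecture:
# a = 0.01 micron (10 nm), D = 1 micron^2 per second.
tau_a = correlation_time(0.01, 1.0)
print(tau_a)  # ~1e-4 s: the sensing volume refreshes ten thousand times a second
```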
And even if it did, it would gain no new information. For time intervals beyond that, the cell is effectively sampling a new collection of molecules, and can then store that number, and then the next one, and the next one, and so on. Good, yes. So in this particular case, we're assuming there's no convection — the fluid is at rest, and everything is diffusive motion. All these other complications will arise, and there are people who've measured the effect of those things; this is the simplest possible case. Thanks for the question. So the fluid is sitting at rest. Okay, now, what else do we need to know? We need to know how many times the cell can make a measurement. How do you work that out? How long can a cell afford to wait before it has to output the result of a measurement? It's hard to say — it totally depends on the context. Maybe there are some slow-growing cells that live in the ice that don't care. Maybe there's some fast-growing bacterium that's trying to escape a white blood cell and needs to make decisions on one-second time scales. We don't know. So we're just going to leave tau as some user-defined parameter. When I say user-defined, who's the user? The cell is the user. The cell says, I'm happy to wait for one hour; or the cell says, I'm happy to wait only one second, or whatever it is. Okay, so now on the board we literally have everything we need to finish the calculation about precision. So let's do that. What is the cell going to do? One, the cell is going to take tau over tau_a measurements. Two, it's going to average these and report this quantity. And so the real question is: what is the variance of the quantity the cell reports at the end of this exercise? It's the same as if you had done it in the lab — it is literally no different. Okay, so what's the variance the cell is going to report at the end?
Now, often the raw variance is not a quantity that engineers, for example, would care about; they care more about some proportionate measure of spread. So another way we can do this is to ask what the ratio sigma_z over mu_z is — the coefficient of variation. Is it 1%, is it 10%? This is just a dimensionless quantity that says how far off from the external concentration the internal measurement is going to be. Okay, so we know what the answer is: sigma_z over mu_z is one over root N times sigma_x over mu_x — that's from this equation. And we have a formula for N, since tau_a, we wrote down, is a squared over D. So we have a formula for N here, but how do we know what sigma_x and mu_x are? Mu_x is the expectation value of x — the expected number of particles in this little hemisphere. What is that? We know what that is, because I've given you the concentration, and I've given it in extremely convenient units. So mu_x is simply the concentration times the volume — that's the expected number of particles in that little hemisphere, modulo factors of two and pi and all that. What is sigma_x? If I waited and took independent measurements of the number of particles in that little volume, with sufficient time passing between them, how different would all those measurements be from each other? This is where the third little calculation I did comes in — the calculation about events in a time interval, where in any little interval of time the chance of actually having an event is very small, but the total number of events is finite. That calculation is independent of dimension; I could just as well do it in a hemisphere. The law of rare events has nothing to do with dimension.
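The law-of-rare-events claim is easy to check numerically: scatter molecules uniformly and count how many land in a small subvolume, over and over. A minimal sketch (box size, subvolume, and counts are all made up for illustration):

```python
import random

random.seed(0)
box, sub = 100.0, 2.0        # 1-D box for simplicity; the subvolume is 2% of it
n_molecules, n_trials = 5000, 500

counts = []
for _ in range(n_trials):
    # place every molecule uniformly at random, count those inside the subvolume
    counts.append(sum(1 for _ in range(n_molecules) if random.random() * box < sub))

mean = sum(counts) / n_trials
var = sum((c - mean) ** 2 for c in counts) / n_trials
print(mean, var)  # both come out near n_molecules * sub / box = 100
```

The variance tracking the mean is exactly the Poisson signature used in the next step.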
It just has to do with the fact that in any small subset of the space you're looking at, the chance of an event happening is very small. So take the total number of molecules in that hemisphere: if I measure it now, and — assuming the autocorrelation time is less than a second — measure it after one second, and two seconds, and three seconds, did it a thousand times, made a histogram, and looked at the histogram, that histogram would look like the Poisson distribution. And the Poisson distribution has the unique feature that it's entirely determined by its mean — the entire shape is. Therefore the variance is equal to the mean, and the standard deviation is equal to the square root of the mean. That's the third calculation. Literally everything here I proved in the first few minutes of the class. Question. Could these be proteins, small molecules, a proton? Sure, sure. We are assuming that these particles are not interacting. But those interactions, by the way, will only make it worse: you'll think your autocorrelation time is something, but in fact you'll be making repeated measurements of dependent, not independent, events, so the number of independent samples is not N but even less. Okay, so we're done. The accuracy is sigma_z over mu_z = one over the square root of (tau over tau_a), times sigma_x over mu_x. With tau over tau_a = tau D over a squared, and sigma_x over mu_x = one over the square root of mu_x = one over root (c a cubed), this comes out to the very simple formula: delta c over c = one over the square root of (D a c tau). Okay, so that's actually the main result of this paper. Surprisingly simple. The left-hand side answers a very practical question: is your error of the estimate 1%, is it 10%, is it 0.001%? And the right-hand side is a lower bound on that fractional error — equivalently, an upper bound on how accurate any cell can be.
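The chain of logic can be put into code to check that the pieces really assemble into the product form — a sketch with the O(1) factors dropped, just as in the lecture:

```python
import math

def berg_purcell_cv(D, a, c, tau):
    """Relative error delta_c / c = 1 / sqrt(D * a * c * tau), O(1) factors dropped."""
    return 1.0 / math.sqrt(D * a * c * tau)

def cv_step_by_step(D, a, c, tau):
    """The same bound assembled from the three ingredients in the lecture."""
    tau_a = a**2 / D                       # correlation time of the sensing volume
    N = tau / tau_a                        # number of independent measurements
    mu_x = c * a**3                        # mean molecule count in the hemisphere
    sigma_over_mu = 1.0 / math.sqrt(mu_x)  # Poisson: sigma_x = sqrt(mu_x)
    return sigma_over_mu / math.sqrt(N)

# The two routes agree for any inputs:
print(berg_purcell_cv(1.0, 0.01, 100.0, 1000.0))
print(cv_step_by_step(1.0, 0.01, 100.0, 1000.0))
```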
If this number is calculated for a real system, the accuracy can only be worse — the percentage error can only be higher. It depends on the diffusion coefficient of the particle you're trying to measure, because that determines how fast it gets away from you. It depends on the size of the thing you're using to make the measurement — in this case, the linear dimension of the receptor. It depends on the concentration, obviously, because that's primarily where the fluctuations are coming from. And it depends on this user-defined parameter: how long you are willing to wait for this measurement to happen. Now, just to see that this is not ridiculous, let's use this result. It turns out this is very far from what real cells do — real cells do much, much worse. But let's use it in an interesting way. So Bill Bialek, whom I mentioned yesterday, used this in a very — yes, question? Tau is the total amount of time you're willing to wait before you give the result. You can't afford to wait years, because then the white blood cell would have already killed you — something like that. So remember, the lower bound on the measurement time is determined by the physics of the system, but the upper bound on how long you're willing to wait is determined by the biology of the system — whether somebody is coming after you, or you're just sitting in the ice by yourself. That's why I've said it's user-defined; you have to tell me. This tau is the most unknown part of this whole equation. It's clear that the longer you allow yourself to measure, the more accurately you can measure, because you're averaging over many, many more independent events. And the higher the diffusion coefficient, the more rapidly the volume is refreshed; the bigger the receptor, the more particles you capture. So all these things make sense.
The only surprising thing about this is that it comes down to such a nice product form. So yes — the distribution I'm talking about is this: if the molecules are distributed statistically evenly through the tube and you're sampling a small volume of it, then the expectation value of the number of molecules in there has some value, but the actual distribution will be Poisson. It's a sub-sample of an even distribution of molecules in the tube. Good question. So I'm going to plug in some real numbers for this, but I'm going to plug in numbers for a rather different context than the original Berg and Purcell paper. I'm going to plug in numbers for a biological context inside a cell — not a cell looking at its environment, but inside a cell, where you have DNA, and there'll be some stretch of the DNA which is a gene. And upstream of a gene there's a region which is called a promoter. Everybody knows what DNA is, yeah? Anybody not know? It's a polymer, ATGC, anti-parallel, okay. Genes are stretches of DNA, typically thousands of letters long. Through the process of gene expression, through the genetic code, they get converted to mRNA and then to proteins, and the proteins are the things that do the job in the cell. But there are stretches of the genome which do not encode proteins; rather, they determine whether these genes are turned on or off. Those are called promoters. Now what is the job of a promoter? If you read the first paper I assigned, on Drosophila morphogen gradients: what a promoter does is try to measure the presence of other proteins in the cell, proteins called transcription factors. If the transcription factor is present, the promoter turns the gene on; if it's absent, it turns the gene off. That's a very simplified view of the matter.
In real life, what happens is that the promoter is looking at these transcription factors floating around in the cell and trying to guess what kind of cell it is in — because all your cells contain the same genome. They all contain the same genome. So the only reason they have different fates is that there is some initial symmetry breaking: some proteins change their concentration in different cell types, and these proteins then have a cascading effect that causes the fates of the cells to diverge. So the promoter is acting as a sensor for transcription factors — those proteins floating around in many copies — and it's trying to accurately measure the concentration of transcription factors, even though the promoter is rather small. If the transcription factor is in the right range, it'll turn the gene on; that's basically how promoters work. So let's plug in some real numbers. The typical size of a promoter is not going to be much more than 10 nanometers — it's a stretch of DNA, it's not going to be much bigger than that. The whole bacterial cell is a micron. The concentration of transcription factors can be anywhere from about 10 to 10,000 molecules in a cell volume, which is one cubic micron, because a bacterial cell is about one cubic micron. It turns out the diffusion coefficient of proteins, measured in cellular cytoplasm — and you can literally do the measurement; these are effective diffusion coefficients, no longer just thermal, but it doesn't matter what the source of the diffusion is: the cell is burning all kinds of energy, stuff is moving around, and you can still measure a diffusion coefficient, and it's a pretty accurate measure, you can see the variance increase linearly with time — is about one micron squared per second. One protein will diffuse across the whole cell in one second, okay?
Tau — what time scale is of interest to a bacterial cell? Well, we don't know, but for example, cell division is a reasonable time scale: you'd like to make a decision within one lifetime, rather than after your daughter has split into two or whatever it is. An E. coli cell divides in roughly 20 minutes, so 20 times 60 seconds, or it could be a fraction of that. Let's say tau is 100 to 1,000 seconds — certainly somewhere between a minute and an hour. Okay, so plugging all this in: the accuracy is one over the square root of D times a times c times tau, with D = 1 micron squared per second, a = 10 to the minus 2 microns, the concentration c anywhere between 10 and 100 per cubic micron — let's do both — and tau anywhere between 100 and 1,000 seconds. I'm taking the worst case on one side and the best case on the other, so the real case will be somewhere between the two. Plugging it all in, the worst case gives one over root 10, and the best case gives one over the square root of 1,000. Did I do that right? I think I did. So, given the number of transcription factor molecules and so on, if a bacterial cell really wants to make decisions on the time scale of its own lifetime — assuming this limit is approached — it's making sort of a few percent error in its estimates, and it cannot really do much better than that. Now, of course, it's another question whether a few percent error is enough to get you by; that has to do with fitness and what problem the cell is trying to solve. But this just has to do with the physics of signaling. All I want to say is: if you plug in reasonable numbers for what cells do, you get reasonable numbers for precision.
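The worst-case/best-case bracketing above can be checked in a few lines — a sketch using the lecture's numbers (micron-and-second units):

```python
import math

def cv(D, a, c, tau):
    """Berg-Purcell relative error, 1 / sqrt(D * a * c * tau)."""
    return 1.0 / math.sqrt(D * a * c * tau)

# D = 1 um^2/s, a = 1e-2 um; c in molecules per um^3, tau in seconds.
worst = cv(1.0, 1e-2, 10.0, 100.0)     # 1/sqrt(10), roughly 30% error
best = cv(1.0, 1e-2, 100.0, 1000.0)    # 1/sqrt(1000), roughly 3% error
print(worst, best)
```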
A rule-of-thumb measure for precision is somewhere between a few percent and 10 percent for most measurements that cells are making, just based on how they behave, okay? Any questions about this? And just to step back: we got all of this literally from looking at the addition of random variables, nothing else. Questions? Tau is a made-up thing, right — I don't really know how often the cell is reporting and acting on its measured information. A better place to do this kind of calculation would be a neuronal system, where we ask: should a neuron fire? How many action potentials should it average over before it fires, if it's trying to make some precise measurement? There you'd have really solid numbers, and people have used these kinds of tricks in that case as well. So do read the paper, but don't be put off by it: it's a very long paper, but most of it has to do with doing this little diffusion calculation. The rest of the paper is a pleasure to read. I haven't given it as assigned reading, but it's freely available online: Berg and Purcell, 1976. Okay, let me get rid of this. Now let's get back to more general stochastic processes. I promised I would give you an intro to the biology, and that was the intro; we'll be touching on these things as we go through more examples. So last time we discussed the idea of a stochastic process. What is a stochastic process? It's a bunch of sampling times, and a bunch of values of the variable at each of those sampling times. Yeah? Now these sampling times are just the times when you look in the box. The system is going to be doing something in the intervening time as well, but in a sense you're not interested in what it's doing. So at this level of coarse-graining, the stochastic process is in fact identical to the joint distribution of the n observed quantities at the previously defined n observation times.
These observation times are just labels; they're not part of the process. At these observation times, what values are the variables going to take? If I keep repeating this with exactly the same initial condition, I get a big distribution — an n-dimensional distribution. If I sample it sufficiently — and of course that's a very hard thing to do — then in principle I have the entire n-dimensional distribution. So a stochastic process is really nothing other than a big n-dimensional distribution, except with a notion of time included: the n quantities are indexed as the first one, the second one, the third one, and so on. The benefit of indexing is that we can then expand this in the usual telescoping way, into conditional probabilities. And the benefit of doing that — assuming the Markov assumption is correct, as we discussed last time — is that each of these factors depends conditionally only on the previous measured time, even though lots of stuff is happening in the middle. Fine. So in general, the puzzle we want to solve is: we want to make a simulator that generates one of these curves with the correct statistics, and we're going to do that by making heavy use of random number generators. The way I would do it is to start at some t equals t-zero with some x-zero — I didn't write t-zero, x-zero, but fine, I can call it t-one, x-one. And somehow, using prior knowledge or empirical measurements or phenomenology, whatever it is, I derive this sort of propagator, which says where things are going to end up when I sample at time t-two. What is this? I've just flipped my head sideways — this is a sideways distribution. It's basically p of x-two given x-one, with sampling times t-two and t-one. That's this probability distribution.
Then I roll a random number — or use a bunch of random numbers, whatever it is — to pick one of these values with the correct statistics. Let's say I pick that one. Then I again make a propagator — this time maybe for a much longer time interval, because I didn't care about the intervening sampling times — and I get some other distribution: this is p of x-three given x-two. Then I roll my random number again and pick one of these values with the correct distribution, and then I propagate again, and so on. So this is how I'm going to generate one of these curves. To generate the curve, I need these distributions, and what are they? They are conditional distributions. I need a formula for the conditional distributions for all possible values of the intervening variables, because when I roll my random number, I don't know ahead of time which value it's going to land on. So when I say p of x-two given x-one, it's a complicated thing: this distribution was calculated assuming I started here; if I had started over there instead, the distribution might have a completely different shape. P of x-two given x-one is an entire family of distributions, parameterized by the control parameter x-one. It's a huge amount of information — ridiculously huge, in fact. You may not be able to sample it empirically; most of the time you're going to have to make some assumptions to simplify it. So for the next two classes, I'm going to show you two or three different methods to obtain these conditional probability distributions, okay? But just stare at this for a second. Suppose I was interested in sampling at time t-two and time t-four, and I didn't care about time t-three — suppose that's what I need to get my work done.
I could still, of course, use this recipe: t-three is then a variable I don't report back to the user. I go from t-one to t-two, and I report what happened at t-two by drawing one random number. I go to t-three, find out what happened, and don't report it. I go from t-three to t-four and report that, because whoever was asking for this information only wanted t-one, t-two, and t-four. But maybe there's a better way: maybe I simply work out the propagator that goes directly from t-two to t-four. Then I don't have to draw an extra random number and compute a whole extra series of distributions. This is the slightly weird thing about stochastic processes that I was trying to highlight last time: you should not waste time generating the actual values of the variables at any times other than the desired observation times, because it's just a waste of random numbers, and you'd have to calculate two sets of distributions instead of one. Now, for this to actually work — let's assume for the rest of this class that we're living in the Markov world, which is typically a very, very good assumption. We are asking whether the conditional probability distribution of x-three at t-three, given x-one at t-one, is equal to this integral: p of x-three given x-one equals the integral over x-two of p of x-three given x-two, times p of x-two given x-one. What we're saying is just what I said in English earlier: if I don't want to report something at an intermediate time, I can directly calculate the distribution at the desired observation time, or I can calculate the intervening variable and simply not report it. I can do it either way: either one big jump from x-one to x-three, or the two-step version where I go from x-one to x-two, for all possible values of x-two, and then from x-two to x-three.
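The recipe — propagate, draw one random number, report, repeat — can be sketched with a toy two-state propagator. The states and transition probabilities here are invented purely for illustration:

```python
import random

# Hypothetical discrete propagator: propagator[x][x_next] = p(x_next | x).
propagator = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.5, "B": 0.5},
}

def sample_trajectory(x0, n_steps):
    """Generate one trajectory by drawing each step from p(x_next | x)."""
    x, traj = x0, [x0]
    for _ in range(n_steps):
        u, cum = random.random(), 0.0
        for x_next, p in propagator[x].items():  # invert the CDF with one draw
            cum += p
            if u < cum:
                x = x_next
                break
        traj.append(x)
    return traj

print(sample_trajectory("A", 5))
```

Skipping an unwanted intermediate time corresponds to replacing two applications of this propagator with one matrix product, which is exactly the discretized picture that comes later.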
So this conditional distribution has to have this property if it's a Markov process. And this property looks simple, but actually it's not, and I want to show you how complicated it is for even the simplest stochastic process we've treated so far. That simplest case is diffusion. So what does one of these propagators look like for diffusion? P of x-two at t-two, given x-one at t-one, for a particle with diffusion coefficient D. We worked it out last time: by the central limit theorem, it turns out to be a Gaussian — one over the square root of four pi D t, times e to the minus (x-two minus x-one) squared over four D t. We worked this out last time; it's just saying that the sum of a bunch of random variables approaches a normal distribution when the number of steps is sufficiently large. And we already know the variance: the variance is two D t. So, everybody's happy with this? This means the propagators for the simple case of diffusion are all Gaussian distributions — very nice — all centered exactly at the previous value of the variable; that's what this means. The variance of that Gaussian increases linearly in time, so the standard deviation increases as the square root of time. We worked all this out last time. So let's just check. And by the way, this must be true: the Gaussian diffusion propagator must satisfy this consistency property, because we worked everything out about it — whether you stop at t-three or t-two or whatever, everything must be correct. So if that's true, let me plug it into this equation and see what it looks like. Is it really true? Let me plug it in.
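Since every propagator is a Gaussian centered on the previous value, simulating a diffusion trajectory at arbitrary sampling times takes one Gaussian draw per observation. A minimal sketch:

```python
import math
import random

random.seed(0)
D = 1.0  # diffusion coefficient, arbitrary units

def diffuse(x0, times):
    """Sample a diffusion trajectory at the given increasing sampling times.
    Each step draws x_new ~ Normal(x_old, variance 2*D*dt): the Gaussian propagator."""
    xs, x, t_prev = [x0], x0, times[0]
    for t in times[1:]:
        dt = t - t_prev
        x = random.gauss(x, math.sqrt(2 * D * dt))
        xs.append(x)
        t_prev = t
    return xs

print(diffuse(0.0, [0.0, 0.1, 0.5, 2.0]))  # one sampled trajectory
```

Note that the sampling times can be unevenly spaced: the propagator only cares about the interval dt, which is the Markov property at work.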
And, sorry — in that formula the t is really t-two minus t-one, the time interval. So I'm going to plug this in there, and some of you who are really good at integrals can work out the answer; let's see how fast we can do this. The left side must be one over the square root of four pi D (t-three minus t-one), times e to the minus (x-three minus x-one) squared over four D (t-three minus t-one). That's propagating from one to three directly — simple. Is this equal to the integral over x-two of one over the square root of four pi D (t-three minus t-two), times e to the minus (x-three minus x-two) squared over four D (t-three minus t-two), times one over the square root of four pi D (t-two minus t-one), times e to the minus (x-two minus x-one) squared over four D (t-two minus t-one)? You can stare at it and try to work it out, and it's actually really hard. If you go through it brute force, you'll make some change of variables, it's going to be a huge mess; on a piece of paper you'll fill an entire sheet, and by the end you'll be off by one minus sign or something, and the right side won't work out to the left side, because there's so much space to make mistakes. Just look at this integral: you'd have to do some change of variable where the x mixes with the t, but the t is sitting inside the square root — it's really awful. So this is obviously not the way to do it. But just for fun, if you're masochists, you can go and do this calculation. The point is that this equation looks trivial, but its real information content is exactly statements like this: it makes guarantees about the behavior of certain functions which are extremely laborious to verify by brute force. So if this is the wrong way to do it, what is the right way?
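Rather than slogging through the Gaussian integral by hand, the consistency condition is easy to verify numerically — a sketch with grid limits and step chosen by hand:

```python
import math

D = 1.0

def G(x, x0, t):
    """Diffusion propagator: Gaussian with mean x0 and variance 2*D*t."""
    return math.exp(-(x - x0)**2 / (4 * D * t)) / math.sqrt(4 * math.pi * D * t)

# Propagate t1 -> t3 directly, and compare with integrating out the
# intermediate point x2 (midpoint rule on a grid covering x2 in [-20, 20]).
x1, x3 = 0.0, 1.0
t21, t32 = 0.3, 0.7            # intervals t2 - t1 and t3 - t2
dx, n = 0.01, 4000
rhs = sum(G(x3, -20 + dx * (i + 0.5), t32) * G(-20 + dx * (i + 0.5), x1, t21)
          for i in range(n)) * dx
lhs = G(x3, x1, t21 + t32)
print(lhs, rhs)                # the two agree to high precision
```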
If this is the wrong way to verify it — well, we're saying it's Markov. If it's Markov, then the Markov property should be explicit in the notation itself, right? The diffusion propagator, although it is Markovian, doesn't make the Markov property explicit in the formula. It's explicit in the diffusion equation, but not in the solution. So what's the right way? I'm going to tell you the right way, and it's my favorite trick: you just discretize everything. We're going to move from a world of diffusion where x is a continuous variable to a world where the x's live on some discrete lattice — countable, let's say for the moment even finite. In that case, for these propagators, which are p of x2 given x1 and so on, the values go over to some discrete values, you know, 0, 1, or minus 1. There's some discrete lattice for the x's; you've discretized it at whatever precision you want, it doesn't really matter. Or let me put it this way: the continuous x's go over to integers, and these integers are not the positions, but the indices of the positions in some list. It's the first point of the lattice, second point of the lattice, third point, 0th point, minus 1,000th point, whatever it is. So in that case, a propagator like this — a conditional probability distribution saying where you'll be at t2 given where you are at t1 — goes over to a matrix of this type. And I write p, i to j, just to remind myself that I'm using column vectors, and matrices multiply from the left, OK? And what does this mean? This matrix has the following form. There's some column vector that the matrix multiplies. I won't write the column vector, but essentially p ij times vj, summed over j, is vi. And vj is at time t1 and vi is at time t2.
Anybody uncomfortable with this sort of representation? Everybody should be happy with it. Linear algebra is the most beautifully compact notation ever invented. And the story, by the way, is that Heisenberg didn't know that the entire collection of calculations he was doing was just matrix multiplication until Max Born told him. In the early 20th century, physicists were not taught matrix multiplication, which is a travesty. Anyway, it's very simple to write down. So what is vi? Vi — a column vector indexed by some time t; you could also write it as v indexed by some time t — gives you the probability distribution of where you're going to be at that time. The p ij converts the probability distribution at one time into the probability distribution at another time. How does it do it? It takes a weighted sum of the columns. So for example, what is this calculation, vi at t2, doing? It's doing the following thing: I give it the vector at time t1, and it produces the vector at time t2. How does it do that? Because this entire matrix is simply the conditional distribution. Remember I said that the x1's are the parameter, and they parametrize the entire collection of distributions. Here, we've just discretized the process. Every column of the matrix corresponds to starting at a different point; every column is just the conditional distribution — p of x2 — starting at a different x1. Since they are conditional distributions, the sum of every column must be 1. That must be true. Other than that, and of course each of these values must be between 0 and 1, there are no restrictions on this matrix. It's a huge world of matrices that can satisfy these conditions. So if you are comfortable with this, I'm going to move forward a little quickly, because I have about 15 minutes left.
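To make the column convention concrete, here is a toy sketch in Python — the three-state propagator is entirely made up, chosen only so that each column sums to 1:

```python
import numpy as np

# Made-up 3-state propagator. Each COLUMN j is the conditional
# distribution P(end in state i | start in state j), so columns sum to 1.
P = np.array([[0.8, 0.3, 0.1],
              [0.1, 0.4, 0.2],
              [0.1, 0.3, 0.7]])
assert np.allclose(P.sum(axis=0), 1.0)  # every column is a distribution

v_t1 = np.array([1.0, 0.0, 0.0])  # certainly in state 0 at time t1
v_t2 = P @ v_t1                   # v_i(t2) = sum_j P_ij v_j(t1)

print(v_t2)        # just column 0 of P
print(v_t2.sum())  # total probability is conserved: 1.0
```

Because the starting vector is a delta on state 0, multiplying by the matrix just reads off column 0 — the discrete version of the statement that each column is the propagator from one particular starting point.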
Everybody happy with the notation? When you first learn about Markov processes, you might have literally learnt them like this, because it's the simplest way to introduce a Markov process. It turns out to be a universal way to describe Markov processes, but the usual presentation doesn't quite capture the point of these matrices. The point is that they literally are these distributions: these matrices, in a compressed way, capture everything we know about how the probability of the system will propagate from a certain starting point in time. That's all they are. The Gaussian can be written as a matrix — not infinite dimensional, but of infinite size, a two-dimensional array of infinite extent. And when you do it that way, the Markov property becomes obvious, and I'll show you that in just a second. The other important thing about these matrices: when I first learned Markov processes, I thought a Markov process was literally one defined on fixed finite time steps, and somehow the matrices all had to work for those time steps. That's not true at all. A Markov process is defined merely by this conditional independence property; it says nothing about time steps. So these matrices have to be indexed — you have to have a little label saying which times they apply to. This matrix is meant to take you from t1 to t2, so it should carry the label t1 to t2; you should write somewhere that this matrix applies from t1 to t2 and not to any other pair of times. These are the kinds of things that will be important to you. Any questions so far? In your homework there'll be a nice exercise — I don't know if people even play board games anymore, but how many people have seen this game called Monopoly? You've seen it, right? If you haven't, you can just Google it; there's a Wikipedia page. You can probably buy it and Amazon will deliver it tomorrow. So the homework is very simple.
It's to make this matrix for a Monopoly board, and by using this matrix, to work out which properties in Monopoly are overvalued. It's a simple little calculation, but you will need a computer to do it. Maybe we'll do it in the tutorial. Fine. Question? Yes — at this stage, it absolutely does, because it's a two-dimensional thing. If it had depended on prior times, then you couldn't write it as a matrix with only one input and one output variable. Markov only has to do with the fact that it depends on one prior time and not many prior times. In the real world, by the way, if a thing is not Markov, it's rarely non-Markov in a nice way where you merely have to go from one prior time to two prior times. If it's not Markov, you're typically totally screwed: you have to go from one prior time to infinitely many prior observations. That's how badly things usually break down. This is the reason Markov is such a well-used approximation — not just because it's reasonably applicable in many places, but because once it isn't, the kernels you have to write down are just too high dimensional to characterize. Yes — so you want to approximate it. Yeah, there are controlled approximations people can do. I'll answer your question by saying that in principle, of course, by adding enough measurements to your instantaneous description, everything will become Markov, because the underlying laws of physics have that property. But if you want to compress it down to a small number of variables, then there's coarse-graining, separation of time scales, various kinds of averaging approximations you can do to get something like that. Many are uncontrolled, but you can do it. And that's true even here. No, no — are you saying we've already assumed it's Markovian? You never have to observe the intervening states, because all those properties are captured inside that matrix.
You don't have to draw the random numbers to see where it actually was, because it would have gone all possible ways anyway, and when you do the statistics, all those paths average out precisely according to this equation. So what does this equation mean in matrix notation? Let me call the matrix a ij, because it's a matrix. Then this says: a ik is equal to the sum over j of a ij times a jk. That's all it says. And if the two endpoints are different, the matrix will be different — the matrix itself depends on what those two times are. But if your interval can be split at an intermediate time, then the total matrix must be a product of the individual matrices. Remember, these are square matrices; there's nothing complicated about them. Square matrices with largest eigenvalue 1 — the most well-behaved kind of thing in the world. So this basically says that a of t3, t1 is equal to a of t3, t2 times a of t2, t1. And this is what I mean: the equation becomes trivial when you write the whole thing as a matrix. It's just absolutely true, that's all. Writing a Gaussian down as a matrix is a complicated thing, but you can always do it. OK, the only reason I brought this up is that you'll need it to solve one of your homeworks. Sometimes I'll use the matrix version of the world, sometimes I'll explicitly draw these diagrams, sometimes I'll write p of x2 given x1. In your mind, I want all these things to be equivalent. I want you to think about p of x2 given x1; I want you to think about a, j to i; I want you to think about a conditional probability distribution as a curve whose properties change as you change the parameters. These matrices are not fixed. They depend entirely on where you start and where you stop. How you get the matrix is totally up to you, and it depends on the problem.
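Here is the same statement checked numerically for diffusion — a sketch with an arbitrary lattice, diffusion coefficient, and time split. Discretize the Gaussian propagator onto the lattice as a column-stochastic matrix, and the matrix for the full interval acts the same as the product of the matrices for the two sub-intervals:

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 401)  # position lattice (arbitrary choice)
dx = x[1] - x[0]
D = 1.0

def A(dt):
    """Gaussian propagator discretized as a matrix: column j is the
    distribution after time dt, starting from lattice point x[j]."""
    M = np.exp(-(x[:, None] - x[None, :]) ** 2 / (4 * D * dt)) \
        * dx / np.sqrt(4 * np.pi * D * dt)
    return M / M.sum(axis=0)  # normalize so every column sums to 1

v0 = np.zeros_like(x)
v0[200] = 1.0  # start as a delta at x = 0

v_direct = A(2.0) @ v0               # one matrix for t1 -> t3
v_twostep = A(1.2) @ (A(0.8) @ v0)   # t2 -> t3 applied after t1 -> t2

print(np.abs(v_direct - v_twostep).max())  # tiny: the two routes agree
```

The split 2.0 = 0.8 + 1.2 is arbitrary; any intermediate time gives the same answer, which is the Chapman-Kolmogorov guarantee A(t3, t1) = A(t3, t2) A(t2, t1) in action.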
You can't guarantee anything about it in general; it could be quite complicated. And often these matrices are infinite in size, sometimes uncountably so. So let's keep moving. I want to introduce something I'm going to use next time, and then we'll finish the calculation next time. Back to biology. We are now going to look at a class of systems of particular interest: stochastic chemical kinetics. In particular, we will be interested in equations of the following type, where x1 and x2 and so forth are distinct species of chemicals in a tube or in a cell. They could be transcription factors, protons, water, whatever. The 1 and 2 and 3 label the different species, and the x's represent, for example, concentrations or molecule numbers. The reason you write it in this particular way is that chemistry involves the synthesis and degradation of molecules, so physically it's useful, even essential, to separate processes that create new copies of a molecule from processes that destroy existing copies. The f's and g's that we use will always be positive. These kinds of equations cover literally all of chemistry, all of the biochemistry happening in a cell. The number of chemical species in the cell could be in the thousands or tens of thousands, and the f's and g's are perhaps unknown functions: fi could be a function of all the x's, and gi could be a function of all the x's and t. So this is just a full-blown dynamical system — a standard dynamical system, except that I don't put negative quantities on the right side. I always separate it into creation and destruction events. A standard dynamical system with this one funny caveat: you're not allowed to use negative rates; any destruction has to come in as a positive rate. Let's start off by looking at a simple case. Let's say dx dt is some alpha.
And alpha is some constant, and x is the number of molecules in some system. So these are the kinds of rate equations you write down when you do chemical kinetics. You can do mass action equations; these things sometimes get multiplied by concentrations and so on. What we want to do now is calculate, under this kind of rate equation, what x looks like as a function of time. What does this rate equation imply about x as a function of time? It's a totally obvious linear equation, right? So it looks like that. But this is chemistry, and if it's chemistry, then the x's are not continuous. They must be discrete. The x's are discrete. So what does the actual curve look like? The question is clear. Often somebody will say this chemical is being synthesized at a constant rate per unit time — let's say at 10 molecules per minute. After one hour, how many molecules are there going to be? 600 molecules, right? But you can't actually have continuous amounts, so something happens to this curve. And what happens to it? Well, it can only jump between discrete values. Now, the first guess, and maybe not a bad one, is that it's literally the staircase curve that comes closest to the actual straight line. You might guess that's how it would be. But you would be wrong. And the reason is that in a chemical system, if a molecule is being created this way, there's literally no way the underlying system has the kind of molecular clock that would produce x's with a fixed time separation between independent events. That would imply a huge amount of internal complexity — some sort of internal clock that makes one x, waits, waits, waits, makes another one. That's not how chemical reactions work. They work by collisions, rapid changes, and molecules moving apart.
So if it doesn't look like that, what does it look like? Now we need to step back a little and use a slightly different notation, which we've already used earlier in the class. Suppose I'm looking at a big time interval, 0 to capital T, and I split it into a bunch of little pieces. There are n pieces, each of size dt, and n dt equals capital T. Same game we've been playing all along. In order to get this curve, traditionally we would have just integrated the right side: dx dt is alpha, so x is the integral of alpha dt. In the same way, we want to add up all the creation events of x. But we know that x can only be created one unit at a time. That's what we know. So, if this T is big enough, how many creation events of x would there have been in this large interval? Alpha T. If T is big enough, alpha T will not be far from the actual value in percentage terms, because you've added up a large number of numbers, and the law of large numbers says you're going to converge to the expected value. So this is not a bad approximation to make at all; if it's not good enough, you can go to larger values of T and so on. So that's the number. These events are sprinkled into different bins, and I've made the bin sizes sufficiently small that no two events land in the same bin. It's your standard way of doing these calculations. So I'm going to draw the creation events as little notches; if necessary, I'll put little arrows on them to show that they represent creation, because each one drives up this curve. And the point of the true stochastic process is that the chemistry, since it doesn't have an internal clock, cannot control the precise timing between successive creation events.
In your homework, which will be assigned not for this lecture but for the next one, you're going to work this process out for a well-known case: radioactive decay. There you know that individual nuclei decay randomly, and they don't collude with each other to decide: you decay first, I'll wait one second, then somebody else decays. They can't do that. In the same way, each creation event of x is independent of all the others. They don't collude; they just have a constant chance of dropping into any of these bins. If that's the case, then how many of these bins will have an up arrow? Alpha T of them. And how many bins are there? There are n. So the probability of a creation event in a bin is the total number of events divided by the total number of bins. Is this totally obvious? It should be. There are alpha T events and n bins, and I've made dt small enough that no bin has two events. All the bins are identical a priori, so the chance of any given bin having an event is alpha T over n, which is alpha dt. That's the probability of a given bin having an event. Now this I want you to stare at very closely, because it's very, very important. This is the maneuver that takes you from chemistry to stochastic chemical kinetics. Previously, what would you have called alpha? A rate. It's just a rate: 10 molecules per second. But now look what alpha has become. Alpha is interpreted in such a way that alpha times dt is a probability. So alpha becomes a probability per unit time. This is very, very important, because it allows you to take every existing equation you ever knew from chemical kinetics and reinterpret it all the way through. You don't say dx is alpha dt; you say the probability of a creation event in a small time interval dt is alpha dt. Any questions?
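The whole binning construction drops straight into a few lines of code. This is a sketch, not the homework solution: each bin independently receives a creation event with probability alpha times dt, and the cumulative sum is the jagged curve that replaces the straight line x = alpha t.

```python
import numpy as np

rng = np.random.default_rng(1)

alpha = 10.0   # the rate, reinterpreted as a probability per unit time
T = 60.0       # total time, e.g. one hour at 10 molecules per minute
n = 600_000    # number of bins; dt small enough that alpha*dt << 1
dt = T / n

# Each bin independently gets one creation event with probability alpha*dt.
events = rng.random(n) < alpha * dt
x = np.cumsum(events)  # x(t): jumps up by 1 at random, uncolluding times

print(x[-1])  # near alpha*T = 600, but with random fluctuations around it
```

The gaps between successive events come out randomly distributed rather than fixed, which is exactly the "no internal clock" statement: each little interval has the same chance of an event, regardless of everything that came before.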
So I think it's 4:15, so I'm going to stop now, and this is actually a good place to stop. The point is that next time we're going to take this sort of model, and I'm going to teach you how to build these distributions for models like this. Any questions? Tomorrow I think I start at 9, and I have two classes, so there's a lot of material tomorrow. Has everybody signed?