This is Alexandra, and with Thierry, who's here but will soon disappear, we'll be teaching the biophysics, well, the first two weeks of the biophysics lectures, and we'll see exactly what we cover, right? So just think, in your mind, statistical biophysics. We're going to go from evolution, from inference, to neuroscience. So anything that's alive, okay? And the way we're going to, so there's feedback, of course. The way we're going to do it is, still feedback. It's too powerful. Yeah, okay. So the way we're going to do it is we're going to try and make your life for the exam extremely easy by giving you homework, okay? The homework's going to be really easy, and the only point of it is that you go through your lecture notes and you try to write down the same thing once again, so that then you go to the exam, and we're really not going to be mean or evil. We're just going to try and see whether anything's stuck. And as I said, there'll be two homework sets, and you should do them together, in groups as big as you find the optimal cluster size to be. This is, as Angela said, completely collective, completely non-competitive. We just want you to retain something, okay? Now I have to try not to kill myself on this stage. And chalk, yes, there's chalk, okay. So yeah, I think that's everything for practical information. Let's start; if I've forgotten something, it'll come up. Again, as Mattel and Angela said, you have very diverse backgrounds, and I'm going to assume that some things may be repetitive from your physics classes, and some things may be repetitive from your high school biology, or whenever was the last time you were taught about biology. I am going to assume that you're somebody born in the last 20, 30 years and you did see some biology in school in your life, okay? So if you have no idea what the hell a cell is, look it up on Wikipedia, okay? Sorry, so this is not... 
So we're going to start with a simple puzzle, just to show you that life is a vast concept. It starts with the statement that the US counties in which the incidence of kidney cancer is highest are mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South and the West, okay? If you know anything about world politics, this makes sense to you, because when you think about these states in the US, you think about bad food and, basically, poor people and things like that. So that makes sense: they're not very healthy, they get sick. But there's a similar statement which is also true: the US counties in which the incidence of kidney cancer is lowest are mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South and the West, okay? So these are exactly the same two sentences, with the only word changed being highest or lowest. And both of these statements are true. So what's going on? Any ideas? It's a physics question. I could replace kidney cancer with possession of beetles, okay? So what's the key word in these sentences? Good idea, good idea, but no. No. Counties? Yes, okay. So you're getting the idea that cancer, and rural, and where they're located are not the important words. The only important words are sparsely populated, okay? Basically, this is a statement about small numbers, and small numbers give you noise: in a county with few people, one case more or less moves the rate a lot, so the small counties end up at both extremes. And that's what we're going to be talking about today in different settings. So this is just to show you that the idea that small numbers give you noise is present really everywhere, even in things that look abstract. And so in bio... I'm sorry, this is really bothering me. This is going to sort of defeat the purpose of having the mic, but... So in many biological systems this is an issue, because in many biological systems we have to deal with small numbers and noise. 
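A minimal sketch of the county puzzle, to make the small-numbers point concrete. Everything numerical here is an assumption made up for illustration (two county sizes, one identical true rate everywhere, Poisson-distributed case counts); the helper names `poisson_sample` and `simulate_counties` are hypothetical and nothing below comes from real data.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method
    (adequate for lam up to a few hundred)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_counties(n_counties=500, small_pop=1_000, large_pop=100_000,
                      true_rate=1e-3, seed=0):
    """Return a list of (observed_rate, population) pairs.
    Every county has the SAME true rate; only the population differs."""
    rng = random.Random(seed)
    counties = []
    for i in range(n_counties):
        pop = small_pop if i % 2 == 0 else large_pop  # half small, half large
        cases = poisson_sample(pop * true_rate, rng)   # assumed Poisson case counts
        counties.append((cases / pop, pop))
    return counties

counties = simulate_counties()
ranked = sorted(counties)                 # sort by observed rate
lowest, highest = ranked[:20], ranked[-20:]
print("populations among the 20 lowest rates: ", sorted({pop for _, pop in lowest}))
print("populations among the 20 highest rates:", sorted({pop for _, pop in highest}))
```

With identical underlying rates, both ends of the ranking are populated by the small counties, simply because their observed rates fluctuate the most.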
And this is just an illustration of a reaction, so okay. In cells we have many elements. We have DNA, which is what gives you genes. We have proteins. We have enzymes, and these things come together and interact, usually through chemical reactions. And if you have a lot of elements, then there are a lot of reactions taking place at a given time. So this is the mean number of reactions per time, and this is the cumulative number of reactions. You see that the cumulative number of reactions grows steadily. So if the sole purpose of this reaction is to produce something like a protein, then you would have a steady increase, a smooth increase in proteins, okay? Because in this example you have about 40 reactions per time, where a reaction is when two balls hit each other. However, if you have small numbers, you don't have many reactions per time. You have either 0 or 1, and then the cumulative number goes up in steps. Okay, and this is essentially the limit that cells find themselves in: we have small numbers of everything. And so, okay, what is everything in a cell? Well, there are a lot of things, and different people will tell you about different things. But just to give you one concrete example for the rest of what we'll be talking about today, we're going to be talking about genes and proteins. So genes, which are just bits of DNA, which we will often draw just like this. There's some sort of gene here, and this gene, when it gets expressed, produces a protein. Okay, so that's the gene, or DNA. This is a sort of schematic way of picturing it. But you've probably also heard this before, that in all the cells in your body you have the same DNA, right? Everybody's heard that? Yeah. However, you have different cells, right? 
You have a nerve cell, or you have an epithelial cell, which is a part of your skin, and you have a muscle cell, and you have a kidney cell, and blah, blah, blah. So how come they're different? They're different because they express different proteins. Proteins are the workers of a cell. They are the ones that actually make things happen. And the way they're expressed is that the information is encoded in the long-term information storage of the cell, which is the DNA, and it gets extracted: you transcribe the DNA, which means you make mRNA. So this is cheating. And then from mRNA, you translate the mRNA and you make proteins, okay? But if this is just continuously happening, which it is, how do we get different cells? That's because different proteins are expressed in different cells, and that's because this process is regulated. So upstream of the gene there's a site, which is called a promoter, and some proteins, which are called transcription factors, bind to it and just tell this gene: you're going to be expressed, or you're not going to be expressed. So that's the simple story. And then the story can get very complicated, but the bottom line is that there's some sort of regulation. So let me give you a concrete example of regulation before we go into dealing with small numbers, noise and everything else. And this is a classic example, which is called the lac operon. People often say this is the hydrogen atom of biology, and whether you want to think of it that way or not, I'm guessing Matt Scott will actually bore you to death with the lac operon in weeks three and four. But who's heard of the lac operon? Okay. So that's a small number. So the basic thing is, this is something that bacteria have. Okay. 
But people like to study bacteria, physicists like to study bacteria, because you can do things to them and get quantitative numbers. Okay. They're easy to deal with in labs. And a lot of things that are true for us are true for bacteria. So bacteria, like us, like to eat. And bacteria's favorite food source is called glucose. And actually it's your favorite food source too, right? It's what makes, you know, your sweets sweet and all that. It's the basic sugar. It's the sugar where, when you take it in, you're happy; you don't need to do any work. And bacteria are the same. They take glucose in and it's instant happiness. Okay. But just like us, bacteria don't always get what they want. And there are different sugars out there, all of them way more complicated than glucose. One of them is called lactose. There are other ones. But the problem with these other sugars is that when the bacterium takes one in, to be happy, to use it, to get energy out of it, it first needs to break it down into glucose. Okay. It needs to break it down. So if you remember anything from high school chemistry, you know, I actually couldn't draw either glucose or lactose at this moment. This one is more complicated. So there's a bacterium swimming around. And as long as it has glucose, it's going to eat it. But if it doesn't find glucose, you know, it's not going to die. It's going to eat lactose. So the lac operon is basically a set of genes that it turns on when it figures out: I don't have any glucose in the environment, I only have lactose. So this is the set of rules: a bacterium will express these genes only when glucose is low and lactose is high, only in this one condition. In all the other conditions, even when glucose is low but there's no lactose, it will still not express them. Okay. So it has this switch. It has this machine. 
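The rule just stated (express only when glucose is low and lactose is high) is a two-input logic gate. A minimal sketch, with a hypothetical function name `lac_operon_expressed` and booleans standing in for "high" and "low" concentrations, which is of course a simplification of a graded response:

```python
def lac_operon_expressed(glucose_present: bool, lactose_present: bool) -> bool:
    """Idealized on/off rule from the lecture: the lac genes are expressed
    only when there is lactose to eat AND no glucose available."""
    return lactose_present and not glucose_present

# Print the full truth table of the switch.
for glucose in (False, True):
    for lactose in (False, True):
        state = "ON" if lac_operon_expressed(glucose, lactose) else "off"
        print(f"glucose={glucose!s:5}  lactose={lactose!s:5}  ->  operon {state}")
```

Only one of the four input combinations turns the operon on, which is the switch behavior described above.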
And so what these genes do, they have names, and what they do is they first bring in lactose from the outside. One of them pumps it in, and the other one actually breaks it up. And there's a third one that does something more complicated. Okay. So there are these genes that actually make the bacterium able to eat lactose, but they'll only be produced when lactose is there, because otherwise it's useless. Right? It's a waste of these proteins, and they take energy to produce. So it's a switch. So how does this switch work? This switch works in two ways. First of all, when lactose is there, sorry, before that, let's forget about lactose. Normally this set of genes is repressed. There's a protein called the lac repressor, which is coded in a gene which is upstream of this, and the repressor physically binds to the site at the beginning of this set of genes. This set of genes is called an operon, and the repressor physically represses it. Okay. And then this set of genes cannot be expressed. The way genes are expressed is using something called RNA polymerase, which is a machine that reads out DNA. And what the repressor does is it binds in the place where the polymerase would bind, and then the polymerase cannot bind. It physically occludes it. Okay. So this is the repressor. The repressor represses this set of genes. However, when lactose is present in the environment, if you find lactose, the lactose represses the repressor. Okay. So minus times minus gives us a plus. Repressing a repressor unrepresses the operon, and now these genes can be transcribed. However, that's not the full story. That's not enough. They can be, but they won't be until they get an additional signal saying, go for it. And that's called an activator. It's called CRP, which is a molecule that will only bind and activate this set of genes if it sees a molecule called, say, cAMP. The name doesn't matter, but this molecule is only produced when there's no glucose. 
So when there's glucose, it represses the production of this molecule. Okay. So then this is repressed, so there's no plus sign. Okay. So this is the plus information, this is the minus. So only if there is lactose to unrepress it, and there is no glucose so that it gets activated, will you get expression. And it's all encoded in these molecular binding reactions. Okay. And now what's interesting about this system is that, for example, the lac repressor and all of these molecules are there in very small numbers. These molecules that tell genes to do something, to be expressed or to be repressed, they're called transcription factors, because they control transcription and they're a factor. They're just proteins. They tell genes to do something. And they're usually present in the cell at about one to ten molecules. Okay. So really small numbers. What happens here is, I said mRNA gets produced. This is, again, a few copies of mRNA per cell. And then you have one, two, a few copies of DNA. You can have a few copies of DNA in bacteria because a bacterial chromosome is circular. The gene of interest is here. When it starts replicating, the rest of the chromosome can still be replicating. You know, it's making two loops. You'll have two copies of a gene. And before it finishes, it can start to replicate again. But generally, you can think of it as of order one. So the numbers are small. Okay. Okay. What is IPTG? IPTG is just a synthetic version of lactose. It's just a molecule that's used in the lab that has the same form as lactose. So this is sort of a historical aside. The lac operon was discovered in the 1960s in Paris by Jacob and Monod and Lwoff and others. And they figured this out just by looking at how bacteria grow on different sugar sources, right? So it's like the experiment you will go through here, with you as the bacteria, for the next four weeks. 
The cafeteria will give you different food sources, and very quickly you will figure out what you like and what you don't like. And so that's what they did. They just saw how quickly the bacteria grow, and they were able to figure it out by the logic of it. It's really, you know, sort of the power of the mind in use. It's a very beautiful experiment. But even now, people are still studying this to understand it, not just as the story that I told you, but actually using numbers. And I'm not very good with this thing. So you have many more tools. You can put proteins that fluoresce into cells, and then you see signals, and then you can actually measure how much signal comes from one protein. There are ways of doing that. And so I'd just like to tell you a quick story that hopefully will inspire you to think about things like cells quantitatively. And this is a story from Terry Hwa's lab in San Diego, where basically he said: well, this is really the hydrogen atom of biology, and I'm a physicist, so I know that for hydrogen atoms we can calculate relativistic corrections and all things like that. I know the numbers have to add up. Well, then they should also add up in biology. So I'm going to put in different concentrations of this synthetic lactose and different concentrations of this cAMP, which is the thing that activates it, and at different concentrations I should see different levels of expression of this gene. And so he went in and he measured it. Well, actually a student, Tom Kuhlman, who started out as a string theorist, went in and measured it. And then they did the theoretical calculation for the same thing, and when they compared the two, how much expression should increase, they saw that in the theory they should have had a hundred-fold increase here and in the experiment they only had three-fold, and here they should have had a thousand-fold and they only had ten-fold. So basically it didn't add up. 
So he said: okay, so this system that's been studied since the 1960s, and every biologist in the world tells me it's boring and we understand everything about it, we actually don't understand anything about it, because the numbers don't add up. And then he, well, poor Tom went in and built many, many mutants, fiddled with many things in these cells, and finally got agreement between theory and experiment. And he understood in fact that this synthetic IPTG molecule is not the same thing as lactose, because it's pumped into the cell at a different rate. He understood that looping of DNA is another important feature, which actually represses things much more than just having the repressor, and he figured out a few other things like that. But the bottom line is that if you set your mind to it, biology, or the living world, follows the same rules as a semiconductor, as a hydrogen atom, as anything else, and if you really want to say you understand the system, the numbers have to add up, okay? So with that, let's do the numbers. So maybe, no, maybe before we do the numbers, one more experiment. So I said small numbers, and I sort of motivated where the small numbers come from, but do they actually matter in cells? So this is an experiment that comes from Mike Elowitz's lab. Well, from Mike Elowitz a long time ago, so this is 2002, a very long time ago, I guess you were all in primary school or something. What Mike said is: okay, if there really is this noise, and all these physicists are getting excited about noise, and he was also a physicist, then we should see it in cells. And he did a very simple experiment in E. coli where he took two colors. So let me put this up: this is what an E. coli chromosome looks like, this is the DNA in E. coli, it's circular, okay? And he put in two fluorescent probes. 
So he basically puts in the DNA that codes for a protein such that, when it's expressed, one protein will light up in red and the other one will light up in green, okay? And he puts them equidistant from the origin of replication, which is where the E. coli chromosome starts to replicate, and he puts them under exactly the same control; everything's the same. So he basically builds, on one DNA, a system that is as controlled as possible, in which the two colors should be doing exactly the same thing. And he says: well, if they are going to do exactly the same thing as a function of time, if I mix red and green, we learn when we're about three years old that when you mix red and green you get yellow, right? But if they're not doing exactly the same thing as a function of time, then I'm going to get an ensemble of colors, right? I'm going to get a set of different colors, because I'm going to have some cells that have more red and some cells that have more green and everything in between. So I'll get a rainbow. This is what he got in the experiment, okay? He gets exactly this thing. So although this is as controlled a situation as he can hope for, what you see in the cell is that each of these two promoters does a different thing, okay? In exactly the same soup. So that eliminates any environmental contributions. So then he can actually quantify the noise. And he did that. He wrote down some Langevin equations. But the basic idea is that if there's some external signal that they're both responding to, and the red one responds with some noise eta 1 and the green one with eta 2, then if he plots the red versus the green, the noise that is common to them will make their expression change in the same way, right? If there's some signal from the outside saying, oh, you should express more, if it's really upstream, it'll make both X and Y express more, right? And if it's less, then less. 
But if it's something that's specific to the red or the green, then it'll change in this direction, right? Everybody agrees? Yeah? There can be differences from the... due to the fact that there might be a difference between the two. Yeah, so that's exactly why they put it this way. They put it at exactly the same positions. But the same distance from... Yeah. But they can be of different lengths. They don't seem... They are. They're exactly, you know, they're very similar genes. You know, the... Okay, I'm not an experimentalist. I can't exactly tell you what's the difference between the GFP and RFP that they used. Because it actually, it wasn't red and green. But there is very... There's not a lot of noise from that. You can also ask, okay, it actually takes some time for these proteins to fold and fluoresce. You could say there's some difference. But it's not that and we'll see in a second line. That's a good idea. Be sure that they're drawing the rest of the slide. Okay. So bacteria have circular DNA. We don't, right? We have DNA that is elongated and then folds in the way. And things are expressed on the surface? Yeah, so it's still double-stranded DNA, just with the helix and everything. And then they're just expressed. DNA opens up in a play, in a certain place and then they're expressed, right? So, but because basically in lab conditions, E. coli replicates all the time. So they wanted to eliminate problems with that. When it replicates, then you start having, instead of having the two strands, you have four strands at a given time. So they wanted to deal with maybe having too many copies. So they wanted to say, if I'm going to have four strands of the red one, at the same time, I'll probably have four strands of the green one. So that's why they wanted them equidistant. That's the basic idea. Okay. I'll also add, this is not precision science. This is not a quantum optics experiment, right? So, you know, roughly is the key word here. 
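The two-color picture (fluctuations common to both reporters move a cell along the diagonal of the red-versus-green plot, reporter-specific fluctuations move it across the diagonal) can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the analysis from the experiment itself: the shared signal is taken to be a uniform random factor per cell, each reporter count is Poisson given that factor, and the two estimators are the standard dual-reporter formulas, usually called intrinsic and extrinsic noise. The function names are hypothetical.

```python
import math
import random

def poisson_sample(lam, rng):
    """Knuth's method for one Poisson(lam) draw (fine for lam up to a few hundred)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def dual_reporter_noise(mean_copies, n_cells=4000, seed=1):
    """Simulate red (x) and green (y) counts in n_cells cells and return
    (reporter-specific noise squared, shared noise squared)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_cells):
        shared = rng.uniform(0.7, 1.3)        # assumed upstream signal, same for both colors
        lam = mean_copies * shared
        xs.append(poisson_sample(lam, rng))   # each reporter adds its own small-number noise
        ys.append(poisson_sample(lam, rng))
    mx, my = sum(xs) / n_cells, sum(ys) / n_cells
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n_cells
    mean_sq_diff = sum((x - y) ** 2 for x, y in zip(xs, ys)) / n_cells
    eta_int2 = mean_sq_diff / (2 * mx * my)       # spread across the diagonal
    eta_ext2 = (mean_xy - mx * my) / (mx * my)    # correlated spread along the diagonal
    return eta_int2, eta_ext2

for n in (20, 200):
    eta_int2, eta_ext2 = dual_reporter_noise(n)
    print(f"mean copies {n:4d}: specific noise^2 * n = {eta_int2 * n:.2f}, "
          f"shared noise^2 = {eta_ext2:.3f}")
```

In this toy model the reporter-specific part shrinks in inverse proportion to the mean copy number, while the shared part does not depend on it.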
So, but, you know, basically that's why they will still get noise in both directions. But the question is how much noise in which direction? Okay. So, they saw noise in this direction too. That could be any external factors. But they also saw a lot of noise here. So they called this extrinsic noise, because it comes from the outside, and this intrinsic noise. And they saw quite a lot of it. Okay. So where does this intrinsic noise come from? And this is really to answer your question. What they looked at then is noise as a function of the rate of transcription. So they built different strains, so different types of bacteria, different colonies, where they expressed more or less of these red and these green proteins. And while the total noise and the extrinsic noise followed some pattern, the intrinsic noise, so this component, went as 1 over n. Okay. And the other thing they did, so this is the wild type, but they also used this lac operon thing and put in IPTG. So that means they unrepressed the repressor. They put these two colors under the control of this lac repressor, which typically represses things in the lac operon, and they unrepressed it, and then they got a gene that produced loads and loads of proteins. Okay. That was the point of that. And when they produced loads and loads of proteins, everything went yellow. So these two facts, the fact that when you suddenly produce a lot of proteins the intrinsic noise goes to zero, and the fact that it falls off as 1 over n, suggested that it comes from small numbers. This is just them fooling around with other things. Okay. So let's see where this expectation comes from, that it should go as 1 over n. So let me do a simplified version of this: we have our gene and we're producing proteins. Let's forget about mRNA. 
We produce proteins with a rate r, and let's forget about regulation for now. Let's do the simplest thing, and proteins can be degraded with a rate of 1 over tau. So there's a degradation time scale, which is tau. What I want to do is write down an equation for the probability of having G proteins at time t. So I'm going to call the number of my proteins G. Okay. So what I want to do is write down a master equation. Everybody knows what a master equation is? Okay. Now we have most heads nodding, so I'll assume that, and if you don't, you'll figure it out: it's just a bookkeeping equation for what goes in and what goes out for the probability. Okay, so this is the probability of having G proteins at a given time. So how can I make proteins? Right? I make them at a rate r, assuming that I have G minus 1 proteins. And, okay. And the other way I can get into the G state is if I have G... Sorry. If I have G plus 1, but I make one die. Okay? And since it's a bookkeeping equation, I also have to figure out how I can get out of this state. So is this okay with everybody? Right? This is a balance equation. These are the terms for how I get into the G state, and these are the terms of my bookkeeping for how I get out. Yeah? Sort of? Okay. Sorry, what is the... R is the production... Sorry, maybe I should put R here. R is the production rate. This is the degradation time scale. This is a birth-death process with a constant production rate. Everybody happy? I want to have G proteins. I can get into having G proteins because I had G minus 1 and I produced one at the rate R. Or I had G plus 1 and I killed one, each of them dying with rate 1 over tau. But I can also lose having G proteins by killing one, and then I go to G minus 1, or by producing one, and I go to G plus 1. Okay? All I'm doing is writing this in terms of equations. And now I'm going to solve it, and to solve it I'm going to introduce raising and lowering operators. 
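Written out, the bookkeeping just described corresponds to the following master equation. This is a reconstruction from the spoken description, since the blackboard is not part of the transcript; the lowercase $g$ for the protein number and the notation $P(g,t)$ are assumptions made here. Each of the $g$ proteins is degraded independently at rate $1/\tau$, which is why the loss terms carry the factors $g+1$ and $g$.

```latex
\[
\frac{dP(g,t)}{dt}
  = \underbrace{r\,P(g-1,t) + \frac{g+1}{\tau}\,P(g+1,t)}_{\text{ways into state } g}
  \;-\; \underbrace{\left(r + \frac{g}{\tau}\right) P(g,t)}_{\text{ways out of state } g}
\]
```

For $g = 0$ the term with $P(-1,t)$ is absent, and the degradation loss term vanishes on its own.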
So I'm going to introduce a lowering operator, which takes the probability distribution and decreases the state by 1, and I'm going to introduce a raising operator, which takes a state with G proteins and turns it into a state with G plus 1. Okay? Or, applied to any function, that's what it does. This is like in quantum mechanics. Okay? Think about these as A and A dagger. Okay? And you can write down similar commutation relations for these, and you can play the same games. The commutation relation is slightly different than in quantum mechanics, but there's a whole formalism. If you're interested you can go and play with it. Okay? But we're not going to do this here. We're going to do the baby version. Okay? So we're going to look at the steady state solution, and I'm going to just rewrite exactly the same equation using my operators, and not screw up. Okay? So far I haven't done anything. I've just replaced every G minus 1 by this, and every G plus 1 by that. But now I'm going to notice that in fact this term I can rewrite as this. Okay? Because if I first lower and then raise, I'm going to get a 1. So I rewrite this 1 as a combination of lowering and raising. And now I have these two things, and this is my equation, and if I want to solve it in steady state I just need to make sure that what's in this parenthesis is 0. So in other words, this gives me a recursion relation, which I can solve. Okay? Specifically, for example, R tau P naught is P1. Okay? So you can work your way up and solve this equation to get this. And then I need to normalize. It's a probability distribution, so I normalize by summing over the number of proteins, over G. So this needs to be equal to 1. I sum over G, so this is P naught times this thing, which is, sorry, which is an exponential, I'm jumping ahead, and this needs to be equal to 1, so P naught is e to the minus r tau. 
And so at the end of this, let me write it here maybe, I get that P of G is r tau to the G, over G factorial, times e to the minus r tau. Okay? So this is the answer, and you should be getting a warm and fuzzy feeling now. What is this? What's this distribution called? Poisson. Poisson, yeah. It's a Poisson distribution. Why do we love the Poisson distribution? What's so special about it? What's the one thing that's super simple about the Poisson distribution? The mean and variance. Yes, the mean is equal to the variance. Very good. So the reason we went through this, and the reason it's so important for all of this small-number noise stuff, is because this is a very, very powerful signature. As you were pointing out, there are a lot of things that can go wrong, there are a lot of differences that you can't control for. But if you do an experiment and you get your variance equal to the mean, that's something that's very easy to check. You're going to do an experiment, so it's going to be noisy, but variance over mean equal to one is still a very concrete thing to aim for. We just did an example where nothing interesting happens. This is basal, or sometimes called constitutive, gene regulation. This is the boring stuff. No, actually, well, maybe it should be called gene expression, because there is no regulation. There's nothing interesting happening here. So if you do an experiment and you get the variance equal to the mean, you know this is an unregulated gene. If you get something else, that means there's something interesting happening. There's some sort of regulation. So maybe one other thing to take home from this: we solved this equation this way. You can solve it in many different ways. You'll solve it in another way in the homework. I'll tell you in a second how. But the thing to remember is that this is a one-dimensional linear equation. Even if I add bells and whistles, a master equation is a linear equation. A one-dimensional equation in steady state you can always solve by recursion. By hook or by crook you can solve it. 
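A minimal numerical check of the two statements above: the steady state follows from the recursion P(g+1) = r tau P(g) / (g+1), and the resulting distribution has variance equal to the mean. The second function is a stochastic (Gillespie-type) simulation of the same birth-death process; it is an addition for illustration, not something done in the lecture. The parameter values (r = 10, tau = 1), the truncation at 200 proteins, and the function names are all assumptions.

```python
import math
import random

def steady_state_by_recursion(r, tau, g_max=200):
    """Solve the steady-state master equation by recursion, then normalize.
    Truncating at g_max is an approximation that is harmless for r*tau << g_max."""
    p = [1.0]                                   # unnormalized P(0)
    for g in range(g_max):
        p.append(p[-1] * r * tau / (g + 1))     # P(g+1) = r*tau/(g+1) * P(g)
    norm = sum(p)
    return [x / norm for x in p]

def gillespie_mean_var(r, tau, t_end, burn_in=10.0, seed=0):
    """Simulate production at rate r and per-molecule degradation at rate 1/tau;
    return the time-weighted mean and variance of the copy number."""
    rng = random.Random(seed)
    t, g = 0.0, 0
    weight = sum_g = sum_g2 = 0.0
    while t < t_end:
        total_rate = r + g / tau                # rate of any reaction happening
        dt = rng.expovariate(total_rate)        # waiting time to the next reaction
        if t >= burn_in:                        # skip the initial transient
            weight += dt
            sum_g += g * dt
            sum_g2 += g * g * dt
        t += dt
        if rng.random() * total_rate < r:
            g += 1                              # production event
        else:
            g -= 1                              # degradation event (only possible if g > 0)
    mean = sum_g / weight
    return mean, sum_g2 / weight - mean ** 2

r, tau = 10.0, 1.0
p = steady_state_by_recursion(r, tau)
mean = sum(g * pg for g, pg in enumerate(p))
var = sum(g * g * pg for g, pg in enumerate(p)) - mean ** 2
print(f"recursion : mean = {mean:.3f}, variance = {var:.3f}, P(0) = {p[0]:.3e}")
print(f"Poisson   : P(0) = exp(-r*tau) = {math.exp(-r * tau):.3e}")

sim_mean, sim_var = gillespie_mean_var(r, tau, t_end=5000.0)
print(f"simulation: mean = {sim_mean:.2f}, variance/mean = {sim_var / sim_mean:.2f}")
```

The recursion reproduces the Poisson weights exactly, and the simulated trajectory gives a variance-to-mean ratio close to one, which is the experimental signature discussed above.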
So even if it looks complicated... I need to erase. Yeah? Is that something similar to r, or is it G? Yeah, G. Yeah, G. I think so, because these two are different. Yeah. No, no, no, this is, I apologize for my handwriting, which will get worse. Yeah? If you want to make a break, I wasn't planning on making a break, but should I make a break? A five minute break? Yes. Yes. Yes. Okay. So let me tell you one thing and then we make a break, because this is something that you're going to do for homework, right? So a master equation, in general, I can write, you know, as I said, as a linear operator acting on some probability distribution. Another way to solve it is using a generating function. So who's seen generating functions before? Okay. Most of you have, but this is the kind of thing where some of you are earlier in your curriculum than others. So you're going to do a homework problem with, okay, so this is the definition of a generating function. It's like going to Fourier space, okay? It's just like taking the problem to Fourier space, and this is a complex variable, and so you're doing a series expansion. And what we're going to ask you to do in the homework is rewrite a slightly more complicated version of that in generating function space and solve it. And the reason generating functions are useful is that if you do solve it, then you can easily recover the probability distribution by, sorry, z, g, I have different notations in my notes, I have to be careful, by expanding your solution, which you'll find more easily, as a series, and it's the derivative with respect to z. Now with the generating function you go into z space, you forget about g. The idea is you get rid of g, and it so happens that in many cases it's just technically easier to solve in this space. It's like going to Fourier space, right? 
Sometimes it's easier to solve something in Fourier space, and probably, maybe this afternoon, we'll do an example of something that's easier to solve in Fourier space, and then we can go back. But this is the way to go back, so this is like the inverse Fourier transform. But the main thing about it, and this is why it's called the generating function, is that you can calculate the moments of this distribution. So moments are the mean, the variance, and other things that we never talk about. Formally there's a z to the l, but it really doesn't matter. So if you want to calculate the l-th moment, you do this, okay? And there's a normalization: the generating function taken at 1 corresponds to just plugging in a 1 here, so that gives you the normalization. Okay? G of z equals 0 is P of g equals 0. Okay, so this is just in case you've never seen it before. It's useful and it's cute. And break. So 5, 10 minutes, what's the norm? 5 minutes is okay. Okay, 5 minutes, which will probably bleed. So, can you read the blackboard? No, it's possible not to mention it. Okay, so in the break, why don't you take your chair and put it here? So now that we do the break, when do I stop? I stop at 1:45? Yeah, 2 hours long. Oh, I have until 2, okay. Okay, I thought, I think, formalism, okay? 
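For reference, the generating-function relations described before the break can be written compactly as follows. The notation is an assumption made here (the lecture itself notes a clash between the notes and the board): $g$ is the protein number, $z$ the conjugate variable, and $G(z)$ the generating function. The last line applies the definition to the Poisson steady state found earlier for the unregulated gene.

```latex
\[
\begin{aligned}
G(z) &= \sum_{g=0}^{\infty} z^{g}\, P(g)
  && \text{(definition)} \\
P(g) &= \frac{1}{g!} \left. \frac{\partial^{g} G}{\partial z^{g}} \right|_{z=0}
  && \text{(going back: series coefficients)} \\
\langle g^{\ell} \rangle &= \left. \left( z \frac{\partial}{\partial z} \right)^{\ell} G(z) \right|_{z=1}
  && \text{(moments)} \\
G(1) &= \sum_{g} P(g) = 1, \qquad G(0) = P(0)
  && \text{(normalization and empty state)} \\
G(z) &= \sum_{g=0}^{\infty} z^{g}\, \frac{(r\tau)^{g}}{g!}\, e^{-r\tau} = e^{\,r\tau (z-1)}
  && \text{(Poisson example)}
\end{aligned}
\]
```

As a check on the Poisson example, $\partial_z G|_{z=1} = r\tau$ gives the mean, and $(z\partial_z)^2 G|_{z=1} = r\tau + (r\tau)^2$ gives a variance of $r\tau$, equal to the mean.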
I have a review, I don't remember exactly from when, 2011 or 2012; it's on my web page and it's also on the arXiv. It's me, Andrew Mugler and Chris Wiggins, and it's called "Analytical methods for" something. And then there's a more recent review, I think it's also on the arXiv (this is not going to stick), with a collaborator, which goes into way more technical detail. The formalism is due to Doi. Okay, I keep on talking so that you settle down.
Okay, a question from the break: tell us more about the biology, because we're lost. So, this is called the central dogma of molecular biology: from DNA you make mRNA, and from mRNA you make protein. Long-term genetic information is encoded in the DNA and gets transcribed into the short-term information of mRNA. mRNA is short-lived; DNA is there forever, as long as you're alive, and you pass DNA on to your children, and it's the same thing for bacteria. And then proteins are the workers: anything that happens is mostly done by proteins. Every step is regulated.
Physically, the way regulation works is that you have the DNA there and, as I said, there's a machine called RNA polymerase. RNA polymerase comes and binds to a site which is called the promoter. This is just a big machine, also a protein, and it transcribes the gene. What that actually means: DNA is made out of nucleic acids, these letters A, C, G, T; when we sequence things we make lists of letters. What the polymerase does is take the sequence and copy it into essentially the same letters, with one difference, but it always does so faithfully. That's what mRNA is: an exact copy of one of the DNA strands. Then the mRNA has the information, and another machine, called the ribosome, comes and takes this information, which is encoded as nucleic acids, and makes it into amino acids, which are the building blocks of proteins. So a ribosome is something that takes a letter sequence, something like A, T, and so on. These are the letters we have; they stand for different nucleic acids. You have a sort of word like this, and the ribosome reads the letters out in groups of three. It takes this triplet and translates it into a specific amino acid, and this one into another one. If you want to know which one, it's on Wikipedia; I don't remember, I'm not a chemist or a biologist. That is what the ribosome does, and this is the mRNA. Now, for the ones who actually know something, I shouldn't cheat: in mRNA a T gets replaced by a U. That's the only difference; for all practical purposes it's the same molecule, just slightly different. So this step, RNA polymerase, is faithful, and this step requires reading: when the ribosome sees a triplet it goes out and finds the right amino acid, which is floating around in the cell, and puts it together with the others. A chain of amino acids is what we call a protein.
And regulation: another protein, a transcription factor, can come and physically bind. There are two modes, repression and activation; this is plus, this is minus. This picture is for an activator. A repressor will bind physically here, where the polymerase binds, and it'll make it impossible for the RNA polymerase to bind. It's just physical exclusion, typically. If the polymerase can't bind, the mRNA doesn't get made and the protein doesn't get made. An activator will come and bind somewhere close, on a site that's called an operator, and it'll basically change the free energy of this DNA-protein thing (complex is the word I'm looking for) and make it easier for the polymerase to bind. It'll probably induce some conformational change in practice, but we won't go into that. That gets soft-mattery; you can calculate it, you can do a lot of polymer physics here, but we won't go into it. Is that a better idea of how this regulation happens?
And then an inducer, that was cAMP in the example: it's a molecule that binds to an activator and physically changes the conformational state of the activator. The activator then becomes more willing to bind; again, it decreases the free energy of binding, and the activator is more likely to bind.
Just to motivate again what you'll be calculating in the homework: what actually happens, as I said, is that the DNA gets transcribed into mRNA and then the mRNA gets translated into proteins. When we did our master equation we ignored the mRNA step and just went from DNA to proteins. In reality you have this one mRNA, and this mRNA can produce many proteins, so it'll be translated many times, and each time it produces one protein. It'll hang around, depending on the species, from half an hour to even a few hours, and during that time it can be translated a few times. So if we want to build a model where we don't deal with the mRNA, because it doesn't really do anything (well, in some cases it gets regulated, but we won't talk about that; in many cases it's just an intermediate step), we can say that instead of going from one DNA to one protein you go to B proteins, which is called a burst of proteins.
And this is actually what you see in experiments. These are experiments from Sunney Xie's lab, where they built a very smart microfluidic device. Microfluidics is like miniature plumbing. They again use fluorescent markers. Why do people use fluorescent markers? Because fluorescent things make cells light up and they're easy to see: stick it under the microscope and even a theorist will see it's there. So they like things that fluoresce. They had this molecule in the cells that fluoresces, but the problem was that it would get pumped out of the cell very efficiently, because it wasn't a good molecule for the cells. Cells are smart things; they can get rid of things they don't like. So what they did is build this microfluidic device where they trapped the cells in mini chambers, so that the cells were pumping out this molecule but you could still see it, because it stayed in the chamber around the cell, like a fake cell. Then they saw how the fluorescence went up. This is fluorescence as a function of time, and you see that it doesn't go up in a linear way; it goes up piecewise. They figured out how much fluorescence comes from one protein molecule, and then, by taking the derivative of this in a smart way (because you should never take a derivative of experimental data directly), they saw that the number of proteins actually increased in steps. Nothing happened, and then suddenly they saw many proteins from one mRNA. This is what's called a burst, and this is how many proteins they saw being produced per burst, so how many proteins get produced at one time. You can see that it can go up to quite high numbers.
So then they asked themselves: is the distribution we're seeing consistent with a model (and now we're getting into the regulation part) where I produce frequently but only a few at a time, which is sort of what this cartoon is showing? In this case you can do the calculation and you'll get a distribution like this. Or is it consistent with a model where I produce very infrequently, so most of the time I don't actually produce anything, but when I do, I produce a lot? In this case you get a distribution like this. And so the data is consistent with this kind of distribution.
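The two scenarios compared here, frequent small bursts versus rare large bursts, can be illustrated with a toy stochastic simulation. This is a minimal sketch and not the analysis used in the experiments: it assumes bursts arrive at a constant rate, that each burst adds a geometrically distributed number of proteins (at least one), and that each protein is degraded at rate `gamma`. The function name `simulate_bursty` and all parameter values are made up for the illustration.

```python
import numpy as np

def simulate_bursty(burst_rate, mean_burst, gamma=1.0, t_end=2000.0, seed=0):
    """Gillespie simulation of bursty production with first-order degradation.

    Assumptions (not from the lecture): bursts arrive as a Poisson process with
    rate `burst_rate`; burst sizes are geometric on {1, 2, ...} with mean
    `mean_burst`; each protein is degraded independently at rate `gamma`.
    Returns the time-averaged mean protein number and Fano factor (variance / mean).
    """
    rng = np.random.default_rng(seed)
    t, g = 0.0, 0
    w = s1 = s2 = 0.0                      # elapsed time, time-weighted sums of g and g^2
    while t < t_end:
        total = burst_rate + gamma * g     # total rate of leaving the current state
        dt = rng.exponential(1.0 / total)  # waiting time until the next event
        w += dt
        s1 += g * dt
        s2 += g * g * dt
        t += dt
        if rng.random() < burst_rate / total:
            g += rng.geometric(1.0 / mean_burst)   # a burst: add B >= 1 proteins
        else:
            g -= 1                                  # one protein is degraded
    mean = s1 / w
    var = s2 / w - mean**2
    return mean, var / mean

# Same mean protein number (burst_rate * mean_burst / gamma = 20) in both cases.
mean_small, fano_small = simulate_bursty(burst_rate=10.0, mean_burst=2.0)
mean_large, fano_large = simulate_bursty(burst_rate=1.0, mean_burst=20.0)
print(f"frequent small bursts: mean = {mean_small:.1f}, Fano = {fano_small:.1f}")
print(f"rare large bursts:     mean = {mean_large:.1f}, Fano = {fano_large:.1f}")
```

Both parameter sets give about the same mean, but the rare-large-burst case has a much broader distribution; in this toy model the Fano factor is close to the mean burst size. That difference in the shape of the distribution is what lets the data discriminate between the two pictures.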
And this is another case; well, it's the same thing, but in yeast. So this is E. coli and this is yeast. The main thing they're saying is that what happens in cells is this intermittent behavior. Intermittency may be a word you know from other dynamical systems: nothing happens, and then suddenly you get a lot at once.
Yes? In the plateau? So what you're asking about is the reporter signal from a cell where mRNA has been produced. It was produced here, and this step is around 10, so you've produced 10 proteins from one mRNA. And now this mRNA, or any other mRNA in the cell, is not doing anything; it's not producing proteins. Based on this experiment you don't know whether the mRNA is dying or not, because there are probably a few in the cell, but they're not producing proteins. So what this tells you is that there's no continuous production of proteins. It's like traffic lights: the cars stop, then they all go, then they all stop.
These are called bursts, and they're called translational bursts because they come from the mRNA. You'll see a little bit about that in the homework.
Then there's another thing that can happen, and for that we'll go into more detail of the regulation. This is our gene that can be expressed, and this is our binding site. Let's draw two versions of this. We have a transcription factor, the thing that regulates our gene, and we're going to call its concentration c. It regulates our gene, and at the end the gene produces proteins, which we'll still call g, as before. So the picture is that having a concentration c of this protein gives us a certain number g. Again we produce the proteins, our g's, and then they can die. Let me modify the diagram: we're going to say that now we have two expression states. I have to make a decision here, and I don't remember what I chose for this example, and it does matter. I could talk about a repressor, but let's talk about an activator, because it's simpler. When the transcription factor binds, it binds with some rate $k_+(c)$ that depends on the concentration of the transcription factor, and then the gene finds itself in the activated state. That means it's going to produce proteins at an enhanced level. The transcription factor can also unbind, with a constant rate $k_-$. If the gene finds itself in the unbound state, without the activator bound, it can still produce a bit of protein, but at what's called the basal level, which you can think of as being close to zero. In reality biology is leaky, so it's very rare that there's really no production, but what this means is that there'll be something like one protein produced per lifetime of the cell. E. coli lives for about 30 minutes, for example.
So now we have a semi-continuous variable g, but we also have a spin-like variable, which tells you whether the transcription factor is bound or unbound. To describe the probability of this system I need to write it like this: $P^0(g)$ is the basal state and $P^1(g)$ is the activated state. I'm going to put these indices up here; this is the gene-state index. I could put them down here too, it's just notation, don't be confused by that, there's no meaning to it. Activated means the binding site is occupied, and basal means the binding site is not occupied.
Now we're going to write down a master equation again, this time for two states, the basal and the enhanced state. Let's start with the basal. Proteins are produced and they die just as before, so nothing changes here; I just need to put on the indices for the state. What changes is that this state can now change in two ways: either because the protein number changes, or because the gene expression state changes. You come into the basal state, with the binding site unoccupied, if you unbind a transcription factor and you were initially in the bound state, and you go out of it by binding a transcription factor if you're in this state. This is unbinding; remember, we're assuming unbinding is constant, it's just a number, set by some free energy difference. Then I have to do the same thing for the other state. It's the same thing as before the break, I just added the binding state. The production rate is different, so this term has the same form but with a different rate. Nothing changes in degradation, because given that there's a protein, it's going to die the same way. And what changes here is that unbinding now takes me out of this state and binding takes me into it, so I have to change the signs. These two equations are coupled; it's a set of equations. Normalization is now given by summing over both the gene states and the number of proteins.
Again, you can spend a very long time solving this. We're just going to do something very simple. I'm going to define the probability of the binding site being occupied: I don't care about the protein number, so I'm just going to sum over g. Because of normalization, $N_0$ and $N_1$ have to sum to one. What I'm going to do now (colored chalk? no colored chalk) is take this master equation, I mean both of them, and sum over g. Let me take the first one. On the left-hand side, if you sum over g, by definition you get the time derivative of $N_0$. Then if I sum these production and degradation terms over g, you can verify this, you get zero. This is just the birth-death process: if you forget about the indices and sum over g, by definition you have to get zero, because a master equation conserves probability. That held true when I didn't have the indices, and since none of the new terms interfere with it, when I sum over g it goes away. If you don't see it, just do it after the lecture. So we're left with the binding and unbinding terms, and those are easy. I have $k_-$ times $P^1(g)$; I sum over g and that's the definition of $N_1$. And I have $k_+(c)$, which also doesn't depend on g, times $P^0(g)$, which just gives $N_0$. Let me get rid of $N_0$: I know $N_0 = 1 - N_1$. That gives me $dN_1/dt = -(k_- + k_+(c))\,N_1 + k_+(c)$ (sorry, this is a minus, thank you). So if I want to solve this in steady state, I solve for $N_1$ and I get $N_1 = k_+(c)/(k_- + k_+(c))$, which is the answer that I wanted.
Now it so happens, a bit of biological detail, that $k_+(c)$ often has a form like this. It means the occupancy is a sigmoidal function: the more transcription factor you have, the more binding you get, and then there's some saturation level. I can plug this in and divide by $k_+$. This thing is called a binding constant, an equilibrium constant, and it's the thing that is related to the free energy of binding. (Now it would be good to have chalk.) I'll draw it: the production of the gene takes on this sigmoidal form, which is like a Fermi function; you can rewrite it in Fermi form too. If this is maximum expression and this is half-maximum expression, the concentration here is what we call the equilibrium constant. And the slope of this curve is given by this coefficient h: the steeper it is, the faster you go between the two states as the concentration of the transcription factor changes.
The transcription factor is the regulator of the process; it's the protein that binds and makes you switch between these two states. Is c a constant? It's a number that can change, but it's not changed by any other parameter, so yes, it's a constant number in the model. It's not a variable of the master equation, but you can change it and see how it influences the process.
Okay, so this is one way of deriving this. Can I erase this part of the board? Now I'd like to connect this with something that you may be more familiar with. Has everybody done the adsorption problem in statistical mechanics, where you have lattice sites and you bind molecules? Okay, again we're getting half and half. Who's done all the problems in Kubo? No?
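The claim that the production and degradation terms drop out when summing over g, leaving $N_1 = k_+(c)/(k_- + k_+(c))$, can be checked by solving the two-state master equation numerically. The sketch below rests on several assumptions that are not in the lecture: the protein number is truncated at `g_max`, the degradation rate `gamma` is set to 1, and the binding rate is taken to be $k_+(c) = k_\mathrm{on}\, c^h$ as one possible form that produces the sigmoidal curve. All function and parameter names are hypothetical.

```python
import numpy as np

def hill_rate(c, k_on, h):
    """Assumed binding rate k_+(c) = k_on * c**h (illustrative choice, not from the lecture)."""
    return k_on * c**h

def occupancy(c, k_on, k_off, h):
    """Steady-state occupancy N_1 = k_+(c) / (k_- + k_+(c))."""
    kp = hill_rate(c, k_on, h)
    return kp / (k_off + kp)

def steady_state(r0, r1, kp, k_off, gamma=1.0, g_max=80):
    """Steady state P[s, g] of the two-state master equation, truncated at g_max.

    s = 0 is the basal state (production rate r0), s = 1 the activated state
    (production rate r1); kp is the binding rate and k_off the unbinding rate.
    """
    n = g_max + 1
    A = np.zeros((2 * n, 2 * n))           # generator matrix: dP/dt = A @ P
    for s, r in enumerate((r0, r1)):
        for g in range(n):
            i = s * n + g
            if g < g_max:                   # production g -> g + 1
                A[i + 1, i] += r
                A[i, i] -= r
            if g > 0:                       # degradation g -> g - 1
                A[i - 1, i] += gamma * g
                A[i, i] -= gamma * g
            j = (1 - s) * n + g             # same g, other gene state
            k = kp if s == 0 else k_off     # binding leaves s = 0, unbinding leaves s = 1
            A[j, i] += k
            A[i, i] -= k
    A[-1, :] = 1.0                          # replace one equation by normalization
    b = np.zeros(2 * n)
    b[-1] = 1.0
    return np.linalg.solve(A, b).reshape(2, n)

k_on, k_off, h = 0.5, 2.0, 2
c = 3.0
P = steady_state(r0=0.1, r1=10.0, kp=hill_rate(c, k_on, h), k_off=k_off)
n1_numeric = P[1].sum()                     # N_1 = sum over g of P^1(g)
n1_formula = occupancy(c, k_on, k_off, h)
K = (k_off / k_on) ** (1.0 / h)             # concentration of half-maximal occupancy
half = occupancy(K, k_on, k_off, h)
print(f"N1 from master equation = {n1_numeric:.6f}, from formula = {n1_formula:.6f}")
print(f"occupancy at c = K: {half:.3f}")
```

Changing `r0` or `r1` leaves `n1_numeric` unchanged, which is the numerical counterpart of the statement that the birth-death terms sum to zero. With the assumed form of $k_+(c)$, the half-maximum point sits at $K = (k_-/k_\mathrm{on})^{1/h}$.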
So we'll do that, and you'll see how this is connected to the adsorption problem. We can think of a cell, so let's draw a cell. It's a physicist's vision of a cell: it's a lattice, a lattice with binding sites. We have transcription factors, again with concentration c, and these transcription factors can bind on any of these lattice sites. But we're going to say there is one special binding site (do I have the brown chalk? okay, let me do this one). Let me draw my cell again; of course it's an irreproducible cell, except for my special binding site. So I can have two situations.
Right, so this is maybe getting a bit... yes, very good. Partly I don't want to get into this, because I think Matt will talk a lot about it, but what this means is that I need some sort of cooperativity or nonlinearity. This basically introduces a nonlinearity in the problem, unless h is equal to 1. Where does it come from? The simplest case is h equal to 2: that means two proteins need to come together and form a dimer (it's going to look ugly), and only as a dimer can they bind and act as a transcription factor. This is true for very many proteins. For example the Lac repressor we mentioned: that's a dimer that binds and then forms a loop and forms a tetramer, so we get a much higher effective Hill coefficient. Looping will also increase the Hill coefficient. The important thing, which I should have emphasized, is that we usually have steep forms in the cell, so we have a lot of nonlinearity. So far we've shown the stochasticity and we've shown this nonlinearity in the system, and we haven't really gotten very far, right?
Yes, the production. $N_1$ and the production of proteins: you can solve for the mean of g, which is the sum over g of $g\,P(g)$, and you'll find it's equal to $r_1 N_1 + r_0 N_0$. So if $r_0$ is zero or negligible, it essentially goes as $r_1 N_1$. Very good. As I said, you're doing a very good job of keeping me in check and not letting me cheat too much, so please continue. I've decided I want to show you some things but not the full picture, but I'm happy to fill in the gaps.
So we're going to take a different view of this cell. This is a cell, a physicist's vision of the cell, and a transcription factor can bind in many places. I've been telling you that transcription factors go to the DNA and regulate it, but one question you can ask is: how does a transcription factor know to go to the right place? It's a molecule, and there's tons of DNA, and it could be anywhere else. Even if we say that a protein can only bind to DNA, which isn't true, how will it find the right bit of DNA for itself? You acknowledge this is a problem? Yes. Actually, the way it works is that it doesn't work. A protein is a protein: it tries to bind to different places, and before it binds to the right place it can bind to the wrong places. There are many different places in the cell that it can bind to. If it binds to the wrong place we call it non-specific binding, and then there's the right place that it has to find in order to do its job of regulating something. The way it's actually done is that it does have an energetic preference: binding to the right site is much stronger than binding to any of the many other sites. But there are many, many other sites, so there's an energy-entropy balance: there's the entropic factor of binding to all the wrong sites. So this is the picture. We can have two situations: one where the transcription factor did bind to the right site, which we'll call bound (bound to the promoter, now that you're okay with the word promoter), and here we'll say not bound to the promoter.
Now I'm going to look at this problem from a different perspective. I'm going to say that in general I have $\Omega$ places where it can sit, so $\Omega$ binding sites, by which I mean the non-specific ones, and I have one specific binding site. And I have L transcription factor molecules, so that L divided by the volume of the cell is the concentration. In this situation, the energy of L molecules bound to the non-specific sites I'm going to call $L\varepsilon_{ns}$, and in this situation I have $L-1$ bound non-specifically plus one bound specifically with an energy $\varepsilon_b$, where $\varepsilon_b$ is much more favorable than $\varepsilon_{ns}$.
How many ways of binding non-specifically do I have? I've given you all the elements you need. It's a combinatorial factor: everybody agrees that if I have $\Omega$ binding sites and L transcription factors I can distribute them in $\Omega!/[L!\,(\Omega-L)!]$ ways, and in this case I have one less to distribute. I'm going to assume that $\Omega$ is much larger than L, which makes sense: I have fewer transcription factors than sites. In that limit I can approximate $\Omega!/(\Omega-L)!$ as $\Omega^L$ (again, Stirling-expand and you can check this). That means this becomes $\Omega^L/L!$ and this becomes $\Omega^{L-1}/(L-1)!$.
Now I'm going to calculate the probability of being bound to the specific site, like we do in thermodynamics: I need the weight of the bound state over all the weights in the problem. Everybody agrees? Basic thermodynamics: I want the probability of a given state, assuming everything's in equilibrium, so I take the weight of that state divided by the total weight. This should give me $N_1$ from what we had before. So let's do this. What is the weight of the bound state? I have $e^{-\beta\varepsilon_b}$, I have $e^{-\beta(L-1)\varepsilon_{ns}}$, and I have my $\Omega^{L-1}/(L-1)!$. I repeat this in the denominator, and I have to add to it the weight of the non-bound state, $(\Omega^L/L!)\,e^{-\beta L\varepsilon_{ns}}$. (I was supposed to not write too low, sorry about that, but I think you can fill this one in by yourself.) Now I divide everything by this factor (I have to go somewhere; let's see, there should be enough space). Then I get $p_{\rm bound}$, which I said is $N_1$, to be $p_{\rm bound} = \frac{(L/\Omega)\,e^{-\beta\Delta\varepsilon}}{1 + (L/\Omega)\,e^{-\beta\Delta\varepsilon}}$, where $\Delta\varepsilon = \varepsilon_b - \varepsilon_{ns}$.
I'm nearly there; I just have to deal with $L/\Omega$. As I said, the concentration is just the number of transcription factors over the cell volume. So I'm going to introduce a characteristic size for how big a cell is, and that allows me to rewrite this in terms of the concentration relative to a typical concentration of the cell. If you're worried by this I'll spell it out: c is L over the volume, like we said, and the volume of the cell is the number of binding sites times the volume of a box around each binding site. Then $c_0$ is the concentration of one molecule in this box, and $L/\Omega$ becomes $c/c_0$. So at the end of the day we get $N_{\rm bound} = \frac{(c/c_0)\,e^{-\beta\Delta\varepsilon}}{1 + (c/c_0)\,e^{-\beta\Delta\varepsilon}}$. This is the way that makes sense, but if I want it to look exactly the same as before, I write $N_{\rm bound} = \frac{c}{c_0\,e^{\beta\Delta\varepsilon} + c}$.
So you see that we get back the same thing as here, relating this binding energy to the ratio of binding and unbinding rates. We don't have the Hill coefficient because we didn't assume it this time, so it's a special case. But the point of this is that this is a purely thermodynamic argument: it assumes that binding and unbinding are in equilibrium. What this tells you is that the earlier expression is a thermodynamic expression too; it assumes equilibrium of binding. What's the case when you don't have equilibrium of binding? When you consume energy: when there's some additional source of energy, this can change. And the other important thing is that this quantity has units of concentration; I should say that here.
So that's the thermodynamic part, the binding part. I have 15 minutes. I'll start by erasing; can I erase my cells, my square cells? Yes, you have a question? My Iranian isn't very good, so you have to ask in English. Go ahead; I mean, I know everything I'm going to tell you for the next five hours. Okay. If you have questions, don't hesitate to ask later.
As I said, it's a problem that probably most of you have seen on an exam or in homework before: the adsorption problem, binding to a lattice. That's exactly what we did, with just a slightly different interpretation. Yes, this is related. We can continue rewriting this: we can also write $p_{\rm bound} = c/(K_d + c)$, and then $K_d$ specifically is $c_0\,e^{\beta\Delta\varepsilon}$. That's $\mu_0$? Yes, this $c_0$ plays the role of the chemical potential; that's another way to look at it, that's good.
It's true that the combinatorial factors... well, no, the combinatorial factors are always true, but I made the assumption that the number of sites is larger than the number of ligands. That is the most reasonable assumption, because you can bind anywhere in the cell. Even if you say I can only bind to DNA, if you think about the length of DNA, if you wanted to
fill every place you could bind on the DNA you can do an estimate you can take the size of DNA the size of a protein see how many proteins fit on the DNA and ask how many proteins would I need in a cell to occupy every binding site okay I think the protein volume would burst you wouldn't have the space in the other direction to fit them in and remember the cell is a crowded environment like the cell isn't you know a room filled with air there's stuff everywhere and this stuff is sticky yeah yeah yeah so the equilibrium constant so if you were a chemist you'd have no basically okay this is unbinding to binding rate or the binding to unbinding then it depends whether you're a chemist or biologist which way you define it but this ratio gives you the equilibrium constant of binding and the equilibrium constant of binding is related to the free energy of binding which is given by the difference between the specific and the nonspecific binding yeah no it does no you have to have a you have to have this has to be favorable because you have an entropic problem with everything else and you can look so people have looked at the distribution of energies for specific, for one transcription factor what's the distribution of energies to where it binds and you see a peak for specific binding and then you see nothing a huge tail of nonspecific binding at higher energies so you see this one low energy solution I mean you know there is some variants in it but then nothing you really and this has been measured and it's definitely like that and nonspecific binding is a phenomenon and it does influence even regulation that if you have small concentrations okay this like opens up a whole can of forms about how do as a transcription factor how do I find my binding site so there's an interesting question of so what the proteins do they diffuse so there's the question of 3D versus 1D diffusion you can do a calculation and maybe you'll do like a version of this later that if you were to 
just look for a binding site by 3D diffusion it would take a very long time so and people have suggested and then they went in and measured that what most transcription factors do at least in bacteria is they go to the DNA they diffuse 1D along the DNA they unbind diffuse 3D bind to a different part 1D and they do this mixture of 3D and 1D diffusion okay so but this is a question and then you think about diffusion in a crowded environment because that's what the cell is so is it easier to diffuse in a crowded environment or not easier the answer actually is it depends but you know so here the answer is that this 1D 3D thing actually helps you a lot it helps you as an orders of magnitude to find the binding site but these are all questions of sort of the from the life of a cell okay quick story before we we end this is another experiment from from Sanishi and it also goes into this case of noise and what he did is he did well again some poor postdoc of his called Yuichi Taniguchi they looked at all the proteins that are expressed in E. coli okay those are the pink ones and they they measured the distributions they measured the mean number of proteins and you know the way the histogram of the mean number of proteins so in how many cells did they see this mean number of proteins so these are all and the blue ones are the essential essential ones means that if you get rid of this protein that means you knock out the gene E. 
coli no longer is able to produce this protein it will die it will surely die that's what essential means okay and then they looked at the noise so this is the variance by the mean squared as a function of the mean protein number for for all of these proteins and what they saw is that the proteins that are expressed in high numbers they they reach this plateau of noise but the proteins that are expressed in small numbers they have larger noise and this noise goes as one over the mean okay this is variance over mean squared remember we said that variance over mean goes as one so variance over mean squared is one divided by the mean and so that goes as one over N and that's exactly what they see so again they see this small number noise in all of these different proteins that are expressed at low levels so then they went and looked at mRNA because they you can do experiments this are called fish experiments so now instead of looking at fluorescent proteins you take your gene and you engineer it that you add some some new base pest to the end of the gene so when your gene is made into mRNA you made the mRNA that was encoded in this gene but you also make some additional mRNA and this mRNA is like has these sticky loops that bind fluorescence again proteins okay so when the mRNA is expressed it expresses itself and then a bunch of loops that then bind fluorophores things that will fluoresce so when you see the mRNA being expressed then you see it also bright up under your microscope it's called fish fluorescent in C2 hybridization and so fish is a very powerful technique because it allows you to look at mRNA directly and so you see the mean protein number and the mean mRNA so they correlated the mean protein number with the mean mRNA number and they see there's a strong correlation okay and then some of them sorry all of this they actually got from a different technique a high more high-fruits technique called RNA-seq which uses sequencing but they verify the few of 
those by FISH. Anyway, they see a correlation between the number of mRNA and the number of proteins when they look at the mean number. The Fano factor, so this is the variance over the mean, for mRNA, they see that it's usually larger than one, so that means we're seeing these bursts in mRNA expression. But okay, let's concentrate on this. They see it's at the same level. The thing is that when you do these experiments, in a way you kill the cells, so you're doing something similar to measuring the steady-state distribution. So what they're correlating here is from different cells, okay? Not exactly the same cells: the same type of cell, but different cells.
And then what they did is they went in and built a construct so that in each cell they were able to measure, at the same time, the number of proteins and the number of mRNA. And so this is protein number on this axis, mRNA on this axis, from the same cell, and then you see there's absolutely no correlation. This is the correlation coefficient, okay? So what's going on here? They were seeing correlations between proteins and mRNA, which made sense, because the more mRNA you have, the more proteins you should have, because that's what proteins come from. But in the same cell there's absolutely no correlation. So the answer is that it takes time. You have your mRNA, and this mRNA produces proteins, but these proteins will not be seen at the same moment you see the mRNA from which they're produced, right? It's as if somebody took a picture of you and of your parents, each aged 15, and asked, do you look similar? Right, you'd say maybe, maybe you don't, but let's say you do look like one of your parents, you'd say yes, we look similar. And now they take a picture of your parents now, aged 50, and of you now, aged 25 or whatever, and ask, do they look similar in these pictures? And probably most people would say no, because, you know, your father has put on some weight or something, right, and you haven't. So it's the same thing
here. If you look at exactly the same time, you're seeing, you know, the protein that was produced from the past mRNA, and the mRNA that will produce the future protein. So, no correlation. So this is just to make the point that timing is also important, and actually time delay coupled to this nonlinearity also gives us interesting effects in cells, but we won't talk about it. So I think it's lunchtime. With that, we'll continue a bit on this and start on evolution after the break.
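For anyone who wants to play with the timing argument at home, here is a minimal sketch, not the analysis from the experiment. It assumes the simplest two-stage model: mRNA is made at a constant rate `k_m` and decays at rate `g_m`, each mRNA makes protein at rate `k_p`, and protein is lost at rate `g_p`. All four rate values, the function names, and the choice of a protein lifetime 50 times longer than the mRNA lifetime are illustrative assumptions. There are no bursts in this model. Each "cell" is simulated with the Gillespie algorithm and then read out once, like a snapshot.

```python
import random


def simulate_cell(k_m, k_p, g_m, g_p, t_end, rng):
    """Gillespie simulation of one cell; returns (mRNA, protein) counts at t_end."""
    m, p, t = 0, 0, 0.0
    while True:
        # Reaction rates: make mRNA, lose mRNA, make protein, lose protein.
        rates = (k_m, g_m * m, k_p * m, g_p * p)
        total = sum(rates)
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            return m, p
        r = rng.random() * total     # pick which reaction fires
        if r < rates[0]:
            m += 1
        elif r < rates[0] + rates[1]:
            m -= 1
        elif r < rates[0] + rates[1] + rates[2]:
            p += 1
        elif p > 0:
            p -= 1


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def run_demo(n_cells=150, seed=0):
    """Compare correlation of means across genes with same-cell correlation."""
    rng = random.Random(seed)
    k_p, g_m, g_p = 1.0, 1.0, 0.02   # assumed: protein lives 50x longer than mRNA
    t_end = 5.0 / g_p                # five protein lifetimes, close to steady state
    mean_m, mean_p, same_cell_r = [], [], []
    for k_m in (0.5, 1.0, 2.0, 4.0):  # four hypothetical "genes"
        cells = [simulate_cell(k_m, k_p, g_m, g_p, t_end, rng)
                 for _ in range(n_cells)]
        ms = [c[0] for c in cells]
        ps = [c[1] for c in cells]
        mean_m.append(sum(ms) / n_cells)
        mean_p.append(sum(ps) / n_cells)
        same_cell_r.append(pearson(ms, ps))  # snapshot correlation within one gene
    return pearson(mean_m, mean_p), same_cell_r
```

Calling `run_demo()` returns the correlation between mean mRNA and mean protein across the four genes, followed by the list of same-cell correlations, one per gene. In this toy model the first number should come out close to one and the others small, because the protein count at any moment reflects mRNA that was around over the last protein lifetime, not the mRNA that happens to be there now. Making `g_p` comparable to `g_m` is a quick way to see the same-cell correlation come back.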