Hello, welcome to this lecture on Biomathematics. We have been discussing ideas from statistics, and today we continue with two separate topics. We have already discussed averages, standard deviations and so on. First, we will consider a flexible protein, or more generally a flexible polymer: if you have a flexible polymer of N monomers, what is the average size of its folded configuration? A protein of N amino acids can fold into all sorts of shapes, and the average size and the standard deviation of the size can be calculated from a simple model using ideas of statistics. So part one of today's lecture is the size of a folded protein in the simplest possible model; after that we will discuss another probability distribution. The bigger theme, as you see, is statistics itself. So what is the simplest model for a folded protein? Real protein folding is a very complex problem, and at this point we will not go into it. In reality, amino acids orient at particular angles with many constraints. The simplest model forgets all that complexity and assumes the monomers are completely flexible: they can take any angle, any configuration.
They are flexible enough to take this configuration, that configuration, any configuration at all. If we assume that a protein of N amino acids can take any orientation, the problem becomes simple enough to solve analytically, and we can get formulas for the average size, the standard deviation of the size, and so on. This assumption is, of course, unrealistic. But before we can do anything complex, we must first learn to do something simple; if we cannot even do a simple calculation, we have no hope of doing a complex one. The size of a real folded protein is rather difficult to calculate, so we simplify the problem to a level where we can solve it easily, and we will still learn many things from the answer. So here is the assumption: imagine that a protein of N amino acids has no preferred orientation; it is completely flexible. This is true not only for proteins but for any polymer. So let us draw a polymer of N monomers, say five: first monomer, second, third, fourth, fifth, each oriented in some direction. Now, how do we define the size of this protein? There is a quantity called the end-to-end distance, or end-to-end vector.
One end of the chain is here and the other end is there, so we can define a vector that connects the two ends. We call this vector R, the end-to-end vector; its magnitude |R| is the end-to-end distance. The magnitude gives the distance between the two ends, and together with the direction — which way you must go from one end to reach the other — these two pieces of information form the end-to-end vector. Now let each monomer be a vector with some orientation: T1, T2, T3, T4 and T5, five vectors in five random orientations. Then the end-to-end vector is simply their sum: R = T1 + T2 + T3 + T4 + T5, or R = sum over i from 1 to 5 of Ti, if you like. This is what is shown in the figure: the five monomer vectors T1 through T5, and the end-to-end vector R, drawn in red, obtained by adding them. So R is a quantity that characterizes the size of the protein in a given configuration. There are various other quantities one could define to describe the folded size, but let us work with the end-to-end distance, and this is its definition.
Now imagine that you take one million copies of the same protein, put them in one million test tubes, and at a particular moment — say when the clock shows exactly one in the afternoon — you photograph the configuration of every one of them. What do you expect to see? If each protein can take a completely random configuration, the pictures will all be different. The first photograph will show some shape with some end-to-end vector; the next will show a different configuration with a different end-to-end vector; another protein will look completely different again. The end-to-end vectors will point in completely random directions. Have a look at the plot: three randomly oriented proteins are shown in black, blue and green, each with its five monomer vectors, and each has its own end-to-end vector drawn in red.
Now look only at the red arrows, the end-to-end vectors of the one million configurations: one pointing this way, another that way, all sorts of directions and all sorts of magnitudes. If you average all these vectors — sum them and divide by the total number — what do you expect? The average will be zero, because for every vector there is another pointing in exactly the opposite direction, and they cancel. "Completely random" precisely means uniformly distributed over all directions, so when you take the average you get zero. If you do not trust this, go to a computer, generate random configurations, compute the end-to-end vectors and average them; for large enough samples the average will indeed be essentially zero. So the average end-to-end vector is zero: ⟨R⟩ = ⟨T1 + T2 + T3 + T4 + T5⟩ = 0. What does the angular bracket mean? It means you do many experiments, in each experiment you measure the end-to-end vector of one protein, and you average over all the experiments — and you get zero.
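The lecture suggests checking this on a computer. Here is a minimal sketch of that check, with assumptions of my own: chains in 2D for simplicity, a monomer length b = 1, and sample sizes and function names chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def end_to_end_vectors(n_monomers, n_samples, b=1.0):
    """End-to-end vectors of freely jointed chains in 2D.

    Each monomer is a vector of length b pointing in a uniformly
    random direction; the end-to-end vector R is their sum.
    """
    theta = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, n_monomers))
    steps = b * np.stack([np.cos(theta), np.sin(theta)], axis=-1)
    return steps.sum(axis=1)  # shape (n_samples, 2)

R = end_to_end_vectors(n_monomers=5, n_samples=200_000)
print(R.mean(axis=0))  # both components close to 0
```

With 200,000 samples the averaged components come out close to zero, exactly as the cancellation argument predicts; the residual shrinks as the number of samples grows.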
So the average is zero, which by itself does not tell us much: each individual protein still has some size, and we want an idea of that typical size. Whenever the average is zero, the sensible next quantity is the mean square, ⟨R²⟩, from which we can get the root mean square end-to-end distance. So what is ⟨R²⟩? That is the next interesting question. It is defined as ⟨(T1 + T2 + T3 + T4 + T5)²⟩, where all the Ti are vectors. The square of a vector is its dot product with itself, R² = R·R, so we have to expand (T1 + T2 + T3 + T4 + T5) · (T1 + T2 + T3 + T4 + T5). What does this give? First come the square terms: T1·T1 = T1², then T2·T2 = T2², T3·T3, T4·T4 and T5·T5. After the square terms come the cross terms: T1·T2, T1·T3, and so on up to T1·T5, and similarly for all the other pairs.
All of these terms, squares plus cross terms, have to be averaged. Doing this carefully gives ⟨T1²⟩ + ⟨T2²⟩ + … + ⟨T5²⟩ plus the averages of all the cross products. Now, what is ⟨T1²⟩? T1² is just the squared length of the first monomer. If every monomer has the same length b — that is, |Ti| = b and Ti² = b² — then the square terms give b² + b² + b² + b² + b² = 5b². For the cross terms, recall that a dot product is A·B = |A||B| cos θ, so ⟨T1·T2⟩ = |T1||T2|⟨cos θ⟩ = b²⟨cos θ⟩, and similarly for every other pair, down to ⟨T1·T5⟩. So altogether we get 5b² plus b² times a sum of many ⟨cos θ⟩ terms. And what is ⟨cos θ⟩ for a random angle? It is zero: cos θ takes values from −1 to +1, and averaged uniformly over all angles the positive and negative values cancel.
So at the end, the answer is ⟨R²⟩ = ⟨(T1 + T2 + T3 + T4 + T5)²⟩ = 5b², and the root mean square end-to-end distance is √5 · b for five monomers. If instead you have N monomers, the same argument extends directly and gives ⟨R²⟩ = N b², so the rms end-to-end distance is √N · b. The standard deviation, σ = √(⟨R²⟩ − ⟨R⟩²), is therefore also b√N, since ⟨R⟩ = 0. So what did we find? For a flexible protein — a protein that can take completely random configurations — the average end-to-end vector is zero and σ is proportional to √N: for N monomers, the standard deviation of the end-to-end distance grows as the square root of N. This is a very interesting finding with real consequences. We learned previously, in the case of diffusion, that σ = √(⟨x²⟩ − ⟨x⟩²) is proportional to √t, the square root of time. Here, in the same way, σ is proportional to √N — time there, number of monomers here. There is a relation between the two, which we will discuss later, but this is a property you should remember. So what did we essentially do? We added randomly oriented vectors.
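The scaling ⟨R²⟩ = N b² can also be checked numerically. The sketch below is my own illustration (2D chains, b = 1, modest sample sizes), not part of the lecture: it estimates ⟨R²⟩ for several chain lengths and each estimate should land close to N b².

```python
import numpy as np

rng = np.random.default_rng(1)
b = 1.0  # monomer length

def mean_square_end_to_end(n_monomers, n_samples=20_000):
    """Monte Carlo estimate of <R^2> for a freely jointed 2D chain."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, n_monomers))
    rx = b * np.cos(theta).sum(axis=1)
    ry = b * np.sin(theta).sum(axis=1)
    return (rx ** 2 + ry ** 2).mean()

for n in (5, 50, 500):
    print(n, mean_square_end_to_end(n))  # each estimate close to n * b**2
```

The cross terms ⟨cos θ⟩ average away on their own inside the simulation, which is exactly why the result tracks N b² without any special handling.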
When we added randomly oriented vectors, we got an average of zero and a standard deviation growing as √N. We will extend this argument to a different context in a coming class, but for the moment, remember it for a protein: the standard deviation σ is proportional to the square root of the total number of monomers. Now we can ask: knowing the average and the standard deviation tells us a lot, but what is the full distribution? We briefly mentioned in one of the previous lectures that the distribution of the end-to-end distance actually turns out to be a Gaussian. If you plot P(r) versus r, you get the Gaussian shape, peaked at r = 0: P(r) = A e^(−r²/2σ²). That is what we learned: once you know the average and the standard deviation, you know the Gaussian completely. And we found σ² = N b², so the probability distribution for this randomly configured protein can be shown to be P(r) = A e^(−r²/2Nb²).
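One hedged way to probe the Gaussian claim numerically is to look at a single Cartesian component of the end-to-end vector: for a Gaussian variable the fourth moment equals three times the squared variance (kurtosis 3). The chain model, dimensions and sample sizes below are my own illustrative choices, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
n, b, samples = 100, 1.0, 100_000

# x-component of the end-to-end vector of a freely jointed 2D chain
theta = rng.uniform(0.0, 2.0 * np.pi, size=(samples, n))
rx = b * np.cos(theta).sum(axis=1)

# For a Gaussian, the fourth moment equals 3 * variance**2 (kurtosis 3)
var = rx.var()
kurt = (rx ** 4).mean() / var ** 2
print(var, kurt)  # var close to n * b**2 / 2, kurt close to 3
```

The kurtosis approaches 3 as n grows, which is the central-limit mechanism behind the Gaussian shape: R is a sum of many independent random steps.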
We may show this explicitly in another lecture toward the end of the course, if there is time once we cover the main material, but for now just take this as the distribution function. It is an example of a Gaussian distribution arising in biology: if you take this model for a protein or a polymer — a flexible polymer, to be more precise; single-stranded DNA is a good experimental example of one — this is the answer you are likely to get. So, to summarize the first part of this lecture: for a flexible protein we found the average end-to-end distance, we found the standard deviation of the end-to-end distance, and we showed that the standard deviation is proportional to the square root of the number of monomers. Now we switch gears to a different case. We have discussed the Gaussian, or normal, distribution, and applied it to understand several cases; now we will try to understand another distribution function which is very useful in biology. Before doing that, let me ask you a question. You know something about microtubules: they rapidly grow and shrink, exhibiting a phenomenon known as dynamic instability. You may have heard of it; otherwise, go and have a look at the textbooks. Dynamic instability is precisely this phenomenon of microtubules rapidly growing and shrinking.
That is, if you plot microtubule length versus time, you see rapid growth followed by shrinkage, over and over; this kind of behavior is what is called dynamic instability. From length as a function of time we can always calculate the average length, the standard deviation, and even the full distribution of lengths. Do you know what that distribution is? We will discuss it. Now consider a second example: imagine proteins that bind onto DNA, and imagine that they bind with some rate k. Draw a piece of DNA; proteins come and bind onto it at random. One protein comes and binds here — how long do you have to wait for the next protein to come and bind? Let Δt be the waiting time between consecutive binding events. What, typically, is the distribution P(Δt) of this waiting time? It turns out that the answer to both questions is something called the exponential distribution, which has the form P(x) = A e^(−kx): it peaks at x = 0 and decays as the parameter x increases.
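The binding-event picture can be made concrete with a small simulation. This is a sketch under my own assumptions (an illustrative rate k, binding events scattered uniformly in time, i.e. a Poisson process): the gaps between consecutive events then come out exponentially distributed with mean 1/k.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 2.0          # binding rate, events per unit time (illustrative value)
T = 500_000.0    # total observation time

# Scatter binding events uniformly at random over [0, T]; the gaps
# between consecutive events of such a process are exponential.
n_events = rng.poisson(k * T)
times = np.sort(rng.uniform(0.0, T, size=n_events))
gaps = np.diff(times)

print(gaps.mean())  # close to 1/k = 0.5
```

A histogram of `gaps` would show the peaked-at-zero, decaying shape described above: short waits are common, long waits are rare.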
This is the so-called exponential distribution. The lengths of microtubules during dynamic instability follow it: you are more likely to find short microtubules, and less likely to find very long ones. Similarly for the waiting time: the probability that you have to wait very long is small, while the probability that the next event happens almost immediately is large. That is precisely what this distribution means. Now, given such a distribution, what are its properties — the average, the standard deviation, and so on? That is what we want to work out here. The first question: if P(x) = A e^(−kx), what is A? How do we find it? As before, the total probability has to be 1, so ∫ P(x) dx = ∫ A e^(−kx) dx = 1. What does that mean? A is a constant, not a function of x, so it can be taken outside the integral: A ∫ e^(−kx) dx = 1. And what are the limits? They are 0 to infinity, because the length of a microtubule lies between 0 and infinity — it cannot be negative — and the waiting time cannot be negative either. So in all these cases the integral runs from 0 to infinity.
So let us do the integral from 0 to infinity. The integral of e^(−kx) is e^(−kx)/(−k); at x = ∞, e^(−kx) is 0, and at x = 0 it is 1, so ∫₀^∞ e^(−kx) dx = 1/k. Therefore A ∫₀^∞ e^(−kx) dx = A/k, and this must equal 1, which implies A = k. That is what is shown here: the way to find A is to do the integral ∫₀^∞ P(x) dx, equate it to 1, and you find A = k. So our distribution function is P(x) = k e^(−kx), normalized so that ∫ P(x) dx = 1. Now that we know the distribution function, how do we find the average? For any distribution, the average is defined as ⟨x⟩ = ∫ x P(x) dx, and here the integral runs from 0 to infinity because x can only go from 0 to infinity in the cases we are interested in. So ⟨x⟩ = ∫₀^∞ x P(x) dx, and likewise ⟨x²⟩ = ∫₀^∞ x² P(x) dx. Let us first calculate ⟨x⟩ = ∫₀^∞ x · k e^(−kx) dx. The technique we use is to write x e^(−kx) = −∂/∂k e^(−kx).
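The normalization A = k and the average that follows can be sanity-checked by numerical integration. This is only an illustrative sketch: the rate k = 2.5, the grid, and the cutoff at 40/k (where the tail is negligible) are my own choices.

```python
import numpy as np

k = 2.5  # an arbitrary rate, chosen only for illustration
x = np.linspace(0.0, 40.0 / k, 200_001)  # tail beyond 40/k is negligible
p = k * np.exp(-k * x)

def trapezoid(y, x):
    """Plain trapezoidal rule (spelled out so it runs on any NumPy version)."""
    return ((y[:-1] + y[1:]) * np.diff(x) / 2.0).sum()

print(trapezoid(p, x))       # normalization: close to 1
print(trapezoid(x * p, x))   # <x>: close to 1/k = 0.4
```

The first integral confirming 1 is exactly the statement A/k = 1; the second anticipates the result ⟨x⟩ = 1/k derived next.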
You can convince yourself of this: ∂/∂k of e^(−kx) gives −x e^(−kx), and the minus sign in front turns it into +x e^(−kx). Since the integral is over x, the constant k can be taken outside, and we get ⟨x⟩ = −k ∫₀^∞ ∂/∂k e^(−kx) dx = −k ∂/∂k ∫₀^∞ e^(−kx) dx. We already saw that ∫₀^∞ e^(−kx) dx = 1/k, so ⟨x⟩ = −k ∂/∂k (1/k) = −k · (−1) · k^(−2) = k/k² = 1/k. So the average of an exponential distribution is ⟨x⟩ = 1/k. Now, what is ⟨x²⟩? As usual, ⟨x²⟩ = ∫₀^∞ x² P(x) dx = ∫₀^∞ x² · k e^(−kx) dx, and taking the constant k outside, ⟨x²⟩ = k ∫₀^∞ x² e^(−kx) dx. The standard technique again is to write x² e^(−kx) as a derivative with respect to k.
This time x² e^(−kx) = ∂²/∂k² e^(−kx): differentiating twice with respect to k brings down a factor of −x each time, giving +x², and there is a dx as before. So ⟨x²⟩ = k ∂²/∂k² ∫₀^∞ e^(−kx) dx = k ∂²/∂k² (1/k) = k · 2/k³ = 2/k². So ⟨x²⟩ = 2/k², and ⟨x⟩ = 1/k. Now we can calculate the standard deviation: σ = √(⟨x²⟩ − ⟨x⟩²) = √(2/k² − 1/k²) = 1/k. So what do we have? ⟨x⟩ = 1/k, ⟨x²⟩ = 2/k², σ = 1/k. We find that the standard deviation and the average are the same! This is a very interesting property of the exponential distribution, worth remembering: the fluctuation is as big as the mean. If binding events happen with rate k, the average time you have to wait is 1/k, but the standard deviation of the waiting time, √(⟨Δt²⟩ − ⟨Δt⟩²), is also 1/k. And the same holds for microtubule lengths.
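The result that mean and standard deviation coincide can be seen directly by sampling. A minimal sketch, again with an illustrative rate k = 2.5 of my own choosing (NumPy parameterizes the exponential by the scale 1/k):

```python
import numpy as np

rng = np.random.default_rng(4)
k = 2.5  # an arbitrary rate, for illustration
waits = rng.exponential(scale=1.0 / k, size=1_000_000)

print(waits.mean())  # close to 1/k = 0.4
print(waits.std())   # close to 1/k = 0.4 as well: mean equals std
```

Both numbers agree to within sampling error, which is the hallmark of the exponential distribution derived above.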
If the average microtubule length is some number L0, then √(⟨L²⟩ − ⟨L⟩²) will also be L0. What does that mean for a microtubule? If you follow length versus time — grow, shrink, grow, shrink — the average length you see and the deviation from that average are roughly the same: the standard deviation is as big as the average itself. That is the property of an exponential distribution, and it is one thing to remember: for dynamic instability, the length fluctuation is as big as the mean length. So, to summarize this part: for the exponential distribution we calculated the average, we calculated the standard deviation, and we found that the two are the same. You can also ask the question we asked for the Gaussian distribution: what is the width at half maximum? Do this yourself, and find the full width at half maximum for both cases, the exponential distribution as well as the Gaussian. So we have learned two kinds of distribution, the normal distribution and the exponential distribution; both are very useful in biology and appear in many different places.
You will see them come up very often in biology, and you will have to use them, so you should know their properties — at the very least, remember their averages, their standard deviations and the shapes of the distributions. With these few things that we learned today — the exponential distribution, the normal distribution and the properties of flexible polymers — we will conclude today's lecture and continue the discussion on statistics in the coming lectures. See you, and bye for now.