Hello, welcome to this lecture on Biomathematics. In the last lecture, we started discussing statistics and how statistics can be applied in biology. We took a few simple examples, like travelling to your college and the marks of students in an exam, and we discussed the idea of average and standard deviation. The basic question we had was this: we get a large amount of data from experiments, and data is essentially a set of numbers. Given this huge set of numbers, how do we make sense of it? What can we learn from these numbers? We learned that the average and the standard deviation are two simple, meaningful numbers that we can extract from a huge data set. What more can we do? That is what we are going to discuss in this lecture: what more we can extract from the data, and how we can present the data so that it conveys much more meaningful information.
So, today we again have statistics as our main title, and within statistics we will specifically discuss probability distributions. Let us go ahead and see what we mean by a probability distribution. As we said, we had various experiments; one of them was travelling to the college. Now we will discuss a new, simple experiment that you can all do yourself. You do not need a big setup; you can do it in your class. The experiment is to measure height: measure the height of the boys in your class. This is a simple experiment that anybody can do.
You can measure the height of all the boys, or of the girls. Here we take the example of boys, because some of the numbers that we use might be more suitable for this example. So, let us say you go and measure the height of each and every student in your class. What do we get when we measure? We get a big data set, which is typically a set of numbers. Let us look at what you will typically get. In your class you can have 150 centimeters, 171 centimeters, 165 centimeters, 140 centimeters and so on. These are reasonable numbers. Many of them are around 140 or 150, and very few are around 180; as you see here, there is just one 180 and nothing bigger than 180. So, there is one very tall person with 180 centimeters, one very short person with 131 centimeters, and the others are somewhere in between. The list does not end here; it just continues.
Now, how do we make sense of this data? Let us say we measure the heights of about a hundred students, so we have a hundred such numbers: 150, 160, 130, 140, 140, 122, 135 and so on. Two things we learnt are that we can find the average and the standard deviation. We can say that the average is 155 centimeters, or 150 centimeters; that will give you some idea. And we can say that the standard deviation is 20. So, you can say that the average is 150 and the standard deviation is 20; that means 150 plus or minus 20.
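As a small illustration of these two numbers, here is a minimal Python sketch. The list `heights_cm` is hypothetical: its first few values echo the heights mentioned above and the rest are made up, so the result will not match the 150 plus or minus 20 quoted in the lecture.

```python
import statistics

# Illustrative heights in centimeters. The first six values echo numbers
# mentioned in the lecture; the remaining four are invented for this sketch.
heights_cm = [150, 171, 165, 140, 180, 131, 152, 158, 147, 163]

# Average: sum of all heights divided by the number of students.
mean_height = statistics.mean(heights_cm)

# pstdev treats the list as the whole class (population standard deviation).
std_height = statistics.pstdev(heights_cm)

print(f"average = {mean_height:.1f} cm, standard deviation = {std_height:.1f} cm")
```

With a real class you would simply replace the list by your own measurements.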
So, this is something we can say using the ideas that we learnt in the last lecture. But the question is: is there any other way to present this data so that it reveals much more useful information than the average and standard deviation can give? Instead of presenting all these many numbers, can we present the data in a different way, such that just by looking at it we learn much more? Just writing down all the numbers is not a great idea. You see a lot of numbers, and at one glance they do not make much sense. Is there a smart way of presenting them in one slide, such that it reveals a lot of information, more than either the average or the standard deviation?
So, that is the question, and the answer is a distribution. We will discuss what a distribution means, but the meaning of the word itself tells you something: we are talking about heights, so we want to present the distribution of heights. How do we do that? Here is one way. We can break the data down into ranges. How many students have a height in the range of 130 to 140 centimeters?
So, the height range is in centimeters. If the height range is between 130 and 140 centimeters, how many students are there? Just one student. How many students have a height between 140 and 150 centimeters? Let us say there are 17 students. So, this column is the number of students having a height in this range. How many students are there whose heights fall between 150 and 160 centimeters? There are 35 students. There are 30 students with heights between 160 and 170, 15 students between 170 and 180, and 2 students between 180 and 190. So, there is a large number of students in the middle, and only a few very short and a few very tall students: only one very short student, two very tall students, and all the others somewhere in the middle. This is the typical data that you expect.
This is actually a smart way of presenting the data, because it gives you some idea. But this is just a table, and we always like to present data as a graph, because that can make much more sense. One way of presenting such data is called a histogram. What is a histogram? See this. Between 130 and 140, I have marked 135; I just put a number in between. There is one student with a height between 130 and 140. Here h is the height and n of h, written n(h), is the number of students having height h. So, what the graph shows is the distribution of heights; you can call n(h) the distribution. This is essentially the same thing that we saw in the table, and we can draw it ourselves if we wish. So, let us draw h versus n(h).
So, let me draw this a little more clearly. On the horizontal axis we have 130, 140, 150, 160, 170, 180 and 190. In the table, we had one student between 130 and 140, so we mark this block as 1. Between 140 and 150, there were 17 students, so let us mark this as 17. Between 150 and 160, the table tells you there are 35 students, so let me mark this as 35. Between 160 and 170, there are 30 students, so this block comes up to 30, somewhere here. Between 170 and 180, there are 15 students, so this is 15. And between 180 and 190, there are only two students, so this is 2.
So, this is a kind of block diagram, if you wish: a block from 130 to 140, another block from 140 to 150, another from 150 to 160, and so on. This kind of plot is called a histogram, and this is the distribution. This axis is the height in a particular range, and this axis is the number of students having that height h. This is essentially what is plotted in the slide.
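The block diagram can also be sketched in a few lines of Python. The ranges and counts below are the ones read out from the table; the function name `text_histogram` and the choice of drawing each block as a row of `#` characters are only illustrative.

```python
# Height ranges (cm) and student counts, as given in the lecture's table.
bins = [(130, 140), (140, 150), (150, 160), (160, 170), (170, 180), (180, 190)]
counts = [1, 17, 35, 30, 15, 2]


def text_histogram(bins, counts):
    """Return one line per range, with a bar of '#' as long as the count."""
    lines = []
    for (low, high), n in zip(bins, counts):
        lines.append(f"{low}-{high} cm | {'#' * n} {n}")
    return lines


for line in text_histogram(bins, counts):
    print(line)

# Adding up all the blocks gives the total number of students measured.
total_students = sum(counts)
print("total students:", total_students)
```

Printed this way the histogram lies on its side, but the shape is the same: short bars at both ends and long bars in the middle.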
Exactly what we just drew is plotted here. Between 130 and 140, which I just mark by 135, there is 1 student. Then this is 17, this is 35, this is 30, this is 15, and this is 2. So, this is a histogram, and we can call this a distribution.
But is this an accurate representation? In some sense, it is not very accurate. Why? Because we put all the students between 130 and 140 in one range. We do not distinguish between students having a height of 131 centimeters and 139 centimeters; they are all in one range. Similarly, students having heights of 151 and 159 are put in the same range. So, we can reduce the range, if we wish. Instead of 130 to 140, I could ask how many students are between 130 and 132, between 132 and 134, and so on; similarly 150 to 152, 152 to 154, 160 to 162, 162 to 164. So, you can take an interval of 2 centimeters. You can make the interval smaller and smaller, and that will be a better description of the real data.
Now, how small should the range be? Let us say that the tape with which you measure the height can only measure in centimeters; you cannot measure millimeters. Then you can present the data in ranges of 1 centimeter, and there is nothing better than that; that is the best description of the data. Then you can ask the questions: how many students have a height of 131 centimeters? How many students have a height of 132 centimeters? How many students have a height of 134 centimeters?
How many students have a height of 151 centimeters? So, centimeter by centimeter, you can have this data presented. Let us say you present the data in such a way that n(h_i) is the number of students having height h_i, where the height h_i could be 131, 151, 162, any number. Now, let us say you plot this as a distribution. You will then have many, many bars in the histogram. Let us say you have a histogram like this, with bars that I call 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. What are these? These are ranges; here you took 10 ranges. In the previous example, how many ranges did we have? We had a first range, second range, third, fourth, fifth and sixth, so we had 6 bars.
Similarly, let us say we present the heights of students between about 130 and 180, centimeter by centimeter. You have 130 here and 180 here. How many students have a height of 130? Maybe one student. How many have 131? How many have 132? So, for each centimeter you have a point, and you can have about 50 points like this. These are the h_i: h_1, h_2, h_3, h_4, h_5 and so on. So, h_1 is 131 centimeters, h_2 is 132 centimeters and, similarly, h_50 is 180 centimeters. Basically, you can have a graph of n(h_i) versus h_i, which might have some particular shape. Here n(h_i) is a number; it could go from 0 to some particular number, let us say 100. So, centimeter by centimeter, you ask the question: how many students have a height of 132 centimeters? How many students have a height of 134 centimeters? How many students have a height of 180 centimeters?
So, you can ask these questions. You have n(h_i), the number of students having height h_i. Now, if you take the sum over i of n(h_i), with i going from 1 to 50 because there are 50 different heights, what does this mean? Let us think about it. Let us expand the sum: it is n(h_1) + n(h_2) + n(h_3) + ... + n(h_50). Let us say h_1 is 131 centimeters. Then this is the number of students having height 131 centimeters, plus the number of students having height 132 centimeters, plus the number having 133 centimeters, and so on, up to the number of students having height 180 centimeters. So, this will give you the total number of students, right? Because this sum involves all the students, with heights from 131 to 180; we are measuring centimeter by centimeter, making this kind of histogram, and then doing the sum.
That is essentially what is shown in this slide. The sum over i of n_i, where by n_i I mean n(h_i), gives something I call N, which is the total number of students. And if I divide n(h_i) by N, I can define something called p(h_i). In other words, p(h_i) = n(h_i) / N, where N = sum over i of n(h_i). So, in this slide, it should actually be n(h_i); this is the correct way to write it.
So, what is p(h_i)? We can call p(h_i) a probability. We will come back later and understand what this probability actually means, but let us call it a probability for now. A probability is essentially some number between 0 and 1. At this moment, it is enough for you to recall the colloquial meaning of probability, which you all know.
When you say something is highly probable, that means it is very likely; when something is not very probable, that means it is very unlikely. Let us understand only this much at this moment, and let us also understand that the probability p is some number between 0 and 1. It can be 0, it can be 1, or anything in between. The probability that a student has height h is what we have just defined: it is the number of students having that height h divided by the total number of students.
So, let us imagine that N is 100, and let us calculate. Let us say there is one student with height 131 centimeters, there are 10 students with height 140 centimeters, there are 15 students with height 150 centimeters, and there are 0 students with height 180 centimeters. If this is the case, p(131), the probability that a student in your class has a height of 131, is 1 by 100, which is 0.01. Just by following this formula, this is what you get. p(140) is 10 by 100, which is 0.1. p(150) is 15 by 100, which is 0.15. And p(180) is 0 by 100, which is 0. So, the probability p(h) is some number between 0 and 1.
Now, if you take a large number of students, in your whole school or in many schools, calculate p(h) and plot it, how will it look? It might look like this; have a look. We can call this a probability distribution. What is this? This is a curve which has a peak somewhere around 150; this axis is the height and this axis is the probability.
So, this is the height h and this is the probability of having height h. The probability of having a height of 120 centimeters is nearly 0; it is unlikely that anybody will be such a short person. For very tall heights, like above 190, it is also nearly 0. Somewhere in the middle, like at 150, there are many students. So, the probability of finding students having a height of 150 is around 0.05 in this example. This is an exercise that you can do: you can calculate the probability distribution the way we defined it. The probability of a student having height h is the number of students having height h divided by the total number of students.
Now you have these probabilities. How do you find the average? Can we find the average from these probabilities? It turns out that we can. Let us go back to the definition we had. We had discussed the probability of students having a height of 131 centimeters, 132 centimeters, and similarly 140 centimeters and 150 centimeters. So, basically you have the probability p(h_i) of students having some particular height h_i. Then the average height is defined as h average = sum over i of h_i p(h_i), where i goes from 1 to m if you divide the data into m intervals. In our example we had defined 131, 132, 133, 134, up to 180, so we had defined 50 heights and m was 50. So, the sum over i from 1 to 50 of h_i p(h_i) will give you the average. And if I calculate the sum over i of h_i^2 p(h_i), this will give you the square average. So, we have the average, h average, and the square average, h square average.
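As a small worked sketch of these sums, the block below reuses the six ranges and counts from the earlier table. As an assumption made only for this illustration, each 10 centimeter range is represented by its midpoint (135, 145 and so on); with centimeter-by-centimeter data you would use the actual heights h_i instead.

```python
# Counts per 10 cm range from the lecture's table. Representing each range
# by its midpoint is an assumption made for this sketch.
heights = [135, 145, 155, 165, 175, 185]  # h_i in cm
counts = [1, 17, 35, 30, 15, 2]           # n(h_i)

N = sum(counts)                  # total number of students
p = [n / N for n in counts]      # p(h_i) = n(h_i) / N

# Average: sum over i of h_i * p(h_i)
h_avg = sum(h * p_i for h, p_i in zip(heights, p))

# Square average: sum over i of h_i^2 * p(h_i)
h2_avg = sum(h**2 * p_i for h, p_i in zip(heights, p))

# The same average computed directly from the counts, for comparison.
h_avg_direct = sum(h * n for h, n in zip(heights, counts)) / N

print("sum of probabilities:", sum(p))
print("h average:", h_avg, "h square average:", h2_avg)
```

The two ways of computing the average agree because multiplying by p(h_i) is the same as multiplying by n(h_i) and dividing by N at the end.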
So, this is essentially the same way we had calculated these before. Instead of summing throughout the data set, we can multiply the heights by the probabilities and sum like this; then you get h average and h square average. Do this calculation yourself as an exercise, taking an example from your own case. If you know h average and h square average, you can calculate h square average minus the square of h average, which is the square of the standard deviation. So, the standard deviation that we discussed before can easily be calculated in this fashion, by taking the square root.
So, we had defined h average as the sum over i of h_i p(h_i). Before, we took h_i centimeter by centimeter. But let us say we can give h in a continuous manner: 130 centimeters, 130.1 centimeters and so on. If h is a continuous variable and we plot p(h), let us say it looks something like this, where for every value of h you take there is a p(h); it is a continuous function. In that case, you can write the sum as an integral, using the idea that we learned in integration. So, if h is continuous, the average can be written as h average = integral of h p(h) dh. Similarly, the square average can be written as h square average = integral of h^2 p(h) dh. So, if h is continuous, we can define the averages and square averages in this particular way.
In fact, when we did the case of diffusion, we had done precisely this. If you remember, we had defined something called c tilde of x as c(x) divided by the total concentration. This is just like what we did today, where we defined n(h) by N as p(h).
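To see the integral form at work, here is a minimal numerical sketch. The continuous p(h) used here is an assumption: a bell-shaped curve centered at 150 with a spread of 20, chosen only to echo the 150 plus or minus 20 mentioned earlier. The helper `integrate` is a simple midpoint-rule approximation, not a library routine.

```python
import math


def p(h, mean=150.0, sigma=20.0):
    """Assumed bell-shaped p(h), scaled so that it integrates to 1."""
    return math.exp(-((h - mean) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    dh = (b - a) / steps
    return sum(f(a + (k + 0.5) * dh) for k in range(steps)) * dh


# Integration limits wide enough that p(h) is negligible outside them.
a, b = 0.0, 300.0

norm = integrate(p, a, b)                        # integral of p(h) dh
h_avg = integrate(lambda h: h * p(h), a, b)      # integral of h p(h) dh
h2_avg = integrate(lambda h: h**2 * p(h), a, b)  # integral of h^2 p(h) dh

variance = h2_avg - h_avg**2   # square of the standard deviation
std = math.sqrt(variance)

print(f"norm = {norm:.4f}, h average = {h_avg:.2f}, standard deviation = {std:.2f}")
```

The integrals return the center and spread that were put into the assumed curve, which is a useful check that the sum and the integral definitions mean the same thing.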
This is exactly the same way we had defined the concentration c tilde of x. So, c tilde of x appeared like a probability, just as p(h) here is a probability. And we had defined the x average as the integral of x c tilde(x) dx and the x square average as the integral of x^2 c tilde(x) dx. If you go back to the lecture where we discussed diffusion, we had defined them in this particular way. The reason for defining them this way is what we understand in today's lecture: c tilde of x was like a probability distribution for the concentration, that is, the probability of finding concentration at a particular distance x, and once you have a distribution you can define the averages in this way.
Now, let us go back to the distribution that we had. We had a curve which looks like this. What is the name of a curve with this particular kind of shape? Typically, many things in nature have this bell-shaped curve. This bell-shaped curve, or bell-shaped distribution, is called the normal distribution. It might be the case with the heights of students or the marks of students, and there are many examples that we will discuss as we go along. In all these examples, the distribution might look like a bell-shaped curve, and then the distribution is called a normal distribution. What is the mathematical form of this normal distribution? The shape of the normal distribution can be written mathematically in this particular form.
So, look here: p(h) = a exp(-b (h - h average)^2). This is the mathematical formula for a normal distribution, where a and b are some constants. We will clearly understand in the coming classes what a and b stand for. In a simpler form, we can write it as e^(-b x^2). A function of this kind, f(x) = e^(-b x^2), is called a Gaussian function. So, the normal distribution also has the name Gaussian distribution, and it has a bell-shaped curve. What this distribution means, we will discuss in the coming lectures; for now, just realize that many examples from nature fall into this category.
So, let us look at some examples from biology. One example is the end-to-end distance distribution of a long DNA. What does this mean? Let us say you look at a DNA, which has some particular shape, and you ask the question: what is the distance from one end of the DNA to the other end? This is a double-stranded DNA, if you wish; I am just showing it as a line. So, take a very long DNA and ask what the distance is from this end to this end. Let me call this distance r. Now, let us say that in a Petri dish you have a million DNA molecules, or an Avogadro number of them, some large amount of DNA at a particular concentration. And imagine that you have this amazing ability to take a photograph of each of these DNA molecules.
Let us say you have this amazing device with which you can freeze the DNA at a particular moment and take the photograph. If you do that, what do you expect? Some DNA will be like this, some will be like this, and some other DNA will have this kind of a shape. Here the distance between the two ends is very small, here it is very large, and here it is somewhere in between. So, just like we did the experiment of measuring height, you can do an experiment of measuring the end-to-end distance of the DNA, make a histogram and plot it. Then it might look like a Gaussian distribution. It might look something like this, where this is r equal to 0, this is r equal to L, and this axis is p(r). The probability that you will find a DNA with a very short end-to-end distance, that is, with its ends very close to each other, could be large, and the probability that you will see the ends far apart could be small. If this is the case, this might look like a normal distribution; this is one half of a normal distribution. The other half, which is the negative part, is actually meaningless, and we are not plotting it. So, this is only one half of the Gaussian, the part with r greater than 0.
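The shape of this half curve can be tabulated with a short Python sketch. The values of `L` and `b` below are hypothetical, chosen only so that the fall-off is visible, and the function is left unnormalized, so it shows the shape of the curve rather than actual probabilities.

```python
import math


def half_gaussian(r, b):
    """Unnormalized weight exp(-b r^2) for an end-to-end distance r >= 0."""
    return math.exp(-b * r**2)


L = 10.0  # hypothetical largest end-to-end distance (arbitrary units)
b = 0.05  # hypothetical constant controlling how fast the curve falls off

# Eleven equally spaced distances from r = 0 to r = L.
rs = [L * k / 10 for k in range(11)]
values = [half_gaussian(r, b) for r in rs]

for r, v in zip(rs, values):
    print(f"r = {r:4.1f}   exp(-b r^2) = {v:.3f}")
```

The printed values are largest at r equal to 0 and decrease steadily towards r equal to L, which is the one-sided bell shape described above.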
If you wish, we can plot this for r as a vector, but we will not come to that now. What you get here is one half of a Gaussian distribution. That is, if you plot e^(-b r^2), for some value of the constant b, between r equal to 0 and L, it will look like this. So, this is the end-to-end distance distribution of a long DNA; look at the slide here, this is the first example.
We already saw that the concentration of diffusing proteins can have a Gaussian distribution. When we discussed the concentration in the diffusion example, we said that if you have a tube and you look at a particular time, asking how many proteins are at each distance as you go from x equal to 0 either way, there is a large number of proteins at the middle of the tube and fewer proteins at the ends of the tube. So, this might have a Gaussian-like shape; if you plot it, it might look like e^(-b x^2). This is another example.
You could also think of another example, which is the amount of a particular gene expressed in cells. Let us imagine that you can measure the amount of gene expression in cells. You have a bunch of cells in a Petri dish, and in each cell you take a particular gene and see how much of this gene is expressed. You can count, let us say, how much mRNA is produced. Then what you might get is something like this. Let us plot some function like this. It should be a little more symmetric; the way I have drawn it, it is not symmetric. Let us say it looks like a Gaussian. What I have drawn is not really a Gaussian, which should be very symmetric.
A Gaussian should be symmetric, but in the case of gene expression we do not know whether the curve should be symmetric or not. But let us say you have such a curve. On this axis, we plot the amount of protein expressed, or you could call it the number of mRNA, and on this axis how many cells express this amount, that is, the number of cells. Let me call the amount m, and this is n(m). How many cells express very little of this gene? Very few cells; the number is very small. A large amount is also expressed by only a few cells. The maximum number of cells express some intermediate amount of protein. So, this might also look like a normal distribution, if you wish; in any case, it will have roughly this shape, with a peak somewhere in the middle and dying down towards both ends. So, this is the amount of protein expressed versus the number of cells having that particular amount of protein. So, we had many examples from biology.
To summarize, what did we learn? We learned that you can present data in the form of a distribution, like a histogram, so that it makes much more sense. And we learned how to find averages from the distribution. If you know n(h), the number of students having height h, we can define the probability distribution p(h) as n(h) divided by the total number N, and we can define the average as the sum over i of h_i p(h_i). We can also define the standard deviation in a similar fashion, and there are many examples in biology. So, these are the things that we learned today.
We learned about distributions, such as the height distribution, how to convert such a distribution to a probability distribution, how to calculate the average and standard deviation from the probability distribution, and we saw many examples. So, this is the summary of today's lecture. We will discuss various other distributions and their properties in the coming lectures, and we will discuss many more biological examples. With this introduction to distributions, we will stop today's lecture. You should think about this idea of a distribution carefully, because we will need it to learn statistics in a better way, and it is very useful for presenting and analyzing data. Thank you.