In the last lecture, I described various characteristics of distributions such as moments, quantiles and the moment generating function: moments in the sense of mean, variance, measures of skewness, kurtosis, etcetera. These quantities describe various properties, or characteristics, of a distribution, such as its shape, its center of gravity or measures of location, and so on. Now I will look at certain frequently or commonly used distributions. Each of them has some historical origin, meaning that these distributions have been studied for a long time. Nowadays, of course, we have so many types of data sets for which special distributions are fitted and new distributions are generated, but there are certain distributions which have a sort of universal application; they arise everywhere, and when we discuss them you will see that their importance lies in the theory, in the development of the subject itself.

Let us start with discrete distributions. We have seen that a discrete distribution places positive mass on certain points, so depending upon the number of points we can have different distributions: one aspect is the probability allotment, the other is the number of points itself. For example, if there is certainty about something, then in common-sense language we say it is sure that this event will happen. If a random variable is attached to that, it becomes a degenerate random variable, and we speak of a degenerate distribution: a random variable X takes a single value a with probability 1, that is, P(X = a) = 1, where a is some fixed constant. For example, the boiling temperature of water at a given atmospheric pressure is fixed: at 760 mm Hg it is 100 degrees Celsius. So if X denotes the boiling point, X takes the value 100 with probability 1, and similar statements hold for other substances. We can say a similar thing about certain chemical compositions and chemical reactions: if this is mixed with this, then this will be the outcome. So although this looks like a very simplistic distribution, it has uses in areas where you have certainty, and of course all the moments are obtained simply from the value itself, because E[X^k] = a^k. It is a sort of trivial distribution, but still one may use it.
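As a small aside, the moment statement above is easy to see in code. A minimal Python sketch; the value 100 just echoes the boiling-point example, and the whole computation is of course trivial:

```python
# Degenerate distribution: P(X = a) = 1, so every raw moment is E[X^k] = a^k.
a = 100.0  # e.g., boiling point of water in degrees Celsius at 760 mm Hg

def degenerate_moment(a, k):
    """k-th raw moment of a random variable degenerate at a."""
    return a ** k

print(degenerate_moment(a, 1))                                 # mean = 100.0
print(degenerate_moment(a, 2) - degenerate_moment(a, 1) ** 2)  # variance = 0.0
```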
Now, this was a classification based on the number of points, and the degenerate distribution is based on a single point. Another possibility is a distribution based on two points. What is a two-point distribution, and what is its significance? In a two-point distribution you allocate a probability p1 to one point and p2 to the other point such that p1 + p2 = 1; that is, p2 = 1 - p1. This simple structure is extremely useful for describing a very large number of physical phenomena, or physical realities, encountered in day-to-day life. If a person appears in a competitive examination, the event of interest is whether he qualifies the exam or does not qualify. Although the real outcome could be how many marks he scored out of the total, how many students ranked above or below him and so on, ultimately the event of interest may simply be whether he qualifies or not. For example, a clinical trial is conducted to test the efficacy of a medicine. There may be various aspects: what the side effects are, whether the patient was cured or not, how much time the treatment took. But one may simply be interested in whether the efficacy of the medicine exceeds that of a previously used medicine, or just whether the medicine is effective in treating a certain disease, for example whether it succeeds in killing a certain virus or bacterium. So although there may be a full-fledged manifestation of an experiment, we may be interested only in one particular type of outcome. Or look at a cricket match: it may be played over 20 overs or 40 overs, or it may be a test match played over 5 days. There are many phenomena associated with it, such as the number of players scoring more than a certain number of runs or the players taking wickets, but one may only be interested in whether team A or team B wins the game.

So when we classify the outcome of a random experiment into two possibilities, we may associate the term success with one and failure with the other. This is termed a Bernoullian trial. In a Bernoullian trial you have a success with probability p and a failure with probability 1 - p. Let us associate x = 0 with a failure and x = 1 with a success. Then P(X = 0) = 1 - p and P(X = 1) = p. This is called the Bernoulli distribution. One can look at the moment structure: E[X^k] = mu_k' = p for every k >= 1. In particular the mean is p, the second moment is again p, and so the variance is p - p^2 = p(1 - p). One may use the notation q for 1 - p and write the variance as pq; the higher-order moments can be written similarly. If we plot the distribution, there is mass at the two points 0 and 1, and depending upon p and 1 - p you get different shapes: 1 - p may be larger and p smaller, or the other way around, or both may equal one half. So there are various possible shapes for this distribution.
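The moment structure just described is easy to verify empirically. A minimal Python sketch; the value of p and the number of simulated trials are illustrative choices, not from the lecture:

```python
import random

# Bernoulli(p): P(X = 1) = p, P(X = 0) = 1 - p.
# Theory: E[X^k] = p for all k >= 1, so mean = p and variance = p(1 - p).
p = 0.3
n_sims = 100_000  # number of simulated trials (illustrative choice)

samples = [1 if random.random() < p else 0 for _ in range(n_sims)]
mean = sum(samples) / n_sims
var = sum((x - mean) ** 2 for x in samples) / n_sims

print(f"empirical mean {mean:.4f} vs p = {p}")
print(f"empirical variance {var:.4f} vs pq = {p * (1 - p):.4f}")
```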
Now, a direct generalization of this is to look at several Bernoullian trials in place of one. For example, consider the guinea pigs used in a clinical trial, where the event of interest is how many survive the trial. Say 100 guinea pigs are put on the experiment; each may survive with probability p and not survive with probability 1 - p. Assuming that the effect of the trial is independent across them, if X is the number of survivors, what is the distribution of X?

If we look at a sequence of independent and identically conducted Bernoullian trials and consider the distribution of the number of successes, we get what is called the binomial distribution. This is another important distribution in the theory of statistics, and it has historical origins: as I mentioned, the Bernoullian trial is named after Bernoulli, the Swiss mathematician, one of the famous Bernoulli family.

So let X denote the number of successes in a sequence of n independent and identically conducted Bernoullian trials, with probability of success p in each trial. What are the possible values of X? It can take the values 0, 1, ..., n, and

P(X = k) = C(n, k) p^k (1 - p)^(n - k), k = 0, 1, ..., n,

where C(n, k) denotes the binomial coefficient n choose k. The name binomial has come because of the binomial coefficients appearing here; also, if you want to check that this is a valid distribution, sum over k from 0 to n. The sum is nothing but the binomial expansion of (1 - p + p)^n = 1^n = 1. That is why the name binomial distribution has come.

Naturally the question arises: what are the characteristics of this distribution? For example, we may look at E[X]. To calculate it I have to sum over x from 0 to n. In the validity check we had a direct binomial expansion; if I multiply by x, I cannot expand directly, so a little adjustment is needed. We write

E[X] = sum over x of x * n! / (x! (n - x)!) * p^x (1 - p)^(n - x).

The term corresponding to x = 0 is 0, so we start from x = 1. Cancelling x against x! and taking np out, the general term becomes (n - 1)! / ((x - 1)! (n - 1 - (x - 1))!) * p^(x - 1) (1 - p)^(n - 1 - (x - 1)), summed as x - 1 runs from 0 to n - 1. That means, putting y = x - 1,

E[X] = np * sum over y = 0 to n - 1 of C(n - 1, y) p^y (1 - p)^(n - 1 - y) = np (1 - p + p)^(n - 1) = np.

So the mean of the binomial distribution is actually n times p. You can easily see why: if the probability of success in one trial is p, then the average number of successes in n trials is n times p. Suppose p = 1/2; then roughly half the trials are successes on average. So this matches our common-sense understanding of the binomial distribution.
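The identity E[X] = np can also be checked numerically straight from the pmf. A short Python sketch with illustrative values of n and p:

```python
from math import comb

# Binomial(n, p): P(X = k) = C(n, k) p^k (1 - p)^(n - k), k = 0, 1, ..., n.
n, p = 10, 0.4  # illustrative values

pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

print(sum(pmf))                               # 1.0: a valid distribution
print(sum(k * pmf[k] for k in range(n + 1)))  # mean = np = 4.0
```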
Certainly, continuing the calculations, we may look at E[X^2] to measure the variability. If I look at E[X^2] directly, I get an x^2 term; in the mean calculation I had the advantage of cancelling x against x!, but with x^2 I would not be able to do that. So we make use of the factorial moments, because factorials are involved here and factorials are what we can cancel cleanly. A better way of calculating E[X^2] is to write it as E[X(X - 1)] + E[X].

Now, applying the same logic as before, the index shifts by 2 and we take out n(n - 1)p^2. Let me show this one calculation; the other calculations, including higher-order moments, proceed in the same way. We have

E[X(X - 1)] = sum over x of x(x - 1) * n! / (x! (n - x)!) * p^x (1 - p)^(n - x).

The terms corresponding to x = 0 and x = 1 vanish, so the sum runs from 2 to n. Taking out n(n - 1)p^2 and putting y = x - 2, this becomes

E[X(X - 1)] = n(n - 1) p^2 * sum over y = 0 to n - 2 of C(n - 2, y) p^y (1 - p)^(n - 2 - y) = n(n - 1) p^2,

because the sum is again the binomial expansion of (1 - p + p)^(n - 2) = 1. Using this, E[X^2] = E[X(X - 1)] + E[X] = n(n - 1)p^2 + np. Now the variance is

Var(X) = E[X^2] - (E[X])^2 = n(n - 1)p^2 + np - n^2 p^2 = np - np^2 = np(1 - p) = npq.

One point you may notice here, as in the Bernoulli case: in the Bernoulli distribution the mean is p and the variance is pq, and since p lies between 0 and 1, q also lies between 0 and 1, so pq is always less than or equal to p. In the same way npq is always less than or equal to np; that is, the variance of a binomial distribution is always less than or equal to the mean of the distribution. This is one of the observations we may make about distributions.

One may also look at mu_3'. For that you need E[X(X - 1)(X - 2)], which by the same method becomes n(n - 1)(n - 2)p^3; using it you can calculate E[X^3] and so on. Without going into the detailed calculations, let me just write the expression: mu_3 = np(1 - p)(1 - 2p). So the coefficient of skewness is

mu_3 / mu_2^(3/2) = np(1 - p)(1 - 2p) / (np(1 - p))^(3/2), which simplifies to (1 - 2p) / sqrt(np(1 - p)),

where mu_2 = np(1 - p) is the variance, also called sigma^2. The factors n, p and 1 - p are nonnegative, so the sign is determined by 1 - 2p. The coefficient equals 0 if p = 1/2; indeed, by plotting the binomial probabilities you can see that for p = 1/2 the distribution is exactly symmetric. It is greater than 0 if p < 1/2 and less than 0 if p > 1/2. That means the binomial distribution is positively skewed for p < 1/2. You can see this from the probabilities: if p < 1/2 then 1 - p > 1/2, so in the beginning you have higher probabilities for smaller values of k and smaller probabilities for larger values of k; therefore it is positively skewed. Whereas if p > 1/2, lower values of k get smaller probabilities and higher values of k get higher probabilities, so it becomes a negatively skewed distribution for p > 1/2.
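These formulas for the variance and the skewness coefficient can be checked directly against the pmf. A Python sketch with illustrative parameter values:

```python
from math import comb

# Check Var(X) = np(1-p) and skewness (1 - 2p) / sqrt(np(1-p)) from the pmf.
def binomial_moments(n, p):
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    mu3 = sum((k - mean) ** 3 * w for k, w in enumerate(pmf))
    return mean, var, mu3 / var**1.5  # mean, variance, skewness

for p in (0.2, 0.5, 0.8):  # positive skew, symmetric, negative skew
    mean, var, skew = binomial_moments(20, p)
    print(f"p={p}: var={var:.3f} (npq={20 * p * (1 - p):.3f}), skew={skew:+.4f}")
```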
One may also look at the measure of kurtosis, but I will not do so for this particular distribution; one can similarly calculate mu_4, and I am just leaving that here.

In the Bernoullian trials, what we have done is conduct the trial a certain number of times and see how many successes are observed: whatever the favourable event or outcome is, we look at how many times it has occurred. Another way of looking at it is the following. We conduct Bernoullian trials, for example clinical trials testing various medicines or chemical substances to see which one will be successful, and we stop the first time a success is observed. For various kinds of diseases one conducts trials, and as soon as a trial is successful the substance is used for making a medicine, which is then marketed for treating that particular kind of disease. Assuming approximately the Bernoullian structure, that is, trials conducted independently and identically (in physical reality it may not be so, but statistical distributions are approximations to physical reality, so we may make a valid assumption of this nature), we ask: how many trials do we have to conduct until we get the first success? This is another way of looking at the Bernoullian trials.

So suppose X is the number of trials needed to get the first success in a sequence of independent and identically conducted Bernoullian trials with probability of success p in each trial. What is P(X = k)? It is like this: you conduct the first trial and get a failure, the second trial and get a failure, and so on up to the (k - 1)th trial; the kth trial is the success, and before it you have all failures. Here the positions of the failures and the success are fixed, unlike the binomial distribution, where we counted k successes in n trials, which can occur in C(n, k) ways. Since the trials are independent and each failure has probability 1 - p, we have k - 1 failures followed by one success, so

P(X = k) = (1 - p)^(k - 1) p, k = 1, 2, ....

If we look at the sum of these probabilities over k = 1 to infinity, writing q for 1 - p, it is p + pq + pq^2 + ..., an infinite geometric series with common ratio q lying between 0 and 1. So the sum is valid and equals p / (1 - q) = p / p = 1. Because the probabilities form a geometric series, the name geometric distribution is given to this. Also look at the probabilities: the first is p, then you have pq, then pq^2, so each successive probability decreases.
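A quick numerical confirmation that these probabilities sum to 1 and decrease by the constant factor q; the value of p is illustrative, and the infinite series is truncated, so the sum is only approximately 1:

```python
# Geometric distribution: P(X = k) = (1 - p)^(k - 1) p, k = 1, 2, ...
p = 0.25
q = 1 - p

partial = sum(q ** (k - 1) * p for k in range(1, 200))  # neglected tail is q^199
print(partial)  # very close to 1

# successive probabilities decrease by the constant factor q
print([round(q ** (k - 1) * p, 4) for k in range(1, 6)])
```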
So if we plot this distribution, as we did the binomial: the first probability, at the point 1, is p; the next, at 2, is p(1 - p); and depending upon the value of q the drop may be faster or slower. The mass points are 1, 2, 3 and so on, and this is the way the distribution proceeds. You can easily see that it is a positively skewed distribution. You actually do not need to calculate the third-order moments and so on here, because the form of the distribution is known and you can plot it. The measures of skewness, kurtosis, etcetera, are helpful when knowledge about the probability mass function or density function is not very precise; in that case those measures let you make certain guesses about the shape. Here one feature is immediate: the decreasing nature of the probabilities.

Another important property you may notice is the following. Consider P(X > m), where m is a positive integer. That means summing P(X = k) for k = m + 1 to infinity:

P(X > m) = sum over k = m + 1 to infinity of q^(k - 1) p = p q^m + p q^(m + 1) + ... = p q^m (1 + q + ...) = p q^m / (1 - q) = q^m,

again using the infinite geometric series; the factor 1 - q cancels against p and we get q^m. A physical interpretation of this: the probability that we need more than m trials for the first success is q^m.

Now let us write another probability: the probability that more than m + n trials are needed for the first success, given that n trials have already passed without success. A physical interpretation of this situation: the trials are being conducted, n trials have already been done with no success, and now the team changes; a new person conducting the trials is interested in knowing how many more trials will be needed from this point. So we consider P(X > m + n | X > n). If I call the first event A and the conditioning event B, this conditional probability is P(A intersection B) / P(B); since the event {X > m + n} is contained in {X > n}, in the numerator I will have just P(X > m + n), so

P(X > m + n | X > n) = q^(m + n) / q^n = q^m,

which is nothing but P(X > m). Look at the physical interpretation: P(X > m) means we need more than m trials for the first success. Here it says that n trials have already passed, and the probability that more than m further trials will be needed, counting from the (n + 1)th trial, is exactly the same. This probability remains constant.
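Once P(X > m) = q^m is known, the memoryless property is a one-line computation. A minimal sketch with arbitrary illustrative values of p, m and n:

```python
# Memoryless property of the geometric distribution:
# P(X > m + n | X > n) = q^(m + n) / q^n = q^m = P(X > m).
p = 0.3
q = 1 - p

def tail(m):
    """P(X > m) = q^m for the geometric distribution."""
    return q ** m

m, n = 4, 7
conditional = tail(m + n) / tail(n)  # P(X > m + n | X > n)
print(conditional, tail(m))          # both equal q^4 = 0.2401
```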
So the starting point does not matter: in experiments where the geometric distribution is applicable, you have something called the memoryless property of the geometric distribution. Because the starting point does not matter, it is as if we started from scratch.

Now let us look at the mean and variance of this distribution. You have seen that the total probability is an infinite geometric sum; if I put a factor k inside, it becomes an arithmetico-geometric series, so you can easily apply the formula for such sums. I will only write the final value:

E[X] = sum over k = 1 to infinity of k q^(k - 1) p = p / (1 - q)^2 = 1/p.

It works like this: suppose I am considering a simple coin-tossing experiment with probability of head 1/3. Then 1 / (1/3) = 3; that means we need on average 3 tosses to get a head for the first time. Similarly, consider the tossing of a fair die, with probability 1/6 for each face, and suppose I am expecting a six. Then the expected number of throws needed for a six to be observed for the first time is 6. So in that way this has a very nice physical interpretation. The variance of X is equal to q / p^2.

One may write higher-order moments also, but I am not interested in that; let me simply write down the moment generating function. It is

M_X(t) = E[e^(tX)] = sum over k = 1 to infinity of e^(tk) q^(k - 1) p.

This type of structure is extremely helpful because the term can be adjusted: we write it as (p/q) times the sum of (q e^t)^k for k = 1 to infinity, where I have adjusted one factor of q. This is a convergent series only if q e^t < 1, in which case the sum is (p/q) * q e^t / (1 - q e^t). So

M_X(t) = p e^t / (1 - q e^t), valid for t < -log q.

See, log q is a negative number because q lies between 0 and 1, so -log q is a positive number, and in that range the expansion is valid. From the moment generating function one may evaluate mu_3', mu_4', and then mu_3 and mu_4, and therefore the measures of skewness and kurtosis, although of course here the exact structure of the probability distribution is already known.
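The MGF formula can be sanity-checked by differentiating it numerically at t = 0, which should recover E[X] = 1/p. A sketch using the die example above; the step size h is an arbitrary numerical choice:

```python
from math import exp, log

# Geometric MGF: M(t) = p e^t / (1 - q e^t), valid for t < -log q.
p = 1 / 6          # waiting for a six with a fair die; E[X] = 1/p = 6
q = 1 - p

def mgf(t):
    assert t < -log(q), "series converges only for q e^t < 1"
    return p * exp(t) / (1 - q * exp(t))

# central-difference derivative of M at 0 approximates M'(0) = E[X]
h = 1e-6
print((mgf(h) - mgf(-h)) / (2 * h))  # ~6.0 = 1/p
print(q / p**2)                      # variance = q/p^2 = 30.0
```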
Now, an easy generalization of the geometric distribution is the following. In the geometric we looked at the number of trials needed for the first success, and I gave physical interpretations: conducting trials until one is successful, as in a clinical trial, or somebody appearing repeatedly in a competitive examination. But there are certain other experiments; for example, you may associate the success with a failure, just by changing the name. Consider a complex mechanical system consisting of several components, which works as long as more than 50 percent of them are in working order. Suppose there are 8 components; then you wait for the first time the fourth component fails, because once 4 components have failed the system fails. That means, rather than looking for the first occurrence in a sequence of Bernoullian trials, you are looking at the rth occurrence for a certain value of r. This is known as the negative binomial distribution.

So we consider this generalization; let me put it on a new page. Suppose X denotes the number of trials needed for the rth success in a sequence of independent and identically conducted (let me use the abbreviation i.i.c.) Bernoullian trials, with probability of success p in each trial. Look at the trials this way: the last trial, the kth, is a success, and it is the rth success; out of the preceding k - 1 trials you have r - 1 successes and k - r failures. So

P(X = k) = C(k - 1, r - 1) q^(k - r) p^r, k = r, r + 1, ...,

where q = 1 - p. The position of the final success is fixed, which is why the coefficient counts only the arrangements of the r - 1 earlier successes among the first k - 1 trials. Because of the coefficients that appear when this is expanded, it is called the negative binomial distribution. It is also called the inverse binomial distribution: in binomial sampling we fix the number of trials and look at how many successes occur, whereas in inverse binomial sampling we conduct the trials until a certain number of successes is observed, so the number of trials is not fixed. There is a difference in how the experiment is viewed, and that is why it is also called the inverse binomial distribution or the negative binomial distribution.

I am not going to look into the expansions and other details. One can show that E[X] = r/p, the variance of X is easily calculated to be rq/p^2, and the moment generating function is

M_X(t) = (p e^t / (1 - q e^t))^r, valid for t < -log q,

for the same reason as in the geometric case.
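A short numerical check of the negative binomial pmf and of the stated mean and variance; r and p are illustrative, and the infinite sum is truncated:

```python
from math import comb

# Negative binomial: X = number of trials up to and including the r-th success,
# P(X = k) = C(k - 1, r - 1) p^r q^(k - r), k = r, r + 1, ...
r, p = 4, 0.5
q = 1 - p

pmf = {k: comb(k - 1, r - 1) * p**r * q**(k - r) for k in range(r, 200)}

print(sum(pmf.values()))                   # ~1: a valid distribution
print(sum(k * w for k, w in pmf.items()))  # mean ~ r/p = 8.0
print(r * q / p**2)                        # variance = rq/p^2 = 8.0
```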
Now, the trials I have considered so far are approximations to physical situations, assuming independent experiments conducted under identical conditions. This may be valid when the population size is large and the sampling scheme can be continued indefinitely. But there are other experiments where things are finite. For example, a small shop, an example I gave in one of the previous lectures: the shop has a certain number of items, of which a certain number are defective. If a person is buying, what is the probability distribution of the number of defectives? Let us look at that example: a shop has 5 computers, out of which 2 are defective. If that is the situation and purchases are made from there only, this is a small-sample problem.

In a small-sample problem, when the population size is also small, the probabilities change rapidly from draw to draw. So the condition of independent and identically conducted Bernoullian trials may not hold here, and for the calculation of probabilities in this finite situation we need another distribution. So let a population have N items, out of which M items are of type A and N - M items are of type B. You can easily think of examples: a class of students with some boys and some girls; an office staff, some of whom smoke and some of whom do not; people visiting a shopping mall, some belonging to a particular ethnic group and some not; an organization with upper-income-group people, middle-income-group people and so on. Here I am considering only a two-way split, that is, two complementary categories. Now a random sample of size n is chosen from this population, and X denotes the number of items of type A in the sample. What is the probability distribution of X? We have

P(X = k) = C(M, k) C(N - M, n - k) / C(N, n), k = 0, 1, ..., n,

because k items are chosen from the M items of type A, the remaining n - k items come from the N - M items of type B, and the total number of ways of choosing the sample is C(N, n). Of course there are some further physical restrictions: k cannot exceed M, and similarly n - k cannot exceed N - M.

If you look at the sum of these probabilities, it equals 1. This is proved by considering the expansion of (1 + x)^N and splitting it as (1 + x)^M (1 + x)^(N - M): compare the coefficient of x^n on both sides; on one side you get C(N, n), and on the other side you get the summation of C(M, k) C(N - M, n - k), so the probabilities sum to 1. This is known as the hypergeometric distribution. It is also one of the useful distributions in the theory of probability, describing a finite sampling scheme with a finite population size. Bernoullian trials are helpful when we consider an exceedingly large population, which we can theoretically call an infinite population; that is what lets us maintain a constant probability of success from trial to trial. In small samples that does not hold, and therefore we consider the hypergeometric distribution.
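The shop example can be computed directly from this formula. A minimal Python sketch of the hypergeometric pmf, including the physical restrictions on k:

```python
from math import comb

# Hypergeometric: population of N items, M of type A; a sample of size n is
# drawn without replacement. P(X = k) = C(M, k) C(N - M, n - k) / C(N, n).
N, M, n = 5, 2, 2  # the shop example: 5 computers, 2 defective, 2 purchased

def hyper_pmf(k):
    if k < 0 or k > n or k > M or n - k > N - M:
        return 0.0  # the physical restrictions on k
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

for k in range(n + 1):
    print(k, hyper_pmf(k))                      # 0.3, 0.6, 0.1
print(sum(hyper_pmf(k) for k in range(n + 1)))  # 1.0, by the identity above
```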
Of course, suppose the population size becomes large: N is really large, and M is also large. If the proportion M/N stays constant, this distribution converges to a binomial distribution. I can state it as a theorem: let X have a hypergeometric distribution with parameters N, M and n as above. If M and N go to infinity in such a way that M/N goes to p, then the distribution of X converges to the binomial(n, p) distribution. You can prove this easily by writing out the binomial coefficients and taking the limits; I will not be writing the proof and leave it as an exercise.

Further, look at the expectations and so on. Let me do one term here, and the other things you can do yourself. For the expectation we have

E[X] = sum over k of k * C(M, k) C(N - M, n - k) / C(N, n).

As in the case of the binomial distribution, we adjust the factor k: in the divisor we have k!, so cancelling k against k! leaves (k - 1)!, and accordingly one common factor is taken out. Basically, it turns out that E[X] = nM/N; I am not writing the calculations here, but one can easily check it. Another property you can observe: as M/N goes to p, this expectation goes to np. As a physical interpretation, if there are M items of type A in the population, the proportion of type A in the whole population is M/N, so the expected number of type-A items in a sample of size n is n times M/N. So the formula is verifying exactly that. I am leaving Var(X) as an exercise; the higher-order moments will be a bit complicated, and the moment generating function has an extremely complicated structure, so we are not looking at those.
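The convergence theorem is easy to see numerically: hold n, k and the proportion M/N fixed while N grows, and the hypergeometric probability approaches the binomial one. A sketch with illustrative values:

```python
from math import comb

# As N, M -> infinity with M/N -> p, Hypergeometric(N, M, n) -> Binomial(n, p).
n, p, k = 5, 0.4, 2

binom = comb(n, k) * p**k * (1 - p)**(n - k)
for N in (20, 200, 2000, 20000):
    M = int(p * N)  # keeps the proportion M/N equal to p
    hyper = comb(M, k) * comb(N - M, n - k) / comb(N, n)
    print(N, round(hyper, 6))
print("binomial limit:", round(binom, 6))
```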
Now, in each of the problems discussed so far, one can describe the experiment in a very proper way by physically ascribing the probability of success and so on in a precise manner. But there are many other situations where that is not really possible. For example, look at a railway station, and consider the number of customers arriving at the reservation counter or the ticket counter. Or take a cinema theater: how many people arrive, and between what times, to purchase tickets? Why would one study such phenomena? To plan the appointment of the proper personnel, or the deployment of people for the service. It is something like an arrival-and-service type of feature, and this happens almost everywhere. How many buses do you need to deploy to ferry people between two cities, and between what times? That will need an estimate, or a probability distribution, of the number of passengers. How many flights do you need? At the airport you have the check-in counters: how many people do you need to deploy, and how many counters should be open between what time and what time? A similar thing appears in traffic analysis: you look at the number of accidents, or the number of vehicles passing through a particular crossing during certain timings, to decide how wide the road should be or how many traffic police should be deployed to avoid untoward circumstances. Here you are looking at occurrences over an area, whereas in the earlier examples you were looking at time. Similarly, consider natural disasters: floods happen over a period of time, and floods also happen over a geographical region, so over a particular region how many such events may be observed? The same for earthquakes and other kinds of natural disasters. One may also look at space: for example, observing astronomical events, such as collisions of satellites, or satellites passing through a certain portion of space; we hear, say, of a comet that is supposed to pass very close to the earth.

That means we are interested in the number of occurrences over a period of time, over an area, or over a region of space, and now we want the precise distribution. Unlike the situations described by the Bernoullian trials or the hypergeometric distribution, the mechanism here is somewhat vague: how do we say how many people arrive in a particular time interval, how many accidents occur over a stretch of road, how many earthquakes occur over a period of time? Here it is not possible to ascribe probabilities like p and 1 - p, to say that an earthquake occurs with probability p and does not occur with probability 1 - p, or that a customer arrives at a shopping mall with probability p. We cannot do like this. So in this case we need a somewhat different way of thinking to describe the phenomena. We call such events events occurring in a Poisson process: we put forward certain assumptions, and if they are satisfied, we call it a Poisson process. In my next lecture I will actually be obtaining the distribution of occurrences in a Poisson process, and then we will see how it is helpful in describing the various phenomena I have described, and how it leads to some other distributions, for example the distribution of the times between occurrences, and so on. So I will just give the assumptions of the Poisson process in today's class, and in the following lecture I will derive the distribution.

So we consider the Poisson process. Events occurring over time, area, space, etcetera, are said to follow a Poisson process provided they satisfy the following three assumptions. One remark first: I will restrict attention to time. We could consider area, space and so on (over a particular area, how many events there are; in a particular stretch of three-dimensional space, how many events there are), but for deriving my result it will be convenient to restrict attention to time. So the time scale is like this: you start from 0 and go up to a time t, and you look at how many occurrences there are.

The first assumption is that the occurrences in disjoint time intervals are independent. The second assumption is that the probability of a single occurrence in a small time interval is proportional to the length of the interval. The third assumption is that the probability of more than one occurrence in a small time interval is negligible.

Let me introduce some notation here. We consider intervals of the form (0, t], open on one side and closed on the other. This is of course not very strict; one may put it another way, closed on the left and open on the right, but this choice is for convenience, as you will observe.
So let X(t) be the number of occurrences in the interval (0, t], and let me denote P(X(t) = n) by p_n(t). Then assumption 2 is equivalent to saying that

p_1(h) = lambda h + o(h) for small h,

that is, the probability of a single occurrence in an interval of length h is proportional to the length of the interval, up to a negligible term. Similarly, assumption 3 is equivalent to saying that

p_2(h) + p_3(h) + ... = o(h).

For negligibility we use the mathematical notation small o of h: a quantity is o(h) if, after dividing by h, it tends to 0 as h tends to 0. So basically this means 1 - p_0(h) - p_1(h) = o(h), which is also equivalent to saying

p_0(h) = 1 - lambda h + o(h);

whether one writes plus or minus o(h) does not matter, and the o(h) terms simply take care of the further approximations here. Under these assumptions, the distribution of the number of occurrences will follow a Poisson distribution. So in the next lecture I will show you how this follows; I will show you the derivation. And then we will also see how it leads to descriptions of the distributions of various other phenomena in the Poisson process, for example an exponential distribution, a gamma distribution, and so on. That I will be doing in the following lectures.
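As a preview of that derivation, a small simulation under the three assumptions already suggests the Poisson form: split (0, t] into many intervals of length h, allow at most one occurrence per interval, each with probability lambda times h, independently, and count X(t). The counts match the Poisson probabilities e^(-lambda t) (lambda t)^n / n!. All numerical values in this sketch are illustrative:

```python
import random
from math import exp, factorial

# Simulate the three assumptions: split (0, t] into intervals of length h; in
# each, independently, there is one occurrence with probability lambda*h and
# never more than one. Count X(t) and compare with the Poisson pmf.
lam, t, h = 1.0, 2.0, 0.01   # illustrative rate, time horizon, interval length
n_intervals = int(t / h)

def one_path():
    # number of occurrences in (0, t] for one realization
    return sum(1 for _ in range(n_intervals) if random.random() < lam * h)

counts = [one_path() for _ in range(20_000)]
for n in range(5):
    empirical = counts.count(n) / len(counts)
    poisson = exp(-lam * t) * (lam * t) ** n / factorial(n)
    print(n, round(empirical, 4), round(poisson, 4))
```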