Last time we discussed the exponential distribution. The pdf, or probability density function, of an exponential distribution with parameter lambda is lambda e^(-lambda x), where x takes values greater than or equal to 0. We have already seen the properties of the exponential distribution in terms of its mean and variance. There is one more very important property of the exponential random variable, and that is the memoryless property. To motivate this property, let us take a simple example of a pen drive. Each one of you must own a pen drive. I have a 2 GB pen drive which is one year old, and some of you may have the same pen drive which is six months old. Consider the random variable X to be the length of time this pen drive works before a failure occurs. I am interested in the probability that a pen drive works for at least an additional six months. In other words, I have a one-year-old pen drive and you have a six-month-old pen drive; what is the probability that each of them works for at least six more months? Put differently, compare the chances that my pen drive fails within the next six months with the chances that your pen drive fails within the next six months. Which one is more likely?
Is it my pen drive failing within the next six months, or your pen drive, which is only six months old, failing within the next six months? Any guesses? Usually, if instead of a pen drive I had talked about a mobile phone, I would expect my phone to go out of order much earlier than yours, because yours is relatively new. But if the time before failure follows an exponential distribution, then my pen drive and your pen drive are each as good as new, exactly like the pen drive of a person who has just bought one. So let us look at this memoryless property of the exponential random variable. Let us formally define it: a random variable X is said to be memoryless if P(X > s + t | X > t) = P(X > s) for all s, t greater than or equal to 0. If we imagine that X represents the length of time that a certain item functions before failing, as we discussed, then the left-hand side is the probability that the additional functional life of a t-unit-old item exceeds s, that is, the probability that the life is greater than s + t given that the item has already survived t units. Thus, for a memoryless random variable X, the distribution of the additional functional life of an item of age t is the same as that of a new item. In other words, there is no need to remember the age of a functional item, since as long as it is still functional it is as good as new. Such a property in fact holds for the exponential random variable. Now, how do we see that it holds? Let us rewrite the condition for a memoryless random variable X.
We have P(X > s + t | X > t) = P(X > s); this is the definition of the memoryless property. Since the left-hand side is a conditional probability, we can write it as the probability of the intersection of the two events X > s + t and X > t, divided by P(X > t), and this is being equated to P(X > s). Now, the joint event that X is greater than s + t and X is greater than t is the same as the event that X is greater than s + t. Therefore, an equivalent form of what is required for a memoryless random variable X is P(X > s + t) = P(X > s) P(X > t). Now let us see what happens when X is an exponential random variable. We know that P(X > x) is nothing but 1 minus the CDF at x, and we have already derived the CDF earlier, if you recall: it is 1 - e^(-lambda x). So 1 minus this quantity is e^(-lambda x). We can therefore write P(X > s + t) = e^(-lambda (s + t)), which factors as e^(-lambda s) times e^(-lambda t), and these two factors are exactly P(X > s) and P(X > t). So we clearly see that when we start with a random variable X which has an exponential distribution, it follows that X has the memoryless property. Hence exponentially distributed random variables are memoryless. In fact, it can be shown that they are the only continuous random variables that are memoryless. That is why this is a very important characterization of the exponential distribution. Let us take an example where we can exploit this property of an exponential random variable.
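Written out in symbols, the argument just given runs as follows. This is only a restatement of the steps above, using the exponential CDF $F(x) = 1 - e^{-\lambda x}$ recalled in the lecture; nothing new is assumed.

```latex
\begin{align*}
P(X > s+t \mid X > t)
  &= \frac{P(X > s+t,\ X > t)}{P(X > t)}
   = \frac{P(X > s+t)}{P(X > t)} \\
  &= \frac{e^{-\lambda (s+t)}}{e^{-\lambda t}}
   = e^{-\lambda s}
   = P(X > s), \qquad s, t \ge 0 .
\end{align*}
```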
We have here an example related to a car battery. The number of kilometers that my car can run before its battery wears out is exponentially distributed with an average value of 10,000 kilometers. So that is the average life of the battery, and that life has an exponential distribution. Now, if I want to take a 5,000 kilometer trip, what is the probability that I will be able to complete the trip without replacing the battery? Note that we have not indicated how long the battery has already been in use; we have only said that its average life before it wears out is 10,000 kilometers. If I have already covered 9,000 kilometers and I decide to take this 5,000 kilometer trip, one would expect the battery to be very likely to wear out during the trip. However, since this life is exponentially distributed, we have the memoryless property, so the answer does not depend on how long the battery has already been in use. It follows by the memoryless property of the exponential distribution that the remaining lifetime of the battery, in units of 1,000 kilometers, is exponential with parameter lambda. And what is lambda? Recall that the mean of an exponential distribution is 1 by lambda, so lambda is 1 by the mean mu. The mean is given to be 10,000 kilometers, which in units of 1,000 is 10, and that is why I have written 1 by 10 as the parameter lambda. Hence the probability that the remaining lifetime is greater than 5 units, or 5,000 kilometers, is nothing but 1 minus the CDF at 5, which is 1 - (1 - e^(-5 lambda)) = e^(-0.5), and that works out to approximately 0.607. So I have been able to work out the probability of the remaining lifetime being greater than 5,000 kilometers.
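The battery calculation can be reproduced in a few lines. The sketch below is illustrative only: the function names `survival` and `prob_complete_trip` are made up for this example, and distances are measured in units of 1,000 km as in the lecture. It also shows that, for an exponential lifetime, the conditional probability does not change with the distance already covered.

```python
import math

# Assumption: distances are in units of 1000 km, as in the lecture.
MEAN_LIFE = 10.0          # average battery life (10,000 km)
RATE = 1.0 / MEAN_LIFE    # lambda = 1 / mean
TRIP = 5.0                # planned trip (5,000 km)


def survival(x: float, rate: float = RATE) -> float:
    """P(X > x) for an exponential lifetime with the given rate."""
    return math.exp(-rate * x)


def prob_complete_trip(used: float, trip: float = TRIP) -> float:
    """P(X > used + trip | X > used): finish the trip given the distance already used."""
    return survival(used + trip) / survival(used)


if __name__ == "__main__":
    # Hypothetical usage figures; the answer is the same for each of them.
    for used in (0.0, 3.0, 9.0):
        print(f"already used {used:>4} -> P(complete trip) = {prob_complete_trip(used):.4f}")
```

Each line printed shows the same probability, e^(-0.5), whatever value of `used` is supplied, which is the memoryless property in numbers.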
That means the probability that I would be able to complete the trip works out to about 0.6, and I could work this out easily, without knowing how long the battery had been in use, because the battery life was declared to follow an exponential distribution. Now, what happens if the lifetime distribution is not exponential? Suppose the battery has already been used for t units, where a unit is 1,000 kilometers. Then what we need to work out is the probability that the lifetime is greater than t + 5 given that the lifetime is greater than t, and this is equal to (1 - F(t + 5)) / (1 - F(t)), where F is the CDF of the lifetime. If we put in the CDF of the exponential distribution, we get the same result as earlier. However, when we know that the distribution is not exponential, we have to know its explicit form before we can actually compute these values, and we also need t, the number of kilometers in units of 1,000 that the battery had been in use prior to the start of the trip. So this knowledge becomes important. Therefore, if the distribution is not exponential, additional information is needed, namely t, before the desired probability can be calculated. We cannot blindly use the memoryless property; we first have to make sure that the random variable in question has an exponential distribution.
We next go to another important continuous random variable, the chi-square random variable; the corresponding distribution is called the chi-square distribution. Let nu be a positive integer. Then a random variable X is said to have a chi-square distribution with parameter nu if the probability density function of X is of this form.
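The density itself is on the slide and is not captured in the transcript. The standard chi-square density with $\nu$ degrees of freedom, which is presumably what the slide shows, is:

```latex
f(x;\nu) =
\begin{cases}
\dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\; x^{\nu/2 - 1}\, e^{-x/2}, & x > 0,\\[2ex]
0, & \text{otherwise,}
\end{cases}
```

where $\Gamma$ denotes the gamma function.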
So we see that this random variable has positive density only for x greater than or equal to 0. This form is too complicated for working out areas under the chi-square curve by hand, so here also we will resort to tables. The parameter nu is called the number of degrees of freedom of the random variable X, and the symbol for chi-square is often used in place of writing "chi-square" in words. An important result about the chi-square distribution is as follows. If X1, X2, ..., Xn are n independent and identically distributed random variables, each following the normal distribution with mean mu and variance sigma squared, then the sum of squares of the standardized values of these variates follows a chi-square distribution with n degrees of freedom, since we started with n of these i.i.d. random variables. So basically this is nothing but the sum of squares of standard normal variables, which follows the chi-square distribution provided those standard normal variates are all independent. Just as we defined the critical value z alpha for the normal variate, here we have chi-square alpha, n, which is called the chi-square critical value. It denotes the number on the measurement axis such that alpha of the area under the chi-square curve with n degrees of freedom lies to the right of this value. In other words, chi-square alpha, n is that value on the measurement axis such that the area to the right of it under the chi-square curve is alpha. Alternatively, if X is a chi-square random variable with n degrees of freedom, then for alpha lying between 0 and 1 the critical value chi-square alpha, n is defined to be such that the probability of X being greater than or equal to chi-square alpha, n is equal to alpha. Now, I would like to mention here that the chi-square curve in general is not symmetric.
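The result about sums of squares of standardized normal variates can be checked with a small simulation. This is a minimal sketch using only the Python standard library; the function names, the seed and the values mu = 5.4 and sigma = 0.2 are arbitrary choices for illustration. It compares the simulated mean and variance with n and 2n, the known mean and variance of a chi-square variable with n degrees of freedom.

```python
import random
import statistics


def chi_square_draw(rng: random.Random, n: int, mu: float, sigma: float) -> float:
    """Draw X_1..X_n i.i.d. N(mu, sigma^2) and return the sum of squared standardized values."""
    return sum(((rng.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(n))


def simulate(n: int, draws: int = 20_000, seed: int = 0) -> tuple[float, float]:
    """Return the sample mean and sample variance of many simulated sums of squares."""
    rng = random.Random(seed)
    values = [chi_square_draw(rng, n, mu=5.4, sigma=0.2) for _ in range(draws)]
    return statistics.fmean(values), statistics.variance(values)


if __name__ == "__main__":
    for n in (2, 10, 50):
        mean, var = simulate(n)
        print(f"n = {n:>2}: simulated mean {mean:.2f} (theory {n}), "
              f"simulated variance {var:.2f} (theory {2 * n})")
```

The simulated values should land close to n and 2n, though not exactly on them, since they are estimates from a finite number of draws.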
Suppose this is the chi-square curve for a small number of degrees of freedom, say n equal to 2 or 3, this one is for n equal to 10, and this one is for n equal to 50. We see that as the degrees of freedom increase, the chi-square curve slowly shifts its shape towards the normal curve. Also, if X follows chi-square with n degrees of freedom, then the expected value of X is n and the variance of X is 2n. In other words, the spread also increases as n increases, so, to be more precise, the curves for larger n should be drawn with more spread: this one is for n equal to 50, and this could be n equal to 2. So the spread of the distribution increases because the variance increases, and apart from the spread increasing, the shape takes the form of a normal curve.
Now, we have this table, which gives the critical value for a given area to the right of a point. For example, suppose this area is 0.01 and I have this curve for 10 degrees of freedom. If I want to find the chi-square value such that the area to the right of it is 0.01, I look in the table at the row for 10 degrees of freedom and the column for alpha equal to 0.01. The required answer is 23.209. This is how the table can be used for various critical values, and it will be most useful as we go on to carry out hypothesis tests or find confidence intervals for parameter estimates.
So, what we have seen till now among the continuous distributions is the uniform distribution, the normal distribution, the exponential distribution and the chi-square distribution. We will now try to apply these distributions in real life.
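As a cross-check on the table lookup for 10 degrees of freedom and alpha equal to 0.01, the critical value can be computed numerically. The sketch below is not how the table was produced; it is a standard-library illustration with made-up function names, and it relies on a closed form for the right-tail area that is valid only when the number of degrees of freedom is even.

```python
import math


def chi_square_sf_even(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable with an EVEN number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


def chi_square_critical_even(alpha: float, df: int) -> float:
    """Find c with P(X >= c) = alpha by bisection (even df, 0 < alpha < 1)."""
    lo, hi = 0.0, 1.0
    while chi_square_sf_even(hi, df) > alpha:   # widen the bracket until the tail is small enough
        hi *= 2.0
    for _ in range(100):                        # halve the bracket; the tail area decreases in x
        mid = (lo + hi) / 2.0
        if chi_square_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(round(chi_square_critical_even(0.01, 10), 3))
```

The value found for alpha = 0.01 and 10 degrees of freedom agrees with the tabulated 23.209 to the precision of the table.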
Now, the whole objective of studying these distributions is to derive information about a population based on a small sample. As a simple example, the total strength of your class is roughly 800 students, and suppose my objective is to find the mean height of these students. What do I need to do in order to find the mean height of the students of IC 102? I have to approach each and every one of these 800 students, measure their heights, add those up and divide by 800. That would give me the population mean height of the IC 102 students. But that is not possible for me right now, because I do not have 800 students at my disposal. What I do have is a sample of, say, 100 students sitting here right now. I can ask each one of these 100 students their height, add those values and divide by 100, and what I get is a sample estimate of the population mean. The basic question now is: how good is this sample mean, based on a sample of size 100, at representing the population mean, which is not known to me? Certainly the sample mean based on 100 sample points is not going to be exactly equal to the population mean; the chances of that are almost 0. Still, how likely is it that this sample mean is close to the population mean? If my sample mean here is, say, 5 feet 3 inches, can I say that the population mean is going to be between 5 feet 2 inches and 5 feet 5 inches, and that I am correct in making that statement with probability 0.99? Can I make such a statement? So we have to derive these values from the sample mean and the variability in the data in order to make statements of this type, that is, to say how well a sample mean informs us about the population mean.
That requires what are called random samples. Data from random samples are used for inferring certain population characteristics of interest. The distribution of the population variable is usually known except for some unknown population parameters. For example, I know that for the population of 800 students the distribution of heights is normal: if I draw the histogram of the heights, it will have a very smooth bell-shaped curve. I know that most of the people are in and around a height of, say, 5 feet 4 inches, and there are a few people who are very tall and a few who are very short. So the height variable is going to have a bell-shaped distribution; what I do not know is the mean height. So the distribution of the population variable is usually known except for some unknown population parameter, which could be the population mean or the population variance. Problems in which the form of the underlying distribution is specified up to a set of unknown parameters are called parametric inference problems. Parametric inference problems are what we will be concentrating on henceforth, till the end of this course, which will end with hypothesis tests.
Now let us formalize this. The random variables X1, X2, ..., Xn are said to form a simple random sample of size n if, first, the Xi's are independent random variables and, second, every Xi has the same probability distribution. That is, if X1, X2, ..., Xn are independent random variables having a common distribution F, then we say that they form an i.i.d. random sample of size n from the distribution F. The abbreviation i.i.d. stands for independent and identically distributed. Going back to the example of height, I have a total of 800 students and a sample of 100 students here. Whether it is a random sample or not, how do we judge this?
Look at all the 800 students and ask: what are all the possible samples, and how many of them are there? I have 800 students in all, out of which I need to pick 100. In how many ways can I do this? 800 choose 100. So you can see how large my sample space, the number of possible samples, is. What is important for a random sample is that each one of these 800 choose 100 samples should be equally likely. This group of 100 students is one among the 800 choose 100 samples, so if this sample had the same probability of being selected as every other possible sample, then it is a random sample. That amounts to saying that X1, X2, ..., Xn, where n is 100 here, form a simple random sample of size n: the Xi's are independent random variables and they are identically distributed. Each one of you is a sample unit, a member of the population, who has been picked independently from the population, and each one of you has the same probability distribution; that is, each of your heights comes from the distribution of heights, which is normal.
Now let us look at the distribution of linear combinations of random variables. Given a collection of n random variables X1, X2, ..., Xn and n numerical constants a1, a2, ..., an, the random variable a1 X1 + a2 X2 + ... + an Xn is called a linear combination of the Xi's. For the expected value of such linear combinations we have the following results. Let X1, X2, ..., Xn have mean values mu1, mu2, ..., mun and variances sigma1 squared, sigma2 squared, ..., sigman squared respectively.
Whether or not the Xi's are independent, the expected value of this linear combination is the same linear combination of the expected values: E(a1 X1 + ... + an Xn) = a1 mu1 + ... + an mun. That is the first result. If I now replace each of the ai's by 1/n and take X1, X2, ..., Xn to be identically distributed with a common mean mu, then the expected value of x-bar becomes mu. This result holds whether the Xi's are independent or not, but a similar result may not hold for variances. So what do we have for variances? If X1, X2, ..., Xn are independent, then the variance of the linear combination a1 X1 + ... + an Xn is a1 squared Var(X1) + ... + an squared Var(Xn). In other words, it is the sum of the variances weighted by the squares of the coefficients, and the standard deviation of the combination is the square root of this variance. Here again, for i.i.d. Xi's, that is, identically distributed with a common variance sigma squared, and with the coefficients ai all equal to 1/n, we get the variance of x-bar as sigma squared by n. So this holds for i.i.d. variables with common variance sigma squared. What happens if we do not assume any independence among X1, X2, ..., Xn? In the absence of independence, the variance of the linear combination is in general the sum over all pairs i, j of ai aj Cov(Xi, Xj), that is, the covariances weighted by the coefficients involved. This holds in general, and you can clearly see that the terms with i equal to j are of the type ai squared Var(Xi), and if the variables are independent then the covariance terms with i not equal to j vanish. So this is a generalization of the result we just saw for independent X1, X2, ..., Xn.
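For reference, the results stated above can be collected in symbols. The slide formulas are not part of the transcript, so the block below is a reconstruction from the spoken description, with $\mu_i = E[X_i]$ and $\sigma_i^2 = \operatorname{Var}(X_i)$:

```latex
\begin{align*}
E\Big[\sum_{i=1}^{n} a_i X_i\Big] &= \sum_{i=1}^{n} a_i \mu_i
  && \text{(always)} \\
\operatorname{Var}\Big(\sum_{i=1}^{n} a_i X_i\Big) &= \sum_{i=1}^{n} a_i^2 \sigma_i^2
  && \text{(independent } X_i\text{)} \\
\operatorname{Var}\Big(\sum_{i=1}^{n} a_i X_i\Big) &= \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \operatorname{Cov}(X_i, X_j)
  && \text{(in general)} \\
E[\bar{X}] = \mu, \qquad \operatorname{Var}(\bar{X}) &= \frac{\sigma^2}{n}
  && \text{(i.i.d., } a_i = 1/n\text{)}
\end{align*}
```

The general formula reduces to the independent case because $\operatorname{Cov}(X_i, X_i) = \sigma_i^2$ and $\operatorname{Cov}(X_i, X_j) = 0$ for independent $X_i, X_j$ with $i \ne j$.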
Now consider the difference between two random variables. From the general result, the expected value of X1 - X2 is the difference of the expectations, and in the situation where X1 and X2 are independent, the variance of the difference X1 - X2 is the sum of the variances. It is the sum, not the difference, because we have to weigh each variance by the square of its coefficient in the combination, and the coefficients here are 1 and -1. In terms of distributional properties, if X1, X2, ..., Xn, apart from being independent, are additionally known to be normally distributed, then any linear combination of these random variables also has a normal distribution, with the mean and variance that we have just worked out. In particular, the difference X1 - X2 between two independent normally distributed variables is itself normally distributed. As another particular case, it follows that if X1, X2, ..., Xn are independent and normally distributed, then the mean of these random variables also follows a normal distribution. So what we now have is this: to begin with, I know that the distribution of the heights in the population is normal, and since my random sample consists of individuals who have been randomly picked from the population, the sample mean also follows a normal distribution.
This brings us to statistics and their distributions. The sample mean which we discussed is a statistic, but in general a statistic can be any function of the observations. Let us define it formally: a statistic is any quantity whose value can be calculated from sample data. We have a sample here of size 100, and we can easily calculate the sample mean.
So the sample mean is a statistic. We can also compute the sample variance, the sample range and the sample median; all these are statistics, and we can study each one of them and look at its behavior with respect to the population from which the sample has been drawn. Now, prior to obtaining data, there is uncertainty as to what value any particular statistic will take. Thus a statistic is itself a random variable, and its probability distribution is referred to as its sampling distribution. Each one of the sample points is a random variable drawn from a population, and the statistic derived from the sample is itself a random variable; therefore it has a probability distribution, and that probability distribution is called the sampling distribution. The name simply records the information that this particular random variable is a statistic computed from a sample; it is, of course, still a probability distribution.
Now we would like to consider a simulation experiment. Consider an experimenter drawing 300 random samples, each of size 25, from a normal population with mean 5.4 and standard deviation 0.2. So we have a population, and from that population we are drawing samples of size 25. For example, let us concentrate on this class and take this particular class as my population, so we have around 100 students here, and I would like to find the mean height of the students in this classroom. I can pick 25 students at random from here, and what I will get is a mean based on these 25 students. Now, I can ask each one of you to do this exercise, so each one of you can pick your own 25 students. In this way let us get 300 such samples of size 25.
Each sample would have a mean; let me call the sample means x1-bar, x2-bar, and so on up to x300-bar. So we have drawn 300 random samples of size 25, leading to 300 sample means. Now, if we draw the histogram of these sample means, it gives a good approximation of the sampling distribution of x-bar. The behavior of the sample mean can be figured out from the histogram of these 300 values; it gives us a clear picture of how the sample mean is distributed. To find the exact sampling distribution of x-bar, we would need the distribution based on all possible samples of size 25 from the population, not just these 300. So let us look at the population. We have the mean as 5.4 and sigma as 0.2. This is the population distribution of the variable height, say X. Marking off the axis in steps of one SD, on the right we have 5.6, 5.8 and 6, and on the left 5.2, 5 and 4.8. We know that the distribution of height is normal with mean 5.4, and since the SD is 0.2 we have scaled the axis accordingly, to indicate that almost all of the area is covered within 3 SD of the mean, which is a property of the normal distribution; the exact figure is 99.7 percent of the area within 3 SD. Now, from here, let us look at the sampling distribution of the mean height. The first curve was for X, and now I am looking at the distribution of mean heights. We have considered 300 different means based on 300 different samples of size 25 each, and their distribution should look something like this. What we are saying is that the distribution of x-bar is such that the mean of the random variable x-bar is 5.4, and the standard deviation of x-bar is something smaller, say this much.
So what I am indicating is that, compared to the distribution of X, the distribution of x-bar has the same central tendency, that is, it is in and around the same mean value of 5.4. However, the variability has reduced drastically. This is based on samples of size 25. The question remains how to find this standard deviation. I have indicated the value 0.04, and we will learn how to find it given that the population from which the sample has been drawn has a standard deviation of 0.2. In fact, let me just mention for the curious mind that sigma of x-bar is in general the sigma of the population from which we have drawn the sample, divided by the square root of the sample size, which gives 0.2 by root 25, that is, 0.04. So we see that the variability in the sampling distribution of x-bar goes on reducing as the sample size n increases. Now let me ask this question. Suppose I do not know the population mean; let us forget that I know it to be 5.4. Based on a sample, I have the sample mean as 5.45. What can you say about the population mean when the only information you have is that the sample mean is 5.45? Well, I would like to say that I can be quite confident that the population mean is within 5.45 minus twice the standard deviation of the sampling distribution of x-bar and 5.45 plus twice the standard deviation of the sampling distribution of x-bar, that is, between 5.37 and 5.53. So I make this statement: based on the sample mean of 5.45, I can be quite confident in saying that the population mean is between 5.37 and 5.53.
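The simulation experiment described above can be sketched in a few lines. This is a minimal illustration using the Python standard library, not the experiment actually run for the lecture; the seed and the function names are arbitrary. It draws 300 samples of size 25 from a normal population with mean 5.4 and standard deviation 0.2, summarises the 300 sample means, and computes the interval of two standard deviations of x-bar around a sample mean of 5.45.

```python
import math
import random
import statistics

POP_MEAN = 5.4      # population mean height from the lecture
POP_SD = 0.2        # population standard deviation from the lecture
SAMPLE_SIZE = 25
NUM_SAMPLES = 300


def sample_means(seed: int = 0) -> list[float]:
    """Draw NUM_SAMPLES random samples of SAMPLE_SIZE and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(NUM_SAMPLES):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
        means.append(statistics.fmean(sample))
    return means


def two_sd_interval(sample_mean: float) -> tuple[float, float]:
    """Sample mean plus or minus two standard deviations of x-bar."""
    se = POP_SD / math.sqrt(SAMPLE_SIZE)   # sigma / sqrt(n) = 0.2 / 5 = 0.04
    return sample_mean - 2 * se, sample_mean + 2 * se


if __name__ == "__main__":
    means = sample_means()
    print(f"mean of the sample means: {statistics.fmean(means):.3f}")
    print(f"SD of the sample means:   {statistics.stdev(means):.3f}")
    low, high = two_sd_interval(5.45)
    print(f"interval around 5.45:     ({low:.2f}, {high:.2f})")
```

The mean of the simulated sample means should come out close to 5.4 and their standard deviation close to 0.04, and the interval reproduces the limits 5.37 and 5.53 quoted in the lecture.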
The only thing that now needs explaining is what I mean by "quite". Can anybody tell me how much "quite confident" is in terms of probability? Look at the sampling distribution curve here. You know that its SD is 0.04 and that the sampling distribution of x-bar is normal. Can you indicate how confident I am in making this statement? Recall the empirical rule: within how many SD of the mean do 95 percent of the values lie? Within 2 SD, and that is what I have done; I have considered the 2 SD limits about the sample mean. The chance that the sample mean lies within 2 SD of the true population mean is about 0.95, and so the chance that the interval of 2 SD on either side of the sample mean covers the true population mean is also about 0.95. So here, based on the distributional property of the sample mean, I can make the statement that I am 95 percent confident that the true population mean is between 5.37 and 5.53, based on the sample mean of 5.45. This is what we will be studying further. This was just an illustration, to motivate how we can exploit the distributional properties of the sampling distribution of x-bar in order to make statements with a stated level of confidence. So I will stop here today. Any questions? If not, we can call it a day. Thank you.