So we were doing inferential statistics, and you will remember that in inferential statistics we talked about two major objectives or goals: one was estimation and the other was hypothesis testing. As we discussed in previous lectures, in inferential statistics we are interested in studying a sample, drawing inferences about the population, and estimating the population parameters. In the majority of cases it is not really possible to study the entire population. So what we do is take a small sample from the population, study it, calculate sample statistics, make certain assumptions about the underlying parent population, and then test those assumptions. To test them, we use different statistical tests, like the Z test, t test, or F test. We have already worked with the Z statistic, back when we did probability distributions and Z scores, and we saw how we can test an assumption using Z scores. We use the Z statistic when we know the population parameters, that is, the population mean and the population variance, and when the underlying population is normally distributed. I am sure you remember the formula: Z equals the sample mean minus mu, the population mean, divided by the standard error, where the standard error is sigma over the square root of n; that is, Z = (x̄ − μ) / (σ / √n). This means that to use the Z test we need to know mu, the mean of the population, and we need to know sigma. If we know these two things, we can calculate the Z statistic and compare it with the table value: if the calculated value is greater than the table value, we reject the null hypothesis. Similarly, we also use the t statistic.
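The one-sample Z test described above can be sketched in a few lines of Python. This is a minimal illustration, not from the lecture; the sample values, the hypothesised population mean and sigma, and the 1.96 critical value (two-tailed, alpha = 0.05) are all assumptions made for the example.

```python
import math

def z_statistic(sample, mu, sigma):
    """Z = (x_bar - mu) / (sigma / sqrt(n)); needs the population mean and sigma."""
    n = len(sample)
    x_bar = sum(sample) / n
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu) / standard_error

# Hypothetical data: testing whether this sample could come from a
# population with known mean 100 and known sigma 15.
sample = [108, 112, 96, 104, 110, 99, 107, 103, 111]
z = z_statistic(sample, mu=100, sigma=15)

# Compare the calculated value with the two-tailed critical value at alpha = 0.05;
# reject the null hypothesis only if |z| exceeds it.
critical = 1.96
print(round(z, 3), abs(z) > critical)
```

Here the calculated Z is smaller than the table value, so with these (made-up) numbers we would fail to reject the null hypothesis.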
The t statistic is a little different from Z. It uses essentially the same formula, but the biggest difference is that with the t statistic the variance of the population is unknown. In fact, almost every time we draw a sample and want to test a hypothesis, we do not know the variance or standard deviation of the underlying population. So what we do is estimate the population parameters, sigma and mu, from the sample statistics, and then we use the t test. In the t statistic we again take the sample mean minus mu, but when we divide by the standard error, instead of sigma we use the standard deviation of the sample, because we do not know the population's sigma. When we use the sample standard deviation to estimate the population's standard deviation, we go for an unbiased estimate: we use the concept of n minus 1. What is n minus 1? It is the degrees of freedom, and I will explain it shortly. You remember that to calculate the standard deviation the formula was the sum of squared deviations, Σ(x − x̄)², divided by n; but when we are using it in the t statistic, we divide by n minus 1 instead. And you remember that s is the standard deviation, whereas s² is the variance; the square of the standard deviation is the variance. So we calculate the sample standard deviation with that formula and plug the values into the t test, just as we said: t = (x̄ − μ) / (s / √n).
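The t statistic with s computed using n minus 1 can be sketched the same way. The data and the hypothesised mean below are made up for the example; note that Python's `statistics.stdev` already divides by n − 1, which is exactly the unbiased estimate described above.

```python
import math
import statistics

def t_statistic(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # divides by n - 1: the unbiased estimate
    return (x_bar - mu) / (s / math.sqrt(n))

# Hypothetical sample, tested against a hypothesised population mean of 10.
sample = [12, 9, 11, 14, 10, 13, 8, 12]
print(round(t_statistic(sample, mu=10), 3))
```

The calculated t would then be compared with the critical value from the t table at the chosen alpha level and n − 1 degrees of freedom.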
You can write the denominator as s over the square root of n, or equivalently as the square root of the sample variance, s², divided by n; both are the same thing, and both ways you will get the same answer. So that is how we calculate the t statistic. It is used to test hypotheses about an unknown population mean when the value of sigma is unknown. Students often ask me when to use t and when to use Z. Usually we say Z is a large-sample statistic and t is a small-sample statistic, because the t distribution's shape is flatter when the n size is smaller. But if you are doing your dissertation and collecting data, even if you have collected 100 or 200 data points, you will still be using the t test and not the Z test. Why? Because always remember: we go to t because sigma is unknown, because we do not know the standard deviation of the population. Even with a large sample of 100 or 200, we will still use the t test to test our hypotheses and assumptions. Now, the degrees of freedom. I said that when we use the standard deviation of the sample instead of the population, we use n minus 1, that is, unbiased estimation. N minus 1 is the degrees of freedom. The textbook definition is the number of scores in a sample that are independent and free to vary. Let me give you a simple example to explain what "free to vary" means.
Suppose we know that the mean of a sample of n = 3 scores has to come out to 4, which means the three scores must add up to 12. If I choose all three scores freely, say 4, 3, and 2, then the mean is 4 plus 3, 7, plus 2, 9, and 9 divided by 3 is 3, not 4. So to hit the required mean, I can choose only two scores freely, say 4 and 3; the third cell is not free, it is forced to be 12 minus 7, that is, 5. That is what keeping one cell "not free" means: the sample mean places a restriction on the value of one score in the sample, so there are n minus 1 degrees of freedom for a sample with n scores. Whatever your n size is, the degrees of freedom is n minus 1, that is, you keep one cell restricted. And dividing by n minus 1 instead of n is what makes the sample variance an unbiased estimate of the population variance. Now, the t distribution. It is the complete set of t values, computed for every possible random sample of a specific sample size, that is, for specific degrees of freedom. We studied the z distribution and said that you can find the value of z at any point and find the area under the curve; z could be 0.05 or 1.333. The t distribution works the same way: you can find any value in it. The t distribution approximates the shape of a normal distribution as n increases. I have shown you this: when the n size is very small, variability is high and the t distribution's shape is flatter, but as the n size increases, it approximates the normal distribution. That is why it is said that when n reaches about 30 or more, the t distribution approximates the normal curve.
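The "one restricted cell" idea above can be shown directly in code: fix the sample mean, choose n − 1 scores freely, and the last score is forced. The numbers mirror the example from the lecture (required mean 4, free choices 4 and 3); the function name is just for illustration.

```python
def forced_last_score(free_scores, required_mean):
    """With the sample mean fixed, only n - 1 scores are free to vary;
    the last score is determined by that restriction."""
    n = len(free_scores) + 1
    required_total = required_mean * n      # the scores must sum to mean * n
    return required_total - sum(free_scores)

# Required mean 4 with n = 3: the total must be 12.
# Freely choosing 4 and 3 forces the third score to be 5.
print(forced_last_score([4, 3], required_mean=4))

# Choosing all three freely (4, 3, 2) gives mean 3, missing the required 4.
print((4 + 3 + 2) / 3)
```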
But with a small sample size, that is, small degrees of freedom, the shape of the t distribution will always be a little flatter. The t distribution has more variability: as the n size decreases, the variability increases, and as the n size increases, the scores become tighter and the variability decreases. Similarly, we find proportions and probabilities just as we did with z scores. Remember that for any score, we placed it on the normal distribution, marked the area, and found the probability of a score occurring in that particular area. For the t distribution there is a table, and we compare the critical value in that table with our calculated value. This is an example of a t table; you can find it at the end of any statistics book, just like the z table. It lists one-tailed and two-tailed probabilities: you select your alpha level and read off the critical value against your degrees of freedom. We will do the t test manually and solve a few numericals in the next lecture.
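The claim that the t distribution is flatter for small degrees of freedom and approaches the normal curve as n grows can be checked numerically. This sketch evaluates the height of each density at its centre, using the standard closed-form expressions implemented with the standard library rather than a stats package; the choice of df values is arbitrary.

```python
import math

def t_pdf_at_zero(df):
    """Height of the t density at t = 0: gamma((df+1)/2) / (sqrt(df*pi) * gamma(df/2))."""
    return math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

# Height of the standard normal density at z = 0: 1 / sqrt(2*pi), about 0.3989.
normal_peak = 1 / math.sqrt(2 * math.pi)

# For small df the t curve is lower at the centre (flatter, with heavier tails);
# as df grows, its peak climbs toward the normal's.
for df in (1, 5, 30, 100):
    print(df, round(t_pdf_at_zero(df), 4))
print(round(normal_peak, 4))
```

With df = 1 the peak is 1/π ≈ 0.318, well below the normal's 0.399; by df = 30 the two curves are already close, which matches the rule of thumb in the lecture.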