Now, let us move on to the t-test. As the name suggests, t distributions are involved here: any statistical hypothesis test in which the test statistic follows a Student's t distribution under the null hypothesis is called a t-test. The most common use of the t-test is to check whether the means of two populations differ. Notice that in the previous example, when we computed p-values, we were dealing with one sample from a single population, x_1, x_2, and so on. Now we may be dealing with two populations: I will superscript the first set of samples with 1, writing x_1^(1), ..., x_{n_1}^(1), to indicate that they correspond to population one, and the second set with 2, writing x_1^(2), ..., x_{n_2}^(2). The numbers of samples, n_1 and n_2, could differ. The question I want to check here is whether the two sets come from populations with the same mean. Suppose the first population has distribution f(·; μ_1) and the second f(·; μ_2). My null hypothesis is μ_1 = μ_2, and my alternative hypothesis could be μ_1 ≠ μ_2. This is the common usage where the t-test comes into the picture, but it is not necessary to deal with two populations when using t-tests; they can be used on a single population also. The most frequent versions are the one-sample and two-sample tests. In the one-sample location test, we have just one set of samples, and the null hypothesis is that the population parameter θ equals a specified value μ; the alternative hypothesis could be θ ≠ μ. In the two-sample location test, we have two sets of samples, and we want to check whether the two populations have the same mean or whether they differ. These two-sample tests are often referred to as unpaired or independent-sample t-tests, and we will see them shortly. To apply t-tests we make some assumptions, which hold by default for Gaussian samples, but by making these assumptions explicit we can say something more general. The test statistics of interest will in general have the form Z/s — a little notational change, but I will continue to denote the statistic this way — where Z and s are functions of the data, so Z and s are themselves random variables. Now let us first look into the one-sample t-test, where the numerator Z is simply a centered version of the sample mean. Notice that in the one-sample test I am basically testing the hypothesis of whether my parameter θ equals μ or not; let us take the one-sample two-sided test.
So, here the numerator is the sample mean, centered by subtracting the parameter μ_0 specified by the null hypothesis, and the denominator is s, your estimate of the standard deviation. In this one-sample test we are going to assume that x̄ follows a normal distribution with mean μ and variance σ²/n. Recall that x̄ is the estimate corresponding to the null-hypothesis parameter: you are claiming that x̄ gives a good representation of μ, and since x̄ is an average, it has variance σ²/n. We also assume that s², the unbiased estimate of the variance, is such that (n − 1)s²/σ² follows a chi-square distribution with n − 1 degrees of freedom, and that Z and s are independent. Notice that even though we state these as assumptions, when the random sample comes from a Gaussian distribution with parameters μ and σ, these assumptions hold naturally, as we have already seen when we talked about sampling and studied the properties of random samples. In the two-sample test, where we have two sets of random samples, we assume that the sample means of the two populations to be compared each follow a normal distribution, that the samples come from population distributions having the same variance, and that the samples are drawn independently. Again, all of this holds naturally when the samples are drawn from Gaussians: if one set comes from a Gaussian with parameters μ_1 and σ² and the other from a Gaussian with parameters μ_2 and σ², these properties hold. Now let us see what the statistic for computing the p-value would be, focusing first on the one-sample test. Here I am basically testing the hypothesis of whether the mean is μ_0 or not; taking the two-sided case, I can use the statistic t = (x̄ − μ_0) / (s / √n). Notice that earlier the denominator had σ; now I have replaced it by s because in my case σ is also unknown — both the mean and the variance are unknown. If we knew the variance, we could have gone with the z-test, which we already did before. Now recall that by our assumptions x̄ is normally distributed and (n − 1)s²/σ² is chi-square distributed with n − 1 degrees of freedom; the resulting ratio is distributed as a Student's t distribution with n − 1 degrees of freedom, which we denote t_{n−1}. With this I can again go back and compute my p-value: the probability that the statistic is at least as extreme as the observed value. Since the statistic is t-distributed with n − 1 degrees of freedom, from the t distribution table, for any observed value computed from the data, I can readily find this probability, compare it against a given significance level, and decide whether my claim is statistically significant or not.
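To make this concrete, here is a minimal sketch in Python of the one-sample two-sided t-test just described. The data values and the null mean μ_0 = 5.0 are made up purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (made-up numbers); mu0 is the mean claimed by H0.
x = np.array([5.1, 4.8, 5.4, 5.0, 4.7, 5.3, 5.2, 4.9])
mu0 = 5.0
n = len(x)

x_bar = x.mean()           # sample mean
s = x.std(ddof=1)          # ddof=1 so that s^2 is the unbiased variance estimate
t_stat = (x_bar - mu0) / (s / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(t_stat, p_value)

# Cross-check against SciPy's built-in one-sample t-test.
print(stats.ttest_1samp(x, popmean=mu0))
```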
I hope it is now clear how the t distribution came into the picture here: because we do not know the variance, we used the unbiased estimator of the variance — or rather, the corresponding estimator of the standard deviation — and once we do that, we know that this statistic follows a t distribution, so we can use the properties of t distributions. So the one-sample test case is, I hope, clear. Now, how do we check this in the two-sample case? Recall that for the two-sample case I am going to assume the variances are the same. I am also going to start with the case where the two samples have the same number of samples, n_1 = n_2 = n; they could differ, and we come to that shortly. In this case I am interested in the hypothesis of whether the two means are equal or not, and one can argue, after a little manipulation, that the statistic can be taken as t = (x̄_1 − x̄_2) / (s_p √(2/n)), where x̄_1 as usual is the sample mean from the first population and x̄_2 is the sample mean from the second population. Here s_p is the pooled unbiased estimate of the standard deviation, with s_p² = (s_1² + s_2²)/2, where s_1² is the unbiased estimate of the variance from sample 1 and s_2² is the unbiased estimate of the variance from sample 2. Notice that since we have assumed the variances are the same, you could actually combine the two samples — in this case ending up with 2n samples — to get one value for your estimate of the common variance. Even though I have written it as if s_1² is computed separately from the n samples of the first population and s_2² from the second, since the populations share a common variance and all the samples are independent of each other, you can just use all 2n samples to get one estimate of the standard deviation. One can show — it is actually straightforward to observe — that this statistic has a t distribution with 2n − 2 degrees of freedom, where n is the common sample size. Again, once you have this, for a given observed value you can compute the p-value and compare it against the given significance level to decide whether or not to reject the null hypothesis in favour of the alternative. This can also be extended to the case where the sample sizes are not equal, n_1 ≠ n_2, but still under the equal-variance assumption. Here one can show that the relevant statistic is again the difference of the sample means divided by the estimated standard deviation, where the pooled estimate can now be computed as s_p² = ((n_1 − 1)s_1² + (n_2 − 1)s_2²) / (n_1 + n_2 − 2), so that the statistic is t = (x̄_1 − x̄_2) / (s_p √(1/n_1 + 1/n_2)). I am leaving out the calculations, but you can verify that the estimate of the standard deviation can indeed be given in this way. One can see that (n_1 + n_2 − 2)s_p²/σ² in the denominator is chi-square distributed with n_1 + n_2 − 2 degrees of freedom, and one can then argue that the statistic has a t distribution with n_1 + n_2 − 2 degrees of freedom. Once you know it is a t distribution with a certain number of degrees of freedom, you can again go and compute your p-value, compare it against your significance level, and decide whether or not to reject the null hypothesis in favour of the alternative.
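Here is a minimal Python sketch of the two-sample pooled t-test under the equal-variance assumption. The two samples are made-up numbers, and the sizes are deliberately unequal so that the general pooled formula is exercised.

```python
import numpy as np
from scipy import stats

# Hypothetical samples (made-up numbers) from two populations assumed
# to share a common variance; the sample sizes are deliberately unequal.
x1 = np.array([5.1, 4.8, 5.4, 5.0, 4.7, 5.3])
x2 = np.array([4.6, 4.9, 4.5, 5.0, 4.4, 4.8, 4.7])
n1, n2 = len(x1), len(x2)

# Pooled variance estimate: ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2).
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
t_stat = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom.
p_value = 2 * stats.t.sf(abs(t_stat), df=n1 + n2 - 2)
print(t_stat, p_value)

# SciPy's pooled test (equal_var=True) should agree with the manual result.
print(stats.ttest_ind(x1, x2, equal_var=True))
```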
So whenever our test statistic follows a t distribution, we can use all of this. But it may happen that the test statistic is not t-distributed: you may end up in a situation where your statistical test involves a test statistic that has an F distribution under the null hypothesis, and you may have to use the properties of that distribution to compute your p-value. We will not go into the F distribution in detail here; I just want to give you an idea of how it can arise. Suppose you now have more than two populations: samples x_1^(1), ..., x_n^(1), then x_1^(2), ..., x_n^(2), and then x_1^(3), ..., x_n^(3), with distributions parameterized by μ_1, μ_2, and μ_3 respectively. You may be interested in testing the hypothesis of whether all these values are equal, or whether μ_i = μ_j for some pair (i, j) — and you can go on like this with m such populations. To answer such questions you have to construct a certain statistic, and after some analysis one ends up with a statistic that in fact follows an F distribution; this is exactly the case in analysis of variance. In analysis of variance you are interested in precisely this question — whether the underlying population parameters are all the same or whether they differ — and when you construct a suitable test statistic so that you are able to say something about your claim, that statistic has an F distribution. We will not get into the details here, but that is something you can keep in mind and study further. The F distribution also arises in regression models. For example — I hope many of you already know linear regression — linear regression says that, given a data point x as your input, we try to find a relation between x and y, and this relation happens through a parameter θ, plus some noise: y = θᵀx + noise. Given x, we want to find the best y; initially θ is unknown, and given a set of observations (y_1, x_1), (y_2, x_2), and so on up to n observations, we want to find the best estimate or representation of θ. To test whether what you have done is good, you need a statistic, and when you construct that statistic, the F distribution arises. To say a little more about analysis of variance: one-way ANOVA, also referred to as one-factor ANOVA, is a parametric test used to test for a statistically significant difference of an outcome between three or more groups.
So, here we are interested in the case of three or more groups; when there are fewer than three, we already know how to run a significance test using p-values computed from the t distribution. Here we would be interested in checking the claim that at least one of the groups is statistically significantly different from the others. Now, the name ANOVA stands for analysis of variance, but the test is not actually about the variance itself; it is an analysis of the variance in the means. For example, as in the previous example, your null hypothesis is that all the parameters are the same, and your alternative hypothesis is that they differ. The variance in the means is what the ANOVA test is trying to identify: whether the parameters — say the mean values of all these distributions — are the same or not is what we take as the null hypothesis and try to validate. Note that if, when you do the ANOVA test, the p-value happens to be statistically significant, you still cannot tell which group is the different one; and if it is not significant, possibly they are all the same — you do not have enough evidence to say that they differ. This is briefly what we already talked about: suppose we have samples x^(1), x^(2), and x^(3), as written here; these are the independent variables that define the groups to be compared. Say these are the grades of three batches of students, and I want to validate the hypothesis that their average scores are the same; we can use the ANOVA test here, and this is where the F test arises. As I said, it can also come up in regression models, where you observe data of the form y = θᵀx + noise. You may observe such data for several groups: say (y_11, x_11), (y_12, x_12), and so on for one population; (y_21, x_21), (y_22, x_22), ..., (y_2n, x_2n) for a second; and (y_31, x_31), (y_32, x_32), and so on for a third. These are three batches of data you have observed, where the y's depend on the x's through this linear relation, and now you want to check whether the three parameters associated with the groups through this linear relation — θ_1, θ_2, and θ_3 — are the same or not. Then you would again use an ANOVA-style test, where the F test or F distribution arises, and you can again calculate your p-value using the F distribution tables.
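As a minimal sketch of the grades example above, here is how the one-way ANOVA could be run in Python. The three groups of scores are made-up numbers; scipy.stats.f_oneway computes the F statistic and its p-value under the null hypothesis that all group means are equal.

```python
import numpy as np
from scipy import stats

# Hypothetical grades for three batches of students (made-up numbers).
g1 = np.array([72.0, 68.0, 75.0, 70.0, 74.0])
g2 = np.array([71.0, 69.0, 73.0, 72.0, 70.0])
g3 = np.array([65.0, 63.0, 68.0, 66.0, 64.0])

# One-way ANOVA: H0 says all three group means are equal; under H0 the
# test statistic follows an F distribution.
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f_stat, p_value)

# A significant p-value says at least one group mean differs, but the
# test by itself does not tell us which group is the different one.
```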
So, with this I hope you got a summary of what a p-value is and of the z-test, which one can use when we know the variance and are dealing with one population. Then you got some exposure to the t-test, where we do not know the variance and want to test the population mean — the parameter — for a single sample or for two samples. We also talked about the case of more than two samples, where we want to check whether they have the same parameters or not, either in the independent case or the dependent case, and how the F distribution can help us compute the p-value and decide the significance of our statistical test. So, with this we will stop here. Thank you.