Now, let us say you have a random sample x₁, …, xₙ drawn from a Gaussian N(μ, σ²). If you look at the quantity (x̄ − μ)/√(σ²/n), the claim is that it is Gaussian with mean 0 and variance 1, that is, standard normal. Everybody agree with this? How to verify it? Again, just go and compute the moment generating function of this quantity and you will see it directly. One quick sanity check on the mean: E[(x̄ − μ)/√(σ²/n)] = (1/√(σ²/n))·E[x̄ − μ], because √(σ²/n) is a constant and can be pulled out of the expectation, and since E[x̄] = μ this is 0. In a similar way you can compute the variance: Var((x̄ − μ)/√(σ²/n)) = (1/(σ²/n))·Var(x̄ − μ) = (1/(σ²/n))·Var(x̄), and since Var(x̄) = σ²/n, this turns out to be exactly 1. Now, let us first simplify and break down the problem: instead of assuming both μ and σ² are unknown, let us assume σ² is known, and only the mean μ is unknown. If that is the case, can I quantify how large the difference x̄ − μ is? Say I want to ask whether this difference exceeds some threshold. Instead of working with x̄ − μ directly, I will consider the event x̄ − μ > ε·σ/√n, where σ/√n = √(σ²/n). Now, can I compute the probability that x̄ − μ > ε·σ/√n?
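The standardization above can be checked by simulation. A minimal sketch: draw many samples of size n from N(μ, σ²), form Z = (x̄ − μ)/√(σ²/n) for each, and confirm that its mean is near 0 and its variance near 1. The values of mu, sigma, n, and trials are illustrative choices, not from the lecture.

```python
# Monte Carlo check that Z = (x̄ − μ)/sqrt(σ²/n) is approximately N(0, 1)
# when the underlying samples come from N(μ, σ²).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 5.0, 2.0, 30, 200_000

samples = rng.normal(mu, sigma, size=(trials, n))   # each row: one sample of size n
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))

# z should have mean ≈ 0 and variance ≈ 1
print(z.mean(), z.var())
```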
Ok, what is the distribution of x̄ − μ? Normal, with mean 0 and variance σ²/n. Everybody agree? Now, I cannot directly apply the Φ function (the standard normal cdf) here; I need a couple more steps. What are those steps? Wait a minute: what I really want is for the difference to lie outside a band on either side, that is, |x̄ − μ| > ε·σ/√n. This splits into two probabilities: P(x̄ − μ > ε·σ/√n) + P(x̄ − μ < −ε·σ/√n), one for when the difference is positive and one for when it is negative. Now I can simplify the first term as 1 − P(x̄ − μ ≤ ε·σ/√n). But notice: before applying the Φ function I must make sure the quantity inside is standard normal, and x̄ − μ is not. So rather than that detour, I should have kept the standardized form throughout: write the event in terms of (x̄ − μ)/(σ/√n), so the probability becomes [1 − P((x̄ − μ)/(σ/√n) ≤ ε)] + P((x̄ − μ)/(σ/√n) ≤ −ε). Now the quantity inside is standard normal, so this is exactly (1 − Φ(ε)) + Φ(−ε). Notice that all of this I could do only because σ² is known.
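The two-sided formula just derived, (1 − Φ(ε)) + Φ(−ε), which by symmetry equals 2(1 − Φ(ε)), can be compared against an empirical frequency. A sketch, with illustrative values of mu, sigma, n, and eps:

```python
# Check P(|x̄ − μ| > ε·σ/√n) = 1 − Φ(ε) + Φ(−ε) against simulation.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma, n, eps, trials = 0.0, 1.5, 25, 1.0, 200_000

closed_form = (1 - norm.cdf(eps)) + norm.cdf(-eps)   # = 2·(1 − Φ(ε))

samples = rng.normal(mu, sigma, size=(trials, n))
dev = np.abs(samples.mean(axis=1) - mu)
empirical = np.mean(dev > eps * sigma / np.sqrt(n))  # fraction of samples outside the band
```

For ε = 1 the closed form is about 0.317, the familiar "outside one standard deviation" probability.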
Now I am able to quantify the difference, but there is one small caveat: the way I set it up, I asked for x̄ − μ to exceed ε·σ/√n. In practice the threshold can be anything given to you: suppose you are asked for the probability that |x̄ − μ| > γ. My method works if I set γ = ε·σ/√n, so the actual ε is γ√n/σ. So if I want the difference to exceed γ, I should substitute ε = γ√n/σ, and the answer becomes (1 − Φ(γ√n/σ)) + Φ(−γ√n/σ). Is that clear, or did you want me to do more calculations? Ok. But unfortunately, many times we may not even know σ²; both the mean and the variance could be unknown, and in that case I cannot apply this directly. Then what is the method for that? For that we have a method proposed by a famous statistician, William Sealy Gosset, in the early 1900s (he published it in 1908 under the pseudonym "Student"). Statistics is a very old subject, and a lot of its development happened in the early 1900s, so many of these results are quite old. But if you look at them now, most of these results have been repackaged, and you see them as machine learning methods: when you do machine learning you are basically doing this, taking data and trying to extract information from it. Now, when σ² is known, you could do the above: you get a normal distribution and that helps you find the error. But when you do not know σ², you have to find some proxy for it. What is a good proxy for σ²? The sample variance.
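The substitution ε = γ√n/σ can be wrapped up as a small helper. The function name `tail_prob` is my own label, not from the lecture:

```python
# P(|x̄ − μ| > γ) for the sample mean of n iid N(μ, σ²) draws, σ known.
import numpy as np
from scipy.stats import norm

def tail_prob(gamma, sigma, n):
    """Two-sided tail probability via ε = γ·√n/σ and 2(1 − Φ(ε))."""
    eps = gamma * np.sqrt(n) / sigma
    return 2 * (1 - norm.cdf(eps))
```

For example, with σ = 1 and n = 4, asking for a deviation of γ = 1 gives ε = 2, so the probability is 2(1 − Φ(2)) ≈ 0.0455.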
So that is what I will do: instead of σ² I will put in the sample variance. And let me not make any mistake with the square roots here: the denominator should be √(σ²/n) = σ/√n everywhere. Everybody agree? Now σ² is replaced by its proxy s², and under the square root the denominator becomes simply s/√n. So I am saying: I do not know σ², let me take its sample estimator and plug it in. Now look at the resulting quantity: x̄ is a random variable and s is a random variable, so the whole quantity (x̄ − μ)/(s/√n) is a random variable. When the samples come from a Gaussian population N(μ, σ²), this quantity has a distribution called the Student t distribution with n − 1 degrees of freedom. What is that distribution? Let us try to find out the distribution of this quantity and how the t distribution looks. We will do some manipulation: the quantity of interest is (x̄ − μ)/(s/√n), and what I will do is divide and multiply by σ/√n. Notice that I do not know the actual value of σ, but it is a fixed constant, so let us say I can do this. Now I am going to look at the result as a ratio of two random variables. What is the purpose of separating numerator and denominator like this? If you look, the numerator is a quantity I already know: (x̄ − μ)/(σ/√n), which has a Gaussian distribution, and the denominator involves s²/σ².
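The claim that T = (x̄ − μ)/(s/√n) follows a t distribution with n − 1 degrees of freedom can be checked empirically before deriving it. A sketch comparing one tail probability of the simulated statistic against scipy's t distribution; n and the threshold 2.0 are illustrative:

```python
# Empirical check: (x̄ − μ)/(s/√n) behaves like Student's t with n−1 dof.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(2)
mu, sigma, n, trials = 10.0, 3.0, 10, 200_000

samples = rng.normal(mu, sigma, size=(trials, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)          # sample std with the n−1 divisor
T = (xbar - mu) / (s / np.sqrt(n))

empirical = np.mean(T > 2.0)
theoretical = t.sf(2.0, df=n - 1)        # P(T_{n−1} > 2)
```

Note that the tail probability is noticeably larger than the normal value 1 − Φ(2) ≈ 0.023: replacing σ by the noisy estimate s fattens the tails.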
Suppose I multiply s²/σ² by n − 1: we just discussed that (n − 1)s²/σ² has a chi-square distribution with n − 1 degrees of freedom, right? And also notice, we just discussed that for Gaussian samples the numerator and the denominator are independent of each other. So the numerator is the centered and normalized sample mean, the denominator, once multiplied by n − 1, is chi-square distributed, and these two are independent. You have to be a little careful here: what we actually showed is that x̄ and s² are independent, but now we are saying that U = (x̄ − μ)/(σ/√n) and V = (n − 1)s²/σ² are independent, and that needs a little bit of thinking. We know x̄ and s² are independent, but here x̄ has had the constant μ subtracted and been divided by the constant σ/√n, while s² has been multiplied by the constant (n − 1)/σ². So both quantities have been manipulated with constants. Does adding or multiplying by constants change their independence? No, right. That is why we can do this: σ is unknown, but in this computation it is just a constant. Notice that I have written exactly σ here, not an estimator of it; it is that constant. Now, in this representation the numerator is simply U, the denominator is simply √(V/(n − 1)), and the two are independent of each other. This is what helps us compute the distribution of the ratio, which we are calling the t distribution.
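Both claims in this step can be checked numerically: V = (n − 1)s²/σ² should have the chi-square moments (mean n − 1, variance 2(n − 1)), and U and V should be uncorrelated, as independence requires. A simulation sketch with illustrative parameters:

```python
# Check the chi-square moments of V = (n−1)s²/σ² and the lack of
# correlation between U = (x̄ − μ)/(σ/√n) and V.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, trials = 0.0, 2.0, 8, 200_000

samples = rng.normal(mu, sigma, size=(trials, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

U = (xbar - mu) / (sigma / np.sqrt(n))
V = (n - 1) * s2 / sigma**2

corr = np.corrcoef(U, V)[0, 1]   # should be ≈ 0 under independence
```

Zero correlation alone does not prove independence, of course, but it is the cheap necessary condition to check; the full independence of x̄ and s² is the Gaussian-specific fact quoted in the lecture.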
Now, using this property, let us call our quantity of interest Z: Z = (x̄ − μ)/(s/√n), which we are now able to write as Z = U/√(V/(n − 1)). So I can treat Z as a function of the two random variables U and V, and we know how to compute distributions of functions of random variables, right? If you recall the methods we studied earlier, where we used the Jacobian to find joint distributions of transformations, we can use that method here. If you do that (I am jumping the steps), you will end up finding that Z has the t distribution with p = n − 1 degrees of freedom, denoted T_p, with pdf f(x) = Γ((p+1)/2)/(√(pπ)·Γ(p/2)) · (1 + x²/p)^(−(p+1)/2), for all x from −∞ to +∞. So henceforth my notation is: when I write T_p, it is a t distribution with p degrees of freedom. As a special case, when p = 1 this ends up in the simple form 1/(π(1 + x²)), which has a special name: the Cauchy distribution. Now, you may be wondering how this distribution magically comes up. As I said, it comes from computing the joint distribution.
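The pdf above can be implemented directly from the formula and checked against scipy's t distribution, including the p = 1 Cauchy special case:

```python
# The T_p density f(x) = Γ((p+1)/2) / (√(pπ)·Γ(p/2)) · (1 + x²/p)^(−(p+1)/2),
# compared against scipy and against the p = 1 Cauchy form 1/(π(1 + x²)).
import numpy as np
from math import gamma, sqrt, pi
from scipy.stats import t

def t_pdf(x, p):
    """pdf of the Student t distribution with p degrees of freedom."""
    c = gamma((p + 1) / 2) / (sqrt(p * pi) * gamma(p / 2))
    return c * (1 + x**2 / p) ** (-(p + 1) / 2)

xs = np.linspace(-4, 4, 9)
```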
So, let us take U and V. We know they are independent, and we know their distributions: U is standard normal N(0, 1), and V is chi-square distributed with n − 1 degrees of freedom. Since U and V are independent, I can write their joint pdf simply as the product of the marginals. Now define two new random variables, x = U/√(V/(n − 1)) and an auxiliary variable y, find the joint distribution of (x, y) in terms of the joint distribution of U and V using the Jacobian method we did earlier, and then, after you find the joint distribution, find the marginal of x. That will give you exactly the t density. Again, this needs to be verified; that is why I have marked it "you need to check." I will post the book where these computations are worked out, but you should go and verify it yourself. It is not really difficult; it is just that I do not want to sit here and work out all the details, and all the steps involved you already know.
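The construction can also be done directly from its ingredients: draw an independent standard normal U and a chi-square V, form Z = U/√(V/(n − 1)), and check a tail probability against the t distribution. A sketch with an illustrative n:

```python
# Build the t statistic from its definition as a ratio of a standard
# normal and a scaled chi-square, then check one tail probability.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(4)
n, trials = 6, 200_000

U = rng.standard_normal(trials)           # numerator: N(0, 1)
V = rng.chisquare(n - 1, size=trials)     # denominator: χ² with n−1 dof, independent of U

Z = U / np.sqrt(V / (n - 1))

empirical = np.mean(np.abs(Z) > 2.0)
theoretical = 2 * t.sf(2.0, df=n - 1)     # P(|T_{n−1}| > 2)
```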