So, let us get started with today's session of the workshop on research methodologies. I am going to continue yesterday's discussion of random variables and extend it today with a discussion of hypothesis testing.

Before I start talking about hypothesis testing, let me respond to some of the feedback that I saw on Moodle yesterday. The purpose of the introductory talks I am giving you on the essence of statistics is not to set you up with a tutorial on how to deal with the data sets that you have. At this point, all I am doing is elaborating on various issues that you need to be aware of and clarifying basic concepts that you need to take into an analysis of your own problem. I am deliberately staying away from taking up data sets and, tutorial style, working out various measures, for instance the measures of centrality we discussed yesterday. All of that will be available in a later context. One of the things Professor Fatak mentioned to you early on was that we do not end our engagement with you with just a set of talks. Throughout the rest of the year, until presumably we have a second workshop following this one, we will remain in touch via Moodle, and via Moodle I will elaborate on some of the technical aspects that some of you are raising, for example relating to the specifics of applying a particular test relative to the hypothesis you are testing. So, to recap: all I wish to do is point out the importance of properly setting up a hypothesis test; I do not wish to get into the intricacies of evaluating very specific problems from a numerical standpoint. It is important to get the concepts straight first in an introductory workshop, and we will follow this up later with a more elaborate discussion offline using Moodle.

Now, to recap what we did yesterday in terms of random variables. Remember what we were trying to do: we observed that when we do our experiments, we start off with some physical model which we wish to prove or disprove, and that physical model has in it a collection of parameters. For example, when we talked about identifying a distribution of heights, the most relevant parameter was the population mean mu. We then realized that to get an estimate of the population mean mu you need to work with a sample, and to work with a sample you need to start collecting individual measurements. The merit of collecting individual measurements was that any individual's height gave you an estimate of mu, the true global population average: the expected value of a measurement was going to be mu. But we then spent some time discussing why it was more relevant to look at a collection of measurements together rather than at individual measurements. That comes about because when you look at an individual measurement you do learn about mu, but the variance associated with that measurement is fixed at sigma squared. The moment you start looking at the sample mean, the sample mean still tells you about mu, but the massive advantage of working with an arithmetic mean of a set of samples is that its variance is now a function of the sample size. If you control the sample size, you can get more and more refined estimates of the population parameter.
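A minimal simulation sketch of this last point (the population values below are made-up illustrations, not the lecture's data): an individual measurement has variance sigma squared, while the mean of n measurements has variance sigma squared over n.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 170.0, 10.0   # made-up population mean and spread of heights

for n in [1, 10, 100]:
    # 10,000 repetitions of "collect n measurements and average them"
    xbars = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
    print(f"n={n:4d}  observed var(x_bar)={xbars.var():7.2f}  theory sigma^2/n={sigma**2 / n:7.2f}")
```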
In other words, at this point you already have a strong incentive to keep sampling more and more, averaging your measurements, and that starts giving you a truly refined estimate of whatever it is you are trying to find. I graphically pointed this out to you as a task where you look for a point estimate, in this case a point estimate of the population mean mu, which we decide is x bar. We then said that we wish to find an interval around this point estimate, because we know that our measurements are themselves random variables, and anything computed from a collection of random variables is in turn a random variable. Therefore x bar is forced to be a random variable, which in turn means it could change from day to day as you do your experiments. So x bar not only has to be talked about in the context of a point estimate; you also need to know what range of values you might reasonably expect to find x bar within, and that is our interval estimate. It really boils down to asking: if you are making a statement about mu, is the x bar that you expect to see close enough to mu within a certain tolerance, the tolerance here being defined by the parameter delta? This was true for heights, but it was also true, if you remember, for that acceleration due to gravity example I kept going back to: the hypothesis that the value of g is 9.8, and the question of whether a set of experiments would allow you to confirm that it was 9.8 meters per second squared.

Before we get into the actual intricacies of a hypothesis test, I want to step back and look at how the entire scientific method of performing a hypothesis test came about. The history of the scientific method goes back to Karl Popper and what is now called critical rationalism; if you want a quick summary, you can go to Wikipedia and look up both Karl Popper and critical rationalism. The insight he comes up with is that it is much easier to prove a statement false than it is to prove something true, and in fact this reflects almost every single domain of our existence. Essentially the reasoning goes like this: if you want to prove something true, you probably have to do a large number of experiments, look through all possible scenarios, and then finally come to the conclusion that something is true basically because you have not seen a contradiction to whatever you are proposing. On the other hand, all it takes is one contradiction to your theory to prove a notion false, and that shoots down the original hypothesis. For example, if I start with the hypothesis that all crows are black, all it takes is for me to see one white crow, and that shoots down the hypothesis. So it comes back to this observation: it is easier to do a set of experiments until you find a contradiction, as opposed to doing a very large number of experiments looking for a contradiction, not finding one, and only then concluding that a notion is true. This, in a sense, is rooted in practicality: we want to make a decision about a hypothesis with the least number of experiments that we can possibly undertake. It turns out that several people have thought this way.
Karl Popper himself writes that the criterion of the scientific status of a theory is its falsifiability: can you prove something false, can you refute it, can you test it? And Einstein is thought to have said, though there is no definitive evidence that he actually said it, that no amount of experimentation could ever prove him right, but a single experiment could prove him wrong.

It is with this mindset that we go about setting up a hypothesis, in whichever domain and on whichever problem you are working. There is a statement that you will make about the physical model you are working with, about a parameter in that model, and the essence of what we are doing now is asking: can we set up experiments where we try to prove wrong what you believe about that parameter? Fundamentally, then, we wish to design an experiment where the emphasis is on proving the hypothesis wrong.

This is actually quite easy to see in a couple of domains. The first is the legal system. What happens in our legal system? People are presumed innocent until proven guilty. So who has the burden of proof? Inherently, the prosecution in any trial has the burden of proving guilt beyond the shadow of a doubt; the defense, the accused, does not have to prove innocence. And both parts of this are important: prove guilt, and shadow of a doubt; we will come back to this in the context of hypothesis testing a little later. If the prosecution does not prove guilt beyond the shadow of a doubt, then the accused must walk away free. Now notice something: when the accused walks away free, that does not mean the accused is innocent, in the sense that the crime has not been committed. It just means that not enough evidence has been demonstrated to prove that the accused is guilty. In other words, once again, it is easier to show that there is not enough evidence to prove a hypothesis. The accused walks away from the trial, but it is possible, of course, that the accused is actually guilty and has escaped without being caught. To some extent, the hypothesis testing that we do reflects this: it is possible that occasionally we will get the conclusion of a hypothesis test wrong, but we will follow a procedure where we test a statement, and once in a while we will get it wrong.

The second example where the scientific approach follows this aspect of falsification is in biology, in the production and testing of pharmaceuticals. Think of a clinical trial. Let us look at the situation where somebody has discovered a new molecule in their laboratory and claims that it is a good new drug against cancer. Ideally, you test it out on a group of people who have cancer and prove that the state of the disease, the state of the patient, improves.
So it boils down to comparing a set of people who have taken this drug, and whose health is presumably improving, against a set of people who are not taking this drug, and whose health hopefully is not improving. That, to the pharmaceutical company, in turn proves that the drug works and that they should go ahead and manufacture it. And by the way, this is big business: each new drug that is put out, particularly one of biological origin, costs on the order of a billion dollars to research and put out, basically because it subsidizes the cost of failures of all the other candidate molecules. It is therefore extremely important for a pharma company to do all the statistical tests and establish the hypothesis that the molecule they are about to formulate, produce, and sell is actually an active ingredient which can genuinely diminish the effects of a disease.

Now, the clinical trial is set up in a very curious way. Of course, you have got to find volunteers for this, and typically people whose health is in very bad shape come forward to participate. You need a reasonable number of people participating in the trial; we saw that yesterday in the context of needing a large sample size if you wish to converge to the truth about something. So with a clinical trial, you take this active ingredient that you have, you give it to a set of people, and you want to be able to say that this molecule actually does what you claim to have seen in the lab; in other words, that it reduces the symptoms associated with the disease. The catch is that you have got to prove that this molecule does better than anything else out there. It is not enough to simply say that this molecule, when given to a patient, reduces the effects of the disease. It is also possible that the very act of a doctor putting on a white coat and giving a tablet to a patient might itself make the patient feel a whole lot better about the state of his health, and that might be the true underlying cause of why he feels better. So if you reason that there are other variables which might make a patient's health improve, and not the drug itself, then in performing a hypothesis test you have got to control for the effect of those other variables. The other variables here are the fairly sophisticated setting in which the doctor deals with the patient: invariably the doctor wears a white lab coat, which makes the patient feel better; there are plenty of nurses around, which makes the patient feel better; and then, of course, there is the elaborate ritual of handing over a set of tablets, which, as I said, automatically makes most of us feel that our concerns have been addressed and that we are going to be cured in a short while. This is well recognized in the medical literature, and what is in fact done is that while the active ingredient is given to one set of people, another set of people must receive, at the same time, from the same doctors and nurses dressed in the same way, an identical looking set of tablets; but these tablets, rather than containing the active ingredient, are just sugar pills. These, by the way, are called placebos.
So you have a situation with two groups of people, and you can see the makings of a very elaborate experiment here. One group gets the active ingredient; the other group goes without the active ingredient and gets the sugar pills instead; everything else about the entire activity is identical: the same sets of doctors, the same sets of nurses, everything. The two groups of people are dealt with over a period of time, each receiving either sugar pills or the active molecule, and then you look at the patients from the two groups after an interval of time to see whether their health has improved. The whole motivation, of course, is that the active ingredient hopefully improves somebody's health, and as a pharma company you hope the result turns out to be positive with respect to the active ingredient.

But notice a very subtle thing. You are not out to prove that the active ingredient works on each and every patient you can find; all you are out to prove is that the active ingredient you are giving out does better than a sugar pill. So we set up a very simple hypothesis, which we then essentially try to prove false: that the two groups of people who are getting pills will effectively behave the same way, that the end result of taking the tablets is the same. That, of course, is not what you want to happen; what you want is for the active molecule to do better. But by not setting the bar for the active molecule high, and in fact doing the reverse and setting the bar low, what you end up asking is: does the active molecule do the same as a sugar pill? And hopefully the answer comes back that it does not. In which case, of course, the interpretation is that it does better than the sugar pill, and therefore it is indeed truly active, in which case you start your manufacturing process and market your drug. But technically, look at what has happened: you have ended up saying that the molecule is better than a sugar pill, or at least that it is not equivalent to a sugar pill. There is a difference between saying that and saying that the molecule you have just manufactured and tested is the best molecule out there for curing this particular disease. So we are essentially testing the activity of a molecule by falsifying the hypothesis that the molecule does nothing, which is not necessarily an intuitive thing to do; but on the other hand it is an easy thing to do, because all it takes is a small group of people, and you prove that the active molecule does better than a sugar pill for that small group.

That is the entire philosophy of what we are going to talk about; I am going to demonstrate it using a series of graphs in various contexts, and we will ask what influences this kind of methodology. All of this is in the context of setting up a hypothesis where the focus is on falsifying it. Now, what is the alternative to this?
If we did not set something up to prove it wrong, then we would be working inductively, reasoning further and further, looking at more and more evidence, and trying to prove that something actually works. This is, in fact, how traditional medicine in India has come about: there is lots of empirical evidence that certain herbs, for instance, improve your health, and we keep taking them. It is more a matter of induction: if you take this kind of herb, your health improves in that way, and you keep going like this. There is no systematic test where you compare against a control group; all you have is a series of experiences, and from the learning that you get, you induce that the preparation you are taking actually improves your health. The problem with this is that it takes a lot of effort, and therefore a lot of accumulated experience, before you finally come to the realization that some herbs are good for you. That is what Ayurveda, for instance, is for us: there is lots and lots of historical evidence that some herbs matter, but that is not the kind of insight which came to us quickly; it took generations before we could conclude that some herbs were good for specific diseases.

So I repeat something I put up yesterday when we were talking about statistics, because it is even more critical now in the context of hypothesis tests: as we carry out a hypothesis test, we are not asking whether the thing being tested is scientifically significant or not; all we are going to focus on is whether the procedure for proving significance is fair or not. Again, I do not know which domain your problem is in, and I do not know what parameter you want to evaluate and hypothesize about; all we are going to do is assume that once you have set up your problem and identified what needs to be tested, you then ask how to go about testing the statistical significance of a claim.

So go back to this business that we have a point and an interval estimate for anything we wish to comment about. A hypothesis is a comment being made about a population parameter in a physical model. It is important to understand the sequence of events in all of this, and this is actually a very important point, because most of us are guilty of not doing it in this order: before you even perform an experiment, collect your measurements, and try to infer something from them, you must have known what physical model you are going to evaluate, and you must have identified what parameter in that model you are going to test. The parameter in the model is a population parameter, and what we mean by that is that it cannot change as a function of the experiment, it cannot change as a function of the sample. Yesterday, when we talked about the heights of people, we talked about how that alien coming down from Mars was required to figure out the average height of a human being and relay that back to Mars. The alien has to report a population parameter; the alien is not expected to report an average height that changes as a function of the particular sample of heights that he or she sees.
So remember that whenever we make a comment about a hypothesis, that comment is always about a population parameter in a model; but for that comment to be made, we are going to have to collect measurements, and the comment will ultimately arise by comparing a sample measurement against a claimed value of a population parameter. If I go back to acceleration due to gravity: if somebody comes up with a theory saying that the acceleration due to gravity is 9.8 meters per second squared, that is a hypothesis. There is a physical model describing gravity, in that model there is a parameter g, and that parameter, through some reasoning, is being assigned the value 9.8 meters per second squared. My role in all of this is to ask: will my experiments and my data turn out to be consistent with 9.8 meters per second squared?

So it boils down to asking what happens when our measurements are taken and we compute an x bar. Hopefully you can see that on the x axis of this plot, what I am plotting is not my individual measurements x but x bar itself. Again, remember the entire focus on sampling yesterday: once we realized that working with the sample gives us a much better way to converge to mu, we stopped looking at individual measurements; we always look at the arithmetic mean of a collection of measurements. But the arithmetic mean itself could change depending on the collection of samples inside it; the arithmetic mean is itself a random variable. So what this distribution in the plot is telling you is that there is a range of values the arithmetic mean could take, depending on which individual measurements you collect, and therefore the arithmetic mean could vary somewhere around the true mean mu. x bar could be anywhere around the true mean mu, and it is our job to figure out in what interval x bar might reasonably lie such that the proposal being made about the population parameter is accepted. The hypothesis about the value of a population parameter will be accepted if x bar lies in some interval around mu. Ideally, of course, x bar equals mu, but it is very unlikely that our measurements will be so precise that we get the true value. Therefore, what we ultimately propose is that x bar should be near mu within some tolerance interval, and it boils down to figuring out what this tolerance interval is, and, for that matter, what the area outside this tolerance interval is; that area, if you look at my notation, is alpha by 2 on both sides of the interval. So one of the things we will have to figure out going ahead is what this alpha is, what the interpretation of alpha is, and how you ultimately control its effect.

Just looking at this sketch: these thresholds at mu minus delta and mu plus delta work like goal posts. If our measurements fall within the goal posts, then we end up saying that whatever is being claimed about the population parameter is fine; we have to accept the claim. But if our measurements fall outside, then there is reason for us not to believe the statement about the population parameter.
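A hedged sketch of this decision rule for the gravity example (the measurements and the tolerance delta below are invented; how delta is actually chosen is exactly what comes later):

```python
import numpy as np

mu0 = 9.8                 # claimed population value of g
delta = 0.05              # an assumed tolerance, for illustration only
sample = np.array([9.86, 9.76, 9.83, 9.79, 9.88])   # invented measurements

x_bar = sample.mean()
inside = (mu0 - delta) <= x_bar <= (mu0 + delta)
print(f"x_bar = {x_bar:.3f}, goal posts = [{mu0 - delta:.2f}, {mu0 + delta:.2f}]")
print("claim accepted" if inside else "claim doubted")
```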
Let me give you one more example before we go ahead: go back to that coin toss experiment we had yesterday. We had 100 tosses of a coin. We said the coin was a fair coin, in which case we should have seen 50 heads; so mu should be 50 if it is a fair coin. But we said that if you actually perform the experiment, you are unlikely to see precisely 50: you might see 45, you might see 55, and you might even be extremely unlucky and have the unusual occurrence of 20 heads in 100 tosses happen to you. And now you can see an issue arising: if you were to see 20 heads out of 100, what is your conclusion about that coin? If the hypothesis at the start of the whole experiment is that your coin is a fair coin, and you are now seeing 20 heads out of 100, do you accept that the coin is a fair coin, or do you instead go over to the conclusion that the coin is a biased coin? You now have to make a choice between two conflicting hypotheses. To decide between them, you need to set yourself a goal post, and this is precisely what we are doing in this sketch: we have set ourselves goal posts defined by this parameter delta, and equally by the area outside the goal posts. The value of delta will in turn depend on alpha; we are not getting into the algebra of all of this, but graphically, just appreciate that if I make alpha small, I am moving my goal posts outward, wider and wider apart; that means delta is becoming larger and larger, which means there is more and more probability of finding x bar within this interval. So it comes down to the goal posts, and these goal posts are critical to us in accepting or not accepting the result of a test.

The procedure, therefore, is going to be: find yourself a model; find yourself a parameter in the model you wish to make a comment about; propose for that parameter a value that you think holds in a population sense. We are still not looking at samples, we are still not performing an experiment, we are not looking at individual measurements yet; before any experiment is done, you do all of this, and you get your goal posts defined, and to define a goal post you either choose delta or you tell us what alpha is. Only once you have done this do we get down to the business of collecting measurements, computing x bar from our measurements, and asking: does x bar lie within the interval or outside it? If x bar lies within the interval, the conclusion is that H0 may be accepted. If it lies outside the interval, then it implies that the hypothesis we are talking about may be incorrect, and the preferred interpretation is that the hypothesis is wrong.

So it boils down to a statement about the population parameter. I just called that statement H0; the formal term for it is the null hypothesis, and it is critical to realize that the statement being made has nothing to do with the samples you are collecting: it is about a population value. This is a common mistake most people make. The statement could be about any parameter in any model. If you are looking at a distribution of heights, you may be interested in the mean value of the height or in the variance of the height. If you are looking at a linear model, for example between force and mass, maybe you are trying to make a comment about whether g equals 9.8 meters per second squared or not; that is your null hypothesis.
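Returning to the coin example, here is a sketch of these goal posts in code (under H0 the number of heads in 100 tosses is Binomial(100, 0.5); alpha = 0.05 is one conventional choice, discussed below):

```python
from scipy import stats

n, p0, alpha = 100, 0.5, 0.05
# Goal posts under H0: the central interval holding 1 - alpha of the probability.
lo, hi = stats.binom.ppf([alpha / 2, 1 - alpha / 2], n, p0)
print(f"goal posts under H0: [{lo:.0f}, {hi:.0f}] heads")   # roughly [40, 60]

for heads in [55, 20]:
    verdict = "inside: accept H0" if lo <= heads <= hi else "outside: reject H0"
    print(f"{heads} heads -> {verdict}")
```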
So, depending on whatever physical model you are studying, identify the population parameter in it, make a comment about what value it ought to take, and then ask whether there is a way to collect measurements to make a comment about it. To test this hypothesis about a population parameter, there are two parts to the test. One is to compute a test statistic; and what is a statistic? To recap, anything computed from sample data is called a statistic. So to make a comment about a hypothesis, we need the population parameter, we need a statistic which we in turn compute from a collection of sample data, and we also need to know the distribution on which we are going to place this statistic. In other words, we need to know the shape of the distribution, we need to know where the goal posts are on this distribution, and then we need to know where our sample statistic ends up falling relative to the goal posts: are we on the inside or the outside of this interval?

Invariably, the doubt is how you formulate the test hypotheses. The null hypothesis H0 is the hypothesis that we wish to test, and if you go back to what I said about the scientific method, what we want to do with the null hypothesis is not to prove it right but to prove it wrong. Invariably, we are out to prove the null hypothesis wrong. For example, with the clinical trial of a drug, the null hypothesis was that the active molecule is equivalent to a sugar pill; we want to prove that null hypothesis wrong, and by proving it wrong we indirectly say that the active molecule is better than the sugar pill. Even with our legal system, the null hypothesis is what we want to prove wrong: we want to prove that somebody is guilty, but to prove them guilty you have got to take the effort of collecting the evidence and showing beyond reasonable doubt that they are guilty; and by default, if we cannot prove them guilty, they walk away as not guilty, which brings us to the point I made before. If the claim you are trying to establish is that X is guilty and you cannot prove it, that does not mean that X is innocent; all you have ended up showing is that X is not guilty, and there is a difference between X is not guilty and X is innocent. That subtle difference is almost like saying: look, I have not been able to prove your guilt, but if more data comes my way, if more evidence comes my way, I might be able to change my mind and prove that you are indeed guilty. So, for example, if the active molecule you worked with turns out not to be too different from a sugar pill, then you have not been able to differentiate between the two; but technically all you have shown is that the active molecule looks reasonably like a sugar pill, and that does not mean that more data cannot come along which allows you to differentiate the active molecule from the sugar pill after all. It is a slightly subtle thing: when we set up a statement H0, we want to prove it wrong, and when we fail to prove H0 wrong, we do not go ahead and say H0 is right; all we say is that H0 was not shown to be wrong, and there is a big difference between saying H0 is not shown to be wrong and saying that H0 is right.
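To make the two-part recipe concrete before we turn to the alternate statement, here is a minimal sketch for the g = 9.8 example (the sample values are invented; strictly, with an estimated standard deviation and a small sample one would use the t distribution, but the normal is used here to match the development so far):

```python
import numpy as np
from scipy import stats

mu0 = 9.8                                                  # H0: g = 9.8
sample = np.array([9.79, 9.82, 9.85, 9.78, 9.81, 9.83])    # invented data

# Part one: the test statistic, a standardized distance of x_bar from mu0.
z = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(sample.size))

# Part two: the distribution and its goal posts (alpha = 0.05 here).
z_crit = stats.norm.ppf(0.975)
print(f"z = {z:.2f}, goal posts = +/-{z_crit:.2f}")
print("reject H0" if abs(z) > z_crit else "cannot reject H0")
```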
Now, the moment you come up with one statement, H0, which is what you ultimately want to test, the question comes up: what is the alternate statement? If you are talking about guilt, the alternate statement is about innocence; that is straightforward. But in engineering problems it is not always straightforward what the alternate statement should be. For example, suppose I am making a comment about some model parameter theta, and the comment, according to my calculations, is that theta should take on the value theta naught; for example, the acceleration due to gravity should take on the value 9.8. That statement has many possible alternatives. The most obvious alternative, and in fact the direct complementary statement to H0, is that theta is not equal to theta naught, which allows theta to take on any value other than 9.8. But you might want to be more restrictive and say, for example, that theta is less than theta naught: if you have some reason to believe that the true value of the parameter you are testing may be less than what is being claimed, you might propose that as the alternate statement. Conversely, you might go to the other side: you may have some belief that theta is greater than theta naught. For example, if theta measures the activity of a drug in your clinical trial, and a high theta is a good thing as far as the drug is concerned, then what you really want is theta greater than theta naught, where theta naught is the value for your sugar pill; indirectly you are saying that if H1 is right, then your active molecule is better than a sugar pill. That might be the form of the alternate statement you are trying to establish. And again, notice the subtle thing: the direct statement the pharma company could have taken up to prove is H1, theta greater than theta naught, active molecule better than sugar pill; but they are not doing that. Instead they take on the question of whether the molecule equals a sugar pill, and that is H0, and they try to disprove H0 rather than trying to prove H1. A final form of the alternate hypothesis is when you say theta takes on a specific value theta 1 which is not theta naught. For example, somebody says: I do not believe that the acceleration due to gravity is 9.8; I believe it is exactly 9.6. Now you are setting up a fight between 9.8 and 9.6 in terms of the two statements that you have.

Notice that H0 has an equality in its statement, and that is a very important feature in the design of hypotheses. So why do we need an equality? When I say equality, I could also put greater than or equal to, or less than or equal to, but at the least I need an equality in an H0 statement. It turns out that the moment you say something takes on precisely a certain value, we have for ourselves a distribution we can work with. For example, the moment the hypothesis about heights is that the average human height is 3 feet, I now have for myself a normal distribution of heights centered around mu naught, which is 3 feet.
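As a sketch of how these forms of H1 appear in practice, scipy's one-sample t-test exposes them through its `alternative` argument (available in scipy 1.6 and later); the data below are invented:

```python
import numpy as np
from scipy import stats

sample = np.array([9.79, 9.82, 9.85, 9.78, 9.81, 9.83])   # invented data
mu0 = 9.8

# "two-sided": theta != theta0; "less": theta < theta0; "greater": theta > theta0
for alt in ["two-sided", "less", "greater"]:
    res = stats.ttest_1samp(sample, popmean=mu0, alternative=alt)
    print(f"H1 = {alt:>9}: p-value = {res.pvalue:.3f}")
```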
So what is the problem with not having an equality? If you tell me that the average height is not equal to 3 feet and you set that up as a null hypothesis, then I have nowhere to place my curve, because remember this curve has to be centered around a mean, and what you have just said is that you do not know what the mean value is; in fact, all you are telling me is that you do not agree with the mean value I propose of 3 feet. So this curve slides to the left or to the right and does not find a central value to sit around, and the net result is that if you do not know which central value the curve sits around, you do not know where the goal posts are for your test, the goal posts being the interval within which you will accept the null hypothesis. It is critical for us to set up a pair of goal posts which do not shift, and the only way to do that is to say that at the center of the goal posts is mu naught, and the only way that happens is if the null hypothesis you set up is mu equals mu naught. In other words, you are forced into a position where the statement of your test is mu equals mu naught, gravity equals 9.8 meters per second squared, and in a very narrow sense all you can now test is: is the value 9.8 meters per second squared reasonable or not? Your hypothesis test is focused around that unique, narrow claim. What your hypothesis test does not help you do is simultaneously evaluate a statement about whether gravity could be 9.7 instead of 9.8, because the moment you talk of an alternate value 9.7, that is a central value in its own right, and your curve has to shift, in this case to the left, to be centered around 9.7.

So, to recap: a hypothesis, as implemented by a null hypothesis statement, has always got to contain an equality, and the reason is that you have got to be able to anchor a distribution around the value you are proposing for your parameter. That in turn allows you to fix the goal posts, and the goal posts in turn allow you to decide whether your measurements are consistent with the claim or not, depending on whether they fall inside the goal posts or outside.

Before we go ahead, you can see that the major task in a hypothesis test now involves figuring out where the goal posts are, and we talked briefly yesterday about how each variable has its own units. If you are talking about a distribution of heights and making a claim about the average height of a person, then the goal posts are possibly in units of feet or inches. If instead you are talking about weights as a variable, you have a different set of goal posts, and it becomes problematic to keep asking what the probability is within the goal posts under each different probability distribution. We said that there is a convenient way to avoid constantly computing probabilities under different curves, and that was the standard normal distribution. How did we get the standard normal distribution? Take for yourself a variable: in this case, individual measurements x follow a normal distribution N(mu, sigma squared); and, by the way, remember that x bar is also a random variable, following a normal distribution with mean mu as well, but with variance sigma squared over n.
So we have two types of distribution to look at, and again, remember that from now on we only look at x bar; we forget about individual measurements. In either case, how do you take this and make it free of units? You need to transform and normalize, or standardize: take the variable, subtract from it its mean value as you propose it, in other words mu, and divide by the standard deviation sigma, that is, z = (x - mu) / sigma. The net result is a transformed variable which follows what is called an N(0, 1) distribution: the mean of z turns out to be 0, because you have shifted all your values relative to the mean, and the variance turns out to be 1, because you have divided by sigma and rescaled all your data. This is convenient because, if you transform your data before you get into a test, all you need to do is look up probabilities under the standard normal distribution, which you will find in an appendix of practically any statistics textbook, including the ones I listed yesterday.

Since finding these intervals on the standard normal is important, and these goal posts will obviously end up determining the outcome of a test, you need to go to the probability tables, so let me introduce a new piece of notation. Previously I defined my intervals using the parameter delta: we said the goal posts are going to be a distance delta away from the mean on either side. But now, when I standardize, my mean becomes 0, and the goal posts are better defined in terms of the area outside them. Since I already gave the area outside the goal posts the notation alpha, the notation for the goal post itself is now in terms of the variable z, written with the subscript alpha by 2, where the subscript reflects how much area exists to the right of that particular goal post.
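A small sketch of this standardization lookup, with scipy's quantile function standing in for the textbook appendix:

```python
from scipy import stats

alpha = 0.05
z_half = stats.norm.ppf(1 - alpha / 2)   # z_(alpha/2): area alpha/2 lies to its right
print(f"z_(alpha/2) = {z_half:.2f}")     # ~1.96 for alpha = 0.05

# Sanity check: the total area outside the goal posts +/- z_(alpha/2) is alpha.
print(f"area outside goal posts = {2 * (1 - stats.norm.cdf(z_half)):.3f}")
```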
This alpha becomes more and more important to us, so what is alpha? Let us start with the formal definition, which I leave for those of you who have done some basic statistics courses. The formal definition is in terms of what the error in our estimate of the mean could be. We are trying to find mu; we have x bar as our point estimate of mu; we know x bar is variable, so our estimate of mu could be off, and x bar could vary from day to day on us. Therefore we need to figure out what the error in our estimate could be, and we have to talk in the language of probability. If you do the algebra, you realize that the error in our estimate depends on our choice of the goal posts, therefore on alpha; it depends on the variation we are seeing in the data, that is sigma; and it depends on the number of samples we have, that is n. This maximum error is less than the term I show you in the box, z_(alpha/2) sigma / sqrt(n), with probability 1 - alpha. So alpha is a probability, 1 - alpha is also a probability, and the interpretation of 1 - alpha as a probability is more critical to us than the mathematical definition in this formula, so let us jump to that straight away.

What is alpha? Step back for a second and realize that when we test H0 versus H1, there are actually 4 possible outcomes. This is very easy to see with a diagnostic, so again I am going to use an example from biology. Think of a diagnostic kit, say a kit to test whether somebody has tuberculosis. What could that kit end up doing for you? The kit could say that you have TB when you truly have TB; but once in a while the kit could get things wrong and say that you do not have TB when in fact you do; conversely, when you do not have TB, the kit could wrongly accuse you of having TB; and finally there is the scenario where you do not have TB and the kit agrees that you do not. So there are 4 possible outcomes to a test, in terms of the outcome for H0 and the truth behind what is going on. If I quickly list all of these: we accept H0 as true when in fact it is true; we accept H0 as true when it is untrue, which of course means H1 is true; we reject H0 when in fact H0 is true; and we reject H0 when H0 is untrue, that is, H1 is true. Out of these, which ones do you want the diagnostic kit to get right? Obviously, you want to correctly identify whether a person has the disease or does not have the disease; in other words, what you want to minimize are the mistakes in classifying a patient.

It turns out alpha has an interpretation in the context of misclassification, so let me show you this as a truth table. Across the top we list the truth of the matter; on the left, the decision that could be taken, in this case the result of the diagnostic kit. Our 4 outcomes are written in matrix or tabular form. Look in particular at what happens with the lower left term: H0 is true but H1 is accepted; that is the pink term in the matrix, the pink area in the plot. What is the interpretation of this? It is possible that the hypothesis in our test is true, but we decide that we should instead go with the other conclusion.
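A hedged simulation sketch of this 2x2 truth table for a hypothetical diagnostic kit; the prevalence, sensitivity, and specificity below are assumed numbers, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(3)
n_people = 100_000
prevalence, sensitivity, specificity = 0.01, 0.99, 0.95   # assumed numbers

has_tb = rng.random(n_people) < prevalence
# Kit result: mostly right for the sick, occasionally a false alarm for the healthy.
says_tb = np.where(has_tb,
                   rng.random(n_people) < sensitivity,
                   rng.random(n_people) < 1 - specificity)

label = {True: "TB", False: "no TB"}
for decision in (True, False):          # rows: the kit's decision
    for truth in (True, False):         # columns: the truth
        count = np.sum((says_tb == decision) & (has_tb == truth))
        print(f"kit says {label[decision]:>5}, truth {label[truth]:>5}: {count:6d}")
```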
Let me immediately go back to the coin toss example before we go ahead. Remember we tossed a coin 100 times. The null hypothesis H0 is that the coin is a fair coin, in which case what we are testing is that the number of heads we get is 50: according to H0, if the coin is fair, I should be seeing 50 heads. Now we do the experiment and we find that we get 15 heads out of 100 tosses. What should I then do? Should I call 15 a very unlucky instance of an event where it is actually a fair coin being tossed, in which case I should be accepting H0? In that case I should be plotting my 15 relative to this curve, because remember, this curve is plotted the moment we say H0 is true: if H0 is true, the mean value is mu0, which is why my curve is centered around mu0. Here mu0 is 50, and relative to 50 I see 15, which means I am at the extreme left of this curve. And if I am at the extreme left of this curve, the question comes up: should I accept H0 despite it being such an extreme result, or should I go with the conclusion that H0 is wrong and assume that the coin is therefore biased and not fair?

So you can see that once in a while, as we perform our experiments, we run the risk of a phenomenon actually being fine, but because the measurements we have seen are so extreme, 15 out of 100 tosses, it becomes more compelling for us to reject the null hypothesis and go with some other hypothesis. In other words, this is an error of a certain type: we commit an error because the truth was that H0 was correct, but because we happened, by sheer bad luck, to see an extreme measurement, we start doubting the truth of H0 and find it easier to go with H1; technically, we have committed an error. So some fraction of the time we commit an error about H0, and that turns out to be the interpretation of alpha. Alpha, as it is known in the statistics literature, reflects the significance level of the test we are performing, and the error is called a type 1 error: H0 is true but H1 is accepted. Of course, if you want a good diagnostic kit, you do not want to be committing these mistakes about whether somebody has a disease or not, so it boils down to keeping alpha as small as you can, because alpha reflects the likelihood of making this kind of mistake in the outcome of a test.

Again, remember that we are trying to disprove an equality, and that the equality statement carries the entire burden of proof in the hypothesis test. Going back to the legal system, think of it this way: the legal system prefers that we try to disprove H0, that X is innocent. Notice that we are not setting up X is guilty as H0; instead we set up X is guilty as H1, which in turn means H0 is X is innocent, and what we then go about doing is trying to disprove innocence. If I now connect what goes on in the legal system with what we have just said about falsifying a hypothesis: the accused is innocent, and that is our H0, until proven guilty, and that is our H1, and this must happen beyond reasonable doubt, which is alpha. Alpha is the probability of error, that we wrongly convict somebody despite them being truly innocent. The moment you do this, the question comes up: what is a good value for alpha?
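Before answering, a hedged simulation sketch of alpha operating as exactly this wrong-rejection rate, reusing the fair-coin setup from before; all numbers are simulated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p0, alpha = 100, 0.5, 0.05
lo, hi = stats.binom.ppf([alpha / 2, 1 - alpha / 2], n, p0)   # goal posts under H0

heads = rng.binomial(n, p0, size=100_000)      # many repeated experiments, H0 true
wrong = np.mean((heads < lo) | (heads > hi))   # fraction of type 1 errors
# The discreteness of the binomial makes this land a little below alpha.
print(f"fraction wrongly rejected ~ {wrong:.3f} (alpha = {alpha})")
```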
Intuitively, alpha should be small if you want to minimize errors, but what is a good value for it? The convention in most domains is to set alpha to 0.05, which of course means that 95 percent of your measurements should fall within the goal posts. But it varies from domain to domain: in some domains people go with 0.01, and in manufacturing, and for that matter in some areas of biology, it is set to 0.001; in other words, you want a really, really small chance of committing an error. If you decrease alpha, you reduce your chances of committing that type of error where H0 was inherently true but you preferred to go with H1.

Now, if there is an alpha, there must also be a beta, and beta is the other type of error: H1 is true, not H0, and instead you decide to go with H0. Rather than show this to you in terms of equations, there is an easier way to see it. We will assume that the null hypothesis is mu equals mu0, as before, but this time I will assume that H1, the other hypothesis, is mu equals mu1. The reason I assume mu equals mu1 is that it allows me to sketch an alternate distribution, this time centered around mu1; and for ease of understanding I will also assume that mu1 is greater than mu0, though what I say is perfectly true even if mu1 is less than mu0.

If I state a null hypothesis H0, that immediately allows me to sketch a distribution according to H0, centered around the value mu0, because that is what the whole null hypothesis is about: we are making the statement mu equals mu0. So at this point a parameter has been identified to have a particular value, I have identified a distribution that could explain the observations I see under that particular parameter value, and on this distribution I have put down a pair of goal posts, defined by alpha. The interpretation so far is: if I get measurements within the goal posts, then they are consistent with the claim H0; at the very least, they do not disagree with it. It is a subtle thing; remember that nobody is truly being proven innocent: what is being shown is that not enough evidence of guilt has been demonstrated. So technically, although I write in the slide that H0 is okay, another way of wording it is that we have not been able to prove H0 wrong, and in fact that is a slightly superior way of wording things. After all, the way we set up that drug trial, we really wanted H1 to win and the molecule to turn out to be a good molecule; we did not really want to focus on H0 being false, but we end up taking on H0 being false because that is the easier experiment to do.

Now I plot H1 on top of this. H1 is a statement about mu1, not mu0. Compare the two; I will go back and forth and you will see what I am talking about. Initially we make a statement about mu0, that is H0; then, in my alternate statement, I make a comment about mu1, and I said mu1 is greater than mu0. That means that if H1 is right, my observations should follow a different distribution, centered around mu1. But remember, my goal posts were defined not by looking at H1: they were defined by looking at H0, in other words based on the choice of alpha. So my goal posts already exist,
and now, on my plot, given the goal posts from H0, if I superimpose the curve for H1, I quickly realize the following: based on the location of my goal posts, I will only go with H1 if my observations fall outside the goal posts, because if my observations fall within the goal posts, I have already told you that I go with H0. Therefore the area shaded blue is a situation where H1 could be right, but my measurements fall within the goal posts, and because I have already decided that within the goal posts I prefer H0, I choose not to go with H1 despite the fact that H1 is right. The net result is that I am committing an error: an error where H1 is right but I choose to go with H0, within the interval I am showing you.

So we actually have two types of errors to worry about, and what is a bit of a problem is that because I made up my mind to control the value of alpha first, my goal posts got defined for me, and because the goal posts are fixed, I might now see a very large value for beta; that is what I am showing you in this cartoon, a large value for beta. It is like saying the diagnostic kit for tuberculosis correctly identifies people with the disease, say, 99 percent of the time: 99 percent of the people who actually have TB are correctly identified as having TB, and it is only 1 percent, that alpha, who are not diagnosed as carrying the disease. But there was another type of error there: somebody who does not have the disease should be diagnosed as not having the disease, yet the kit is making so many mistakes that it is unfortunately and unnecessarily calling many healthy people tuberculosis positive. So we have two types of errors, and in trying to keep one error small, there always exists the possibility that you are creating a large error of the other kind.
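Finally, a hedged sketch of this alpha-beta trade-off, with invented values for mu0, mu1, sigma, and n: the goal posts come from H0 and alpha, and beta is then the probability that data generated under H1 still lands inside them.

```python
import numpy as np
from scipy import stats

mu0, mu1, sigma, n, alpha = 50.0, 53.0, 10.0, 25, 0.05   # invented values
se = sigma / np.sqrt(n)                 # standard deviation of x_bar

z = stats.norm.ppf(1 - alpha / 2)
lo, hi = mu0 - z * se, mu0 + z * se     # goal posts fixed by H0 and alpha

# Beta: H1 is true (mean is mu1), yet x_bar still falls inside the H0 goal posts.
beta = stats.norm.cdf(hi, loc=mu1, scale=se) - stats.norm.cdf(lo, loc=mu1, scale=se)
print(f"goal posts: [{lo:.2f}, {hi:.2f}]")
print(f"beta (type 2 error) = {beta:.3f}, power = {1 - beta:.3f}")
```

Shrinking alpha pushes the goal posts outward and, with everything else fixed, makes beta larger; in this sketch the only ways to reduce both errors at once are a larger sample size n or a bigger separation between mu0 and mu1.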