In this lecture I want to talk to you about the very important topic of hypotheses. First of all, let's start by importing what we need. From IPython.core.display, a submodule of the IPython module, I'm going to import HTML, which is what I require here for my CSS file. I'm also going to import Image, just so that I can display images on the web page.

So, hypotheses. How do you do good research? The best research, I think, comes from having a burning question. You might be dealing with some tests in the laboratory, or you might be dealing with patients, and something starts nagging at you. You wonder why things turn out the way they do. You might be treating hypertensive patients and have a question because there seems to be a difference in some variable between those who smoke and those who don't. You might be dealing with some intervention and wonder, is this really successful? For that you would do a randomized trial. So you have to have this burning question: something nags at you and you want the answer. That forms a good basis for research.

Now, where do we go from there? The first thing that you're going to do is set two hypotheses. Once you've got this burning question, and you know what variables and what groups you want to compare to answer it, you've got to set your hypotheses. It might seem such a trivial thing, but it is absolutely important. I'll show you at the end that, based on how you set your hypotheses, you can get different p-values, and you might make a mistake, which might be an innocent mistake but might be more serious than that. So what are these hypotheses?
Now, there are a lot of sentences there, lots of boring words. Let me tell you what they are. You set two hypotheses. One is going to be called your null hypothesis, and the second one is going to be your test or alternate hypothesis. So, I reiterate: you're going to have this burning question, you're going to decide what variables you want to collect, then you're going to set your hypotheses, then you're going to do your data collection, and then you're going to do your statistical analysis. That is the order of the steps.

The null hypothesis says you're not moving from the centre. In other words, you're going to say, I'm not going to find any difference between the groups. It is a statement of no departure: I'm not going to find any difference. Your alternate hypothesis, or your test hypothesis, is going to say, I am going to find a difference. That sounds very trivial, and it is, but it is so important.

Now, that test hypothesis, that there is a difference: there's actually slightly more to that. Let's use an example. Say, for instance, that for some disease there's a test and it has a mean value of 18. You take 30 samples at random, you do the analysis, and you get a mean of 21. Can you now state that that 18 was a lie?
That it was incorrect? Well, you've only got a few samples here. You didn't take millions of people with that disease and run the test on them. You've only got that sample. Can you say that that 18 was a false statement? Well, you're going to state your null hypothesis. You're going to say, I'm not going to find any difference: 18 is the answer out there, and 18 is what I'm going to get. That's my null hypothesis. My test, or alternate, hypothesis is going to say, I'm not going to find 18.

Not finding 18 can come in three varieties. Either my test hypothesis says it is simply going to be not 18, so either higher or lower; or it says it is going to be more than 18; or it says it is going to be less than 18. The first one, where you just state that it's going to be different, is called a two-tailed test. Where your alternate hypothesis says specifically that I'm going to find a value higher, or specifically that I'm going to find a value lower, those are one-tailed tests. This is very important because, as I say, the calculations work out a different p-value depending on whether you do a one-tailed test or a two-tailed test.

So what does the computer do behind the scenes? Before I move on to that, remember it's very important that your decision between one-tailed and two-tailed follows logically from the problem that you are dealing with, the research question that you are trying to answer. You can't be unscrupulous about this. It will be clear if you chose the wrong form of alternate hypothesis just to get a different p-value, one which might bring you into statistical significance where the two-tailed test wouldn't. It's easier to get a lower p-value with a one-tailed test than with a two-tailed one. You have to make that choice before your data collection and your statistical analysis. You make it based on the scenario that you are dealing with. With that you set the type of test hypothesis, then you do your analysis, and you do not change it after you've done your analysis. That would just be cheating. End of story.
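To make the one-tailed versus two-tailed point concrete, here is a minimal sketch using the example numbers from the lecture (stated mean 18, 30 samples, sample mean 21). The lecture does not give a sample standard deviation, so the value 9.0 below is a made-up assumption, and the helper names `t_pdf` and `upper_tail_area` are just illustrative. The p-values are areas under the t-distribution curve, which is explained next, computed here by simple numerical integration with the standard library only.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_area(t, df, steps=2000):
    """Area under the t curve to the right of t, using Simpson's rule."""
    # The curve is symmetric, so the area to the right of 0 is 0.5.
    # Subtract the area between 0 and t (steps must be even for Simpson's rule).
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Numbers taken from the lecture's example.
mu_0 = 18.0          # value claimed by the null hypothesis
n = 30               # sample size
sample_mean = 21.0   # mean found in the sample
# ASSUMPTION: not given in the lecture, chosen only for illustration.
sample_sd = 9.0

t_stat = (sample_mean - mu_0) / (sample_sd / math.sqrt(n))
df = n - 1

p_two_tailed = 2.0 * upper_tail_area(abs(t_stat), df)  # alternate: mean is not 18
p_greater = upper_tail_area(t_stat, df)                # alternate: mean is more than 18
p_less = 1.0 - p_greater                               # alternate: mean is less than 18

print(f"t-statistic           : {t_stat:.3f}")
print(f"two-tailed p-value    : {p_two_tailed:.3f}")
print(f"one-tailed (more than): {p_greater:.3f}")
print(f"one-tailed (less than): {p_less:.3f}")
```

With this assumed standard deviation, the one-tailed "more than" p-value falls below 0.05 while the two-tailed p-value stays above it, which is exactly why the choice has to be fixed before the data are seen.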
So what does the computer do behind the scenes? Remember the central limit theorem. It said that if you could repeat this countless times, taking a sample and getting either a mean value or the difference between the means of two groups, and you counted, over those millions of repetitions, how many times you got each value, some means or differences would occur more commonly than others. You start stacking them up. Some will occur much more commonly, and some will occur less and less commonly. And if yours is one of the less common ones, we're going to call it a statistically significant result.

How does the computer do that? It does it through the t-distribution. Remember, we use the t-distribution when we don't know what the true population standard deviation is and can only infer it from the standard deviation that we get in our trial or study. It's going to draw this curve for us, and now you've got to tell it what you want your level of significance to be, that is, what area under the curve counts as significant. Remember, the area under this curve is one, by design. Usually we choose 0.05. That's called the alpha value. For a two-tailed test it's going to divide that into two equal bits, 0.025 or 2.5% each, and draw a line in each tail where that area would be (this is not to scale), giving a cut-off value on the x-axis on either side.

It will then take your difference in means, or your mean, and convert it to a value, a t-statistic, that falls somewhere on the x-axis. It will then draw the lines there and color in the area beyond them. If the area of your coloring in is more than 0.05, you're going to say it is not statistically significant. If it is less, we would say the p-value is less than 0.05: there's a statistically significant difference here, or I've found a statistically significant value. It's actually wrong to say that. Strictly, you must say that the probability of finding a value as extreme as this one was less than 0.05. But in clinical research we usually say statistically significant.

There's something else about this alpha value. It also signifies the risk that you are willing to take of falsely rejecting the null hypothesis, and that is what is known as a type one error. It means that in truth there actually wasn't a difference, if you could investigate all 6 billion people, but you falsely rejected the null hypothesis that there was no difference. You take a 5% risk of that. That's the alpha value, and from it the computer works out this area under the curve. That's the type one error.

Now, as far as that goes, remember you can only reject or not reject the null hypothesis. Statistics doesn't work in a way that you can ever prove the null hypothesis. You cannot accept the null hypothesis. You can either reject it, thereby accepting the alternate hypothesis, or you do not reject it. So with a p-value of more than 0.05 you do not reject the null hypothesis. You can never accept it; you can never prove it. With a p-value of less than 0.05 you reject the null hypothesis, thereby accepting the alternate hypothesis.

Now, just for clarity's sake, here's the other little figure. This would be one-tailed. If my test hypothesis, my alternate hypothesis, stated that I think the value would be higher, then you lump all your 5% into that one tail. Here's an example where we found a value out here. The computer will color that in, and it'll say this green area under the curve is larger than 0.05 (again, this is not to scale). Therefore, for this one-tailed test, we do not reject the null hypothesis, and therefore no statistically significant difference, or no statistically significant value, was found.

So, in essence, that is hypothesis setting. After you develop your research question and decide what variables you want to collect to answer that question, you set your two hypotheses based on the scenario that you are dealing with. Then you start collecting the data, and then you do the data analysis. Excellent.
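As a rough illustration of what "repeat this countless times" and the 5% risk mean in practice, here is a small simulation sketch. It assumes a world in which the null hypothesis really is true (the population mean is 18), draws many samples of 30, and counts how often a two-tailed test at alpha 0.05 would falsely reject. The population standard deviation of 6.0 and the normal shape of the population are assumptions made only for this sketch, and the function names are illustrative. The cut-off 2.045 is the usual t-table value for 29 degrees of freedom.

```python
import math
import random

MU_0 = 18.0     # value stated by the null hypothesis (from the lecture)
N = 30          # sample size (from the lecture)
POP_SD = 6.0    # ASSUMPTION: population spread, not given in the lecture
T_CRIT = 2.045  # two-tailed cut-off for alpha = 0.05 with 29 degrees of freedom


def t_statistic(sample, mu_0):
    """Convert a sample mean into a t-statistic against the value mu_0."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu_0) / math.sqrt(variance / n)


def false_rejection_rate(repeats, seed=0):
    """Share of repeated studies that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(repeats):
        # Each study samples from a population whose true mean really is MU_0.
        sample = [rng.gauss(MU_0, POP_SD) for _ in range(N)]
        if abs(t_statistic(sample, MU_0)) > T_CRIT:
            rejections += 1
    return rejections / repeats


rate = false_rejection_rate(10000)
print(f"share of false rejections: {rate:.3f}")
```

The printed share should land close to 0.05, which is the type one error risk that the alpha value stands for.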