 So, how do we get to decide if a p-value is significant? Why do we change or take this research question and change it into something that we can actually test? That's called hypothesis testing. What we do in health care statistics is based on hypothesis testing. We're going to have an hypothesis and then we're going to test it. And as good researchers, we always have a null hypothesis. I'm going to talk to you all about that. The null hypothesis is there is no difference. I bring a new drug to market and I'm comparing it to placebo or some previous drug or some intervention compared to previous intervention. My null hypothesis is that there will be no difference between my two groups or three groups or whatever the situation might be. That's always my null hypothesis. Then I'm going to set an alternative hypothesis. So in case I go out and collect evidence and do my tests and it shows that I can reject my null hypothesis so that I then accept my alternative hypothesis and that alternative hypothesis is going to be well, there is a difference between these two groups or three groups or whatever the case might be. So I have this null hypothesis and until I have evidence otherwise I cannot reject it. I can never prove the null hypothesis. I can just fail to reject the null hypothesis. So that's all a bit strange. Let's go ahead and use the notebook and I'll explain everything to you. So we've had a look at distributions, specifically sampling distributions and while that might have been a bit theoretical let's put it to the test here and I'm pretty sure that at the end of this things will be pretty clear. So we've got this idea of hypothesis testing. We have a research question and somehow we've got to convert it to some format that we can actually test and analyse. We can use statistics, use the mathematics behind it by writing a few lines of code and get these results and that's what this section is all about. So I've imported a couple of libraries there from the SciPy library. We've imported the stats module and we've imported a whole of numerical Python as NP and then as per usual we've got our plotly graph objects and plotly IO and I've just expressed the default there as plotly white so that we have nice white plots. So this is all about inferential statistics, think about it. If you read a journal article someone else took some patients or participants at random hopefully not to introduce bias but they were taken from a certain population and if you want to use the results that they found in your setting you sit with a different population and your samples are going to be the people who come to you either for treatment or whatever the case might be but you want to infer those original results onto your population and those two populations better look the same. If the population from which those participants were taken does not look like the participants that you are seeing then you can't infer those results and that is what we always have to be very careful about when we read journal articles or when we do our own research. So inferential statistics assist that. We're going to take some random individuals from a population and it's not necessarily that we talk about humans here we might be talking just about experimental all sorts of experiments and we just want to infer those results onto a larger population. Remember it's also just a small sample we can almost never investigate the whole population specifically if you talk about human beings there's 7 billion plus so that will be impossible both in time and money but we just want to be able to infer these results to a larger population. So imagine then that we have we're just looking at two sets of participants they take a placebo drug or a new drug or a new drug and an established drug so we've got them in these two groups and we looked at their cholesterol before they started taking these and we looked at their cholesterol afterwards and we looked at the difference so we are hoping for that this new drug will lower their cholesterol more than the old drug so we're just going to compare these before and after so our variable is the difference between before and after and then we're comparing that variable between two groups one group taking the new drug and one the old drug and we just want to know if there is a difference so how do we do that? Well we set an hypothesis we set an hypothesis and our hypothesis works this way we always have two there's a null hypothesis and an alternative hypothesis the null hypothesis is the one that we just live with all the time unless we go out and do an experiment and find a different result we are stuck with a null hypothesis and that's where the fanciful claims come from all over the world people just have these magical fantastical claims and what most non-scientific people don't understand is that the onus is on you to prove something and until the proof comes we are just stuck with a null hypothesis so if we have this new fantastic drug we have to prove that it is better or at least different than the old drug we can't just claim it we can't just say that it is because we want it to be or because we think it is until such time as we set up an experiment and analyse the results we are stuck with a null hypothesis that there is no difference between the old drug and the new drug or between some placebo intervention or the new intervention that's always the null hypothesis there is no difference we then have an alternative hypothesis which would state that there is a difference now be very careful here I just state that there is a difference it might be that it is better than or less than again it's the same thing as saying if I say 3-2 that's positive one but 2-3 is negative one so it just depends which one I subtract from the other and that's really called a two-tailed alternative hypothesis in other words I'm not stating which one will be more than the other or which one I'm subtracting from which one in essence I'm just stating that there is going to be this difference and that's my alternative hypothesis and I've got to do this experiment get the data analyse the data before I can what we say reject the null hypothesis and then accept the alternative hypothesis and if the evidence is not there then I fail to reject the null hypothesis I haven't been able to show that I can reject the null hypothesis and therefore accept the alternative hypothesis and that's just really how it works and it's all based at least when we talk about parametric test about that nice smooth bell shaped curve and being a curve of sampling distributions in this case we're talking about means so if I subtract the two means that difference in the mean depending on which one is larger than the other one I might be on the left of the mean in the middle the zero or I might be on the right and it just depends how far away I am from that mean and the further away get the less likely I was to find this kind of difference and then if it's beyond a certain point that we call a critical point then that little area under the curve is going to be less than 0.05 and we have evidence to reject the null hypothesis and accept our alternative hypothesis we reject our null hypothesis and then accept alternative hypothesis so let's just view this in terms of a research question and as I've written here at the top an example always helps so let's have it here we imagine a study we're investigating a new intervention we create two groups in the one the participants receive a placebo intervention and in the other a new intervention in each group we measure a certain variable for each individual and our research question is and listen carefully to this because that allows us the way that we set this up is what allows us to do inferential statistics or at least hypothesis testing is there a difference in the variable between the placebo and the intervention groups that's my research question so many times we have these nebulous thoughts about our research I wonder if what the outcome of this is or what the difference is we've got to be able to break it down to something that you can state hypothesis for and so you've got to be very clear about this and this research question is very clear is there a difference in the variable between the placebo and intervention groups now our null hypothesis is that there is not going to be a difference and in this explanation of hypothesis testing and the example that we're going to run through we're just going to do the test statistic that we're going to be interested in is the mean many others that we can test so is there a difference in the mean for that variable between these two groups and we usually write the difference as h sub zero as you can see there so that is our null hypothesis there's no difference in the data for the variable between the two groups and in this instance I mean the mean between the two groups and that's what we write there when we talk about mean we use this x for the placeholder for the name of this variable of us and we put a bar on the top a subscript of one there means for the group one and then for group two so I'm going to make got to decide this beforehand that placebo is my group one and the new intervention is my group two so I have these two means so that I can write the null hypothesis as we've written it here so h sub zero my null hypothesis is that the mean for group one equals the mean for group two the means are equal that means my alternative hypothesis is equal to each other and now you might be asking well what is the difference if the difference was 1.1 or 1.2 how do I decide when this is enough or not well it's all based on that sampling distribution curve that makes it very easy for us first of all we need to decide on a level called an alpha value below which we will accept that this was a very rare finding that we got and it's usually 0.05 representing 5% of the area under that nice symmetric curve and we usually split it up specifically here which is an example of a two-tailed hypothesis test a two-tailed alternative which means I'm splitting my eggs into two half goes on the left side of the curve and the other half goes on the right side so 2.5% on the one side 2.5% on the other side 0.025 and 0.025 I split it up in two and together that makes 5% if my difference falls some way on those outsides that would be a very rare finding indeed and then we could reject on our hypothesis and we have enough evidence to reject that and accept alternative hypothesis so let's put this in action so that there's no ambiguity so that you walk away from this fully understanding what's going on here so I'm going to create my two groups I'm going to create 100 participants in each group and you see I've named them there placebo and intervention and I'm going to use this rvs function from stats the stats module in the sci-pi library and we're going to take we're going to just simulate some values from a normal distribution so in the intervention group so we say stats.norm.rvs rvs is going to just select a random from that lock equals 50 that means location which is a term for the mean so mean of 50 standard deviation of 5 a size of 100 and I'm setting the random state to 3 if you set the random state to 3 you're going to get exactly the same random pseudo random values as I am if I set the random state there there isn't much random about this but I want you to watch all to get the same results but put in something different so we have this for the intervention group we're going to draw 100 values at random and it just simulates these 100 participants that are getting our new intervention for this specific variable they'll have a mean of 50 and a standard deviation of 5 and we see for the placebo group we have 48 mean and a 7 standard deviation and there we go and I just wanted to show you this little sneak peek here this is how we're actually going to do it we're going to say stats.ttest underscore ind intervention comma placebo let's just see that we run this that we actually get these values I just do that statistics and a p-value that in the end of all of this is all you're going to do you're going to write a single line of code with a couple of characters in it look how short that line of code is stats.ttest underscore ind that means independent we'll talk about that intervention comma placebo that's the two sets of values and I get a p-value as simple as that it's just phenomenal language it's just python this makes data analysis so easy but we're not interested in that yet we want to understand how python got to this value so let's have a look at it first of all I'm just going to print to the screen here the two means so we can just get some understanding of what is going on here so I'm going to say intervention dot mean so I use the dot mean method for my numpy array there and I've just put it into a print statement if you're interested in how print statement works so anything I want to print just goes inside of quotation marks single quotes here I've used so that becomes a string comma that means I'm going to concatenate these things together and this backslash t as a string means a tab so that there's nice alignment comma so more concatenation the intervention dot mean comma this is a new line so it's going to escape to a new line mean for placebo group comma another tab comma the actual mean if we print that all out it looks very neat there we go the mean for the intervention group is 49 and the mean for the placebo group is 47 now is that statistically significant take a guess now let's also just have a look at the standard deviations now remember we are specifically to draw 100 from this theoretical distribution and this is what we get so let's look at a box and whisk a plot of this you know how to use plotly now and there's our box and whisk a plot so here's our intervention group here is our placebo group and I wonder do you think that's a statistically significant difference if you were just to look at that and after you've done many many of these you can almost eyeball what is going to happen here but there we go we visualised it we've got the summary statistic we visualised it so as I've written they take a guess do you think it is statistically significant is there a statistically significant difference that's such a horrible term but it's just stuck in the way that we do talk but I wish we could get away from it anyway is there a difference so I've got these two means 49.5 and 47.2 now remember I said the difference is it's going to be depending on which one we subtract from which one if we say intervention dot mean minus placebo dot mean we get 2.217 and if we do it the other way around of course we get negative 2 so it just depends on how we set this up but remember we are going to go for the second one because we made placebo group number one and we made intervention group number two but you can imagine that we could have done it the other way around and of course we get round zero this symmetry in the difference because we don't know in our little example here what the whole 7 billion people what the standard deviation is for our variable in that we're going to make use of the t-distribution and the t-distribution only cares about our sample size so we've got 200 people in our study here and from that we're going to subtract the number of groups and the number of groups is 2 200 minus 2 is 198 so we're going to have a t-distribution for 198 degrees of freedom and I'm just going to use what we did last time but this time I'm going to go from negative 3 to 3 and then draw the probability density function for you here for t-distribution with 198 degrees of freedom there you see now our difference is going to fall somewhere on this this is how we simulate with this probability density function all the possible outcomes and we are suggesting that in reality now I'm going to whisper this in your ear but this is the way the mathematics of this works if in reality there was no difference so some deity knows that there really isn't a difference we could draw this graph given that there's no real difference we do our study and we find a difference between the two means of somewhere on this curve and if we find one of the rare ones we are going to based on the fact that there really is no difference that's what this mathematics is it's the only way we can set this up in statistics if we find one of the rare differences we're going to say we reject our null hypothesis except alternative hypothesis and we think we have now proven that there is a statistically significant difference well not really but that's what we have and that's what we do exactly what we do now we've just got to convert our difference and we stuck with mean placebo mean minus intervention mean that was our group 1 minus group 2 we always do that of negative 2.21 we've got to convert that to a t statistic so that we can plug it in here for z distribution we would have said how many standard deviations is that difference away from the mean but this is the t statistic and here's your little equation for doing this so we divide the difference this delta x is group 1 minus group 2 the mean and then we have the variance that's the square we see here that's the square of the standard deviation remember it's the variance so we divide the variance of group 1 divided by how many there are we take the square root of that sum and that becomes the denominator and so I've done it there and I'm just going to call it t underscore stat so that little equation I've done there and you see I used dot standard deviation star star 2 remember that means squared or I could have just said dot var and that would have been the variance and there's our t statistic we have converted the difference between the means into a t statistic we'll now go somewhere on this curve and let's just do that so I'm just adding another trace to my curve that already exists I've already created this figure t underscore dist underscore fig and I'm just adding another trace so it'll keep this one that we created here and it's just going to add something to it I've updated the layout by changing the title and I'm just printing it to the screen again so we need not repeat all of that code and there we go there is our difference represented as this red line there is that's our difference between the two means expressed as a t statistic now remember what we said is that we have to reflect this on the other side as well because number one most importantly we have a two-tailed alternative hypothesis we didn't care which one was lower or higher than the other one so we really have to do both sides and so let's do that and I've just added in green on the other side so if I did the subtraction the other way round but more importantly because we have a two-tailed hypothesis I reflect this on the other side and what we are looking for for the p-value is from this red line towards the negative infinity that little area under the blue curve on that side and on the right hand side area under the curve from this line our difference towards positive infinity that whole area under the curve from negative infinity to positive infinity works out to be one beautifully so and we just want to work out these two areas under the curve and that's as simple as taking our t statistic here and plugging it into the cumulative distribution function CDF there and I'm multiplying it by two because I want both areas under the curve and there's my p-value 0.02 so if we rounded off exactly what we found before when we used the simple line of code but now you know what's happening behind the scene just to let you know the critical t-values and I get that remember this ppf function and I'm going to set for 0.025 498 degrees of freedom so just want to show you these two critical values and I've got them in purple there and in orange so that would be the cut off towards the left of it would be 2.5% of the area under the curve to the right would be 2.5% of the area under the curve and between the two this rest would be 98% or 95% of the area under the curve so you can see that our differences fell within these rejection what we call rejection zones so we can reject our null hypothesis because if from the purple towards left is 2.5% certainly from the red towards left is less than that and the same reflected on the other side is how you can see what the critical values would have been for us to get use an alpha value of 0.025 and that's really as simple as that that's a two-tailed alternative hypothesis we've got this beautiful research question we collected the data we know we set an alpha value of 0.05 as our cut off with this purple and orange line I can show you where that falls on the t-distribution with 198 degrees of freedom and we've also seen where our actual differences fell and we can now say well given the mathematics which said there really was a difference we found a very rare difference and therefore we reject our null hypothesis accept alternative hypothesis and we can conclude that there is a statistically significant difference between those two means isn't that a thing of beauty so easy it just rolls off the tongue it's so easy and look at this it really is that easy is this this geometric area under the curve of course the CDF function is just going to calculate the value using the cumulative distribution function remember that now I'm always reticent to add these when I do lecture on this and that is the one-tailed hypothesis these are very dangerous things but you've got to know about them so we might have two other scenarios where we don't have a two-tailed alternative hypothesis where we just say that well there is a difference between the two but we don't really care which one is high or lower usually we really do care and specifically if we have a new intervention we really wanted to perform better don't we of course we do but as researchers we've got to accept the null hypothesis first but there might be instances where we can convince all our colleagues that we really did suspect a difference either higher or lower between the two groups but those really have got to be well stated arguments and everyone's got to accept those arguments so let's look at the first one I'm just going to print those two means again so we just know where we stand the intervention group and the placebo group I should have put them the other way around because placebo we're going to make our first group and intervention is going to be our group too and as researchers imagine we are secretly hoping a very bad thing to say because we're not allowed to do that but for the sake of explaining what's going on here we very secretly hope that the mean for the placebo is going to be lower than the mean of the treatment so you see that reflected here in equation number two at the bottom here that's our alternative hypothesis what we secretly hoping for of the placebo must be less than the mean but that forget about that, that means we are stuck with a null hypothesis which says that the mean of the placebo is larger than or equal to the mean of the intervention group and that is what we accept and our alternative or test hypothesis then is going to be that the mean of of the placebo group is less than the mean of the intervention group and what we do now is instead of splitting our eggs into both sides we put that for an alpha value of 0.05 we put that all of that 0.05 on the left hand side that's what this less than means there so you're going to put that all on the left hand side so I'm working out this critical value so that we have 0.05 of the area under the curve and not 0.025 on the one side and 0.975 on the other side and just putting all my eggs in one basket here on the one side so let's plot that and you can look through the code there but this is what it means if we look at the green now it is much closer to the middle than it was before because we don't have to split the 5% on both sides we can put all our eggs in this basket on the one side so this area towards the left is now our zone of rejection and we see our difference as well within this area of rejection now and in actual fact we don't multiply the area under the curve here by 2 at all anymore so now we only have a single we only have that p-value we don't have to multiply this area by 2 we don't have to reflect it on the other side which means we're actually going to have exactly half of what our previous p-value was and there's the danger and it happens it really does happen someone finds a p-value of 0.07 for their predetermined two-tailed alternative hypothesis and they so desperately want that p-value of less than 0.05 and if you then change the scenario after the fact to this one-tailed hypothesis alternative hypothesis suddenly you only have half of the p-value and suddenly from 0.07 you go to 0.035 and lo and behold you have a statistically significant difference but that's very dangerous so we set those hypothesis either one-tailed or two-tailed way before the study starts that's in the design phase of the study and if we then do choose a one-tailed hypothesis we must have very solid arguments for that and we hope that the reviews of the journal and the audience of that journal accept our one-tailed alternative hypothesis it really has to stand on firm ground before we can use this much safer to use the two-tailed and in fact we multiply the area under the curve by two, here we don't but we can still see the effect here of putting all our eggs in the one basket because this green line is much closer to zero than if we only did 2.5% on this side and 2.5% on this side by the way I speak of 0.975 because if I look from the left-hand side it will be 97.5% of the area under the curve until I get to this side but anyway and there we go we only do we only have to do our p-value once here and now it's much smaller half of what it was before because we only have to work with this one side and then we could also have obviously the opposite side imagine that we really hope for the and again I say this with great caution as researchers we can't hope for something we have to be blunt about this and do the test and then accept what the results show us but here we were secretly hoping for I hate to say that but we were secretly hoping for the mean of the placebo group to be larger than the mean of the intervention group and you can see that in my alternative hypothesis there which mean my null hypothesis is that the mean for the placebo group is going to be less than or equal to the mean of the intervention group so now I've got to work out I've got to put all my eggs on the right-hand side basket so to get this critical value on the right-hand side I'm going all the way from the left to the right that means I'm going all the way representing 95% of the area under the curve because we always count from the left side and I'm working that out for 198 degrees of freedom and I get my critical value there which will just be the positive of what it was before and let's just print our findings here so we can be very sure of what is happening here all my 0.05 my 5% of the area under the curve is now lying to the right of this green that's my area of rejection I can't change things now it is still group 1 minus group 2 we stuck with that which means we still stuck with our difference being on this side here on the left-hand side but now the area under the curve listen carefully is 1 minus this little bit area here 1 minus so the whole 1 minus this little bit and I hope that makes sense because our null hypothesis was that the placebo group mean was going to be less than and we really did find it was less than so now we can't claim that is our p-value that little one it's 1 minus all of this because our rejection zone is now here on the right-hand side so we have 1 minus the CDF of the t-stat for 198 degrees of freedom and we get a p-value of 0.99 we can't reject that null hypothesis our null hypothesis was that x sub 1 is less than or equal to x sub 2 and we actually found that in actual fact we found a very large difference so our p-value is going to be very big it was very common to have found so there's our critical value it was very common to find the left of this and that's why it's 1 minus so you just got to get used to that you've got to think about this carefully but that's it if you understand this you understand statistics specifically if we go back to this two-tailed hypothesis our difference is one of many possible ones we create all those many possible ones we simulate that by the t-distribution given the sample size it's going to look slightly different we choose an alpha value we split that on both sides so that we can get and then we calculate these critical values which would represent 2.5% of the area under the curve to the left of the purple line, to the right of the orange line and then we look at the difference that we have and we just reflect it on the other side and we see that this falls within these rejection zones and we calculate that area under the curve towards the left of the red towards the right of the green here and that area is going to be less than if we combine those less than 5% of the area under the curve because the purple and the green would be the cut-off lines for 5% of the area under the curve and we have a statistically significant difference it's that beautiful it's that lovely and with the enticing prospect that we're not going to have to do any of these things just for you to understand what we're really going to do is way up here where did we put it got to look for it again anyway we'll find it where I just wrote a single line of code and it did all this calculation for us behind the scenes and we just got boom a p-value and we can just write it in our report it's going to be really as simple as that so I hope you enjoyed this lecture on hypothesis testing look at this notebook look at this video again this is the crux of the matter this is the crucial bit if you get this you understand statistics that's very well