In this tutorial I want to talk to you about the exact test of goodness of fit. Exact means that we calculate a p-value directly. We don't first calculate a test statistic that falls on a distribution and then take an area under the curve. We're going to stick to a binomial categorical variable; in other words, the sample space contains only two elements: yes or no, value one or value two. We're going to see one of them as a success, so we follow the binomial distribution, where we expect one of the values to occur with a certain probability, and then we can work out a p-value for that. Let's have a look.

So let's look at the exact test of goodness of fit. We've already seen related tests, and you can look at those online. Remember, all these files are on GitHub and on RPubs. We see the RPubs HTML-rendered file here, and you can download the actual Rmd file for RStudio on GitHub. We've looked at the G-test of goodness of fit and the chi-square test of goodness of fit. So we have a single variable, and it is nominal categorical. It can be discrete as well, but let's stick to nominal categorical to make things easy. The variable has a sample space, and we count the occurrence, the frequency, of each of its elements. We do an experiment, we have the single variable, and we count those unique values in the sample space. We just want to know whether the proportions we get back are different from what we would expect, and that is a goodness of fit test.

Now, the G-test is a log-likelihood test, and this exact test is binomial. We're going to stick to binomial, but you can have multinomial as well. It's not a log-likelihood test, and it's exact because we are calculating a p-value directly. We're not calculating a statistic that follows a distribution and then calculating the area under the curve for that specific value and more extreme. No, we are just calculating a p-value directly. So that's what exact means. It doesn't mean it's better or more accurate.

So there we see the binomial distribution. Imagine we have a nominal categorical variable and we are sticking to binomial; in other words, the sample space contains two elements. In this instance I'm going to call them value one and value two, just to be very generic. Because this is a binomial distribution, we've got to see one of the two as a success. Remember, success is just a generic term. In a dichotomous outcome such as alive and dead, dead might be the success that you're after, the one you want to investigate. So one of those two elements in the sample space is your success, and it has a probability of occurring: that's the p we're looking at. One minus p is the probability of the other value.

So let's have a look at that. Imagine that we have ten trials and two successes. We handed out a questionnaire to ten people, and the success was them choosing value one. Only two chose value one, and we expected a 50-50 split. So we expected the probability of a success, of someone choosing value one, to be 50 percent; hence our p of 0.5. Right inside of R we can use binom.test, the binomial test. k there is our number of successes, n is our sample size, and p is our probability of a success, so of value one being chosen. We want a two-sided p-value at a confidence level of 0.95. We get back this nice output: number of successes 2, number of trials 10, and a p-value of 0.1094. Very, very easy.

Now let's have a look at what I've done down here. Because this is a binomial distribution, we can create a bar plot and look at the probabilities. If there was a probability of 50 percent of choosing value one, what was the likelihood of getting only one success?
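Before moving on to the plot, here is a minimal sketch of the binom.test call described above. The variable names k, n, and p follow the narration; the explicit `alternative` and `conf.level` arguments are my assumption about how the call was written (both are also the defaults of `binom.test`).

```r
# Values from the questionnaire example: 2 of 10 people chose value one,
# and we expected a 50-50 split.
k <- 2    # number of successes (choices of value one)
n <- 10   # number of trials (sample size)
p <- 0.5  # expected probability of a success

# Exact binomial test. The two-sided alternative and the 0.95 confidence
# level are spelled out here as an assumption; they are also the defaults.
result <- binom.test(k, n, p, alternative = "two.sided", conf.level = 0.95)

print(result)    # full report: successes, trials, p-value, confidence interval
result$p.value   # 0.109375, which the report rounds to 0.1094
```

Storing the result in an object, rather than only printing it, makes it easy to pull out just the p-value with `result$p.value` later on.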
So one value one and nine value twos, or two value ones and eight value twos as we got here, or three and seven, or four and six? Whether there were zero, one, two, or three value ones, I can plot all of those probabilities. So again n is 10, k is 2, p is 0.5. I create x, a sequence vector from zero to n, so zero to ten, because I might do that sample and no one selects value one, and at the other extreme all ten people select value one. Then y is the probability that we work out for each of those, and for that we use the dbinom function. I pass it x, so that's 0, 1, 2, and so on up to 10; the size is n, which is ten; and the probability is p. So I'm just running through all of those.

Now we create this bar plot. The height is y, so we pass those 11 values in. The names.arg argument is x, so 0, 1, 2, up to 10 again. I use a color of deep sky blue, I have a title, an x-axis label, and a y-axis label, and I want the axis values printed upright. And there we go.

The probability of zero successes is very low. If I handed this out expecting 50 percent of people to choose value one and no one chose value one, there's a very low probability of that happening. There's a higher probability that one person would. Given that we expected 50-50, there was a very high likelihood of five successes in the ten, and that's why that bar is the tallest. But there's also a high likelihood of four successes or six successes, that is, choices of value one. So how do we get from this to the p-value of 0.1094?
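The bar plot just described could be produced along these lines. This is a sketch, not the exact code from the Rmd file: the title and axis label strings are my own placeholders, and I am assuming `las = 1` is what was used to print the axis values upright.

```r
n <- 10   # number of trials
k <- 2    # observed number of successes (not needed for the plot itself)
p <- 0.5  # expected probability of a success

# All possible numbers of successes, from 0 to n
x <- seq(0, n)

# Binomial probability of each of those 11 outcomes
y <- dbinom(x, size = n, prob = p)

# Bar plot of the probabilities; title and labels are illustrative only
barplot(height = y,
        names.arg = x,
        col = "deepskyblue",
        main = "Binomial distribution (n = 10, p = 0.5)",
        xlab = "Number of successes",
        ylab = "Probability",
        las = 1)  # assumed: las = 1 prints axis values upright
```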
Well, we have to look at a value of two or more extreme. So we have to look at the probability of getting only two value ones back, and the one, and the zero, and add all three of those probabilities. It's from this point out to the extreme. Let's take the sequence 0, 1, 2 and pass it through the dbinom function to get these probabilities. The probability of zero was 0.00098, of one was 0.0098, and of two was 0.044, so we've got to add all of those together. If we sum those three up we get about 0.055. Well, that's still not 0.1094. But remember, we chose a two-tailed hypothesis test, so what we do on this side of the distribution we also have to do on that side. We actually have to add the probabilities for 8, 9, and 10 too. With a symmetric distribution like this we can just multiply by two, or I could specifically ask for 0, 1, 2, 8, 9, and 10. If we work that out we get 0.109. So very simple to see.

That's if we have this 50-50 split in our binomial choice. But what if it's not a 50-50 split? What we're going to do here is make the probability of choosing value one 0.75, and let's run 15 trials. I'm again going to go all the way from 0 to 15 and plot all of these, and this is what it looks like. It's not symmetric anymore; it's definitely left-tailed. So how would we now go about doing a two-tailed test? Not all statisticians agree on how you would do this. I think the most common way is probably the one we're discussing here, the method of small p-values, or the small p-value method. In this instance I'm interested in seven.
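To recap the symmetric case before working through the skewed one, here is a small sketch of the tail sums, followed by a quick check that the 15-trial, p = 0.75 distribution is no longer its own mirror image. The variable names (`lower`, `p_doubled`, `p_both_tails`, `skewed`) are mine, chosen only for illustration.

```r
# --- Symmetric case: n = 10, p = 0.5, observed k = 2 ---
n <- 10
p <- 0.5

# Probabilities of 0, 1, and 2 successes (the observed value and more extreme)
lower <- dbinom(0:2, size = n, prob = p)
lower        # 0.0009765625 0.0097656250 0.0439453125
sum(lower)   # 0.0546875, one tail only

# Two-tailed: double the lower tail (valid because the distribution is symmetric)
p_doubled <- 2 * sum(lower)

# Or add both tails explicitly: 0, 1, 2 and 8, 9, 10
p_both_tails <- sum(dbinom(c(0, 1, 2, 8, 9, 10), size = n, prob = p))

p_doubled      # 0.109375
p_both_tails   # 0.109375

# --- Skewed case: n = 15, p = 0.75 ---
skewed <- dbinom(0:15, size = 15, prob = 0.75)

# A symmetric distribution equals its own reverse; this one does not,
# so simply doubling one tail is no longer justified.
isTRUE(all.equal(skewed, rev(skewed)))   # FALSE
```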
We sent out 15 questionnaires expecting 75 percent of those 15 people to choose value one, and only seven of them chose value one. Is that statistically significant or not? It's very difficult to see here how to reflect that on the other side for a two-tailed hypothesis test. What we do is calculate the probability for seven, and then calculate each of the probabilities separately, from 0 to 15: what was the likelihood of that number of successes? Then we sum up all the ones whose probability is the same as that of seven or less. So I create this vector called suc, for success, and I compute all of them from 0 to 15 with size n and probability p, so 15 and 0.75. That gives me 16 values. Then I sum up the entries of suc that are less than or equal to the specific one for seven. So it's just going to look at all the probabilities.
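The steps just described can be sketched as follows. The vector name `suc` comes from the narration; `p_seven` and `p_value` are names I have added for readability, and the final comparison with `binom.test` is my own sanity check, not something shown in the video.

```r
n <- 15    # number of trials
p <- 0.75  # expected probability of a success
k <- 7     # observed number of successes

# Probability of every possible number of successes, 0 to 15 (16 values)
suc <- dbinom(0:n, size = n, prob = p)

# Probability of the observed outcome; index k + 1 because suc starts at 0 successes
p_seven <- suc[k + 1]

# Method of small p-values: add every outcome that is as likely as,
# or less likely than, the observed one
p_value <- sum(suc[suc <= p_seven])
p_value   # about 0.0173

# Sanity check (my addition): the two-sided p-value from binom.test
binom.test(k, n, p)$p.value
```

Note that in this example 15 successes has a probability of about 0.0134, slightly higher than the 0.0131 for seven successes, so nothing from the upper tail is added and the p-value comes entirely from 0 through 7.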
So it looks at the height of each bar and only adds those whose probability is exactly the same or less. If it adds all of them up, it gives us a p-value, and that's how we use this method of small p-values to work out the probability of having found those seven. We see it's 0.017, and that is statistically significant for an alpha value of 0.05. So we could now suggest that if we expected 75 percent of people to choose value one, and we handed it out to 15 people and only seven of them chose value one, so the other eight chose value two, that would have been a statistically significant finding.

So there we go: the exact test of goodness of fit. Remember that for these tutorials on R the actual HTML-rendered files are on RPubs, and that's what you might see on the screen. These files are also available in their raw form on GitHub, and all the links will be in the description below. So you can either go to the website and look at the RPubs files as they're already rendered, or you can go to GitHub and download those files onto your system so that you can use them in RStudio yourself.

If you like these videos on R, please let me know so that I can make more of these. If there are subjects that you want me to cover as far as biostatistics and the use of R are concerned, please let me know. Otherwise, please always remember to subscribe and hit the notification bell so that when new videos come out you will know about it. You can also follow me on Twitter, because that's where you'll also see that new videos are out.