Hi everyone, it's MJ. Welcome back to hypothesis testing. In the previous video we looked at the goodness of fit test, and we used a test statistic that we are going to be using again for our contingency table example, where we'll be testing whether two factors are independent. So contingency tables can be used to test if two factors are independent, and we're going to go through a very easy example just to illustrate the whole point.

Our two factors are going to be company and claims. We have three companies: A, B and C. We weren't very creative with the names. For claims, we want to know the claim proportion of each company, and we want to see whether they are the same. Think of this as car insurance: is one company having a much lower claim rate than the others? That could indicate either that there's fraud going on at one of the companies, or that one of the companies is not paying out claims how it should. So it is an interesting test that you can do to see if there's anything fishy going on in the business world.

In our example we'd be given the following information. We'd be told that we have these three companies, A, B and C, and that their claim proportions are 23%, 28% and 20%. We'd also be told the number of policies in force: 100, 100 and 200. We're getting all of this information from the exam question, so if you're wondering where on earth those numbers came from, they will have been given to us in the exam question. That is our data.

Now, what we've been told is that we need to carry out a statistical test of whether there is in fact independence between these two factors, or whether something weird is going on with the companies. To do that we're going to use hypothesis testing. Remember, I spoke about the procedure in the very first video, the six steps, and we'll be going through those six steps.
Okay, so the first step is to write out our hypotheses. I know some people find writing out the null and alternative hypotheses the hardest part. If you get it wrong you mess up the whole question, so you do want to pay a little bit of attention here. The null hypothesis would be that the population proportions are equal. The reason is that if these three population proportions are equal, then the two factors are independent: the company doesn't have any effect on claims. The alternative hypothesis is therefore that the population proportions are not all equal. If you're battling with that step, do some practice and get your head around it, because if you get that wrong, like I say, the rest of the question falls apart. If you get it right, the rest becomes quite straightforward. So that is step one.

Step two is choosing the test stat, and we've kind of given away which test we're going to be using because, you know, hello, it's the title of this video. We are going to be using contingency tables. And what are contingency tables? Well, that's the point of this video, to explain how they work. So what we're going to be doing is setting up our contingency table and calculating our test stat. We've got our data, so let's put it to use.

In a sense, what we're going to be creating is three tables. The first table we can call observed: what was the data that we observed? The tricky thing here is that we're not just regurgitating the given values; we are going to be combining them with the number of policies. So we have companies A, B and C, and we want to check how many claimed and how many didn't claim. For A and B it's quite straightforward, because 23 percent times 100 is just 23, and 28 percent times 100 is 28. That leaves 77 who didn't claim for A and 72 for B.
For company C we've got 200 policies, which means it's going to be double: 200 times 20 percent gives us 40 claims, and in order to get to 200 there must be 160 non-claims. So this is the very first table that you create. It's always good to sum the totals; that's a good little check. What we're interested in is the total number of claims, 23 plus 28 plus 40, which is 91, and you'll see that in total there are 400 policies.

Remember, in hypothesis testing we're making an assumption: we're assuming that these two factors are independent. If they are independent, the three companies are all drawing from the same population, so we can combine them all together. That gives us one big overall population proportion, which is equal to 91 divided by 400. You can put that in your calculator and you get 0.2275.

Now we are going to use this number together with the number of policies to calculate our second table, which is called our expected table. If 0.2275 is the population parameter (remember, that's how hypothesis testing works, we're making this assertion), what can we expect? Once again we have A, B and C, and we have the claims and the non-claims. Sorry, that table looks horrible, but we can work with it.

So we're taking 0.2275 and multiplying it by the number of policies. For A, the number of claims that we expect is 22.75. Subtract that from 100, because remember there are 100 in total, and we get 77.25 non-claims. With B we get exactly the same, because it also has 100 policies: 100 times 0.2275 is 22.75, and again 77.25, summing to 100. Then C is the slightly tricky one.
We're now multiplying 0.2275 by 200, in which case we get 45.5, and since these have to add up to 200, we subtract and get 154.5. Always sum across just to double check: we get that 91 again, we get 309, and we get 400 going down and 400 going across. Okay, great. We've made these checks to make sure we haven't made a mistake. Don't underestimate the checks. It is very easy, especially in an exam under time pressure, to make mistakes as you go. So if you can, it's always nice to throw in these checks, and examiners also appreciate it.

Now we calculate our third table, which is the difference. Essentially we're subtracting the expected values from the observed values. Remember, our test stat is of the form (O_i minus E_i) squared, divided by E_i, summed over all the cells. So we are now going to create a table of the differences. You don't have to, but for illustration purposes it's quite nice to make a little table. We're going to have six differences because we have six cells.

For the claims row: 23 minus 22.75 gives us 0.25, 28 minus 22.75 gives us 5.25, and 40 minus 45.5 gives us negative 5.5. For the non-claims row: 77 minus 77.25 gives us negative 0.25, 72 minus 77.25 gives us negative 5.25, and 160 minus 154.5 gives us 5.5. You can see this is why we need to square the differences, because otherwise they would cancel each other out.

So this is the third table that we create: the difference between the observed and the expected. We're now going to use that to calculate our test stat. We use the differences, and remember the test stat also has the expected values in the denominator, so we'll be using the expected table as well.
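The three tables above can be reproduced in a few lines of Python. This is only a sketch for checking the arithmetic, not something you would write in the exam; the variable names (`policies`, `claim_rate`, `observed`, `expected`, `difference`) are just illustrative labels for the quantities in the example.

```python
# Data from the exam question: policies in force and claim proportion per company.
policies = {"A": 100, "B": 100, "C": 200}
claim_rate = {"A": 0.23, "B": 0.28, "C": 0.20}

# Table 1: observed counts (claims, non-claims) for each company.
observed = {}
for company, n in policies.items():
    claims = round(n * claim_rate[company])
    observed[company] = (claims, n - claims)

# Pooled claim proportion under the null hypothesis: 91 / 400.
total_claims = sum(claims for claims, _ in observed.values())
total_policies = sum(policies.values())
pooled = total_claims / total_policies

# Table 2: expected counts if every company shared the pooled proportion.
expected = {c: (n * pooled, n * (1 - pooled)) for c, n in policies.items()}

# Table 3: observed minus expected, cell by cell.
difference = {
    c: (observed[c][0] - expected[c][0], observed[c][1] - expected[c][1])
    for c in policies
}

# Print one row per company: observed, expected, difference.
for c in policies:
    print(c, observed[c], expected[c], difference[c])
```

Within each company the two differences are equal and opposite, which is the cancelling-out that the squaring in the test stat takes care of.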
So let's write out our test stat: the sum of observed minus expected, squared, divided by expected. We get 0.25 squared divided by 22.75, plus 5.25 squared divided by 22.75, plus negative 5.5 squared divided by 45.5, and you get the point. I won't write out the rest; there will be three more terms, one for each non-claims cell. But you must do it. You can't be lazy in the exam and just go dot dot dot; you must actually write them out. I'm just conscious of time. Then you can use your calculator, I don't expect you to do that in your head, and you get 2.43.

We're now going to use this value to calculate our p-value. So how do we calculate our p-value? It is the probability, under the null hypothesis, that our test statistic is greater than 2.43. But in order to do this we need to assign a degrees of freedom, because there's a whole family of chi-squared distributions. The degrees of freedom is 2. Let me tell you now how we get to two.

For degrees of freedom with contingency tables we have the following formula: rows minus one, times columns minus one. How many rows do we have? We have two rows, so two minus one, times three columns minus one, which is equal to one times two, and we get two. So that's how we got our two degrees of freedom.

There is a whole philosophy behind how you get your degrees of freedom and how we arrive at this formula, and it's very intense, but it's very unlikely that they're going to ask you a philosophical question in the actuarial stats exam. At this level you just need to know this formula and apply it. If you are interested,
and it is an interesting topic, I do recommend that you go check it out, or if you want to ask me, feel free, and we can discuss it in the comment section below.

Now that we have our probability, that a chi-squared with two degrees of freedom is greater than 2.43, what is our p-value? We simply go to the tables, and this is where you can actually get a little bit lazy. You can use an approximation, and you can see it is approximately 30%. The reason why we can go with this approximation from the tables is that all we really need to do is show that our p-value is greater than 0.05. It doesn't really matter if it is 31 or 29 percent; as long as we can show that it is comfortably larger than 0.05, we don't have to worry about accuracy.

In fact, this is the final step, where we make our inference. We can say: since our p-value is greater than 0.05, we fail to reject H0. In simple English that is "accept", but remember you can't say accept, because of the whole uncertainty factor and the errors, which I have spoken about in the other videos. It is a little bit confusing the first time you come across it, because we don't associate the word "fail" with success, and yet here failing to reject is the outcome that supports independence. Think of it as a double negative. Because we're failing to reject H0, we can almost make the statement that yes, there does seem to be independence between company and claims. That doesn't mean our assertion is necessarily true; we just don't have enough evidence to say that it is not true. That is one of the lovely quirks of statistics.

But like I say, when we start going into degrees of freedom and the language we use, it does start getting a little bit philosophical, whereas at the end of the day what we're really interested in is calculating this test
stat, which we can then use to calculate the p-value, which makes the examiners happy. Those are the steps that we would take for contingency tables, and once again we are using the same test statistic that we used for the goodness of fit test.

So there we go, we're done. Let me know if you've got any thoughts or any questions, and I'll see you in the exam questions, where we will tackle much trickier ones. I just wanted a nice simple explanation so that you get your head around the whole idea. It's not that bad, it's not that scary, although the exam questions can get a bit tough. I'll see you for those. Keep well everyone, cheers.

For more content, study advice and exam questions, enroll in Statistics by MJ, link in the description below.
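To recap the whole calculation, here is a minimal Python sketch of the test stat, the degrees of freedom and the p-value. It uses only the standard library, relying on the fact that for exactly two degrees of freedom the chi-squared upper tail probability is exp(-x/2); for any other degrees of freedom you would need the tables or a statistics library instead. The function name `chi2_independence_2df` is made up for this illustration.

```python
import math

def chi2_independence_2df(observed):
    """Chi-squared test of independence for a table with 2 degrees of freedom.

    `observed` is a list of rows of counts (here 2 rows by 3 columns).
    Returns (test statistic, degrees of freedom, p-value).
    """
    rows, cols = len(observed), len(observed[0])
    df = (rows - 1) * (cols - 1)
    if df != 2:
        # Assumption of this sketch: the closed-form tail below is only valid for df = 2.
        raise ValueError("this sketch only handles tables with 2 degrees of freedom")

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i in range(rows):
        for j in range(cols):
            # Expected count under independence: row total * column total / grand total.
            e = row_totals[i] * col_totals[j] / grand_total
            stat += (observed[i][j] - e) ** 2 / e

    # For 2 degrees of freedom only, P(chi-squared > x) = exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, df, p_value

# Observed table from the example: claims row, then non-claims row, for A, B, C.
stat, df, p_value = chi2_independence_2df([[23, 28, 40], [77, 72, 160]])
print(round(stat, 2), df, round(p_value, 2))  # 2.43 2 0.3
```

The p-value comes out at roughly 0.30, well above 0.05, which matches the conclusion of failing to reject H0.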