I will now continue by looking at goodness-of-fit test statistics for Poisson regression. So we would like to test whether a particular model that we are assuming really does fit the data, or whether we may want to extend our model and perhaps include more covariates. We will first look at another example, so I will talk that through, and then I will discuss the goodness-of-fit test statistics, namely the Pearson chi-square test and the log likelihood ratio test. So we have the example of recall of stressful events, example three, and we would like to start with something simple again, a simple Poisson model without any covariates, and then extend this to an example with covariates. Participants from a randomized study were asked whether they could recall any stressful events over the last 18 months and, if yes, in which month each stressful event happened. We then wanted to look at the number of stressful events that people were able to recall and at their distribution across these 18 months. In total, 147 stressful events were recorded. The null hypothesis H0 is deliberately simple, a conservative assumption that these events are uniformly distributed over time. That means that under H0 the equi-probable model holds: the probability of occurrence is the same for every month, so a particular event can occur in any given month with probability 1/18, which is about 0.055, or 5.5%. So we would expect just over 5% of all events to happen in each month.
Looking at the actual count data that was recorded, we have first of all the months, ranging from 1 to 18, then the actual number of events that were recorded per month, and then the percentage corresponding to each count. We can already see that some counts are noticeably higher than 5.5% and some are a bit lower. So just looking at the data, we may already suspect that there is some discrepancy between the observed values and what we would expect to see under our equi-probable model. Let's evaluate the Poisson model more formally. We will use the Pearson chi-square test and the deviance, also called the log likelihood ratio test, for Poisson regression. Both are goodness-of-fit test statistics, and both compare two models: the current model, the model we have at hand, which in this particular case is the equi-probable model, and the so-called saturated model, i.e. the larger model that fits the data perfectly and explains all of the variability. That effectively means we are comparing observed and expected frequencies. Looking at the Pearson and the log likelihood ratio test statistics, if H0 is true, that is, if the equi-probable model actually holds, then the events follow a uniform distribution and we would expect 147 × 1/18 ≈ 8.17 events per month, so an expected frequency of just over 8 in every cell. So we have a one-parameter model to estimate, and that single parameter corresponds to just over 8 events per month.
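The per-month counts are not reproduced in this transcript, so as a small sketch the code below assumes the counts from Haberman's classic recall-of-stressful-events data, which sum to the 147 events quoted here; the expected frequencies under the equi-probable model then follow directly:

```python
import numpy as np

# Observed number of recalled events per month (months 1 to 18).
# These counts are ASSUMED from Haberman's (1978) stressful events
# data, which sum to the 147 events quoted in this example.
observed = np.array([15, 11, 14, 17, 5, 11, 10, 4, 8, 10,
                     7, 9, 11, 3, 6, 1, 1, 4])

n = observed.sum()        # total number of events: 147
C = len(observed)         # number of cells (months): 18

# Under H0 (the equi-probable model) each month has probability 1/18,
# so every cell shares the same expected frequency n * (1/18).
expected = np.full(C, n / C)

print(int(n))             # 147
print(round(n / C, 2))    # 8.17 expected events per month
```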
So we can compare the observed counts with the expected counts per month, and we can see that some monthly counts are quite a bit higher than 8 and some are quite a bit lower than what we would expect. The Pearson chi-square test allows us to compare the observed and expected frequencies formally; you may have come across it when testing associations between two categorical variables, and it is effectively the same principle here. The statistic is the sum of the squared standardized residuals, X² = Σ_c (O_c − E_c)² / E_c, where the sum runs over the C cells. To calculate it for our example, we just plug in the numbers for each of the 18 cells, and for this recall-of-stressful-events example we get a chi-square test statistic of 45.4. We now need to compare this with a value from a chi-square distribution: the idea is that if H0 is true, i.e. if the equi-probable model indeed holds, then this test statistic follows a chi-square distribution, so we can compare it with the value from the chi-square tables, for example. For that we need the degrees of freedom, defined as the number of cells minus the number of model parameters. The number of cells C is 18, and for this simple model the number of parameters is just 1, because we only have alpha to estimate. So we have a chi-square test statistic of 45.4 on 17 degrees of freedom, that is 18 minus 1. At the 5% significance level the chi-square table gives a critical value of 27.6, and the test statistic corresponds to a p-value of less than 0.001, which is really rather small, so we would reject H0.
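As a minimal sketch of this calculation (again assuming the Haberman counts, which reproduce the 45.4 quoted above), the Pearson chi-square statistic, the critical value, and the p-value can be computed with numpy and scipy:

```python
import numpy as np
from scipy.stats import chi2

# Observed counts per month, ASSUMED from Haberman's (1978) stressful
# events data; expected counts come from the equi-probable model.
observed = np.array([15, 11, 14, 17, 5, 11, 10, 4, 8, 10,
                     7, 9, 11, 3, 6, 1, 1, 4])
expected = np.full(18, observed.sum() / 18)   # 147/18 per month

# Pearson chi-square: sum of the squared standardized residuals.
X2 = ((observed - expected) ** 2 / expected).sum()

df = 18 - 1                      # C cells minus 1 estimated parameter
critical = chi2.ppf(0.95, df)    # critical value at the 5% level
p_value = chi2.sf(X2, df)        # upper-tail probability

print(round(X2, 1))              # 45.4
print(round(critical, 1))        # 27.6
print(p_value < 0.001)           # True, so reject H0
```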
The conclusion is that there is strong evidence that the equi-probable model does not fit the data. Just looking at the table, we had already compared observed and expected frequencies and seen some discrepancy, but exactly how significant that discrepancy is, the discrepancy between observed and expected values, is something we can test formally, for example with the Pearson chi-square test, and here we concluded that the difference is rather large, so it is not just due to chance, and the data does not follow the equi-probable model. We probably have to do something else to improve this particular model. Likewise, we can use the log likelihood ratio test statistic for Poisson regression to again compare observed and expected frequencies. The formula for the log likelihood ratio test statistic is G² = 2 Σ_c O_c ln(O_c / E_c); you plug in the observed and expected frequencies, and this is again a measure of the fit of the model, a goodness-of-fit test statistic. Similarly to before, if H0 is true, then this log likelihood ratio test statistic follows a chi-square distribution, so again we need the degrees of freedom, which is again the number of cells minus the number of model parameters, so 17 for our example. Plugging in the numbers, the log likelihood ratio test statistic is 50.8 on 17 degrees of freedom, which again gives a p-value of less than 0.001, and you would again reject H0, in the same way as with the Pearson chi-square test. So the conclusion is the same: there is strong evidence that the equi-probable model does not fit the data. We now need to use this information to take things forward and think about fitting a more sophisticated model, perhaps including another covariate to allow for differences between months.
Just a couple of remarks about the Pearson chi-square test and the log likelihood ratio test. They are asymptotically equivalent, so both rely on a large sample, and you would expect them to give you the same or very similar results; if they are not similar, this can simply be an indication that the chi-square approximation does not actually hold. Note also that, for fixed degrees of freedom, as n increases, i.e. for larger samples, the distribution of the Pearson chi-square statistic converges to the chi-square distribution, and it does so more quickly than the log likelihood ratio statistic. Finally, note that the chi-square approximation is usually relatively poor, and not appropriate, if any of the expected cell counts are less than 5.
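The deviance calculation, and the expected-cell-count check mentioned in the remarks above, can be sketched the same way (again assuming the Haberman counts, which reproduce the 50.8 quoted above):

```python
import numpy as np
from scipy.stats import chi2

# Observed counts per month, ASSUMED from Haberman's (1978) data.
observed = np.array([15, 11, 14, 17, 5, 11, 10, 4, 8, 10,
                     7, 9, 11, 3, 6, 1, 1, 4])
expected = np.full(18, observed.sum() / 18)   # equi-probable model

# Log likelihood ratio (deviance) statistic: G2 = 2 * sum O * ln(O / E).
# All observed counts are positive here, so the log is well defined.
G2 = 2 * (observed * np.log(observed / expected)).sum()

df = 18 - 1                  # same degrees of freedom as before
p_value = chi2.sf(G2, df)

print(round(G2, 1))          # 50.8
print(p_value < 0.001)       # True, so again reject H0

# The chi-square approximation is considered reliable here because
# every expected cell count (about 8.17) is above 5.
print(bool(expected.min() > 5))   # True
```

Note that the two statistics, 45.4 and 50.8, are similar but not identical, which is in line with the asymptotic equivalence described above.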