So today I'm going to talk about Poisson regression models for count data. First, I will give a brief review of regression analysis. I will then introduce Poisson regression, starting with a simple model without a covariate, the so-called equiprobable model. I will assess this model with the Pearson chi-square test and the likelihood ratio test statistic, and I will also look at some residual analysis. After that I will introduce the Poisson regression model with a covariate, basically a Poisson time-trend model. You may have come across different types of regression models already: for example, a linear regression model for a continuous dependent variable, or a logistic regression model for a binary outcome variable. There are other types of regression that are also part of the family of generalized linear models, for example the multinomial logit model for a multi-category unordered variable, and the so-called cumulative logit model for multi-category ordered variables, also known as ordinal regression. Here we are going to go a step further and look at an outcome variable that is a count, using Poisson regression. In the literature you may also find the term log-linear model. The data for this session are assumed to consist of, first, a count variable y, for example the number of accidents or the number of suicides in a particular geographical area or time period. Second, we have a categorical variable x with, say, C possible categories, such as days of the week or months. So in this case we observe C counts, one per category: y1, y2, and so on up to yC. 
More generally, in Poisson regression you may have a number of categorical or even continuous explanatory variables in your model. Here we start with something relatively simple, just to introduce the basic principles. Poisson regression is a form of regression analysis used to model count data. In the particular case where all the explanatory variables are categorical, we are modeling a contingency table, that is, the cell counts, and the model describes the expected frequencies. The model also specifies how the count variable relates to the explanatory variables, for example to the levels of a categorical variable. Poisson regression is a generalized linear model, and it uses the logarithm as its canonical link function. We assume that the outcome variable y, the dependent variable we are particularly interested in, has a Poisson distribution, and that the logarithm of its expected value can be modeled by a linear combination of unknown parameters, the beta coefficients or regression coefficients in the model. It is sometimes referred to as a log-linear model, in particular when it is used to model contingency tables. Let's have a look at a brief example: the number of suicides by weekday in France. The first column of the table lists the days of the week, the second column gives the frequencies, that is, the number of events, and the third column gives the percentages, showing how the events are distributed across the days of the week. That is the type of data we would like to model. Let's first look at a very simple case, the equiprobable model. The equiprobable model means that all outcomes are equally probable, that is, equally likely. 
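To make the log link concrete, here is a minimal sketch in Python. The function names and the coefficient values (an intercept of 2.0 and a hypothetical "weekend" dummy with effect 0.5) are illustrative assumptions, not estimates from the lecture's data. It shows that the linear predictor lives on the log scale and that exponentiating it gives a positive expected count, with the exponentiated coefficient acting as a ratio of expected counts.

```python
import math

def linear_predictor(coefs, covariates):
    """eta = beta_0 + sum_k beta_k * x_k, where coefs[0] is the intercept."""
    intercept, *slopes = coefs
    if len(slopes) != len(covariates):
        raise ValueError("need one coefficient per covariate plus an intercept")
    return intercept + sum(b * x for b, x in zip(slopes, covariates))

def expected_count(coefs, covariates):
    """Inverse of the log link: mu = exp(eta), which is always positive."""
    return math.exp(linear_predictor(coefs, covariates))

# Hypothetical coefficients: intercept 2.0, weekend dummy effect 0.5
coefs = [2.0, 0.5]
mu_weekday = expected_count(coefs, [0])  # exp(2.0)
mu_weekend = expected_count(coefs, [1])  # exp(2.5)

# The ratio of expected counts is exp(0.5), whatever the intercept is
print(mu_weekend / mu_weekday)
```

Because log μ is linear in the coefficients, a one-unit change in a covariate multiplies the expected count by exp(β) rather than adding β to it.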
For our example, that means we assume a uniform distribution of the outcome across the days of the week: y does not vary with the day of the week x. The equiprobable model is given by π_j = 1/C for j = 1, ..., C, so the probability that an event falls in any particular category, here any particular day of the week, is the same, 1 over capital C. Given the data, we can then test the assumption of interest, the equiprobable model, as our null hypothesis H0. Looking at our example again, suicides by weekday in France, H0 states that each day is equally likely for a suicide to occur. That means the expected proportion of suicides is 100/7 percent per day, for the 7 days of the week, which is just over 14% per day (about 14.3%). The third column of the table shows the actual observed distribution. It varies a little by day of the week and diverges somewhat from 14% per day, but perhaps the divergence is small enough that we are satisfied with our assumption. To decide that properly you need a formal test, and I will come back to that and explain the formal test in more detail. Let's look at a second example: traffic accidents per weekday. Again, the null hypothesis H0 of the equiprobable model states that each day is equally likely for an accident, so the expected proportion of accidents is again 100/7 percent, just over 14% per day. In this example we see greater variation across the days; in particular, Sundays seem to have a percentage well above 14%. 
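The formal test mentioned here compares observed counts with the expected counts under H0. Below is a minimal sketch of the Pearson chi-square statistic and the likelihood ratio statistic G² for the equiprobable model. The weekday counts are made-up numbers for illustration only, not the French suicide or traffic accident data from the tables, and the function names are hypothetical.

```python
import math

def equiprobable_expected(counts):
    """Expected count per category under H0: n * (1/C) = n / C."""
    n, c = sum(counts), len(counts)
    return [n / c] * c

def pearson_x2(counts, expected):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))

def likelihood_ratio_g2(counts, expected):
    """G^2 = 2 * sum of o * log(o / e); cells with o = 0 contribute 0."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(counts, expected) if o > 0)

# Hypothetical counts for Monday..Sunday (NOT the lecture's data)
counts = [110, 100, 95, 100, 105, 120, 140]
expected = equiprobable_expected(counts)  # 770 / 7 = 110 per day
x2 = pearson_x2(counts, expected)
g2 = likelihood_ratio_g2(counts, expected)
df = len(counts) - 1  # C - 1 degrees of freedom under the equiprobable model
print(x2, g2, df)
```

With these invented counts, X² = 1450/110 ≈ 13.18 on 6 degrees of freedom, which is above the 5% critical value of the chi-square distribution with 6 degrees of freedom (about 12.59), so such data would lead us to reject H0.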
So we may want to test whether the observed distribution differs from the expected distribution, or whether it is still reasonable to assume that they are the same. For hypothesis testing in this case, H0 states that each day is equally likely for an accident to happen. But we can also think of other null hypotheses, for example that each working day is equally likely for an accident, or that the two weekend days, Saturday and Sunday, are equally likely for an accident. You could also bring in additional variables, for example the distance driven on each day of the week, and take those explanatory variables into account as well. Taking this a bit further, we can now express the equiprobable model more formally as a Poisson regression model without a covariate that models the expected frequencies. For the random component we assume a Poisson distribution with parameter μ; that is, the response variable y follows a Poisson distribution with probability function P(Y = y) = e^(−μ) μ^y / y!, where y is the count variable taking the values 0, 1, 2, 3 and so on. So y is a random variable that takes only non-negative integer values, and the Poisson distribution has a single parameter μ, which is both the mean and the variance of the distribution. We assume that our outcome follows this Poisson distribution. In this simple model we aim to model the expected value of y, and it can be shown that this expected value is the parameter μ; hence in our Poisson model we are effectively modeling μ. So let's now define the equiprobable model, which I gave on an earlier slide in a more intuitive notation, in these terms. 
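The claim that the single parameter μ is both the mean and the variance can be checked numerically. This is a minimal sketch, assuming μ = 3.5 as an arbitrary illustrative value; it evaluates the Poisson probability function on the log scale (using `math.lgamma` for log y!) to avoid overflow for large y, and sums the truncated distribution.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) = exp(-mu) * mu**y / y!, computed on the log scale."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    if y < 0:
        return 0.0  # counts cannot be negative
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def truncated_moments(mu, y_max=200):
    """Total probability, mean and variance from summing the pmf over 0..y_max."""
    probs = [poisson_pmf(y, mu) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

total, mean, var = truncated_moments(3.5)
print(total, mean, var)  # all three agree with 1, mu, mu up to rounding
```

Truncating at y = 200 is a convenience; for moderate μ the omitted tail probability is negligible.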
Formalizing this, we write the expected value of y, the parameter μ, in terms of 1 over C, because we are assuming equal probability across weekdays: with n events in total, the expected count on each day is μ = n × 1/C = n/C. Using the link function, log μ is then a single coefficient α, the coefficient that we would like to estimate as part of the model, so that log μ = α for every category. In this particular case α = log(n/C) = log n + log(1/C); on the proportion scale, this is the log of 1 over C.
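Estimating α in this intercept-only model can be illustrated with a short Newton-Raphson sketch. The counts are hypothetical and the function name is an illustrative assumption. Setting the derivative of the Poisson log-likelihood, n − C·exp(α), to zero gives the closed form α̂ = log(n/C), and the iteration below should agree with it.

```python
import math

def fit_intercept_only(counts, tol=1e-12, max_iter=50):
    """Newton-Raphson for the intercept-only Poisson model log(mu_j) = alpha."""
    c, n = len(counts), sum(counts)
    if n <= 0:
        raise ValueError("need at least one event")
    alpha = math.log(max(counts))  # start at or above the solution log(n / C)
    for _ in range(max_iter):
        mu = math.exp(alpha)
        score = n - c * mu        # first derivative of the log-likelihood
        information = c * mu      # minus the second derivative
        step = score / information
        alpha += step
        if abs(step) < tol:
            break
    return alpha

# Hypothetical weekday counts (NOT the lecture's data); n = 770, C = 7
counts = [110, 100, 95, 100, 105, 120, 140]
alpha_hat = fit_intercept_only(counts)
closed_form = math.log(sum(counts) / len(counts))  # log(n / C) = log(110)
print(alpha_hat, closed_form, math.exp(alpha_hat))
```

The fitted expected count exp(α̂) equals n/C for every day, which is exactly the equiprobable model; in practice a GLM routine with a Poisson family and log link would return the same intercept.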