This video introduces exponential models for counts. Why the video is titled exponential models instead of just count data will become clear pretty soon. So what are counts? Counts are typically counts of events: how many times something happens. If you go fishing, how many fish you catch; if you are running a company, how many patents the company files per year. So these are discrete, whole numbers, and they are non-negative, either zero or positive. There is some confusion around how you model count variables in the literature, and this article in Organizational Research Methods is one such example. It is very commonly believed that if you have a count variable, then you have to use some model other than normal regression analysis, such as Poisson regression analysis or negative binomial regression analysis. This article states that applying normal regression analysis would be inappropriate for data where the dependent variable is a count, and that if you use normal regression analysis for a count, the results can be inefficient, inconsistent, and biased. These statements are simply not true in general. There are cases where normal regression analysis shouldn't be used for counts, but as a general statement that it is always wrong to use normal regression analysis for counts, that is simply incorrect. The statement is justified by two references to econometrics books. The problem is that these are big books and there are no page numbers, so we can't really check whether these sources support the claim without reading the full books, which you can't assume your reader will do. So whenever you see statements that cite books as evidence, you really should ask for the page number: where exactly in that book does it say that regression analysis will be biased, inconsistent, and inefficient if your dependent variable is a count? To understand why using counts could or could not be a problem for regression analysis, let's review the regression analysis assumptions. These are from Wooldridge's book, and regression analysis assumes four things for unbiasedness and consistency: a linear model, random sampling, no perfect collinearity, and no endogeneity. If these hold, regression analysis is consistent and unbiased. There is nothing about not being a count variable here. There is in fact nothing about the distribution of the dependent variable at all; it is only about the expected value of the dependent variable given the observed independent variables. We start getting interested in the distribution of the dependent variable when we add the efficiency assumption: when the errors are homoskedastic, so that the variance of the error term doesn't change with the explanatory variables, regression analysis is also efficient. But again, there is no "must not be a count" assumption. So using regression analysis for counts is completely fine; there is no problem with that. To demonstrate, let's have an empirical demonstration. We have dice here: we throw 30 sets of dice and record the number of dice that were thrown and the number of sixes that we got. The number of dice thrown is the independent variable and the number of sixes is the dependent variable. In a simple regression analysis, we draw a regression line with the number of die throws as the explanatory variable and the number of sixes as the dependent variable (a small simulation sketch of this setup follows below).
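To make the dice demonstration concrete, here is a minimal simulation sketch. The video does not name any software, so Python with numpy and statsmodels is my choice of illustration, and all variable names are made up:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# 30 sets of die throws: each set throws between 1 and 60 dice,
# and we count how many sixes come up (each die is a six with p = 1/6).
n_throws = rng.integers(1, 61, size=30)
n_sixes = rng.binomial(n_throws, 1 / 6)

# Ordinary least squares with heteroskedasticity-robust standard errors.
# The slope should be close to 1/6 = 0.167: one additional die throw adds
# one sixth of a six to the expected count.
X = sm.add_constant(n_throws)
ols = sm.OLS(n_sixes, X).fit(cov_type="HC1")
print(ols.params)  # intercept near 0, slope near 0.167
```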
And it looks pretty good to me. The regression line seems to go through the data. There is in fact heteroskedasticity: the variance here is greater than the variance here. But other than that, regression analysis is fine; just use robust standard errors and this is going to be the best way to model these data. So what if we use Poisson regression analysis, which is commonly recommended for counts? If we use the Poisson model here, the coefficient is 0.02, and for normal regression analysis the coefficient is 0.17. Now 0.17 is about one out of six, which is what we expect: for each additional die throw, the expected number of sixes increases by one sixth, because that's the probability of getting a six from a fair die. The 0.02 should be interpreted as a percentage increase: relative to the current level of sixes, the expected number of sixes increases by two percent. It doesn't really make any sense to think about die throws that way. And if we plot the Poisson regression line here, it's actually a curve, because Poisson is an exponential model. We can see that this exponential model doesn't really explain the data at all, because when we have one throw, for example, it predicts that we get four sixes. That's impossible. It also implies that the number of sixes grows exponentially, which it can't; at some point you hit the limit of how many dice you throw. So the mere fact that our dependent variable is a count doesn't mean that we can't use regression analysis and must use Poisson regression analysis or some variant of that technique. The important thing about Poisson regression analysis is that it's an exponential model: we are modeling the expected value of y as an exponential function, and this is the important part. When you have an exponential function, least squares is no longer an ideal technique. If you think that your count depends linearly and additively on your independent variables, then using normal regression analysis is not problematic at all; in fact, it's an ideal technique for that kind of analysis. So in Poisson regression analysis we are using an exponential function, and that's the reason why this video is not called regression analysis for counts but instead exponential models for counts. So what is the Poisson distribution? It's the distribution of a count of independent events that occur at a constant rate. For example, if you have a rate of, let's say, 0.001 deaths per capita in a country, how many people die in a given year? Something like that. And what does the Poisson distribution look like? It's a discrete distribution, so we have discrete numbers. When the expected value is small, say 1, then we typically get 1, 2, or 3, and getting 20 is almost impossible. If we have a large expected value, say 9, then there is a wider range of values that are still plausible, ranging from about 3 to about 20. So what we can see here is that the dispersion increases with the expected value, and that's a feature of the Poisson distribution: when the expected value is 1, the variance is 1; when the expected value is 9, the variance is 9. The variance and the mean of a Poisson distribution are the same. Now coming back to our example of die throws, this distribution is an ideal distribution for modeling die throws.
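Continuing the sketch above (again Python with statsmodels as my illustration, not the software used in the video), we can fit the Poisson model to the same kind of simulated dice data and also check the variance-equals-mean property:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n_throws = rng.integers(1, 61, size=30)
n_sixes = rng.binomial(n_throws, 1 / 6)
X = sm.add_constant(n_throws)

# Poisson regression uses a log link, so the coefficient is on a log
# scale and reads as a relative (percentage-type) change per throw.
poisson = sm.GLM(n_sixes, X, family=sm.families.Poisson()).fit()
print(poisson.params)

# The fitted mean is exp(a + b * throws): an exponential curve, not the
# straight line E[sixes] = throws / 6 that actually generated the data.

# Variance-equals-mean property of the Poisson distribution itself:
print(rng.poisson(1, 100_000).var())  # close to 1
print(rng.poisson(9, 100_000).var())  # close to 9
```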
But we don't need to use Poisson regression analysis just because of that, because it also includes the exponential function, which we don't need, and the least squares estimation technique is good enough regardless of the distribution of the dependent variable. Using a linear function with a Poisson distribution would be unnecessary. Sometimes, if we are interested in the actual predictions from the distribution and how they are distributed, we could use that, but normally the Poisson distribution is only required when we do nonlinear models. When we go to larger expected values, say from 2 to 4 to 8 and so on up to 512, so powers of 2, we can see that the distribution approaches the normal distribution. With large expected values the Poisson distribution approximates the normal distribution, so whether you use the normal distribution or the Poisson in many cases makes no difference, because you can have the standard deviation of the normal distribution as a parameter as well; they are roughly the same. The distribution makes the most difference when your expected value is small: this one is distinctly non-normal, as is that one, but this one not as much. So you apply the Poisson regression model when you think that the exponential model is the right model for your data: you are expecting that the effects are relative to the current level and multiplicative together. And you interpret the results the same way as you would interpret results when your dependent variable is log-transformed. The number that you explain is the expected number of events. One thing that is very common in studies that apply these techniques: if we study, for example, how many people die in each country and we look at European countries, the European countries are of quite different sizes. Finland has 5 or 6 million people and Germany has more than 80 million. We have to take that into account somehow, because we can't really compare the number of deaths in Finland and the number of deaths in Germany unless we somehow standardize the data. Quite often we want to understand the rate at which something happens instead of the count, and to do that we use exposures and offsets. For example, the number of deaths due to cancer per population, or the number of citations per article in a journal. The population here and the articles here are what we call exposures: the total number of units at risk that could have the event occurring to them. One thing that we could try, if we don't think it through, is just to divide the number of deaths by the population, but that's highly problematic for reasons explained in this article. So using the rate itself as the dependent variable is a bad idea, and this is where Poisson regression analysis and the variants of that technique are very useful, because there is a nice trick that we can apply. When we want to model the rate instead of the actual count of deaths or count of citations, we want to estimate this kind of model: we look at the expectation multiplied by the exposure. This gives us the rate of events, and multiplying it by the size of our unit gives the actual count of events. We can apply a little bit of math and move this exposure inside the exponential function by taking a logarithm and adding it to the linear predictor.
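Written out, the manipulation described here is (the notation is my reconstruction of what the video presents verbally):

$$E[y \mid x] = \text{exposure} \times \exp(x'\beta) = \exp\big(x'\beta + \log(\text{exposure})\big)$$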
Taking the logarithm of the exposure and including it in the regression model without a regression coefficient, or with the regression coefficient constrained to be one, is called an offset. We are basically adding a constant number to the fitted value, calculated based on our observation. Using an offset is something that your statistical software will do for you: you just specify one variable as an offset, and the software takes the logarithm of that value, adds it to the regression function, and instead of estimating a regression coefficient for it, constrains the effect to be one. That allows you to interpret the effects as rates instead of total counts, and that's very useful. I have used it myself in one article that I'm working on. Then we have another variant of the Poisson regression model. The Poisson distribution assumes that the variance of the distribution of the dependent variable is the same as the expected value for a given observation, so Poisson makes the variance assumption that the variance equals the mean. We can relax that assumption by saying that the variance equals alpha times the mean, and that gives us negative binomial regression analysis. If alpha is greater than one, we are saying that our data are overdispersed, and that is when negative binomial regression analysis could be used. If alpha is less than one, so the variance of the dependent variable is less than the mean, then our data are underdispersed. Here is an example: these are Poisson-type distributions where the expectation stays the same but alpha increases, so we can see that the expectation stays the same while the variance increases. When the overdispersion here is three, the variance is three times the mean: the mean is about three or so, and the variance is a lot greater. Negative binomial regression analysis is commonly used for these scenarios, but the choice between negative binomial and Poisson regression is not as straightforward as looking at the amount of dispersion. The common way of choosing between these techniques is to fit both and then check which one fits the data better using a likelihood ratio test. But there is more to the decision than just comparing which one fits better. Whether you use Poisson or negative binomial depends on a couple of things, and you have to understand the consequences of that decision. Typically, when you choose one analysis technique over another, you have a specific reason to do so. When we know that the distribution of the dependent variable is Poisson, the reason to use Poisson regression analysis over negative binomial regression analysis is that it is more efficient; negative binomial is consistent but inefficient in this scenario. When there is overdispersion, it goes the other way: Poisson is consistent but inefficient, and negative binomial is consistent and efficient. The standard errors can also be inconsistent for Poisson, depending on which of the available equations your software applies, because there are multiple, and you have to consult your statistical software's manual to know which one is used. Most likely, at least in Stata, you are using an equation that is consistent even under overdispersion.
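As an illustration of how software handles offsets, here is a minimal sketch using Python's statsmodels (one of several packages that support this; the deaths, population, and covariate values are entirely made up):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: cancer deaths and population for a few countries.
deaths = np.array([520, 3100, 8800, 260, 1400])
population = np.array([5.5e6, 38e6, 83e6, 2.1e6, 17e6])
gdp_per_capita = np.array([48, 17, 46, 26, 52])  # made-up covariate

X = sm.add_constant(gdp_per_capita)

# 'exposure' tells statsmodels to take log(population) and add it to the
# linear predictor with the coefficient fixed at one -- i.e., an offset.
model = sm.GLM(deaths, X, family=sm.families.Poisson(),
               exposure=population).fit()

# Coefficients are now interpreted as effects on the death *rate*
# (deaths per capita), not on the raw count.
print(model.summary())
```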
Then we have underdispersion: Poisson regression is consistent but inefficient, and the standard errors may be inconsistent. Negative binomial is inconsistent, so the estimates will be incorrect even in large samples, and that's really bad. Okay, so that covers the three scenarios where the dependent variable is distributed like a Poisson random variable, possibly over- or underdispersed. It's also possible that you have counts that don't look like a Poisson distribution at all, and in that case Poisson regression analysis is consistent but its standard errors are inconsistent, and negative binomial regression is inconsistent. So what do we make of all this? In some scenarios negative binomial is more efficient than Poisson, in others it is less efficient, but generally we want our estimates to be consistent. We may accept a bit of inefficiency, but the trade-off of getting an efficient estimator that could be inconsistent is not worth making. You want something that is robust, and if your sample size is large, efficiency differences don't make much of a difference. So using Poisson regression analysis is a safe choice if you don't know what you're doing. If you have a specific reason to believe that your dependent variable is distributed as negative binomial conditional on the fitted values, then you can use negative binomial, but Poisson is the safer option. This is not current practice, but it is what the methodological literature suggests. We also have some extensions to these models. Zero-inflated models are one. The idea of zero-inflated models is that sometimes you have what we call structural zeros in the sample or in the population. The Stata user manual gives the example of people going fishing in a natural park: the number of fish that they catch is not distributed as Poisson, because some people choose not to fish. People get zeros if they choose not to fish, and they also get zeros if they fish but don't catch anything. The number of fish that you catch, given the time and given the person, is probably very close to Poisson-distributed, depending on the weather, the season, and maybe your fishing gear and skills, except for those people who decide not to fish, who will always get zeros. This is called a zero-inflation scenario, and we handle it by estimating two models at the same time: some kind of S-curve model, typically logistic regression analysis, for the structural zeros, modeling whether a person decides to fish or not, and an exponential count model, such as the Poisson model, for the number of fish. We could also use a linear regression model if we think a linear model suits the data better than an exponential one. These two models together give us the likelihood that we maximize. It's important to report and interpret both models when we report the results, because what determines the structural zeros could be interesting, particularly if it is very different from what determines the actual zeros that come from the count process, or the non-zero values.
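statsmodels, for example, ships a ZeroInflatedPoisson class that estimates both parts jointly; here is a minimal sketch with made-up fishing data (my illustration, not from the video):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(0)
n = 500

# Made-up data: some visitors never fish (structural zeros); the rest
# catch a Poisson-distributed number of fish.
has_gear = rng.integers(0, 2, n)              # predicts who fishes at all
hours = rng.uniform(1, 8, n)                  # predicts how many fish
fishes = rng.random(n) < 0.2 + 0.6 * has_gear
catch = np.where(fishes, rng.poisson(0.3 * hours), 0)

X = sm.add_constant(hours)           # count part
X_infl = sm.add_constant(has_gear)   # inflation (structural-zero) part

zip_fit = ZeroInflatedPoisson(catch, X, exog_infl=X_infl,
                              inflation='logit').fit()

# Report both parts: the 'inflate_' coefficients describe the structural
# zeros, the remaining coefficients describe the count process.
print(zip_fit.summary())
```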
Then we have another, somewhat less commonly used but still sometimes seen, variant of these models called the hurdle model, which is similar to the zero-inflated model. But in this case, instead of separating out the people who don't fish at all, we look at the difference between getting zero and getting one or more. The typical example is going to see a doctor. How many times do you go to see a doctor? The first visit depends on different things than the second, third, and fourth visits: whether you go to see a doctor a second time probably depends a lot on what the doctor tells you, while whether you decide to go to see a doctor in the first place can't depend on what the doctor tells you, because you haven't seen the doctor yet. We model this kind of process using the hurdle model. The idea is again that we have two models: an S-curve model for zero versus non-zero, and then a truncated version of an exponential count model for the actual count. We model first whether the person goes to a doctor, and then, given that the person went to a doctor at least once, how many times they go. Again, you get two sets of results for two models, and you interpret and report both. Let's take a look at an example. This is from the same Blevins paper. They don't interpret the zero-inflation model, but they present Poisson regression, negative binomial regression, zero-inflated Poisson, and zero-inflated negative binomial. We are going to look at the likelihoods and the degrees of freedom, or rather, this is not actually degrees of freedom but the number of parameters, which is incorrectly reported as degrees of freedom. The difference between the negative binomial model and the basic Poisson model is one parameter. These estimate the same model, so the regression coefficients are the same, but the negative binomial regression model additionally estimates the amount of overdispersion relative to the Poisson distribution fitted to the data. When we go from the basic Poisson model to the zero-inflated Poisson model, we can see that the number of parameters is twice that of the Poisson model. The reason is that we actually have two models: one model, the S-curve model, explaining the structural zeros, and then the normal Poisson regression model. The negative binomial results and Poisson results are typically very close to one another, because Poisson is consistent under the negative binomial assumptions, so if the sample size is large, they should be very similar. The zero-inflated model results and the plain negative binomial and Poisson results are typically quite different, and here we can see again the one-parameter difference. How do we choose between negative binomial and Poisson? The convention is that you do a likelihood ratio test: you compare the log likelihood of the Poisson against the log likelihood of the negative binomial. We can see that there is a difference of 400 with one degree of freedom, which is highly statistically significant, so the negative binomial here is a much better fit for the data than the basic Poisson model. The reason why negative binomial almost always fits better than the basic Poisson is that the Poisson model assumes that the independent variables in the model explain the mean perfectly, so that the only variation around the mean is the variation that comes from the Poisson distribution itself, which plays the role of the error term.
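A sketch of that conventional likelihood ratio test, again in Python with statsmodels on made-up overdispersed data (the variable names and data-generating process are mine):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Hypothetical overdispersed count data (negative binomial with mean mu).
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
mu = np.exp(0.5 + 0.3 * x)
y = rng.negative_binomial(2, 2 / (2 + mu))
X = sm.add_constant(x)

poisson_fit = sm.Poisson(y, X).fit(disp=False)
negbin_fit = sm.NegativeBinomial(y, X).fit(disp=False)

# The models differ by one parameter (the overdispersion alpha), so the
# likelihood ratio statistic is compared to a chi-squared with 1 df.
lr = 2 * (negbin_fit.llf - poisson_fit.llf)
p_value = stats.chi2.sf(lr, df=1)
print(lr, p_value)  # large LR, tiny p => negative binomial fits better
```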
In practice our models are imperfect, so there are always some variables that we could have observed but did not that would explain the dependent variable. If those explain the dependent variable to a substantial degree, then that additional variation that could have been explained, but wasn't, goes to the error term. It's the same thing as in a regression analysis: if your R squared is 20%, then 80% of the variation is unexplained; if you add more variables and R squared increases to 50%, the error variance decreases. The same thing happens here. If the negative binomial model fits better than the basic Poisson model, it means that our model is incomplete in explaining the data. That's not a problem as long as the omitted causes are uncorrelated with the explanatory variables and don't lead to an endogeneity problem, but it is something to be aware of. Finally, you quite often see this kind of diagram on what to do, and this again is a convention for choosing between the negative binomial and Poisson models. There is no problem in using the Poisson model for overdispersed data once you adjust the standard errors accordingly (see the sketch below), but the current convention is that you fit both models and then do a likelihood ratio test between the Poisson model and the negative binomial model. If the negative binomial model fits significantly better, that's evidence of overdispersion, and then you go with the negative binomial model. These articles then suggest that you look at whether there are excess zeros, do a Vuong test, and based on that either choose the negative binomial model or the zero-inflated negative binomial model. The problem with this approach is that the Vuong test is problematic for this purpose, and also you should not be making modeling decisions based on empirical results only. Zero inflation is a hypothesis that has a theoretical interpretation: if you use a zero-inflated model, you are hypothesizing that your data are actually the result of two different processes, one of which generates zeros; people who never go fishing will never catch fish. So you usually have theoretical guidance you can use to choose. If there is a plausible mechanism for the structural zeros, then you apply zero inflation; otherwise you apply Poisson regression analysis, because zero inflation as such is not a violation of the Poisson regression assumptions.
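As referenced above, using Poisson for overdispersed data just requires adjusting the standard errors; in statsmodels-style Python that is a single argument (a sketch, reusing the same made-up data-generating process as before):

```python
import numpy as np
import statsmodels.api as sm

# Overdispersed count data (same setup as the previous sketch).
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
mu = np.exp(0.5 + 0.3 * x)
y = rng.negative_binomial(2, 2 / (2 + mu))
X = sm.add_constant(x)

# Poisson point estimates are consistent even under overdispersion; only
# the default standard errors are wrong. A robust ("sandwich") covariance
# fixes them without switching to negative binomial.
robust = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")
print(robust.bse)  # heteroskedasticity-robust standard errors
```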