Generalized linear models are very commonly used in research. Logistic regression models and many other commonly used models belong to this broader family. It is more useful to understand these techniques as variants of one general technique than to try to understand each of them separately. I will now explain the principle of the generalized linear model. The idea is that we have a linear regression model here, so we have a model that is linear, which gives the linear predictor. Then we apply a link function f here, and y is that function of the linear predictor plus some variation that the model doesn't explain. Alternatively, the model can be expressed in terms of expectations: the function and the linear predictor give the expected value of the dependent variable given the observed values of x. Then we also need a probability distribution for the dependent variable given the expected value. In logistic regression analysis we use the Bernoulli distribution, which is ones and zeros only. In normal regression analysis we use the normal distribution. In other models we use other distributions. This family consists of some very commonly used models. For example, from Wikipedia we have the distribution here and the link function here. A distribution and a link function together define a model. This list consists of some very commonly used models. For example, we have the logit link here and the Bernoulli distribution here, and that is logistic regression analysis. Then we have the multinomial distribution, which is a categorical distribution, with a logit link, which gives us multinomial logistic regression analysis, very commonly used for categorical dependent variables. So this one is ones and zeros, a choice between two options; this one is a choice between multiple options. Then we have Poisson regression analysis, which is very commonly used for counts.
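To make this structure concrete, here is a minimal sketch, in Python rather than the lecture's R, of how a logistic-regression GLM turns a linear predictor into an expected value through the inverse of the logit link. The function names and coefficient values are my own, chosen for illustration:

```python
import math

def linear_predictor(x, beta):
    """Linear part of the model: beta0 + beta1*x1 + beta2*x2 + ..."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def inv_logit(eta):
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Expected value of y given x for a logistic regression with made-up coefficients
beta = [-1.0, 0.5]                               # intercept and one slope
p = inv_logit(linear_predictor([2.0], beta))     # E[y | x = 2]
```

The same skeleton describes every GLM in the list: only the distribution assumed for y around this expected value and the (inverse) link function change.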
Stata supports many of these distributions; this is the list from Stata's documentation. It may look like a lot of things to know: there are about 15 different distributions, and some distributions can be used with multiple different links. What's useful to understand is that when you choose a distribution, there is typically one default link, and for example the choice between the logit and probit links, once you have chosen a distribution, has very little consequence for the results. There are special cases where one particular link must be used, but the complementary log-log, for example, is very uncommon in management research. So basically, whenever you use any of these distributions that support the logit link, you are going to be fine just by using logit. The question of which GLM to use therefore boils down to choosing a distribution, and which distribution you use depends on the phenomenon that you're studying. A choice between 15 options is a big choice, but fortunately these distributions fall into some categories, so you first pick a category and then pick within the category. The Bernoulli, beta, and binomial distributions are for data that are ones or zeros, or between zero and one. The Bernoulli is for ones and zeros only. The beta is for values between zero and one, excluding the endpoints; that's for fractions. So if you ask how large a share of your time you spend working, which varies between 0 and 100%, you will be using the beta. A GLM with a beta distribution is one kind of fractional response model. There are also other fractional response models that fall outside the GLM family, but it's useful to know that there is at least one that you can apply, which is the beta. The binomial is simply a sum of Bernoulli distributions.
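The claim that the logit versus probit choice has little consequence can be checked numerically. A commonly quoted rule of thumb is that a logit coefficient corresponds to a probit coefficient scaled down by a factor of roughly 1.6; the sketch below (pure Python, my own function names) compares the two inverse links under that rescaling:

```python
import math

def inv_logit(eta):
    """Inverse logit link: the logistic CDF."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

# After rescaling by ~1.6, the two links give nearly identical probabilities,
# which is why the choice between them rarely changes substantive conclusions.
max_diff = max(abs(inv_logit(eta) - inv_probit(eta / 1.6))
               for eta in [-2.0, -1.0, 0.0, 1.0, 2.0])
```

Over this range the predicted probabilities differ by at most a couple of percentage points, which is well below the precision at which such models are usually interpreted.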
So if you have summary data, say you have groups of 10 individuals and you know how many of those 10 individuals have, for example, committed a crime, then you can apply the binomial distribution, because it is a sum of 10 independent events. Then categorical models use the ordinal or multinomial distributions. The ordinal distribution is for ordered categories. If you have a tall person, a taller person, and the tallest person, you know the order of those people, but you don't know how much taller the tallest person is than the taller person, and you don't know the difference between the taller and the tall person. When you only know the order of the observations, not how far apart they are, you will be using the ordinal distribution. The multinomial is for categorical variables. If, for example, you are studying which country a company expands to, and the choices are Finland, Sweden, and Norway, that's a categorical variable; the categories, the different countries, don't really have an order, so you'll be using multinomial regression for that, or a GLM with the multinomial distribution. The next two, the Poisson and the negative binomial, are for count models. The choice between these two is mostly an empirical matter for most people. There are sometimes theoretical reasons to prefer the Poisson and sometimes the negative binomial, but most researchers just use whichever model fits their data. What these models assume is that you have a count of independent events, for example a count of how many people die in a country in a year, which you want to predict with the number of people in the country. That's where you would use Poisson regression analysis. Negative binomial regression analysis is used when the variation in the data is more than what the Poisson model would predict. That is Poisson with overdispersion, but it's an empirical matter.
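A quick back-of-the-envelope check for overdispersion follows directly from a defining property of the Poisson distribution: its variance equals its mean. A sketch with made-up count data:

```python
# Counts of some hypothetical event (made-up data for illustration)
counts = [0, 1, 1, 2, 0, 3, 1, 0, 2, 8]

n = len(counts)
mean = sum(counts) / n
var = sum((c - mean) ** 2 for c in counts) / (n - 1)   # sample variance

# Poisson implies variance == mean; a sample variance much larger than the
# mean suggests overdispersion, pointing toward the negative binomial.
dispersion_ratio = var / mean
print(f"mean={mean:.2f} variance={var:.2f} ratio={dispersion_ratio:.2f}")
```

This is only a descriptive check on the raw counts; the formal comparison between the two fitted models is the likelihood ratio test described next.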
So typically when you apply these two models, you run both, then compare the results using a likelihood ratio test, and choose the one that is supported by that test. The final models are survival models. Survival analysis concerns scenarios such as what is the life expectancy of a person, or what is the expected mean time between failures in some equipment. You are looking at a time, and after that time an event happens, and you are trying to either predict the time or predict the risk of the event happening, depending on what you are modeling. These are useful for scenarios where time passes and then something occurs. It can occur repeatedly, such as failures of equipment, or it can occur just once, such as the death of a person. When you want to do survival modeling, the choice between these distributions requires some expertise, so you basically have to take a book and study a bit of survival analysis before you start applying any of these. Let's take a look at the GLM results and what kind of results a GLM analysis provides you beyond regression analysis results. The GLM results, done here with R, look a lot like regression results from R. The first thing we have again is the model summary, which tells us that we are using Menard's dataset and that the binomial family was specified, which in R means a Bernoulli distribution with a logit link function, or logistic regression analysis. Then we have the deviance residuals. Deviance residuals are similar to residuals in a regression analysis in that, in large samples, they are normally distributed. So the mean is at zero, the median should be close to zero, and the minimum and maximum should be roughly equal in magnitude. Then we have the coefficients, which are presented exactly the same way as in regression analysis. The problem is that because GLMs are nonlinear models, these are difficult to interpret directly.
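The Poisson versus negative binomial comparison mentioned above can be sketched as a likelihood ratio test with one degree of freedom, since the negative binomial adds a single overdispersion parameter. This is a sketch with hypothetical log likelihoods; note that because the overdispersion parameter sits on the boundary of its range, some texts halve the resulting p-value, a refinement this sketch ignores:

```python
import math

def lr_test_1df(loglik_null, loglik_alt):
    """Likelihood ratio test with one degree of freedom, e.g. Poisson (null)
    versus negative binomial (alternative)."""
    lr = 2.0 * (loglik_alt - loglik_null)      # equals the difference in deviances
    p_value = math.erfc(math.sqrt(lr / 2.0))   # chi-square(1) upper tail probability
    return lr, p_value

# Hypothetical log likelihoods from fitting both models to the same data
lr, p = lr_test_1df(loglik_null=-250.0, loglik_alt=-243.5)
# A small p rejects the Poisson in favor of the negative binomial
```

The `erfc` expression is an exact closed form for the chi-square upper tail with one degree of freedom, which keeps the sketch free of external dependencies.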
Some models, such as logistic regression analysis, have special interpretations like odds ratios; Poisson regression analysis gives you incidence rate ratios, and so on. But an easy way to interpret all of these results is to plot the data, or plot the marginal predictions from the model, and see how the dependent variable behaves as a nonlinear function of the independent variables. Then we have the model quality indices. First, there is a dispersion parameter. If we use the normal distribution, that is, regression analysis, the linear predictor gives us the mean, and we also need to estimate the variance, the dispersion of the distribution. The dispersion parameter here is the variance of the distribution. In the Bernoulli distribution, the variance is completely determined by the mean, so it is not estimated separately, and the same is true in many other models. For some distributions we estimate a dispersion; for others the dispersion is given by the mean. Then we have deviances. The deviance is calculated from the maximized log likelihood: the deviance is minus 2 times the log likelihood, and the null deviance here shows how much better this model is than a model that doesn't predict the dependent variable at all. The AIC is a statistic that allows us to compare models; it is kind of like an adjusted R-square in that it allows us to compare non-nested models. It doesn't really have an interpretation on its own; a smaller value is simply better. Then there are some practical considerations when using GLMs. First of all, you have to build your model, and you have decisions to make. You have to build the linear predictor: which variables go into your model, what are the independent variables? And then you have to choose the response distribution: what is the distribution of your dependent variable, and which link function do you apply?
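The deviance and AIC arithmetic described above is simple enough to spell out. A sketch with made-up log likelihoods, following the lecture's convention of treating the deviance as minus 2 times the log likelihood (R's reported deviance differs only by a constant determined by the saturated model):

```python
def deviance(loglik):
    """Deviance, up to a model-independent constant: -2 * log likelihood."""
    return -2.0 * loglik

def aic(loglik, n_params):
    """Akaike information criterion: deviance plus a penalty of 2 per parameter."""
    return deviance(loglik) + 2.0 * n_params

# Hypothetical fits: the second model adds one parameter and fits slightly better
aic_small = aic(-120.0, n_params=2)   # 244.0
aic_large = aic(-119.8, n_params=3)   # 245.6
# The smaller AIC wins, so here the extra parameter does not pay for itself
```

Because the penalty term depends only on the number of parameters, AIC can rank non-nested models fitted to the same data, which the likelihood ratio test cannot.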
In practice, it's a good idea to always start with normal regression analysis. The reason for that is twofold. First, normal regression analysis is simple to use, and it will tell you something that you didn't know before the analysis. Second, sometimes, even if you could be using a GLM, you would be just as fine with regression analysis. A linear model could be adequate for your data, and if you do regression analysis and its diagnostics and don't identify any problems, then you are going to be okay with just a linear model, which is easier to apply than a GLM, easier to run diagnostics for, and easier to interpret. So if you can apply a linear model, then by all means just apply a linear model. Then we have assumptions. Regression analysis makes assumptions, and so do GLMs. GLMs basically inherit all the regression assumptions except those about the error term, because we are not using a normal distribution but something else. The assumptions are, first, that the sample size is large. GLMs have been proven to work well in large samples, but only certain special cases are proven to work well in small samples, so generally you have to assume a large sample. The fact that a property such as unbiasedness has been proven only in large samples doesn't mean that these models wouldn't work in small samples; it just means they have been proven to work in large samples and not in small samples, and nevertheless experience has shown that they do work. Second, the model is correct: you have the correct distribution and the correct linear predictor. Then you do diagnostics the same way as in regression analysis. The diagnostics are not as well developed, but they basically involve plotting fitted values, different kinds of residuals, and observed values against one another. Then you look for certain kinds of patterns, as you do in regression analysis, and make adjustments.
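To make "plotting residuals against fitted values" concrete, here is a minimal sketch, again in Python rather than the lecture's R and with my own function names, of the deviance residual for a single Bernoulli observation, the quantity R summarizes at the top of a logistic-regression `summary()`:

```python
import math

def bernoulli_deviance_residual(y, mu):
    """Deviance residual for one Bernoulli observation with fitted probability
    mu: the signed square root of that observation's contribution to the deviance."""
    eps = 1e-12
    mu = min(max(mu, eps), 1.0 - eps)   # guard against log(0)
    d = -2.0 * (y * math.log(mu) + (1.0 - y) * math.log(1.0 - mu))
    return math.copysign(math.sqrt(d), y - mu)

# Plotting these against the fitted values is one basic GLM diagnostic
fitted = [0.2, 0.7, 0.9, 0.4]    # hypothetical fitted probabilities
observed = [0, 1, 1, 1]
residuals = [bernoulli_deviance_residual(y, mu)
             for y, mu in zip(observed, fitted)]
```

The sign convention mirrors ordinary residuals: observations above their fitted value get positive residuals, those below get negative ones, and well-fitted observations sit near zero.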
You can also compute influence statistics, such as Cook's distance. Multiple models can be compared using likelihood ratio tests, which compare the deviances between two models. For interpretation, we have pseudo R-square statistics. The idea of a pseudo R-square is that the R-square in regression analysis has multiple different interpretations and properties: it is the squared correlation between the predicted and observed values, it is the share of variance explained, and so on. In GLMs you can have a statistic that has one of these properties, but not all of them. In practice we have different statistics that mimic certain aspects of the R-square; they are called pseudo R-squares. There are probably 10 or 15 different ones that you could apply. Usually it's a good idea, when you do an analysis, to check which pseudo R-squares are available for that analysis in your statistical software, read a bit about what those pseudo R-squares quantify, and then choose the ones that you think are most relevant for your research question. Or you can use one based on what reviewers recommend. I don't generally use pseudo R-squares much myself, because I like to plot my data, and plotting gives me the size of the effect anyway. That brings us to plotting. It's very important that you plot your data, because even if you can interpret odds ratios really well yourself, explaining to your reader what they mean can be difficult. Showing a plot of how the predicted probability changes as a function of, for example, age is a lot easier for your reader. Some link and distribution combinations have special interpretations, like odds ratios in logistic regression analysis, incidence rate ratios in Poisson regression, and so on. But you don't have to know all that if you just know how to plot the data.
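One widely used pseudo R-square, McFadden's, and the odds-ratio transformation can both be computed directly from quantities the software reports. A sketch with made-up log likelihoods and a hypothetical coefficient:

```python
import math

def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo R-square: 1 - logL(model) / logL(null model).
    It is 0 when the model explains nothing beyond the null model."""
    return 1.0 - loglik_model / loglik_null

def odds_ratio(coef):
    """Odds ratio implied by a logistic-regression coefficient."""
    return math.exp(coef)

# Hypothetical fitted and null-model log likelihoods
r2 = mcfadden_r2(loglik_model=-100.0, loglik_null=-140.0)   # about 0.29
or_age = odds_ratio(0.05)   # each extra unit multiplies the odds by exp(0.05)
```

Note that McFadden's statistic mimics only one aspect of the regression R-square, its improvement-over-the-null interpretation; it is not a share of variance explained, which is exactly the limitation the lecture describes.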