Hi, my name is Haini Reisenen and today I will talk to you about ordinal regression. This first part is an introduction to the method: it will tell you when to use it and what kind of variables you need if you want to use it. The outline of the session is as follows. First we will talk about ordinal response variables, which are the type of outcome variable you would use with these models. Then we will talk about what we mean by cumulative odds and probabilities, which are important for these types of models. And finally we will look at the model itself: how does an ordinal regression model work? This model is sometimes also called the proportional odds model or the cumulative logit model.

Many categorical variables that we use in our research have a natural ordering. For instance, we might be interested in how severe someone's symptoms are if they are suffering from a disease, and we might categorise them into low, medium and high. While it would be difficult to say what the distance between these categories is, or whether the distance is the same between any two categories, we can still easily say that there is an order: low is less serious than medium, and medium is less serious than high. We could also examine someone's attitudes towards a social question: we might ask people whether they are in favour of something, indifferent, or against it. Again we have a natural ordering here, going from more agreeable towards less agreeable.

If we use this information about how the categories are ordered, we can conduct a more informative and powerful analysis than if we don't. We could use a multinomial model for these types of variables, but if we use an ordinal model we get a more informative and powerful analysis. An ordinal model essentially wants to figure out the cumulative probability of being in different combinations of the categories of the outcome. And here's some notation.
So PK is the probability of being in response category K; I sometimes also call it a response probability. CPK means the cumulative probability of being in category K or lower. So CP2, for instance, would be the cumulative probability of being in category one or two combined. And one minus CPK is the probability of being above category K, which is basically the inverse of the situation I just described.

If we know all the response probabilities of a certain variable, then we can calculate the different cumulative probabilities for that variable. CP2, for instance, would be P1 plus P2. P1, the probability of being in the lowest category, is the same as CP1, the cumulative probability of being in the lowest category; for the other categories we would need to combine the category in question and everything below it. And for the highest response category, so if we had three categories, P3 would be 1 minus CP(K minus 1), that is, 1 minus CP2.

Here is an example of how this works with some numbers. Let's say we have an outcome variable with three categories, and we have some probabilities of each of our respondents being in one of these categories. The probability of being in the lowest category, P1, is 0.5. The probability of being in the middle category is 0.3. And the probability of being in the highest category is 0.2. Now that we know the response probabilities, we can figure out the different cumulative probabilities. The probability of being in the lowest category, CP1, is equal to the probability of being in that category, so that is 0.5; we don't actually need to calculate anything here. The probability of being in category 1 or 2, in other words CP2, is the probability of being in category 1 plus the probability of being in category 2, so that is 0.5 plus 0.3 equals 0.8.
The probability of being in category 3 or lower, CP3, is 1, because 3 is the highest category in our outcome variable. If you don't believe me, you can calculate 0.5 plus 0.3 plus 0.2 and you'll get 1. But we don't even have to do that, because we know that everyone has to be in some category, so the probability of being in the highest category or lower is always 1, or 100%.

Like with other probabilities, we can transform these probabilities into odds and log odds. If we take the cumulative probability of being in category K or below, CPK, and divide it by one minus that probability, we get cumulative odds. And if we take the natural logarithm of these odds, then we get cumulative log odds, just like in other logistic regression models.

We can then use this cumulative logit transformation in our ordinal model. On the left hand side, we have the logarithm of the cumulative odds. And on the right hand side, we have something that looks quite similar, even if not identical, to other regression models that you've seen before: we have an intercept and we have a slope, so here we only have one explanatory variable in our model. Like in a multinomial model, in an ordinal model we are estimating K minus 1 equations simultaneously. So if we have three outcome categories, then we will estimate two different equations. The difference from multinomial models, however, is that while each equation has a different intercept, alpha K, the slope in every equation is exactly the same; in a multinomial model, the slope would be different in every equation. The intercepts that we get are always ordered in size, which is also different from a multinomial model: the intercept in the first equation should always be lower than the intercept in the second equation, and so on.
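The bookkeeping above can be sketched in a few lines of Python, using the example probabilities 0.5, 0.3 and 0.2 from the text (the variable names here are my own):

```python
import math
from itertools import accumulate

# Response probabilities P1, P2, P3 for a three-category outcome
p = [0.5, 0.3, 0.2]

# Cumulative probabilities: CP_k = P1 + ... + Pk
cp = list(accumulate(p))                         # [0.5, 0.8, 1.0]

# Cumulative odds and cumulative log odds for k = 1 .. K-1
# (CP_K is always 1, so the odds for the top category are undefined)
cum_odds = [c / (1 - c) for c in cp[:-1]]        # [1.0, 4.0]
cum_log_odds = [math.log(o) for o in cum_odds]   # [0.0, log(4) = 1.386...]
```

Note that the last cumulative probability is always 1, which is exactly why an ordinal model with K categories only needs K minus 1 equations.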
You might have noticed that in the regression equation, on the right hand side, there is something a bit strange going on before the beta: a negative sign before our beta 1. That is there to make interpretation of these models easier, and it is what most statistical software does, Stata and SPSS for instance. It means that once we plug this negative sign into our equation, a positive slope implies that higher values of the explanatory variable are associated with higher odds of being in a high response category rather than a low response category. If we didn't have the negative sign, it would be the other way around, which would be quite unintuitive. The same odds ratio applies at all of the thresholds, so in all equations for the different intercepts, because we have made something that is called the proportional odds assumption: we assume that it actually makes sense to have the same slope in every equation. There are ways to test whether this is actually the case, but we will talk about that in later sessions.

Now we will look at a simple example of an ordinal model. We have some data about undergraduates' plans to apply for postgraduate study. A survey asked undergraduates how likely they were to apply for postgraduate study, and they had three options to choose from: unlikely, somewhat likely, and very likely. We also collected data on their parents' education, and we want to know whether parents' education is associated with the students' own likelihood of applying for postgraduate study. Parents' education is a binary variable, which takes value one if at least one of the parents is a graduate themselves, and value zero otherwise. In the table you can see how the outcome variable, the likelihood of applying for postgraduate study, is distributed.
So we have 55% of undergraduates in the unlikely category, 35% in the somewhat likely category, and 10% in the very likely category, and in total we have 400 respondents. If you use Stata, here is how your ordinal regression results will look. These results are on the log odds scale for now. In the first row with numbers you see your coefficient: the 1.127 here corresponds to the effect on the likelihood of applying for postgraduate study of having at least one graduate parent. The next two rows tell you the intercepts for the two equations we have run in this model, because we have three categories in the response. The first of those two rows says 0.377, and that is the intercept for our first equation; the second says 2.452, which is the intercept for our second equation.

Like with any logistic model, you can interpret these results using odds, and like with any logistic regression model, you do that by exponentiating the log odds value. If you exponentiate 1.127, which was our coefficient for parent education, you get 3.09. And this 3.09, remember, applies to both of the equations we modelled here. So we could say that for students with educated parents, the odds of being in the very likely category versus the combined somewhat likely and unlikely categories are three times greater than for those whose parents are not educated. We could also say that the odds of being in the combined very likely and somewhat likely categories versus the unlikely category are three times higher for those with educated parents compared to those whose parents are not educated. To sum up, that means that the odds of being in a higher rather than a lower category of the outcome are higher for students with educated parents.

Like with other logistic regression models, you can also calculate predicted probabilities, if you want to figure out the probability of being in one of the exact response categories out of the three possible categories.
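As a quick check of that arithmetic, here is a small sketch using the coefficients quoted above (the variable names are mine). It also shows why the same odds ratio applies at both thresholds: under the proportional odds assumption, moving from non-educated to educated parents shifts the cumulative logit by exactly minus beta in every equation.

```python
import math

beta = 1.127              # coefficient for parent education (log odds scale)
alphas = [0.377, 2.452]   # intercepts of the two cumulative equations

# Exponentiating the coefficient gives the odds ratio
odds_ratio = math.exp(beta)
print(round(odds_ratio, 2))   # 3.09

# With the cumulative logit written as alpha_k - beta * x, the gap between
# the logits for x = 1 and x = 0 is -beta at *every* threshold k,
# so a single odds ratio describes both equations.
gaps = [(a - beta * 1) - (a - beta * 0) for a in alphas]
```

Both entries of `gaps` come out as minus 1.127, which is the proportional odds constraint in action.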
And you can see the formula for that calculation here; it looks very similar to other logistic models. The main difference is that on the left hand side we are calculating cumulative probabilities rather than response probabilities, and on the right hand side we must remember to change the sign of our slope, because of the negative sign before it.

So if we wanted to calculate the probability of being in category one for those whose parents are not educated, we would take the first equation, which had the lower intercept, 0.377. We would take our slope for parent education, 1.127, change its sign because of the negative sign in the equation, and multiply it by zero, because we are now looking at the reference category of this explanatory variable, those whose parents were not educated. (If we were looking at those whose parents were educated, we would multiply it by one.) We exponentiate that, and divide it by one plus the same exponentiated value. If we solve that, we get 0.59, and that 0.59 is both the probability of being in exactly category one and the cumulative probability of being in category one. For category two, we do the same thing, except that we change the intercept to the higher intercept from the second equation, and when we solve that equation, we get 0.92. So now we know that the cumulative probability of being in category one or two is 0.92. But we don't yet know the probability of being in exactly category two. However, we can fairly easily find that out by taking the cumulative probability two that we just calculated and subtracting the cumulative probability one, which we also just calculated: 0.92 minus 0.59 equals 0.33.
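The walk-through above can be sketched as a small Python function. This is a minimal reconstruction using the transcript's numbers; the function names are my own, and the second group's figures follow by multiplying the slope by one instead of zero:

```python
import math

def sigmoid(z):
    """Inverse logit: turn a log odds value into a probability."""
    return 1 / (1 + math.exp(-z))

alphas = [0.377, 2.452]   # intercepts of the two cumulative equations
beta = 1.127              # slope for parent education

def response_probs(x):
    """Cumulative probabilities CP1, CP2 (CP3 = 1), then differences P1, P2, P3."""
    cp = [sigmoid(a - beta * x) for a in alphas] + [1.0]
    return [cp[0]] + [cp[k] - cp[k - 1] for k in range(1, len(cp))]

# Parents not educated (x = 0): roughly [0.59, 0.33, 0.08]
# Parents educated     (x = 1): roughly [0.32, 0.47, 0.21]
not_educated = [round(p, 2) for p in response_probs(0)]
educated = [round(p, 2) for p in response_probs(1)]
```

The three response probabilities in each group sum to one, which is a handy sanity check when you do these calculations by hand.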
So now we know that the probability of being in the second category, for those whose parents were not educated, is 0.33. We can also calculate the probability of being in the highest category using the information we already have: the probability of being in category three is one minus cumulative probability two, and one minus 0.92 is 0.08.

Here is the table of response probabilities for the two groups: those whose parents were not educated, the ones we just calculated, and those whose parents were educated. I haven't shown how to calculate the latter, but the only difference is that you multiply the slope by one rather than by zero. And you can see that parents' education does make a difference. Those whose parents were educated had a higher probability of being very likely to apply for postgraduate study, 21% versus 8% among those whose parents were not educated, for instance. And if you look at the lowest category, unlikely to apply for postgraduate study, the probability of being in that category was 32% for those whose parents were educated, compared to 59% among those whose parents were not educated.

When you use ordinal models, calculating response probabilities is usually quite useful, because cumulative odds can be a bit difficult to understand if you're not very familiar with these models. So depending on your audience, you might want to consider using probabilities rather than odds when making interpretations from these models. Thank you.