In this video I will introduce you to the basics of nominal data models. Nominal is one of the four levels of measurement that many introductory research methods books explain. The idea of a nominal model is that you have a categorical variable and there is no order to the categories. For example, Finland, Sweden and Norway coded as one, two and three would be nominal because there is no order. You can't say that Finland is more than Sweden or that Norway is more than Sweden. Regression analysis normally assumes at least an interval scale for the dependent variable, and now we look at the nominal scale.
So how do we deal with variables that are choices between different alternatives? We use the logistic regression or probit regression framework for this kind of analysis. I'm going to take the logistic regression perspective. The probit regression perspective is roughly similar; you just do the math slightly differently.
So let's take a look at a choice between three categories. Let's say it's Finland, Sweden and Norway, and we express the probabilities of these three categories as odds: for example, the odds of choosing Finland versus Sweden and the odds of choosing Norway versus Sweden. If we know two of these odds, we can always calculate the third, the odds for the remaining comparison. Here the odds of Finland versus Norway are the odds of Finland versus Sweden divided by the odds of Norway versus Sweden. So we only need to estimate two odds and then we know the third one as well. That allows us to treat one category as a reference category. So we estimate a model that explains the choice between Finland and Sweden and the choice between Finland and Norway, for example. Then, when we know the odds, we can calculate the probabilities. For example, the probability of category one is the probability of category one divided by the sum of the probabilities of all the categories.
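The arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names and the odds values are made up for the example, and Sweden is assumed to be the reference category.

```python
def remaining_odds(odds_fin_vs_swe, odds_nor_vs_swe):
    """Odds of Finland versus Norway implied by two odds that share Sweden as reference."""
    return odds_fin_vs_swe / odds_nor_vs_swe


def probabilities_from_odds(odds_vs_reference, reference):
    """Turn odds against a reference category into probabilities for all categories.

    P(j) = odds_j / (1 + sum of all odds); the reference has odds 1 against itself.
    """
    total = 1.0 + sum(odds_vs_reference.values())
    probs = {name: odds / total for name, odds in odds_vs_reference.items()}
    probs[reference] = 1.0 / total
    return probs


# Hypothetical odds, chosen only for illustration.
odds = {"Finland": 2.0, "Norway": 0.5}
probs = probabilities_from_odds(odds, reference="Sweden")

print(remaining_odds(odds["Finland"], odds["Norway"]))  # Finland versus Norway
print(probs)  # the three probabilities sum to one
```

The division by `1 + sum of all odds` is the same step as dividing the probability of one category by the sum of all the probabilities, after every term has been divided by the probability of the reference category.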
We can express that in terms of odds, and that gives us the probability of category one as a function of the odds. So when we estimate two models that explain two different odds, we can calculate the probabilities of all three outcomes, Finland, Sweden and Norway, and that allows us to calculate the likelihood and estimate the model.
This idea of explaining two or more different odds using a set of regression equations is called multinomial regression analysis. You can use logistic regression, probit regression, or some other kind of S-curve model. One category is set as the reference, and then we estimate m-1 logistic regression models or other regression models, where m is the number of categories.
There is an important assumption: independence of irrelevant alternatives. That means that if we ask a person where he or she would like to live and we give the options Finland, Sweden and Norway, then the relative odds between those categories shouldn't change if we introduce, let's say, Denmark as a fourth option. So when we add other alternatives they are not relevant; they don't make a difference for these comparisons. We are comparing the odds of choosing Finland against Sweden, and those odds shouldn't depend on whether Denmark is an option. This is something that you should justify based on theory when you use this kind of model.
Let's take a look at an empirical example of what this kind of model looks like. The actual estimation is pretty simple: you estimate these regression coefficients as one set of models, all at the same time. You calculate the likelihood the same way as in a logistic regression model; now you just have more than one probability. Here's an empirical example: a paper about using multinomial logistic regression analysis in Organizational Research Methods.
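To make the likelihood and the independence of irrelevant alternatives more concrete, here is a minimal sketch in plain Python. The coefficients, the single predictor `x` and the three observations are hypothetical, and the function names are invented for this illustration. It shows that in a multinomial logit the odds of Finland versus Sweden come out the same whether or not Denmark is in the choice set, which is the assumption built into the model.

```python
import math


def mnl_probabilities(x, coefs, reference):
    """Multinomial logit probabilities for one observation.

    coefs maps each non-reference category to (intercept, slope).
    The reference category has its linear predictor fixed at 0, so its odds are 1.
    """
    odds = {cat: math.exp(a + b * x) for cat, (a, b) in coefs.items()}
    total = 1.0 + sum(odds.values())
    probs = {cat: o / total for cat, o in odds.items()}
    probs[reference] = 1.0 / total
    return probs


def log_likelihood(data, coefs, reference):
    """Sum of the log probabilities of the categories that were actually chosen."""
    return sum(math.log(mnl_probabilities(x, coefs, reference)[choice])
               for x, choice in data)


# Hypothetical coefficients and data, for illustration only.
coefs3 = {"Finland": (0.2, 0.5), "Norway": (-0.1, 0.3)}  # Sweden is the reference
coefs4 = dict(coefs3, Denmark=(0.4, -0.2))               # Denmark added as a fourth option
data = [(1.0, "Finland"), (0.0, "Sweden"), (2.0, "Norway")]

p3 = mnl_probabilities(1.0, coefs3, "Sweden")
p4 = mnl_probabilities(1.0, coefs4, "Sweden")

# The odds of Finland versus Sweden are identical with and without Denmark.
print(p3["Finland"] / p3["Sweden"], p4["Finland"] / p4["Sweden"])
print(log_likelihood(data, coefs3, "Sweden"))
```

Estimation would search for the coefficients that maximize `log_likelihood`; that optimization step is left out here.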
It's a pretty good explanation of what these models are for and how they should be presented. The question that this paper uses as a demonstration is how companies choose their internationalization mode. You can expand internationally in multiple different ways, and they have three. One is exports. Another is a joint venture, which means that you start a company with somebody else who is already in the market. And then there is a wholly owned subsidiary: you start a new subsidiary in the foreign market that you own yourself. Which of these modes the company chooses is their research question.
They have two models: model A here has three equations and model B has three equations. So they have three options for the categorical variable and they have three equations. The reason for three equations is that it's just easier to present the results if you have all three possible comparisons. The third equation is redundant. It's not needed for estimating the model; you can calculate the third set of odds afterwards if you want to. The reference category here is the wholly owned subsidiary. So they're estimating a model for the odds of export versus wholly owned subsidiary and of joint venture versus wholly owned subsidiary. They're estimating two logistic regression equations, and these coefficients then give the probabilities of the different alternatives.
When we look at the model indices, they're just repeating: this is one model and this is another model, and they repeat the indices of the model for every single equation in that model. You don't need to do that; they just do it for some reason. The indices are a pseudo R-squared, the AIC, a likelihood ratio test, that is, a model comparison against the null model, then some kind of pseudo R-squared, a classification rate, and then the likelihood ratio test between these two models.
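As a rough sketch of how such indices are computed from log-likelihoods, consider the following Python snippet. All numbers are hypothetical and are not taken from the paper. McFadden's version of the pseudo R-squared is an assumption, since the paper's exact index is not named here. The test also assumes that model B adds one variable, which means two extra coefficients (one per estimated equation) and therefore two degrees of freedom.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: smaller is better."""
    return 2 * n_params - 2 * log_lik


def mcfadden_r2(log_lik, log_lik_null):
    """McFadden's pseudo R-squared relative to the intercept-only (null) model."""
    return 1.0 - log_lik / log_lik_null


def lr_test_df2(log_lik_restricted, log_lik_full):
    """Likelihood ratio test for two added parameters.

    For a chi-square distribution with 2 degrees of freedom the
    upper tail probability is exactly exp(-x / 2).
    """
    stat = 2.0 * (log_lik_full - log_lik_restricted)
    return stat, math.exp(-stat / 2.0)


# Hypothetical log-likelihoods and parameter counts, for illustration only.
ll_null, ll_a, ll_b = -300.0, -250.0, -240.0
k_a, k_b = 8, 10  # model B adds one variable to each of the two equations

stat, p_value = lr_test_df2(ll_a, ll_b)
print("AIC A, B:", aic(ll_a, k_a), aic(ll_b, k_b))
print("pseudo R2 A, B:", mcfadden_r2(ll_a, ll_null), mcfadden_r2(ll_b, ll_null))
print("LR test B vs A:", stat, p_value)
```

For a different number of added parameters the p-value needs the general chi-square distribution, which a statistics library provides.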
So does model B explain the data better than model A, which doesn't have the firm size variable, their variable of interest? The logic of using these models is the same as with any other models. You have a controls-only model, then a model with the interesting variable and the controls, and then you compare them using nested model comparisons.
So how do you then interpret these results? The problem is that these regression coefficients can't really be interpreted directly, because a positive coefficient here, for example, can, when you consider all the effects together, actually correspond to a negative change in the probability of getting that particular category of the categorical variable. The way to interpret these models is again by plotting, and this paper shows a really nice way of plotting the data. You choose values for a set of variables that you hold constant and then you vary one variable. For example, here they vary firm size, which is their interesting variable. They have logged it. I would have preferred the raw metric, but you can do it that way as well. Then you look at how the predicted probability for each category changes when we change the value of firm size, holding everything else constant, and then you interpret.
The interpretation of these plots would be that smaller companies tend to go for exports. Then as company size increases, the chances of a joint venture increase as well, but not by much. It's much more common to go for a subsidiary. So it's basically a choice between exports and a subsidiary, but a small fraction of companies, mostly medium-sized ones, go for a joint venture as well. So that's how you interpret: what do these graphs mean? You don't interpret the actual coefficients, because that's very difficult to do.
This paper also provides another graphical interpretation, which is the marginal effect. So that was the marginal predictions, and this is the marginal effect.
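The prediction plot described above comes from a simple calculation, sketched here in Python. The coefficients are hypothetical, picked only so that the curves roughly resemble the pattern described; they are not the paper's estimates. The single `control` variable stands in for whatever variables are held constant.

```python
import math

# Hypothetical coefficients: (intercept, log firm size, control variable).
# The wholly owned subsidiary is the reference category.
COEFS = {
    "export": (6.0, -1.5, 0.3),
    "joint venture": (1.0, -0.5, 0.1),
}


def predicted_probabilities(log_size, control):
    """Predicted probability of each entry mode for given predictor values."""
    odds = {mode: math.exp(a + b * log_size + c * control)
            for mode, (a, b, c) in COEFS.items()}
    total = 1.0 + sum(odds.values())
    probs = {mode: o / total for mode, o in odds.items()}
    probs["subsidiary"] = 1.0 / total
    return probs


control_value = 0.0  # held constant while firm size varies
curve = [(size, predicted_probabilities(size, control_value)) for size in range(0, 11)]

for size, probs in curve:
    print(size, {mode: round(p, 3) for mode, p in probs.items()})
```

Plotting the three probabilities in `curve` against log firm size gives the kind of prediction plot discussed here: exports dominate among small firms, subsidiaries among large firms, and joint ventures have a small hump in the middle.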
It tells us in which direction the probability is moving when company size increases. We can see that the probability increases: the effect is positive. It increases most rapidly when you are a medium-sized company. As you become larger and larger, the probability still increases, but the increase is not as steep, the reason being that the prediction curve is already very close to one. I think the prediction plot is a lot easier way to interpret the results, but some prefer these marginal effect lines that tell in which direction the predictions are moving.
The paper also makes a strong case that interpreting these models using the actual coefficients is a bad idea, and they have a list of recommendations here. They make a strong statement that the graphical interpretation is better because you are less likely to make mistakes. Also, modern statistical software, such as Stata and R, will make these plots for you, so you don't have to do them manually. So the chance of making mistakes is not as great as it would be when you interpret the regression coefficients directly.
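For readers who want to see where a marginal effect curve comes from, here is a minimal Python sketch. It uses the standard multinomial logit result that the derivative of the probability of category j with respect to a predictor is P_j times (b_j minus the probability-weighted average of all slopes). The intercepts and slopes are hypothetical, not the paper's estimates.

```python
import math

# Hypothetical intercepts and slopes on log firm size.
# The wholly owned subsidiary is the reference, so its values are 0.
INTERCEPTS = {"export": 6.0, "joint venture": 1.0, "subsidiary": 0.0}
SLOPES = {"export": -1.5, "joint venture": -0.5, "subsidiary": 0.0}


def probabilities(log_size):
    """Predicted probability of each entry mode at a given log firm size."""
    scores = {m: math.exp(INTERCEPTS[m] + SLOPES[m] * log_size) for m in SLOPES}
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}


def marginal_effects(log_size):
    """dP_j/dx = P_j * (b_j - sum over k of P_k * b_k)."""
    p = probabilities(log_size)
    average_slope = sum(p[m] * SLOPES[m] for m in SLOPES)
    return {m: p[m] * (SLOPES[m] - average_slope) for m in SLOPES}


effects_small = marginal_effects(2.0)
print(effects_small)
# The joint venture slope is negative, yet at this firm size its
# probability is increasing: the sign of a coefficient alone does
# not give the direction of the change in probability.
print(SLOPES["joint venture"] < 0, effects_small["joint venture"] > 0)
```

The marginal effects across categories always sum to zero, because the probabilities must keep summing to one. This is also why a coefficient cannot be read in isolation.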