One really common task in analyzing data is classifying cases into one category or another, based on a number of other variables you might have. So for instance, your computer is trying to decide whether a particular email is spam or not, or you're trying to decide whether a particular person is likely to buy your product or not. Those are dichotomous classifications. And a common method for analyzing or predicting dichotomous outcomes is to use what's called binomial, which means two names, logistic regression. It's a relative of linear regression, but it's adapted for placing cases into one group or another, based on the probabilities that are predicted from your other variables. Now, it's really easiest to simply show how it works. So I'm going to use the state data I have. I've got a number of variables about personality characteristics on a statewide basis, and search terms. But the dichotomous variable that I have in here is whether the state's current governor is Republican or Democrat. Now, let's start with just a tiny bit of exploration, so we know what we're dealing with. I'm going to take governor, put it over here, and we're going to get a frequency table. And we will also get a bar plot. And so at this exact moment, of the lower 48 states in the United States, about two thirds have Republican governors. And so we're going to see if we can use some of the other data we have in this data set to classify states, or predict which ones have Republican governors and which ones have Democratic governors. And so the way we want to do this, I'll just close that, is to come to Regression and come down here to Logistic Regression, 2 Outcomes, or Binomial. Again, binomial means two names, or two categories. I'll click on that. The first thing we need to do is put in our dependent variable. That's the outcome variable, the thing we're trying to predict. And that's going to be governor.
And I don't even have to tell it what this means or what that means, because I actually have it written as words in the data set. And the nice thing is jamovi will be smart enough to tell that those are categories. And then we get to pick some covariates. Now, I could pick a lot here; let's pick just the social media ones, just for fun. So I'm going to come down here to Instagram, Facebook, and retweet. By the way, the reason it says retweet is because Google Correlate wouldn't let me search for Twitter; I don't know why. But since retweet is exclusively a Twitter word, it seemed like a good substitute. So I'm going to put those all into covariates. And those are the three variables that I'm going to use together to try to predict which states have Democratic governors and which states have Republican governors. And you can see right here, we've got a model that's actually working pretty well. We've got three variables. We have the intercept, which is not zero. Instagram is not statistically significant, Facebook is, and retweet, no, but it's close. But there's a lot of other information we can get through the options that we have in jamovi. So let's take a quick look. Let's look at Model Builder. This is how we want to enter things, and we can put stuff in blocks if we want to. I don't feel the need to do that in this case, but I showed how that works in the video on linear regression. Let's look at Reference Levels. One of the categories needs to be taken as the baseline, and we're trying to predict a change to the other category. Because there are more Republican governors, let's just have Democrat as the baseline, and we'll go up from there. Assumption Checks. Collinearity is an issue, especially because I'm pretty sure that these three social media terms, as Google search terms, are related. And from that, we get both the VIF, the variance inflation factor, and the tolerance.
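As a rough illustration of how the VIF and the tolerance relate to each other, here is a minimal Python sketch. It assumes the standard definitions: the tolerance for a predictor is one minus the R-squared you get from regressing that predictor on the other predictors, and the VIF is one over the tolerance. The function name and the R-squared values are hypothetical, made up for this example; they are not the numbers from the video.

```python
def tolerance_and_vif(r_squared):
    """Return (tolerance, VIF) for one predictor.

    r_squared is the R-squared from regressing that predictor
    on all of the other predictors in the model.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be at least 0 and below 1")
    tolerance = 1.0 - r_squared   # share of variance NOT explained by the others
    vif = 1.0 / tolerance         # variance inflation factor
    return tolerance, vif


# Hypothetical R-squared values, invented for illustration only.
hypothetical_r_squared = {"instagram": 0.50, "facebook": 0.20, "retweet": 0.60}

for name, r2 in hypothetical_r_squared.items():
    tol, vif = tolerance_and_vif(r2)
    print(f"{name}: tolerance = {tol:.2f}, VIF = {vif:.2f}")
```

The more a predictor overlaps with the others, the lower its tolerance and the higher its VIF.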
And there are indications here that we've got some collinearity, but, you know, nothing awful. We have a few choices on how we assess fit. I'm going to leave this with the defaults, where it uses deviance and the AIC, which is the Akaike information criterion. Then we go to the model coefficients. And a common choice when you're dealing with binomial logistic regression is the odds ratio. And it's also nice to get a confidence interval. And that's going to add a few columns onto this table right here. You see, we've got the odds ratio right there. Now, in so many other statistics, zero is the base value and things either go positive or negative. But the nothing's-happening value for an odds ratio is one; that means a one-to-one ratio. It can go below one, but it can't get all the way down to zero, and it can go up from there. And so you see, for instance, that the intercept is reliably above zero, and for its odds ratio, both of these numbers in the confidence interval are above one. For this one, one is below one and one is above. For these, both of them are above one, so they're on the same side. And so these give us an idea of the variables that predict the odds of a particular state having a Republican governor, based on these three social media variables that we have from Google Correlate. If we come down a little further, we can go to Estimated Marginal Means. And so for instance, Facebook was significant within the context of these three predictor variables. Let's take that and stick it into here for marginal means, and it actually gets a nice curved chart that goes with it, if I come down here for a moment. What this shows us is the probability of a state having a Republican governor, going from zero to one, that's like 0% to 100%, based on the z-score that state has on searches for Facebook on Google. And you can see here that when states search less than other states for Facebook, they are less likely to have a Republican governor.
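To make the odds ratio and that curved probability chart a little more concrete, here is a small Python sketch. It assumes the usual logistic regression relationships: the odds ratio is the exponential of a coefficient, the confidence limits are the exponentials of the coefficient plus or minus about 1.96 standard errors, and the predicted probability comes from passing the log-odds through the logistic function. The intercept, slope, and standard error below are hypothetical values chosen for illustration, not the estimates shown in jamovi, and the curve uses a single predictor only.

```python
import math


def odds_ratio(coef):
    """Odds ratio for a logistic regression coefficient."""
    return math.exp(coef)


def odds_ratio_ci(coef, std_error, z=1.96):
    """Approximate 95% confidence limits for the odds ratio."""
    return math.exp(coef - z * std_error), math.exp(coef + z * std_error)


def predicted_probability(intercept, slope, z_score):
    """Probability of the non-baseline category for one predictor."""
    log_odds = intercept + slope * z_score
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical estimates, invented for illustration only.
HYP_INTERCEPT = 0.8
HYP_SLOPE = 1.2
HYP_STD_ERROR = 0.4

low, high = odds_ratio_ci(HYP_SLOPE, HYP_STD_ERROR)
print(f"odds ratio = {odds_ratio(HYP_SLOPE):.2f}, CI = ({low:.2f}, {high:.2f})")

# Trace the S-shaped curve across a range of z-scores.
for z in (-2, -1, 0, 1, 2):
    p = predicted_probability(HYP_INTERCEPT, HYP_SLOPE, z)
    print(f"z = {z:+d}: probability = {p:.2f}")
```

A coefficient of zero gives an odds ratio of exactly one, which is why one is the nothing's-happening value, and a confidence interval that stays on one side of one matches a coefficient that is reliably different from zero.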
And when states search more for Facebook, they are more likely to have a Republican governor. And so this is a nice way of looking at that effect, because binary logistic regression does work on a curved system, where the probability changes along a curve as the predictor changes. This chart shows you just the one variable; the model actually uses all of the variables together to calculate probabilities. But another nice thing about categorization tasks like this is you can get a classification table. So I can click on that one right here. And what it's going to tell me is what it predicts the states will have versus what they actually do. And what's interesting is you can change the cutoff value. So for right now, it's only getting 40% of the Democratic states correct, and it's getting 91% of the Republican ones. But let's take a quick look at what's called a cutoff plot, which gives us a chart of what's called specificity and sensitivity. You can think of sensitivity as meaning it's very likely to set off an alarm, or give you the answer, if it believes a state has a Republican governor. And specificity means it's going to do that only if it does. Put another way, sensitivity is the share of the Republican states it catches, and specificity is the share of the Democratic states it correctly leaves alone. And in many situations, these two lines, this is specificity going up here and this is sensitivity going down, often cross right here at the 50% point. But these ones cross a lot closer to 0.7. So what I'm going to do is change the cutoff from 0.5 to 0.7, and do that right here. And you'll see that it changes the way the classification table works, because now it's going to say, well, only put them as Republican if they have over a 70% chance of being Republican. And that actually makes sense too, because it's about two thirds Republican governors, slightly over, in the country as a whole. You see this cutoff line is now a lot closer to where these two cross over, which is usually where you want it, because that's going to be maximum utility there. And it's changed the classification table.
With the new cutoff, instead of 40%, we now have 73% correct for the Democratic governors. And for the Republican ones, we've gone from 91 to 73. But now it's sort of balanced out in terms of how accurate it is for the two different conditions. And so these are some of the methods that jamovi gives you for looking at the relationship between several predictor variables, like the three social media search terms, and how they can be used to predict the classification of a dichotomous outcome, like Republican or Democratic governor, or any other time you have two distinct outcomes that you're trying to predict.
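If you want to see what moving the cutoff does outside of jamovi, here is a minimal Python sketch of a classification table at two cutoffs. It assumes a case is classified as Republican (coded 1) only when its predicted probability is over the cutoff, with Democrat (coded 0) as the baseline. The predicted probabilities and the outcomes are hypothetical, made up for illustration; they are not the state data from the video, so the percentages will not match the ones above.

```python
def classification_table(probabilities, actual, cutoff):
    """Count true/false negatives and positives at a given cutoff."""
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for p, y in zip(probabilities, actual):
        predicted = 1 if p > cutoff else 0  # assumption: strictly over the cutoff
        if y == 1:
            counts["tp" if predicted == 1 else "fn"] += 1
        else:
            counts["fp" if predicted == 1 else "tn"] += 1
    return counts


def sensitivity(table):
    """Share of actual 1s (Republican) classified as 1."""
    return table["tp"] / (table["tp"] + table["fn"])


def specificity(table):
    """Share of actual 0s (Democrat) classified as 0."""
    return table["tn"] / (table["tn"] + table["fp"])


# Hypothetical predicted probabilities and actual outcomes (0 = Democrat, 1 = Republican).
HYP_PROBS = [0.35, 0.55, 0.62, 0.66, 0.75, 0.58, 0.72, 0.81, 0.88, 0.93, 0.64, 0.77]
HYP_ACTUAL = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

for cutoff in (0.5, 0.7):
    table = classification_table(HYP_PROBS, HYP_ACTUAL, cutoff)
    print(f"cutoff {cutoff}: {table}, "
          f"specificity = {specificity(table):.2f}, "
          f"sensitivity = {sensitivity(table):.2f}")
```

Raising the cutoff trades some sensitivity for specificity, which is the same balancing act the cutoff plot shows.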