For a final look at SPSS and analyzing data, at least in this brief overview course, let's take a look at one of the most useful procedures around: regression. Now, you might think of regression as sort of the statistical version of the Three Musketeers, where it's all for one. I say that because "all for one" is actually all variables for predicting one outcome. Put another way, regression uses many different variables, many predictor variables, to predict scores on one outcome variable. This makes it useful in a huge range of circumstances, especially because there's something for everyone with regression. There are many different versions and adaptations of regression that make it truly flexible and powerful when analyzing data, and make it a go-to tool for almost any analytical purpose you might have. We'll try a simple version of this in SPSS. First, make sure you've downloaded the data folder from the course files; we'll use the cars.sav data set that we've used in our two previous examples, along with the accompanying syntax file. The syntax file begins, as usual, with the code for loading the data set from the desktop, but truthfully it's easier to just double-click on the file cars.sav and have it open up directly in SPSS. That's what I've done here. You can see it's the same data set, with 32 rows of data, a bunch of cars from 1974, and several variables. What we're going to try to predict in this one is miles per gallon, based on things like the number of cylinders, the displacement, horsepower, weight, quarter-mile time, transmission type, and number of gears and carburetors. All right, that should be pretty easy. What we're going to do is go to Analyze, come down to Regression, and use the second option here, Linear. That's just basic linear regression.
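Before clicking through the dialogs, it can help to see what "all variables for predicting one outcome" means mechanically. Here is a minimal, stdlib-only Python sketch of ordinary least squares with two predictors, solved through the normal equations on centered data. The numbers are hypothetical toy values, not the cars.sav data, and the variable names are just for illustration:

```python
# Minimal two-predictor OLS via the normal equations on centered data.
# Toy numbers only -- not the cars.sav data set.
x1 = [1, 2, 3, 4, 5]   # a first predictor (think: weight)
x2 = [2, 1, 4, 3, 5]   # a second predictor (think: cylinders)
y  = [3, 6, 5, 8, 8]   # outcome, constructed here as exactly 3 + 2*x1 - x2

n = len(y)
m1, m2, my = sum(x1)/n, sum(x2)/n, sum(y)/n

# Sums of squares and cross-products of the centered variables
s11 = sum((a - m1)**2 for a in x1)
s22 = sum((b - m2)**2 for b in x2)
s12 = sum((a - m1)*(b - m2) for a, b in zip(x1, x2))
s1y = sum((a - m1)*(c - my) for a, c in zip(x1, y))
s2y = sum((b - m2)*(c - my) for b, c in zip(x2, y))

# Solve the 2x2 normal equations for the slopes, then back out the intercept
det = s11*s22 - s12*s12
b1 = (s1y*s22 - s12*s2y) / det
b2 = (s11*s2y - s12*s1y) / det
b0 = my - b1*m1 - b2*m2

print(b0, b1, b2)  # recovers 3.0, 2.0, -1.0 for this constructed outcome
```

SPSS does the same kind of simultaneous fit, just with as many predictors as you hand it.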
Now, under Dependent we need to put the outcome variable, the thing we're trying to predict. The terminology kind of bugs me here, because "independent" and "dependent" really should be reserved for manipulated experiments, but we still know what they mean. Our outcome variable, the thing that we're trying to predict, goes under Dependent; that's miles per gallon. Then we can take everything else except car name, which is just a label, and put all the rest under Independent(s), the variables we're using to predict the outcome. Now, I want to do the totally default, no-extra-steps version first. So I've put the variables in their respective places, and I'll just hit OK. Now we get our output. It tells us first the code that was used to produce this analysis: it used all of these variables simultaneously to predict a single outcome, which is listed down here, and they were entered at once. The model summary tells us that we have a multiple correlation of these predictor variables with our outcome variable of 0.931, which is really, really high. If you square that to get the proportion of variance explained, it's 86.7%. Even the adjusted R squared, because we have a small sample, is still 82%. It's huge. We get a significance test right here, and we're not surprised to see that the significance is 0.000. It's not zeros all the way through, but it's highly significant. Then we get the individual regression coefficients. What we're looking for here are significance levels under .05, and interestingly, only one of them in this collection is under .05, and that's weight in tons. None of the others are even close. That doesn't mean that none of the others matter. It simply means that when you take all of the variables together at the same time, when they are taken as a whole, really only one of them deviates significantly from zero as a predictor. That's weight.
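Those model-summary numbers have simple definitions: R squared is one minus the residual sum of squares over the total sum of squares, and adjusted R squared shrinks that for the number of predictors relative to the sample size. A rough stdlib-only sketch, using hypothetical toy numbers rather than the actual cars.sav values:

```python
# R-squared and adjusted R-squared from a one-predictor fit.
# Hypothetical toy data -- not the actual cars.sav values.
xs = [2.6, 2.9, 3.2, 3.4, 3.6]       # say, weight in tons
ys = [30.0, 26.0, 23.0, 21.0, 18.0]  # say, miles per gallon

n = len(xs)
mx, my = sum(xs)/n, sum(ys)/n
b1 = sum((x - mx)*(y - my) for x, y in zip(xs, ys)) / sum((x - mx)**2 for x in xs)
b0 = my - b1*mx

preds = [b0 + b1*x for x in xs]
ss_res = sum((y - p)**2 for y, p in zip(ys, preds))
ss_tot = sum((y - my)**2 for y in ys)

r2 = 1 - ss_res/ss_tot                      # proportion of variance explained
k = 1                                       # number of predictors in the model
adj_r2 = 1 - (1 - r2)*(n - 1)/(n - k - 1)   # penalized for predictors vs. sample size
```

With a small sample and many predictors, that (n - 1)/(n - k - 1) penalty is what pulls the 86.7% in the video's output down to the adjusted figure.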
Now, there are a lot of other ways of doing regression, and SPSS gives you a lot of choices. I'm going to come up here, back to Analyze, down to Regression. I will mention there's a really interesting one here called Automatic Linear Modeling. This is an SPSS function that came in a few versions ago; it does a lot of automatic data prep, and it does a lot of combining and splitting up of variables. On the other hand, it's really kind of difficult to explain how it all works and then to interpret it properly, so I'm going to save that for another course where I specifically talk about analyzing data. For now, I'm going to go back to Linear, and we're going to make a few choices and take some of the options that SPSS makes available. The first thing I'm going to do, at the risk of doing something very controversial, is actually go from simultaneous entry to stepwise regression. This is controversial because some people in the literature have called it positively diabolic in its risk of a Type I, or false positive, error, and there's good evidence for that. On the other hand, in modern machine learning, stepwise procedures have seen very fruitful use. So it's not totally unacceptable to try it, especially when we're doing sort of an exploratory project like this one. You certainly wouldn't want to use it for rigorous model building, but it's a nice way to get some insight into the data pretty quickly. I'll come up to Statistics and add a few things. I'm going to get confidence intervals for the coefficients; those are nice to have. We have the overall model fit, and I'm going to get the R squared change, because a stepwise model goes through several different steps adding variables, and we want to see if each variable adds something statistically significant to the overall model.
We could get a lot more information here, but I'll leave it there for now. Under Plots, we can get a ton of different plots, but I'm actually just going to come down here and choose the standardized residual plots: a histogram and a normal probability plot. There are other options as well. Under Save, I could save about 15 different kinds of scores to the data set: unstandardized predicted values, studentized deleted residuals, and so on and so forth. There are situations in which I might want to do those, but for right now I'm going to skip them, because I'm simply trying to build a model without necessarily saving all of the steps in between. Options really just covers the criteria used in the stepwise procedure; I'm going to leave it at the defaults right now, but you could change them if you wanted to. And then Style is a new thing that has to do with the formatting of the table. I'm going to leave that one alone for right now, because we're going to have exactly what we need. Now, I've created this already and saved it in the syntax, so I'm just going to hit OK. You'll see that we get a different kind of output this time; I'll zoom in on it. What we have now is some code that's a little bit longer. It says to go through the variables one at a time, find the predictor variable that is most strongly associated with the outcome, put it in the model, get partial correlations, and go through step after step. What we find here is that although we had nine predictors originally, only two of them were statistically significant when put into the model: weight and number of cylinders. Again, what we're trying to predict is gas mileage, miles per gallon. If you come down here, you can see that they were both statistically significant, where the adjusted R squared for just weight is 74.5%, and when you add on number of cylinders, it goes up, not a huge amount, but it goes up almost 8%.
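That "step after step" loop can be conveyed with a greatly simplified sketch. SPSS's actual stepwise algorithm uses F-to-enter and F-to-remove tests on partial correlations; the greedy, correlation-based version below (toy data, made-up entry threshold) only captures the flavor of forward entry:

```python
from math import sqrt

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(xs, ys):
    """What's left of ys after a simple regression on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def forward_select(predictors, outcome, threshold=0.3):
    """Greedy forward selection: keep adding the predictor most
    correlated with what the current model hasn't explained yet."""
    chosen, resid = [], list(outcome)
    remaining = dict(predictors)
    while remaining:
        name = max(remaining, key=lambda k: abs(corr(remaining[k], resid)))
        if abs(corr(remaining[name], resid)) < threshold:
            break   # nothing left adds enough; stop entering variables
        resid = residuals(remaining[name], resid)
        chosen.append(name)
        del remaining[name]
    return chosen

# Toy data -- hypothetical numbers, not the cars.sav values.
predictors = {
    "weight":    [2.6, 2.9, 3.2, 3.4, 3.6],
    "cylinders": [4, 4, 6, 6, 8],
    "gears":     [4, 3, 4, 3, 4],
}
mpg = [30.0, 26.0, 23.0, 21.0, 18.0]
chosen = forward_select(predictors, mpg)
print(chosen)  # weight enters first; nothing else clears the toy threshold
```

The real procedure in SPSS also re-tests variables already in the model for removal at each step, which this sketch omits.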
The analysis of variance table lets us know that both of these models, with just one predictor variable and with two, are statistically significant. Here are the individual coefficients, along with their confidence intervals over here on the right side. Now, because we've gone through a stepwise procedure, it's not surprising that all of these are statistically significant, because that was the criterion used for including them. Here we have a list of excluded variables along with their collinearity statistics, which have to do with how much each of these variables is correlated with the others. So, for instance, number of carburetors is highly collinear with, or easily predicted by, the other variables that we could have included in the model. Then we come down to the residuals; I'm going to look specifically at the charts. In an ideal world, your residuals are normally distributed, which means they're just as likely to be high as they are low, and they're symmetrical. We see here that they're not horribly, pathologically far from normal, so this is probably a good model for this data set. And here is a normal P-P, or probability-probability, plot of the same data. If the residuals were perfectly normal, all the dots would be on the diagonal line, and they're close. These are the 32 individual observations and how far off they are; they're close enough. So this lets us know that our model is predicting really well, and it appears not to be biased in one direction or another. So this is one method of developing a model. Again, the stepwise procedure is best for exploratory analysis; it's not something you would use for confirming a finding. But as a quick way of sifting through a large collection of potential variables, it's a nice approach. It lets us know, for instance, that in this particular data set, miles per gallon is predicted primarily by weight, which makes complete sense for a car.
And number of cylinders, which is associated with having a large and thirsty engine. So the general idea of multiple regression, again, is to use many variables to predict a single outcome. SPSS gives you a lot of options for doing that; we've looked at the default and at one variation on it, but there's a lot more that you can explore, and we will cover it in another course on statistical analysis in SPSS. For now, I encourage you to take some time, look at some of these options, and see the kind of insight they can give you into your own data and analyses.
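As a final aside, the idea behind the normal P-P plot we examined can itself be sketched in a few lines: standardize the residuals, then compare each one's position under the normal curve to its empirical cumulative position. A minimal stdlib-only sketch, with hypothetical toy residuals rather than the values from the fitted model:

```python
from statistics import NormalDist, mean, pstdev

# Toy residuals -- hypothetical, not the values from the model in the video.
resid = [0.15, -0.38, 0.09, 0.41, -0.27]

mu, sigma = mean(resid), pstdev(resid)
z = sorted((r - mu) / sigma for r in resid)   # standardized, in order

n = len(z)
theoretical = [NormalDist().cdf(v) for v in z]   # expected cumulative probability
empirical   = [(i + 0.5) / n for i in range(n)]  # observed cumulative position

# On a P-P plot, each (theoretical, empirical) pair is one dot; perfectly
# normal residuals would put every dot on the 45-degree diagonal line.
pairs = list(zip(theoretical, empirical))
```

Dots hugging the diagonal are exactly what "close enough to normal" looks like in the SPSS chart.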