For a final look at basic statistics in SPSS, we'll look at the Explore command. I like to think of this as a way to get a lot closer, to get a micro view of your subject, to get in close and see what's there in detail. Now, the Explore command is going to give you a bunch of statistics: it can give you the mean and the confidence interval for the mean, and the trimmed mean, as well as the variance, the standard deviation, the interquartile range, the minimum and maximum, the range, skewness, kurtosis, a collection of M-estimators, which are special robust ways of measuring the center of a distribution, percentiles, which we've seen before, and lists of outliers. It can also give you a collection of plots. It's the one place in SPSS that you can get a stem-and-leaf plot. Now, traditionally those are drawn by hand, so it's kind of cute to see a computer do them. You can also get box plots and histograms, and you can get a set of normality plots, such as a Q-Q plot or a detrended Q-Q plot. And the neat thing after that is you can break all of these analyses down by groups. So let's try it in SPSS and see how it works. Just open up this syntax file, and we'll run through the various procedures in Explore and see how it can add to your own analysis. As always, we'll begin by opening the demo.sav dataset; here's the command for a Mac, and here's the command for Windows. Now again, I'm saving this as syntax; that makes it repeatable, and it means you can download it and try running it on your own. But I created all of this by using the menu commands. Let's start by doing a default Explore analysis for a couple of variables. I'll come up to Analyze > Descriptive Statistics, and then we'll come here to Explore. And what we're going to do is age and income category. And again, this is kind of interesting because these are different kinds of variables: age is a scale variable, and income category in this case is an ordinal variable.
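As a sketch of what the dialog generates, the default Explore run pasted as syntax should look roughly like this. The variable names `age` and `inccat` are my assumption based on the standard demo.sav sample file; check the names in your own copy of the dataset.

```spss
* Default Explore analysis, as pasted from
* Analyze > Descriptive Statistics > Explore.
* Variable names (age, inccat) assume the demo.sav sample data.
EXAMINE VARIABLES=age inccat
  /PLOT BOXPLOT STEMLEAF
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
```

The defaults give you descriptive statistics plus the box plot and stem-and-leaf plot, which is exactly the output walked through below.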
I'm just going to leave all the defaults as they are and hit OK. And here's what we get from this. First, we find out whether there were any missing cases; there weren't in this situation. And then we get a collection of descriptive statistics, first for age, then for income category: we have the mean with its standard error, the confidence interval, the 5% trimmed mean, the median, variance, standard deviation, minimum, maximum, range, interquartile range, and skewness and kurtosis, along with their standard errors. So there's a lot of information there. And if we scroll down, we find the same kinds of information for income category in thousands. Now remember, some of this you wouldn't normally want to use, because income category in this case is not a scale variable, and a lot of these things, like the minimum, maximum, and trimming, work best with a scale variable. But SPSS is able to run it on everything, so interpret with caution. And then we come down and look: we have a stem-and-leaf plot, where this is age, which in our sample is two-digit numbers. So this stem with these leaves means 18, 18, and so on, and each of these numbers over here is a leaf that represents 10 cases. Remember, we have about 6,400 cases, so we have about 640 leaves right here. And you can see, for instance, that the late 30s are really common, and that we go up to somebody in their late 70s. So that's an easy way to see what's going on. At the same time, we get a box plot, and the nice thing about this is you can tell really quickly that there are no outliers on age, not in this particular dataset. We do the same thing with income category, and the stem-and-leaf plot looks funny, but that's because there are only a few possible values, one or two or three or four, so when it's drawn it looks a little weird. But we can come down and get the box plot as well, and we see there are no outliers, at least on this kind of variable.
Again, not normally something you would do with a rank-order variable, but it's possible here. Now, the neat thing is there are additional statistics. I'll do the same two variables, but I'm going to check off a lot of options that I have right here. So let's go back to that dialog; I'll go to Explore. And what I'm going to do is say, just give me the statistics right now, and I'll come up here and make some selections. One thing: although 95% confidence intervals are by far the most common, I have seen situations where people used 80% confidence intervals, so you can change it if you want. Then I can get all of the M-estimators, a whole collection. I can get a list of outliers and a list of percentile values. I hit Continue, and I click OK. And now we have the same table we had before; those are the descriptives up there at top. Then we have the M-estimators, and these are four different robust measures of center. Again, all of them are trying to give us something equivalent to the mean. You see in this case Huber's estimator, Tukey's biweight, Hampel's estimator, and Andrews' wave. The numbers are all pretty similar; they go from a low of 41.18 to a high of 41.52, so they're all really close. And each of these has specific parameters that go into them, which you can't adjust in the dialog box. But let me just return to the syntax for one second. You see here, these are the parameters for each of the estimators; you could change them here if you wanted to. I'll go back to the output. Then we have percentiles: 5, 10, 25, and so on up to 95. And then it gives us the case numbers for the highest and lowest five cases on each variable. So this is a really nice way of seeing a multidimensional picture of our data. Now, in terms of pictures, an even better way to do this is with more graphs. So let me go back to the syntax for a second, and you see that we can get some additional plots.
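A statistics-only run with all of those options checked pastes as something close to the following. The estimator tuning constants in parentheses are the ones I recall SPSS using as defaults, so treat them as an approximation; the point is that this is the one place you can edit them, since the dialog box doesn't expose them.

```spss
* Statistics only: descriptives, M-estimators, the five highest and
* lowest cases, and percentiles. The tuning constants shown are
* believed to be SPSS's defaults; edit them here if needed.
EXAMINE VARIABLES=age inccat
  /PLOT NONE
  /STATISTICS DESCRIPTIVES EXTREME(5)
  /CINTERVAL 95
  /MESTIMATORS HUBER(1.339) ANDREW(1.34) HAMPEL(1.7,3.4,8.5) TUKEY(4.685)
  /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
  /MISSING LISTWISE
  /NOTOTAL.
```

Changing `/CINTERVAL 95` to `/CINTERVAL 80` is how you'd get the 80% confidence intervals mentioned above.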
I'm going to use age and income category again, but I'm going to change what it tells us. So first off, I'm going to say, give me just the plots, so we're not going to get any statistics. I go to the Plots menu. We have a stem-and-leaf by default; let's also get a histogram, and let's get normality plots. That's a way of assessing how closely your data match a normal distribution. I'll hit Continue, and OK. And now I have a histogram for age, and the stem-and-leaf plot we saw before. But this one here is new: it's a normal Q-Q, or quantile-quantile, plot of age in years. If the variable were normally distributed, all of these circles would fall exactly on this line. You see it's really close, but it does deviate at each end. And then the detrended version takes that line and sort of flattens it out, so it's much easier to see where the deviations are. Now, I know they look really big in this case, but this variable is in fact pretty close to a normal distribution. Then we have our box plot. And then we do the same thing for income category: we start with a histogram, our stem-and-leaf plot, and our normal Q-Q plot, again a little weird because there are only four possible values in this dataset, but they all fall pretty well on the line. And there's our detrended plot, and then finally the box plot we saw before. Now, there's one more thing we can do with the Explore command, and that is we can take some of these analyses and break them down by groups. So if we go back to the syntax, we'll see I'm going to do income and break it down by gender. Let's go back to the menu here, go to Explore, and I'm going to reset this. We're going to take income and put that into our dependent list, the outcome variable, or the thing that we're trying to predict.
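The plots-only version of the command should paste as roughly the following; `NPPLOT` is the keyword behind the "Normality plots with tests" checkbox and produces both the normal and detrended Q-Q plots. As before, `age` and `inccat` assume the demo.sav variable names.

```spss
* Plots only: box plot, stem-and-leaf, histogram, and normality
* (Q-Q and detrended Q-Q) plots; no descriptive statistics.
EXAMINE VARIABLES=age inccat
  /PLOT BOXPLOT STEMLEAF HISTOGRAM NPPLOT
  /COMPARE GROUPS
  /STATISTICS NONE
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
```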
And then we'll take gender, scroll down a little bit, there's gender, and put it into the factor list, or what people sometimes call the independent variable, if it's an experimentally manipulated variable, or the predictor variable. Now I'm going to come up here, and I'm actually going to skip the statistics and get plots only. I don't want a stem-and-leaf, but I will get a histogram, and I'll get the normality plots. And now, because I'm breaking it down by groups, I can check spread versus level with the Levene test. The idea here is that the data should be spread out approximately the same amount for each of the groups, so we can compare them using the same statistics. I'm going to do what's called power estimation here, click Continue, and then OK. And now what we get is, again, a list of the number of cases that have complete data, and all of them do; there's no missing data. We have a test of normality, and what we see here, based on both of these tests, is that the data for neither group are normal. That's okay, because we knew that income was strongly positively skewed. As for homogeneity of variance, whether the two groups have about the same variance or spread: there is some difference, but it is not statistically significant, and so the spread appears to be the same for the men and the women, which is good in this particular dataset. Then we can come down and see the histograms, first for women, and you see really strong skewness there, and the same thing again for men, really strongly skewed. Then we get the normal Q-Q, or quantile-quantile, plots. And again, if the variable were normally distributed, all of these points would fall right on this line; instead it's strongly skewed, and so we have this really big bend in the data. The same is true for men. And here are the detrended plots, where the points should all lie flat on the line; instead, you get this swoosh mark.
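The grouped analysis pastes as a `BY` clause on the variable list. In my recollection, choosing "Power estimation" under "Spread vs Level with Levene Test" adds the `SPREADLEVEL` keyword, which also triggers the Levene homogeneity-of-variance test; the `income` and `gender` names again assume demo.sav.

```spss
* Break income down by gender; request histogram and normality plots
* plus the spread-vs-level plot (power estimation), which also
* produces the Levene test of homogeneity of variance.
EXAMINE VARIABLES=income BY gender
  /PLOT BOXPLOT HISTOGRAM NPPLOT SPREADLEVEL
  /COMPARE GROUPS
  /STATISTICS NONE
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
```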
And so it just confirms that we're not dealing with normally distributed data. What you do have is this big collection of outliers in the box plot. I'm going to do one thing: I'm going to double-click on this, and then I'm going to come right up to here, and this will turn off the data labels, so we can get rid of the ID numbers. And you can see that we have a lot of outliers for both the men and the women, and there are no really obvious differences between the two groups. And the spread-versus-level plot is something you can use, if you have multiple levels, to help you select a kind of power transformation, a square root, a reciprocal, a square, or something like that. But that's a more complicated topic and something for another day. And besides, it appears that we have relatively homogeneous variance in the two groups, and so we'd be good to go ahead and do our other analyses. And so those are some of the options in Explore, and that's where we'll end our discussion of basic statistics. But you can see how these procedures can be used to check how well your data meet the assumptions of the analyses you use, and then, really, how well you can make inferences from your sample to other groups.