And now in SPSS: An Introduction, we get to the part that maybe you were waiting for, and that's analyzing data. I'll mention, however, that I'm going to give only a very small overview of analyzing data, because we have an entire separate course here for data analysis, and also one for data visualization in SPSS, and I recommend that you check those out. But as a taste of what's available, we'll talk about a procedure that's of interest to a lot of people in applied settings, and that's hierarchical clustering.

Now, the idea here is that you're trying to find the clusters in your data. More specifically, what you're trying to see is whether similar cases cluster together in some way that you can use to make inferences about them. The trick, however, is that similarity depends on your criteria, and there are a few decisions that you have to make when you're doing a cluster analysis of any kind.

So for instance, you have to decide whether you're going to do a hierarchical cluster analysis, which goes from one group to as many groups as you have cases, or whether you're going to use a set k, or set number of clusters. You also have to decide on the measure of distance that you're going to use. Euclidean distance, which is sort of like measuring the as-the-crow-flies distance between cases, is very common, as is squared Euclidean distance, which is what SPSS uses. There's also the question of whether you want to start with everything together and split it up in a divisive procedure, or start with everything separate and put it together in an agglomerative procedure. Some procedures, like diana in R's cluster package, are divisive, but by default SPSS does agglomerative, as does R's hclust. You usually end up with the same general findings anyhow, so it's really not a huge difference.

So we're going to do a cluster analysis, but we're going to try to keep it simple. We're going to use some of the most basic methods for doing this. We'll use Euclidean distance or squared Euclidean distance.
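If it helps to see the arithmetic behind those two distance measures, here is a minimal sketch in Python. It is not SPSS syntax and not part of the course files; the function names and the two example cases are made up for illustration.

```python
import math

def squared_euclidean(a, b):
    """Sum of squared differences across all variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line ("as the crow flies") distance between two cases."""
    return math.sqrt(squared_euclidean(a, b))

# Two hypothetical cases measured on two variables (made-up values).
case_a = (0.0, 0.0)
case_b = (3.0, 4.0)

print(euclidean(case_a, case_b))          # 5.0
print(squared_euclidean(case_a, case_b))  # 25.0
```

Squaring keeps the cases in the same order from nearest to farthest, but it stretches large distances more than small ones.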
In this case, we'll use hierarchical clustering, where we don't have to choose the number of groups ahead of time. And we're going to use an agglomerative procedure, where it starts with every case separate and then gradually puts them together.

We'll try this in SPSS, but I need you to do something first. There is a folder that you can download from the course files that ends with "data" here. In it, there's one data file, cars.sav, where .sav is the proprietary SPSS data format. In addition to that, there is the .sps syntax file, and you'll want both of those for this demonstration. If you save the data file to your desktop, it looks like this, and you can just double-click on it and it will open up in SPSS. You also have the option of using syntax to open it; how you write the file path depends on your operating system. This is for a Macintosh right here, and this is for a Windows computer, though you may need to use backslashes instead, depending on your version of Windows. I'm just going to go back and double-click on this to open it up in SPSS.

And there's my data set. What this data set is, is a slight variation on a data set called mtcars that's in the default datasets package in R. It contains road test data on a number of cars from 1974, from the magazine Motor Trend. What we're going to do is look at this information and see whether the cars cluster together in some important way. I'll go to the data view here, and you can see we have the Mazda RX4, Hornet Sportabout, Mercedes 450SE, Lincoln Continental, and so on, cars that were all available in the early 70s. And we have information about miles per gallon, the cylinders, the displacement in cubic inches, horsepower, weight in tons, the time in seconds for the standing quarter mile, whether it's an automatic or a manual transmission, the number of gears in the transmission, and the number of carburetors, or probably carburetor barrels here.
I'm going to turn on the labels; only one variable changes here. By the way, one of the things I did is format this for SPSS by adding labels and changing some of the decimals, which makes it a little easier to work with in the program.

But let's go to the syntax file right now. Once we have the data open, we want to do a default hierarchical clustering. Now, this is the code to produce it right here, but I'm going to do it with the drop-down menus to show you that it's really not hard to do. All we need to do is come up to Analyze, and then we come down to Classify. Now, I have to admit, off the top of my head I cannot remember if every version of SPSS has this particular menu. Most will, and I hope yours does, so you can follow along with this. Then it's Hierarchical Cluster, and I'm going to click on that.

What I'm going to do here is take the car name, which really just says what the cars are, and use that to label cases, because that's going to mean something to me. And I'm going to take all of my other variables, we'll just do a little shift-click here, and put them over here. At this moment, I'm going to change nothing else. You'll see it's going to cluster cases, and that's what we want. It's going to give us both some statistics and some plots. That's fine. I'm going to hit OK, and we're going to get a result identical to my first syntax command.

I'll make the output window bigger here. And here's what we have. First off, it tells us how many cases there were; there were 32, and they all had complete data, which is nice. Then SPSS gives us something kind of unusual called an agglomeration schedule, and it really specifies at what point in the procedure two cases got put into the same cluster.
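To give a feel for what an agglomeration schedule records, here is a small Python sketch of an agglomerative procedure. It is an illustration under stated assumptions, not SPSS's own implementation: it uses squared Euclidean distance with average linkage between clusters (the linkage rule is my assumption here), the data are made up, and every function name is hypothetical.

```python
from itertools import combinations

def squared_euclidean(a, b):
    """Sum of squared differences across all variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def average_linkage(c1, c2, points):
    """Mean squared Euclidean distance over all cross-cluster pairs."""
    total = sum(squared_euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(points):
    """Start with every case separate and merge the two closest clusters
    until only one remains.

    Returns one (cluster_a, cluster_b, coefficient) tuple per merge,
    in the spirit of an agglomeration schedule.
    """
    clusters = [(i,) for i in range(len(points))]
    schedule = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], points))
        schedule.append((a, b, average_linkage(a, b, points)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return schedule

# Made-up data on one variable: two tight pairs and one distant case.
points = [(1.0,), (2.0,), (10.0,), (11.0,), (50.0,)]
for a, b, coefficient in agglomerate(points):
    print(a, b, coefficient)
```

With these made-up numbers, the coefficients stay small while the tight pairs are forming and then jump sharply once distinct groups are forced together, which is the pattern to look for in a schedule.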
I personally don't have much use for the agglomeration schedule, except I know that when there is a big jump in the coefficients, as there is here from 3 to 26, there is a very distinct category change, as there is from 660 to 1125 and so on. Most of the time, though, I would just completely ignore this one. And this is called an icicle plot. It shows sort of the same information about when various cases got dropped in with everything else. It's kind of pretty to look at, but I find it kind of meaningless. And so, truthfully, the default output for SPSS's hierarchical clustering is, to me, not very helpful.

In fact, it's so unhelpful, I'm just going to delete it all and do this over again. I'm going to come back up to my recent menu items and go to this analysis again, and I'm going to make a couple of changes. I don't want the agglomeration schedule; that doesn't really help me. And for plots, I'm going to get rid of the icicle plot and get a dendrogram instead. Dendrogram comes from the Greek word for tree, so it's a tree diagram that shows the branches. This is usually the most important thing you can get out of a hierarchical cluster analysis. I'll hit OK.

And now what we have is a chart here that lists all the cases, the cars, on the side, and it shows how they group together. So we see, for instance, that these first four cars, the Mazda RX4 and the wagon and the Mercedes 280 and 280C, are very similar to one another. They all go here together. We see some others if we come down here. So for instance, the Cadillac Fleetwood, the Lincoln Continental, and the Chrysler Imperial, which are all gargantuan American cars with big V8s, all go there together. And then we see down here at the bottom that this one, the Maserati Bora, is all by itself for a very long time. This is where cases are individual, here on the left, and they gradually get put together. And you see how they come together in each of these branches.
That's why it's called a dendrogram. And so this is a really nice way of seeing how similar your cases are. If you have more pixels displayed, you can see the entire graph at once; I've got a low resolution right here. And you can see that maybe it makes sense to split this off into, say, four groups. It looks like we've got a distinct group right here, right there, right there, and right there.

And so I can do something else with this. I'm going to come back to the menu here, and what I'm going to do is save group membership. Now, I've done a hierarchical analysis, so I didn't have to specify the number of groups. But now that I've looked at the chart, four seems like a good number. So I'm going to come here and say, give me the group membership for each case, breaking it down into four clusters. I'll hit Continue. And I'm going to ask for it to not give me any plots. I hit OK. This time, we're not going to get any output except to say that it did the work. Here it says that it processed them. The place where we're going to see the difference is in the data file, so I'm going to move over to the data file. This button, by the way, will get me over to the data.

And now you can see I have a new variable that got added here for four clusters, and you can see that each of the cars is listed in one of these four clusters. What you can do then is take these cluster memberships and compare them on the other variables. Again, remember, the clustering here is only as valid as the data that we give it. It's only comparing these cars on a small number of variables, and it's using that to decide what goes with what. It's here, for instance, that you see the Maserati Bora is in a category all by itself. This is a neat way of looking at the similarity between items. You can do it with people if you're doing market research, and you can do it with companies if you're doing some sort of segmentation.
And it allows you to see which groups have important similarities for your purposes, and which groups you need to treat differently from one another. That's the goal of hierarchical cluster analysis, and you'll find it's a very easy thing to do in SPSS.
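If you want to see the logic of saving group membership and then comparing the groups, here is a rough Python sketch. It is not what SPSS runs internally: it assumes squared Euclidean distance with average linkage, it simply stops merging once the requested number of clusters remains, the data are made up, and the function names and the way clusters are numbered are my own choices.

```python
from itertools import combinations
from statistics import mean

def squared_euclidean(a, b):
    """Sum of squared differences across all variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def average_linkage(c1, c2, points):
    """Mean squared Euclidean distance over all cross-cluster pairs."""
    total = sum(squared_euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def cluster_membership(points, k):
    """Merge the closest clusters until k remain; return one label per case."""
    clusters = [(i,) for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], points))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    # Number clusters 1..k by their lowest case index (an arbitrary choice).
    clusters.sort(key=min)
    labels = [0] * len(points)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels

# Made-up data on one variable: two tight pairs and one distant case.
points = [(1.0,), (2.0,), (10.0,), (11.0,), (50.0,)]
labels = cluster_membership(points, k=3)
print(labels)  # [1, 1, 2, 2, 3]

# Compare the clusters on the variable by taking the mean within each one.
for label in sorted(set(labels)):
    values = [p[0] for p, lab in zip(points, labels) if lab == label]
    print(label, mean(values))
```

In this toy example the distant case ends up in a cluster of its own, much like the Maserati Bora does in the car data.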