In design, you'll hear the saying "less is more," and to a certain extent the same is true in data analysis. Instead of trying to analyze 50 different variables, why not collapse them into a smaller number of more manageable variables? That's going to give you more reliable information and more stable insights. But in order to do this, you first need to establish that you can collapse the data: that there are correlations among the variables, and that you're combining like with like. The most common way of doing this is with a reliability analysis, which looks at the relationships among several variables that you would like to combine. This is very common if you're working with survey data and you're asking several questions about the same general concept or construct. This data set is based on real data on the Big Five personality factors. We've talked about those before, because jamovi has a built-in data set on them, but that one had the summary data, or rather, it was already collapsed. This data set has the 10 questions for each of the five scales: extraversion, agreeableness, openness to experience, conscientiousness, and neuroticism. So we've got 50 personality variables here. It's originally from an open-source data set that has nearly 20,000 cases. In the downloads for this chapter, there's a folder called big five that gives the references and the notes for this data set. What I've done is randomly select 1,000 cases that we're going to work with to demonstrate reliability and, really, the scale functions of jamovi. So let's start with this. We're going to just see what we have here: a data set with an ID number, which is just a row ID that I put in there, the age of the respondent, and their reported gender. And then we have a whole bunch of questions that they rated themselves on.
And the data information that you can download contains a description of what each of these questions was, and they're all rated on a one-to-five response scale. The labels up here, for instance N4 and N5, mean this is the fourth neuroticism question, the fifth neuroticism question, and so on. There's A for agreeableness, C for conscientiousness, and O for openness, and that gets us to our last one. So there are 10 variables on each scale, and we want to see if we can combine them. So we're going to come up here to Factor. Now, "factor" is an umbrella term, because all of these procedures have a lot in common: they all have to do, fundamentally, with whether there's correlation or covariance between the variables that you're looking at. But we're going to start with this one, scale analysis, a scale meaning a questionnaire or a survey where you have multiple questions designed to measure, or scale, the same thing. So we'll hit Reliability Analysis. And here are our options. It simply asks us what the items are, that is, which questions are supposed to go into the same scale because they're fundamentally measuring the same thing. Now, the nice thing is that the variables we have are already labeled E1 through E10. These are 10 questions that are designed to measure extraversion, as opposed to introversion. So let's just take those 10 and move them over. When we run the reliability analysis, this is our default table, and it gives us something called Cronbach's alpha (that's a lowercase Greek alpha, α), and it's like a correlation coefficient. What you're hoping for here is a positive value that is close to one, maybe 0.7 or higher. We have a negative value, which is really bad, and it's a low value. This is happening because we've got some funny things going on with the data, but that is something we're going to be able to fix. First, let's do a couple of things really quickly. I want to come back to these scale statistics, and I want to click Mean.
And I want to click Standard deviation. Right now we have a mean that's really close to three. On a one-to-five scale, that's the midpoint, so that's great. Standard deviation: there it is. I find it really helpful to come over here to the item statistics and get the Cronbach's alpha if the item is dropped. We've got 10 items in this scale. What it's going to do for each of these is say, "Well, if you got rid of that one and kept the other nine, what would alpha be?" And you see it bounces around, but it's always negative. A graphic can also be really handy here. Let's click on this one, the correlation heatmap. It's kind of a cute thing, especially when you have positive and negative correlations. We're going to scroll down here, and in a second the correlation heatmap will pop up. When a correlation is negative, it shows up as red; when it's positive, it shows up as green. And we get this nice little checkerboard pattern. The reason that's happening is exactly what our analysis told us here: it says it looks like five of these variables are reverse scaled. That's something that you do occasionally when you're writing surveys to make sure people are paying attention: you flip the wording around. If we were in SPSS or some other program, we would then have to go and manually reverse the variables and do this all over again. But fortunately, the package that this is based on, the psych package in R, makes it really easy to do this. All we have to do is scroll down here and tell it which ones are reversed. And it tells us right here: E2, E4, E6, E8, and E10. So I'm going to go two, four (I'm just double-clicking), six, eight, and 10. And look what happens over here. It's going to recalculate the alpha for this scale. And now it's recalculated the scale. You can see, for instance, that we've got these little a's here to indicate that these variables have been reverse scaled. And look what's happened.
Cronbach's alpha has gone from a negative two point something to a positive .89. And you can tell that dropping any one of these variables is not going to help it overall. This is a very tight group of variables. Let's come down and look at the heatmap now. It's all green: everything's positively correlated with everything else. And that lets us know that we are now safe to average these 10 variables. So instead of having 10 variables that measure extraversion, we can combine them and get a single scale score that gives us a more reliable and stable measure of extraversion. Of course, this is the first of five major personality factors. This is extraversion; there are several others, and we would simply repeat this analysis for each of them as we go down. But the general concept is the same. We pick the variables we want. It calculates Cronbach's alpha, which is like an average correlation between the items. It lets us know if any of them need to be reverse coded, and it's very easy to set that up in the menus. Then we can see how each item contributes, or whether it seems to be pulling away. In this case, the numbers support the whole thing, and the graphical representation supports the whole thing too. We have consistent data, we can average it, and then we get the "less is more": fewer variables, but more reliable and potentially more meaningful data to work with in our other analyses.
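If you'd like to see roughly what is going on behind the menus, here is a minimal sketch in Python of the same workflow: compute Cronbach's alpha, reverse-score the flipped items, check alpha if each item is dropped, and average the items into a scale score. This is not jamovi's or the psych package's code. It uses the standard textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), and the data are synthetic, generated only for illustration. The function names and the choice of which columns are reversed are assumptions made for this example.

```python
import numpy as np

def cronbach_alpha(items):
    """Standard Cronbach's alpha; rows are respondents, columns are items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def reverse_score(items, reversed_cols, low=1, high=5):
    """Flip the chosen columns on a low..high scale (1<->5, 2<->4, 3 stays)."""
    out = np.asarray(items, dtype=float).copy()
    out[:, reversed_cols] = (low + high) - out[:, reversed_cols]
    return out

def alpha_if_dropped(items):
    """Alpha recomputed with each item left out in turn."""
    items = np.asarray(items, dtype=float)
    return [cronbach_alpha(np.delete(items, j, axis=1))
            for j in range(items.shape[1])]

# Synthetic stand-in for 10 items rated 1 to 5 by 1,000 people (NOT the real data).
rng = np.random.default_rng(0)
n, k = 1000, 10
trait = rng.normal(size=(n, 1))                  # one shared underlying trait
raw = 3 + trait + rng.normal(size=(n, k))        # trait plus item-level noise
scores = np.clip(np.rint(raw), 1, 5)             # round onto the 1-5 scale

# Assumption for this example: every second item is reverse worded
# (zero-based columns 1, 3, 5, 7, 9, i.e. items 2, 4, 6, 8, 10).
reversed_cols = [1, 3, 5, 7, 9]
scores[:, reversed_cols] = 6 - scores[:, reversed_cols]

alpha_before = cronbach_alpha(scores)            # reversed items not yet fixed
fixed = reverse_score(scores, reversed_cols)
alpha_after = cronbach_alpha(fixed)
dropped = alpha_if_dropped(fixed)
scale_score = fixed.mean(axis=1)                 # one averaged score per person

print("alpha before reversing:", round(alpha_before, 3))
print("alpha after reversing: ", round(alpha_after, 3))
print("alpha if item dropped: ", [round(a, 3) for a in dropped])
```

With half the items left reversed, the positive and negative correlations cancel in the total score, which is why alpha comes out negative until the items are flipped. Once they all point the same way, averaging the columns gives the single scale score.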