In our last video on basic graphics, we talked about bar charts. If you have a quantitative variable, though, the most basic kind of chart is a histogram. This is for data that are quantitative, scaled, measured, interval, or ratio level; all of those terms refer to basically the same thing. In every case you want to get an idea of what you have, and a histogram lets you see it. There are a few things you're going to be looking for in a histogram. Number one is the shape of the distribution. Is it symmetrical? Is it skewed? Is it unimodal or bimodal? You're going to look for gaps, or big empty spaces, in the distribution. You're also going to look for outliers, unusual scores, because those can distort any of your subsequent analyses. And you'll look for symmetry, to see whether you have roughly the same number of high and low scores or whether you need to make some sort of adjustment to the distribution. But this is going to be easier if we just try it in R. So open up this R script file and let's take a look at how we can do histograms in R. When you open the file, the first thing we need to do is come down here and load the datasets package. We'll do this by running the library command; I just press Control+Enter or Command+Enter. Then we can use the iris data set again. We've looked at it before, but let's get a little more information about it by asking for help on iris. And there we have Edgar Anderson's iris data, also known as Fisher's iris data, because Ronald Fisher published an article on it. Here's the full set of information available on it, from 1936, so it's about 80 years old. Let's take a look at the first few rows. Again, we've seen this before: sepal and petal length and width for three species of iris. We're going to do a basic histogram on each of the four quantitative variables in here, and for that I'm just going to use the hist command.
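Here is a minimal sketch of the setup steps just described. It assumes a standard R installation, where the `datasets` package ships with base R and is normally attached already, so `library(datasets)` is harmless even if redundant. The helper name `numeric_cols` is mine, not from the video.

```r
# Load the package that contains iris (usually already attached in base R)
library(datasets)

# Help page for the data set; uncomment in an interactive session
# ?iris

# First six rows: four numeric measurements (cm) plus the Species factor
head(iris)

# Which columns are quantitative? These are the ones we can histogram
numeric_cols <- names(iris)[sapply(iris, is.numeric)]
print(numeric_cols)

# The basic histogram with all default settings
hist(iris$Sepal.Length)
```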
So: hist, then the data set iris, then the dollar sign to say which variable, and then Sepal.Length. When I run that, I get my first histogram. Let's zoom in on it a little bit. What we get, of course, is a basic black-line-on-white-background chart, which is fine for exploratory graphics. It gives us a default title that says "Histogram of" followed by the variable, using the clunky name iris$Sepal.Length, which also appears as the x-axis label at the bottom. It automatically adjusts the x-axis, and it chooses the number of bars for us, usually somewhere around seven to nine, which is generally a reasonable choice for a histogram. On the left, it gives us the frequency, or count, of how many observations fall in each bar. For instance, only five irises have a sepal length between 4 and 4.5 centimeters. Let's zoom back out. Now let's do another one, this time for sepal width. You can see that one is close to a bell curve. If we do petal length, we get something different. Let me zoom in on that one. This is where we see a big gap. We've got a really strong bar at the low end; in fact, it goes above the top of the frequency axis. Then we have a gap, and then sort of a bell curve. That pattern lets us know there's something interesting going on with the data that we're going to want to explore more fully. Then we'll do another one for petal width. I'll just run this command, and you can see the same kind of pattern: a big clump at the low end, a gap, and then sort of a bell curve beyond that. Now, another way to do this is to make the histograms by group. That's an obvious thing to do here, because we have three different species of iris. What we're going to do is put the graphs into three rows, one above another in a single column. I'm going to do this by changing a graphical parameter with par, short for parameter, setting mfrow to the number of rows and columns of plots I want in my output.
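Stepping back to the single-variable histograms described above: this sketch draws all four of them in a loop and then asks `hist()` for the computed bins instead of the picture (`plot = FALSE`). By default `hist()` uses Sturges' rule (`breaks = "Sturges"`, i.e. `ceiling(log2(n) + 1)` classes), which for 150 observations suggests nine. It then rounds the break points to "pretty" values, so the final number of bars can differ a little. The variable names `vars`, `n_suggested`, and `h` are illustrative choices of mine.

```r
library(datasets)

# One default histogram per quantitative variable
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
for (v in vars) {
  hist(iris[[v]], main = paste("Histogram of", v), xlab = v)
}

# Sturges' rule: the starting suggestion for the number of classes
n_suggested <- nclass.Sturges(iris$Sepal.Length)
print(n_suggested)

# Compute the histogram without drawing it, to inspect breaks and counts
h <- hist(iris$Sepal.Length, plot = FALSE)
print(data.frame(
  lower = head(h$breaks, -1),  # left edge of each bar
  upper = tail(h$breaks, -1),  # right edge of each bar
  count = h$counts             # frequency shown on the y-axis
))
```

Every observation lands in exactly one bar, so the counts always add up to the number of rows.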
I need to give it a pair of numbers. I use c, which stands for concatenate and means treat these two numbers as one unit: the 3 is the number of rows, and the 1 is the number of columns. When I run that, nothing shows up just yet. Then I'm going to come down and run a more elaborate command. I start with hist, the histogram command we've been using, and I'm going to do petal width, except this time I'm going to put a selector in square brackets. This means use only these rows. The way I do this is by saying I want it only for the setosa irises. So I write iris, that's the data set, then a dollar sign and Species, the variable, and then two equals signs, because in R that means "is equal to," and then, in quotes, setosa. It has to be spelled exactly the same, with the same capitalization. So that's the variable and the row selection. I'm also going to set limits for the x-axis, because I want to make sure manually that all three histograms have the same x scale. Then I specify breaks, which is for how many bars I want in the histogram; what's funny about this is that it's really only a suggestion that you give to the computer. Then I put a title above the chart, leave off the x-axis label, and make the bars red. Let's do all of that right now. I'll just run each line, and you see I have a very skinny chart. Let's zoom in on it. It's very short, but that's because I'm going to have multiple charts, and it's going to make more sense when we look at them all together. But you can already see that the petal width for the setosa irises is on the low end. Now let's do the same thing for versicolor. I'm going to run through all of that; it's all the same except that we're going to make it purple. There's versicolor. Then let's do virginica last, and we'll make those blue. And now I can zoom in on that.
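The row selector in square brackets is ordinary logical indexing. `iris$Species == "setosa"` produces a TRUE/FALSE vector with one entry per row, and only the TRUE rows are kept. This sketch shows the subset and why the spelling has to match the factor level exactly (the levels in R's copy of the data are lowercase). It then draws the single setosa panel. The specific values `xlim = c(0, 3)`, `breaks = 9`, and the title text are my assumptions; the walkthrough does not state them.

```r
library(datasets)

# Logical vector: TRUE where the row is a setosa iris
is_setosa <- iris$Species == "setosa"
print(sum(is_setosa))

# Keep only the setosa petal widths
setosa_width <- iris$Petal.Width[is_setosa]
print(summary(setosa_width))

# Matching is case-sensitive: a capitalized name selects no rows at all
print(sum(iris$Species == "Setosa"))

# One panel: fixed x-axis (assumed limits), suggested bin count (assumed),
# a title, no x-axis label, and red bars
hist(setosa_width,
     xlim = c(0, 3),
     breaks = 9,
     main = "Petal Width for Setosa",
     xlab = "",
     col = "red")
```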
Now what we have are three histograms. It's the same variable, petal width, but now I'm showing it separately for each of the three species, and it's really easy to see what's going on. Setosa is really low. Versicolor and virginica overlap, but they're still distinct distributions. This approach, by the way, is referred to as small multiples: making many versions of the same chart on the same scale, so it's easy to compare across groups or conditions, which is exactly what we're able to do here. Also, anytime you change the graphical parameters, you want to make sure to change them back to what they were before. So here I run par again and go back to one row and one column. And that's a good way of doing histograms for examining quantitative variables, and even for exploring some of the complications that arise when different categories have different scores on those variables.
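Rather than three near-identical blocks, the same small-multiples layout can be sketched as a loop over the species levels. When you set a value with `par()`, it returns the previous settings, so saving that return value and passing it back afterward restores the layout exactly. This is a slightly more robust alternative to resetting with `par(mfrow = c(1, 1))` by hand. The red/purple/blue colors follow the walkthrough. `xlim = c(0, 3)` and `breaks = 9` are again assumed values, and `old_par` and `cols` are names of my own choosing.

```r
library(datasets)

species <- levels(iris$Species)
cols <- c(setosa = "red", versicolor = "purple", virginica = "blue")

# Three rows, one column; par() returns the old settings invisibly
old_par <- par(mfrow = c(3, 1))

for (s in species) {
  hist(iris$Petal.Width[iris$Species == s],
       xlim = c(0, 3),   # same x scale in every panel (assumed limits)
       breaks = 9,       # only a suggestion; hist() may adjust it
       main = paste("Petal Width for", s),
       xlab = "",
       col = cols[[s]])
}

# Restore whatever layout was in effect before
par(old_par)
```

Because every panel shares the same x-axis limits, differences in position between species show up directly as horizontal shifts.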