In our two previous videos, we looked at some basic graphics for one variable at a time. We looked at bar charts for categorical variables, and we looked at histograms for quantitative variables. While there's a lot more you can do with univariate distributions, you also might want to look at bivariate distributions. We're going to look at scatter plots as the most common version of that. You do a scatter plot when you want to visualize the association between two quantitative variables. Now, it's actually more flexible than that, but this is the canonical case for a scatter plot.
And when you do that, what sorts of things do you want to look for in your scatter plot? I mean, there's a purpose in it. Well, number one, you want to see if the association between your two variables is linear, that is, if it can be described by a straight line, because most of the procedures that we do assume linearity. You also want to check if you have consistent spread across the scores as you go from one end of the x axis to the other, because if things fan out considerably, then you have what's called heteroscedasticity, and it can really complicate some of the other analyses. As always, you want to look for outliers, because an unusual score, or especially an unusual combination of scores, can drastically throw off some of your other interpretations. And then you want to look for the correlation. Is there an association between these two variables? So that's what we're looking for.
Let's try it in R. Simply open up this file and let's see how it works. The first thing we need to do in R is come down and load the datasets package. Just press Command or Control and Enter, and we'll load the datasets. We're going to use mtcars. We looked at that before. It's got a little bit of information. It's road test data from 1974. And let's look at the first few cases. I'll zoom in on that. Again, we have miles per gallon, cylinders, and so on and so forth.
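The setup steps just described can be sketched in R roughly like this. It assumes a standard R installation, where the datasets package and mtcars ship with base R; the exact lines in the course file may differ.

```r
# Load the built-in datasets package (usually attached by default,
# so this call is harmless if it is already loaded)
library(datasets)

# Look at the first few cases of mtcars
head(mtcars)
```

The head() call prints the first six rows, including the mpg, cyl, and wt columns used in the rest of this walkthrough.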
Now, anytime you're going to look at an association, it's a really good idea to look at the univariate, or one variable at a time, distributions as well. We're going to look at the association between weight and miles per gallon. So let's look at the distribution for each of those separately. I'll do that with a histogram. I do hist, and then in parentheses I specify the data set, mtcars in this case, and then a dollar sign to say which variable in that data set. So there's the histogram for weight, and you know, it's not horrible, though it looks like we've got a few on the high end there. And here's the histogram for miles per gallon. Again, mostly kind of normal, but a few on the high end.
But let's look at the plot of the two of them together. Now, what's interesting is I just use the generic plot command. I feed that in, and R is able to tell that I'm giving it two quantitative variables and that a scatter plot is the best kind of plot for that. So we're going to do weight and miles per gallon. And then let me zoom in on that. And what you see here is one circle for each car at the joint position of its weight and its miles per gallon. And it's a strong downhill pattern. Not surprisingly, the more a car weighs, and we have some in this data set that are more than 5,000 pounds (weight is recorded in thousands of pounds), the lower its miles per gallon. We get down to about 10 miles per gallon here. The smallest cars, which appear to weigh substantially under 2,000 pounds, get about 30 miles per gallon.
Now, this is probably adequate for most purposes, but there are a few other things that we can do. So for instance, I'm going to add some color here. I'm going to take the same plot and then add on additional arguments. I say use a solid circle: pch is for point character, and 19 is a solid circle. cex has to do with the size of things, and 1.5 means make them 150% of the default size. col is for color, and I'm specifying a particular red, the one for datalab, as a hex code. I'm going to give a title. I'm going to give an x label and a y label. And then we'll zoom in on that. And now we have a more polished chart, and because of the solid red circles, it's easier to see the pattern that's going on in there, where we've got some really heavy cars with really bad gas mileage, and a strong, nearly linear association up to the lighter cars with much better gas mileage.
And so a scatter plot is the easiest way of looking at the association between two variables, especially when those two variables are quantitative, so they're on a scaled or measured outcome. And that's something that you want to do anytime you're doing your analysis: first visualize it, and then use that as the introduction to any numerical or statistical work you do after that.
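Putting the steps from this walkthrough together, a minimal sketch in base R might look like the following. The hex color, title, and axis labels here are assumptions chosen for illustration, since the transcript does not spell them out, and the final correlation line is just one possible numerical follow-up.

```r
library(datasets)

# Univariate distributions first: one histogram per variable
hist(mtcars$wt)
hist(mtcars$mpg)

# Basic scatter plot: plot() draws a scatter plot for two numeric vectors
plot(mtcars$wt, mtcars$mpg)

# Same plot with the extra arguments described above
plot(mtcars$wt, mtcars$mpg,
     pch  = 19,         # point character: 19 is a solid circle
     cex  = 1.5,        # points at 150% of the default size
     col  = "#cc0000",  # a red hex code (assumed value, for illustration)
     main = "MPG as a Function of Weight of Cars",  # assumed title
     xlab = "Weight (in 1000 pounds)",              # assumed x label
     ylab = "MPG")                                  # assumed y label

# A numerical follow-up to the visual check: Pearson correlation
r <- cor(mtcars$wt, mtcars$mpg)
r
```

The correlation comes out strongly negative, which matches the downhill pattern in the chart.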