Before we really get started in this course, there's one important piece of advice I always give everybody: when you're working with data, always do pictures first. The idea is that you want to take a look at your data, see what's in front of you, get the context, get the big picture, and maybe zoom in on some smaller things, but you need to start with a visual analysis. In other words, first graphs, then numbers. As far as I'm concerned, the graphs are the analysis, and the numbers are simply a way of lending precision to the graphs. I always tell my students when I teach statistics that every single thing we do in the class can be done visually. You can graph it, or you can calculate it with numbers, but graphs and visuals come first. Now, I can't talk about this without sharing the best-known example in the statistics community. It's called Anscombe's Quartet. A statistician, Francis Anscombe, back in the 70s, came up with a set of data that does something very unusual. We have four small sets of data here. What you need to see are the summary statistics on the bottom: the sample size; the mean, that's the average; the standard deviation, a measure of how spread out things are; the correlation between the two variables; and the regression equation with the slope and the intercept. See those numbers right there? They're the same as those numbers right there, which are the same as those numbers right there, which are the same as those numbers right there. So based on a cursory numerical analysis, these four data sets look identical. On the other hand, when you graph them, you see that doesn't really hold up. Here's the first set of data. This is a nice, normal-looking regression, with the points scattered a little bit around the line. This is what we're expecting in a regular analysis. On the other hand, the second pair of variables has this really clear but curved relationship.
Now, the kind of analysis we usually do is looking for straight lines, because that's the simplest possible way of describing an association between two things. You can draw a straight line through this, but you're doing damage to the relationship, because really you need to be modeling the curve. There are ways to do that, but they're relatively advanced, and personally I've never had to deal with them. The third set of variables, you see, has this really strong straight line, but there's one number that's popping up, and it's an outlier. The immediate question is, what's going on with that number? Why is it out of line? If you're monitoring, for instance, a manufacturing process, that might indicate something's broken. It might just be a faulty sensor, which would be nice, but it could be something about the manufacturing process itself. Again, our standard approach doesn't deal well with outliers and anomalies, so you've got to be careful about that. And then the last one, set four, is a very bizarre, really kind of pathological data set, where all of the numbers are jammed over here on the left with this one extreme outlier on the right, which would normally be an indication that something has gone terribly, terribly wrong in the data gathering. So simply by looking at them, you can see that there's an enormous difference, even though, based on common statistical measures, they were identical. Now, these are artificial data sets. Anscombe had to work very hard to find these four sets of data that diverge so much graphically. But closer to real life, there are other reasons why you would want to do a graphical analysis first. So for instance, take this. This is hypothetical data about donations to a nonprofit organization by month. You can see that on the left side, things are motoring along kind of fine, about 5,000 bucks a month, great, and then suddenly, halfway through, it more than doubles.
It settles down a little bit, then it goes up another 10,000, and it settles down, then it goes up another 10,000, and then it drops by 50%. The graphical analysis, just a simple line here, lets you know that we've got things going on, and we need to figure out what's making it jump up at each of these times, because maybe we can do more of that, and also what's making it drop off so much at the end. This is the way that even a visual analysis can guide the follow-up questions and the actions that you take within your organization, as a way of better understanding what's happening and allowing you to maximize your effectiveness and your efficiency in reaching your own goals. And besides, there's another thing: people are very good at visual analysis. It's a high-information-density medium. So by getting a picture, by getting visuals, you're going to get a lot more insight than you could with simple tables of numbers. The moral of all of this is simply: always start by looking. Remember, begin with graphs, and then follow them up with any numerical analysis that you want to do. In doing so, you'll get a lot more insight, and you'll avoid a lot more problems on the way to getting the information that you need.
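If you want to check the Anscombe's Quartet claim for yourself, here is a minimal sketch in Python. It is not something shown in the lecture. The data values are the quartet as it is commonly tabulated, and the names `QUARTET`, `summarize`, and `SUMMARIES` are made up for this illustration. It computes the same summary statistics mentioned above (sample size, mean, standard deviation, correlation, and the regression slope and intercept) for each of the four sets.

```python
import statistics

# Anscombe's Quartet as commonly tabulated. Sets I-III share the same x values.
# The dictionary name and layout are assumptions made for this sketch.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics for one pair of variables."""
    n = len(x)
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sd_x = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    sd_y = statistics.stdev(y)
    # Sample covariance, computed by hand so the sketch runs on older Pythons.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    r = cov / (sd_x * sd_y)              # Pearson correlation
    slope = cov / sd_x ** 2              # least-squares slope of y on x
    intercept = mean_y - slope * mean_x  # least-squares intercept
    return {"n": n, "mean_x": mean_x, "mean_y": mean_y, "sd_x": sd_x,
            "sd_y": sd_y, "r": r, "slope": slope, "intercept": intercept}


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, s in SUMMARIES.items():
        print(f"Set {name}: n={s['n']}, mean_x={s['mean_x']:.2f}, "
              f"mean_y={s['mean_y']:.2f}, sd_x={s['sd_x']:.2f}, "
              f"sd_y={s['sd_y']:.2f}, r={s['r']:.2f}, "
              f"y = {s['intercept']:.2f} + {s['slope']:.2f}x")
```

Rounded to two decimal places, the four printed rows should match, which is exactly why the numbers alone can't tell these sets apart. Plotting each x and y pair as a scatterplot, with whatever plotting tool you have on hand, is what brings out the curve, the outlier, and the single extreme point.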