Data science wouldn't be data science without a little bit of statistics, so I'm going to give you a brief overview of how statistics works in data science. You can think of statistics as an attempt to find order, to find patterns, in an overwhelming mess. It's sort of like trying to see both the forest and the trees.

Let's go back to our little Venn diagram: we recently had math and stats up in the top corner, and now we're going to talk about stats in particular. One thing you're trying to do is explore your data. You can use exploratory graphics, because we're visual people and it's usually easiest to see things. You can use exploratory statistics, a numerical exploration of the data. And you can use descriptive statistics, which are what most people covered if they took a statistics class in college.

Next, there's inference. I've got smoke here because you can infer things about the wind and the air movement by looking at patterns in the smoke. The idea is that you take information from a sample and use it to infer something about a population, going from the part you observed to the whole you care about. One common version of this is hypothesis testing. Another common version is estimation, usually reported as confidence intervals. There are other ways to do it, but all of these let you go beyond the data at hand to draw larger conclusions.

One interesting thing about statistics is that you have to be concerned with some of the details and arrange things just so. For instance, you do feature selection, which means picking which variables, or combinations of variables, should be included in your model. Problems come up frequently here, and I'll address some of those in later videos. There's also the matter of validation: when you create a statistical model, you have to check whether it's actually accurate.
Ideally, you have enough data to test the model on a holdout sample, or you can replicate the study. Then there's the choice of estimators: how you actually get the coefficients or the combinations in your model. And then there are ways of assessing how well your model fits the data. These are all issues I'll address briefly when we talk about statistical analysis at greater length.

Now, I do want to mention one thing in particular, which I call "beware the trolls." There are people out there who will tell you that if you don't do things exactly the way they say, your analysis is meaningless, your data is junk, and you've wasted all your time. You know what? They're trolls. Don't listen to them. You can make a sufficiently informed decision on your own to go ahead and do an analysis that is still useful.

Probably one of the most important things to keep in mind here is a wonderful quote from the famous statistician George Box: "All models are wrong, but some are useful." The question isn't whether you're technically right or have achieved some level of intellectual purity, but whether you've done something useful. I like to think of it this way: wave your do-it-yourself flag, and take pride in what you're able to accomplish, even when there are people who may criticize it. Go ahead: you're doing something, so go do it.

So, in sum, statistics allow you to explore and describe your data, and they allow you to infer things about the population. There are a lot of choices available and a lot of procedures, but no matter what you do, the goal is useful insight. Keep your eyes on that goal, and you will find something meaningful and useful in your data to help you in your own research and projects.