Welcome to Statistics and Data Science. I'm Barton Poulsen, and in this course we're going to talk about some of the ways you can use statistics to see the unseen, to infer what's there even when most of it's hidden. Now, this shouldn't be a surprise if you remember the Data Science Venn diagram we talked about a while ago. We have math up here in the top right corner, but if you go back to the original description of this Venn diagram, its full name was math and stats. And let me mention, in case it's not completely obvious, why statistics matters to data science. The idea is this: counting is easy. It's easy to say how many times a word appears in a document. It's easy to say how many people voted for a particular candidate in one part of the country. Counting is easy. But summarizing and generalizing, those things are hard. Part of the problem is that there's no such thing as a definitive analysis; every analysis depends on the purposes you're working toward. As an example, let me give you a few pairs of words, and try to summarize the difference within each pair in just a word or two. How is a soufflé different from a quiche? How is an aspen different from a pine tree? How is baseball different from cricket, and how are musicals different from opera? It really depends on who you're talking to, it depends on your goals, and it depends on the knowledge you share. So there's not a single definitive answer. Then there's the matter of generalization. Take music again: listen to three concertos by Antonio Vivaldi. Do you think you can then safely and accurately describe all of his music? Now, I actually chose Vivaldi on purpose, because Igor Stravinsky reportedly said you could: he said Vivaldi didn't write 500 concertos, he wrote the same concerto 500 times. But take something more real-world, like politics.
If you talk to 400 registered voters in the US, can you then accurately predict the behavior of all of the voters? There are well over 100 million voters in the US. That's a matter of generalization, and that's the sort of thing we try to take care of with inferential statistics. Now, there are different methods you can use in statistics, and all of them are designed to give you a sort of map, a description of the data you're working with. There are descriptive statistics and there are inferential statistics. Within inference, there's the procedure of hypothesis testing, and there's also estimation. I'll talk about each of those in more depth. There are also a lot of choices that have to be made, and some of the things I'm going to discuss in detail are, for instance, the choice of estimators, which is different from estimation; different measures of fit; and feature selection, for knowing which variables are the most important in predicting your outcome. I'll also cover common problems that arise when trying to model data, and the principles of model validation. But through all of this, the most important thing to remember is that analysis is functional: it's designed to serve a particular purpose. There's a wonderful quote in the statistics world from George Box: "All models are wrong, but some are useful." All statistical descriptions of reality are wrong because they're not exact depictions; they're summaries. But some of them are useful. So you're not trying to be totally, completely accurate, because if that were the standard, you'd never do an analysis at all. The real question is: are you better off doing your analysis than not doing it? And truthfully, I bet you are. So in sum, we can say three things. Number one, you want to use statistics both to summarize your data and, if you can, to generalize from one group to another. Number two, there's no one true answer with data; you've got to be flexible in terms of your goals and the shared knowledge.
And number three, no matter what you're doing, the utility of your analysis should guide your decisions.