In this video, I'm going to break down what all of statistics is about. Everything in statistics hangs on one core idea that is so big, you will not want to miss it. Start by reading the first sentence of the statistics Wikipedia page. The definition is not bad, but it could be more focused. I then went to ChatGPT and asked, "What is statistics?" It mostly copied the first couple of lines of the Wikipedia page. With ChatGPT and Wikipedia failing us, let me tell you what statistics is about. All of statistics is about fishing.

Before you click away, hear me out. I have some stories that I think will stick in your brain, and stories stick in the mind better than facts. So again, statistics is about fishing. Not just some of statistics; all of statistics is about fishing in some way.

A curious person wants to know the average length of fish in a lake. This person goes to the lake and starts fishing. He's able to catch a few fish. What he catches is called the sample. This is a really important word in statistics, loaded with additional meaning that I will explain in a second. Another important word is the population, which represents all of the fish: both the sample and the uncaught fish still swimming in the lake. All of statistics is about samples and populations.

Samples and populations have their own distinct alphabets in statistics. This is so statisticians can clearly communicate when they are talking about one or the other. For example, the average of a sample is represented with an x-bar. To represent the average of a population, the Greek letter mu is used. Descriptive statistics on samples are represented with Latin letters. The true parameter of the population, which can't be seen directly, is represented with Greek letters. So in the fishing story, the set of fish caught would be X1, X2, X3, all the way to Xn, the last fish.
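The fishing setup can be sketched in a few lines of Python. This is a minimal illustration, not a real study: the lake size, fish lengths, and catch size are all invented numbers.

```python
import random

# The population: every fish in the lake (normally unobservable).
# Lengths in centimeters; these values are invented for illustration.
random.seed(42)
lake = [random.gauss(30, 5) for _ in range(10_000)]

# The sample: the handful of fish our fisher actually catches, X1..Xn.
sample = random.sample(lake, 12)

x_bar = sum(sample) / len(sample)   # sample mean, written x-bar
mu = sum(lake) / len(lake)          # population mean, written mu

print(f"x-bar (sample mean):     {x_bar:.2f}")
print(f"mu    (population mean): {mu:.2f}")
```

In real life only `sample` would be visible; the whole game is using `x_bar` to reason about the `mu` we can't see.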
If we want to take the average of the sample of fish caught, it would be represented by the Latin x-bar. This is distinct from the average of the full population of fish, which is mu. Mu is the lowercase Greek m, and it stands for the mean.

Now you may say: look, this is a fun fishing example, but not all of statistics has to do with samples and populations. What about things like forecasting, data visualization using histograms, hypothesis testing, asymptotics, and statistical learning? Hang on tight, because I'm going to show how each of these fields is still about samples and populations. I'm also going to make a profound argument that applies to all of science at the end of the video. The spoiler is that all scientific areas of inquiry involve fishers.

Forecasting is about fishing. The sample is all the historical data, and the population is that historical data plus all future observations. The future observations are not known, so the forecaster has to put uncertainty bounds around their best guess. These bounds are called prediction intervals. If you want to know more about forecasting, check out some of my previous videos.

Histograms are about fishing. A histogram is the plotted sample of points, which is supposed to represent the underlying population of data. The fisher could plot the lengths of the fish caught and make some assumptions about how well his sample may or may not generalize to all the fish in the lake.

Asymptotics is fishing. Asymptotics is a branch of statistics that focuses on how quickly a sample estimate, such as x-bar, approaches the population parameter. I always felt like this was a little like computer science. In computer science, there's a concept of complexity, where big O notation is used to determine whether one algorithm requires fewer iterations than another. The algorithm with fewer iterations has a lower complexity.
This also holds for statistics, in an analogous way: swap out iterations for the sample size and algorithms for estimators. Some estimators are more efficient than others because they close in on population parameters with fewer samples. This is why experimental design is an important branch of statistics. In experimental design, statisticians have optimized studies so that a lot can be said with an astoundingly small set of data.

Statistical learning is fishing. Let's take one example of a statistical learning algorithm: regression. In regression, we end up with coefficients, and each coefficient comes with a confidence interval. These intervals are constructed so that, over many repeated samples, a known fraction of them (say, 95%) will cover the true population coefficient.

Bayesian epistemology is one way to reason in the sciences. It ties degrees of belief to the strength of the evidence. In this way, you can say most areas of scientific inquiry have some use for statistics. Bayesian epistemology is fishing. The sample represents all the evidence that we have to date, and the population is all the evidence that can be gathered now and in the future.

Thanks for watching, and we'll see you next time.
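As a small appendix to the regression example above, here is a minimal pure-Python sketch of a slope coefficient and its approximate 95% confidence interval. The data are simulated and every number is invented: by construction, the "true" population slope is 2.0, and the code shows the interval the sample alone would give us.

```python
import math
import random

# A hypothetical lake study: does water temperature predict fish length?
# Simulated data; the true population slope is 2.0 by construction.
random.seed(0)
n = 50
temp = [random.uniform(10, 25) for _ in range(n)]
length = [5 + 2.0 * t + random.gauss(0, 3) for t in temp]

# Ordinary least squares for a single predictor.
mean_x = sum(temp) / n
mean_y = sum(length) / n
sxx = sum((x - mean_x) ** 2 for x in temp)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temp, length))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual standard error, then the slope's standard error.
residuals = [y - (intercept + slope * x) for x, y in zip(temp, length)]
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
se_slope = s / math.sqrt(sxx)

# Approximate 95% confidence interval (using the normal critical value
# 1.96 instead of the exact t quantile, which is fine for n = 50).
lo, hi = slope - 1.96 * se_slope, slope + 1.96 * se_slope
print(f"slope = {slope:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The sample slope is our x-bar-style estimate; the unseen population slope is the mu-style parameter the interval is fishing for.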