Statistics and Excel. Statistical inference. Questions of how close and how confident. Got data? Let's get stuck into it with statistics and Excel.
Introduction to statistical inference. Statistical inference is the process of using data from a sample to make estimates or test hypotheses about a population. You will recall from prior presentations that we talked about two major buckets, or categories, of statistics. The first bucket is where we know all the information. We have all the data. Our goal is to organize that data in such a way that we can draw meaning from it, using mathematical tools like calculating the average or mean, the median, the quartiles, and so on, and using pictorial tools such as the box and whiskers (or box plot) and the histogram.
The second major bucket of statistics, which is what we're focused on now, is where we don't know all of the data for the entire population. However, we might be able to get a sample of data from the population. What we want to do there is extract meaning from the sample data that we have. Once we have extracted that meaning, we're hoping that we can infer some of that meaning onto the entire population.
So with inference, the major thing that comes to most people's minds is an election type of situation, where pollsters take polls to try to determine what the results of the election will be. They take a sample of the population and see if they can infer from the results they have there what's going to happen in the entire voting population when the actual election happens.
So note, in this type of statistical analysis, the actual sample that we're looking at, the data that we're actually analyzing, is not the goal. That's not the important point. Those people are not important in and of themselves. They're important as people, but not for the statistical poll. What we're trying to get at is inference about the entire population.
We're looking for meaning for the entire population. So we're going to use similar tools as with the first bucket of statistics, where we already know the entire population and are just trying to extract meaning from it. With the sample, we're still going to be looking at things like the average or mean, the median, and so on, and the spread of the data. But what we're hoping to do once we know that is infer it onto the entire population, so that we have predictive power about the entire population.
So the key goal is to provide an approximate description of the larger population based on the observable data from a smaller sample. It's not our goal to just know everything about the small sample. We want to know information about the small sample so that we can infer it to the larger population. So the question is how close the shape, center, and spread of just some of the population are to the shape, center, and spread of the whole population.
You will recall from prior presentations that oftentimes, when we're looking at a data set, what we want to get from it is the middle point of the data set, the shape of the data set, and so on. We want to know those characteristics for the sample, not so that we can understand the sample better, but so that we can infer from it and ask: is this going to be similar for the entire population? It's the entire population that is important, although we're of course going to be analyzing the sample in a similar way as we did when we knew the entire population and were just trying to get an understanding of the data that we had.
And then this question of how confident we are gets to be quite tricky, and we'll dive into that more in future presentations. For now, we just want to get an idea of what we're doing with statistical inference. But once you take the sample, the questions are: can we infer from that sample to the entire population, and how confidently can we do that?
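To make the "how close" question concrete, here is a minimal Python sketch rather than anything from the presentation itself. It assumes a made-up population of 100,000 values (center 50, spread 10), a sample of 500, and a hypothetical helper named `summarize`. In a real inference problem we would not have the full population; it is simulated here only so the sample's center and spread can be checked against it.

```python
import random
import statistics


def summarize(values, is_sample):
    """Return center and spread measures for a list of numbers."""
    # Sample standard deviation for a sample, population version otherwise.
    spread = statistics.stdev(values) if is_sample else statistics.pstdev(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": spread,
    }


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population (assumed numbers, for illustration only).
population = [rng.gauss(50, 10) for _ in range(100_000)]

# The sample is the only data we would actually get to observe.
sample = rng.sample(population, 500)

pop_summary = summarize(population, is_sample=False)
sample_summary = summarize(sample, is_sample=True)

# Compare each measure and report how far the sample is from the population.
for name in pop_summary:
    gap = abs(sample_summary[name] - pop_summary[name])
    print(
        f"{name:>6}: population={pop_summary[name]:.2f} "
        f"sample={sample_summary[name]:.2f} gap={gap:.2f}"
    )
```

The gaps printed at the end are the "how close" part. The "how confident" part, meaning how large a gap we should expect from a sample of this size, is the trickier question left for later.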
Can we do that with a certain level of confidence? The more numerical data we can get about how confident we are, the better predictive power we have, and typically the better tools we have as well.
So, practical applications of statistical inference. Election polling is usually the first thing that comes to people's minds. When an election is impending, it is not feasible to ask every single voter who they will vote for. When we're trying to predict the results of the election, we can't just ask everyone, because at that point we would basically be holding the election. What we can do is have pollsters take a sample, say a thousand voters, and based on their responses, try to estimate the voting pattern of the entire electorate. So they're going to take a sample, get the data on the sample, and see if they can infer those results to the entire population. The challenge of statistical inference is to extrapolate from the small sample to the larger population. That's what our goal is going to be.
Medical trials are another common example. Anytime we're doing any kind of scientific testing, the hypothesis testing approach, which is the foundational thing that you're going to do in science, is oftentimes going to be a statistical kind of test. In medicine, for example, in testing a new drug, it is not possible to test the medication on the entire population. So if we're trying to say, is this drug effective? If someone takes this drug, will it do what we think it's going to do? Well, we can't test the entire population, or even the entire population that is sick. What we do is take a smaller group and then test on the smaller group. Now obviously, there's a lot more to the testing than this, which we'll get into in future presentations, because you have things like the placebo effect.
And when you get into the polling, you have questions like: can you reach the entire population? And how exactly are we going to pick the sample? We'll get into all of those nuances in future presentations, and they are very important nuances, not just minor things. But you get the general gist of what we're doing here. So, based on the reactions of this group, we infer the drug's effectiveness and its potential effects on the larger population.
All right, let's take a look at a scenario. Imagine that we have height data for a specific group of adult men, and our objective is to gain a comprehensive understanding of the height distribution for all adult men. Now, height is a good one to test out when you're first looking at the concept of this inference type of analysis because, for one, heights normally come out as something like a bell-shaped distribution. In other words, when you look at the height of men, for example, most people tend toward a middle point. Most people are going to be somewhere around an average height.
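As a rough illustration of that bell shape, the Python sketch below simulates a sample of 1,000 heights and prints a crude text histogram. The figures used, a mean of 175 cm and a standard deviation of 7 cm, are assumptions made up for this example and are not real measurements or numbers from the presentation.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Assumed, illustrative figures only (not real height data).
ASSUMED_MEAN_CM = 175.0
ASSUMED_SD_CM = 7.0

# Simulated sample of 1,000 adult male heights.
sample_heights = [rng.gauss(ASSUMED_MEAN_CM, ASSUMED_SD_CM) for _ in range(1_000)]

sample_mean = statistics.mean(sample_heights)
sample_sd = statistics.stdev(sample_heights)

# Share of the sample within one standard deviation of the sample mean.
near_middle = sum(
    abs(h - sample_mean) <= sample_sd for h in sample_heights
) / len(sample_heights)

# Count heights in 5 cm bins to get a crude picture of the shape.
bins = {}
for h in sample_heights:
    low = int(h // 5) * 5
    bins[low] = bins.get(low, 0) + 1

# One '#' per 10 people in the bin.
for low in sorted(bins):
    print(f"{low}-{low + 5} cm | {'#' * (bins[low] // 10)}")

print(f"sample mean: {sample_mean:.1f} cm, sample sd: {sample_sd:.1f} cm")
print(f"share within one sd of the mean: {near_middle:.0%}")
```

The longest bars should sit around the middle bins and taper off on both sides, which is the "most people are around an average height" pattern described above.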