then you have most of the distribution somewhat around the average, and as you get toward very tall or very short, far fewer people fall into those categories. So heights have that bell-shaped type of distribution. People also already have a sense of what the height distribution should look like just from observing others, so you have an idea of what to expect, and when you run the analysis you can picture it in your mind. Note, however, that with many other kinds of tests we might run, the data may have different distributions and we might have no idea what the results will look like. We might be testing something we know nothing about, so it's good to start with height. So we could potentially select a small number of men whose heights seem to reflect the heights of all men. The question, of course, is: how am I going to take a sample of men? I can't measure every man in the population to find the average height, but I can take a sample. Well, how am I going to take the sample? You might say, I have an idea in my mind of roughly how tall people are, so why don't I just choose men that I think look about average, and select men whose distribution I think mirrors the actual distribution? Then it would be easy for me to pick my sample. However, we're starting with a lack of knowledge about the overall distribution of heights across the entire population of adults. So the problem with that approach is that you're assuming you already know the answer to the problem you're trying to solve, right? If we already knew the average height, then of course picking a sample would be easy.
It would also defeat the point, because we would simply be picking a sample that ties out to the actual heights. This seems obvious, but this kind of thing happens sometimes, right? Because we start to think: if I'm going to pick a sample, and I already have some knowledge about the population, it would be easier to just pick people I know are roughly in the middle already. But clearly, by doing that, you're inserting your own bias into the sample. You're trying to help things along because you think you know something about the world, and you pick a sample that reflects what you already believe. In doing that, you put a bias into the sample, and if your assumption is wrong, that bias is going to mess up the whole thing. So the whole point is that you have to have some kind of randomness involved. This is going to be the key for the statistics: when we pick the sample, we have to have randomness. Now, sometimes the selection scheme can be more complex than pure randomness, but there is always some form of randomness in picking a sample, because we want to remove the biases when we pick it. So we're going to use the idea that we don't know. We're going to say: I know I don't know, right? It's a tough thing to do. I'm not going to try to help when I know I have no idea what I'm doing. I'm going to use the fact that I have no idea, pick completely randomly, and go from there, and then we'll have an unbiased set of data. That's the idea. So random selection is crucial to gathering a representative sample, and this key point is going to come up again and again.
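The contrast above can be sketched in a few lines of Python. This is a toy illustration, not real data: the population of heights is simulated from a normal distribution with an assumed mean of 175 cm and standard deviation of 7 cm, and the "biased" sampler mimics hand-picking only men who look about average.

```python
import random
import statistics

# Hypothetical population of 10,000 adult male heights in cm
# (normal with mean 175, sd 7 -- assumed values for illustration).
random.seed(42)
population = [random.gauss(175, 7) for _ in range(10_000)]
pop_mean = statistics.mean(population)

# Random sample: every individual has an equal chance of selection.
random_sample = random.sample(population, 100)

# Biased "sample": hand-picking only people within 3 cm of a guessed
# center -- the mistake described above.
biased_sample = [h for h in population if abs(h - 175) < 3][:100]

print(f"population mean:     {pop_mean:.1f}")
print(f"random sample mean:  {statistics.mean(random_sample):.1f}")
print(f"random sample stdev: {statistics.stdev(random_sample):.1f}")
print(f"biased sample stdev: {statistics.stdev(biased_sample):.1f}")
```

The random sample's mean and spread track the population; the hand-picked sample drastically understates the spread because the picker's prior guess was baked into the selection.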
If we want a sample that tells us something about the entire population, we generally have to use randomness in some way, shape, or form to pick it, and that can be more nuanced; we'll look at problems like that in the future, but that's the key concept. The concept of randomness ensures that every individual in the population has an equal chance of being selected for the sample. Now, in some cases you can actually achieve that; in real life, often you cannot. If you're taking a poll for voting, for example, it's very difficult to guarantee that everybody in the voting population has an equal chance of being selected, because how are you going to do that? You may only have phone numbers, and it's going to be difficult to contact people who don't use the phone anymore and maybe only communicate by messages. So in the real world, when we apply this concept, we have to adapt it to what is practical and take the effect of that into consideration. But the goal, of course, is that everybody in the population has an equal chance of being selected, so that we have a true random sample of the population. This reduces bias and improves the reliability of our inferences about the population. And by the way, we'll get into this more in the future, but note that with polling, for example, if you only poll people who have a telephone number in the phone book, a lot of people these days might not be listed, or might not answer their phone, and the people who are listed, do answer, and are actually willing to talk to a pollster might have different voting patterns than other people.
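That polling problem can be simulated in a few lines. All the numbers here are invented assumptions for illustration: 60% of voters are reachable by phone, reachable voters support a candidate at 55%, and unreachable voters at 40%. The poll can only draw from the reachable group, so its estimate is biased even though the draw itself is random.

```python
import random

# Toy simulation of coverage bias in polling -- all rates are
# hypothetical assumptions, not real polling data.
random.seed(1)

# 6,000 reachable voters (55% support) and 4,000 unreachable (40% support).
population = (
    [("reachable", random.random() < 0.55) for _ in range(6000)]
    + [("unreachable", random.random() < 0.40) for _ in range(4000)]
)
true_support = sum(v for _, v in population) / len(population)

# A phone poll can only sample randomly from the reachable group.
reachable = [v for group, v in population if group == "reachable"]
poll = random.sample(reachable, 500)
poll_estimate = sum(poll) / len(poll)

print(f"true support:  {true_support:.3f}")
print(f"poll estimate: {poll_estimate:.3f}")
```

Random selection *within* the phone list doesn't fix the problem: the sampling frame itself excludes part of the population, so the estimate drifts toward the reachable group's opinion.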
So you can see why there are problems we sometimes have to take into consideration when applying these concepts in the real world. So, estimates and confidence intervals. Statistical inference provides us with an estimate of population parameters, like the mean or a proportion, but it also provides a range of values around the estimate that is likely to contain the true population parameter. This range is called the confidence interval, and the likelihood that this interval contains the true value is called the confidence level. So if I analyze a sample of the population, I can get a mean for that sample and I can see its distribution, but then the question is: if I try to infer that answer for the entire population, how confident am I in the results that I have? That gets to be a more nuanced question, and we would like to lock it down mathematically as much as we can, because the level of confidence will give us a lot more predictive power in the future. So, for example, we might be 95% confident that the population mean lies within a certain interval. So you've probably heard.
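As a concrete sketch of that last idea, here is one common way to compute a 95% confidence interval for a mean: sample mean plus or minus 1.96 standard errors, using the normal approximation. The sample below is simulated (assumed mean 175 cm, sd 7 cm); with small samples you would use a t critical value instead of z = 1.96.

```python
import random
import statistics

# Hypothetical sample of 100 adult heights in cm -- illustrative data only.
random.seed(0)
sample = [random.gauss(175, 7) for _ in range(100)]

n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / n ** 0.5  # standard error of the mean

# 95% confidence interval via the normal approximation (z = 1.96).
z = 1.96
lower, upper = mean - z * sem, mean + z * sem
print(f"mean = {mean:.1f} cm, 95% CI = ({lower:.1f}, {upper:.1f})")
```

The interpretation matches the transcript: if we repeated this sampling procedure many times, about 95% of the intervals built this way would contain the true population mean.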