Hello, everyone. Today we're going to be talking about the central limit theorem, and this is going to be a relatively quick slide. It's not overly difficult, but it can be a little hard to wrap your head around at first; once you understand it, it goes pretty quickly. You might have noticed that so far we've been quite concerned with means, with calculating the middle points of our data. Why are we so concerned with means, you might be asking? First, they give us a middle ground for comparison: they tell us what an average data point looks like and how far away any given point is from that average, and it's often very interesting to figure out how far we are from our average. Second, means are easy to calculate. Finding the middle point of our data is relatively easy and a good place to start whenever we're trying to make sense of what our data is telling us. The central limit theorem very much relates to calculating means. In statistics it is extremely useful for analyzing your data, and it states that if we collect samples of size n, with a large enough n, calculate each sample's mean, and create a histogram of those means, then the resulting histogram will tend to have an approximately normal bell shape, that is, a normal distribution. This is very useful when, for example, we want to say something about a distribution that is not normally distributed: by calculating each sample's mean and creating a histogram of those means, we get a way to analyze that distribution the way we would a normal one. The collection of sample means eventually becomes normal itself.
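The statement above is easy to check with a quick simulation. Here is a minimal sketch in Python with NumPy (my own illustration, not part of the lecture): the population is exponential, nowhere near bell-shaped, yet the sample means cluster around the population mean with spread sigma divided by the square root of n, and their histogram comes out roughly bell-shaped.

```python
import numpy as np

rng = np.random.default_rng(42)

# A clearly non-normal population: exponential with mean 1 and std dev 1.
n = 30              # size of each sample
num_samples = 5000  # how many samples (and therefore means) we collect

# Draw 5000 samples of size n and record each sample's mean.
sample_means = rng.exponential(scale=1.0, size=(num_samples, n)).mean(axis=1)

# The means cluster around the population mean (1.0) with spread
# sigma / sqrt(n) = 1 / sqrt(30), about 0.18, in a rough bell shape.
counts, edges = np.histogram(sample_means, bins=15)
print(round(float(sample_means.mean()), 2), round(float(sample_means.std()), 2))
```

Plotting `counts` against `edges` (for example with matplotlib) shows the bell forming, even though a histogram of the raw exponential draws would be heavily skewed.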
For the central limit theorem, it does not matter what the distribution of the original population is, because we're looking at the means of the samples we collect; whatever distribution the original data has, taking the means essentially converts it into a normal distribution. The distributions of sample means and of sample sums both tend to follow the normal distribution, and the more samples we have, the more like a normal distribution they become. The sample size n that is required in order to be "large enough" depends on the original population from which the samples are drawn. Like we've talked about before, if we want the features of our population to be represented accurately, we need enough samples to describe those features. As a hard and fast rule, we need at least 30 samples, or we need the samples to come from a normal distribution. If the original population is already normally distributed, we don't have to worry so much about having a large number of samples, because we're essentially just going from one normal distribution to another; it's already described, let's say. But if our data is not normally distributed, then we need at least 30 samples, and the further from normal the distribution is, the more samples we're going to require. Finally, sampling is done with replacement.
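To see why the sample size matters, here is a small Python sketch (my own, not from the lecture): it measures how skewed the distribution of sample means remains for n = 2 versus n = 30, starting from a skewed exponential population. Skewness is zero for a symmetric bell shape, so shrinking skewness means we are getting closer to normal.

```python
import numpy as np

rng = np.random.default_rng(7)

def skewness(x):
    # Simple sample skewness: roughly 0 for a symmetric (e.g. normal) shape.
    return float(np.mean((x - x.mean()) ** 3) / x.std() ** 3)

# Skewed population: exponential (population skewness is 2).
results = {}
for n in (2, 30):
    means = rng.exponential(size=(20000, n)).mean(axis=1)
    results[n] = skewness(means)
    print(n, round(results[n], 2))
```

With n = 2 the means are still visibly skewed; by n = 30 the skew has mostly washed out, which is the intuition behind the "at least 30 samples" rule of thumb.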
So whenever we take samples from some population that is possibly not normally distributed, we sample with replacement: everything we take out, we put back in before we draw the next sample. As a summary or review, the central limit theorem says that if you keep drawing larger and larger samples and calculating their means, the sample means form their own normal distribution. We're essentially rearranging the data, converting it from one distribution into a normal distribution for analysis, and there are lots of reasons we would do that; I'll show you some examples in a second. The central limit theorem for sums says that if you keep drawing larger and larger samples and taking their sums instead of their means, the sums also form their own distribution, which approaches a normal distribution as the sample size increases. So we can do this with both the means and the sums of our samples, and again we'll do a practice to show you that in a second. If you're being asked to find the probability of a mean, use the central limit theorem for the mean: what is the probability that a sample from some population has a particular mean? If you're being asked to find the probability of a sum or a total, use the central limit theorem for sums. Think of it like the normal distribution we talked about last time: we start from whatever population or samples we already have, which may or may not be normally distributed.
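The "which version do I use" question comes down to two standard relations: the sample mean is approximately normal with mean mu and standard deviation sigma / sqrt(n), while the sample sum is approximately normal with mean n * mu and standard deviation sigma * sqrt(n). Here is a sketch in Python using only the standard library; the population numbers (mean 100, standard deviation 15, sample size 36) are hypothetical values I chose for illustration.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical population: mean 100, std dev 15; samples of size 36.
mu, sigma, n = 100, 15, 36

# Probability question about a MEAN -> CLT for means:
# sample mean ~ Normal(mu, sigma / sqrt(n)).
p_mean_below_95 = normal_cdf((95 - mu) / (sigma / sqrt(n)))

# Probability question about a SUM -> CLT for sums:
# sample sum ~ Normal(n * mu, sigma * sqrt(n)).
p_sum_below_3500 = normal_cdf((3500 - n * mu) / (sigma * sqrt(n)))

print(round(p_mean_below_95, 4))   # about 0.023, since z = -2 here
print(round(p_sum_below_3500, 4))
```

Either way the recipe is the same: standardize with the appropriate mean and standard deviation, then look up the normal probability, exactly as with the normal distribution from last time.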
We want to be able to calculate the probability of a sample's mean, what's the probability that we're going to find a group with this mean, or the probability that we're going to find a group with a particular sum of values. That's when we use the central limit theorem for means or the central limit theorem for sums. This allows us to examine approximately normal distributions, and it provides useful relations for the mean and standard deviation. We're essentially looking at the data in a slightly different way, which lets us work with distributions that are not normal in a slightly easier way as well. With small sample sizes, distributions of sample means are not normal in general and hence are difficult to use to gain useful knowledge about the population. We've already talked about this quite a bit: if we don't have enough data, we most likely are not getting a representative sample of the overall population, so we need as much data as possible to make sure we're representing the population accurately. With the central limit theorem and a large enough sample size, we can examine approximately normal distributions and easily obtain valuable properties of our population. Distributions that are not quite normal, or sometimes very far from normal, are more difficult to work with than the normal distribution we've already covered; here we're essentially transferring at least some of their features into a normal distribution that we can then work with relatively easily. So here are two practice tasks for you. For the first, roll 10 dice; you can use the dice-rolling tool at random.org. I want you to record the dice rolls and calculate the mean for each roll. Now, what are we actually calculating here?
Well, we're illustrating the central limit theorem over these dice rolls, because we're recording only the mean. If we do 50 rolls and generate a histogram of those mean values, the question is: what should the distribution be? I want you to try this yourself. I think we can all guess what the distribution is going to be, but even so, try it: first roll 10 dice and just look at their raw distribution, then roll 10 dice repeatedly, calculate the mean for each roll, and think about what the two distributions would be. I urge you to at least practice this; it will really help bring home what the central limit theorem does to our distribution. Practice two is, again, roll 10 dice using the same virtual dice, but this time record the sum total for each roll. If we do 50 rolls, or a minimum of 30, then we generate a histogram based on the sum totals. What should that distribution be? Again, these are practice for you. You can probably guess what the distribution is going to be, but I want to bring home exactly how this works and what kind of information we can actually get out of it. So that's pretty much it. I'm going to add some supplementary videos this week for your reference, in case you're not really getting the idea of what the central limit theorem is for, how you can use it, or how you calculate it. And that's pretty much it for this week. Thank you very much.
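As a quick appendix, here is one way to try both practice tasks in code rather than with the virtual dice at random.org; this is a sketch using Python's standard library, and the helper name is my own.

```python
import random

random.seed(1)  # fixed seed so the rolls are reproducible

def roll_dice(k=10):
    """Roll k six-sided dice (independent draws, i.e. with replacement)."""
    return [random.randint(1, 6) for _ in range(k)]

rolls = [roll_dice() for _ in range(50)]  # 50 rolls of 10 dice

# Practice 1: the mean of each roll.  A single die is uniform (flat),
# but these means pile up around 3.5 in a rough bell shape.
means = [sum(r) / len(r) for r in rolls]

# Practice 2: the sum of each roll.  The sums pile up around 35.
sums = [sum(r) for r in rolls]

# A quick text histogram of the sums, to watch the bell forming.
for lo in range(20, 50, 5):
    count = sum(lo <= s < lo + 5 for s in sums)
    print(f"{lo:2d}-{lo + 4:2d} {'#' * count}")
```

Comparing this histogram with a histogram of the individual die faces (which is flat) is exactly the contrast the two practice tasks are meant to show.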