Hello there. In this video we are going to talk about the incredible central limit theorem. You may not think it's incredible when you first learn about it, but this is actually one of the most powerful theorems in statistics, in that it allows us to make inferences and guesses about the true values of population means and population proportions. Very, very powerful.

So what's going to be happening now is this: instead of finding the probability that some data value is less than, greater than, or between some designated values, we are now going to look at the probability that a sample mean, the mean of a sample, is less than, greater than, or between some designated values. So here we go: the central limit theorem.

By definition, the central limit theorem for sample means says that if you keep drawing larger and larger samples and calculating their means, the sample means form their own normal distribution, called the sampling distribution. Let me put this in perspective for you. Imagine I have a pool of test results, say 500 test results from some test that was given at the college. If I randomly grab 10 test scores and average them, I have found the mean of that sample. What if I grab another 10 results and find their average? That's the mean of another sample. I grab another 10, and another 10, and another 10. I keep grabbing in groups of 10 randomly, and I keep recording the mean of each of those samples.

Well, if I make a histogram using those sample means, the histogram will actually be roughly bell shaped; the sample means will be approximately normally distributed. And as my sample sizes get bigger and bigger, like if I pull in groups of 30, groups of 40, groups of 50, the distribution of the sample means gets closer and closer to a normal distribution.
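The grab-10-and-average experiment described above can be tried out in code. This is only a minimal sketch under stated assumptions: the pool of 500 scores is made up (and deliberately skewed, so it is not bell shaped to begin with), and the helper names `sample_means` and `share_within_one_sd` are hypothetical. It uses the rule of thumb that about 68% of a normal distribution lies within one standard deviation of its mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical pool of 500 test scores, deliberately left-skewed (not bell shaped).
population = [max(0.0, 100.0 - random.expovariate(1 / 12)) for _ in range(500)]


def sample_means(pool, n, draws=2000):
    """Return the mean of each of `draws` random samples of size n from pool."""
    return [statistics.mean(random.sample(pool, n)) for _ in range(draws)]


def share_within_one_sd(values):
    """Fraction of values within one standard deviation of their own mean.

    For a bell-shaped (normal) distribution this is roughly 0.68.
    """
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)


means_10 = sample_means(population, 10)  # groups of 10
means_30 = sample_means(population, 30)  # groups of 30

print("raw scores:      ", round(share_within_one_sd(population), 3))
print("means of 10:     ", round(share_within_one_sd(means_10), 3))
print("means of 30:     ", round(share_within_one_sd(means_30), 3))
```

The raw scores should sit well away from the 0.68 benchmark, while the sample means should land close to it, which is the bell shape showing up in the sample means even though the individual scores are skewed.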
That's what the central limit theorem is telling us.

The following are assumed: the random variable X has a distribution, which may or may not be normal, with mean μ and standard deviation σ, and simple random samples, all of size n, are selected from the population. So remember, we could do sample sizes of 10, 20, 30; as that sample size goes up, the distribution of the sample means becomes approximately normal.

So when do we use the central limit theorem? First, ideally the central limit theorem is used for samples of size n larger than 30. Thirty is kind of that magic number where our distribution of sample means becomes clearly normal. Second, if the original population is normally distributed, then for any sample size n the sample means will be normally distributed. So: sample sizes larger than 30, and if the original population is normally distributed, I can guarantee that the distribution of sample means will also be normally distributed, not just for values of n larger than 30. Remember, 30 is that magic number.

Now, the mean of the sample means, that is, the mean of the sampling distribution. Remember I was taking those test results 10 at a time, or 20 at a time, or 30 at a time, and calculating each of their means. Well, if I take all those means and average them, the mean of that sampling distribution, μ sub x-bar, will be the mean of the original distribution itself, μ.

The standard deviation of the sample means, the standard deviation of the sampling distribution, is not so obvious. The standard deviation of the sample means (that's why you have the subscript x-bar), σ sub x-bar, is the standard deviation of the original data values divided by the square root of the sample size: σ_x̄ = σ / √n. Make sure you make a note of this formula. In other words, the standard deviation is adjusted when you're looking at a sampling distribution, the distribution of the sample means, as opposed to the distribution of individual data values.

So here's an illustration. If I pull data values in samples of size one, yeah, I don't get a very cool distribution. If I pull data values five at a time, I get closer to a bell shape. Pull data values ten at a time: pretty nice bell shape. Twenty at a time: a really good bell shape. If n were equal to 30, it would be even more bell shaped, and the same goes for 40, 50, 60, and so forth. So larger sample sizes are definitely ideal and definitely beneficial.

So the big idea, to summarize the central limit theorem: as the sample size increases, the sampling distribution of sample means (we're looking at the distribution of the sample means rather than data values now) approaches a normal distribution.

So how about the classic elevator question? Have you ever gone into an elevator and looked at the maximum weight capacity? Is that maximum weight capacity safe? How was that maximum weight capacity found? I'm going to talk to you about that now.

Suppose an elevator has a maximum capacity of 16 passengers with a total weight of 2,500 pounds. Assuming a worst-case scenario in which the passengers are all male (because males tend to be heavier than females most of the time), what are the chances the elevator is overloaded? Assume male weights follow a normal distribution with a mean of 182.9 pounds and a standard deviation of 40.8 pounds. So when looking at just the actual data values themselves, they follow a normal distribution with a given mean and a given standard deviation.

Now we're going to answer some questions. Part (a): if 16 males are in the elevator, what is the average weight beyond which the elevator would be considered overloaded? Well, let's think about this. What is the total capacity of that elevator? How many pounds?
Well, that elevator can hold up to 2,500 pounds. If that is distributed among 16 males, or 16 people, that is 2,500 / 16 = 156.25 pounds on average per person that would be allowed. Not very much. That's step one.

Step two: find the probability that one randomly selected male (that's just one data value, just like what we did previously in the module) has a weight greater than 156.25. So find the probability that one data value is greater than 156.25. This is the same rhythm that we've followed previously in the module. We put our mean, 182.9, right in the middle of our bell curve, and then label the data value we're interested in, 156.25. We're looking for the probability that a single male on that elevator will have a weight greater than 156.25.

Well, in Google Sheets you will have to take note of a few things. What is your mean μ? 182.9. And what is your standard deviation σ? This is just like all the other questions in the module: we are dealing with one data value, so our σ is exactly as they gave it to us in the question, 40.8. Then the lower bound of our shaded region is 156.25, and the upper bound of our region is technically infinity, but we just use six nines (999999). And we will type this into Google Sheets. So where are you, Google Sheets? Here we are.
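The four inputs just described (μ, σ, lower bound, upper bound) can also be checked outside the spreadsheet. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for the Google Sheets tool used in the video; the helper name `normal_area` is hypothetical.

```python
from statistics import NormalDist

# Values from the elevator example.
MAX_TOTAL_WEIGHT = 2500   # pounds
PASSENGERS = 16
MU = 182.9                # mean male weight, pounds
SIGMA = 40.8              # standard deviation of male weights, pounds


def normal_area(mu, sigma, lower, upper):
    """Area under the normal curve between lower and upper (hypothetical helper)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Step one: average weight per person beyond which the elevator is overloaded.
threshold = MAX_TOTAL_WEIGHT / PASSENGERS

# Step two: one data value, so sigma is used exactly as given.
# An upper bound of "six nines" stands in for infinity.
p_single = normal_area(MU, SIGMA, threshold, 999999)

print(threshold)            # 156.25
print(round(p_single, 4))   # about 0.7432
```

Using 999999 instead of true infinity makes no practical difference here, because essentially all of the area under this curve lies far below that value.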
Well, we're going to go to the compute tab, and we're going to be in the normal region. Our μ is going to be 182.9, the standard deviation is going to be 40.8, our lower bound is going to be 156.25, and the upper bound is six nines. You could do seven if you want, but it's not really going to change anything. You get about 0.7432 if you round to four decimal places. That's the probability that one data value, one person's weight, will be above 156.25.

That's great, but I'm interested in the group as a whole. I'm interested in those 16 people in the elevator. Find the probability that a sample of 16 males has a mean weight greater than 156.25. I want to know about their average. So I'm looking for the probability that the mean of that sample of 16 people will be greater than 156.25, because if their average weight is more than that, we have a problem: that elevator might fall down. It may not be able to handle that load.

So we are now talking about a sample mean; we are now talking about our sampling distribution. This is where the central limit theorem kicks in. The good news is that the mean of the sampling distribution, the mean of the sample means, is just the mean of our given situation, the mean weight of men, 182.9. That does not change, ever. It's the standard deviation where you need to be careful. My standard deviation of the sample means, σ sub x-bar, is equal to the standard deviation of my population of male weights divided by the square root of the sample size. So that's 40.8 divided by the square root of 16 (16 because I'm dealing with 16 people). That's 40.8 divided by 4, or 10.2. So take my warning when I say this is where most people mess up.
They forget to adjust their standard deviation. I'm doing this because we're now looking at the probability of a mean weight being greater than 156.25, as opposed to a single weight, a single data value. I'm looking at a mean now.

So in Google Sheets, you'll type in μ, which is 182.9, and you'll type in σ, your adjusted standard deviation, 10.2. If you do have to round, the more decimal places you keep the better in terms of σ. Then you just label your lower bound and your upper bound. In my picture, we're still dealing with 156.25, and we're looking at greater than, so my lower bound would have to be 156.25 and the upper bound is six nines. It's practically the same information as the previous part of the question, except because I'm dealing with the sample mean, the standard deviation has to be adjusted to 10.2.

So in Google Sheets you adjust the standard deviation to 10.2, and you get 0.9955. Oh gosh. So the probability that that elevator will be overloaded is actually 99.55%. Because the maximum capacity is 16 and they have that weight threshold of 2,500 pounds, if 16 men get in there, which would be a worst-case scenario where there's the most weight in there, there's a 99.55% chance, or a 0.9955 probability, that the elevator will be overloaded. Yikes.

So I think we need to call the elevator company and be like, hey, you should reconsider your safety guidelines here. And then they thank us and send us a check again for like two thousand dollars, right? Yeah, the power of statistics: pretty awesome.
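To see how much the adjustment matters, the sample-mean calculation can be compared with the version where the standard deviation is left unadjusted. This is a rough sketch, again assuming Python's `statistics.NormalDist` in place of the spreadsheet; `standard_error` is a hypothetical helper name for the σ / √n step.

```python
from math import sqrt
from statistics import NormalDist

MU = 182.9          # mean male weight, pounds
SIGMA = 40.8        # standard deviation of individual weights, pounds
N = 16              # passengers in the elevator
THRESHOLD = 156.25  # 2,500 pounds / 16 people


def standard_error(sigma, n):
    """Standard deviation of the sample means: sigma / sqrt(n)."""
    return sigma / sqrt(n)


sigma_xbar = standard_error(SIGMA, N)  # 40.8 / 4 = 10.2

# Correct: probability that the MEAN weight of 16 men exceeds the threshold.
p_mean_over = 1 - NormalDist(MU, sigma_xbar).cdf(THRESHOLD)

# Common mistake: forgetting to adjust sigma gives the single-person answer.
p_unadjusted = 1 - NormalDist(MU, SIGMA).cdf(THRESHOLD)

print(round(sigma_xbar, 4))    # 10.2
print(round(p_mean_over, 4))   # about 0.9955
print(round(p_unadjusted, 4))  # about 0.7432
```

The mean stays at 182.9 in both calculations; only the standard deviation changes, and that alone moves the answer from about 74% to about 99.55%.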
So suppose women's heights... because we never talk about weights of women, so let's be safe and talk heights. Suppose women's heights are normally distributed with a mean of 62.1 inches and a standard deviation of 2.6 inches. If one woman is randomly selected (that's one data value), find the probability her height will be between 61.6 and 62.7 inches.

Always put your mean right smack dab in the middle of your bell curve, and always make sure you draw a picture. I'm looking for the probability that a data value is between 61.6 and 62.7. These data values are down here along the x-axis, and I'm looking for the area between them, the probability that a data value is between them. Let's write out our notation: the probability a data value will be between 61.6 and 62.7 is whatever Google Sheets tells us.

We're talking about one data value, so it's business as usual, just like the rest of this module. μ is going to be 62.1 and σ will be 2.6. Then you need your lower bound and your upper bound: lower bound 61.6, upper bound 62.7. Type those four things into Google Sheets. So, normal region under the compute tab, and you've got 62.1 for μ, 2.6 for σ, the lower bound of 61.6, and the upper bound of 62.7. It's really just a matter of extracting those numbers from the question. Make sure you draw your picture. You get 0.1675.

That's great and all when you're dealing with one data value, but we want to be a bit more powerful here. Let's say a sample of ten women is randomly selected. Find the probability they have a mean height (so we're now looking at a sample mean) between 61.6 and 62.7.

It's still the same picture. You have your mean of 62.1 in the middle, and 61.6 and 62.7 are the two values you want to find the area between: the probability that a sample mean is between these two values. So now we're dealing with a sample mean. This is my sampling distribution, and this is where the central limit theorem comes into play. I want the probability that a sample mean, the mean of ten women, will be between 61.6 and 62.7. You're looking at a sample mean now, the average of those ten women. So think: sample mean, central limit theorem.

Your μ is still going to be the same; it's still going to be the good old 62.1. It's your σ that has to be adjusted. So put a star by σ; that's the one thing people forget quite often. You take your original standard deviation and divide by the square root of your sample size. So you're going to take 2.6 and divide by the square root of 10. When you do that you get about 0.82. If you want to keep more decimal places, please do; your answer will be more accurate. Your lower bound and upper bound are the same as in the previous part of the question.

So in Google Sheets, the only thing you have to change is that σ, to 0.82. Like I said, keep more decimal places to get a more accurate answer. You'll get 0.4968, rounded to four decimal places. That's the probability that a sample of ten women will have an average, a sample mean, between 61.6 and 62.7. So the probability is 0.4968. Once again, central limit theorem: we had to adjust σ by dividing by the square root of the sample size, the square root of 10 in this case.

So, a really powerful result. We're going to be doing more with the central limit theorem as we progress further into the course. But for now, thanks for watching. I hope you enjoyed it.
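For anyone who wants to double-check the two height calculations, here is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` rather than the Google Sheets tool from the video; `prob_between` is a hypothetical helper. It also shows what the advice about keeping more decimal places in σ does to the answer.

```python
from math import sqrt
from statistics import NormalDist

MU = 62.1      # mean height, inches
SIGMA = 2.6    # standard deviation of individual heights, inches
LOWER = 61.6
UPPER = 62.7
N = 10         # women in the sample


def prob_between(mu, sigma, lower, upper):
    """Probability that a normal value falls between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# One woman: one data value, sigma as given.
p_one_woman = prob_between(MU, SIGMA, LOWER, UPPER)

# Mean of ten women: sigma must be divided by sqrt(n).
sigma_xbar = SIGMA / sqrt(N)

# With sigma rounded to 0.82, as in the video.
p_mean_rounded = prob_between(MU, round(sigma_xbar, 2), LOWER, UPPER)

# With sigma left unrounded.
p_mean_exact = prob_between(MU, sigma_xbar, LOWER, UPPER)

print(round(p_one_woman, 4))     # about 0.1675
print(round(sigma_xbar, 4))      # about 0.8222
print(round(p_mean_rounded, 4))  # about 0.4968
print(round(p_mean_exact, 4))    # about 0.4957
```

Rounding σ to 0.82 gives the 0.4968 quoted above, while the unrounded σ gives roughly 0.4957, which is why keeping extra decimal places in the adjusted standard deviation is worth the effort.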