Hello, and welcome to chapter 3.2. I'm Monica Wahee, Library College lecturer, and I'm here to go over measures of variation with you. All right, here are your learning objectives. At the end of this lecture, you should be able to state three different measures of variation used in statistics. You should also be able to explain how to calculate variance and standard deviation, and I'll give you a hint: those are two of the three measures. You should also be able to calculate the coefficient of variation and explain its interpretation. And finally, you should be able to state Chebyshev's theorem. So we're going to concentrate on measures of variation. The first one I'm going to talk about is range. Then I'm going to talk about variance and standard deviation, which are two different measures, but I'm going to talk about them together, and you'll see why. Then we're going to go over the coefficient of variation, which is abbreviated CV. Then we're going to talk about Chebyshev. Chebyshev came up with a theorem, and his theorem leads us to calculate intervals. Remember, an interval has a lower limit and an upper limit; I'll remind you of that when we calculate Chebyshev intervals together. All right, let's get started. So let's think about variation. What does variation even mean? Well, it means how much the data vary. Imagine I taught two classes, which isn't too hard to imagine, because I do teach two sections of the same class. Imagine that I gave a quiz, and the mean grade was the same in each class. If I told you only that, could you tell how internally consistent those grades were? For instance, let's say I gave a five-point quiz, and the mean in each class was three. Do we really know how many people got something far from three? Maybe in one class, people got a lot of fives and ones.
And that's how we got the average of three. And maybe in the other class, everybody just got three. We really can't tell that from a measure of central tendency like the median, the mean, or even the mode. We can't tell how internally consistent the data are, and we especially can't tell from a mean: two different classes can have the same mean and a totally different kind of variation behind the scenes. So when you're working with quantitative data and you have a whole data set, the measures of central tendency (mean, median, mode) don't tell the whole story. You also have to add information about variation. The calculations we're going to learn in this lecture are ways to express how much the data vary in the data set, and they're separate from central tendency. Central tendency is about central tendency, and variation is about variation, and you need to know both before you can really evaluate your data set. So let's get started on ways to calculate these measures of variation. As I said, I'm going to go through range first. Then we're going to talk about variance and standard deviation. And I want to remind you, you know how I'm always going on about sample statistics versus population parameters? Well, that comes into play here, because the formulas for sample variance and standard deviation are slightly different from the formulas for population variance and standard deviation. So we'll go over those different formulas separately. Finally, we're going to talk about the coefficient of variation, or CV, but we'll do that after the other ones. Okay, we're going to start with the range, because it's the simplest to calculate. Here's how you do it. You'll notice on the right, I made up five numbers. I totally made them up; they don't stand for anything.
I just did that for a demonstration, because the range is the difference between the maximum and minimum value. It's pretty easy to calculate. First you search for the highest value, or the maximum. This little data set is cute; it only has five numbers, so it's obvious that 78 is the highest, and it's also obvious that 21 is the lowest. To calculate the range, you take the highest minus the lowest, and you get a number. That's the range. Sometimes my students write the highest, then a minus sign, then the lowest, and tell me that's the range. And I'm like, no, you actually have to do the subtraction. You'll see here it says 78 minus 21 equals 57. So 57 is the range. All it's telling you is the distance between the top and the bottom, and I'll just say that it's not very useful. In fact, I had a problem with it when I worked on an Army database. I looked at the range of ages of soldiers when they started, and the range was age 4 through age 107. Obviously there was a problem with the data, right? For some reason there was a screwed-up record that said somebody got in when they were four, and another screwed-up record that said somebody got in when they were over 100. They were just bad data, and they gave me this ridiculous range. So the range is not very stable or resistant. If we just fixed the record that said somebody was four when they got into the Army, we might get a normal range; the minimum might be 17, 18, or 19 or something. And as you can see on the right side of the slide, I just picked out the minimum and the maximum; if we arbitrarily changed either of those numbers, suddenly we'd have something totally different from 57.
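Here is a small Python sketch of the range calculation and of why it isn't resistant. Only 78 and 21 come from the slide; the three middle values and the enlistment ages are hypothetical numbers made up for illustration, and `data_range` is my own helper name (chosen so it doesn't shadow Python's built-in `range`).

```python
def data_range(values):
    """Range = maximum minus minimum, actually subtracted out."""
    return max(values) - min(values)

# Hypothetical five-number data set: 78 (max) and 21 (min) are from the
# slide; the three middle values are made up.
demo = [21, 45, 50, 62, 78]
print(data_range(demo))  # 78 - 21 = 57

# Hypothetical enlistment ages: adding two bad records (age 4 and age 107)
# is enough to blow up the range, which is why it is not resistant.
clean_ages = [18, 19, 19, 20, 22, 25]
dirty_ages = clean_ages + [4, 107]
print(data_range(clean_ages))  # 25 - 18 = 7
print(data_range(dirty_ages))  # 107 - 4 = 103
```

Notice that the middle values never affect the answer; only the two extremes do.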
So as you can see, even though the range is a measure of variation, it's not stable or resistant. And it actually doesn't tell you much. If I say we've got a range of 57, you don't know whether the minimum is zero, or negative, or 105; you really don't know where that range sits. So it's not very useful, but it's a place to start, because it's our first measure of variation. You'll sometimes see ranges reported in articles, but usually they don't state the single number I told you to calculate; they state the minimum and the maximum, and sometimes that's interesting. Now we're going to get into what we really use a lot in statistics. Variance and standard deviation are what we really live on in statistics for measures of variation. You're probably wondering why I'm talking about them together when they're different calculations. Well, it's because they're friends. How are they friends? The variance calculation is kind of a big formula. You get through it, and then you have the variance. Then all you have to do to get the standard deviation is take the square root of the variance. So that's why they're friends: you go through all this trouble to get the variance, and the next step is just to take its square root, and you get the standard deviation. Before I actually talk about those formulas, I want to set in your head what these words mean. I remember when I worked at a mental health place, we didn't have enough licensed people, and our leaders said, oh, I'm applying to the state for a variance, meaning the state would allow us to vary from the rules. Well, that's what variance is: how the data vary. Think of the spread of the data. Does the mean ever represent that spread? It doesn't, right?
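To make the quiz example from the start of this lecture concrete, here is a Python sketch with two hypothetical sections of seven students each (the scores are made up). Both sections have a mean of 3, but the standard deviation, computed with the standard library's `statistics.stdev`, shows how differently the scores spread around that mean.

```python
import statistics

# Hypothetical five-point quiz scores for two sections (made up for illustration).
section_a = [5, 1, 5, 1, 3, 5, 1]   # a lot of fives and ones
section_b = [3, 3, 3, 3, 3, 3, 3]   # everybody got three

for name, scores in (("A", section_a), ("B", section_b)):
    # Same mean (3) in both sections, but very different spread around it.
    print(name, statistics.mean(scores), round(statistics.stdev(scores), 2))
```

Section A's standard deviation comes out to 2, while section B's is 0, even though the means match.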
So variance is a way of representing how the data vary around the mean. Now, you're probably wondering, well, then why do you even have standard deviation, if it's just the square root of variance? Let's think about what the words mean. Standard means something like following a standard, or the same. So think of it as the amount of variation that's standard in the data set. And you know what the word deviation means: you might say, oh, that person's a social deviant because they go commit crimes or something. Or, like this guy with a healthy nose on the slide: he does not have a deviated septum. But some people do have a deviated septum, where it's crooked, and they have trouble sneezing and blowing their nose and sometimes even breathing. So a standard deviation is like the typical, standard amount by which the data points deviate from the mean. Variance as a calculation says how much things vary, and so does standard deviation, because it's just the square root of variance. But I want you to picture in your head: standard deviation means how much the data deviate around the mean. A lot of times students get confused and try to apply what they learned about measures of central tendency to variation, but variation is a totally different thing. So remember what variance literally means, and what standard deviation literally means; that might help you get through these formulas and understand the interpretation. As I mentioned earlier, the formulas for variance and standard deviation are different depending on whether you're talking about a sample or a population. Admittedly, we don't use the population variance or population standard deviation calculations very often, because we don't measure the population that often. We use the sample variance and sample standard deviation all the time, so those are the ones I'm going to demonstrate. But you'll notice that conceptually, they're really similar.
If you have population parameters like mu and the population standard deviation, they tend to behave in formulas much like their sample versions. It's just that in statistics, we always want to be really clear about what we're talking about, so we always want to use the right symbols. They signal whether we're analyzing a sample or a population. Conceptually, a mean is a mean, right? But whenever you write out the formula, you want to show which mean you're talking about: one that's a parameter, or one that's a statistic. So I'm just being picky about that. There's one other thing you want to know. There are two different ways of writing each of these formulas. You know how in algebra you can take a big equation and express it more than one way? That's all they do here. They write the formula one way, called the defining formula, and then they write the same formula, rearranged by algebra, as the computational formula. I always think it's kind of funny that they call it the computational formula. Both formulas give you the same result; you plug in numbers and get out the answer, and the answer is the same whether you use the defining formula or the computational formula. But what I think is so funny is that they call it the computational formula, and I can't compute it; I always get confused when I use it. So I've pretty much ignored the computational formula my entire life, and I just teach the defining formula. I've found my students always remember the defining formula, and they can always get through it. People who are into the computational formula tell me I'm doing things the hard way, going the long way around. But you know what? Just go the long way around. It keeps you from getting confused, and it helps you convince yourself you actually got the right answer. So let's just do the defining formula.
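If you want to convince yourself the two versions really agree, here is a Python sketch that computes the sum of squares both ways. The lecture doesn't show the computational formula, so I'm assuming the common textbook form, the sum of x squared minus (sum of x) squared over n; check your own textbook's version. The data are the six lab waiting times used in this lecture's worked example (2, 3, 3, 8, 10, 10 minutes), and the function names are my own.

```python
def ss_defining(xs):
    """Sum of squares via the defining formula: sum of (x - x_bar)^2."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)

def ss_computational(xs):
    """Same quantity via an algebraic rearrangement (assumed textbook form):
    sum(x^2) - (sum(x))^2 / n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n

waits = [2, 3, 3, 8, 10, 10]  # lab waiting times in minutes
print(ss_defining(waits), ss_computational(waits))  # both come out to 70
```

Either sum of squares then goes over n minus one to give the sample variance.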
All right, let's look at the defining formula. You can look up the computational formula, but this is the defining formula, so let's get our minds wrapped around it. Remember, I told you variance is great because you calculate it, and then you just take its square root to get the standard deviation. As you can see on the left side of the slide, we abbreviate the sample variance as s to the second, where s is the sample standard deviation. I know that sounds ridiculous, right? Why don't we have a special symbol just for the variance? Why do we just call it s to the second, and then say the sample standard deviation is just s? Well, to be honest with you, people use different notation; I'm using this because it matches the textbook we're using. People will often write var for variance, in other textbooks and in statistical software. But they'll also write s to the second like this, and it's maybe a good way to remember that the standard deviation is just the square root of the variance. So if you ever see s to the second, remember that s is the sample standard deviation, and s to the second is the sample variance. I'll show you the population versions in a minute. Okay, now let's look upstairs at the top formula. See this thing on top? It's kind of scary, but we're going to work through it, and you're not going to be scared of it. You know that little sum sign there, the capital sigma, so you know something's going to get summed up. That x minus x bar to the second part looks kind of scary, but we'll handle it. The n minus one on the bottom is not so scary, and we'll handle that one too. And you'll notice that all I did for the bottom formula is put a huge square root sign over that whole thing.
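Written out in standard notation, the two defining formulas on the slide, sample variance upstairs and sample standard deviation downstairs, look like this:

```latex
\begin{aligned}
s^2 &= \frac{\sum (x - \bar{x})^2}{n - 1} \\
s   &= \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}
\end{aligned}
```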
So that's the only difference between the upstairs and the downstairs formulas. I also wanted to show you a picture of a calculator, because if you haven't done math or statistics for a while, you can forget the whole concept of a square root. I'll just remind you: the square root of a number is whatever you multiply by itself to get that number. Take 25: if you put 25 in your calculator and hit the square root button, you get five, because five times five is 25. If you put in 24, you're going to get something with decimals, but whatever you get, if you multiply it by itself, you'll get 24. I just want to remind you of that, because people forget it if they haven't done statistics or math, or used a calculator, for a while. All right, I told you I'd talk about this numerator. Remember, the top of a fraction is the numerator and the bottom is the denominator. So let's talk about the numerator: the sum of x minus x bar squared, as I would say it. This piece of the formula is called the sum of squares. From now on, when I say sum of squares, I literally mean the top of this equation. So when you use the defining formula, you can relax and say, the first thing I'm going to do is figure out the sum of squares, the top part. I'll write that down, and later I'll come back to the formula and plug it in. So next, I'll show you how to figure out that top part of the equation, the sum of squares. Let's look at the slide. On the left, there's a blank table, and making this blank table is usually the first thing I do.
You don't have to label the columns column one, column two, and column three; I just put that there so I could talk about the columns and you'd know what I was talking about. Usually I put x in the first column. Then in the second column I put x minus x bar; I wrote out the word minus, but you can just use a dash. And in the third column I put x minus x bar in parentheses, to the second, like that. Remember, when you have parentheses, you do what's inside the parentheses first. So you literally have to do x minus x bar before you square it. I'm walking you through this to get you ready for what we're going to do with this table. On the right side of the slide, I'm reminding you that the sum of (x minus x bar) to the second, in other words, the sum of whatever's going to be in column three, is another way of saying the sum of squares. An easy way to explain what the sum of squares is, is just to show you how to calculate it. So I pulled out some data. Imagine a sample of six patients who presented to the central lab. This happens to me: sometimes my doctor says it's time to do a lab panel, so she gives me a slip of paper, I go downstairs to the central lab, I give them the slip, and they say, okay, sit down, we'll call you up and draw your blood. So imagine six people did that, and when they got up to have their blood drawn, we asked them how long they waited. At my central lab, I usually wait only two minutes; it's a really good lab. But sometimes it's really busy, like if I go during lunch, and I'll wait something like 10 minutes. So here are our six patients. One of them waited two minutes, and a couple of them waited three minutes. The other three probably came in during lunch, because they waited eight minutes, 10 minutes, and 10 minutes. So that's our data.
It's a tiny data set, but I wanted to use something small to show you how to calculate the variance and then the standard deviation. So what's the first step? After you make the blank table, you fill in the first column, which is called x. What is x? Each patient's waiting time is an x. Remember, if we said sum of x, we would mean add all these x's together. So that's all I did: I put each x in the column. You'll see 2, 3, 3, 8, 10, 10, identical to these x's. And at the bottom, I put that fancy sum of x symbol and wrote 36. So that's the first thing you do: put them all in and calculate the sum of x. Now for the next step, don't look at the left side of the slide yet; look at the right side. Before you fill in column two, you have to calculate x bar. In other words, you have to figure out the mean. You can kind of cheat here, because you just figured out the sum of x. If you remember the formula, the mean, or x bar, of the sample is the sum of x divided by n. And remember, I told you we had six patients. So you just take 36 divided by six, and you get six. Now you hold that number. Between column one and column two, you calculate x bar and hold it off to the side. While you're holding it, you fill in column two: x bar is six, so we go through each x and subtract x bar from it. It's helpful to order the x's before you do this. Notice I put them in order: 2, 3, 3, 8, 10, 10. It's a good idea because it helps your brain check whether you're doing the right thing. So let's start with the two. We do two minus six, because six is x bar. Now you can look at column two: two minus six equals negative four.
I hate negative numbers, but you just have to deal with them sometimes. So it's negative four. Then you go to the next line, and it's three minus six, which is negative three. So we're still under water with the negatives. You'll notice the next x is also three, so you can copy what you just did and get negative three again. So you're putting negative four in the first row of this column, negative three in the second row, and negative three in the third row. Finally, the fourth x is eight. Eight minus six is two, so we've come up above water. Then we have 10 minus six, which is four, and the last 10 minus six, which is also four. When you order the data like that, this is what you'll see: a bunch of negative numbers at the beginning and a bunch of positive ones later. That's totally normal; don't worry about it. But you have to be careful to calculate the right mean. I've had people on tests accidentally screw up the mean, and you can just imagine the train wreck after that: nothing after it comes out right. So make sure the mean is right, then make sure you subtract it from every single x and put the right answer in column two. That's the next step. Okay, so we're done with that step. What do we do next? We take whatever we got in column two and square it. The first one was negative four. Remember, squaring is just multiplying the number by itself, so if you don't like the x to the second button on your calculator, you can just do negative four times negative four; it's the same thing. So negative four times negative four gives us 16. The next ones are pretty easy: negative three times negative three is nine, and two times two is four. But what I really want you to look at is the 10s.
Notice that each of them gets a 16 too, just like the two did. And that's the trick here. Remember, I said I hate negative numbers? Well, a lot of statisticians feel the same way I do, and they often fix it by squaring the number, because squaring erases the negative. Just remember: a negative times a negative is positive, and a positive times a positive is also positive. That's a little trick to remember when it comes to multiplying. So when we square each entry in column two, we get what are called squares: 16, 9, 9, 4, 16, 16. Each of these is a square. So what do you think we do? We add up that entire column, and we get the sum of squares, that super complicated-looking thing that's the numerator of our variance equation. That wasn't really that hard, was it? So we sum that up, and it turns out we get 70. So 70 is our sum of squares. All right, now we're back at the sample variance formula. I'm so excited, because look at the top of the formula: we answered it, it's 70. But we still have to deal with the bottom of the formula. Remember, n was six because we had six patients, and the bottom of the formula is n minus one. So the bottom of the formula is five. Let's fill this in; I was kind of running out of room, so I just filled it in upstairs. So you see 70 divided by five, and suddenly this looks super easy, right? 70 divided by five is 14. That's the variance. Once you make that table, it's not hard; it's just tedious, because you have to build the whole table and add things up. Now, guess how we're going to get the standard deviation? You probably guessed it: we're just going to take the square root of 14. Remember that button on your calculator: put in 14, hit that button, and you get 3.74 and a bunch of other digits.
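The whole table-building procedure can be sketched in a few lines of Python, assuming you just want to check your hand work on the waiting-time data. The helper name `sum_of_squares_table` is my own; each printed row mirrors columns one, two, and three of the blank table.

```python
import math

def sum_of_squares_table(xs):
    """Build the three-column table: x, (x - x_bar), (x - x_bar)^2."""
    x_bar = sum(xs) / len(xs)          # hold the mean off to the side
    rows = []
    for x in sorted(xs):               # ordering helps sanity-check the signs
        dev = x - x_bar                # column 2
        rows.append((x, dev, dev ** 2))  # column 3 squares the deviation
    return x_bar, rows

waits = [2, 3, 3, 8, 10, 10]
x_bar, rows = sum_of_squares_table(waits)
for x, dev, sq in rows:
    print(f"{x:>3} {dev:>6.1f} {sq:>6.1f}")

ss = sum(sq for _, _, sq in rows)   # sum of squares (the numerator)
n = len(waits)
variance = ss / (n - 1)             # sample variance: divide by n - 1
std_dev = math.sqrt(variance)       # sample standard deviation
print(f"SS = {ss}, s^2 = {variance}, s = {std_dev:.2f}")
```

The last line should match the hand calculation: a sum of squares of 70, a variance of 14, and a standard deviation of about 3.74.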
I just chopped it off at 3.74, so that is your sample standard deviation. Now, I promised I'd talk about the population formulas for standard deviation and variance as well as the sample ones, and I told you they wouldn't be conceptually much different. On the left side of the slide is the sample variance; I made things red so you can see where the differences are. Sample variance is s to the second, but population variance uses another Greek letter. Remember, I told you the sum sign was capital sigma; Greek is like English in that it has capital and lowercase letters. Well, there's that thing that I always think looks like a jelly roll; that jelly roll is actually lowercase sigma. I'm never going to say lowercase sigma except right now; instead, I'm going to say population variance and population standard deviation. So you see at the bottom of the slide, lowercase sigma alone is the population standard deviation, and lowercase sigma to the second is the population variance. So if you see that jelly roll, we're talking about the population version of the standard deviation or variance, not the sample version. You also already know about mu versus x bar. We have x bar on the left, which is the sample mean, and mu on the right, which is the population mean. And you already know about n, which is the number in your sample. This is where there's actually a big difference. For the sample, you divide by n minus one on the bottom, and for the population, you just divide by capital N, the size of the whole population. If you think about it, that kind of makes sense, because populations are huge, so subtracting one wouldn't even matter. Samples, on the other hand, are often small, so the adjustment makes a real difference; it accounts for the fact that the mean you subtracted was calculated from that same small sample.
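To see why subtracting one matters for a small sample but hardly at all for a big population, here's a Python sketch using the standard library's `statistics.variance` (divides by n minus one) and `statistics.pvariance` (divides by N). Treating the six waiting times as a population is only for comparison, and the 10,000-value "population" is hypothetical.

```python
import statistics

waits = [2, 3, 3, 8, 10, 10]                # the six lab waiting times
sample_var = statistics.variance(waits)     # sum of squares 70 divided by n - 1 = 5
pop_var = statistics.pvariance(waits)       # same 70 divided by N = 6
print(sample_var, round(pop_var, 2))        # noticeably different for six values

# Hypothetical large "population": the values 1 through 10,000.
big = list(range(1, 10_001))
ratio = statistics.variance(big) / statistics.pvariance(big)
print(round(ratio, 5))                      # equals N / (N - 1), barely above 1
```

With six values the two answers differ by about 20%, but with 10,000 values the ratio is N divided by N minus one, which is almost exactly one.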
So with a sample, you subtract one. With a population it hardly matters: people who make a mistake and accidentally subtract one in the population formula don't get a very different answer. That's why I'm concentrating on the sample formulas; that's what we normally use. But I wanted to give the population formulas a shout-out, so that if you ever see the formulas on the right side of the slide, you'll know they're population-level formulas. All right, we made it through range, variance, and standard deviation, so now we're going to move on to the coefficient of variation. It's used a lot for comparisons, often for comparing two different labs. I say that because my friend is a pathologist, and the first time I actually used it in medicine, we were comparing lab values on the same assay from two different labs. I want to explain something first, because this might be the first time you've heard the word coefficient. It gets a little confusing for people who are new to statistics, because coefficient is a generic term for certain kinds of numbers. You'll hear somebody say coefficient of variation, and you'll hear somebody else say coefficient of something else. Most people haven't even heard the word coefficient; it just means a certain kind of number. If somebody says, oh, the coefficient is not good, or it's high, or whatever, you need to ask them which coefficient they're talking about. In other words, coefficient doesn't mean one specific thing; it's a generic name for a kind of number that comes out of statistics, so you have to know which coefficient they mean. So this may be the first time you've heard the word coefficient.
If you've never heard the word before, I'm going to introduce you to a specific coefficient called the coefficient of variation. As we go through this textbook, you'll meet other coefficients, so please remember that this one is the coefficient of variation. A way to remember it is by its abbreviation, CV. Other coefficients have different abbreviations, but the coefficient of variation is CV. On the right side of the slide I put the formulas, and nobody seems to have any trouble with them. Once you calculate the standard deviation, either the sample standard deviation or the population one, as you can see in the formulas, and once you calculate x bar (or mu, for a population), it's pretty easy to do the division. Then statisticians like to express the result as a percent. You'll notice that about statistics: certain things are preferred as proportions, and certain things are preferred as percents. It's like a culture, in a way. The coefficient of variation is always expressed as a percent, so you multiply by 100 and put a percent sign after it. That's pretty easy to do. You take the standard deviation; you'll see I did it for our patients, 3.74. It took us all that work to get there, remember? The square root of 14. And remember, our x bar was six; we needed it earlier for column two. So I just dumpster-dove for those numbers, did this calculation, and got 62%. Students generally don't have trouble getting that number. The problem is, what does the number even mean? What does it mean if you divide the standard deviation by x bar and multiply by 100%? How do you interpret that percent? The easiest way to talk about it is to compare it with something.
One thing you'll notice in statistics is that if you make a ratio of two things measured in the same units, the ratio has no units. For instance, if I take your systolic blood pressure, say 130 mmHg, and divide it by your diastolic blood pressure, which is also in mmHg, I get a ratio, and that ratio doesn't have units; it doesn't have mmHg or anything like that. If I do that for a bunch of people, none of those ratios have units, so they can be compared with each other. That's a strategy you'll see in statistics: make ratios of things so they don't have units. That may seem like it loses something, but the power is that you can compare these ratios. The CV works the same way, because the standard deviation and the mean are in the same units. So I decided to pull out other patients; I just made them up. I pretended we went back to the lab the next day, gathered some data, and came up with, and I just made these up, an x bar of eight and a standard deviation of four. That's pretty close to what we had before, an x bar of six and a standard deviation of 3.74. But in this new sample of patients, the s of four divided by the x bar of eight, times 100, equals 50%, not 62% like the other sample. So how do you interpret that? The CV is a measure of the spread of the data relative to the average of the data. In the purple sample, the standard deviation is only 50% of the mean, but in the red sample, our original waiting times, the standard deviation is 62% of the mean. So what I would say is that the red sample, the one with 62%, has more standard deviation relative to its mean. That means it's less stable: it's got more variation relative to its mean, so it moves around a lot.
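Here's a minimal Python sketch of the CV comparison, using the standard deviations and means from the two samples above (3.74 and 6 for the red sample, 4 and 8 for the made-up purple sample). The function name `coefficient_of_variation` is my own.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percent: standard deviation relative to the mean, times 100."""
    return sd / mean * 100

red = coefficient_of_variation(3.74, 6)     # original waiting-time sample
purple = coefficient_of_variation(4, 8)     # made-up next-day sample
print(f"red: {red:.0f}%, purple: {purple:.0f}%")

# Lower CV = less spread relative to its own mean = more predictable.
more_predictable = "purple" if purple < red else "red"
print(more_predictable)
```

Because the CV has no units, the same comparison would work even if the two samples had been measured on different scales.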
So if these were actually two different labs, I would say I prefer the purple lab, because it's more predictable. Its CV of 50% means less variation relative to its mean, while the 62% means the red lab is less predictable. It's a little hard to see in this example, but here's what happens with two different labs. Say you split a blood sample, or a bunch of blood samples, and send half to one lab and half to the other. You're supposed to get the same mean and the same standard deviation, right? It's the same blood; you just split it. But sometimes you don't; sometimes you get something like this. In that case, if you're comparing labs, you would go with the purple lab and not the red lab, because it produces a more predictable result. So the CV is a little hard to interpret, but it's easy to calculate, which is one awesome thing about it. Now we're going to move on to Chebyshev and his theorem. Chebyshev figured something out a long time ago, and this is how he started thinking about it. Let's say you have an x bar and an s, like we just used for the CV. He noticed something else about them: not the CV, but that you can create a lower and an upper limit by subtracting s from x bar and adding s to x bar. Remember back when we were making frequency tables, I said we need to make class limits, a lower class limit and an upper class limit? We use that terminology a lot: lower limits and upper limits. Well, Chebyshev thought, wait a second, I've got an idea. Let's say I take a mean, which will end up in the middle. I can subtract one standard deviation from it and get a lower limit, and I can add a standard deviation to the mean and get an upper limit.
And of course, let's pretend the standard deviation was one; then you'd subtract one and add one. So this would be totally symmetric, right? The x bar would be in the middle, and it would be surrounded equally on both sides by one standard deviation. And I'm saying standard deviation generically, because you could do this with a mu and a population standard deviation too; you can do the population version. So he figured out that's a thing you can do: you can add and subtract the standard deviation from a mean and get these limits. For example, let's say I had a mu. So I'm going to pretend I have a population with a mu of 100 (I don't know what I measured, but I got 100) and a population standard deviation of five. Chebyshev was thinking, you know what I could do? I could take that 100 and subtract the five from it, and I get 95. I could take that 100 and add five to get 105. So he started working with this concept of subtracting and adding a standard deviation. And then he thought, wait a second, I could even do this with two standard deviations, right? If the standard deviation is five, two times that is 10. So I could do 100 minus 10 and get 90 for the lower limit, and 100 plus 10 and get 110 for the upper limit. And so I can make this range from the lower limit to the upper limit, which we call an interval, right? And he realized that if he used some rules along with this, there might be a useful interpretation of these limits, some way to make these limits mean something. So we're going to look at how he figured out how to use one standard deviation on either side of the mean, or two, or three, or four multiples of the standard deviation on either side of the mean, to come up with lower and upper limits that meant something.
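Adding and subtracting multiples of the standard deviation is simple arithmetic, so here is a small Python sketch of it. The helper name `chebyshev_limits` is hypothetical, and the numbers are the made-up population from the example (mu of 100, population standard deviation of 5).

```python
def chebyshev_limits(center: float, sd: float, k: float) -> tuple[float, float]:
    """Lower and upper limits k standard deviations either side of the mean."""
    return center - k * sd, center + k * sd


# Made-up population from the example: mu = 100, sigma = 5.
for k in (1, 2):
    lower, upper = chebyshev_limits(100, 5, k)
    print(f"k = {k}: {lower} to {upper}")
```

The mean always lands exactly in the middle of the interval, because the same amount, k times the standard deviation, is subtracted and added.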
So he realized that what these lower and upper limits would mean is that at least some percent of the data would be between them. In other words, some percent of the x's would be between the lower and the upper limit. But that percent would depend on how many standard deviations you go out, right? Is it one, is it two, is it three? The more you go out, obviously, the more of your data are covered by the limits, because the interval gets so big that it almost covers the whole thing. So you would expect that percent to go up as the number of standard deviations you use goes up. So he was working this out, and he wanted it to work for all distributions: normal, but also skewed, and also uniform and bimodal. This was the formula he came up with: at least 1 − 1/k² of the data fall within k standard deviations of the mean. In this formula, see at the bottom, k stands for the number of standard deviations (or population standard deviations) that he's going to use, and k has to be bigger than one for the formula to tell you anything. So let's pretend that he made k be two, like two standard deviations, right? Then the formula says one minus one divided by k squared, which would be two squared, and two squared is four. So one divided by four is 0.25, and one minus 0.25 is 0.75. If you make that a percent, it's 75%. So he's like, okay, that's what I'm going to say: if you go out two standard deviations up and down and make those upper and lower limits, at least 75% of the data, of the x's, are going to be in there. There might be more, but it'll be at least that. So he did this with two, with three, and with four: two standard deviations either way, three standard deviations either way, or four standard deviations either way.
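Plugging k into 1 − 1/k² can be sketched in Python like this. It's a minimal sketch; the function name `chebyshev_minimum_share` is hypothetical, and it uses exact fractions so that 1/9 isn't rounded along the way. It refuses k of one or less, because there the bound says nothing useful (k = 1 gives 0%).

```python
from fractions import Fraction


def chebyshev_minimum_share(k: int) -> Fraction:
    """Chebyshev's lower bound 1 - 1/k**2 on the share of data within k SDs."""
    if k <= 1:
        # For k <= 1 the bound is 0 or negative, so it tells you nothing.
        raise ValueError("the bound is only informative for k > 1")
    return 1 - Fraction(1, k * k)


for k in (2, 3, 4):
    share = chebyshev_minimum_share(k)
    print(f"k = {k}: at least {float(share):.1%}")
```

Running it reproduces the 75%, 88.9% and 93.8% figures, so there's no need to memorize them.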
Now, students in my class often think they have to memorize this one minus one over k squared. You don't have to memorize it. This is just the story of how Chebyshev did this proof. You can memorize it for fun, but nobody memorizes it. I mean, Chebyshev did the work; I'm just showing you the result, right? As you can see, you can plug in two, three and four yourself, and you'll get the same answers Chebyshev did. It's kind of a waste of time, but you can do it just for fun. I did the two one at the top; I even talked you through it. You plug two into the equation, and you get 75%. So in the example I was just talking about, imagine 100 was my x bar and my standard deviation was five, right? Two times that is 10. So my lower limit would be 90 and my upper limit would be 110. And I would be able to confidently say that at least 75% of my x's are between 90 and 110. So if I measured 100 people, I'd say at least 75 of them are going to be between these limits. In fact, it could be 80, it could be more, but it's at least 75. Then remember, I told you to predict that as we made this number bigger, as we go out more standard deviations, we're going to cover more of the data, right? When we do three, it doesn't come out as even; it comes out to 88.9%. So at least almost 89% of the data will be covered if you go out three. And if you go out four standard deviations, it's at least 93.8%. And just to remind you, when you have upper and lower limits, you have an interval; we just call it that. But when you get an interval this particular way, it's called Chebyshev's interval, because he's the one who did all this work and figured it out.
So I wanted to demonstrate an example of Chebyshev's interval, so you know how to interpret them and why anybody does them. Okay, so remember our patient sample in the waiting room at the lab, right? They waited on average six minutes, and the standard deviation of their waiting times was 3.74 minutes. Now, when I demonstrated how to calculate the standard deviation, I used this patient sample, and I only had a few patients in it on purpose, because otherwise the table we made with the defining formula would be huge and I'd never finish this video. So I'm going to ask you to pretend that instead we had 100 patients, right? Instead, I measured 100 and got the same x bar and the same 3.74 standard deviation. So if we measured 100 patients and got that, I put Chebyshev's rules in that table. If we go out two standard deviations from the x bar on either side, whatever interval we get, we know, because we studied 100 patients, that at least 75 of those patients will be between the lower and upper limits, if we follow Chebyshev's theorem. And if I go out three standard deviations, at least 88.9 patients will be in there. Okay, I know that doesn't make sense: 88.9 patients? How do you have part of a patient? What it's really saying is at least 88.9% of the patients, so since you can't have part of a patient, at least 89 patients would be in that interval. And of course, if I went out four, it's at least 93.8%, so I would have to say 94, because you can't have part of a patient. At least 94 patients would fit in that interval. And if you think about it, if we only start with 100 patients, that's almost all of them. So the four one isn't so useful, right? You'll see me on the left side of the slide calculating the intervals.
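Turning the "at least" percent into a whole number of patients means rounding up, since you can't have part of a patient. Here is a minimal Python sketch, assuming 100 patients as in the example; `minimum_count` is a hypothetical name, and it rounds up with integer arithmetic so floating-point rounding can't sneak in.

```python
def minimum_count(n: int, k: int) -> int:
    """Smallest whole number of observations Chebyshev guarantees within k SDs.

    Computes the ceiling of n * (1 - 1/k**2) = n * (k*k - 1) / (k*k)
    using integer ceiling division.
    """
    return -(-n * (k * k - 1) // (k * k))


for k in (2, 3, 4):
    print(f"k = {k}: at least {minimum_count(100, k)} of 100 patients")
```

For 100 patients this gives 75, 89 and 94, matching the rounding done by hand above.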
So let's start with the first one. The first one is two standard deviations on either side of the mean. The Chebyshev interval we get is negative 1.48 to 13.48. And you probably notice you can't wait negative time, so already this is kind of weird, right? But what this is saying is that of our 100 patients, at least 75 of them, because this is the 75% Chebyshev interval, waited between negative 1.48 minutes (you might as well round that to zero) and 13.48 minutes. So at least 75% of them fell in that range. Now, 13.48 minutes is kind of long, so I guess we would be happy that at least 75% of them fell in that range, because that means they probably weren't waiting much longer than that. But if you go out further, you widen the interval to the 88.9% one. Then you say that at least, let's round it to 89%, of the patients waited between negative 5.22 minutes, which you might as well make zero, and 17.22 minutes. So as you see, if we widen the interval, we get some of the longer waiters in there. So then we say, well, at least 89% were between those limits, which means at most about 11% were outside them. And then we go out one more and get 93.8%, so let's just round it to 94. At least 94% of the patients, or if we have 100 patients, at least 94 of them, waited between negative 8.96 minutes, which again is nonsensical, and 20.96 minutes. But now we're getting to where, if almost all the patients waited somewhere between zero and 21 minutes, we really don't know much about how long they waited. So this is to show you what happens when you widen the interval: you have less certainty about what happened to individuals, but a better idea of what the overall range is. So again, I put this at the bottom: if we had 100 patients, this is how you would interpret it.
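The three patient intervals on the slide can be reproduced with a few lines of Python. This sketch assumes the lecture's values (x bar of 6 minutes, s of 3.74 minutes); the helper name `wait_interval` is hypothetical, and clamping the lower limit at zero is only a display choice reflecting that nobody waits negative time.

```python
def wait_interval(mean: float, sd: float, k: int) -> tuple[float, float]:
    """Chebyshev interval mean +/- k*sd, rounded to two decimals like the slide."""
    return round(mean - k * sd, 2), round(mean + k * sd, 2)


# Lecture values: x-bar = 6 minutes, s = 3.74 minutes.
for k, share in ((2, "75%"), (3, "88.9%"), (4, "93.8%")):
    lower, upper = wait_interval(6, 3.74, k)
    # Negative wait times are impossible, so report zero as the practical floor.
    print(f"k = {k}: at least {share} waited {max(lower, 0.0):.2f} to {upper:.2f} min "
          f"(raw lower limit {lower:.2f})")
```

Keeping the raw lower limit in the output makes it clear that the negative values come straight from the formula, not from the data.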
At least 75 would have waited between the lower and upper limits of the 75% Chebyshev interval, then at least 88.9 patients (I know, nonsensical) for the 88.9% interval, and at least 93.8 for the 93.8% interval. You can see that interpretation in the lower part of the slide. This is a really difficult concept for a lot of students, so I'll give you these take-home messages. First of all, Chebyshev's interval works for any distribution: normal, skewed, whatever. The reason that's one of the take-home messages is that later we're going to learn about intervals that only work with normal distributions. So this one is loosey-goosey; it works with all distributions. Also, Chebyshev's interval tells you that at least a certain percent of the data are in the interval. Later, we're going to learn about intervals where exactly a certain amount of the data are in the interval. So again, Chebyshev is a little loosey-goosey, right? He says "at least." Next, Chebyshev intervals are sometimes nonsensical; as we just talked about, negative time doesn't work, right? And sometimes you'll get very high upper limits, especially with four standard deviations. So ultimately, they're not very useful, and they're not used in health care. I had literally never heard of Chebyshev's interval until I started teaching this class. So what is the purpose of teaching you Chebyshev's interval? The purpose is to point out that in statistics, we often take the standard deviation, either s or the population standard deviation, and add it to and subtract it from the mean, and that's a good way of making lower and upper limits that have special significance. That's really the main take-home message. You'll see this pattern as we go through this class: we get a mean, either the sample mean x bar or the population mean mu, and a standard deviation, either from a sample or a population.
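To see the "at least" idea and the "any distribution" idea in action, here is a hedged Python sketch that checks Chebyshev's bound on a small, deliberately skewed set of wait times. The data are made up purely for illustration, and `share_within` is a hypothetical helper. It uses the population standard deviation (`statistics.pstdev`), which matches how the theorem is stated; using the sample standard deviation would make the intervals slightly wider, so the bound would still hold.

```python
from statistics import mean, pstdev

# Hypothetical, deliberately right-skewed wait times in minutes (made up for illustration).
waits = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20, 35]


def share_within(data: list[float], k: float) -> float:
    """Fraction of observations within k population SDs of the mean."""
    mu = mean(data)
    sigma = pstdev(data)
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)


for k in (2, 3, 4):
    observed = share_within(waits, k)
    bound = 1 - 1 / k**2
    print(f"k = {k}: observed {observed:.1%}, Chebyshev guarantees at least {bound:.1%}")
```

For this made-up data the observed shares land above the guaranteed minimums, which is exactly what "at least" allows for, even though the data are far from normal.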
And then we take one standard deviation, or two, or some other multiple, and add it to and subtract it from the mean. Those intervals then have certain significance. I only taught you about Chebyshev's interval in this lecture, but you'll learn about other intervals later that are made similarly. So in conclusion, what did we learn? We learned how to calculate the range. We learned how to calculate the variance and standard deviation. We learned how to calculate the coefficient of variation and how to interpret it. We talked about the differences between the sample and population formulas. And we learned about Chebyshev and his theorem: how he figured it out, how to calculate his intervals, and how to interpret them. Now, I just wanted to show you this picture of Chebyshev. He's a Russian guy; well, the stamp is from the USSR, from before the Iron Curtain fell. I just wanted to show it to you so you know who figured all this out. Good job, you made it through the measures of variation. And now you're ready to do the quiz, the homework, whatever, right? You're totally knowledgeable. Good job.