Hello, it's Monica Wahee again, your lecturer from Labere College, and we are moving on to chapter 3.1, which is measures of central tendency. And here are your learning objectives. So at the end of this lecture, you should be able to explain how to calculate the mean. You should also be able to describe what a mode is and say how many modes a data set can have. You should be able to demonstrate how to find the median in a set of data with an odd number of values as well as in a set of data with an even number of values. And you should also be able to define trimmed mean and weighted average. All right, so what's this measures of central tendency? I'm going to explain why we call it that. And then I'm going to talk about the three biggies, which are mode, median and mean, and explain how to get those. Then towards the end of the lecture, I'm going to go into some special situations. One is called a trimmed mean, and the second is a weighted average. So let's get started. What is this central tendency thing? Well, you can only do this with quantitative data, not qualitative data. When you think of having a pile of numbers like this, one of the things you want to know is how much they tend towards a center. Now, of course, you don't know where the center is until you start looking at the data. Some data are kind of high up in the hundreds, like systolic blood pressure. I give a five point quiz in one of my classes, so those numbers are low, like one, two, three, four, five. But then the question becomes, do they group towards the center of whatever list of data they're in, or don't they? How sort of center-y are they? You see these distributions on the slide. On the left, you'd probably say, well, that looks more center-y than what's on the right, you know, this normal distribution on the left, and the skewed right distribution on the right.
And so intuitively, you kind of know what I'm talking about. But what this lecture is going to be about is how to actually put numbers on the difference between what you see on the left and what you see on the right. So these are the numbers. These are the measures of central tendency: we're going to go over mode, median, and mean. And the median is a little different depending on whether you have an odd number of values or an even number of values. I mean, it means the same thing, but you calculate it slightly differently. So I'll go over that. And then the mean, a lot of you already know what a mean is, but there's a couple of special means we can make. One is called a trimmed mean, and another is called a weighted average, which is a weighted mean. I don't know why they chose the word average for that one, because mean and average mean the same thing. But I'm going to go over these things. Okay, well, let's start with the mode. The mode is the number in the data set that occurs the most frequently. So I put up this little tiny data set here of just five numbers. And it's obvious that five is the mode, right, because it repeats once: there are two fives there. But look, I just changed one of them to a six, and now there's no mode. So I just want you to know that a lot of data sets don't even have a mode; there's just no repeat at all in them. And that usually happens when the values can take a broad range of numbers, like systolic blood pressure. I mean, it would be kind of lucky if you just got two people with the exact same one. But that can happen. So don't think there's always going to be a mode; there might not be one. It's also possible to have more than one mode, like look at that. So I've got six numbers up there. And the two repeats once and the three repeats once. So you've got two modes, right. But let's say that the three actually appeared three times; then there would only be one mode, because the three threes would trump the two twos, right.
So you can just imagine how confusing this gets when you've got a ton of numbers. What's a little less confusing is if, like I said, you have a broad range of numbers. It would be kind of a coincidence if two patients had the exact same systolic blood pressure or platelet count, you know, but if you get a repeat in there, then that would be the mode. Of course, if you measure a whole bunch of people, then eventually you're probably going to get one. And also, if you look at the slide with all those numbers, you'd really have to go through and organize them and count them up to see if there is a mode. There probably is one because you see a lot of repeats. But then which is the one that wins, the one that's repeated the most, or are there two that are repeated the most? It becomes kind of political when you really do it. And it's not worth a lot of work, because what does the mode tell you? It doesn't really tell you much. It does tell you the most popular answer. The word mode in French means fashion. So like I put on the slide, you know, à la mode, it's in fashion. So it's the one that's most popular, or the most common result. But it's not used a lot in healthcare. And I actually don't use it very often. Once in a while, I'll say, oh, the mode in the class for my five point quiz was five, meaning everybody did pretty well, they mostly got a five. That was the most popular result. But you hardly ever have to say that. And remember, we learned the word resistant: if a measure is resistant, you can't whack it out very easily. Well, you can change things pretty easily with the mode; the mode's not resistant. I even just demonstrated that on those slides: by just changing one number, you can erase the mode or add a mode or whatever. And so it's not stable, it's not resistant. And those are the kind of things we don't really like in healthcare, so we don't really use it. So I'll move on to some cooler measures of central tendency.
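To make the mode rules concrete, here is a minimal Python sketch. The function name `modes` and all four little data sets are made up for illustration (the actual numbers on the slides are not in this transcript); they are only chosen to mirror the cases described above: one mode, no mode, two modes, and a three-way repeat beating a two-way repeat.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), sorted; empty list if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)          # how many times each value occurs
    top = max(counts.values())        # the highest repeat count
    if top == 1:                      # no value repeats, so there is no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)


# Hypothetical data sets, not the ones from the slides.
print(modes([1, 3, 5, 5, 7]))      # one value repeats -> a single mode
print(modes([1, 3, 5, 6, 7]))      # change one 5 to a 6 -> no mode
print(modes([2, 2, 3, 3, 4, 5]))   # two values tie -> two modes
print(modes([2, 2, 3, 3, 3, 5]))   # three 3s beat two 2s -> one mode
```

Notice how changing a single number flips the answer, which is the sense in which the mode is not resistant.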
And here's a really cool one, which is called the median. And it's the middle of the data. And I'll explain a little bit more what we mean by the middle of the data. Okay, so remember, we're talking about quantitative data. So you've got some pile of numbers. It doesn't matter what they are, you can always sort them in order of lowest to highest. And I keep talking about this five point quiz I give in my class. It's an easy quiz and most people get fives. But even so, somebody usually gets a four, or somebody doesn't show up for the quiz and they get a zero. And so it doesn't matter, I can have 100 people in the class, I still could put all of those numbers in order of lowest to highest, even if most of them were fives, because you'll get repeats in your data sometimes, right. And also sometimes you'll get outliers. Like if one person didn't take the quiz and they get a zero, but everybody else gets four and five because it's an easy quiz, well, then that zero would be an outlier, and you don't have to worry about that. And like I said, you know, the data values sometimes are almost the same, like almost everybody gets a five on my quiz because it's so easy. So it doesn't matter, even if you have these weirdnesses in your data, you can still just arrange them in order. And that's what we mean by the median: it's the number that is halfway up, or halfway down, right. So if I've got 100 people in my class, and I've got the zero over here on the left and then I put all the, you know, fours and then the fives, I have to count up, what, 50, right, to see where the middle is, and it's probably going to be in the five range, right. But that's all we mean. We say, you know, take however many values you have, put them in order, even if there's repeats and outliers or whatever, just put them in order, and then count up halfway, and that's where the median is going to be. So I'll demonstrate this here.
So how do you find the median? The first step is to order the data from smallest to largest. So I'm giving you two demonstrations, and I don't even know what these data mean, I just totally made them up. The one at the top, the data set that starts with 42, only has five numbers in it. So I'm going to demonstrate the odd version with it. The set at the bottom has six numbers in it. So I'm going to use it to demonstrate the even version, because remember, it goes a little differently depending on whether you have an odd number of numbers or an even number of numbers. Okay, so those are the numbers. And we still have to do the first step, which is order the data from smallest to largest, because you can see they're not in order. So I'm going to do that here. Okay, there it is. So those are the same numbers, they're just in order from smallest to largest. Okay. So we're going to get rid of those numbers on the top and instead put the positions in there. So let's look at the top data set, which is the odd one. The way you find the median is you number the positions, you know, 1, 2, 3, 4, 5, and it's the middle position. So you can imagine if we had seven data points, we'd count out 1, 2, 3, 4, and we'd circle that one, and that would be the median. So that's what you have to do: if you have an odd number of values, you just put them in order, and see, I numbered them for you, and then you take the middle number. And that's the median, that's what it is; it's 42 in this one. Okay, we'll do the downstairs data set there that has six in it; as you can see, the positions are numbered. And then what do you do? You go to the third and fourth positions, which is kind of the middle, right? And you literally make an average of them: you add the two and divide by two. And they happen to be seven and eight, right next to each other. But if they had been like eight and 10, then the average would have been nine, and that would have been the median.
But because this was seven and eight, you do seven plus eight, divide by two, and get 7.5. So when you do the median with an odd number of values, you're going to be taking one of the values in there. If you do the median on an even number of values, you might get something with a decimal, because you're looking for the two values that straddle the middle, and you're going to be making an average of them. And so you might get kind of a wacky number like 7.5 that's not in the underlying data set. So this is fine if you have five or six numbers or seven, but what if you have like 150 numbers? I mean, you do still have to put them all in order to begin with; you know, I use Excel, I'd probably just sort. But you have to know how many numbers to go up. It's not obvious. So this is how you find the middle position; they have a little formula for it. So let's say we have an odd number of values. And I'm giving the example of 21, let's say 21 students in my class. And that's how many values I had, and I wanted to make a median of their grades. What I would do is put them all in order, and I'd say, well, I have to go up so many, and that's the median. But I don't know how many to go up. So I would use this calculation. I take the n, which in our case is 21, and I add one to it, and we get 22. And then I divide by two, which gives 11. So that's just how it works. So if you had 41, you would do 41 plus one, it would be 42, divided by two is 21. Or if you had like, I don't know, I'm picking odd ones, like 27, you'd do 27 plus one, and that would be 28. And 28 divided by two is 14. And so you see, adding one to an odd n just forces it to be an even number, so when you divide by two you come out with a whole number. And then that's the position you go up to.
And so if I had 21 students in my class, and I took the grades and arranged them in order from lowest to highest, like if they were those quiz grades, you know, most of them would probably be four and five, but it wouldn't matter. What I would do is just start with the lowest and count up to the 11th one, the 11th position. And then that would be my median. Now, you also have to do that, you have to find the middle position, even if you have an even number of values. So I took an example of 14. Now you'll notice we use the same formula: 14 plus one is 15, divided by two. But if you use this formula, you get 7.5. And that's not the median. That's just how many positions you have to go up, right? And so remember on the earlier slide, we had to go between the third and fourth positions, we had to average those two numbers. Well, this is basically saying, if you get 7.5, you have to go to the seventh and the eighth positions, the ones that straddle it, and those are the two values that you average. So take my n as 100, which is a nice, even number. If you do 100 plus one, you get 101, and dividing by two you've got, you know, 50.5, right? And that just is a secret message that when you line up all your data, you take the 50th one in the row and the 51st one in the row, add them together, divide by two, and that's going to be your median. So I just wanted to share with you this little formula just in case you get a large number of numbers thrown at you and putting them in order is a big pain. And then when you have to figure out how many to count up, you can use this formula to get the middle position. So what does a median tell you? We have a lot more to talk about here. First of all, it's called the 50th percentile of the data. What it means is 50%, or half, of the data points are below the median, and the other half are above. And that intuitively makes sense because we just created this median together, and we could see that half of the points were on the bottom and half on the top.
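The sort-then-count procedure, with the (n + 1) / 2 position formula, can be sketched in Python like this. The function names and both example data sets are hypothetical: the slide values are not in the transcript, so the lists are invented to match what was described (five values with 42 in the middle, and six values whose third and fourth positions hold 7 and 8).

```python
def median_position(n):
    """Position of the median, counting from 1: (n + 1) / 2."""
    return (n + 1) / 2


def median(values):
    """Sort the data, then take the middle value (odd n) or average the two
    values that straddle the middle (even n)."""
    if not values:
        raise ValueError("median needs at least one value")
    ordered = sorted(values)                  # step 1: lowest to highest
    pos = median_position(len(ordered))       # step 2: how far to count up
    if pos.is_integer():                      # odd n: a whole-number position
        return ordered[int(pos) - 1]          # positions start at 1, indexes at 0
    lower = ordered[int(pos) - 1]             # e.g. pos 3.5 -> 3rd position
    upper = ordered[int(pos)]                 # ...and the 4th position
    return (lower + upper) / 2


print(median_position(21))            # 21 values -> count up to the 11th
print(median_position(14))            # 14 values -> average the 7th and 8th
print(median([42, 17, 63, 30, 58]))   # made-up odd set; middle value is 42
print(median([8, 2, 10, 7, 5, 8]))    # made-up even set; (7 + 8) / 2 = 7.5
```

In practice Python's built-in `statistics.median` does the same job; the hand-rolled version is only here to show the position formula at work.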
And so it's also known as the middle rank of the data. And what's nice about the median is it doesn't really care much about the ends of the data. Like if I gave extra credit to a few people in my five point quiz, and they got a few sixes, probably the median wouldn't even change, because it's in the middle where all the action is where we find the median. And outliers don't really bother it, because if one or two people get a zero on the quiz, you know, if there's 21 people in there or 100 people in there, these things happening at the ends really aren't going to affect it. So we like the median because it's very resistant and it's very stable; you can't really whack it out by throwing some outliers on the ends. Now I'm moving on to the third measure of central tendency, which is the mean, but I also threw in here trimmed mean and weighted average, because they are other kinds of means. And we're going to talk a little bit also about resistant measures, because I just mentioned that. But I'm going to step back and talk a little bit about the Greek letter sigma here; it's actually capital sigma. I do not speak Greek, and I actually have trouble speaking statistics because a lot of it's in Greek. So I try to avoid that in my lectures, but sometimes you can't get away from it. So I have to really introduce you to this capital sigma. So in English, or statistics-ese I guess, whenever you see this you say sum of blah, like you expect something to be right after it. Okay, so if you see the sigma and then x, you would say sum of x. That's how you say it. So what is x? Well, remember how we were just making medians, and we were looking at modes? Well, each value there is considered an x. Okay, so each of the values in those data sets is an x. So sum of x would mean add these all up, or add up all the x's.
And then I just threw in another example. Let's say somebody came to you and said sum of xy; it would mean you must have some xy's lying around and you have to add them together. Or somebody came up to you and said, you know, sum of the prices of the food in your basket at the grocery store, right, you'd be like, oh, okay, I have to go through all these prices and add them up. Right. So that's what sum of means. Okay, and it's used a lot in statistics, and we're going to use sum of all the time. So I just want you to get in your head that whenever you see sum of, there's probably going to be this thing next to it. And it's going to be a batch of numbers that you have to add up. And if it's numbers from our data set, it'll be called x. If it's other numbers from something else, it'll be called whatever they're called. But just know that this means sum of. And I see on the slide, the upper one is Times New Roman and the lower one's Arial, so they look kind of different. But I just wanted you to get ready to deal with this sum of a lot. Okay, so here we are, I'm hitting you with a sum of. This is the formula for the mean. And a lot of you already know how to calculate the mean, and you just kind of do it, and you didn't know this is how you say it in statistics. But basically it's this ratio. So this is like a fraction. And on the top of the fraction is sum of x: you add up all your x's. And on the bottom of the fraction is n, which is however many you have. So you add them all up and divide by however many you have, and you've probably been doing this your whole life. But this is actually the formula. So I just thought I'd demonstrate it. See, remember those six data points I was using for the median? I just kind of copied them over here. I add them all up, and so I get sum of x is 40, right? And then I counted them and that was six, well, I made them be six. And so 40 divided by six is about 6.7.
So that would be the mean for these data. And you probably already knew how to do that. But I wanted to sort of cross-reference it with the actual formula. Okay, now I'm again going to take a little break here to just talk about means. Because remember, we talked about sample statistics and population parameters. If somebody just talks about a mean to you, and they say, look, the mean of such and such is six or something, unless you really get into it with them, it's not going to be obvious if they did a sample mean or a population mean. But when we write this down, it becomes obvious. If I say x bar, see that x with the line above it, that's pronounced x bar. And you'll see I write it on the slides as x bar because it's so hard to put that little line up there. But that means the same thing. Whenever you hear x bar, or you see that x with the line over it, it means that it's a sample statistic. So if you ever saw x bar equals six, not only do you know the mean is six, but the secret code says this mean comes from a sample, because x bar is being stated. But if you look on the right side, you'll see there's this thing that looks like an m, and it's pronounced mu; it's a Greek letter again. And you'll see on the left I put it in Arial, and on the right it's in Times New Roman, so it looks a little different. But it's pronounced mu. And so if you saw mu equals six, you'd be like, whoa, that was a population they measured. And they'd probably say that too, because you don't see mu a lot; people usually don't measure the population. It's a lot of work. You more often see x bar. But even so, I want you to be cognizant of whether it says mu or whether it says x bar, because it's still going to be a mean. But if it's mu, they're talking about the population. And if it's x bar, they're talking about a sample. And that might be more important later, but just keep this in mind.
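The sample mean formula, x bar equals sum of x divided by n, can be sketched in a few lines of Python. The six values below are invented so that they add up to 40 like the data on the slide (the slide's actual values are not in the transcript), and the name `sample_mean` is just a label chosen for this example.

```python
def sample_mean(values):
    """x bar = (sum of x) / n, where n is how many values there are."""
    if not values:
        raise ValueError("mean needs at least one value")
    return sum(values) / len(values)


# Hypothetical six data points chosen to total 40.
data = [2, 5, 7, 8, 8, 10]

print(sum(data))                      # sum of x
print(len(data))                      # n
print(round(sample_mean(data), 1))    # 40 / 6, rounded to one decimal
```

The arithmetic for a population mean is identical; only the labels change, with mu in place of x bar.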
Also, when we talk about samples, we use a lowercase n to mean the number of numbers we have. Whereas if we're talking about populations, we use an uppercase, a capital N. So you'll see that the sample mean formula on the left side, this x bar equals sum of x divided by n, changes if you're talking about the population mean. And you're like, come on, you add it up the same way. Mu is basically the population mean, and capital N is just the number in the population, so it is almost the same formula. But the issue is you really are supposed to label things what they are. So if you're doing a population mean, you're supposed to call it mu, and you're supposed to, you know, write it like that on the right side of the slide. And if you're doing a sample mean, you're supposed to call it x bar, and you're supposed to do it like on the left side of the slide. So I just want to make that clear to you as you go through the rest of these lectures, because when I say mu, I'm going to mean a mean, but it's going to be from a population. And when I say x bar, I'm going to mean a mean, but it's going to be from a sample. All right, so now we've talked about several measures of central tendency, but I wanted to put means and medians together in kind of a cage match, because I wanted you to look at them and see what their differences are. Now, I've been sort of giving accolades to the median, right, because it is very resistant to outliers and it's very stable. Remember how I pointed out if you throw some outliers on either side, it doesn't really affect it much. Unfortunately, means are not resistant to outliers. Like if I took my five point quiz, and I just felt like favoring a student and giving them 10 points, it would totally screw up the mean for that class. And so it's not very stable. So one of the things we can do if we've got outliers in our data is to just use the median.
But sometimes we want to use the mean, so we've got to do different things with it. So one of the things we can do to try and make a more stable mean, or honest mean, is to trim it. So I'm going to talk about how you do that. So as you can see on the left side of the slide, a very high value or very low value, like an outlier, or more than one outlier, can really throw off the mean. And it's not a problem with the median. So if you want to make the mean a little more resistant, what you can do is trim data off of each end, so the outliers get cut off. Okay, the problem is, you can't really look at the data when you're doing that. You would just have to make a rule when you're not looking and say, okay, I'm going to trim a certain amount off the top and the same amount off the bottom, and it has to be equal. And you just have to look away when you're doing it. Okay. So what some people do is a 5% trimmed mean, which means you take 5% of the data at the top and cut it off, and 5% at the bottom and cut it off. So you basically lose 10% of your data. And in healthcare, a lot of people get mad about that; they don't want to lose any data. So they don't like to use this way of fixing the problem of outliers, they use other ways. But I wanted to show you this is a simple way to fix it. So I'm going to imagine we have 100 data points because it just makes it easier for you to see what's going on. So if you had 100 data points, 5% of them would be five. So basically, you'd be trimming five off the top and five off the bottom. So probably you already made the mean out of these 100 and you didn't like it because you saw outliers at the top and bottom. So the first step is to put the data in order, just like you do for the median: you sort them from, you know, the lowest to the highest, take all of your 100 and do that. Then what you would do is circle the five bottommost ones, and they're going to get cut off.
And you circle the five topmost ones, and they're going to get cut off; they get thrown out. And then you've got the 90 values left in the middle. Now you make a mean out of those. And then that's a 5% trimmed mean, and you've got to tell people if you do that. You say, here's the original mean and here's a 5% trimmed mean, because then people get an idea that there must have been some outliers and some of your data got hacked off. But then this might give you sort of a more stable estimate of the mean. Now I'm going to move to something else entirely. It's not about trying to make the mean stable. It's just about trying to make the mean a little different. Sometimes certain values should count more than others towards the mean. And that sounds really esoteric, but the way we see it all the time is in school. So you might get a great grade on your homework, you might get A's on your homework, right? But if homework is only worth 10% of your final grade, it doesn't help you much. And what that 10% is, when you have a class like that, is called a weight when you move into statistics. You say, well, I as the teacher, I'm going to weight your homework grade at 10% of your final grade. So it doesn't matter how awesome your homework grade is, or how bad it is, it's really only going to count for 10% of your final grade. And that's why we do weighted averages; you know, I don't think your homework should be worth like 50% of your grade, right? That doesn't make any sense. And so you might want to have different things contribute different amounts of weight to that final mean. So this is a way of messing around with the mean, and making certain things going into it count for more, or have kind of a bigger vote than the other ones. And so again, I'm just going to stick with school to give examples, because this is where we normally see it.
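Before the grade example, here is a quick Python sketch of the 5% trimmed mean steps from a moment ago: sort, cut the same number of values off each end, and average what is left. The function name, the `percent` parameter, and the 100 data points are all made up for illustration. One assumption to flag loudly: when the percentage does not work out to a whole number of values, this sketch rounds the cut down, which is just one possible convention.

```python
def trimmed_mean(values, percent=5):
    """Mean after dropping `percent`% of the values from EACH end of the sorted data."""
    ordered = sorted(values)                   # put the data in order first
    k = len(ordered) * percent // 100          # how many to cut per end (rounded down: an assumption)
    kept = ordered[k:len(ordered) - k]         # throw out the k lowest and k highest
    if not kept:
        raise ValueError("nothing left after trimming")
    return sum(kept) / len(kept)


# Hypothetical 100 data points: 90 ordinary values plus outliers at both ends.
data = [0] * 5 + [50] * 90 + [1000] * 5

print(sum(data) / len(data))       # original mean, dragged up by the high outliers
print(trimmed_mean(data, 5))       # 5% trimmed mean, based on the middle 90 values
```

Reporting both numbers side by side, as suggested above, lets people see that outliers were present and that some data got cut.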
So I made up this example where homework is worth 10% of your final grade and quizzes are worth 20% and the final is worth 70%. And I just want to point out, I've actually seen people do this, because I tutor, and this is horrible, making your final worth over 50% of your grade. So this is just a shout out to any professors watching this: don't do that. Okay. But anyway, let's say I was mean and I did it. And let's say you were a pretty good student, and you got an A on the homework, right? And so we're going to say that's a 4.0, because a lot of schools would say A is 4.0. Then let's say you got B plus on the quizzes, maybe because the lectures weren't very good, right, ha, ha, ha. So you got B plus on the quizzes, and that would translate to the number 3.5 on that four point scale. And let's say you got a B on the final; that's too bad. But that's 3.0. So why do I say that's too bad? Well, you probably wanted an A because the final counts for greater weight, right, it counts for 70%, and you'd want that to be a really high grade. Now I first wanted to show you the non-weighted average, like the normal mean you would make. For the normal mean, you just add the 4.0 to the 3.5 to the 3.0 and then divide by three, because you have three in there. And you'd get 3.5; you'd get a B plus in the class, right. But let's look up at that formula. So this is the weighted average formula. It's the sum of x times the weights, divided by the sum of the weights. And remember when I said sum of xy as an example? So instead of just summing x like we did in the non-weighted average, we have to do x times w on all of them and sum it. And you're like, what's w? Well, remember, I told you, with the homework worth 10%, that's the weight for it, right. And instead of using the percent, when we do the weighted average we use the decimal version.
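The weighted average formula, sum of x times w divided by sum of w, can be sketched in Python using the grades and weights from this example (4.0, 3.5, and 3.0 with weights 0.1, 0.2, and 0.7). The function and variable names are just illustrative choices.

```python
def weighted_average(values, weights):
    """(sum of x * w) / (sum of w)."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


grades = [4.0, 3.5, 3.0]     # homework A, quizzes B plus, final B
weights = [0.1, 0.2, 0.7]    # 10%, 20%, 70% written as decimals

unweighted = sum(grades) / len(grades)
weighted = weighted_average(grades, weights)

print(round(unweighted, 2))  # the normal mean: every grade gets an equal vote
print(round(weighted, 2))    # the final's B counts for 70%, pulling the grade down
```

Dividing by the sum of the weights is what keeps the function correct even when the weights do not add up to one, for example if they were entered as 10, 20, and 70.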
So you'll see under the weighted average, I'm doing that sum of xw thing, by taking the four and multiplying it by 0.1 for that 10% first, and then see that B plus, that 3.5, that gets multiplied by 0.2 because that's worth 20%. And then there's that B you got on the final, right, that gets multiplied by 0.7. So that's the sum of xw thing going on. And what do you get? You get 3.2. Now I don't even bother to divide this by sum of w, because sum of w is one in this case; if you add up 0.7 plus 0.2 plus 0.1, you get one, and that often happens, you just make the weights add up to one. But I just wanted to let you know, if for some reason you had goofy weights that didn't add up to one, the last thing you have to do is divide by their sum. So as you can see in the lower part of the slide, the sum of xw is 3.2, and if we divide it by one, we get 3.2. And now you don't get B plus in the class. Now you get like a B. And that's the difference between the non-weighted and weighted average: the weighted average weighted this final B extra, and that caused the final grade to be lower. And that's what weighting is. Now I just want to say a few things. I've gone through all our measures of central tendency, but I wanted to talk about how they relate to the distributions we learned recently. So I just put up an example of a normal distribution. And then I color coded these lines. So see, on the far right, there's a color coded mean. And then there's a green median. And then there's a purple mode. Technically, these should all be right on top of each other. But you couldn't see them if I did that, so I just smushed them up next to each other. The point is, if you have data with a normal distribution, all these three things are on top of each other. And the magic of this is you don't even need a histogram to know it. So like I use statistical software. And I'll feed in the data, like a quantitative variable.
And I'll say, tell me the mean, median and mode. And then it will tell me the mean, median and mode. And even if I don't look at the histogram, if it says almost the same number for mean, median and mode, I automatically know it's roughly symmetric, like a normal distribution. Well, that's not the case with skewed distributions. So with skewed distributions, the measures of central tendency are not right on top of each other. In fact, they're in a different order depending on whether we have right skewed or left skewed. So at the top of the slide, I've got an example of a right skewed distribution, right, because it's light on the right. All right, so what's happening here? Well, the mean is getting dragged around by that tail, that big tail. So you can see that the blue mean is on the right side of the median. The median is more resistant, so it's sort of hanging out closer to the bulk of the data. But the tail, that right tail, is pulling the mean up. And then the mode is the lowest one. So if I get this printout, and I see that the mode is the lowest, the median's in the middle, and the mean's the highest, I can say without even looking at the histogram, this is probably right skewed. Now let's look at the bottom of the slide where we have the left skewed distribution, you know, because it's light on the left. And you see the same phenomenon, but it's going the other direction: that tail, that's towards the low end of the data, it's dragging the mean down now. And notice the median is more resistant, it doesn't get dragged down as much. And of course, the mode stays at the high part of the data where there's more data, right? So if I get the printout, and I see that the mean is the lowest and the median's in the middle and the mode's the highest, I'm like, okay, I don't have to look at the histogram, I know this is probably left skewed. So this is basically what I wanted to tell you about the distributions and these actual numbers and how they sort of relate.
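That printout-reading trick can be sketched in Python as a rough rule of thumb: compare the mean to the median and see which way the mean got dragged. This is only a heuristic, not a formal test of skewness. The function name `guess_skew`, the `tolerance` cutoff (in the data's own units), and the three small data sets are all assumptions made up for this illustration; the mode is left out because many data sets do not have one.

```python
from statistics import mean, median


def guess_skew(values, tolerance=0.1):
    """Rule of thumb: a mean well above the median hints at right skew,
    a mean well below it hints at left skew."""
    gap = mean(values) - median(values)
    if abs(gap) <= tolerance:          # mean and median nearly on top of each other
        return "roughly symmetric"
    return "right-skewed" if gap > 0 else "left-skewed"


# Hypothetical data sets, one for each situation.
print(guess_skew([1, 2, 3, 4, 5]))                          # mean equals median
print(guess_skew([1, 1, 2, 2, 2, 3, 3, 4, 9, 15]))          # long tail of high values
print(guess_skew([1, 7, 13, 14, 14, 15, 15, 15, 16, 16]))   # long tail of low values
```

It is still worth looking at the histogram when you can, since a guess based on two numbers can be fooled by odd-shaped data.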
So in conclusion, what this lecture was mainly about was the measures of central tendency, right, mode, median and mean, and how to calculate those. And, you know, I've been kind of ragging on the mean, I'm sorry, but the mean is just not resistant, totally not stable. And the median is, so you want to remember these things. Yeah, you can kind of fix things by doing the trimmed mean, but we don't really like to do that in healthcare because we lose some of our data; we find other ways of fixing the fact that our mean may be kind of goofy. But how we do that is outside of this lecture. I also showed you weighted average, you know, just in case you have to hand calculate your grade. Actually, I had a student in my class once, and this is back when we had Blackboard, and there was something wrong with Blackboard. So she was really upset because she thought she was getting a really bad grade. But she thought she was getting a bad grade because she didn't do a good job of learning weighted average, because when I showed her how to actually calculate her grade, it turned out to be a B. I remember she was crying, because she did an unweighted average, she was crying in my office. And then I just showed her how to do the weighted average, and she stopped crying. She was getting a B. So just don't cry, try the weighted average first. Okay. And then finally, I went over distributions and measures of central tendency, and showed you how the numbers we get from the measures of central tendency sit on the distributions and tell us something about the distribution's shape. All right, well, you made it through the measures of central tendency, get ready for 3.2, measures of variation.