Hello, it's me again, Monica Wahee, your statistics lecturer from Labouré College. Now we're going to go back and cover what I didn't cover in the last lecture about chapter 2.1, which is frequency histograms and distributions. So here are your learning objectives for this lecture. At the end of this lecture, you should be able to state the steps for drawing a frequency histogram. You should also be able to name two types of distributions and explain how they look. You should be able to define what an outlier is and say one reason why you would make a frequency histogram. Finally, you should be able to define what a relative frequency is and what a cumulative frequency is.
Okay, so let's get started. First, we're going to review frequency histograms and relative frequency histograms. Then we're going to go over five common distributions in statistics. And then I'm going to talk about outliers. Now you'll notice I have a lot of pictures of skylines in this presentation, and the reason why is they remind me of histograms.
So let's talk about what a frequency histogram is. A frequency histogram is important in statistics because, as you'll see, you need to make one in order to see what the distribution is. So first I'm going to show you what one looks like. Then I'll explain how to make one. Then I'll explain the relative frequency histogram. And then we'll move on to looking at why we need that for distributions. So here's another skyline, because it looks like a histogram to me. So what is a frequency histogram? Well, it's actually a specific type of bar chart, and it's made from data in a frequency table. So you might see a frequency histogram and go, well, that looks like a boring old bar graph. Well, it's not just any old bar graph. It's got specific properties that I'm going to talk to you about in this lecture.
Okay, both frequency histograms and relative frequency histograms are bar charts, but they're special bar charts that have to be done a certain way. And why? Because if they're done that way, as histograms, they will reveal the distribution of the data, which I'll explain later. So here is a frequency table. We had this before. This was of those fake patient transport miles, right? So you'll notice here are the class limits. And then we put in the frequency, and we even threw in the relative frequency. So this is the frequency table I'm going to use as a demonstration. To make a frequency histogram, you first need a frequency table.
Okay, now here's the histogram version of what's in that frequency table. I'm going to annotate this one image to explain the order in which you draw it, basically by hand. So the first thing you do is draw the vertical line for the y axis. You just draw a line. Next, you write words next to the line. You always start with "frequency of", and then whatever it is. In our example, it was patients. And I'm telling you, you need to do it in this order, or you'll get confused. So you start with that first line, and then you write the frequency label. Next, you draw the horizontal line for the x axis. And then after that, you write the classes below it. Remember, the lowest class is one to eight. That's the lower class limit and the upper class limit of the lowest class, and you literally write those labels in. And why am I freaking out so much about this order? It's because I totally get confused if I do not do the y axis first, because then there are all these numbers, and it's totally confusing. So just try to do it in this order. Okay. Now, number six, I had to flip the slide here.
Okay, at step six, you've drawn the basic background. You've got the x and y axes and those labels. So now you have to start drawing in the bars. For your first bar, you look at the first class, and you find its frequency on the table, which I think was 14 or something. And so you look for it on the y axis. You want to label the y axis so that the maximum frequency is included on it. You see our maximum is above 20, so we wouldn't want to end our y axis at 20 or 15 or something. You have to make it bigger so you can put everybody on there. But our first one was at 14. So we draw a horizontal line at around 14, right there, because we're going to make that first bar. Then you draw the two vertical lines down, and you position them over where you labeled the class. And that makes the bar. Then you actually color in the bar, and you repeat this for each class. That's why I labeled the classes first on the x axis, just to make sure everything is even. And then I go through and I make all the bars. And again, this is why you need to prepare your frequency table first, so you know what to put on this graph.
Okay, this is the relative frequency histogram. You already understand what relative frequency is, right? It's the proportion of your sample that's in each class. If you're going to do a relative frequency histogram, you basically go through the same steps. It's just that you're changing what's on the y axis, and you change what you label it. But the x axis stays the same. And even though you're charting the relative frequencies, and you'll be like, okay, this is a totally different number, what you'll see is that the pattern ends up being the same. And the pattern is actually what we're going after. That's the thing I'm going to talk about with the distribution.
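The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lecture gives the first class (1 to 8) a frequency of 14 and the next class a frequency of 21, but the remaining classes and counts below are made-up placeholders, and `text_histogram` is a hypothetical helper that draws the bars sideways with `#` characters instead of by hand.

```python
# Frequency table for the patient transport miles example.
# Only the first two counts (14 and 21) are stated in the lecture;
# the other classes and counts are made-up placeholders.
freq_table = [
    ("1-8", 14),
    ("9-16", 21),
    ("17-24", 11),
    ("25-32", 6),
    ("33-40", 4),
    ("41-48", 4),
]

labels = [label for label, _ in freq_table]
freqs = [freq for _, freq in freq_table]
total = sum(freqs)

# Relative frequency = class frequency / total number of observations.
rel_freqs = [freq / total for freq in freqs]


def text_histogram(labels, heights, scale):
    """Return one line per class: the label, a bar of '#', and the value."""
    lines = []
    for label, height in zip(labels, heights):
        bar = "#" * round(height * scale)
        lines.append(f"{label:>6} | {bar} {height:.3g}")
    return lines


# One '#' per patient for the frequency histogram.
freq_lines = text_histogram(labels, freqs, scale=1)
# Scaling by the total makes the relative bars the same length,
# so the two charts can be compared side by side.
rel_lines = text_histogram(labels, rel_freqs, scale=total)

print("Frequency of patients")
print("\n".join(freq_lines))
print()
print("Relative frequency of patients")
print("\n".join(rel_lines))
```

Only the numbers next to the bars change between the two printouts. The bars themselves come out identical, which is the point made above: the pattern stays the same whichever one you chart.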
And since the pattern is going to come out the same, I tend to prefer using a relative frequency histogram over a frequency histogram. If I have two different groups, let's say there were two hospitals, and I gathered two sets of data, and I wanted to compare the miles transported, then I could use the relative frequency histogram. Not only would the patterns be evident, but I could compare them fairly. Whatever is .35, or 35%, in this one, even if the other hospital had tons more transports, I could see it as 35%, and I could really compare the percents. So that's why I lean towards the relative frequency histogram. But ultimately, you're going to get the same pattern on your histogram whether you use frequency or relative frequency.
So again, another picture of a skyline. You can see why I think of skylines, because they look like histograms, right? So after making a frequency table, which you do with quantitative data because you're trying to organize it, it's also important to then make a frequency histogram or a relative frequency histogram. And why? It's because it reveals a distribution. And that's what we're going to talk about now. We're going to talk about distributions. First, I'm going to define what I mean by a distribution. And now you're going to see a lot of other pictures like this one on the right. See that shape? That's one of our distributions. So that's a little preview of what I'm going to say. First, we're going to talk about what these distributions are. Then I'm going to describe what an outlier is and how you can detect outliers by using histograms. Finally, I'm going to wrap it up by explaining what cumulative frequency is, and what an ogive is. Okay, so what is this distribution thing I keep talking about? Well, it's actually just a shape.
It's the shape that is made if you draw a line along the edges of the histogram's bars. So on the left, you see I drew the scribbly shape. But you'll notice you can do it with a stem and leaf display too. The stem and leaf on the right is not the same data. I'm just recycling the old picture that I used before. But you see you can do the same thing, drawing that squiggly line, and that's actually the distribution. I mean, they don't all look exactly like that, but that's what you do: you draw this line. I know it's kind of odd that that's what a distribution is, that it's just a shape. But there are actually five of them that we use a lot. There are way more than five in statistics, but you have to get into higher level statistics to care about those. We're only going to concentrate on these five. The first one is called the normal distribution. And it's called that everywhere, except I noticed the book called it the mound-shaped symmetrical distribution. But I'm going to call it a normal distribution. And there's nothing really normal about it. It's just named that for some reason. Then there's the uniform distribution, the skewed left distribution, the skewed right distribution, and the bimodal distribution. So those are the five we're going to cover.
So let's start here with the normal distribution. As you can see on the right, somebody made a histogram, and then they drew that squiggly line. Well, actually, it was me who made this histogram and drew the squiggly line. And notice what the squiggly line looks like. It kind of looks like what the book called it: it's mound-shaped and it's symmetrical. That's the shape of the normal distribution. It's got kind of hooky things on the sides, and a mound in the middle. And if that's what your histogram ends up looking like, where it's kind of like a little mountain, then you've got a normal distribution.
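As a rough illustration of "mound-shaped and symmetrical", here is a small Python sketch that looks only at the bar heights of a histogram. The function `looks_mound_shaped`, its `tolerance` parameter, and the three example lists are all hypothetical and made up for this illustration. It is a crude eyeball-style check, not a formal test for a normal distribution.

```python
def looks_mound_shaped(freqs, tolerance=1):
    """Crude, hypothetical check: one central hump with matching sides."""
    peak = freqs.index(max(freqs))
    # Bars never drop on the way up to the tallest bar...
    rises = all(a <= b for a, b in zip(freqs[:peak], freqs[1:peak + 1]))
    # ...and never rise again on the way down from it.
    falls = all(a >= b for a, b in zip(freqs[peak:], freqs[peak + 1:]))
    # The left side mirrors the right side, within a small tolerance.
    mirrored = all(
        abs(a - b) <= tolerance for a, b in zip(freqs, reversed(freqs))
    )
    # The hump has to stand above both ends, so a flat row of bars fails.
    humped = freqs[peak] > freqs[0] and freqs[peak] > freqs[-1]
    return rises and falls and mirrored and humped


# Made-up bar heights, one number per class.
mound = [1, 3, 6, 9, 6, 3, 1]   # little mountain in the middle
flat = [5, 6, 5, 6, 5, 6]       # bars all about the same height
slide = [1, 2, 4, 7, 10]        # low on one side, high on the other

for name, freqs in [("mound", mound), ("flat", flat), ("slide", slide)]:
    print(name, looks_mound_shaped(freqs))
```

With these made-up heights, only `mound` comes back `True`. Real data is messier than this, so the sketch is just a way to see what the squiggly line is picking up on.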
Okay, let's look at a different histogram. In this histogram, you'll notice that each of the bars, each of the frequencies, is almost the same, right? It's either five or six, and it doesn't matter what class we're talking about. When it's like that, the little line you draw across is not squiggly at all. It's straight. I don't see this very often in healthcare data, but it does happen more frequently in other kinds of data. This is called the uniform distribution, which makes sense, because almost all of these bars are a uniform height. So that's what a uniform distribution is.
Okay, now this one is kind of like the one we were looking at before, where it looks like a slide at a playground, where you climb up the right side and then you slide down to the left side. Whenever it's like that, where it's low on one side and high on the other, it's called skewed. The problem is, which way is it skewed? And how I remember which way to say it's skewed is that it's skewed toward the side where it's light, or short. So here, I would say it's light on the left, so it's skewed left, because on the left side the bars are all short. And then you can just imagine what's going to come next. Well, look at this. This is skewed right, because it's light on the right. It's short on the right. So technically, both of them are just skewed distributions. I just like to explain them separately, because sometimes people don't know whether to say left or right. And this is how I remember: light on the left, light on the right.
Finally, we have bimodal. Now, the word mode, in some areas of statistics and engineering and stuff, often means a high point. And bimodal means two high points. So as you can see, it looks like a camel with two humps. And it's a little hard sometimes to tell bimodal from normal.
Because let's say you have a normal distribution, but you just have one little bar kind of in the middle. You're like, is this bimodal or is this normal? How I coach people to see if it's bimodal is to ask if there's a really big space between the two humps. That's not so apparent on this image here. But you'll see class three and class four are both short. If only one of them were short, I might have called it a normal distribution. But I've really seen bimodal distributions when it comes to lab data, because my best friend is a pathologist, and he'll show me situations where people have really super high platelet counts, and then practically no platelets, and there's nothing in the middle. And that's where you'll see a bimodal distribution.
Now we're going to talk about outliers. Outliers are data values that are, quote, very different from other measurements of the data. What's very different, right? It's an opinion. But people in statistics come up with different formulas to try to figure out if something is very different from the other measurements. And we'll talk about that in later chapters in the class, not so much for identifying outliers, but just to better understand our distributions. But as a quick and dirty representation of an obvious outlier, one that nobody would disagree on, there's this histogram here. You'll notice I just threw down nine classes. I made up this data. But you'll see in class two and class three, there's just nothing. And there's nothing in class eight. And then suddenly there's something in class one and something in class nine. When you have these big gaps, this is kind of like that platelet thing I was telling you about, only maybe you would say this was trimodal, like there are three modes. But there aren't really three modes, right?
There's a wacky low one and a wacky high one, and everything else is in the middle. So because that one in class one and that one in class nine are so far away from what's in the middle, just about every statistician would agree that these are both outliers. But you can just imagine how much we argue about what actually is an outlier. It's especially hard when you're getting data on the weight of people. Some people really do weigh 400, 500, maybe even 600 pounds. You don't know if they're really outliers or data mistakes, or what to do with them. They're real people, and maybe they have really high weights, and unfortunately, some of them have really low weights too. So one of the main points of doing the histogram is not only to look for these distributions, but also to see if you've got any super obvious outliers that you're just going to have to think about before you proceed with your analysis.
Now I'm going to talk to you about what cumulative frequency means. You know, the word accumulate means to just keep gathering things. If you have a gutter on your house, it will accumulate leaves. Old leaves will sit there and new leaves will keep coming, and the old ones will still be there until it totally clogs your gutter and you have to clean it. So that's what cumulative frequency is: it accumulates all the frequencies. You see on the slide that in the first class, one to eight, we had a frequency of 14. So for your cumulative frequency, those are like the leaves at the very beginning of the season. All you've got is 14. But when you get to the next class, with 21, the cumulative frequency accumulates. You add that 21 to the 14, and now you've got 35. And you can extrapolate: as you walk up all these classes, eventually you get to the total, right? So that's what you've got.
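The gutter idea is just a running total, which a short Python sketch can show. The first two frequencies (14 and 21) are the ones from the slide described above. The remaining four are made-up placeholders so that there is something to accumulate.

```python
from itertools import accumulate

# Frequencies per class, lowest class first.
# 14 and 21 come from the lecture; the rest are made-up placeholders.
frequencies = [14, 21, 11, 6, 4, 4]

# Each cumulative frequency is the current class's frequency
# added to everything that came before it.
cumulative = list(accumulate(frequencies))

for freq, cum in zip(frequencies, cumulative):
    print(f"frequency {freq:>2}  cumulative {cum:>2}")
```

The first cumulative value is 14, the second is 14 + 21 = 35, and the last one equals the total of all the frequencies.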
And the cumulative frequency for the first class is always the same as its frequency, and each cumulative frequency is equal to or higher than the last one. I'll have to say, in health care, we don't really use cumulative frequency a whole lot. You'll see it, but we are really into relative frequency, I'll just tell you that. But some groups are into cumulative frequency, and those who are like to plot it in a plot called an ogive. And again, I'll be honest, in health care, I've never seen an ogive. This one was just in the scientific literature, which is why you'll see it's about NFL team salaries, because I think they use it a lot more in economics. But at any rate, what you'll see is that the classes are along the x axis. You're used to that, because that's what we do in a frequency histogram. But along the y axis, you see these numbers called cumulative frequency. And you just graph it, right? One of the things you'll notice is that it's going to go up. Each point is going to go up, unless you have a class with zero in it, in which case it's going to stay the same for that one. But otherwise, it's just going to keep going up. So you'll always see some sort of shape like this, where it's always going up, and at the end it hits the total cumulative frequency.
So just to review, these are our five main types of distributions used in statistics. And I emphasize main. There are other ones, but these are the ones we're going to look at. And that's why we were doing our histograms and our stem and leaf displays: we were looking for these distributions. And also, we were looking for outliers. And then finally, I just quickly did a shout out for your ogive and your cumulative frequency, so you know what's up with that. So in conclusion, the purpose of the histogram is to reveal the distribution, and the stem and leaf displays also reveal the distribution.
And then you look for outliers. You're probably wondering, well, why do we do all this work to reveal the distribution? Well, you'll find in later chapters that what kind of distribution you have determines what kind of statistics you can do. And, you know, I went kind of on and on about the normal distribution. Well, we all really like that in statistics. We're all really partial to it, because if you get a normal distribution, it allows you to do a whole bunch of different statistics pretty easily. However, what often happens in health care, because I've done it, is you get a skewed distribution, left skewed or right skewed. And then you have to make some decisions, which makes it a little harder. Also, I've had a bimodal distribution before. I'm remembering that one day. That was kind of an issue, and I had to figure that one out. So that's roughly why we have to go through this chapter and figure out how to do these distributions. And then later, I'll explain to you what you do with that knowledge.