Yoo-hoo! It's Monica Wahee again, your statistics lecturer from La Brea College. I decided to chop up chapter two and reconfigure it, so this first lecture is going to be on part of chapter 2.1, frequency tables, and the entire chapter 2.3, which is stem and leaf displays. So here are your learning objectives for this lecture. At the end of this lecture, you should be able to state the steps for making a frequency table and define class, upper class limit, and lower class limit. You should be able to explain what relative frequency is and why it's useful for comparing groups. Also, you should be able to state the steps for making a stem and leaf display. And finally, you should be able to describe the difference between an ordered and an unordered leaf. And if all that sounds foreign to you, don't worry, you'll understand it all by the end of this lecture. So just to introduce what I'm going to cover: first I'm going to define for you what a frequency table actually is, and then I'll explain how to make one, which will help you understand even better what it is. After that, I'm jumping right into what a stem and leaf display is and how to make one of those. And the main reason I combined these is that I feel like stem and leaf displays can help you make frequency tables. That connection was not really made in the book, so I'm making it here. So let's just start with the frequency table. What is one of those? Well, you know, when I think of frequency, I think of the radio, right? Like I think of R.E.M., "What's the Frequency, Kenneth?" I think that was their last hit. Okay, that's not what we're talking about. We're talking about frequency like the word frequently, like how frequently do you go to work per week, right? And you would count how many times you go to work or go to class per week. So frequency is like frequently: it's how frequently something happens. So first, I'm going to explain to you what a frequency table is and why you make them.
Then I'm going to define some more terms. I just defined frequency, and I'm going to define a few more that you're going to need to know. And then I'm going to explain the steps for making a frequency table and a relative frequency table. So remember quantitative data? I'll just remind you: qualitative data are categorical. So that's like gender, race, or diagnosis, where you put individuals into categories. And quantitative data are numerical, remember, like age, heart rate, and blood pressure. Now I just want to calibrate you to the idea that this whole frequency table thing is about quantitative data. And so this entire lecture actually focuses only on quantitative data and not qualitative data. Alrighty. So when you have quantitative data, as you've probably noticed if you've ever had it, you just have a pile of numbers. Like let's say you go on Yelp, you know, I always give that example, and you're trying to decide whether to go to a restaurant or not. You have a bunch of five, four, three, two, and one star ratings. So how do you organize them? I'm going to give you a totally fake example I made up. Okay, so I'm pretending that 60 patients were studied for the distance they needed to be transported in an ambulance. So that's how far they needed to be transported from where they called the ambulance and were picked up to where they actually got to the hospital. So the shortest transport in my fake data, or the minimum, was one mile, which is awesome. That's kind of what happens to me because I live right near a hospital. Hopefully I don't need to be in an ambulance very often. But that's what happens in urban centers. The longest transport, the maximum, was 47 miles, which would really suck. And I just want to point out that this happens to people in rural areas because of lack of access. So this is kind of realistic, even though it's fake data.
But anyway, it's hard to just look at a pile of numbers. So how do we understand these data? Well, now I'm going to start those definitions. The word class means an interval in the data. Remember, we're talking quantitative data. So let's say I just made one up: how many people got transported between 30 and 40 miles? Okay, that would be a class, 30 to 40, right? And the class limits are the lowest and highest values that can fit in a class. So carrying on with my example of a class, I just randomly picked 30 to 40. If we made that a class, we would say 30 would be the lower class limit, and 40 would be the upper class limit. Make sense? Alrighty. So then, of course, you have the width of the class, or the class width. That's how wide the class is. So carrying on with the example, if the upper class limit was 40 and the lower class limit was 30, what you do is subtract 30 from 40, which gives you 10, and then you add one, and it equals 11. That's the little formula. But if you're like me, and you count on your fingers, you would go 30, 31, 32, 33, 34, blah, blah, blah, and you'd realize that there are 11 numbers in there. Now we get to frequency, like I sort of quickly explained, and that is how many values from the data fall in the class. So how many patients were transported 30 to 40 miles? Another way of saying it is, if you look in all the data you have, and you find every single person that got a 30, 31, 32, 33, blah, blah, up to 40, and you count all those people up, then you will get the frequency for that class. Okay, but you probably realize you need to decide on classes before you go counting frequencies, because you need to know the lower and upper class limits. So let's talk about some rules about classes. First of all, classes have to be the same width. You can't have 30 to 40 and then 41 to 42, right? You can't have a skinny class and a fat class. They have to have the same width. But there are different ways to pick it, right?
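The class width and frequency definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the short list of distances is hypothetical, not the 60 fake values from the lecture.

```python
def class_width(lower_limit, upper_limit):
    """Number of whole values in a class, counting both class limits."""
    return upper_limit - lower_limit + 1


def class_frequency(values, lower_limit, upper_limit):
    """Count how many data values fall in the class, limits included."""
    return sum(1 for v in values if lower_limit <= v <= upper_limit)


# Hypothetical transport distances in miles (NOT the lecture's data).
sample_miles = [1, 12, 30, 31, 35, 40, 41, 47]

width = class_width(30, 40)                    # 40 - 30 + 1
freq = class_frequency(sample_miles, 30, 40)   # counts 30, 31, 35, and 40

print("class width:", width)
print("frequency of the 30 to 40 class:", freq)
```

The "add one" in the width formula is what makes both limits count, which matches counting 30, 31, 32, and so on up to 40 on your fingers.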
So class width can be determined empirically. Isn't that a fancy word? Empirically just means you choose it because you like it, right? And if you ever look at survey data about just about anything, when they look at the quantitative variable of age, they often put that in classes. And as you'll see on the slide, these are the classes we often see: 18 to 24, 25 to 34, 35 to 44, and you can go on, right? That's what you normally see, and that means empirically you just picked it out of the hat. And already you're probably noticing, well, 18 to 24 and 65 and older, those classes aren't really equal to the ones in the middle, right? Like, what's the upper class limit for 65 and older? Okay, well, that's just normally what happens in the world, and especially in healthcare. And in healthcare, when you pick classes, even though the classes are technically supposed to be the same width, you really should be guided by the scientific literature. And you'll see why later, when I show you the other videos in this chapter. It's because you really want to be able to compare whatever you find to whatever other people have found before you. And therefore, you don't want to cut up your classes in different ways, or it's hard to compare them. However, in the book, they teach this class width formula, so I thought I should really show you that too. So here's the class width formula that I don't really see used much in healthcare statistics, but I'm going to teach it to you anyway. So this is the formula. First, you calculate this number: you find the maximum in your data and the minimum in your data, and you subtract the minimum from the maximum. So in the example I was giving from the fake data about the transport, 47 was the maximum and the minimum was one. So I did the first step and got 46. Okay, looking back at the formula, you divide whatever you got there by the number of classes desired. In other words, however many, you know, categories you want, right?
You never want too many, like you don't want 10 or something. You know, three, four, five, six, seven, usually something in that range is a good number of classes. So let's pick six just for fun. So we'll take that 46 we got, divide it by six, and we get 7.7. Then, back to the formula slide, how you decide your class width is you increase this number to the next whole number. Now a lot of people are confused by that, because even if I got something low like 7.1, I'd still go up to eight. You have to increase it up to the next whole number, so you have an integer, you know, a number without any decimals after it. So you have this integer for your class width, and our class width in this example would be eight. So now I've described that whole class width formula to you, but I'm not going to use it in the example, because we don't really do that much in healthcare, and it actually makes things kind of hard to understand. You want something that's a little intuitive, like if you look on the slide right now, you know, less than 20 miles, 21 to 29, 30 to 39, and then 40 or more. That makes a little more sense in your head. You know, that's how we think of miles. If I had put like 18 to 24 and 25 to 29, you know, we don't really think that way. So it's helpful in healthcare to boil it down to something like this. And by the way, if I were writing a real paper with real data, I'd be looking at the papers before this that talked about transport times and looking at those class limits. Okay, so a frequency table displays each class along with the frequency, the number of data points in each class. As you can see, the class limits, you know, the classes, are on the left side of this simple frequency table, and the frequencies are on the right side, right? And you'll notice that they all add up to 60, because we measured 60 fake patients.
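Here is a small sketch of the class width formula from the book as described above: subtract the minimum from the maximum, divide by the number of classes, and go up to the next whole number. The function name is made up for illustration. One assumption to flag: `math.ceil` leaves an exact whole number unchanged, and the lecture does not say what to do in that case.

```python
import math


def class_width_from_range(minimum, maximum, num_classes):
    """Class width formula: (max - min) / number of classes, rounded UP.

    Assumption: an exact whole-number result is left as-is by math.ceil;
    the lecture only covers results with decimals, such as 7.1 or 7.7.
    """
    raw = (maximum - minimum) / num_classes
    return math.ceil(raw)


# Ambulance example from the lecture: min 1 mile, max 47 miles, 6 classes.
raw_width = (47 - 1) / 6                       # 46 / 6, about 7.7
width = class_width_from_range(1, 47, 6)       # rounded up to 8

print(f"raw width: {raw_width:.1f}, class width: {width}")
```

Note that ordinary rounding would not do the job here: something like 7.1 would round down to seven, but the rule says it still goes up to eight.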
And it's really good to do that little check, because you don't want to double count people and put them in two classes. They only get to be in one. So selecting arbitrary class limits can make the frequency table unbalanced. In other words, doing this empirical thing can make it sort of weird, because less than 20 is big, and 40 or more miles is big, and they're bigger than the other classes. So it kind of breaks the rules of class width. But not following the scientific literature can make your results non-comparable and can make the science less useful. And so that's why I sort of flail against the book with this class width formula thing. So I'm going to give you another example of a frequency table. Okay, this one is also healthcare-y. You know, glucose is measured in the blood and expressed in milligrams per 100 milliliters, right? So glucose is a sugar molecule, and it should be cleared from the blood, especially when fasting. So if you're not eating anything, you're not putting any glucose in you, and your body is supposed to be metabolizing what's there. The problem is, some people don't metabolize glucose very well. You know, that's what diabetes is. So you care about how much glucose is sitting around in people's blood. So blood glucose levels for a random sample of 70 women were recorded after a 12 hour fast. And this is what they got: the minimum was 45, the maximum was 109, and they picked six classes. So this is how they set up their class limits. And again, this is using the class width formula, and just to demonstrate, you know, it sort of comes out a little weird here. But then they got these frequencies. Okay. And this is again just another example, this time using the class width formula to get our six classes and to make sure that they covered everybody. Now you'll notice in this one, we start with the minimum, like 45 to 55, and we end with the maximum, which is up to 110. And that's really the clearest way to do it.
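To see how the glucose class limits can fall out of the class width formula, here is a minimal sketch. It assumes the first class starts at the minimum and each class starts one above the previous upper limit, which matches the 45 to 55, 56 to 66, and "up to 110" limits mentioned above. The function name is hypothetical.

```python
import math


def class_limits(minimum, maximum, num_classes):
    """Return (lower, upper) limits for equal-width classes.

    Assumes whole-number data, a first class that starts at the minimum,
    and a width rounded up to the next whole number.
    """
    width = math.ceil((maximum - minimum) / num_classes)
    limits = []
    lower = minimum
    for _ in range(num_classes):
        upper = lower + width - 1      # both limits belong to the class
        limits.append((lower, upper))
        lower = upper + 1              # next class starts right after
    return limits


# Glucose example from the lecture: min 45, max 109, six classes.
glucose_classes = class_limits(45, 109, 6)
for lower, upper in glucose_classes:
    print(f"{lower} to {upper}")

# Little check: the last class has to reach the maximum.
covers_maximum = glucose_classes[-1][1] >= 109
```

Because the width gets rounded up, the last class can run a little past the maximum, which is why the table ends at 110 even though the largest value was 109.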
It's just not typically done that way. If you read scientific literature in healthcare, you just don't see frequency tables labeled like that. And just to wrap up this part, make sure all of your data points are accounted for only once, in one of the classes. So whether you use a class width formula, or you use empirical or arbitrarily picked classes, every single data point only gets one vote. It can only be in one of the classes. And also, you don't want to leave any of the data points out. So you want to make sure that you account for all of them. And you need to make sure your classes cover all the data, right? In healthcare, when we do that thing with up to 20 and 65 and over and all that stuff, we cause that to happen. However, if you're going to use a class width formula, you really have to pay attention to where your minimum and your maximum are, because you want to make sure your classes cover all of your data points. And like I mentioned, make sure the total of the frequencies in your classes adds up to the total number of data points. It's just a little check to make sure you didn't do something wrong. Now I'm going to talk about what a relative frequency table is. And that builds on what you just learned about frequency. So we all know what our relatives are. They're like our family, right? We have relationships with them. And so what relative means is in relationship to the rest of the data. Okay, so in statistics, they often use this fancy f to stand for frequency. And as I've mentioned before, for the sample size, if you have a sample, they use a lowercase n. So what they use as the formula for relative frequency is f divided by n.
And if you're clever with math, you realize what that means: if you take the frequency of any of the classes, which is just a portion of the whole sample, and you divide it by the total sample, which is that n, you'll get the proportion of values that are in that class. It's not really that fancy. So relative frequency is something very useful to put in a frequency table. You'll see that I kind of crammed it in on the right side. This is the old frequency table I just showed you with glucose, but I crammed in this relative frequency column next to it. It's super easy to calculate. For example, for the first one, see, 45 to 55, the frequency is three. So what did I do? Pulled out the old calculator. Well, actually, I used Excel, and I did three divided by 70, because that was the total, and I got 0.04. And for those of you who don't really like proportions, you can do that thing where you move the decimal two places to the right and then put a percent sign. So that would be like 4% of those 70 people are in that first class. And then the same thing happened with the next one. I took, you know, the 56 to 66 class, I took seven divided by 70, and it came out to 0.10. And for those of you who are into percent, I'm really into it too, I like moving that decimal over. I think of it as 10%. But whatever. As you can see at the bottom, it all has to equal 1.0 if you like proportion land, or 100% if you're like me and you like percent land. In any case, this is all you have to do to make the relative frequency table: you just make another column and do all those calculations. It's super easy to calculate and it's very helpful. So why did we even do this? Because we had a pile of quantitative data, and it was really hard to organize, right? The first thing we had to do was select class widths, and I talked about the politics behind that. But ultimately, whatever you do, the lower and the upper class limits need to be determined and put in the first column of your frequency table.
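The relative frequency column described above is just f divided by n for each class, which can be sketched like this. Only the first two frequencies (3 and 7) and the total of 70 come from the lecture. The other four counts are illustrative placeholders chosen so everything totals 70, and the middle class limits assume the width-11 pattern continues.

```python
# Class labels: first, second, and last limits are from the lecture;
# the ones in between assume the same class width of 11.
classes = ["45-55", "56-66", "67-77", "78-88", "89-99", "100-110"]

# First two frequencies are from the lecture. The remaining four are
# PLACEHOLDERS picked only so the total comes to 70.
frequencies = [3, 7, 22, 26, 9, 3]

n = sum(frequencies)                         # sample size, lowercase n
relative = [f / n for f in frequencies]      # relative frequency = f / n

print(f"{'class':>8} {'f':>3} {'f/n':>5} {'%':>5}")
for label, f, rf in zip(classes, frequencies, relative):
    print(f"{label:>8} {f:>3} {rf:>5.2f} {rf:>5.0%}")

# Little check: the relative frequencies add up to 1.0 (or 100%).
total_relative = sum(relative)
print(f"{'total':>8} {n:>3} {total_relative:>5.2f}")
```

Putting groups of different sizes on the same proportion or percent scale like this is what makes relative frequencies handy for comparing them.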
Then in your second column, which is the frequencies, you count up how many are in each class and you fill it in. And then if you make that third column, you can do that dividing thing and get your relative frequencies. And that's great. That's how you build your frequency table. And as I go through future lectures, you'll see even more why you would make that table and how useful it can be. Given that you have quantitative data and it kind of gets all over the place, it's very helpful to organize it in that table. Now I'm going to move on to talk about the stem and leaf. And the reason why I picked talking about it now is that it's on the theme of organizing quantitative data. So I'm going to talk to you about what the stem and leaf plot actually is, here's just an example on the slide, and how you make one. And as for why you might make one of these, you'll find it feels a lot like making a frequency table. But why do you make these instead of a frequency table? That's just more food for thought. So first, one of the things that I got hung up on when I took biostatistics is I could not get over the fact that it was called a stem and leaf. So I had to understand that. So this is an example of a stem and leaf there. So why is it called a stem and leaf? Well, there's always the stem. See these corn stalks? I'm from Minnesota, I'm used to seeing them. You'll notice that there's a stem, right? Like this big corn stalk has the stem. That thing you see, that vertical line and a bunch of numbers on the left, that part of the stem and leaf plot is called the stem. And then leaves are added onto the stem as we tally up the data. That may not make much sense right now, but I'll show you how to make one. Essentially, what you end up doing is adding these leaves. Like you see under two, there's a little leaf that just has a zero on it.
But if you look under five, there's this big long leaf with a whole bunch of numbers off of it. So making one will help you understand this terminology. But I first wanted to just show you this picture, because it's actually kind of hard to understand what's going on with a stem and leaf unless you understand that the vertical line and the numbers to the left of it are considered the stem. And then each one of these things we build off of each of those numbers is called a leaf. So people talk about the four leaf and the five leaf. Alrighty. Okay, so again, I'm just so into making up data, right? So I decided to make up data from 42 patients who visited a primary care clinic and were referred to mental health. Now, the reason why I made up data on this subject is I'm very upset about the subject. I think people are waiting too long to get mental health treatment, especially if you've been following the news about the Veterans Administration in the US. A lot of people are put on hold even for primary care, you know, they're put on waiting lists, and I don't like it. So I made fake data about that as a demonstration, just to highlight these issues. Okay, so what data did I make up? I made up the number of days between the referral and their first mental health appointment. That was what was collected. So let's say you go in on January 1 and you get a referral, and then 10 days later you actually show up at the clinic. Then that would be 10, right? That would be your value. So that's quantitative. So let's take a look at it. On the right side of the slide, you see just this pile of numbers from all these people that came in and got a referral. So if you look, the first person had to wait a month, 30 days, to go see a mental health professional. But if you look at, you know, the third one in, that person only needed 12 days. So that's how you sort of consume this fake data I made. And then you'll see over on the left side, I already made a stem.
It's blank. It doesn't have any numbers on it. But I knew I needed that vertical line, so I just made that in preparation. Okay, so let's build our stem and leaf. So what we do is we start with the first number. And that's what's awesome about this: you just start with the first number. And if you want, you can kind of cross them out as you go along to keep track. So we start with this first number, 30. And you'll see what I did. I went over to the stem, and I put the three on the left side of the stem and the zero on the right. This begins the three leaf. Okay, here's the next number, 27. Now, I put the two above the three, because it's right before it. And you can kind of imagine, we're going to walk down like 2, 3, 4, 5, 6. And then I put the seven on the right side to start the two leaf. All right. Here we are with the next number, which is 12. And as you'll see, I started the one leaf. You're starting to see the pattern, right? And you can probably guess what's going to happen next. With 42, we start the four leaf and put the two there. Okay, for the next number, 35, we've already started the leaf, right? So what do we do there? Well, we just add the five onto the three leaf. The three leaf was already started with that 30 at the beginning, so we just pile a five on there. Here's 47, so we just pile a seven on there. Now you'll notice I tried to line up that seven on the four leaf with the five on the three leaf. When you're doing this by hand, well, even when you're not doing it by hand, you really have to keep those things lined up or you won't have a good stem and leaf. Okay. Now I'm going to fast forward a little, because you can probably imagine how to do the next row, the 38 and 36. You just keep piling them on. But I want to show you what happens when you get to this special case here. Okay, we'll go with this 29. This is the last thing before the special case.
So you'll notice that 38 got put in there, see that eight on the three leaf? And that 36 got put in there, you know, from the second row. See, we put everything in there. And now we put in the 29. Look at that, we've got a three after that. That's our next one. So where are we going to put that three? You might think on the three leaf, but that's not right, right? Because that's 30 something. So where do you put the three? Well, some of you figured this out: you have to add a zero onto your stem. So look at that, I put that zero there. And then we put the three in. And then you can already guess how to do the 21 next. We'll just tack a one onto the two leaf. But then when we get to the next one, a zero, we just add a zero onto the zero leaf. So you can probably figure out how to pile up all of these. But I did want to talk to you about something else that happens with these stem and leaves. As you go on adding to the leaves, you've got to be careful, because you might end up with a situation where you get something big. Now I really feel sorry for this fake person: 51 days for a mental health appointment. That's too long, right? But it causes us to have to add a five to the stem. Now this can cause real estate problems, especially on a piece of paper. You know, what if the four was right at the bottom of the paper, right? It's kind of hard. You maybe have to tape some paper at the bottom. I have this problem a lot. You'll see here, I even had to move this up on the slide when we got later to the 70 and had to add the seven leaf. Now I just want to show you, for some reason in this data set we didn't have any 60s, but you still have to put that six leaf placeholder in. That's got to be there. So as we go on, if we're missing any leaves in between, we still need the placeholder there, because that space has to be there. And here's an outlier. We're going to learn about outliers pretty soon. This is a really long time, 105 days.
This is kind of like VA status, right? And of course, this is fake data, but unfortunately it reflects real data. You'll see when we get to 105, not only did we skip the eight leaf and the nine leaf, and we need to leave a space for them, but 10 becomes the part on the stem. So if we went on to 200 or 300, I mean, that would be awful to wait that long, then the first two digits would go on the stem. Like if we had 365, the 36 of the 365 would be the part on the stem. Alright, so that was just a little demonstration to explain certain nuances of the stem and leaf that you might encounter in your life. So now I'm going to reflect back on the two ways that I've described in this lecture for you to organize quantitative data. First, I showed you how to make a frequency table. But what you need to do with that one is set up classes and class widths and count up frequencies. And there's a lot of pre-processing, a lot of pre-calculation. You really want to think when you're doing this, and you don't want to be distracted. However, if you're trying to do a stem and leaf, you really can do that on the fly. You don't need to set up classes or class widths. As you noticed, we just went down the line through that pile of numbers and crossed them off as we put them onto the stem and leaf. And there was really no need to count. You can tally the data as you go through the list, you know, cross it off. And it's just really quicker to do. Of course, those of you who are pretty clever are saying, well, basically, in a stem and leaf you're forcing everything to be in a class of, you know, the tens, right? You know, the 20s and the 30s and the 40s. That's like the two leaf, the three leaf, and the four leaf. And yeah, it is kind of like a simplified way of making those kinds of classes. But in any case, I just wanted to alert you to this because you might see some similarities between the two.
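The stem and leaf walk-through above can be sketched in Python. This is only an illustration with a made-up function name, and the list holds just the values mentioned aloud in the lecture, not the full set of 42 fake patients. The last digit is the leaf and everything in front of it is the stem, so 3 goes on the zero stem and 105 goes on the 10 stem.

```python
def stem_and_leaf(values):
    """Build an unordered stem and leaf display as {stem: [leaves]}.

    Leaves stay in the order the values were read, just like tallying
    down the list by hand and crossing numbers off as you go.
    """
    leaves_by_stem = {}
    for v in values:
        stem, leaf = divmod(v, 10)     # 105 -> stem 10, leaf 5
        leaves_by_stem.setdefault(stem, []).append(leaf)
    # Keep a placeholder for every stem between the smallest and the
    # largest, even if no value landed there (like the missing 60s).
    lowest, highest = min(leaves_by_stem), max(leaves_by_stem)
    return {s: leaves_by_stem.get(s, []) for s in range(lowest, highest + 1)}


# Only the wait times (in days) mentioned aloud in the lecture.
waits = [30, 27, 12, 42, 35, 47, 38, 36, 29, 3, 21, 0, 51, 70, 105]

display = stem_and_leaf(waits)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Sorting the input is deliberately left out here, because the point of the unordered version is that you can build it on the fly without any preparation.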
And I wanted to highlight those as well as the differences. Now I'm going to give you a few tricks here. I want to tell you about the concept of an unordered leaf. So an unordered leaf is what we were making before when I was demonstrating. It's just where the numbers are out of order in the leaf. Like you'll see this two leaf, it says 7, 7, 2, 9. Well, if they were in order, it would say 2, 7, 7, 9, right? The two would come first, before the sevens and the nine. And the same with the three leaf. That's out of order, because you can see that zero and five is fine, but eight doesn't come before six and five, right? It's no problem to make an unordered leaf. However, after making an unordered version, you can rewrite the stem and leaf in an ordered way. So you see how I did that: I rewrote the two leaf and the three leaf, and now all the leaves are in order. Okay, you know, they don't have to be, but you can do that. And if you do that, if you make your stem and leaf first unordered the way I was demonstrating, and then you rewrite it into ordered, it is way easier to count it up to make a frequency table, no matter what classes you choose. Or you can just make each leaf a class, and then it's super easy to make the frequency table. So that's why I combined these two pieces of the chapter together: I wanted to show you how you can use a stem and leaf to help you make a frequency table. So a stem and leaf is just another way to organize quantitative data. And it's easier to make on the fly than a frequency table, because it requires less preparation. And it can help you put data in order in preparation for a frequency table, sort of as a first step to make sure that you can organize everything. And at the end, remember, I keep emphasizing, your frequency table has to reflect all your data points, and they can only be in one class, blah, blah, blah.
Well, this is one way to make sure that happens: first do this pre-organization using an ordered stem and leaf. So in conclusion, frequency tables and stem and leaf displays organize data. They organize quantitative data. And the stem and leaf may help you make a frequency table, so you might want to start with that. And the purpose of both of these things is to reveal a thing called a distribution. And I'm going to explain that in the next lecture.
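As a closing sketch of the trick of going from an unordered stem and leaf to an ordered one and then to a frequency table, here is a short Python example. The two leaf (7, 7, 2, 9) is the one read out in the lecture. The three leaf is an assumed illustration of "zero and five, then eight before six and five", not values confirmed by the slide.

```python
# Unordered leaves. The two leaf comes from the lecture; the three leaf
# is an ASSUMED example of a leaf that is out of order.
unordered = {
    2: [7, 7, 2, 9],
    3: [0, 5, 8, 6, 5],
}

# Rewrite each leaf in order; the stems themselves do not change.
ordered = {stem: sorted(leaves) for stem, leaves in unordered.items()}

# Treat each leaf as a class of ten values (20 to 29, 30 to 39, ...).
# The frequency of a class is just how many digits sit on that leaf.
frequency_table = {
    f"{stem * 10} to {stem * 10 + 9}": len(leaves)
    for stem, leaves in ordered.items()
}

for stem, leaves in ordered.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
for label, f in frequency_table.items():
    print(f"{label}: {f}")

# Little check: every data point is counted exactly once.
n = sum(len(leaves) for leaves in unordered.values())
total = sum(frequency_table.values())
```

Making each leaf its own class is the quickest route. With other class limits you would count along the ordered leaves instead, which is still easier than working from the raw pile of numbers.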