Hello there, it's Monica Wahee, Library College statistics lecturer. We're going to circle back now to chapter 2.2 and talk about these other graphs. I'm doing things a little out of order because it makes sense to me, and I hope it makes sense to you too. For this lecture, we have these learning objectives. When you're done with this lecture, you should be able to describe a case in which a time series graph would be appropriate. You should be able to explain the difference between what would be graphed on a bar graph versus a time series graph. You should be able to describe the type of data graphed in a pie chart, and you should also be able to list two considerations to make when choosing what type of chart to develop. Alright, so let's get started. In this lecture, first I'm going to explain what a time series graph is. Then I'm going to talk about a bar graph. And of course, I'm going to show you roughly how to make these. I'm going to explain a pie chart and how to make that. And then I'm going to go over a review of all the graphs I've talked about for chapter two and summarize when to use what type of graph. So let's start with the time series graph. The word time is the key. We're going to talk about what a time series graph is and what time series data are. As you can see by this little example, time is across the x axis, and that's kind of a hint for where we're going. Then I'll show you roughly how to plot one, and I'll explain why we have these time series graphs: how you interpret them and why you even make them. Of course, I'm an epidemiologist. So what am I into? M&M: mortality and morbidity. So here's a nice time series graph, a wonderful graph of the percentage of visits for influenza-like illness reported by the US Outpatient Influenza-like Illness Surveillance Network, by surveillance week, from October 1, 2006 through May 1, 2010.
And you're like, oh, time. Yeah, that's the deal. Time series data are made of measurements of the same variable for the same individual, taken at intervals over a period of time. Only in the example here, the individual is not a person, right? Remember, individuals are just what you're measuring variables about. Here, the individuals are actually weeks, because every week they're making a measurement. So like I said, time series data are made of measurements of the same variable, which here is the percentage of visits for influenza-like illness. I don't know what clinics are in this outpatient influenza-like illness surveillance network, but let's just pretend there are 10 clinics in there. Each week, these clinics have to go in and say, yeah, I had, for example, 100 visits this week, and 10 of them were for influenza-like illness. So that would be 10% that week for that clinic. Well, they got all the clinics together and found out what the percents were. You can see on the y axis there's a percentage, and on the x axis you see all the weeks in the year. So you've seen these before, right? You especially see it with the stock market. You go on Yahoo and look at your favorite stock, right? You know, we're all so rich, we own so much stock. And so you track your favorite stock that way. Personally, I spend more time looking at mortality and morbidity, things like influenza. But hey, after I get some money, I'll be looking at stock market prices. So when we see time series data graphed in these time series graphs, it's often about things like influenza rates. You'll also see life expectancy and rates of heart attack. That's usually what we see, because we're trying to affect those rates and we're trying to see if they're going up or down. So I'm going to just roughly go through how you make one.
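Before the plotting steps, here's a rough sketch in Python of that weekly percentage arithmetic. The clinic names and counts are made up, and adding up the counts across clinics before dividing is my assumption; I don't know how the surveillance network actually combines its clinics.

```python
# Hypothetical reports for one week: (clinic, total visits, influenza-like illness visits).
# The clinic names and counts are made up for illustration.
reports = [
    ("Clinic A", 100, 10),
    ("Clinic B", 250, 20),
    ("Clinic C", 150, 20),
]

def percent_ili(total_visits, ili_visits):
    """Percentage of visits that were for influenza-like illness."""
    return 100 * ili_visits / total_visits

# Percentage for each clinic on its own.
per_clinic = {name: percent_ili(total, ili) for name, total, ili in reports}

# One pooled percentage for the week: add up the counts first, then divide.
# (Pooling the counts is an assumption, not the network's documented method.)
week_total = sum(total for _, total, _ in reports)
week_ili = sum(ili for _, _, ili in reports)
week_percent = percent_ili(week_total, week_ili)

print(per_clinic["Clinic A"])  # 10.0, the 10-out-of-100 example
print(week_percent)            # 10.0, the point plotted for this week
```

Each week contributes one number like `week_percent`, and that number is what goes on the y axis for that week.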
If you ever wanted to make one, the first thing you need is a table kind of like the one on the right. I just made up these data; they don't mean anything. But roughly what you need is a first column of regular time increments. In this case I put year; the influenza people put week. Then you put the variable measured at that time in the next column. So let's say it's today and you're like, oh, I want to measure how many times I went to the gym each week over the last few months. Well, you're going to have to reconstruct that data, maybe from your memory or your calendar. So normally, when you're going to do time series stuff, you start and you collect the data as you go along, and then it's nice and accurate. Okay, so let's say you did that and you managed to get some time series data together. Then how do you plot it? Well, the first thing you do, and I'm using this influenza thing as an example, is draw a horizontal line and make that your x axis. You gathered your data based on years or weeks or something, so you already know those time periods, and you just label them on that x axis. Then you draw the vertical line for your y axis. And again, you've done all your measurements, right? So if you were measuring how many times you went to the gym per week, and the most you'd go is once a day, then seven would be the maximum. So you want to make sure your y axis is tall enough to reach that seven in case you had a good week. That's really what you're looking for in the y axis. Look at the highest point in the influenza graph: in 2009 they had an outbreak, and they needed to make sure the y axis was tall enough to graph that. But other than that, you don't want it much taller. And then make sure you label it.
I'm big on labeling here, because otherwise people get confused. Okay, now we're going on to the next step, which is where you actually put in your data. Now, there were so many weeks here. If you look at 2007, that part of the x axis is only about two inches wide, and all 52 weeks of 2007 were plotted in there, so it looks like a super smooth line. But honestly, what they did was put each point in separately and then connect the dots. That's why it looks so smooth. If you only have a few points and a wider x axis, it'll be choppier. It'll look a little more like those stock market graphs that go up and down, up and down, and kind of look like a roller coaster. But if you have a lot of points and you mush them together, it ends up looking really smooth. I also wanted to point out that you can have more than one line on the graph for more than one set of data values. Here they're comparing, I don't know, some sort of book performance, how much it sold in the US versus Canada. If you do that, you just have to make sure you have a legend so people can tell the lines apart. So to summarize, time series graphs are useful for understanding trends over time, like whether things go up or down. You saw on that influenza chart that we could see when there apparently was kind of an epidemic or an outbreak. And graphing more than one set of time series data on one graph, like you saw in that last graph, can help in comparing the differences between the data sets. I worked for the US Army, and there are a lot of problems with people getting injured in the Army. So I made a lot of time series graphs of rates of injury over the years, because we were trying to do things to make the rates of injury go down. That way we could see whether the trend was there, whether we were actually making them go down.
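If you wanted to check a trend like that without drawing anything, here's a minimal sketch in Python. The yearly injury rates are made up, and the function names are just ones I picked for illustration. It puts the table in time order, checks that the time increments are regular, and then "connects the dots" by labeling each step as up, down, or flat.

```python
# Made-up yearly injury rates per 1,000 soldiers (illustration only),
# deliberately entered out of order.
rates = {2012: 48.0, 2010: 52.0, 2011: 50.5, 2013: 44.0}

def time_series_points(table):
    """Return (time, value) points sorted by time, ready to plot left to right."""
    points = sorted(table.items())
    times = [t for t, _ in points]
    gaps = {later - earlier for earlier, later in zip(times, times[1:])}
    if len(gaps) > 1:
        raise ValueError("time increments are not regular")
    return points

def segments(points):
    """Connect the dots: label each consecutive pair of points up, down, or flat."""
    labels = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        direction = "up" if v1 > v0 else "down" if v1 < v0 else "flat"
        labels.append((t0, t1, direction))
    return labels

points = time_series_points(rates)
for start, end, direction in segments(points):
    print(f"{start} to {end}: {direction}")
```

With these made-up numbers every step comes out as "down", which is the kind of trend we were hoping to see in the injury rates.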
So that's the main goal of these time series graphs. Now I'm going to move on to talk about a bar graph, which can display quantitative or qualitative data. I'm going to start with the features of the bar graph; here's an example on the right. I'm going to talk about how to make one. And then we're going to talk about what happens when you change the scale, meaning how tall the y axis is on a bar chart, because it really changes things. I call it a bar chart sometimes, or a bar graph; they're really the same thing. I don't know why they chose graph in the book. Finally, I want to do a little shout out to Pareto charts. We don't really use them much in health care, but I still wanted you to know about them. All right, so let's look at the features of a bar graph. The first thing you want to know is that the bars can be vertical or horizontal. So even though I'm showing you this vertical example, don't be thrown off if you see a horizontal example. Regardless of whether they're vertical or horizontal, the bars are supposed to have a uniform width and uniform spacing. Some can't be wider or skinnier than others, and they have to be spaced apart evenly. I'm going to use this big one here as an example to talk about bar graphs. I just want you to notice what is being graphed: this is the percentage of people in the US not covered by health insurance, split up by race and ethnicity, for the years 2008 through 2012. Not having coverage is bad, right? You want people to have health insurance. Okay, so item three here says the lengths of the bars represent either the variable's frequency or percentage of occurrence.
So I've circled percentage because that's what we're looking at in this one. We could instead have looked at, you know, number of visits at a health care clinic, and that would be frequency, but we happen to be looking at percentage here. I just wanted to call that out. You see then on the y axis we have the measurement scale. As long as we write it there and use that same measurement scale for graphing each of the bars, we will be fulfilling item four, which is that the same measurement scale is used for each bar. I don't know why anybody would do it any other way, but that's part of the features of the bar graph. Now, this next feature is really my pet peeve. I get so irritated when I find a bar graph or any other graph where things are not labeled; I get totally confused. So you really want to put on a title, and you need to put the bar labels on, at least on the x axis. See how it says white alone, black alone? You wouldn't even know what those bars were unless somebody put something there, right? Some people also add the actual values for each bar. I'll do that if there's space, like there was space here. If it gets too busy, I don't do that, because you can kind of see them from the graph. Now, you're probably kind of having a flashback. You're like, this looks totally like a histogram. What is the difference? Well, I started by talking to you about histograms. They're actually a special case of a bar graph. So bar graphs are more general, and a histogram is a specific type of bar graph. Histograms are bar graphs that must have classes of a quantitative variable on the x axis. So you can already see that the bar graph I'm showing you is not a histogram, because it has categorical, qualitative things on the x axis. It doesn't have classes, right?
Also, histograms must have frequency or relative frequency on the y axis. As you can see, this has percentage of something, so that's not that. So this isn't a histogram. But whenever you make a histogram, you're just making kind of a special bar graph. And I just wanted to point that out so you weren't confused. Now, I said I was going to warn you about what goes wrong when you change the scale. What I mean by changing the scale is this. Look at that y axis and notice that the top of it, the way this person made it, is at 35, or 35%. But notice that the highest racial group without health insurance, which is unfortunately those of Hispanic origin, is close to 30. It's not all the way up to 35. So I'm not exactly sure why they made it so high. I wanted to see how the shape of these bars would change if I actually made the top 30. So I regenerated this, and you'll see what happens. See, it's the same data. I just made the top be 30. And it's kind of subtle, but suddenly all the bars look bigger, right? So if I were some advocate running around saying this is terrible, you know, these people don't have insurance, I'd like to look at the one on the left more than the one on the right. But, you know, in a way that's a little misleading, right? It's the same data. So the differences between bars are a little more dramatic when we change the scale to be shorter. But let's go the other way, and this is where I see people do things a lot. See how the top of the y axis is 35 right now? Let's double that; let's just make it 70. And then let's see what happens. As you can see, the differences between the bars look small, right?
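To put rough numbers on that, here's a small sketch in Python. The percentages are made up to resemble the slide (I only know the Hispanic origin bar is close to 30), and rounding up to the next multiple of 5 is my assumption about where the tick marks fall. It works out how big the gap between two bars looks as a share of the y axis when the top is 30, 35, or 70.

```python
import math

# Made-up percentages of people without health insurance (illustration only).
uninsured = {
    "White alone": 13,
    "Black alone": 19,
    "Asian alone": 15,
    "Hispanic origin": 29,
}

def axis_top(values, step=5):
    """Lowest multiple of `step` that is at or above the tallest bar."""
    return step * math.ceil(max(values) / step)

def gap_on_page(high, low, top):
    """Difference between two bars as a fraction of the y-axis height."""
    return (high - low) / top

suggested_top = axis_top(uninsured.values())
print("suggested top:", suggested_top)  # 30

high = uninsured["Hispanic origin"]
low = uninsured["White alone"]
for top in (30, 35, 70):
    # Prints 0.53, 0.46, and 0.23: same data, smaller-looking gap as the axis grows.
    print(top, round(gap_on_page(high, low, top), 2))
```

The data never change. Only the axis top does, and doubling it from 35 to 70 cuts the visible gap in half.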
Like the difference between that big Hispanic origin one and the lower white alone and Asian alone ones doesn't look that big anymore. So my opponents would rather look at that graph. In fact, everything looks kind of small on that graph. They'd say, oh, there are no problems with insurance. And that's, you know, what people mean when they talk about lying with statistics, so to speak. These are the kinds of tricks people do to try and change how things appear. The best way to do it is kind of what I suggested: look at the next one up on the scale from your tallest bar, and use that as the top of your y axis. What I had to do with the Army was look at rate of knee injury and also rate of ankle injury. But knee injury was way more common. So if I wanted to compare the two, I always used the same scale, because otherwise people wouldn't be able to see that the ankle injury rate was really, really low compared to the knee injury rate, even though they're both important. With the taller y axis, the differences between the bars just look less dramatic. And also, the taller you make your y axis, the smaller everything in the bars looks. So you've got to be really careful. I don't think you would do that, but other people do when they're trying to make their points. So just watch out for that. Also, a term that was mentioned in the book is clustered bar graph. It's not that complicated. It just means more than one bar is graphed for each category. In the last one I did, it was just on one topic. Now look at this one on the right, and of course, I mixed it up a little and did the horizontal version. This is life expectancy at birth, and you'll see that there are three sets of bars, right? There's both sexes together, and there's a bunch of bars for that.
And you see the legend: Hispanic, non-Hispanic black, non-Hispanic white, and then they mix them all together as all races and origins. And then they also have separate sets of bars for male and female. So this would be clustered. And if you do that, you really need a legend so people can tell what's going on. You'll also notice that with life expectancy, it's good if it's high, right? You want to live to be 80, 90, 100. But look at the bottom of the slide where we have the x axis. If we started at zero and made it the full length, it would not even fit on the slide. So what they'll do is make these little hash marks or this little squiggle to indicate that they just skipped ahead. But like I said in the first part of this, if they skip ahead on the female one, they have to skip ahead on all of them, right? So everything is skipped ahead there, and this is a fair comparison. It's like we're fast forwarding through the movie up to about 50 and then looking at the differences there, because everything's the same up to that point. So that's just another thing about scale: notice whether it's clustered, whether you've got a legend, and also look for the squiggle. Okay, now I'm going to give a shout out to the Pareto chart. You probably already noticed we don't really use these much in health care, because this example is about causes of an engine overheating. Well, we don't do that a lot in health care. And you'll see I kind of slapped a label on the y axis, the word frequency. Okay, so remember how I was saying a histogram is a special bar chart, or bar graph? Well, a Pareto chart is a different kind of special bar graph. In that one, the height of the bar indicates the frequency of an event. If you look at these events here, damaged radiator core happened 31 times, right? And that happened more often than faulty fans, which only happened 20 times.
So what they do is figure out what happened the most, the second most, and so on down to the least, and they deliberately arrange the bars in order left to right according to decreasing height. It's a way of sort of zoning in on the most important problem you're finding. So it's really meant to graph frequencies of problems. I've actually only ever seen one Pareto chart in health care so far, and I really looked for one. It was about bad things that can happen in a nursing home. I remember the tallest bar was for falls, right, like people fall in a nursing home. And then there was a smaller bar for medication errors; that happens. I think the reason why we don't use these a lot in health care is this. Let's pretend that instead of damaged radiator core, this said 31 falls. Well, the first thing you'd probably ask is, how many people are in that nursing home, and how long did you collect data for, right? 31 falls sounds pretty bad. But if you have hundreds of people over 10 years and all you get is 31 falls, you're doing pretty well. So I would say the reason why we don't use Pareto charts a lot in health care is that they sort of leave out some important information about these serious events. And so we like to look at things in different ways. So just to summarize about bar graphs: bar graphs must be made following a few rules. I talked to you about how you have to keep the widths the same and how you have to label the axes so we know what you're talking about. Because you can visualize both quantitative and qualitative data using a bar chart, these labels become really important, as do scales, right? I showed you how you can change the scale and make things look different. So you want to be careful and be cognizant of that.
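Circling back to the Pareto chart for a second, here's a minimal sketch in Python of the ordering step, plus the denominator question I raised about the falls. Only the 31 and the 20 come from the slide. The other two causes, the nursing home sizes, and the function names are made up for illustration.

```python
# Frequencies of causes of an engine overheating.
causes = {
    "Faulty fans": 20,             # from the slide
    "Damaged radiator core": 31,   # from the slide
    "Loose fan belt": 8,           # made up
    "Faulty thermostat": 12,       # made up
}

def pareto_order(frequencies):
    """Arrange categories left to right by decreasing frequency."""
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

def rate_per_1000_person_years(events, people, years):
    """Put a raw count in context: events per 1,000 people per year."""
    return 1000 * events / (people * years)

for cause, count in pareto_order(causes):
    print(f"{cause:<24} {count}")

# The same 31 falls in two hypothetical nursing homes.
small_home = rate_per_1000_person_years(31, people=30, years=1)
large_home = rate_per_1000_person_years(31, people=300, years=10)
print(round(small_home, 1), round(large_home, 1))  # 1033.3 10.3
```

The bar for 31 falls would be the same height in both homes, which is exactly the information a Pareto chart leaves out.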
And also I did a shout out to Pareto charts, and I explained why I think they're not used that much in health care. Now we're going to jump into pie charts. You know, just the thought of a pie chart makes me hungry. Doesn't it make you hungry? So here's what a pie chart is. They're also called circle graphs. They're used with counts or frequencies of categories that are mutually exclusive. That sounds really fancy, but all it means is that every individual can only fall in one category. I'm going to give you the example on the right, which is actually from a real report you should probably read. It was a survey that was done by the Massachusetts Nursing Association, and they got 339 nurses to fill out the survey. One of the questions was, do you receive annual bloodborne pathogen training? Now the answer is only going to be yes or no. They can't say yes and no. That is what mutually exclusive means: you can only give one answer. So as you can see, 234 people said yes, which is good, and 105 said no, which is bad. I'm worried about that. These pie charts are often made in graphing programs because they're a little difficult to do by hand, and I'll explain why. And unlike Pareto charts, these are super common in health care, as you can see right there on the slide. So let's look at the features of a pie chart. I actually just made up this fake pie chart. I pretended I had a class where I gave a five point quiz, right? The reason I did that is I wanted to show you how to do it with a quantitative variable, because remember, the last one was yes or no, and that's qualitative. Those are the answers that the nurses could give to that survey question. Well, this is a different one. This is where I actually put fake students into classes by their points on this quiz: zero points, one to two points, three to four points, and five points, right?
So regardless of whether you're doing yes or no, you know, qualitative categories like that, or you're doing classes like this, every individual in your data must be in only one of the categories, only one of the classes. It's kind of like frequency tables and histograms: everybody gets one vote. And that's really important in a pie chart, even though it can be used with qualitative or quantitative variables, and you'll see later what I mean by that. So here is just a fake example I made of how you would make a pie chart out of a quantitative variable. I'm going to briefly go over how you would do this by hand, and I'm realizing I've never done this by hand. I always use Excel, as you can probably tell from that lovely purple color, which comes out of Excel. But if you were going to do it by hand, I guess you'd have to go buy one of those things in the lower left, which is a protractor, because that helps you measure the degrees of a circle. Remember, a whole circle has 360 degrees, right? I don't know if you remember all this from, like, trigonometry. And then a half circle would be 180 degrees. So that's how you figure out how big a piece of the pie you need, using this protractor. If you're going to make a pie chart by hand, you first have to make a table. You'll see we make tables constantly in statistics. I put class in the first column, because I was doing one that required classes because it's quantitative. If you were doing the one with the nurses saying yes or no, you would put category, and you'd just list yes and no, and then total. Then of course, next, you put the frequency. And I always put a total to add it up. My fake class apparently had 37 people in it, so I just want to make sure everything adds up. The next step will remind you of relative frequency. It's where you figure out the proportion of the circle that each class is going to take up, right?
So see the five points row? Seven people got five points. Well, if you divide 7 by 37, you're going to get 0.19. I like percents, so that's 19%. That tells you what proportion of the circle they get, right? And then finally, in the last column, remember how I was telling you the whole circle is 360 degrees? You take that proportion and multiply it by 360 to figure out how many degrees to make that slice of your circle, and that's why you need the protractor. That's also why I always use Excel for this, because then you don't have to worry about those things. All you would need for Excel is actually just the classes or the categories and the frequencies. Then if you use its automatic pie graph function, you can get all this other stuff out very quickly. So I just wanted to make a few notes about pie charts. The thing I'm coming back to is these mutually exclusive categories. I want you to imagine that I do a survey, right? And I ask the question, what is your favorite color? And I give some choices like red, green, blue, whatever. There's only going to be one answer from each person, right? Because you can only have one favorite. And that then is eligible to be used in a pie chart, because everybody gets one vote. But a lot of times I'll see people who ask a different survey question. They'll say, check off all of the colors you like. So if I get that, I'm like, oh, I love red, I like orange, I like green; I'm checking off a bunch. There are some people I know who don't really like color; they just wear gray and black. So they probably wouldn't check off anything. And then there are the people who just check off one or two. Well, as you can see, people can have multiple votes or no votes or whatever. And if you have that situation, like I was telling you, where people can say multiple things, you've got to go into bar graph land.
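Going back to the by-hand table for a moment, here's a rough sketch in Python of the same columns: class, frequency, proportion, and degrees. The total of 37 and the 7 students with five points come from my fake class. The counts for the other three classes are assumptions I filled in so that everything adds up to 37.

```python
# Quiz classes and frequencies. Only the 7 and the total of 37 are from the slide;
# the other three counts are assumed so the total comes out right.
quiz = [
    ("0 points", 2),
    ("1 to 2 points", 10),
    ("3 to 4 points", 18),
    ("5 points", 7),
]

total = sum(freq for _, freq in quiz)  # always add it up as a check

rows = []
for label, freq in quiz:
    proportion = freq / total     # share of the circle, like relative frequency
    degrees = proportion * 360    # a whole circle is 360 degrees
    rows.append((label, freq, proportion, degrees))

print(f"{'Class':<14} {'Freq':>4} {'Prop':>6} {'Deg':>6}")
for label, freq, proportion, degrees in rows:
    print(f"{label:<14} {freq:>4} {proportion:>6.2f} {degrees:>6.1f}")
print(f"{'Total':<14} {total:>4}")
```

The five points row works out to a proportion of about 0.19, and the degrees column is what you would measure off with the protractor.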
Okay, because a whole bunch of people can like red, a whole bunch of people can like green, a whole bunch of people can like blue, and you won't get a circle out of that. If everybody gives just one answer, so that everybody's in a mutually exclusive category, then you can use the pie chart. I also wanted to let you know that I find it more informative, and I think a lot of people do, to put the percentage on the actual chart rather than the frequency. Some people put both the frequency and the percentage, which is good. It's not so helpful to put just the frequency, as you see the nursing report did on the left. 234 seems like a lot, but what proportion of the circle is that? That's what you would want to know. Whereas if you look at mine on the right, you can see, for instance, that only 5% got zero points. That's a small amount, right? You know what 5% means. With the one on the left, it's hard to tell. It looks a little like two thirds, which would be about 67%, but we don't know what the percent is, right? So it's really helpful to have that percent. And always include a title and a legend, because if you're graphing a pie chart, you're going to have more than one category, and people are going to want to know what each color means. This looks so good, doesn't this look good? So pie charts are common in health care, and they graph mutually exclusive categories. You'll see them all the time. And like I said, they're easier to make using software. I use Excel. They can come out of other software, but I just like Excel, because you can really put fancy labels on, and you can do that squiggle thing. But choosing a graph requires some consideration. Whether you actually want to make a pie chart or a bar chart or whatever requires some thought.
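On that point about putting the percent on the chart, here's a tiny sketch in Python that builds labels with both the frequency and the percentage for the nurse survey. The counts are the ones from the report's slide. The label format and the function name are just my choices for illustration.

```python
# Counts from the survey question on annual bloodborne pathogen training.
answers = {"Yes": 234, "No": 105}
total = sum(answers.values())  # 339 nurses

def slice_label(category, count, total):
    """Label a pie slice with both the frequency and the rounded percentage."""
    return f"{category}: {count} ({100 * count / total:.0f}%)"

labels = [slice_label(category, count, total) for category, count in answers.items()]
print(labels)  # ['Yes: 234 (69%)', 'No: 105 (31%)']
```

So the yes slice is about 69%, a little more than the two thirds you might guess by eye.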
And also, regardless of the chart you make, you should follow these rules. You should always provide a title, even if it's just for your private use. Trust me, I've done this: I go back and I'm like, I don't even know what I graphed. So take your time, sit down, and write a little title so you remember what you graphed. Also label the axes. Again, you think you're going to remember, or maybe you think it's obvious and everybody in the audience is going to be able to tell. Don't leave anything to be assumed. Just be absolutely clear about what's on each axis. Always identify your units of measure. So whether you're talking about a rate per 10,000 people, or a percentage, or an average, or a frequency, it doesn't matter. Just make sure you're clear about what you're talking about. The units of measure usually end up on the y axis. The idea is to make the graph as clear as possible. Think about font size, and think about the number of items graphed. I've sometimes seen time series graphs where they put so many lines on there that I can't even see anything. Or they'll have these really tiny font sizes, or they'll just try to put too much on one graph, and it's hard to read. If you have trouble reading it, probably everybody else will too, so you want to modify it. So I just threw this on the right. Can you tell what's missing from this graph? It's really missing a lot of information. I mean, we don't even know what it's about. We can kind of guess it's a time series graph because of the time at the bottom. But what else, right? The person who made this really knew what they were talking about, but we don't. And you don't want that to happen to your graph. Okay, so here, what I'm going to do is review all the different graphs I've talked about in chapter two, and talk about the cases where each graph is useful.
So you can keep straight in your heads why we have all these graphs, right? First, there's the frequency histogram. Remember, that was only for quantitative data, and that's what you make when you want to see the distribution. Remember, the distribution is a shape. A frequency histogram is a particular type of bar graph that is meant for showing these distributions. I also showed you how to make a relative frequency histogram, which is almost the same thing, only it graphs the relative frequency instead of the frequency. That will also show you the distribution, because the pattern will be the same. But this one's specifically good for comparing to other data. So if you have two sets of data, maybe from two different locations or two different groups, then you want to use the relative frequency histogram, because then it's easier to compare distributions, right? I also showed you how to make a stem and leaf display. I explained what the stem is and what the leaves are. That's also for quantitative data, and that's also for when you want to see the distribution. It's also good for organizing the data. It's a little easier to make by hand than a histogram, because the histogram makes you make a frequency table first, and with a stem and leaf display you can kind of skip that step. So again, these first three were just about trying to take quantitative data and visualize it so you can look at distributions and also look for outliers. Next, we went into the time series graph. And that is really about time, right? That's for graphing a variable that changes over time and is measured at regular intervals, mainly to see trends. Is it going up? Is it going down? Was there an epidemic? That's what a time series graph is for. Then the bar graph. Now, this is the generic bar graph, not the specific histogram.
Like I described, the generic bar graph can be used for qualitative data or for quantitative data. It can be used for displaying frequency or percentage, and we went over some examples. Then I shouted out to the Pareto chart, which is a special bar graph, right? That special bar graph graphs frequencies of rare events in descending order, usually bad things, you know, rare bad things. And again, we don't really use this much in health care. Finally, I went over the pie graph. That's for mutually exclusive categories, quantitative or qualitative, and we use those a lot in health care. So in conclusion, in this particular lecture, I first went over time series graphs and explained how they show changes over time. Then I went over bar graphs and showed you how they can display quantitative and qualitative data, and how they can be up and down or horizontal. I showed you some different examples. And then we went through pie charts, which look at mutually exclusive categories and which I think are my favorite. Like, look at this pie; this makes me so hungry. But in the end, it's important to pick the right chart, because you want to have a useful visualization of your data. If you're trying to look at a distribution, choose the right kind of graph for that. If you want instead to look for trends over time, you've got to choose the right kind of graph for that. So I gave you some pointers on how to do that. And now my mouth is watering, so I'm going to go eat some pie.