Over the past few months, we've been looking at a special type of data called paired data, where we take an entity and measure it twice. Over these episodes, we've been looking at different countries' willingness or intention to receive the COVID-19 vaccine in August and October of 2020. We've used ggplot2 and related functions and packages to look at different ways to visualize paired data. We've looked at dumbbell charts, where we put each country on the y-axis and the actual percentages on the x-axis. We've also looked at slope charts, where we drew lines with the two months as discrete values on the x-axis and the continuous percentages on the y-axis to better see the change in people's intention, which was typically decreasing. In the last episode, we made a labeled scatter plot, where we put the percent intention from August on the x-axis and from October on the y-axis; countries that fell below the diagonal line were less willing to receive the vaccine in October than in August.
In this episode, we're going to look at yet another way to represent paired data. The title that we've been using on all of these charts has been "COVID-19 vaccination intent is decreasing globally". Well, we've shown it qualitatively, right? We show the downward trends, but we're asking our audience to make that mental calculation between the intention from August and October. Why don't we just plot the difference? That's exactly what we're going to do in today's episode. On the x-axis, we're going to plot the percentage point change in people's intention to receive the vaccine, and on the y-axis we'll plot the different countries. This will also allow us to highlight the differences between using a dot chart and a bar plot.
The main difference here is what do you do with countries that have zero change? Also, a bar seems like a lot of ink to draw that rectangle for not much gain in the amount of information.
I'm looking at the code in RStudio that I used to build the labeled scatter plot in the last episode. I've gone ahead and created a new branch for this episode that I'm calling dot chart, because we're going to convert the labeled scatter plot into a dot chart. Again, we load our libraries and our fonts, we create our data data frame, we create a legend to indicate that things above the line are increasing in intent and things below the line are decreasing in intent, and then we build out our plot and save it to a TIFF file. What we're going to do in today's episode is convert this to a dot chart, and we'll also look at it as a bar plot.
As I mentioned, we're asking our audience to do some work for us in all these different visualizations, which is to mentally calculate the difference between the percentages from August and October. Why don't we calculate that for them? If we look at the data data frame, we see that we have three columns: the country, August, and October. So I'll create a column in my mutate called change, and this will be October minus August. When I look at data, I see that of these 15 countries, most have negative signs. They're printed in red, indicating a decrease in intention to receive the COVID-19 vaccine.
Coming to our ggplot pipeline, to make this dot chart I'm going to put change on the x-axis and country on the y-axis. I don't need that label anymore, so I'll go ahead and remove it. I also don't need this geom_abline. That was the diagonal line that I used in the labeled scatter plot. And geom_point is going to stay.
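
As a rough sketch of the step just described, the change column and the bare dot chart might look like the code below. The column names (country, august, october) are assumed from the narration, and the five rows are placeholder values made up for illustration; they are not the survey numbers.

```r
library(tidyverse)

# Placeholder values for illustration only; NOT the real survey data.
# Column names are assumed from the narration (country, August, October).
data <- tibble(
  country = c("China", "USA", "India", "Canada", "South Africa"),
  august  = c(97, 67, 87, 76, 64),
  october = c(85, 64, 87, 76, 68)
) %>%
  # Percentage point change: negative values mean intent went down
  mutate(change = october - august)

# Change on the x-axis, country on the y-axis, one point per country
p <- ggplot(data, aes(x = change, y = country)) +
  geom_point()
```

Printing p in an R session draws the plot; at this stage the countries are still in alphabetical order.
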
I don't need these labels, so I'm going to go ahead and delete this. For coord_fixed, I was setting the x and y limits. These are not going to work for our data: the x-axis limits go from 50 to 100, and nobody increased their intention by more than 50 percentage points. Looking back down here at my data data frame, I see that the smallest change was minus 12 and the largest was about, I think, four. So I'm going to run the x-axis from minus 15 up to five. I'm going to remove my y limit, because I'm going to let ggplot figure that out for me. I also don't need this clip equals off. And now we have all the data. But I'm realizing that I also have coord_fixed on here rather than coord_cartesian, so I'll change that. Okay. Also, if I scroll down to the bottom here, I've got my legend code, and I'm going to go ahead and comment this out. Remember, that was putting the legend way up at around 95 or 100%.
We now have all of our data represented in the plotting window. However, our x- and y-axis labels aren't quite right. Let's go ahead and modify those. I will come up to my labs, and for y I'm going to put NULL. On my x-axis, I'll put "percentage point change in intention to receive COVID-19 vaccine between August and October 2020". Now that's a really long title to put on our x-axis, so I'm going to put a line break in here somewhere, right around there. And let's double check in our theme that axis.title.x is element_markdown, so that should get automatically converted. I don't need this axis.title.y, because I don't have a title on my y-axis. I can probably put that line break before "COVID-19" just to make it look a little bit more even, so we'll go ahead and do that. Good. The two lines of our x-axis title look about the same length, and I'm pretty happy with that.
The next thing I want to think about is, can we order our countries such that they have a little bit more of a meaningful order?
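
A minimal sketch of the coordinate and label changes described above, again on made-up placeholder values. One substitution to note: the video's theme uses element_markdown for the x-axis title, where a line break would be written as an HTML break tag; this sketch uses a plain newline in the string instead so that it runs with ggplot2 alone.

```r
library(tidyverse)

# Placeholder values for illustration only; NOT the real survey data.
data <- tibble(
  country = c("China", "USA", "India", "Canada", "South Africa"),
  august  = c(97, 67, 87, 76, 64),
  october = c(85, 64, 87, 76, 68)
) %>%
  mutate(change = october - august)

p <- ggplot(data, aes(x = change, y = country)) +
  geom_point() +
  # Only the x limits are set; ggplot works out the discrete y-axis itself
  coord_cartesian(xlim = c(-15, 5)) +
  labs(
    # Plain "\n" stands in for the markdown line break used in the video
    x = "Percentage point change in intention to receive\nCOVID-19 vaccine between August and October 2020",
    y = NULL
  )
```

Unlike coord_fixed, coord_cartesian does not force a fixed aspect ratio between the axes, which is why it suits a plot with a discrete y-axis.
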
Currently, they're in alphabetical order. Depending on your orientation, they're either alphabetical going from Australia up to USA, or reverse alphabetical from USA down to Australia. Let's go ahead and see if we can get it to plot the biggest decrease in intention at the top, so we basically want China at the top. Let's see what that looks like, and we might try it the other way, where China would be at the bottom.
Well, how do we do that? We can set the order of these countries on the y-axis by converting the country column in our data data frame to a factor that is ordered by the change in percentage points. So we can come right back up to our data data frame, and I will add a line to this mutate statement where we'll do country equals fct_reorder. Again, that's a great function that comes to us from forcats, which is part of the tidyverse, so it's already loaded and everything. We'll reorder country and we'll do it by the change column. And sure enough, now we see that we have China at the bottom and South Africa at the top. I would like to flip this because, again, the title is "decreasing globally", so the emphasis is on decreasing, and I think the thing at the top should be the country that decreases the most. So how do we do that? Well, we come back up to that fct_reorder, and I can put a negative sign in front of change. We now have China at the top and South Africa at the bottom. And just to check ourselves, we also have India and Canada right at about zero.
Something else I'm noticing about my title is that in the previous episode we added some padding around it, because we were using coord_fixed. Let's go ahead and remove some of that padding around the title. And let's maybe make the font just a little bit bigger so that we have "intent" at the right side of the title. We can come back up here and I'll do a bottom margin of 10. And we're going to remove the top margin.
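
The country reordering described above can be sketched as follows. The data are the same made-up placeholder values as before, not the survey numbers; the point is only how the sign on change controls which country ends up at the top.

```r
library(tidyverse)

# Placeholder values for illustration only; NOT the real survey data.
data <- tibble(
  country = c("China", "USA", "India", "Canada", "South Africa"),
  august  = c(97, 67, 87, 76, 64),
  october = c(85, 64, 87, 76, 68)
) %>%
  mutate(
    change  = october - august,
    # ggplot draws the first factor level at the bottom of a discrete y-axis,
    # so ordering by -change puts the largest decrease at the top
    country = fct_reorder(country, -change)
  )

p <- ggplot(data, aes(x = change, y = country)) +
  geom_point()

# The last level is the one drawn at the top of the plot
top_country <- tail(levels(data$country), 1)
```

Dropping the minus sign reverses the order, which is the first version tried in the episode.
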
And I'll make my size 28, which is what I think it was before. So now we have "COVID-19 vaccination intent" on the first line and "decreasing globally" on the second line. I kind of like that the typography of the title is working with us to help tell our story. And I think that the proportions between the title and the rest of the plot look pretty good.
So I think this is a good-looking figure as it is. But I think we can do more to help our audience understand what's going on here and to allow them to better see the representation of the data. I would like us to put a vertical line here at zero to indicate where the zero mark is. Then maybe we could bring back our legend and show that things to the left of that line are countries that are decreasing in intent and things to the right are increasing in intent. Something else we could do is put a hairline grid line for each country so it's easier to connect each point with its country. When we get to these three or five countries at the bottom, it becomes a little bit more challenging to scan back and forth to know what country corresponds to what point.
Coming back up to the top of our ggplot pipeline, you'll notice that I commented out geom_abline. Well, geom_abline is a really nice way to annotate a figure with a line, in that case a sloped line. There are two other geoms that we will use to add our vertical line as well as our horizontal lines. Let's do the vertical line first. We'll do geom_vline, and a vline is a vertical line, right? Here we can say xintercept equals zero. Let's look at what this looks like. Very good. We have a solid black line going up at zero, which is great. I want to go ahead and make that a thin line and have it be gray, so it's not so pronounced. It's not really part of the data. It's there to help us interpret the data. So I'll do size equals 0.25.
And then for my color, I'll use the same #AAAAAA, six As, which is a nice light gray color. Very good. We now have that thin gray line indicating where the zero mark is, the break-even point for countries having the same intention in August and October.
The next thing I want to do is add those horizontal grid lines to indicate what country corresponds to what point. Of course, we could add these grid lines using the grid arguments to the theme function. But I want to show you how we can do this with a geom. So we'll come back up here to where we have our geom_vline, and I'm going to add a geom_hline. Here we use yintercept, and what I want to do is map my country to the yintercept. Because I'm mapping a column from my data frame to the yintercept aesthetic, I need to wrap this in the aes function. If I was only using one intercept, like I did up here for geom_vline, I could give that as a direct argument without needing the aes function. But again, because we're mapping a column's worth of data to an aesthetic, we need to use aes. I'll also use size, and let's make it a little bit thinner, so I'll do 0.1. And I'll use that same gray color.
So I think we're gaining on it. But I think that horizontal grid line could still be a bit thinner. I'm going to go ahead and shrink it further, maybe make it 0.05. Yeah, I think that decrease in the size to 0.05 really cut into the intensity of that line. And so now it's a lot easier. I'm noticing my eyes aren't straining so much to go back and forth between the country name and the point that lies on that line. And that vertical line for zero is now a bit more pronounced, which I think is important to help draw attention to the two countries that didn't have any change.
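
As a sketch of the two reference-line layers just described: a single fixed xintercept is passed directly to geom_vline, while geom_hline gets its yintercept through aes because a whole column is being mapped. The data are made-up placeholder values, and the color "#AAAAAA" is an assumption based on the "six As" in the narration. The video passes size to these geoms; ggplot2 3.4.0 and later prefer linewidth for line geoms, so that name is used here.

```r
library(tidyverse)

# Placeholder values for illustration only; NOT the real survey data.
data <- tibble(
  country = c("China", "USA", "India", "Canada", "South Africa"),
  august  = c(97, 67, 87, 76, 64),
  october = c(85, 64, 87, 76, 68)
) %>%
  mutate(change  = october - august,
         country = fct_reorder(country, -change))

p <- ggplot(data, aes(x = change, y = country)) +
  # One fixed value: given directly, no aes() needed
  geom_vline(xintercept = 0, linewidth = 0.25, color = "#AAAAAA") +
  # One line per country: a column is mapped, so it must go inside aes()
  geom_hline(aes(yintercept = country), linewidth = 0.05, color = "#AAAAAA") +
  # Points are added last so they are drawn on top of the guide lines
  geom_point() +
  coord_cartesian(xlim = c(-15, 5))
```

Layer order matters here: the line geoms come before geom_point so that the hairlines sit behind the data.
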
And those to the right had an increase in intention, while those to the left had a decrease in intention.
I'd like to bring back our legend now to make it clear to our audience what the points on the left side of the line mean versus those on the right side. You'll recall that way back up here we had this legend data frame, and we gave it our x and y coordinate positions. For the y, I'm going to put it at 14. The country labels are placed at one-unit increments, so 1, 2, 3, 4, up to 15, I guess, so if we use 14 the legend will be one down from China at the top. For x, let's go ahead and do minus three and three. Actually, looking at the values I have here, decreasing is on the right, so let's make that minus three, and we'll make increasing positive three. We'll make sure this is loaded, and then we'll come down and uncomment the geom_text code that we used to build out that legend. Those are more or less in the right place.
We notice that our labels are left justified on the point that we gave them. So instead of hjust equals zero, I'm going to make this a vector. Zero is left justified and one is right justified. We had increasing and then decreasing, so increasing needs to be right justified and decreasing needs to be left justified. I like the symmetry of the justification around that vertical line. I'm noticing that increasing intention sits a little bit further from that line than decreasing intention does, so I'm going to come back up to my legend data frame and instead of three let's make it 2.75. Again, there's always this poking and prodding to get things lined up right, and I think that looks pretty good. I think the spacing between decreasing intention and increasing intention is good now.
The thing I don't like about these labels is that we've got that grid line for Australia running through them. By now you should know the trick that we can use to hide that grid
line going through our label, and that is to use a label instead of geom_text, so here we can use geom_label. Of course, now we have the border around the label as well as the padding. One thing I'm not quite sure of is why there's this extra padding to the right, or I guess to the left, depending on your justification. Whatever, we'll go ahead and get rid of it all. We can do label.padding of 0.2, and we'll do label.size equals zero. Now it's complaining about as.unit(e2), and I'm not sure which one of these it's complaining about. Let me comment one out and we'll figure out if we got the right one. Nope, it's complaining about label.padding, and I think geom_label wants us to use the unit function, so unit with 0.2 and then pt for points. And 0.2 isn't quite what we want, so let's go ahead and use a two-point pad. With label.size we should be okay. Good. That got rid of our border and gave us really nice padding that's now symmetric around our text. I think that's not so intrusive a legend, and hopefully it makes it clearer to our audience what things mean on the left side of the vertical line versus the right side.
I think this is a really attractive way to represent the data. This is what's called a Cleveland dot chart, or a Cleveland dot plot, depending on what you want to call it. There's also something called a Wilkinson dot plot, which is more of a histogram, where the dots stack on top of each other. So let's look at this: that's a Wilkinson dot plot, and this is a Cleveland dot plot or Cleveland dot chart. I never quite know when you're supposed to use plot or chart. It probably doesn't really matter.
The alternative to this, however, would be to use a bar plot, so let's see how we can modify this to make it a bar plot. Well, it's really straightforward, actually. We can comment out geom_point and instead do geom_col. So this is the same data represented as a bar plot. Again, this
has a couple of problems. Yes, the legend is in the way of Australia. Perhaps we could move it down to India or Canada, but moving it down here highlights another problem with this data, which is that there's no bar there for India or Canada. You can't have a bar for zero. And so I'm left wondering as an audience member: did India or Canada participate in the survey? Did they not have a change? What's going on here? There's no marker to indicate the level of change, and so I'm a little bit confused. That's a problem with this particular data set and how we can visualize the data.
The other challenge is something that's called the data-to-ink ratio: you want more data and less ink. Here we have a whole lot of ink, and when you think ink, think about how much ink we would use if we were to print this out. Well, China is using a ton of ink to represent one number. We could replace these rectangles for the bars with an individual point, which of course is what we already did with the Cleveland dot chart. So I'm going to turn it back, and that's ultimately why I'd prefer the dot chart to this bar plot. The two reasons are how we represent countries with zero change and how we maximize our data-to-ink ratio using the dot chart over the bar plot. Again, I can turn it back very easily, and I think this is the fun of ggplot: it's easy to do these types of experiments without having to change a whole lot of code.
So I really like this dot chart representation of our paired data. Yes, the dumbbell chart and the slope plot were nice because we showed the actual values, but we forced our audience to mentally calculate the difference to fully appreciate the data. Perhaps with the slope plot showing that line going downward they could get a sense of the change, and with the scatter plot, if things were below that diagonal line they could also see the decreasing intent, but they didn't really get a good sense of the
actual change in intent, whereas here we're calculating the value and we're plotting that value.
There are so many trade-offs in data visualization that every figure has limitations. What are the limitations of this representation of the data? Well, first of all, China looks really bad here. And yes, it had the biggest fall-off in terms of the percentage of people that wanted to receive the vaccine, but what is missing from this is that China had more people in October willing to receive the vaccine than the US did, or pretty much any other country except for one or two. So it looks bad. At the same time, France looks like it's kind of in the middle of the pack, when we know that France was at the bottom of all the countries in wanting to receive the vaccine. We also say, well, India didn't really change a whole lot. Well, India was kind of up at the top, and so maybe there's not that much room to move up.
There are some things you could do to get around that. Perhaps instead of looking at percentage point change we could look at percent change, you know, month over month, and that would give you a sense of how things are changing. One other thing you might think about doing is sizing the point or coloring the point to indicate the percent of people in, say, October that were willing to receive the vaccine. That way you'd have both the absolute percentage of people willing to receive it as well as the change. Ultimately I don't think that works so well. Again, humans aren't really good at interpreting color on a gradient, nor are they good at comparing the relative area of different symbols.
So I think this works well. Something I've been harping on is to make your title match your figure. If we're looking at intent decreasing, then show the decrease, and I think this figure actually does a really nice job of highlighting that decrease in intent. Please keep practicing with this. I'll put a link up here to the playlist
that we've been developing over the past couple of months on paired data. I encourage you to go back and look at the history of this series of videos and how we've built out these different figures. I think you'll learn a whole lot about ggplot2 and other ways of visualizing data by going through them.
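
For reference, the last two code changes from the episode, the geom_label legend and the swap from geom_point to geom_col, might be sketched as below. This is an illustration under assumptions rather than the code from the video: the data are made-up placeholder values, the legend data frame's column names and label wording are hypothetical, and the legend's y position is adapted to this five-country example (the episode uses 14 with 15 countries).

```r
library(tidyverse)

# Placeholder values for illustration only; NOT the real survey data.
data <- tibble(
  country = c("China", "USA", "India", "Canada", "South Africa"),
  august  = c(97, 67, 87, 76, 64),
  october = c(85, 64, 87, 76, 68)
) %>%
  mutate(change  = october - august,
         country = fct_reorder(country, -change))

# Hypothetical legend data frame: increasing first, then decreasing
legend <- tibble(
  x     = c(2.75, -3),
  y     = c(4, 4),  # one row below the top country in this small example
  label = c("Increasing intent", "Decreasing intent")
)

legend_layer <- geom_label(
  data = legend, aes(x = x, y = y, label = label),
  hjust = c(1, 0),                 # right-justify increasing, left-justify decreasing
  label.padding = unit(2, "pt"),   # padding must be a unit, not a bare number
  label.size = 0,                  # no border around the label
  inherit.aes = FALSE
)

base <- ggplot(data, aes(x = change, y = country)) +
  geom_vline(xintercept = 0, linewidth = 0.25, color = "#AAAAAA") +
  coord_cartesian(xlim = c(-15, 5))

dots <- base + geom_point() + legend_layer  # Cleveland dot chart
bars <- base + geom_col() + legend_layer    # same data as a bar plot

# Countries with zero change get a zero-width bar, so nothing is drawn for them
bar_data <- layer_data(bars, 2)
n_invisible_bars <- sum(bar_data$xmax - bar_data$xmin == 0)
```

Because only one layer differs between dots and bars, switching between the two representations is a one-line change, which is what makes the comparison in the episode so quick.
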