When our audience looks at our figure for the very first time, one of the first things they notice is the relative position of the data points along the x and y axes, so the data we choose to represent on those axes is critically important. In recent episodes of Code Club, I've been building a series of versions of a slope plot, where the x axis holds categorical data, say two different months, and the y axis holds continuous data, say the percentage of people willing to receive a future COVID-19 vaccine. Today I want to experiment: instead of putting those categorical months on the x axis, what happens if we put one response there, say whether people surveyed in August 2020 said they would receive a future vaccine, and on the y axis put another continuous variable, say people's willingness to receive the COVID-19 vaccine as they reported it in October, two months after they were first surveyed? That's exactly what we're going to do today. Now, because a plain scatter plot is a little bit boring, we're going to spice it up by putting country labels next to the data so that we can see which country each point corresponds to. Let's dig into RStudio and get going on creating a labeled scatter plot in this episode of Code Club. You'll notice in my Git tab that I've already gone back into my commit history and created a special branch off of my highlight branch called labeled scatter. It starts from a commit made after we added the labels and cleaned up our x and y axes a bit. I've also renamed my R script and TIFF file to remove the _slope suffix, because we're making a scatter plot rather than a slope plot. I guess I could have renamed them with _scatter, but I want to keep things as clean as possible. We're loading three libraries right off the bat: tidyverse, showtext, and ggtext. tidyverse gets us packages like dplyr and ggplot2.
showtext allows me to customize the fonts in my figure. I've really grown to like Patua One for the title and Montserrat for all the other text in the figure. ggtext allows us to incorporate Markdown, or HTML and CSS, into the text of our figure. We'll maybe talk about that later. Then we read in the data and do some cleaning up to make it tidy. In the chunk of code where we generate the figure, we output the slope plot using geom_line(), so one of the first things we'll do is remove that. Then we output the figure as a TIFF file, and here again I need to remove that _slope from the file name. Again, this is the figure we're starting out with. One of the challenges we've had is how to represent two time points from 15 different countries. As I said, we're going to convert this slope plot into a scatter plot with August on the x axis and October on the y axis, and each country will be a different point. If we look at data, the data frame that's being fed into the pipeline that builds the plot, you'll notice that it's tidy for the purposes of building a slope plot: country identifies each line and its color, month gives the position on the x axis, and percent gives the position on the y axis. In contrast, what I'd rather have is a column for August and a column for October instead of separate columns for month and percent. I can get that by running everything but the pivot_longer() line. If I remove that pivot_longer() line, you'll notice I have columns for country, August, and October, so I could easily map August to x, October to y, and country to color or some other aesthetic. So I'll go ahead and remove pivot_longer(), reload data, and confirm that data looks the way we expect it to. That's great.
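The reshaping idea can be sketched with tidyr. This is a minimal example rather than the episode's actual script: the three countries and their percentages are made-up placeholder values, and the names `wide`, `long`, and `back_to_wide` are hypothetical. It shows that `pivot_longer()` produces the month/percent layout a slope plot wants, while the one-column-per-month layout is what a scatter plot wants.

```r
library(tibble)
library(tidyr)

# Made-up placeholder values, NOT the survey numbers used in the episode
wide <- tibble(
  country = c("Canada", "India", "Japan"),
  August  = c(76, 87, 75),
  October = c(68, 87, 69)
)

# Slope-plot layout: one row per country per month
long <- pivot_longer(wide, cols = c(August, October),
                     names_to = "month", values_to = "percent")

# Scatter-plot layout: one column per month (same shape as skipping pivot_longer())
back_to_wide <- pivot_wider(long, names_from = month, values_from = percent)
```

In the long form, `month` would map to x and `percent` to y; in the wide form, `August` and `October` can be mapped directly to x and y.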
Turning to my ggplot pipeline, I now need to make a series of modifications to turn my slope plot into a scatter plot. For x I'll put August, and for y I'll put October. I don't need the group aesthetic because I'm not using lines. I'll leave the mapping of country to color for now. I also need to change geom_line() to geom_point() so that I get a scatter plot. Then I need to change my x and y axis titles. The y title will be "Percent willing to receive the vaccine in October 2020", and I'll use the same title for x, except that it will say August. Because it's a really long title, I'll put a line break between "receive" and "the vaccine" using the HTML <br> tag. That reminds me that down in my theme() call, I need to set axis.title.x = element_markdown() and axis.title.y = element_markdown(). This looks good as a start, but one thing I'm struggling with is giving context for where the points lie. I notice that the y limits are a bit different from the x limits, so a point at 60, 60, which should fall on the diagonal, appears above the diagonal I might intuitively draw across the plotting panel. What I'd like is for the x limits and the y limits to be the same. Then we could also think about drawing a diagonal line: if points are below that line, intention dropped between August and October, and if they're above it, intention increased. To start improving the appearance of this figure, I'll remove color = country so that all of my points will be black. To give my x and y axes the same range, I'll use coord_cartesian() with xlim from 50 to 100 and ylim from 50 to 100 as well. We no longer have a legend because all of our points are the same color.
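A rough sketch of the plot at this stage, assuming a small made-up data frame named `vaccine` (hypothetical values, with columns `country`, `August`, and `October`) in place of the episode's data:

```r
library(ggplot2)
library(ggtext)

# Made-up placeholder values, NOT the survey numbers used in the episode
vaccine <- data.frame(
  country = c("Canada", "India", "Japan", "UK", "USA"),
  August  = c(76, 87, 75, 85, 67),
  October = c(68, 87, 69, 79, 64)
)

p <- ggplot(vaccine, aes(x = August, y = October)) +
  geom_point() +
  labs(
    x = "Percent willing to receive<br>the vaccine in August 2020",
    y = "Percent willing to receive<br>the vaccine in October 2020"
  ) +
  # Same range on both axes so the two months are directly comparable
  coord_cartesian(xlim = c(50, 100), ylim = c(50, 100)) +
  theme(
    # element_markdown() (from ggtext) renders <br> as a line break
    axis.title.x = element_markdown(),
    axis.title.y = element_markdown()
  )
```

With the default `element_text()` for the axis titles, the `<br>` tag would be printed literally instead of starting a new line.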
Our x and y axes now go from 50 to 100, but you'll notice that the plotting window is not square. I can see this in my preview software: if I highlight the whole plot with a rectangular selection, the x direction is about 1278 pixels and the y direction is 935. I would like it to be, say, 935 by 935, so that it's square and the distance from 50 to 100 is the same on both axes. I can achieve that by changing coord_cartesian() to coord_fixed(). coord_fixed() lets you set the aspect ratio of your axes, that is, how long one unit on the y axis is relative to one unit on the x axis, and the default ratio is 1. Now our plotting window is square like we wanted. Unfortunately, the formatting of our title got messed up: there isn't enough margin between the title and the plotting window, and the top half of the first line of the title is getting truncated. So we'll add some margin, and we might also make the font a little bigger or smaller. I'll set the bottom margin to 20 and the top to 10, and maybe I'll make the font size 25. Before we start playing with the labeling of our points, I want to make our plotting window a little bit cleaner. I want to add lines for the x and y axes, remove the tick marks, and get rid of the gray, gridlined background. We can do all of that in the theme() section of our figure. We'll set panel.background = element_rect(fill = "#FFFFFF"), which is white: six F's in hexadecimal. We'll also set axis.ticks = element_blank() and axis.line = element_line(). I think this looks a lot simpler and more attractive than what we had with the default theme, which is theme_gray().
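A sketch of the square panel and the cleaner theme, again using a hypothetical `vaccine` data frame with made-up values and a placeholder title. The coord is stored in a variable (`square`, a name introduced here) only so its settings are easy to inspect.

```r
library(ggplot2)

# Made-up placeholder values, NOT the survey numbers used in the episode
vaccine <- data.frame(
  country = c("Canada", "India", "Japan", "UK", "USA"),
  August  = c(76, 87, 75, 85, 67),
  October = c(68, 87, 69, 79, 64)
)

# ratio = 1 (the default): one unit on y is drawn as long as one unit on x,
# so with both axes spanning 50 to 100 the panel comes out square
square <- coord_fixed(ratio = 1, xlim = c(50, 100), ylim = c(50, 100))

p <- ggplot(vaccine, aes(x = August, y = October)) +
  geom_point() +
  labs(title = "Placeholder title") +
  square +
  theme(
    # margin() takes t, r, b, l; the unit defaults to points
    plot.title = element_text(size = 25, margin = margin(t = 10, b = 20)),
    panel.background = element_rect(fill = "#FFFFFF"),  # white panel
    axis.ticks = element_blank(),                       # no tick marks
    axis.line = element_line()                          # draw x and y axis lines
  )
```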
Next I want to put in a diagonal line from 50, 50 up to 100, 100 so that my audience can get a better sense of whether points fall below or above the diagonal. I can do that by coming way back up to the top of my ggplot chunk and, before geom_point(), adding geom_abline(). geom_abline() draws a line with a given slope and intercept, so my slope will be 1 and my intercept will be 0. For the color, I want a light gray, so I'll use six A's, #AAAAAA. I also want it to be a thin line, so I'll use size = 0.25, because I don't want it to detract from the points; I want the points to really stick out. Now we can see this thin gray line, and hopefully the audience can see that points below the line are countries with decreasing intention, and points above the line, up and to the left, are countries with increasing intention to receive the vaccine. You know what, I might add a little legend to indicate that if you're above and to the left of the line, you're increasing, and if you're below and to the right of the line, you're decreasing in intention. How can we do that? We can create a separate data frame that I'll call legend. It will be a table with a vector of x positions, a vector of y positions, and a column of labels. There will be two labels: increasing intention and decreasing intention. I'll put a line break in each using \n, as another way to demonstrate how you can add line breaks. For the increasing intention label, which goes above the line near the top of the plot, I'll set the x position to 90 and the y position to 100. For the decreasing intention label, I'll set the x position to 100.
And then we'll use 90 for its y position. We'll load that as legend. Then down at the bottom, I'm going to add geom_text() with data = legend and mapping = aes(x = x, y = y, label = label). We'll also set inherit.aes = FALSE, since the legend data frame doesn't have the August and October columns that the main plot maps to x and y. So those labels are there, but they've got issues. We need to decrease the size of the labels, decrease the line height, perhaps left-justify them, and make them the same light gray as our line. For the color, I'll use the same hexadecimal code as the line. I'll set hjust to 0 so the text is left-justified, lineheight to 1, and size to 3 to start. I notice that my decreasing intention label is getting chopped off by the right side of the plot. It's also a little too far down, so I want to bring it up. And I'll turn off the clipping in coord_fixed(), because that clipping is removing the right half of that label. So, back up in legend, the y position for decreasing should be 95, and in coord_fixed() we'll set clip = "off". I think our legend does a nice job of clarifying for our audience what points on either side of the line represent. Perhaps it's not quite as easy to read as the slope plots, but it's another way of looking at the data. Next, I want to add labels to each of these 15 points to indicate which country they represent. Some of these points are nearly on top of each other, so I suspect the typical geom_label() is going to look like a bit of a mess, but we'll start there and then see if we can do better. I'm going to come back up to geom_point(), add geom_label() after it, and add label = country to the aesthetic mapping in my ggplot() call.
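Putting the diagonal and the text legend together might look like the sketch below. The `vaccine` values are made up; `legend` follows the description above. One assumption to flag: this sketch uses `linewidth = 0.25`, the argument name ggplot2 3.4.0 and later use for line thickness, where the episode uses `size = 0.25`.

```r
library(ggplot2)
library(tibble)

# Made-up placeholder values, NOT the survey numbers used in the episode
vaccine <- data.frame(
  country = c("Canada", "India", "Japan", "UK", "USA"),
  August  = c(76, 87, 75, 85, 67),
  October = c(68, 87, 69, 79, 64)
)

# One row per legend label, each with its own position
legend <- tibble(
  x = c(90, 100),
  y = c(100, 95),
  label = c("Increasing\nintention", "Decreasing\nintention")
)

p <- ggplot(vaccine, aes(x = August, y = October)) +
  # y = x reference line; drawn first so the points sit on top of it
  geom_abline(slope = 1, intercept = 0, color = "#AAAAAA", linewidth = 0.25) +
  geom_point() +
  geom_text(
    data = legend,
    mapping = aes(x = x, y = y, label = label),
    inherit.aes = FALSE,  # legend has no August/October columns to inherit
    color = "#AAAAAA", hjust = 0, lineheight = 1, size = 3
  ) +
  # clip = "off" keeps text that runs past the panel edge visible
  coord_fixed(xlim = c(50, 100), ylim = c(50, 100), clip = "off")
```

Without `inherit.aes = FALSE`, `geom_text()` would try to look up `August` and `October` in `legend` and fail when the plot is built.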
So yeah, that's a bit of a disaster. What I want is to move those labels off of the points and have an arrow or a line pointing back to the data. If you've been watching previous episodes, you know that we learned about a really cool function called geom_label_repel(). geom_label_repel() moves the labels away from the data and away from each other, with lines pointing back to the data, so we can clearly label what each point represents. To use geom_label_repel(), we need to load another library, ggrepel. This isn't perfect, but it's looking better than what we could do with geom_label(). One thing that's not immediately clear is which of these two points here is Italy and which is the USA, and Canada's label is sitting right on top of its point. We need to clean this up a bit. The first thing I want to do is come back up to geom_label_repel() and start adding some arguments to make it look more attractive. The first is min.segment.length, which I'll set to 0 so that a segment is drawn between every label and its point. The other argument I'll add is max.overlaps, which I'll set to Inf, for infinity, because I don't want to lose any of my labels if they happen to overlap with each other. Now we have a much clearer indication of which point corresponds to each country. Here in the middle, around the UK, Japan, and Canada, it's still a little bit jumbled, so it would be nice if those were pulled apart a little. One thing I might do is decrease the size of the font. Up here in geom_label_repel(), because I'm starting to get a lot of arguments, I'm going to break them onto separate lines. I'll start with size = 3 and also set family = "Montserrat" to make sure I've got my special font there. That definitely made the country labels a bit smaller.
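A minimal sketch of the repelled labels, with a hypothetical `vaccine` data frame of made-up values. It leaves out `family = "Montserrat"`, since that font has to be registered through showtext first. The `seed` argument is an addition not mentioned in the episode; ggrepel's layout is random, and fixing the seed (any integer works) just makes it reproducible.

```r
library(ggplot2)
library(ggrepel)

# Made-up placeholder values, NOT the survey numbers used in the episode
vaccine <- data.frame(
  country = c("Canada", "India", "Japan", "UK", "USA"),
  August  = c(76, 87, 75, 85, 67),
  October = c(68, 87, 69, 79, 64)
)

p <- ggplot(vaccine, aes(x = August, y = October, label = country)) +
  geom_point() +
  geom_label_repel(
    min.segment.length = 0,  # always draw a segment from each label to its point
    max.overlaps = Inf,      # never drop a label for overlapping too many others
    size = 3,                # smaller text gives the labels more room
    seed = 1                 # fixed seed so the random layout is reproducible
  ) +
  coord_fixed(xlim = c(50, 100), ylim = c(50, 100))
```

By default, `max.overlaps` is 10, and labels that would overlap more than that many other elements are silently dropped, which is why setting it to `Inf` matters when every country must be labeled.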
We still have overlapping labels between Japan, Canada, and the UK, which I'm not a big fan of. One thing we could do is switch to geom_text_repel(), but I kind of like having the white background; we've seen this before. If one of these labels, say India here, falls on the diagonal line, I don't want that line to run through the word India. I want India to take precedence over the line, with a white background behind it. So instead, I'll remove the border around each name, make the padding smaller, and go from there. Back up in geom_label_repel(), we'll set label.size, which is the width of the border, to 0. Then label.padding, which we'll also set to 0 for now. And then label.r, which is the radius of the label's corners; without a border this doesn't really matter, but I'll make it 0 as well. Now we have much clearer labeling of our points, and you can see that the line no longer runs through India, which is nice. But I notice that each segment runs right into the letters of its label, so I'd like a little bit of padding between the name and the start of the segment. Instead of 0 for label.padding, let's use 0.1. I think this looks a lot cleaner without the borders on the individual labels, and it's now clear which country corresponds to each point. Of all the options we've played with, I think this does a really nice job of connecting each country with the actual data in a fairly clean and presentable manner. There might be little tweaks here and there that we could make to improve it, but I'm pretty happy with this. One final thing I'm going to fix, just because it annoys me, is that I forgot to change the font family for the increasing intention and decreasing intention labels.
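The borderless label styling might be sketched as follows, again with made-up `vaccine` values and with the font family omitted. The padding and radius are written as explicit `unit(..., "lines")` values, the unit ggrepel uses for their defaults; `labels_layer` is a hypothetical name introduced here, and `seed` is an addition for reproducibility.

```r
library(ggplot2)
library(ggrepel)
library(grid)  # unit()

# Made-up placeholder values, NOT the survey numbers used in the episode
vaccine <- data.frame(
  country = c("Canada", "India", "Japan", "UK", "USA"),
  August  = c(76, 87, 75, 85, 67),
  October = c(68, 87, 69, 79, 64)
)

labels_layer <- geom_label_repel(
  min.segment.length = 0,
  max.overlaps = Inf,
  size = 3,
  label.size = 0,                      # no border around each label
  label.padding = unit(0.1, "lines"),  # small gap between text and segment
  label.r = unit(0, "lines"),          # square corners (moot without a border)
  seed = 1                             # reproducible layout
)

p <- ggplot(vaccine, aes(x = August, y = October, label = country)) +
  geom_abline(slope = 1, intercept = 0, color = "#AAAAAA", linewidth = 0.25) +
  geom_point() +
  labels_layer +  # added after the line, so labels are drawn on top of it
  coord_fixed(xlim = c(50, 100), ylim = c(50, 100), clip = "off")
```

The labels keep `geom_label_repel()`'s default white fill, and because layers are drawn in the order they are added, that fill masks the diagonal wherever a label crosses it.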
Again, we can come back up to geom_text() and set family = "Montserrat". I feel like this is a pretty attractive version of our data. You'll notice it's all black and white; I don't feel the need to add any color here. We could perhaps color the legend labels, say increasing intention in red and decreasing intention in blue, but I feel like it's pretty clean as it is. It's simple, attractive, and conveys the message. Now, is this better than a slope plot or a dumbbell chart? I don't know. You tell me. I feel like we're at a happy middle ground in terms of the number of points and countries. If we had many more, say 100 countries, this would get really busy, and we'd probably want to go back to some of the ideas from the highlighting episode, where we focused on individual countries. With 100 countries, we might want to highlight, say, the wealthiest or the poorest countries to tell a story about those countries and whether they intend to receive the COVID vaccine. So again, the appropriate visual really depends on the data, the question, and the story you're trying to tell. Anyway, I hope you keep practicing with this data. If you're interested in how you can follow along using the branching strategy that I'm using, feel free to check out the episode I've got linked over here. We'll see you next time for another episode of Code Club.