Hey friends, I'm Pat Schloss and this is Code Club. If you've been following along in recent episodes, you know that I've been working with a dataset that was generated and published by Ipsos, looking at people's attitudes towards receiving the COVID-19 vaccine. These data were collected in August and October of 2020, so they're a little bit old, seeing as the vaccine wasn't available then but is now. It's interesting to think about how attitudes are changing. But what's really interesting to me are the visuals that Ipsos made, as well as the one that another group called Chartr generated and that I saw in one of their newsletters, to represent these data on attitudes towards the COVID-19 vaccine across 15 different countries.

To this point, we have taken that Ipsos dumbbell chart and modified it to look a lot more like the Chartr version. We've stepped through a lot of different things in recent episodes, and we are at a point now where we would like to put a legend on this figure. As we've been going along, we've been saying show.legend = FALSE, don't show us the legend, but now we're ready to show the stylized legend that the Chartr version of the figure has. And we are going to finish off this figure, making it look as close to the original Chartr version as I can without wasting all my time.

Over here in RStudio, I have the code that we are going to be working with. If you want this starting code as well as the data, down below in the description is a link to a blog post for today's episode. At the bottom of that are the data, and in there you'll also see the starting code as well as the final code that we finish with. Across the top here, I'll put a link to the playlist for all of these different episodes. Again, this code generates this figure.
I also have the Chartr version here, which you can see is a little bit different, but we're getting close to it. The first thing I'm going to do is change the layout from a landscape, rectangular format to the square format of the Chartr version. As I've talked about in the past, my preference is to specify the width, the height, and the file format of any figures that I'm generating. This keeps things reproducible. Alternatively, you can use RStudio to manually click to export figures. It's a little bit kludgy and isn't altogether reproducible, because you still have to put in the dimensions, and I find it's just not as easy as specifying them in code like we are here. So I will do width = 5, height = 5, and now we can see that our plot is square.

I think it's nice to get the dimensions right early, because elements of the figure move around whenever the dimensions change. If we have the dimensions right, we don't have to worry about that. That's another great reason to develop your figure in the file format and the dimensions that you want it to have at the end. And that's why, if you're thinking about publishing papers in a journal, it's really smart to look at the instructions to authors early, so you know what dimensions they're looking for in the figures they publish.

The next thing I want to take on is abbreviating some of these country names. You'll notice things like South Korea, United Kingdom, South Africa, and United States. In the Chartr version, they have S. Korea, UK, S. Africa, and USA. Let's go ahead and change those. You'll recall that this data frame, data, has the country name in the first column, followed by all the other data that we're plotting. What I'm going to do is use a function called recode.
And so we will pipe the output of data to this point to a mutate. We will mutate country, and we'll say recode(country, ...). What recode allows us to do is to take a column and, wherever it sees one value, substitute something else. So for example, you might have an M in your data that you would rather be male and an F that you would rather be female. Here we're going to say, if you see South Korea, change that to S. Korea. This is kind of like the rename function that we've used in previous episodes. The main difference, though, is that the order of the arguments is different. Let me show you what I mean.

We can say "South Korea" = "S. Korea". And if we look at data, we now see that South Korea has been changed to S. Korea. I get these mixed up all the time, what goes on the left and what goes on the right of the equal sign. So for example, I might say "S. Africa" = "South Africa". It doesn't give you an error message, but if you look at the output, you'll see that South Africa is still written out longhand, not abbreviated. That's a signal that you got the order around the equal sign flipped. So I will grab South Africa, make that equal to S. Africa, and get rid of that extra equal sign. I'll also put in "United Kingdom" = "UK", and then United States, which needs to be in quotes, equals "USA". And now if I look at data, I've got those country names abbreviated.

The next thing that I want to take on is getting rid of this Total row from my figure. You'll notice the Chartr version doesn't have the total; the Ipsos version did. So we'll get rid of it. What do you think we're going to use to get rid of the Total row from this data frame? You'll see it right up here on row one. Well, if you said filter, you're correct. So we will pipe this to filter, and then we'll say country not equal to "Total".
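To make the recode and filter steps concrete, here is a minimal sketch on a made-up stand-in for the data frame. The toy tibble and its august and october values are placeholders of mine, not the Ipsos numbers; only the recode() and filter() calls mirror what was just described.

```r
library(dplyr)

# Hypothetical stand-in for the episode's data frame; the numbers are placeholders
toy <- tibble(
  country = c("Total", "South Korea", "South Africa",
              "United Kingdom", "United States", "France"),
  august  = c(1, 2, 3, 4, 5, 6),
  october = c(6, 5, 4, 3, 2, 1)
)

cleaned <- toy %>%
  # recode() takes old = new, the reverse of rename()'s new = old
  mutate(country = recode(country,
                          "South Korea"    = "S. Korea",
                          "South Africa"   = "S. Africa",
                          "United Kingdom" = "UK",
                          "United States"  = "USA")) %>%
  # drop the summary row so only individual countries remain
  filter(country != "Total")

cleaned$country
```

Values that are not named in recode(), like France here, pass through unchanged, which is also why a flipped pair fails silently instead of raising an error. Recent dplyr releases mark recode() as superseded in favor of newer helpers, but it still runs as shown.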
So now if we run everything else, we see that Total is gone. But there are a couple of things you might notice that are a little bit weird. In these strips that we made before using geom_ribbon, we actually created three classes, A, B, and C, for three different colors: the two alternating colors for the countries, plus a third for Total that was the same color as the background. That background color is now the color that India has. So I need to modify my labeling for those color groups so that India is the lighter shade of blue that South Korea has, and the alternation carries on. Up here, where Total had been, there's nothing being plotted as a geom_ribbon anymore. Instead, what I think we are seeing are the grid lines, so we'll need to turn off the grid lines.

Let's start with the grid lines. I will do panel.grid = element_blank(). Wonderful. Those went away. We're in good shape. Now we need to come back up to where we assigned the groupings for our ribbons. I think we had that up here in strip_data, where we had fill being C and then a repetition of A and B. So we'll get rid of the C, and we'll repeat A and B nrow(data) times, so we don't want the minus one. That looks great. We've gotten the India color back to being like South Korea's, and the top row, where we had Total, is now the same color as the background.

One itty-bitty detail that I notice, and maybe you notice too, is that the top of the axis line no longer has a tick mark, whereas over in the Chartr version it's there. To add that little tick mark, we need to come back up to scale_y_continuous, where we have effectively lost that top mark. Looking at the breaks we have in scale_y_continuous, you'll notice that we have a break, a place where we want a mark of some type, at each of the y positions for the 15 different countries. We also have 0.5, which is the tick line below France.
And then data$y_position + 0.5 takes the points where all the country names are and adds 0.5. Well, we need to add 1.5 to the number of countries to get the tick mark that would have been above the top row. Down here, what we see is that we're putting the country name at these y positions, and wherever we have something on a half increment, we're adding an empty pair of quote marks, because we don't actually want a name there, only the tick mark. So we'll do length(data$y_position) and then add 1.5 to that. If we look at the output of this, we see that it's got 32 values in that vector.

If we look at our labels, we want to make sure our labels and our breaks are the same length. Here we've got 31, so I need to up this to be plus two. We're repeating the empty string 15, 16, 17 times. So now if we look at the output of this, it should have 32 values. And I guess I need another parenthesis here.

Then we need to come back down to axis.ticks.y. We're repeating NA nrow(data) times, so that's 15 times, and again, that's because we don't want a tick mark where we have the countries. And then we're repeating a dark gray color nrow(data) + 1 times currently. That's 16 times, and we actually want it 17 times, so plus two. There's our pesky little tick above where Total used to be.

We're now ready to move on to the next step, placing our legend. If you look at my different geoms, you'll notice that they all have show.legend = FALSE in them. I turned off all the legends because the Ipsos version, where we started, didn't have an explicit legend. They had text boxes pointing to points, indicating which color is from August and which color is from October. The Chartr version, though, does have a legend at the top of the panel of the plot. So I want to bring back the legend for geom_point.
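The bookkeeping for the breaks, labels, and tick colors is easy to get off by one, so here is a small sketch of the arithmetic in plain R. The variable names and the placeholder country names are my own; the only things taken from the walkthrough are the 15 countries, the half-step ticks, and the counts of 32.

```r
# Hypothetical stand-in: 15 countries at y positions 1 through 15
n_countries <- 15
y_position  <- seq_len(n_countries)
country     <- paste("Country", y_position)  # placeholder names

# One break per country, one tick below the bottom row, one tick above each
# row, plus the extra tick above where the Total row used to be
breaks <- c(y_position, 0.5, y_position + 0.5, length(y_position) + 1.5)

# Country names at the whole numbers, empty strings at every tick-only break
labels <- c(country, rep("", n_countries + 2))

# No tick at the country names, a gray tick at every tick-only break
tick_colors <- c(rep(NA, n_countries), rep("darkgray", n_countries + 2))

# All three vectors must have the same length for the axis to line up
lengths(list(breaks = breaks, labels = labels, tick_colors = tick_colors))
```

The tick-only breaks number 1 + 15 + 1 = 17, which is where the plus two comes from in both the labels and the tick colors.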
So for geom_point, I'll turn off that show.legend = FALSE. I could delete it, or I could say TRUE. What you'll notice is that now we have the characteristic placement of a ggplot2 legend, off in the right-hand margin. What's interesting is that we're also getting a legend that I think is for the fill. When I created those geom_ribbons, the two groups I made were A and B, and that's what is showing up there. We want to get rid of that. It's interesting to me, because with geom_ribbon we say show.legend = FALSE, but it's still showing up. What I'm going to do is come down to the bottom of my code chunk for building the figure and do guides(fill = "none"). That gets rid of the legend, or the guide if you will, for the fill color. You could use guides the same way with other aesthetics where you are getting a legend that you don't want.

I want to show you a few more things about working with legends. I'm working here in the arguments for my theme function. You can see there are already quite a few arguments in here, and don't worry, we're going to get a lot more. I will add legend.position. We can use words, so we could say "bottom", and that will put my legend across the bottom. I could say "top", and that will put my legend on the top. The default location is "right". Alternatively, you can give it c(), a vector of two values, the x position and the y position. So I might do something like 0.5 for my x and 1.0 for my y to get it in the middle and up at the top. Now I'm getting my legend in the middle and at the top. What you notice is different from using legend.position = "top" is that "top" gave it a horizontal layout, whereas here it's still a vertical layout. So how do we get that back?
Well, of course, there's another argument for our theme function to do that: legend.direction = "horizontal". The default was vertical. And so now we have a horizontal layout for our legend. Looking good. We're getting closer to having something along the lines of what the Chartr version does.

I'm going to add this text, "If a vaccine for COVID-19..." blah, blah, blah, as the title for my legend. So I'm going to come back up to my scale_color_manual and add that text in place of the NULL. Again, this looks great. It's looking more and more like what we have in the Chartr version.

One thing I'm noticing is that the font of the title is a little bit larger than what I see in the legend title of the Chartr version. Anytime you want to change the way the text looks in a non-data element, there's probably a theme argument for that. In this case, it's legend.title, and for changing text we give it the function element_text, which we've seen elsewhere. I'll do size = 9. That makes the font a little bit smaller and looks pretty good. One other thing I noticed about the Chartr version is that there's a little bit of spacing between the two lines of the legend title. We can do the same thing in R by giving element_text for legend.title an argument called lineheight. Let's do 1.3 and see how that looks. Yeah, that gets us a little bit more spacing between those two lines of our legend title.

One thing you may also have noticed is that the white rectangle around our legend isn't overlapping the y axis anymore. Everything has been brought in a bit. That's because the legend is centered at 0.5, 1, basically where the crosshairs are here on my mouse. I find it's a little bit easier to work with the legend when it's justified to the left rather than to the center. So we'll do legend.justification = "left".
So this is left justifying it to 0.5 and 1. Alternatively, we could say "right", and now it's right justified to 0.5 and 1. So let's do 0 and 1 for the position and make it left justified. Now we have our legend hugging up against that y axis, and we can work with our formatting from there. If I want to get things more spread apart, it's easier when we're anchored to the left side rather than the center.

The next thing that I notice about my legend is that the background is white, and I'd really like it to blend in with the rest of the background here. Coming back to my theme function, I'm going to add a line for legend.background and say element_blank(). This element_blank basically gets rid of all the formatting and makes that background transparent. There we go. We now have a legend that blends in with the background, much like the Chartr version of the legend does. So I'm happy with that.

One thing I do notice is that the points in my legend still have this gray square around them. To get rid of those gray squares, I can come back up into my theme function and do legend.key = element_blank(). Again, that takes that square, which is a rectangle, and removes all of its formatting. And so now we no longer see that gray square around the point. I'm actually going to turn that back on for a moment, though, so I'll comment out this line, because the square is helpful for seeing the formatting of the legend. One thing that bugs me a little bit is that there's a space here between the right side of this blue point and the right side of the square. And then there's another margin between the text and that square.
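The legend placement settings covered so far can be collected into one small sketch. The toy data frame, the placeholder title, and the two colors are assumptions of mine; the theme() arguments and their values are the ones described above. One caveat that I am fairly sure of: ggplot2 3.5.0 and later prefer legend.position = "inside" together with legend.position.inside for numeric coordinates, and the older numeric form used here runs with a deprecation warning.

```r
library(ggplot2)

# Hypothetical toy data; only the theme() settings follow the walkthrough
toy <- data.frame(
  x     = c(1, 2, 3, 4),
  y     = c(1, 2, 1, 2),
  month = c("August", "October", "August", "October")
)

p <- ggplot(toy, aes(x = x, y = y, color = month)) +
  geom_point(size = 2) +
  scale_color_manual(
    name   = "A two-line legend title\nused here as a placeholder",
    values = c(August = "gray40", October = "steelblue")  # placeholder colors
  ) +
  theme(
    legend.position      = c(0, 1),         # x, y within the plotting area
    legend.justification = "left",          # anchor the legend's left edge there
    legend.direction     = "horizontal",    # keys side by side, not stacked
    legend.title         = element_text(size = 9, lineheight = 1.3),
    legend.background    = element_blank(), # let the plot background show through
    legend.key           = element_blank()  # no gray square behind each point
  )

p
```

Commenting out the legend.key line, as was done above, brings the gray squares back, which makes the spacing inside the legend easier to see while you tune it.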
I'd like to bring that spacing in a little bit more to make the legend more compact, more like what we see over here, where there's a shorter space between the point and the O in October, and a bigger space between the zero in the '20 and the next big circle. So I can do legend.key.width, and I will give it a function called unit. I'll do 3, and I'll do "pt" in quotes, so this is a three-point legend key width. That gets it a bit more compact. You can see the gray rectangle is now tight around my individual points.

Before we go too much further in formatting things, I want to get this looking much more like what we have in the Chartr version. The first thing I notice is that they have October on the left and August on the right. They also have the '20 to indicate the year. Let's change both of those things. Let's come back up to scale_color_manual, where we can basically flip the order of these arguments. I will do that quickly here. Great. That's all flipped, and now I have October and August in the order they do in the Chartr version. The next thing that I want to do is add the '20 for the year to October, and do the same thing for August. Very good. We now have October '20 and August '20 in the same order they do, and we've got the colors matched. We're winning.

One other thing to notice is that the color of the month and year corresponds to the color of the point. We can come back up to scale_color_manual and use some of our awesome element_markdown functionality from ggtext to color the text specifically. We've seen this in previous episodes, but we can use an HTML span tag. We say style equals, then in single quotes color, a colon, and this hexadecimal value, and then after the label we close the span. All of that needs to be inside my double quotes.
And then I will do the same thing for August '20, so I will copy this down here, put in August, and swap in the other hexadecimal value. Then, to get the HTML to render properly, we need to come down to the theme and do legend.text = element_markdown(). And now the text in our legend is colored to match the points. For now, let's also turn that legend.key = element_blank() back on so that we're not looking at that gray rectangle anymore.

One little thing that I'm noticing about their legend that we don't have in ours is that their points are bigger than the points of the data, and they're also bigger than the font of the text. Our points are actually smaller than the text. The point size is driven by the size aesthetic, and that's why our legend points are the same size as they are in the plot. Changing this requires a little bit of an advanced move in scale_color_manual. So I'll come into my scale_color_manual and do guide = guide_legend(), and I will give it an argument, override.aes, equal to list(size = 3). What this is saying is that we want to override the aesthetics used in the legend. If you look at our geom_point, we have size = 2, so that's the size it's using to create the guide, the legend, for scale_color_manual. I want to override this, so I give it a list with size = 3, and that makes the point in the legend size three. That gets us a bigger circle. Let's see if we can make it a little bit larger still. I'll make that four. That looks great. We now have that nice plump dot next to the October '20.

I would like to get a little bit more space between the zero from the October '20 and the blue dot that follows it. So I will come back to legend.text, and I'm going to add the margin argument. The margin argument is something that we haven't worked with a whole lot.
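The colored labels, the narrow key, and the enlarged legend points described above can be sketched together as follows. This is a minimal example under stated assumptions: the toy data frame and both hex codes are placeholders of mine (the colors used in the episode are not spelled out here), and it needs the ggtext package for element_markdown().

```r
library(ggplot2)
library(ggtext)

# Hypothetical toy data and placeholder colors, not the episode's values
toy <- data.frame(
  x     = c(1, 2, 3, 4),
  y     = c(1, 2, 1, 2),
  month = c("August", "October", "August", "October")
)
october_color <- "#1F77B4"
august_color  <- "#7F7F7F"

p <- ggplot(toy, aes(x = x, y = y, color = month)) +
  geom_point(size = 2) +
  scale_color_manual(
    name   = NULL,
    breaks = c("October", "August"),   # October first, then August
    values = c(October = october_color, August = august_color),
    # HTML spans color each label; they render only with element_markdown()
    labels = c(
      paste0("<span style='color:", october_color, "'>October '20</span>"),
      paste0("<span style='color:", august_color, "'>August '20</span>")
    ),
    # legend points drawn at size 4 even though the data points are size 2
    guide = guide_legend(override.aes = list(size = 4))
  ) +
  theme(
    legend.text      = element_markdown(),
    legend.key       = element_blank(),
    legend.key.width = unit(3, "pt")
  )

p
```

Without the legend.text = element_markdown() line, the span tags show up as literal text in the legend, which is a quick way to tell that the theme setting is missing.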
The margin argument takes a function, margin(), and that function, as you can see in my yellow rectangle here, takes a value for the top margin, the right, the bottom, and the left. The default is in units of points. What I would like to do for my legend text is add a margin to the right, so I'll say r = 10 as a starting place and see where we go from there. That gives us a nice spacing now between our October '20 and our August '20, and I like that pretty well. For once in my life, I picked the right number to get going with.

This discussion of legends hopefully gives you a better sense of how you can place the legend, how you can format the legend, and, as we saw, how you can override some of the aesthetics of the points in the legend. Depending on the type of legend, there might be other things that you want to override. I've done this in the past in figures where I have multiple colors and multiple shapes, say three colors and three shapes. In the color legend, I want all the colors drawn with one shape, but not necessarily one of the three shapes that I'm using for the data. And in the shape legend, I don't want the three shapes drawn in any of the colors I'm using. Does that make sense? So for the three shapes, I might make those gray, and my colors might be, I don't know, not really, but like blue, green, and red. By overriding the aesthetics for those guides, you can make it clearer to your audience which aspect of the data each legend describes. Hopefully that didn't confuse things, but it's a really powerful tool for formatting the appearance of your legend.

I'm looking at where we are in the progress of today's episode and where we still have to go, and we need to hurry up, because what I'd like to do is make the complete transformation from what we currently have to basically what the Chartr version looks like.
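The three-colors-and-three-shapes case is easier to see in code than to describe. Here is a hedged sketch using the built-in mtcars data rather than anything from this project; the choice of variables, shapes, and colors is mine, and only the override.aes idea comes from the discussion above.

```r
library(ggplot2)

# mtcars stands in for a dataset with two three-level grouping variables
p <- ggplot(mtcars, aes(x = wt, y = mpg,
                        color = factor(cyl), shape = factor(gear))) +
  geom_point(size = 3) +
  scale_color_manual(name = "Cylinders",
                     values = c("4" = "blue", "6" = "green", "8" = "red")) +
  scale_shape_manual(name = "Gears",
                     values = c("3" = 16, "4" = 17, "5" = 18)) +
  guides(
    # draw the color keys with a square, which is not one of the data shapes
    color = guide_legend(override.aes = list(shape = 15)),
    # draw the shape keys in gray so they are not tied to any one color
    shape = guide_legend(override.aes = list(color = "gray"))
  )

p
```

With the overrides in place, the color legend reads as being only about color and the shape legend only about shape, instead of each key borrowing a default from the other aesthetic.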
What we're going to move on to now is more of the formatting, using the different arguments in the theme function to achieve the appearance that we see in this Chartr version. I'm going to start by modifying my caption, which we set up here in labs. I'm going to just make that "Source: Ipsos". I also want to change my x axis title, for which we have already set the font and the face. I'll make the size = 25. That makes the Chartr text nice and large, comparable to what we have in the Chartr version.

One thing that you'll notice about the Chartr version is that the "Source: Ipsos" text, what we're using as the caption, is at about the same level as the x axis label. We can do the same thing by coming back up to plot.caption. I'll put in our margin argument, which we just talked about, and I will set t. You might be saying, t? Why do you want to set the top margin when by default it's zero? Well, we can set a negative margin to move things up, or further left, right, or down. So let's try -10. That brought the "Source: Ipsos" up to a nice position, at about the same bottom edge as the Chartr text. Maybe it could go up a little bit more, but I'm not going to worry about it.

The next thing I want to turn my attention to is the size of the font that we have for our title. If we come up to plot.title, we could say size = 30. That's a little bit too big. Let's change that size to 25. You can see it's not quite all the way across the figure, and I can't help myself, so let's make that size 26. Now the title spans the entire width of the figure, and that looks really nice. Let's now modify the margin around the title. There's a top margin and a bottom margin, and you'll see that we have that bottom margin of 20.
Let's make a top margin of 15. And I think I'd like a little bit more spacing below our title, so I'll come back up to the title and make the bottom margin 25. There we go. We've got pretty good spacing that more or less parallels what they had in the Chartr version.

One subtle point that annoys me about our figure is that the percent sign for the 100% is getting truncated. Over here in the Chartr version, there's clearly more room for that. There's also a little bit of room on the left for some spacing between the edge of the figure and the start of the country names. We can clean that up by doing plot.margin = margin(l = 5, r = 15). You can play with these settings to dial them into the configuration you like. You'll notice we get a good amount of space to the right of our percent sign, perhaps not as much left-hand space in our version compared to the Chartr version, but we'll call this good enough.

Another styling point that I want to look at is that their country names seem to be bolded, whereas ours are in a normal font. We can come back up to our axis.text.y, which we don't have yet. So I'll do axis.text.y = element_text(), of course, and we'll say face = "bold". Now we've got our bolded country names. They really pop out, a lot better I feel than with the lighter font. One thing I'm noticing is that our fonts for the title, the y axis labels, and the Chartr text are a little bit heavier than what they have in their version. I suspect they're doing some CSS styling with their HTML to dial in the exact weight of their font.

As we come into the home stretch here, there are a couple of little things I want to tweak. Their points are a little bit bigger than our points, and their text is also a little bit bigger. Let's make our geom_point size three and our text size three.
Yeah, now those points really pop out there. They're a lot bigger and they're a lot more juicy. I don't know why I like that word today. But yeah, I like that a little bit bigger. Two little things I'm noticing now: the percent labels are a little too close to the points, so we might want to bump them a little bit further out, and for places like Germany and Italy, the arrowheads are a little bit too squished. I'm going to come back up here to my mutate, where we're doing the bump for August and the bump for October, and maybe I'll bump by 2.5 instead of just two. For the angle on the arrow, we'll come back down to our geom_path. We had angle = 30, so let's do 45. If you remember that episode, I went back and forth between all these angles many times. I think that looks a lot better. Those numbers are still a little close to the points, but I think they look pretty good. And we have those arrows being a little bit more open, so that for Italy and Germany we can actually see the arrowhead more clearly, and it looks a lot nicer.

One final thing that I'm noticing is that the percent sign on the 97 is now getting clipped. That's because that part of the geom lies outside of the plotting window. We can very easily clean that up by adding a coord_cartesian to the end here. There are a variety of these coord functions. We're working in Cartesian space, so we'll use coord_cartesian, and we can give it clip = "off". The default is "on", so that it clips everything to only what falls within the panel. So we'll do clip = "off", and there is our percent sign for our 97%.

I am ecstatic with how this worked out and how well it looks. There are a couple of little things that an eagle-eyed observer might notice as being a bit different between my version and the Chartr version. I'm going to leave those for you to play with, to see if you can figure out how to improve my version even just a little bit more.
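The last round of polish, the margins, the bold axis text, the wider arrowheads, and the unclipped labels, can be sketched on toy data like this. The data frame, titles, and arrow length are placeholders of mine, and I've used geom_segment() in place of the geom_path() from the episode because it needs less reshaping for a small example; the theme() values, the 2.5 bump, the 45-degree angle, and clip = "off" are the ones described above.

```r
library(ggplot2)

# Hypothetical toy data; the percentages are placeholders, not the Ipsos values
toy <- data.frame(
  country = c("Country A", "Country B"),
  august  = c(60, 85),
  october = c(70, 97)
)

p <- ggplot(toy, aes(y = country)) +
  # geom_segment() stands in for the episode's geom_path(); arrow() is the same
  geom_segment(aes(x = august, xend = october, yend = country),
               arrow = arrow(angle = 45, length = unit(0.1, "in"))) +
  geom_point(aes(x = october), size = 3) +
  # bump the label 2.5 units past the point so it doesn't crowd it
  geom_text(aes(x = october + 2.5, label = paste0(october, "%")),
            hjust = 0, size = 3) +
  scale_x_continuous(limits = c(50, 100), expand = c(0, 0)) +
  labs(title = "Placeholder title", caption = "Source: placeholder",
       x = "placeholder", y = NULL) +
  # let labels that spill past the panel edge be drawn instead of clipped
  coord_cartesian(clip = "off") +
  theme(
    plot.title   = element_text(size = 26, margin = margin(t = 15, b = 25)),
    plot.caption = element_text(margin = margin(t = -10)),  # negative pulls it up
    axis.title.x = element_text(size = 25),
    axis.text.y  = element_text(face = "bold"),
    plot.margin  = margin(l = 5, r = 15)  # room on the right for the overflow
  )

p
```

The clip = "off" setting and the right-hand plot margin work as a pair: the first allows drawing outside the panel, and the second gives that overflow somewhere to land inside the figure.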
I really encourage you to play with this, to tweak it, and to see if you can make some styling improvements to match theirs better, or to match your own aesthetics all the more. More importantly, I strongly encourage you not to be afraid of those arguments in the theme function and to play with them on your own figures. You will only learn how the theme function works by experimenting with it. My deep, dark secret is that before I started making these videos a year ago, I hardly ever touched the theme function without a lot of Google searches behind me. But as you've seen watching these videos, I use the theme function a lot now. That's because I'm teaching it to you, I'm practicing with it, and I'm getting better with it. So I strongly encourage you to experiment with the theme function too.

Also, play with your legend. You don't have to leave it over there in the right-hand margin, languishing away in default land. We've seen a couple of creative ways of handling the legend, whether it was the Ipsos version, where they had text boxes pointing to the August and October data, or this situation, where they have the legend at the top, which I find to be a really compelling way of reading this figure. You see the question, you see the color scheme, and then you see the data. I'm not having to look all the way over to the right side. Play with this in your own visualizations. Don't be afraid of the theme function.

Next time, when we come back, we're going to talk a little bit about storytelling. We're going to start to critique this version of the figure as well as the Ipsos version, to see what we like, see what we don't like, and make a plan for making it better. Keep practicing, and we'll see you next time for another episode of Code Club.