Hey folks, if we haven't had a chance to meet yet, my name is Pat Schloss and this is Code Club. We're in the midst of analyzing a data set that was published last fall by the agency Ipsos looking at attitudes towards vaccination. In their initial report, they created a barbell chart or dumbbell chart or whatever you want to call it, which we recreated in a previous episode entirely in R with ggplot2. That was a lot of fun. I also commented on how I had previously seen a version of this figure in a newsletter that I get from a group called ChartR, ChartR, I don't know. That is a far more stylized version of the figure, and although the name of that group ends in R, they did not make their plots in R as far as I can tell, and they certainly have not made their code available to me or to anyone else that I know of. So my goal over this series of episodes is to take the version that we made based on the Ipsos plot and, using ggplot2 functions, convert it into a version that much more closely mirrors what ChartR produced. In the last episode, I laid out about six different steps that we're going to take over these episodes to convert the Ipsos version of the plot to the ChartR version. In the last episode, we also used ggplot2's geom_ribbon function to create that alternating colored background, which I thought was a pretty attractive way to highlight which points go together from the same country. Yes, yes, yes. I know those colors are pretty horrible. They're pretty garish. They're actually the default colors that come to us from ggplot2. Don't stress about the colors. I know they look bad. In the next episode, we're going to talk about how to match and adjust the colors in the plot to better reflect what is in the ChartR version. But the topic of today's episode is: how do we add those arrows to the chart?
The alternating colored background, as I said, is helpful for connecting points from the same country. We could also get the same kind of effect with the grid lines. And certainly the line between the two points helps with that as well. But for the life of me, even though there are two colors, I can't keep track of which is August and which is October. I don't know. You'd think my brain could figure that one out. An arrow, on the other hand, makes that crystal clear, because an arrow indicates the flow of information, the flow of time, the flow of anything, right? And so we can use an arrow like ChartR did to clearly indicate the time progression from August of 2020 to October of 2020. That arrow just makes it crystal clear. But unfortunately, there is not a geom_arrow function within ggplot2. I'm sure there's a ggplot2 add-on out there that has something like geom_arrow. Don't @ me. I want to learn how to use ggplot2, the base package as it is, far better. And it's too hard for me to keep track of the bazillion other ggplot2 add-ons out there and all their related functions. So I want to do as much as I can here with ggplot2. In today's episode, we're going to review how you can add lines to a plot. We will talk about how you can add arrowheads to those lines. We'll also talk about how we can alter the attributes of those arrowheads: whether they're filled or open, the angle, and the length. And then finally, we'll go about adjusting and pruning the arrows to better match what we see in the ChartR version of the plot. You'll notice perhaps that not all of the countries have arrows, because in some cases the points are too close to each other. As always, if you want to follow along with me today, which I strongly encourage you to do, down below in the description is a link to a blog post for today's episode, where you can get the code that I'm starting with, as well as the data that we're using.
If you want to look at the other videos in this series of episodes, I'll put a link across the top here so you can go back and check out those previous videos. Here we are with my August-October 2020 ChartR.R script. It has all the wonderful code. Let me go ahead and run it so that we can see what the figure looked like at the end of the last episode. As I said, the colors are pretty horrible. But again, this is basically the Ipsos version with the colored background added on. The first thing that I'm going to do today, so it's not a distraction, is turn off the geom_ribbon that got us that alternating colored background. Don't worry, I'm not going to delete it, because we'll want to add it back in the next episode. So this geom_ribbon, I'm going to go ahead and comment out. I'm going to remove this mutate line. I notice that removing the geom_ribbon gets us a little bit of truncation for France and the total. We can get that back by coming back up to scale_y_continuous, where I can add limits = c(0.5, 16.5). There we go. We're back in good shape. What I noticed about the ChartR version of this figure is that the arrows don't actually go all the way to the October point. They start at August, and they go midway to October. So the way I'm going to do this is to leave our baseline of the barbell, the handle of the barbell, if you will, and add on top of that another line, to which we will then add the arrowhead. To do that, of course, we're going to use something like geom_line to add that line segment to our code. I'm also going to come back up to the top here and take data, the data frame that was fed into ggplot, and modify it to create that midpoint position where we want to put the arrowhead. So again, in data we have the country, percent August, percent October, and then our bumps and y positions. I don't care about the bumps.
I care about the y position, percent August, and percent October. So we'll start out by doing a mutate, and I'll call the new column midpoint. Let's do a simple average: percent August plus percent October, divided by two. Let's then go ahead and select to get the country, the y position, percent August, and midpoint. And again, that gives us the 15 countries plus the total, our y position, percent August, and the midpoint. So these are the two x positions. Now we need to do a pivot_longer to get those into the same column to make our tidy data frame. I should also step back and say there's another function that we could use, which is called geom_segment, which allows you to give it an x start and an x end position. I'm going to do it with the geom_line family of functions, but I would certainly encourage you to see if you can take what I'm doing here and modify it using geom_segment. It should be pretty close. We'll again do pivot_longer, and the columns that we'll pivot longer will be percent August and midpoint. For names_to I'll say type, so the type of x position, and for values_to I'll say x. So now we have our country, our y position, our type, and our x position. Excellent. I will go ahead then and call this data frame arrows_data. Coming down into our pipeline, we see that we've got geom_line, geom_point, and geom_text. That was what built out the barbells as well as the labels on the barbells. I'm going to put another line on top of that. So I'll do geom_line, and I will say data = arrows_data, aes(x = x, y = y_position). And then again, we want to do group = y_position. Let's give this some attributes so we can distinguish the color, so we'll do color = "red". So we'll have a red line on that. We'll also do show.legend = FALSE. We'll also do inherit.aes = FALSE. And let's see what this all looks like. Wonderful. We now have a red line segment that starts at August. I don't have a legend here.
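Here is a minimal sketch of the steps just described: the midpoint, the select, the pivot_longer, and the red line layered on with geom_line. The little data frame is a made-up stand-in for the episode's data, and the column names (percent_august, percent_october, y_position) are assumptions based on how they are spoken, so treat this as an illustration rather than the script from the blog post.

```r
library(tidyverse)

# Made-up stand-in for the episode's `data` frame (column names assumed;
# values are illustrative, not the real survey numbers)
data <- tibble(
  country = c("Total", "India", "Canada", "France"),
  percent_august = c(77, 87, 76, 59),
  percent_october = c(73, 87, 76, 54),
  y_position = 4:1
)

# Simple average of the two months, then one row per x position
arrows_data <- data %>%
  mutate(midpoint = (percent_august + percent_october) / 2) %>%
  select(country, y_position, percent_august, midpoint) %>%
  pivot_longer(c(percent_august, midpoint),
               names_to = "type", values_to = "x")

# Red line from the August value to the midpoint, one line per country
p <- ggplot() +
  geom_line(data = arrows_data,
            aes(x = x, y = y_position, group = y_position),
            color = "red", show.legend = FALSE, inherit.aes = FALSE)
```

In the full figure this layer would sit after the existing geom_line, geom_point, and geom_text layers, so the red segment is drawn on top of the barbell handle.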
So I'm assuming that's August, and it goes midway to October. And so this is great. Again, the final color will not be red, but the red helps us see where the line is. To add those arrowheads to our lines, we can come back up to geom_line and give it the argument arrow = arrow(), the arrow function. What we get are some pretty ugly arrowheads, but it's something to start with. One thing I notice is that, with the exception of India and Canada, where the values are actually the same in August and October, all the arrowheads are pointing to the right. They're pointing in the positive direction rather than the direction we actually want them to point. And that's in part because we are using geom_line, and geom_line sorts on the x-axis before plotting, right? So if I look at my arrows_data, I see that the order, say for the total, goes from percent August to midpoint, percent August to midpoint. But what geom_line is doing is sorting it to go from 75 to 77, and then drawing that arrowhead. For something like India, where there's a tie, it doesn't rearrange these because they're already sorted, right? And so that's why the arrowhead on India and Canada is actually to the left, and all the others are to the right. Anyway, how do we fix that? Simple, simple. I think we've maybe seen this in another episode where we used geom_step, right? So geom_line, geom_step, geom_path, they're all from the same family for adding a line to a figure. Instead of geom_line, what we're going to use is geom_path. geom_path, instead of using the order of the x-axis, uses the order that's in your data frame. So let's try this with geom_path. And now our arrowheads are pointed in the right direction. They're all pointing towards this bluish point, the point from October. Now, say your points weren't in that order, how could you perhaps get them to be in this order?
Well, let me show you how we would do that by getting all the arrows pointed towards August. Again, if we look at arrows_data, we see that our type column goes percent August, midpoint, percent August, midpoint, right? So if we wanted to flip that, we could come back up to our arrows_data and add an arrange on type. So now our arrowheads are pointed in the opposite direction, because the line is being drawn from midpoint to percent August, because that M in midpoint comes before the P in percent August, right? The way we had it before was already the reverse ordering in the data frame. Anyway, I want the arrows to go in the other direction. So I'll go ahead and remove that arrange function from my arrows_data. So the figure is back to the way we want it. The next thing that I want to take on is altering the attributes of those arrowheads. The arrowheads currently are way too big for the spacing that we have between our countries. If you were to look back at the original ChartR version of the figure, the arrowheads are actually as wide as the height of the individual points. So there are a few things that we can change with these arrowheads. We can change the angle of the arrowhead. We can change the length of the arrowhead. We can change whether it's open like this or whether it's closed. And there might be one or two other things that we can do. Oh, yeah, we can also change whether we have the arrowhead at the beginning of the line, at the end, or at both ends. If you want to learn more about the arrow function, we can do ?arrow. And this shows us that there are actually two help pages for arrow. One is from ggplot2, which basically says we're getting it from the grid package. The other, from grid, describes arrows to add to a line, and it shows us the documentation for the arrow function that is being used by ggplot2.
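To make the ordering point concrete, here is a small sketch with a made-up two-row data frame (the names and values are illustrative assumptions). It builds the same layer with geom_line and with geom_path and uses layer_data to look at the order in which each geom will connect the points, then shows what arrange on type does to the row order.

```r
library(tidyverse)

# One made-up country: the value falls from 77 (August) to 75 (midpoint)
arrows_data <- tibble(
  y_position = c(1, 1),
  type = c("percent_august", "midpoint"),
  x = c(77, 75)
)

base <- ggplot(arrows_data, aes(x = x, y = y_position, group = y_position))

line_plot <- base + geom_line(arrow = arrow())  # sorts by x within each group
path_plot <- base + geom_path(arrow = arrow())  # keeps the data frame's row order

line_order <- layer_data(line_plot)$x  # 75 then 77: arrowhead ends up at 77
path_order <- layer_data(path_plot)$x  # 77 then 75: arrowhead ends up at 75

# Sorting on type puts "midpoint" ahead of "percent_august",
# so a geom_path arrow would now point back towards August
flipped <- arrows_data %>% arrange(type)
```

Because the arrowhead goes on the last point by default, the row order within each group is what decides the direction of the arrow when geom_path is used.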
So again, angle, length, ends, and type are the different attributes of the arrowheads that we can modify. If we come back up to our geom_path now, in our arrow function I could change the angle to be 45. This might make it a little bit more open. And so now our arrowheads, you can see, are much more open. Alternatively, we could come back up and do something like 20 degrees and get a much narrower span on our arrowhead. So let's stick with 20 for right now. The next thing that we want to do is alter the length of the arrowhead. Again, we can come back up to that arrow function, and we can do length. We give it the unit function, where we give it a value and then the unit we want. As you've noticed, I build everything out in the TIFF file, or whatever output file I want, with predefined dimensions. And so I'm going to use units of inches. That way, I'm not worried about things resizing and my arrowhead length getting kind of weird. So let's do 0.2 inches for the length of the arrowhead. That makes them a little bit shorter, but let's go a little bit shorter still. I think I'll go down to 0.1 inches. That gives it a much shorter, more compact arrowhead. And I think that looks pretty good. Again, we might adjust things before everything is said and done. The next thing that I want to show you is how we can alter the position of the arrowhead on that line. We can give it the argument ends, and we could say "first", which is going to put the arrowhead on the first point, right? And so we can now see that all of our arrowheads point back to August. The default is "last". Again, that's what we want. But we could also perhaps want arrowheads on both ends. So we could say "both", right? And so now we have arrowheads on both ends. So we'll go back and stick with the default of "last", which we don't need to add here. We could also add type and we could say "closed".
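As a quick reference, here is a sketch of those four arrow arguments. The first call just writes out the defaults from grid's arrow function, the second uses the 20 degree and 0.1 inch values tried above, and the third is only there to show the non-default ends and type options. The two-point data frame is a made-up example.

```r
library(ggplot2)  # ggplot2 re-exports arrow() and unit() from grid

# The defaults, written out explicitly
default_arrow <- arrow(angle = 30, length = unit(0.25, "inches"),
                       ends = "last", type = "open")

# Narrower and shorter arrowhead; ends and type are left at their defaults
compact_arrow <- arrow(angle = 20, length = unit(0.1, "inches"))

# Arrowheads on both ends, drawn as closed triangles
double_arrow <- arrow(angle = 20, length = unit(0.1, "inches"),
                      ends = "both", type = "closed")

# Made-up two-point path to try the arrowheads on
demo <- data.frame(x = c(77, 75), y = c(1, 1))
demo_plot <- ggplot(demo, aes(x = x, y = y)) +
  geom_path(arrow = compact_arrow, color = "red")
```

Giving the length in absolute units like inches keeps the arrowhead the same size no matter how the axes are scaled, which fits the approach of saving the figure at fixed dimensions.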
And so "closed" gives you that closed arrowhead. The default is "open", which I think we'll stick with because, again, we are trying to recreate what ChartR did in their version. Very good. So now we have those open arrowheads, and we have effectively added arrows to our plot. Again, if we look at the ChartR version of the figure, we see that the arrow is the same width as the handle, if you will, on that barbell. We will come back and mess with that later. The color and the sizing and everything will come in a future episode. I really just want to focus on playing with these arrows and arrowheads in this episode. The other thing I notice in the ChartR version of the figure is that not all of the countries actually have an arrow, right? India, Canada, and South Korea don't have an arrow, because the difference between August and October wasn't large enough to actually plop an arrowhead in there. So let's go ahead and see how we might remove those arrowheads. We'll come back up to our arrows_data. Let's give ourselves some more room here. Again, if we look at what this is outputting, we see that it is outputting the country, the y position, the type, and the x. And if we think about what this looks like before we do the pivot_longer, we again get the country, the y position, percent August, and the midpoint. I think what we could do here is add a filter after our data. So we could filter for the absolute value of the difference between percent August and percent October being greater than one. We want to keep those countries where the difference between the two is greater than one percent. Now, we're not getting rid of the data from countries like India and Canada and South Korea. We're only getting rid of the arrows, right? So this data frame isn't going to have data for where to draw an arrow for those three countries. I'm getting an error message, so I'm looking back at my arrows_data pipeline here.
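For reference, the filter being described might be sketched like this, with abs wrapped around the whole difference before it is compared to one. The data frame is a made-up stand-in, the column names are assumed, and the South Korea values are invented just to give a one-point difference.

```r
library(tidyverse)

# Made-up stand-in for `data` (column names assumed, values illustrative)
data <- tibble(
  country = c("Total", "India", "Canada", "South Korea"),
  percent_august = c(77, 87, 76, 84),
  percent_october = c(73, 87, 76, 83)
)

# Keep only countries whose August-to-October change exceeds one point;
# abs() takes the difference itself, and the result is then compared to 1
arrow_countries <- data %>%
  filter(abs(percent_august - percent_october) > 1)
```

Only the arrow data is filtered here. The points and labels still come from the unfiltered data frame, so every country keeps its barbell.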
I noticed that my closing parenthesis was in the wrong spot relative to the difference. I want the absolute value of the difference between August and October to be greater than one. And if I now run this, everything looks good. Looking at arrows_data, we see that we no longer have India in there, or Canada, and probably not South Korea either. And so now we see that India, South Korea, and Canada no longer have those red arrows. The other thing to notice about the ChartR version of the figure is that India and Canada, where the August and October percentages were the same, have the number to the right. Something else I noticed is that the numbers are not colored according to the date, like they were in the Ipsos version and like they are in our version. So we'd like to have this 87 be on the right side of the point and this 76 be on the right side of the point. I'm going to use the October value, so it'll be blue in our version. To do that, we're going to leave behind the arrows for just a minute and come back up to data, our big data frame, where we were creating these bump positions. Again, if we look at the output of data, we get our countries, our percentages, and then the bumps, and the bump was the amount we were moving the label on the x-axis. So I'm going to come back into these two mutate statements and replace them with a case_when, because I'm now going to have three different criteria: where percent August is less than percent October, where percent October is less than percent August, and where they're the same. You can do that with an if_else, but it gets kind of messy. So let's replace this with a case_when. So case_when percent August is less than percent October, we then use a tilde, and the output will be percent August minus two. Again, this looks kind of funky because the line is so long, but this is the actual syntax.
But if we add a carriage return, a line break, it indents it a little bit. Then we want to do percent August greater than percent October, and then tilde, percent August plus two. And if percent August is equal to percent October, then I'm going to return NA_real_. So if the two values are the same, we'll output NA_real_. Typically what people do, instead of using a logical test in the final spot of a case_when, is replace it with TRUE. That way, you know that this case_when will always have a true condition. Anyway, like I said, you could put in that double equals sign, but I'm going to replace that with TRUE. So that's what happens to August, right? If it's August for India, where the percentage was the same as in October, I'm not going to draw the number, right? So now we come down to October. What do we do with October? Again, we're going to do a case_when with percent August less than percent October. This is going to be very similar to what we did above. And then if percent August is greater than percent October, we'll tilde that. And then TRUE, right, if percent August equals percent October, which is really the only thing left, then we want the x position to be to the right. So we'll do percent October plus two. We're going to take that value and put it to the right. Good. I think this should work. So let's go ahead and give these a run. And I'm getting all sorts of error messages. Let's see what I did wrong. I forgot a comma here at the end of line 12. I'm getting another error message, all these error messages. Yes, folks, even I generate copious amounts of error messages. And I have TRUE equals. Why did I do that? It should be TRUE tilde percent October. Hopefully you were able to find that bug of mine. Excellent. We now have that percentage on the right side of the dot. It is currently blue, the same color as that point.
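Put together, the two case_when statements might look something like the sketch below. The August branches follow what was just described. For October, the first two branches are not spelled out in the episode, so the mirror-image values used here (plus two when October is higher, minus two when it is lower) are an assumption, as are the column names and the made-up values.

```r
library(tidyverse)

# Made-up stand-in for `data` (column names assumed, values illustrative)
data <- tibble(
  country = c("Total", "India", "France"),
  percent_august = c(77, 87, 54),
  percent_october = c(73, 87, 59)
) %>%
  mutate(
    # August label: nudged away from October, or not drawn at all on a tie
    bump_august = case_when(
      percent_august < percent_october ~ percent_august - 2,
      percent_august > percent_october ~ percent_august + 2,
      TRUE ~ NA_real_
    ),
    # October label: first two branches are assumed mirror images;
    # on a tie the label goes to the right of the point
    bump_october = case_when(
      percent_august < percent_october ~ percent_october + 2,
      percent_august > percent_october ~ percent_october - 2,
      TRUE ~ percent_october + 2
    )
  )
```

NA_real_ is used rather than a bare NA so that the missing value is already a double, the same type as the other branches.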
Again, in the final ChartR version, all the colors of the text labeling the points are the same. So we're not going to worry about that right now. But we did get the number on the right side, and we're in great shape. One thing that I am noticing is that we're getting warning messages about two rows containing missing values in geom_text. And that is unfortunately coming to us from our data, right? If we look at data after the pivot_longer, as it's being fed into ggplot, we will see of course that India and Canada have NA values for those bumps. And so that's what this warning message is telling you: it removed two rows containing missing data. If I were to run drop_na on data at this point, well, then I wouldn't get any of the August data for that point. And you know what, that's really not a problem, because I still have the October point, right? So let's go ahead and add that drop_na in here. So we'll do drop_na and voila, nothing changed, and we no longer get that warning message, right? So we're in good shape. Again, that warning message about removed NA values is not a big deal. I don't like having those warning messages in there, just because they tell me that something's wrong, right? And while we could say, ah, it's a nuisance, I'm going to ignore it, I think it's best to clear out all those warning messages as best as possible, or at least to be really sure that you understand what's causing each one. As I look through each of the different countries now in a little bit closer detail, I'm noticing that in things like Germany and Italy, the arrowhead sits kind of inside of the August point. And so it might be nice to make that arrowhead a little bit bigger still. So again, let's come back up to our geom_path, and we can then set angle. Let's do 45. And that looks bigger. It's no longer running into that point.
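The drop_na step from a moment ago can be sketched on a tiny long-format data frame. The layout below (one row per country and month, with a percent and a bump column) is an assumed simplification of what gets fed into ggplot, and the values are made up.

```r
library(tidyverse)

# Assumed long-format layout after pivot_longer (values illustrative);
# the tied country has no August label position
labels <- tibble(
  country = c("India", "India", "Total", "Total"),
  month = c("august", "october", "august", "october"),
  percent = c(87, 87, 77, 73),
  bump = c(NA, 89, 79, 71)
)

# Drop any row with a missing value before it reaches geom_text
labels_clean <- labels %>% drop_na()
```

Note that drop_na removes the whole August row for the tied country, point included. That is fine here only because the October point sits at exactly the same position.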
One thing that still kind of bugs me about those two is that it seems like the arrowhead is still too close to the point. So what I'd like to do is maybe move the tip of that arrow out a little bit. Instead of being the exact middle, maybe we can move it to be about two thirds of the way to the October point. Let's come back up to our arrows_data data frame. What we'll do here in my midpoint calculation is weight percent October twofold and then divide everything by three. And so now the arrowheads move over quite a bit. If I come back to the ChartR version, you know, maybe I will go back and make the arrowhead a little bit shallower in its spread. Maybe we'll go back to 20. Again, as you futz with things and move the position of things, you sometimes need to go back and change things back. Let's go back to 30. I think 30 is actually the default, the starting point, right? So let's try 30. I think this looks a little bit better after all that futzing with the angle on our arrowhead. Hopefully you feel a little bit more comfortable now with adding arrows to your lines and thinking about how you can do that: how you can change the position of the arrowhead on that line, whether to put it on one end or the other or both ends, how to change the angle and the length of that arrowhead, and even how to remove arrowheads for certain values in your data set. Just so we can see something that looks horrible, even worse, let's go ahead and add back that geom_ribbon. Oh, that looks really bad, doesn't it? Anyway, what we do know now is that everything works together except for the color. So make sure that you are subscribed to this channel, the Riffomonas channel, and that you've clicked that bell icon, so you know when the next episode is dropped, and you're not left dangling thinking, does Pat actually think this is a good looking figure? No, I do not.
We will come back in the next episode and talk about how we can match the colors in the original ChartR version of the figure and then how we can incorporate those into this version to make it look a lot closer to what we see in the ChartR version. I really hope that you're practicing with these data, that you're getting the data, you're getting the code, and you're trying things out. I hope you're even trying to do things differently. As I mentioned, I'm pretty sure you can do the same thing I did today with that arrows_data data frame, but without the pivot_longer and using geom_segment. So I'll leave that for you to do as homework, and we'll see you next time for another episode of Code Club.
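If you want something to check your homework against, here is one possible sketch of the geom_segment approach. It is not the code from the episode. It assumes the same column names used when talking through the data (percent_august, percent_october, y_position), uses made-up values, and folds in the filter and the two-thirds weighting from today. The arrow_end column name is just something made up for this example.

```r
library(tidyverse)

# Made-up stand-in for `data` (column names assumed, values illustrative)
data <- tibble(
  country = c("Total", "India", "France"),
  percent_august = c(77, 87, 54),
  percent_october = c(73, 87, 59),
  y_position = 3:1
)

# One row per arrow: no pivot_longer needed because geom_segment takes
# the start and end positions from separate columns
segment_data <- data %>%
  filter(abs(percent_august - percent_october) > 1) %>%
  mutate(arrow_end = (percent_august + 2 * percent_october) / 3)

segment_plot <- ggplot(segment_data) +
  geom_segment(aes(x = percent_august, xend = arrow_end,
                   y = y_position, yend = y_position),
               arrow = arrow(angle = 30, length = unit(0.1, "inches")),
               color = "red", show.legend = FALSE)
```

Because geom_segment always draws from x to xend, the arrowhead direction no longer depends on row order, which sidesteps the geom_line versus geom_path issue entirely.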