Hey folks, I'm Pat Schloss and this is Code Club. If you had a chance to watch the last episode, which I strongly encourage you to do (I'll put a link across the top here so you can go check that out), you saw me take data that was generated and provided by an agency called Ipsos and recreate a figure from their report showing the willingness of people in 15 different countries to receive the COVID-19 vaccine. Those data are now quite dated: the initial time point in that study was August of 2020 and the second was October of 2020. So although the data are dated, I think there's still a lot to learn. And if you watched that video, even though it was quite long, I think you'll find that there was a lot of content jam-packed in there. So I have a little confession to make. I didn't actually learn about that data through Ipsos directly. Rather, I received a newsletter that comes to me once or twice a week, I don't really remember, from a group called Chartr, or Chart-R. I'm not quite sure, and I'm not really sure how I even got subscribed to it. They provided a figure with the same exact data, and that got me thinking about how we could implement a lot of the design choices in the Chartr version of the figure using R. So although it's called Chartr, don't let that fool you. They do not use R, as far as I can tell, to make their visuals. As you'll notice, the Chartr version of the figure is far more stylized than the Ipsos version. I'm not sure I'm a complete fan of everything going on in the Chartr version. At the same time, it did get my head spinning and thinking, you know, how would I implement that particular element of this figure in R? Because it wasn't immediately obvious to me. And I thought I could learn a lot myself about how to use R and how to use ggplot2 to implement some of these effects. And I thought for sure you could too.
Although I could make another video where I run through all the changes I would make to my Ipsos version to generate the Chartr version, I'd rather slow things down and take one difference per episode, so I can break things down and teach you more about how I would go about making each change. Because again, if I'm teaching you how to do it, then you're going to be better able to take that change and implement it in your own work, and to decide for yourself whether or not it helps you communicate your story with your data. What are some of the big differences that I key in on? Well, the first thing I notice is that in the Chartr version, they've got alternating colors in the background within the panel. The second thing I notice is that to help indicate the flow of time, they are using arrows. The third thing I notice is that they are using far more color than I ever use. If you've watched my videos, you know that I use color reluctantly, and don't really use it very much or very well at all. And they certainly have a color scheme going on that works very well with the Chartr branding. Something else I notice is that the fonts they use within this figure are very different. They've got a serif font and a sans serif font, and they are actually quite different from what comes pre-installed within R. So how do we get different fonts within R? Well, we'll figure that out. The other thing is that they're presenting the legend in a very different way than what I typically do or what we typically see in science, and certainly different from what Ipsos did with those two text boxes at the top. Finally, this version is quite a bit different from the Ipsos version because it communicates, or tries to communicate, a story about the data.
My plan, therefore, is to take each of these six elements (and perhaps we'll identify more as we go along) and chunk them out into individual videos, so we can have shorter videos with a lot more detailed focus on an individual element. That way I can perhaps do a better job of teaching that material to you so that, again, you're better able to implement this in your own data visualizations. Because these videos are going to come out as a series over the course of several weeks, please make sure that you're subscribed, and that your friends and everyone in your group are subscribed to this channel, so you can see when each episode is released. And I would very much welcome your feedback on other big things in this Chartr figure where it isn't immediately obvious to you how they pulled them off. Today, I'm going to start with the alternating color in the background of the Chartr figure. It's kind of like a pretty version of grid lines as I see it, and I kind of like it for that. In the scientific publications that I'm more familiar with producing, I wouldn't use these different blues but would be more likely to use different grays, something a little bit more subtle. And again, it's a way to help me as an audience member see what points go together. Now, I'm sure that there's a ggplot2-related package out there that you could get to build out these alternating colored backgrounds. I have a hard enough time remembering how to use ggplot2 that I don't want to have to go learn another package. And I want to build out this alternating colored background myself so I can learn ggplot2 all the better, and again, help you learn ggplot2 even better yourselves. So what's the strategy we're going to use to pull this off? We are going to use geom_ribbon to create that alternating colored background.
Something else that we'll have to do along the way is convert our y-axis from being discrete, labeled by each of these countries, to actually being a continuous variable. But we still want to be able to keep our country names there, so we'll have to figure out how to do that. The other thing we'll have to figure out is how to include three different colors in our strips. It might seem like there are only two here, but if you have a keen eye, you might see that there are actually three. The next thing we'll have to figure out is how to put the tick marks on that y-axis between the different country names. With the Ipsos figure, the tick mark goes right into the country name. But in the Chartr version, it's actually between those country names. So we'll have to figure out how to do that. If you want to get the code that I am starting with, by all means check out the link down below in the description, which will take you to a blog post associated with today's video. There I will put the code I'm starting with and the code that I end with. And again, I strongly encourage you to follow along. Here is a copied version of that August-October 2020 ipsos.R script, but I made it chartr.R. Everything's the same. I'm going to start out by changing the name of the file I'm saving from ipsos to chartr. Very good. We've got that working. To simplify my code a bit, I'm going to remove some of the code that we included here to add those text bubbles, which we used as a legend. That is everything here from line 48 all the way down to before the save. Again, that got rid of those legend text bubbles. To create the alternating color in the background, I'm going to go ahead and build out another data frame that will allow me to create those strips.
And again, what we're trying to do is this: if we think of France as being at position one and the United States at position two on the y-axis, then I want France to have a rectangle that goes from, say, 0.5 to 1.5 on the y-axis, the United States from 1.5 to 2.5, and so forth. And I want them to go from 50 to 100 on the x-axis. There is a geom called geom_polygon that you could use to do this. But I find that it's perhaps a little bit more complicated than I really need. The alternative that I found worked really well is called geom_ribbon. geom_ribbon is really nice. I think we may have used it in the past, certainly with building those ROC curves, if you recall that episode from a couple days ago, a week or so ago. What it allows you to do is set an x-axis position, and then a ymin and a ymax. So when I was thinking about putting those strips into this figure, I thought, well, I could have my x position be 50 or 100, my ymin for France be, say, 0.5, my ymax be 1.5, and so forth. So I want to build out a data frame that has x positions of 50 and 100, and then has the ymin and ymax for each of the 15 countries plus the total. To create those strips, I'm going to go ahead and create a data frame. I'll start with the data data frame, where again I've got the country name and then all these percentage values for the x-axis. I don't need those for the strip data frame I'm creating. All I want right now are the country names. So I'll do select(country). That gives me a one-column data frame with the country. And then I will do a mutate to create two columns. I'll say xmin = 50 and xmax = 100. To create the ymin and ymax, I'll start by creating a variable called y_position, which will be 1:nrow(.). That will be the y position, from 1 to 16. That's actually the opposite direction from what I really want. So I'll do rev on that.
Now France is at the bottom, at position 1, and the total is at 16. I can then do ymin, which will be y_position - 0.5, and ymax = y_position + 0.5. So now we've got our country, our xmin and xmax, and our ymin and ymax. What we'd like to do next is pivot longer to get our x values into a single column. So we can do pivot_longer with cols = c(xmin, xmax), values_to = "x", and names_to = "xmin_xmax". I don't really care about that names column, and we'd get rid of it eventually anyway, so let's go ahead and do a select(-xmin_xmax). Now what you'll see is we've got two rows in our data frame per country, right? So India has two rows. At y position 15, we have 14.5 to 15.5 as the ymin and ymax, and then our x positions of 50 and 100. This is the data frame that I want to use to create my strips. I'll go ahead and save this output as strip_data. Coming down to my pipeline here where I've got the ggplot, I'm going to interrupt this flow for now to add in my alternating colored backgrounds and make sure that we've got everything we need to do that. So we can do geom_ribbon with an aes. And it occurs to me that we need a different data frame, so we'll say data = strip_data. In the aes we'll then put x = x, ymin = ymin, ymax = ymax. And let's go ahead and add in inherit.aes = FALSE. I can never spell that word, right? What inherit.aes = FALSE means is ignore the parent-level aesthetics. Our ggplot call has x being percent, y being country, and color being month. Well, my strip_data data frame doesn't have those columns. I guess it does have country, but we want to ignore the color and x being percent. What we get back is a diagonal black line. And I think the problem is that we need to tell geom_ribbon how to group our data.
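The strip data frame described so far can be sketched as follows. This is a minimal sketch, not the episode's exact script: the small data frame and its percentages are made up as a stand-in for the real 16-row Ipsos table, and the column names (xmin, xmax, y_position, ymin, ymax, xmin_xmax) are assumed spellings of the names spoken in the video.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for the episode's `data` frame. The real one has
# 15 countries plus "Total" and the actual Ipsos percentages.
data <- tibble(
  country = c("Total", "India", "China", "France"),
  august  = c(70, 85, 90, 60),
  october = c(72, 80, 85, 55)
)

strip_data <- data %>%
  select(country) %>%
  mutate(
    xmin = 50,
    xmax = 100,
    y_position = rev(1:nrow(.)),   # first row gets the highest position
    ymin = y_position - 0.5,
    ymax = y_position + 0.5
  ) %>%
  # stack xmin and xmax into a single x column: two rows per country
  pivot_longer(cols = c(xmin, xmax), values_to = "x", names_to = "xmin_xmax") %>%
  select(-xmin_xmax)

strip_data
```

Each country ends up with two rows, one at x = 50 and one at x = 100, sharing the same ymin and ymax. That pair of rows is what geom_ribbon needs to draw one horizontal band.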
So we can say group = y_position. And now we get a big rectangle. What you can maybe notice is that these are actually 16 different black rectangles, with just a little hairline separating each of the ribbons. What we need to do is come back up to geom_ribbon and add fill = country. What we get back now are 16 different ribbons, with one fill color for each of our 16 countries. That's not exactly what we want, but hopefully you can tell that we're getting close. What I'm going to do is come back up to my strip_data and add a column for fill to the strip_data pipeline. I will say rep(c("a", "b")), so two different colors, and I want the length.out to be the nrow of whatever's coming through the pipeline, right? At this point that's really data. If I look at this, what it should give me is 16 values. And I get a, b, a, b, a, b, so that works well. I think I can replace that data with a period. Now if I look at strip_data, I see I've got a, b, a, b for my different fills. Then in my geom_ribbon, instead of fill = country, I want that to be fill = fill. And now I get that alternating colored background, which is pretty slick, right? One thing to notice, however, is that if we look at the Chartr version of the figure, there are two different colors here, but there's actually a third color for the top row, for the total. They don't actually show the total, but the grid goes up to that 16th spot for the total. So we need to get a third color. To do that, very similarly, instead of length.out = nrow(.), I'm going to do length.out = nrow(.) - 1. And then I will also add a "c" value. And it's upset. Why are you upset with me? Oh, because I need to wrap this in a c(), right? So that should work.
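The ribbon layer built up in this part of the episode might look roughly like the sketch below. The three-row strip data and the placeholder points are hand-written assumptions so the block runs on its own; only the geom_ribbon call mirrors what is described above.

```r
library(ggplot2)

# Hand-written stand-in for strip_data: two rows per strip (x = 50 and x = 100)
strip_data <- data.frame(
  country    = rep(c("France", "China", "India"), each = 2),
  y_position = rep(1:3, each = 2),
  x          = rep(c(50, 100), times = 3),
  fill       = rep(c("a", "b", "a"), each = 2)
)
strip_data$ymin <- strip_data$y_position - 0.5
strip_data$ymax <- strip_data$y_position + 0.5

# Placeholder data so the parent ggplot() call has aesthetics to ignore
points <- data.frame(
  percent = c(55, 80),
  country = c("France", "India"),
  month   = c("august", "october")
)

p <- ggplot(points, aes(x = percent, y = country, color = month)) +
  geom_ribbon(
    data = strip_data,
    aes(x = x, ymin = ymin, ymax = ymax,
        group = y_position,   # one ribbon per strip instead of one diagonal band
        fill = fill),         # alternate the fill by the a/b column
    inherit.aes = FALSE       # ignore x = percent, y = country, color = month
  )
```

Without the group aesthetic, geom_ribbon treats all rows as a single ribbon and connects them in x order, which is a plausible explanation for the diagonal shape mentioned earlier.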
Although we need an extra parenthesis to close out the mutate. Now I can look at strip_data, and I see I've got all of those. I'm noticing that the total is getting "a" when I want the total to get "c". So I should probably move that "c" up to the front of my c() statement. Let's see what this looks like through this position in the mutate line. And again, we see, yeah, the total now is "c", and then India, China and so on go a, b, a, b, a, b all the way down. Then when we do the pivot_longer and everything else, it should work, and wonderful: we now have alternating colors. So "a" is the bottom, "b" is the second, and it alternates. But then at the top row, we have "c" in that third color. These colors look pretty horrible. We're going to change them eventually, but for now I'm going to stick with these colors. I do want to bring back in my barbell, or dumbbell, chart. To do that, we can simply add that plus sign back. Let's run it and see if we get those barbells on top of our strips. And we're getting an error message: discrete value supplied to continuous scale. We can go through this line by line and figure out where exactly we ran into the problem. We know that geom_ribbon works, but does geom_line work? No, that does not work. The problem is that y for making that line uses country as the y-axis, which is discrete, whereas the ribbon requires continuous values. We can go back up to our data data frame and add a y_position variable. So I'll do y_position equals, very much like what I had down here on line 20, rev(1:nrow(.)). Let's make sure that we get everything we expected. Sure enough, we get all that, and our y position for the total is 16 and for France is one. I think we could also then remove this y_position in the strip_data pipeline and do select(country, y_position). Let's make sure we've got both of these data frames loaded.
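The fill column worked out above comes down to a small piece of base R. The sketch below uses a plain count of 16 rows (15 countries plus the total) in place of nrow(.) inside the pipeline; the variable names are illustrative only.

```r
n_rows <- 16  # 15 countries plus the total row

# Two alternating fill labels, recycled to the number of rows
two_fills <- rep(c("a", "b"), length.out = n_rows)

# A third label for the first row (the total), then a/b for the remaining rows
three_fills <- c("c", rep(c("a", "b"), length.out = n_rows - 1))

three_fills
```

Because the total is the first row of the data frame, putting "c" at the front of c() assigns it to the total. Appending it at the end would instead hand it to the last row, which is France.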
Down here in the pipeline for building out the barbell plot from what we had before, there are a few things we need to modify. The first is in this pivot_longer, where we had cols = -country. That said, basically, pivot longer all of the columns except for country. We also want to exclude y_position, so we'll do -c(country, y_position). Let's put this on a separate line. And now let's run these lines to make sure everything looks good. Sure enough, we have the y position, and this all looks great. I don't think we need this mutate line, which we had used previously to get the order of the countries correct. I'm going to comment it out for now rather than totally delete it, and we might delete it later when we're pretty confident we actually don't need it. And then for this y = country, I'll do y = y_position. Doesn't that look horrible? We have created a line plot. What we really need to do is tell geom_line which two points go together. We hadn't needed to do that before, because putting y = country automatically enforced that for us. So we can come back all the way up here to the aes in our ggplot call and say group = y_position. So now we have our country names along the left border there. That's wonderful. One thing I would like to do here is tighten up the space between our country names and the plot. Something that you've perhaps noticed before with other plots you've made in ggplot2 is that there's a bit of padding added around the x and y axes. To get rid of that, we can come up to scale_x_continuous and do expand = c(0, 0). That means don't expand on the x-axis. And we'll do the same thing down here with scale_y_continuous: expand = c(0, 0). That means don't add padding on either side of the x and y axes.
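The switch from a discrete to a continuous y-axis can be sketched as below. The three-row data frame and its percentages are invented for illustration, the month and percent column names are assumed from the aesthetics mentioned in the episode, and geom_point is assumed from the dumbbell chart of the previous episode. The ribbon layer is left out to keep the focus on the y-axis changes.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up stand-in for the episode's data, with a numeric y position per country
data <- tibble(
  country = c("Total", "India", "France"),
  august  = c(70, 85, 60),
  october = c(72, 80, 55)
) %>%
  mutate(y_position = rev(1:nrow(.)))

p <- data %>%
  # pivot every column except the two identifier columns
  pivot_longer(cols = -c(country, y_position),
               names_to = "month", values_to = "percent") %>%
  ggplot(aes(x = percent, y = y_position, color = month,
             group = y_position)) +      # tells geom_line which points pair up
  geom_line(color = "darkgray") +
  geom_point() +
  scale_x_continuous(expand = c(0, 0)) +  # no padding on the x-axis
  scale_y_continuous(
    breaks = data$y_position,             # numeric positions...
    labels = data$country,                # ...labelled with the country names
    expand = c(0, 0)                      # no padding on the y-axis
  )
```

In the full figure the ribbons extend half a unit beyond the first and last positions, so removing the padding does not clip the points the way it can in this stripped-down version.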
So now you can see that we have those numbers right up against the first strip here. And these country names are much closer, right? We don't have that buffer right before 50% or right below France. So that looks a lot tighter. The next thing that I want to take on is adding the tick marks. Before, we didn't have tick marks. We had actually removed them because we were using the grid lines, which extended beyond 50%, to indicate the line for each of those countries. Now we want to insert a tick mark in between each of the countries, at that boundary between the red and the green. I'll start by removing the grid lines that we had before, since we don't need those. You'll also see that our axis.ticks is element_blank, so for both the x and y axes we said no tick marks. For axis.ticks.x, I'll make that element_blank. But for axis.ticks.y, I want that to be element_line. And now we have a tick mark for each country. But that's not where we want it. We actually want that tick mark between the country names. To achieve that, I'm going to modify my scale_y_continuous, where we have the breaks as data$y_position. I want to add extra breaks in between each of those positions, so I'm going to add to this vector with data$y_position. We'll add 0.5 as the bottom tick mark. And then I'm going to take data$y_position and add 0.5 to that. What we'll now have are the break positions for our countries that fall on one through 16. But we'll also have a break at 0.5, and then at one through 16 plus 0.5, right? So that'll be good. Now we need to change our labels. We'll have data$country, but then we'll also have nothingness. So we'll put in an empty quote, but we need to repeat it basically 17 times. That will be length(data$y_position) + 1. We need the plus one because we're adding that 0.5. And I'm missing a comma here. All right.
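The break and label vectors described above can be checked on their own before they go into the scale. This is a sketch with a three-row stand-in for the data frame; y_breaks, y_labels and y_scale are illustrative names, not ones used in the episode.

```r
library(ggplot2)

# Small stand-in for the episode's data frame
data <- data.frame(
  country    = c("Total", "India", "France"),
  y_position = 3:1
)

# Breaks on each country, plus one at 0.5 and one half a unit above each country
y_breaks <- c(data$y_position, 0.5, data$y_position + 0.5)

# Country names for the first set of breaks, empty strings for the in-between ones
y_labels <- c(data$country, rep("", length(data$y_position) + 1))

y_scale <- scale_y_continuous(breaks = y_breaks, labels = y_labels,
                              expand = c(0, 0))

data.frame(y_breaks, y_labels)
```

The two vectors have to be the same length and in the same order, since ggplot2 pairs each break with the label at the same index.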
So this now, I think, should take care of everything. And I'm missing a closing parenthesis here. Now what we've got is a tick mark at each of those break points, right? But we don't want a tick mark at every break. We want those that are matched to a country to actually be blank. So we'll set the color of our tick marks, and the color of the tick mark that's right next to the country name we want to be transparent, so that we don't actually see it. We'll do color = c(), and inside that we'll do a rep on NA and repeat that nrow(data) times. We can then do another rep of, let's say, "black", and we'll repeat that nrow(data) + 1 times. And now what we get is tick marks in between the countries, right? But they're black, rather than the gray that we had in the Chartr version of the figure. So instead of black, we could go ahead and do "darkgray". I'm also going to make the lines a little bit thinner by doing size = 0.2. Very good. We now have those tick marks in between the countries, indicating the boundaries of our different colored strips. One other thing I noticed about this version of the figure, as opposed to the Ipsos version, is that they also have a solid gray line for the x and y axes. So let's go ahead and do axis.line, because we want both the x and y axis lines to be element_line. And again, I'll do color = "darkgray" and size = 0.2 to hopefully match what we had with the tick marks. We are making progress converting the Ipsos version of this dumbbell chart to the Chartr version of the plot. Baby steps, right? In this episode we've changed the background so that we have this alternating color scheme. We've also added tick marks so that they fall between the country names. Again, in future episodes we'll come back and fix the colors, don't worry. But what we have right now is a framework for adding these different colors.
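The tick and axis-line styling from this part of the episode could be sketched as follows. Two assumptions to flag: the sketch uses linewidth, which ggplot2 3.4.0 and later prefer over the size argument used in the video, and it relies on passing a vector of colors to element_line, which has worked in practice but is not, as far as I know, an officially documented feature. The colors are also assumed to be matched to ticks in the order the breaks were supplied. tick_colors and tick_theme are illustrative names.

```r
library(ggplot2)

n_countries <- 16  # 15 countries plus the total row

# NA (invisible) for the ticks that sit on a country name,
# dark gray for the 17 ticks that fall between the names
tick_colors <- c(rep(NA, n_countries), rep("darkgray", n_countries + 1))

tick_theme <- theme(
  axis.ticks.x = element_blank(),
  axis.ticks.y = element_line(color = tick_colors, linewidth = 0.2),
  axis.line    = element_line(color = "darkgray", linewidth = 0.2)
)
```

Adding tick_theme to a plot whose y breaks are ordered as countries first and in-between positions second should hide the on-country ticks. If the rendered ticks do not line up as expected, the order of the color vector relative to the breaks is the first thing to check.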
See if you can figure out, based on what we've covered in previous episodes, how you can go ahead and change these colors. Like I said, we'll get there in a future episode, but I'll leave that for you to do as homework for right now. We've made a lot of progress. Hopefully, taking this in steps and chunking it down will help you learn the material a little bit better and allow me to go a little bit more in depth on what's going on in the code that I'm showing you. Please make sure that you're subscribed so you can come back and see the next episode, where I talk about adding those arrows. Arrows might be a really useful way to direct the reader to think about the flow of time. Make sure you're subscribed, and we'll see you next time.