Hey folks, I'm Pat Schloss and this is Code Club. Over the past few months we've been looking at data related to the COVID-19 vaccine, specifically people's intentions to receive that vaccine in August and October of 2020, so last year, using data generated by an agency called Ipsos. They surveyed people in 15 different countries. More recently we've taken those same 15 countries and looked at their actual vaccination rates as of October of 2021. So what I'd like to do is take those two data sets, which we've already combined, and visualize them to see how closely people's stated intent matched their actual vaccination rates.

While we were going through the Ipsos data, we looked at four different ways to visualize paired data. We looked at a scatter plot, where on the x-axis we put one observation's percentage and on the y-axis another observation's percentage. We also looked at a dot plot, where on the y-axis we put the names of those 15 countries and on the x-axis the percentage point difference between people's intentions to receive the vaccine in October versus August. We also looked at a slope plot, where on the x-axis we put the time variable, so August and October of 2020, and on the y-axis the percentage of people who said they wanted to receive the COVID-19 vaccine. We then had 15 different lines, where the slope of each line indicated whether the percentage was decreasing or increasing with time. The fourth visualization approach, which is actually what got us into this whole exploration, was a dumbbell plot. In a dumbbell plot we put the names of those 15 countries on the y-axis and the percentage of people who wanted to receive the vaccine on the x-axis. Because we had two time points, we could have two points with a line between them, and that looks like a dumbbell.
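The scatter plot and dot plot mappings described above can be sketched in R roughly as follows. This is a minimal illustration rather than the code from the earlier episodes: the data frame paired, its column names, and its values are placeholders made up for the example, not Ipsos results.

```r
library(ggplot2)

# Placeholder values for illustration only; these are NOT the Ipsos survey results
paired <- data.frame(
  country = c("A", "B", "C"),
  percent_august = c(70, 60, 85),
  percent_october = c(65, 62, 80)
)

# Scatter plot: one observation on each axis
scatter <- ggplot(paired, aes(x = percent_august, y = percent_october)) +
  geom_point()

# Dot plot: country names on the y-axis and the
# percentage point difference (October minus August) on the x-axis
paired$diff <- paired$percent_october - paired$percent_august
dot <- ggplot(paired, aes(x = diff, y = country)) +
  geom_point()
```

Printing scatter or dot draws the corresponding figure. The same data frame feeds both plots, and only the aesthetic mappings change.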
So what I want to do in today's episode is review with you how quickly we can switch between these four different types of visuals, in an effort to explore the relationship between what people said they would do in October of 2020 and what it turns out they actually were able to do in October of 2021. The straightforward transition between these four plots is enabled by the power of ggplot2, and as we go through this I think you'll also get a really good review of how aesthetics and geoms relate to each other, using the same data to make four very different plots. At the end of the exploration we'll pick one or two of these visuals to delve into in the next episode, to try to make them look a little bit more presentable.

So I'm looking at our comparison figure R script, where we load the tidyverse, get our Our World in Data data as well as the Ipsos data, and then join it all together. We can see the combined ipsos_owid data frame, where we have our 15 countries and four columns of data: all_vax is the percentage of people in each country who received any vaccination, so at least one jab; fully_vax is the people who received the full course of the vaccination; and then we have percent_august and percent_october for the percentage of people in August and October of 2020 who said they'd get the vaccine.

I'm filming this in the middle of November, and what I had been talking about was October data. So I'm going to add a filter to my pipeline here so that we can be sure to look at the last date in October. To do that, I'm going to insert a line here at my line eight where I'll do filter with date less than "2021-11-01". That will get me all the dates before November 1st of 2021, and then with the group_by and slice_max we'll get the last date in October that we have
for each of the countries.

So I'm going to start with a scatter plot. I'll take my ipsos_owid data and pipe that to ggplot. For our aesthetics, again, it's a scatter plot, so on the x-axis we put one observation and on the y-axis a different observation. For my x-axis I'm going to put percent_october, and on my y-axis I'm going to put fully_vax. Then I'll do geom_point so we get the scatter plot. We now see that scatter plot, with percent_october on the x-axis and fully_vax on the y-axis. It seems like there's a cloud of countries at and above 60% that have been fully vaccinated, and a few countries further down below.

Something we might like to add to this is geom_abline, which by default draws a 45-degree line with an intercept of zero and a slope of one through the data. If a country met its goal it would be on the line, if it exceeded its goal it would be above the line, and if it didn't meet its goal it would be below the line.

I'd also like to know which countries these points belong to. We can label the points using geom_text_repel. To use that we need a couple of things: I need library(ggrepel) to make sure I've got that loaded, and we also need to say label = country. Now we see the label of each country next to its point. I think ggrepel does a pretty good job of spacing the labels and making sure that a name isn't right on top of its point, other points, or other country names. While we might want to clean this up if we went further with this visual, we really are in an exploration mode, trying to find what visuals tell the story, or really even to figure out what the story is, before we do too much reformatting to make each of the figures look pretty. So I think for a scatter plot this looks pretty good. I think there's an interesting story perhaps
emerging: we see that Mexico, South Africa, India, Brazil, and the United States are below the 60% mark of people who are fully vaccinated, whereas the other 10 countries are above that line. It's also interesting to me that Brazil, Mexico, and India had really high intentions but low realization of that vaccination rate. In the United States, I'm willing to bet that we're just being stupid, right? You're basically drinking the vaccine in the drinking water at this point, whereas perhaps it's harder to get hold of the vaccine in Brazil, Mexico, and India, and perhaps South Africa as well.

So the next type of plot I want to share is a dot plot. In a dot plot we put the names of the countries on the y-axis, and on the x-axis the percentage point difference between people's intention and what actually happened. Now, when I made the dot plot with the Ipsos data, people weren't really enthusiastic about it. When I put up a poll on Twitter and asked people which of these four plots was their favorite, people kind of shrugged at the dot plot. I like the dot plot because if I'm talking about a difference, then I'm actually showing the difference, right? If I show something like this scatter plot, or a barbell plot, or a slope plot, I'm asking my audience to do the calculation visually, whereas with the dot plot I can actually show them the difference. So we'll go ahead and make that, and we may or may not use it going forward.

I'm going to take the ipsos_owid data and pipe that into a mutate, because I need to make a new column that I'll call diff, and this will be fully_vax minus percent_october. Now we see that we've got this extra column at the end with the percentage point difference, and that's what I actually want to plot. So I'll pipe this into ggplot with aes, and on the x-axis I'm going to put diff, and on the y I'm going to put the
country, and then let's do geom_point. We can see we've got those 15 countries on the y-axis and, again, the percentage point difference on the x-axis.

Something we might think about doing is ordering these countries by the difference value. To do that, we could come back into our mutate, take country, and use fct_reorder to reorder our countries by diff. That orders our countries by the percentage point difference between what they intended and what actually happened. We can then see that Spain actually exceeded its goals, whereas India, as we saw before, did worse than its goal. If we wanted the opposite order, with India, South Africa, Mexico, and Brazil at the top, we could use minus diff, which shows the problem at the top. I kind of like that better, because I think the story is going to be that most countries are lagging in their ability to meet their stated intention.

One other thing I'm going to add to give some context is a vertical line at zero. So I'll do geom_vline with xintercept = 0, and we get that nice vertical line indicating the break-even point of matching intention with realization for COVID-19 vaccination goals. Again, I think the dot plot is an attractive way to visualize the difference between two observations.

The next type of visual I want to make with you is a slope plot. Again, on the slope plot we put the time point on the x-axis, and on the y-axis the percentage of people who were favorable to getting the vaccine or who had actually received it. Then we'll have separate lines for the 15 countries. Our current ipsos_owid data is not in a good format for making this type of plot. We have the countries, we have fully_vax, and we have percent_october, but to map time, so percent_october and fully_vax, to the x-axis,
that needs to be in a single column, so my data aren't currently tidy. I'll take the ipsos_owid data and simplify the data set a bit, so I'll do a select on country, fully_vax, and percent_october, and that gives me those columns. I don't really like these column names, so I'm going to rename them here in the select function: fully_vax becomes actual and percent_october becomes intended. That renames our columns, so that's good. Then we can feed this into pivot_longer, where we set cols to actual and intended, names_to to status, and values_to to percent. Now we have that nice tidy data frame.

I can take the output of this pipeline into ggplot, where on the x-axis we put status, on the y-axis we put percent, and we group by country with group = country. Then I do geom_line, and what we get is a slope plot. We see actual and intended on the x-axis, which is the opposite of the order I really want: I want intended before actual, because intended was from 2020 and actual is from 2021. So I'll come back up here after my pivot_longer and add a mutate where I make status a factor with the order that I want: factor on status, with levels of intended and actual. That should give the right chronological order on my x-axis. We do now see intended on the left and actual on the right, which makes it clear that a lot of these countries are dropping, and that these two countries, I think Brazil and India, are really falling off.

One thing to comment on is that I'm using the fully_vax data here. If we looked at the all_vax data, the story is basically the same. The India data comes up a little bit, but it still indicates a really big difference between what people intended and what they actually got. So I feel okay running with this fully_vax data.
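The reshaping and slope plot steps just described can be sketched as below. This is a rough reconstruction under stated assumptions, not the script from the episode: the data frame name ipsos_owid and the column names fully_vax and percent_october are inferred from how they are spoken, and the three rows of values are placeholders rather than real survey or vaccination data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stand-in for the joined data frame; names are assumed and values are placeholders
ipsos_owid <- data.frame(
  country = c("A", "B", "C"),
  fully_vax = c(75, 30, 62),
  percent_october = c(70, 85, 65)
)

tidy_vax <- ipsos_owid %>%
  # keep three columns, renaming two of them inside select()
  select(country, actual = fully_vax, intended = percent_october) %>%
  # one row per country and status, so status can map to the x-axis
  pivot_longer(cols = c(actual, intended),
               names_to = "status", values_to = "percent") %>%
  # put intended (2020) before actual (2021) on the x-axis
  mutate(status = factor(status, levels = c("intended", "actual")))

# One line per country; its slope shows the change from intended to actual
slope <- ggplot(tidy_vax, aes(x = status, y = percent, group = country)) +
  geom_line()
```

Without the factor step, ggplot orders the discrete x-axis alphabetically, which is why actual would otherwise appear before intended.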
One of the challenges we talked about when we were looking at slope plots is how we label these lines, right? Up here, where we have a lot of lines really close to each other, it gets really challenging to label them. We previously talked about using 15 different colors, which was a big mess, and we looked at a couple of different tricks for labeling these, but ultimately none of them were super satisfying in my mind.

So let's move on and go back to the original type of visual we looked at, which was a dumbbell plot. Again, what we're going to do here is put the names of the countries on the y-axis, and the x-axis will be the percentage of people who wanted the vaccine and who actually got the vaccine. I'm going to start by taking the first four lines from the slope plot pipeline, because I need the data to be tidy. We'll pipe this into ggplot, and for our aesthetics we put percent on the x-axis and country on the y-axis, and we group by country. Maybe we can also color by status. We'll do geom_line to connect the two points, and let's also do geom_point, so we get our barbell plot. Again, it's a bit rough. We talked before about how we could use color in the barbell plot to indicate chronological time, and here the teal is the actual and the salmon color is the intended.

One thing we played with before was using an arrow instead of a plain line, so we can do arrow = arrow(). Now we get those big ugly arrows. One thing I'm noticing is that all the arrows point to the right, and that's because we're using geom_line, where the line goes from smaller values to larger values. If we want the line drawn in the order of the data in the data frame, we can use geom_path instead. Now my arrows are going in different directions, they're not all going to the right, but they're
not at the right end, right? For India, the arrowhead should be on the left, not on the right. It's putting the arrow at the end of the line, not at the beginning. To fix that I can do ends = "first". Now we have our arrows going in the correct direction.

This looks not great. I think what I'm going to do is turn off the points, because I think the direction that time is flowing is going to be obvious, and I want to make those arrows black, give them a closed head, and make them a little bit smaller. So first I'll turn off the points. With the arrow we can do length = unit(0.5, "inches"), then type = "closed", and then I'll do color = "black". So that's a black arrow, but it's way too big. Let's take it down to 0.1 inches, and we get a much more subtle size. As we've seen before, we can also play with the angle. The default angle in arrow() is 30 degrees, so let's do 20. I think setting that angle to 20 makes for a more attractive arrowhead, and overall I kind of like this. Let me know down below in the comments what you think. We do have a couple of countries, like Canada and perhaps Germany and Japan, where their stated intent and what they actually did are pretty close to each other, so the arrowheads are on a very short line, so to speak.

I'm going to stop here, because as you all know from watching my previous episodes, I can keep picking at this and trying to improve the theming. What I'm going to do in the next episode is take this arrowed dumbbell plot and combine it with the dot plot in a two-panel figure, so you can see the trajectory in the arrow plot as well as the calculated difference in the dot plot, so you don't have to work it out visually. I'm thinking about putting them side by side using the patchwork package. So that you don't miss that, please, please, please make sure that you've
subscribed to the channel and clicked the bell icon so you get the notifications. That episode is actually going to come out on Wednesday rather than Thursday, because here in the United States Thursday is a holiday, Thanksgiving, and I'm not going to watch any videos on Thursday, and I don't think you should either. But I'll give you a little early Thanksgiving present from me to you on Wednesday, so that you have something to watch as you're getting excited for the holiday, if you're here in the US. And if you're not here in the US, know that I am very thankful that you watch these videos, and that the outpouring of support for these videos is definitely one of the things here in 2021 that I am very grateful for. I'm also grateful that I've gotten three jabs of the COVID-19 vaccine, the two doses and a booster shot. So keep practicing with all this, and we'll see you next time for another episode of Code Club.
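As a recap for practicing, the arrowed dumbbell plot built in this episode can be sketched roughly as follows. This is a minimal sketch under assumptions, not the episode's script: the data frame and column names are inferred from how they are spoken, and the values are placeholders rather than real data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stand-in for the joined data frame; names are assumed and values are placeholders
ipsos_owid <- data.frame(
  country = c("A", "B", "C"),
  fully_vax = c(75, 30, 62),
  percent_october = c(70, 85, 65)
)

# Same tidy reshaping as for the slope plot; within each country the
# rows come out in the order actual, then intended
tidy_vax <- ipsos_owid %>%
  select(country, actual = fully_vax, intended = percent_october) %>%
  pivot_longer(cols = c(actual, intended),
               names_to = "status", values_to = "percent") %>%
  mutate(status = factor(status, levels = c("intended", "actual")))

# geom_path() connects points in data frame order (actual to intended), so
# ends = "first" puts the arrowhead on the actual value, pointing forward in time
dumbbell <- ggplot(tidy_vax, aes(x = percent, y = country, group = country)) +
  geom_path(
    arrow = arrow(ends = "first", length = unit(0.1, "inches"),
                  type = "closed", angle = 20),
    color = "black"
  )
```

Swapping geom_path for geom_line would draw every segment from the smaller to the larger x value, which is why the arrows all pointed right before the switch.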