Something I come across frequently in science is a situation where I have a bunch of different things, say people, and we take two measurements on each of those people, say before and after some type of intervention. That intervention might be a medication we've given them, or perhaps we've changed their diet. So we might look at their weight before and after, or at their cholesterol before and after the intervention. Another place I see these types of data is in economics, politics, and social science, where there's some event and you want to know what the sentiment was before that event and what it was after. As we are currently in the midst of the COVID-19 pandemic, you might think about vaccines. What were the sentiments about vaccines before the pandemic, or before a vaccine was made available, and what are the sentiments after a vaccine was made available? In both of these situations, whether it's science or politics or something else, we have what's called paired data.

Often I'll see these types of data presented as though the data aren't linked. For example, we might take all the pre points and display them as a box-and-whisker plot, and take all the post points and display those as another box-and-whisker plot. But this divorces the pre and post from each other and hides the fact that each pair measures the same individual, across perhaps many, many individuals. So we need a better way to represent paired data than showing the pre and the post points separately.

As we've talked about in previous episodes of Code Club, pre-attentive attributes are those characteristics of your figure that your audience will use to decipher the story you're trying to tell them.
One of those pre-attentive attributes is connectedness, or grouping: how your data are grouped together. There are a couple of different ways that people regularly indicate the connectedness of their data. For example, we might put the individuals on the y-axis and the continuous variable along the x-axis. That way we know that the points in a row are connected, because they're anchored to the same entity on the y-axis. Alternatively, we could give all the points from the same individual the same color. The downside, of course, is that once you display more than about five entities, you quickly lose the ability to discriminate between the colors. Perhaps the best tool we have to indicate connectedness and grouping is a line connecting your points. If we connect two points with a line, that signals very clearly to your audience that those two points belong together. This type of plot is called a barbell plot or a dumbbell plot.

I was really intrigued by this figure from Ipsos for a variety of reasons. I thought the data were really interesting, and it represents this type of paired data, looking at sentiments in August as well as October. I also had some questions about how it would look if I modified the plot to display the data differently. The plot was originally made in Datawrapper. I don't know Datawrapper. It's an online tool that a lot of publications use to make these kinds of HTML-based figures.
I do know R, so my goal was to recreate this figure and get the styling as close as I could using R. Along the way we'll learn more R and practice our R skills, and we'll end up with something that, in future episodes, we could modify to make other design decisions about how we display the data. So that's exactly what we're going to do in today's episode: create this figure as best we can in R.

I've gone ahead and created a project in RStudio for this vaccine attitudes project, if you will, and I've got my R script. If you want a copy of this R script, down below in the description is a link to a blog post associated with today's episode where you can get my final code. I'll also put in there the data that I was able to download from Ipsos. At the bottom of their figure they included a link where you can get the CSV file, which I've called August October 2020 CSV in my project. Spoiler: they repeated the survey, but we'll start with August and October so that we can replicate the figure from their initial report.

I'll start with library(tidyverse), so now we have all the great tools from the tidyverse, ggplot2 and dplyr, at our disposal. We'll get started by reading in the data with read_csv, because that file is a CSV, and again it's the August October 2020 CSV. We see that it's got 16 rows, so that's the 15 countries plus the total. Looking at the order of the rows, they seem to be in the same order as the countries in the plot, which is convenient.
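As a minimal sketch of this first step, here is what reading a file like that could look like. The temporary file stands in for the CSV in the project, and the header text and numbers are placeholders I made up for illustration, not the real Ipsos export.

```r
library(readr)

# Stand-in for the downloaded Ipsos CSV. The header text and the numbers
# below are placeholders, not the real survey export.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(",Total Agree - August 2020,Total Agree - October 2020",
    "Total,77,73",
    "India,87,85",
    "France,59,54"),
  csv_path
)

# The first header cell is empty, so readr makes up a name for it:
# X1 in older releases, ...1 in newer ones.
raw <- suppressMessages(read_csv(csv_path))

print(raw)
dim(raw)
```

With the real file you would pass its path to read_csv instead of csv_path and expect 16 rows rather than the 3 used here.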
I think they're actually ordered by the percent that would get it, from October: India is the highest and France is the lowest. So that ordering is helpful. These column names are not helpful, so let's clean them up so they're easier to work with. To rename them we'll use rename, and I'll say country equals X1, then August equals, in quotes (I'm going to copy this because it's so long), the long August column name, and we'll do the same thing for October. So now we've got country, August, and October, and I'll call this data. Now we've got that data frame at our disposal for creating a plot.

We'll start with making that initial barbell, or dumbbell, chart. We'll take data and pipe it into ggplot with aes. Along the x-axis we want our percentages, along the y-axis we want our countries, and I want the color to be the month. Now the problem is I don't have a percentage column and I don't have a month column. So I need to take this three-column data frame and reorient it so that I have a country column, a month column, and a percentage column. The data are wide, and I need to pivot them longer to be tidy. Within this pipeline I'll use pivot_longer with cols equals minus country. I could name August and October instead, but this is fine. So we're going to pivot longer the columns that aren't country. I'll set names_to to month and values_to to percentage, and then we can pipe that into ggplot. If we look at this data frame, we've got country, month, and percentage. Then we can do ggplot and add geom_point. I'm going to save this as a figure with ggsave, which I'll call August October 2020 Ipsos, as a .tiff, with width equals six and height equals four, since the original was wider than it was tall. So this is a good start. We've got the bells,
if you will, of the dumbbells. The next step is to draw the line connecting the bells with geom_line, and here I'll set color equals black because I want the line connecting the two circles to be black. Voila, we now have that line. I want the lines to be behind the points, though, so I'll put geom_line first and then geom_point. Now we see that the circles are on top of the black line.

The next thing I notice about the Ipsos figure is that they put the percentages next to the points, so they've labeled them with the actual values. We can do the same thing with geom_text, where we say aes, label equals percentage. This gives us our percentages, but they're right on top of the points, so we need to dodge them. I'm going to add a bit of a dodge to the x-axis coordinates, and that needs to vary for each point. You'll notice that for some countries, like South Africa, October was actually higher than August, whereas for others, like the United States, sadly, October was lower than August. So we'll need a little bit of logic to tell ggplot where to put the text relative to its point.

I'm going to break out of this code chunk for building the plot and remind us what data looks like. This is the data frame before we've pivoted longer, and the 77 and 73, along with country, are the coordinates where the individual points, the bells of the barbells, are plotted. For the text, I need to add a separate column giving the coordinates where the text should go. So I'm going to rename these columns to percent_august and percent_october, and then create new columns that I'll call bump_august and bump_october. With percent_august and percent_october we get the same output, just with different names, so we're in good shape there. We can then also go
ahead here and add a mutate. I want to create a bump_august, and I'm also going to create a bump_october. For bump_august I'll use if_else. If percent_august is less than percent_october, that is, if August has a smaller percentage than October, its point is going to be on the left, and I want the bump to be on the left too. So I'll do percent_august minus, let's say, two to start. If August is actually larger than October, it's going to be on the right, so I'll do percent_august plus two. This creates the data data frame, and we now see that we've got bump_august along with percent_august, adjusted depending on whether percent_august is smaller or larger than percent_october. Now we also need to add bump_october equals if_else: if August is to the left of October, then we need to add two, otherwise we need to subtract two.

So now we have our data frame with country, percent_august, percent_october, bump_august, and bump_october. But I want to pivot this table longer so that I have a column for the country, the month, the percent, and the bump. Our existing pivot_longer isn't going to work, because we get weird output. The results are a little bit funky in that it took all four of those column names and put them into the month column. We could do something like separate the month column into percent or bump and month, but that's kind of funky.

To get pivot_longer to work, where we have four columns that we want to convert to two columns plus the month, we need to modify our pivot_longer statement. We'll take names_to and make it a vector. We still want month, but we also want a special variable called .value. What .value means is that we're going to take percent_august, percent_october, bump_august, and bump_october and split those names
apart into month, which is the latter half, and percent or bump, which is the first half. We'll then take values_to and get rid of it, and replace it with names_sep, because we need to tell pivot_longer what character to separate those names on, and we're going to use an underscore. This might seem a little abstract, so far better than me trying to explain it over and over with words is to show you what happens. We'll run this, and voila, we've taken that very wide data frame and converted it to have country, month, percent, and bump. So now we have the country, which will be on the y-axis; the month, which we'll use for the coloring; the percent, for plotting the point on the x-axis; and the bump, for plotting the percent label on the x-axis using geom_text.

Let's throw this all into our plot. I need to add to geom_text my x being the bump. I also notice I've got percentage rather than percent. I changed that when I was working up here with the rename function, so I'll change those percentages to percent. It looks like my text is working for some of these but not all of them. What I've noticed back here for bump_october is that I copied this block down and forgot to change percent_august to percent_october. So we replace August with October. Copying and pasting is helpful, except when it's not. That looks much better. Now we have the numbers to either side of the points, and we can tell that we've got the right number with each point because they match in color. That looks pretty nice. The sizes are a little bit funky, but don't worry, we'll come back to that in a moment.

One thing I want to take care of right now, though, is that here I have the number 64, say for Spain in October, but in the Ipsos plot it's 64 with a percent sign. Let's modify that, and to do that I'm going to make
use of the glue package. I'll do library(glue), make sure that's loaded, and then down here for my label I can call glue as a function, with percent in curly braces inside quotes, followed by a percent sign. What that means is take the value of percent, put it in the string, and then add a percent sign. Then we also need the closing parenthesis for the glue function. So this is a good start for our barbell plot, trying to replicate what Ipsos made.

The next thing I want to take on is the sizing of the numbers, the line, and the points. In the original plot, the text for the percentages seems to be about the same size as the points, and the line between the bells is rather thick, almost the same size as the points. So for my geom_point I'll put in size equals two, for my line I'll do size equals 1.75, and for my geom_text I'll also do size equals two. That looks a little closer to what they had in the original figure. One thing I notice is that the text, like the 64% for South Africa, looks a little smaller in height than the point next to it. So I'll try to make that size a little larger: instead of two, let's go with three. I think those proportions look pretty good. I'm going to hold off on the positioning, because eventually we're going to remove this legend, and that will spread things apart, which might make the positions of those numbers look more attractive than they currently are.

The next thing I want to take on is modifying the colors to better match the Ipsos figure. These are the default colors from ggplot, which I generally don't like. The function we use to manually set the colors is scale_color_manual. We can do name equals NULL, which will get rid of the legend name. Ultimately, again,
we're going to remove the legend, so who cares what we call it. We can then put in breaks, august and october, and then values equals something and something. We'll come back to those colors in a moment. For labels we could say August and October. Again, I'm going to remove these labels eventually, so I'm not too concerned with what I put here. What we need to get are those values.

Coming back to the Ipsos web page where we got the idea for this plot, I have a little eyedropper color picker widget added to Chrome as an extension. I can use this extension to hover over anything on the web page, click it, and then come back up to the extension button in Chrome and get the hexadecimal code. So that gray is 727272, and that was for August. For October I'll pick a different color, this bluish color, come back up here, get that hexadecimal code, and plug it in with the pound sign. Let's do the same thing with the gray line connecting the points, which is e6e6e6, so instead of black I'll put that in. It's hard to see that line because I've got this gray background theme, so I'll add theme_classic. That gives us a white background, so it's easier to see the gray line connecting the two points. That looks pretty good.

The numbers are still a little bit on top of the points, but as we get rid of the legend those will separate, so let's do that now. To turn off the legend I'll come into geom_point and do show.legend equals FALSE, and I'll use that for my geom_text as well. I don't think I need it for geom_line, but for good measure we'll add it there too. Now you can see we get that separation, and I think we have pretty nice spacing between the number and the point.

Now I want to turn my attention to the axis labels. You'll recall that in the original plot the x-axis goes 50, 55, all the way up to 100.
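The steps up to this point can be pulled together into one sketch. This is not the final script from the blog post. The data frame is a toy stand-in with placeholder numbers, the blue hex code is a stand-in because the October color isn't spelled out above, and I've added an explicit group aesthetic on the line so the pairing doesn't depend on ggplot's default grouping.

```r
library(tidyverse)
library(glue)

# Toy stand-in for the renamed Ipsos table; the numbers are placeholders.
data <- tribble(
  ~country,        ~percent_august, ~percent_october,
  "Total",         77,              73,
  "India",         87,              85,
  "South Africa",  64,              68,
  "United States", 67,              64,
  "France",        59,              54
) %>%
  mutate(
    # Push each label away from its partner point: to the left when the
    # point is the left-hand bell, to the right otherwise.
    bump_august = if_else(percent_august < percent_october,
                          percent_august - 2, percent_august + 2),
    bump_october = if_else(percent_august < percent_october,
                           percent_october + 2, percent_october - 2)
  )

# Split names like percent_august on "_": the first half picks the output
# column (.value), the second half becomes the month.
long <- data %>%
  pivot_longer(cols = -country,
               names_to = c(".value", "month"),
               names_sep = "_")

barbell <- long %>%
  ggplot(aes(x = percent, y = country, color = month)) +
  # Line first so the points sit on top of it. Newer ggplot2 releases
  # prefer linewidth over size for lines and warn about size.
  geom_line(aes(group = country), color = "#e6e6e6", size = 1.75,
            show.legend = FALSE) +
  geom_point(size = 2, show.legend = FALSE) +
  geom_text(aes(x = bump, label = glue("{percent}%")), size = 3,
            show.legend = FALSE) +
  scale_color_manual(name = NULL,
                     breaks = c("august", "october"),
                     values = c("#727272", "#2b8cbe"), # blue is a stand-in
                     labels = c("August", "October")) +
  theme_classic()
```

Printing barbell, or passing it to ggsave through the plot argument, draws the figure. The toy numbers avoid ties between August and October, since with a tie the two labels would swap sides under this bump logic.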
Those x-axis labels go up by 5-percentage-point increments, and we also have the original ordering that was in that CSV file, which we can see is also the percent that agreed as of October 2020. To take care of those country names first, I'll mutate country to be a factor of country, with the levels equal to data$country. This is the right order except it's reversed, so we can come back up here and wrap data$country in rev. Now we get the right ordering of our data, India to France, ordered by the response from October, and it keeps Total up at the top.

Now let's look at the x-axis. For that we'll use scale_x_continuous with limits from 50 to 100, and for breaks we'll do seq from 50 to 100 by 5. This will take 50 to 100 and give 50, 55, 60, and so on, all the 5-percentage-point increments. We can now see that we've got the values on the x-axis that we like. They put the percent signs right next to the numbers, so let's see if we can add that as well. To add those percent signs we'll again use the glue function. I'll do labels equals glue, and in quotes I'll put the same expression we used for breaks, seq from 50 to 100 by 5. That's what goes in the curly braces, so it needs to be in curly braces, and after that we put the percent sign. And there are our percent signs right next to our numbers.

Looking at the original, there was no x- or y-axis label. They have a title, and they also have an annotation down in the bottom left that I'd like to add to my figure. We can add the labs function: x equals NULL, y equals NULL, then title equals, and I'll lift that from the plot. Caption is what lets us put that text underneath the figure, so here I'll grab both of these lines, and instead of Chart I'll say Source: Ipsos, and for now we'll leave it like
this. Good. We cleaned up the axis labels to get rid of percent and country, and we have the main title and the caption. The formatting isn't quite right. I want the title to be left justified so that it starts in the top left corner of the plot, not of the panel. The caption is also right justified rather than left justified. I can change the position of both using the theme function. I'll do plot.title.position equals plot, which aligns it relative to the plot window, not the panel where the data go. Then we can also do plot.caption, and here I'll use element_text, because it's right justified, with hjust equals zero. An hjust of one is right justified, 0.5 is centered, and zero is left. Because that's still going to be positioned relative to the panel, I also want to give it a plot.caption.position argument, and there again we'll say plot. Now we have that left justified.

Looking back at the original figure, the title is bolded, so I need to make mine bold. Also, the caption down at the bottom is actually in italics, whereas the Chart: Ipsos, or in my case Source: Ipsos, is in an upright font, and both are gray. I'm going to use element_markdown from ggtext for that, so we'll do library(ggtext). For plot.title we'll use element_text with face equals bold, which makes the title bold. For my plot.caption, instead of element_text I'm going to use element_markdown, which lets me insert HTML inside the caption. To italicize I can use the i tag, with the closing slash-i tag, around that Base line, and then a br for a line break before Source: Ipsos. In plot.caption I can also add color, and I'll say dark gray. I'm not so concerned about getting all the colors just right. So now I have a bolded title, and I have the italicized caption along
with the source of the data, with that Base line italicized and the source in an upright font.

There are still a few more things we need to look at. The next one frankly scares me a little, because it's not something I do a lot. But I like the idea from the original of having the legend in with the data. Instead of having something off to the right saying that blue is October and gray is August, they label an example point to say that the blue points are from October and the gray points are from August. I kind of like that look. It's not something I do a lot in my figures, so this is frankly stretching me a bit to think about how I'd implement it in this figure.

I'm going to save this code block for building the plot we've got so far as a variable that I'll call main_plot. Now I've got main_plot, and I can easily add other attributes, other things I want to plot, on to it. I'll use that to plot, on top of this, the labels that are replacing my legend. If you think about the original figure, these labels are positioned at the x position, the percentage, for the Total line. I'll come into R and take data and filter it to get country equals Total. It's not a real country, but that's the name in the country column. This gives us our one-row data frame for the total category. We need to pivot this longer again, like we did up above, so I'll copy that, paste it down here, add it to this pipeline, and get rid of the final pipe. Now I've got country, month, percent, and bump. I'll call this data frame total. Now I can take my main_plot and add geom_textbox. For my aesthetics, my x will be percent, month equals country, and then I'll need a label that I'll take from another column called pretty, which I haven't created yet. So I'll do that up here in total.
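A minimal sketch of building that total data frame, before the pretty column is added, might look like this. The two-row input is a toy stand-in with placeholder numbers; in the episode the input is the full data frame with all 16 rows.

```r
library(tidyverse)

# Toy stand-in for the wide data frame; the numbers are placeholders.
data <- tribble(
  ~country, ~percent_august, ~percent_october, ~bump_august, ~bump_october,
  "Total",  77,              73,               79,           71,
  "India",  87,              85,               89,           83
)

# Keep only the Total row, then reuse the same pivot as the main plot so
# the columns line up: country, month, percent, bump.
total <- data %>%
  filter(country == "Total") %>%
  pivot_longer(cols = -country,
               names_to = c(".value", "month"),
               names_sep = "_")

total
```

Because total has the same column names as the data behind the main plot, a layer that receives it through its data argument can reuse the x, y, and color mappings already set in ggplot.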
I'll do mutate, and I'll say pretty equals if_else: if month equals august, then I'm going to come way back up here where we had those really long, hideous names and copy that down here; otherwise it's going to be the October one. So now we have our total data frame with that pretty label that we can use with geom_textbox. But before I can do that, I need to give it the data frame total, because the data frame we used up above, data, doesn't have a pretty column, and we'll get all sorts of errors if we don't do this. So we'll do data equals total. We're going to put percent on the x-axis, the y actually should be the country, the color should be the month, and the label will be pretty.

So we've got our labels here. They're pretty hideous, but don't worry, we'll get there eventually. The first thing I need to do is turn off the legend on my geom_textbox. To do that I'll put some of these things on separate lines, and again I can do show.legend equals FALSE.

Let's turn our attention now to these labels. I'm noticing a couple of things. First, the plot is clipping the top of the labels. Ultimately I want these labels above the plot, between the title and the plot, but the plot automatically clips what it draws to show only what's in the plotting window, so we need to fix that. Also, the font seems a little bigger than we really want. If you look back at the original figure, it's Total Agree, a hyphen, and then a line break before the date. I'm also noticing that the background for these labels appears to be white. It's opaque, so you can't see the August label behind the October one. So there are a few things to work on, and we'll work through them.

We can put in the br tag, since geom_textbox recognizes Markdown and HTML, so that will be good. The next thing I want to do, to avoid that clipping at the top of the figure, is coord_cartesian with clip equals off. That will turn off the clipping, so it will show things plotted outside
of the plotting window. Now we see the full text box in all its glory, and it's kind of bleeding up into the title. Don't worry, we'll fix that before we're done.

Next I want to reformat these labels so they look more like the original plot. I'll start by changing the size of the text. Let's try size equals two. We can modify this later if that doesn't work. I'll then do box.color equals NA, which gets rid of the border on those boxes. I can do width equals NULL to remove the width specification for those boxes. geom_textbox comes with a predefined width, and I only want the text box to be as wide as it needs to be to fit all the characters. That looks a bit more attractive. They're still overlapping each other, and I think we still have that white fill, but we want to move those up a notch. I can change the vertical justification of those labels with vjust so they're not right on top of the points, which should put them at the top of the points, and also do fill equals NA to get rid of the background color of those boxes.

We're getting there. Our labels are still overlapping each other, so you can't totally see them. I'd like my October label to be right justified on top of its point, and my August label to be left justified on top of its point. Right now I think they're basically center justified on the point. So I'll create a column called align with if_else: if month is August, that's the label on the right, so I want that alignment to be left justified, which is a value of zero. If it's October, so not August, that's going to be right justified, so I'll put in a one there. Then down in my aesthetics I'll do hjust equals align. Now I've got them separated, but I want them on top of the point. One thing we'd see if we still had the box.color being shown is that there's actually a margin, or padding, within those boxes. You can do box.padding equals margin
and we'll do 0, 0, 0 to get rid of any of that padding. That got rid of the padding, but it moved where those labels went, so I'm going to change my vjust to negative 0.5 to bump them up a bit. I think that looks pretty good. It might be nice to bring them in a little, but I'm pretty happy with the way that looks in general.

The title needs to move up, though. To move it up I can add a bottom margin to the title. If we come back to our main_plot, in the plot.title element_text I can add margin equals margin, and I'll say bottom equals, let's say, 20. I think those labels look pretty good. They're not a perfect representation of what was in the original Ipsos figure, but I think it's close enough, and I think we get the effect they were going for.

Coming back to the Ipsos plot, the last thing I want to take on is the overall appearance of the figure. They've got grid lines. If I look closely, I think the vertical lines are solid and the horizontal lines are dotted. Also, the axis tick labels are gray. So let's modify what we have to better represent the theming in the Ipsos figure. I'm going to go back up here, and for now I'll remove theme_classic and start with panel.background equals element_blank, which gets rid of those x- and y-axis lines. I also want to get rid of the tick marks, so we can do axis.ticks equals element_blank. We can set axis.text.x to element_text with color equals dark gray. That gets us a gray color for the x-axis labels, and still black for the y-axis labels.

Now we want to put in the solid vertical grid lines and the dotted horizontal grid lines. I'll do panel.grid.major.x equals element_line with color equals gray. I'll do size equals 0.1 because they're really thin, and they are solid. Then we'll do panel.grid.major.
y equals the same thing, but we'll add linetype equals dotted.

Overall I think we've done a pretty good job of replicating the original Ipsos figure. There might be a few little things here and there that you'd want to tweak to make it look a bit more like the original. I'm not giving any critique of this figure at this point. There are things I like about it, there are things I don't like about it, and there are things I want to experiment with. If you want to see what those are, be sure you're subscribed to the Riffomonas channel so that you know when the next episode comes out, because that's how you'll find out what I like and don't like about this figure and how we can experiment to see if we can make a more effective visualization.

The purpose of this episode was twofold. First, to think about how we can represent paired data using these barbell charts, or dumbbell charts, whatever you want to call them. The other thing I wanted to illustrate was how we can take a model figure, get the data that they used, and try to reproduce what they did, and along the way learn a lot about our tooling. I certainly learned a lot doing this, about things in the theme function, and certainly about putting in these labels in place of the legend. That's not something I'd done before, but we were able to do it using geom_text and a lot of our old friends, if you've been watching previous episodes of Code Club.

Let me know what you think of this. I'd be very receptive to hearing your critiques of this figure, and maybe I can add them to my laundry list of things I like and don't like about it. So down below in the comments, let me know what you like and what you don't like. For everything you don't like, tell me something you do like about this figure, okay? We're not just going to be a bunch of bashers. We need to be constructive as well. So keep practicing with this. Go out into the wild and see if you can find a figure that you want to
replicate as well, and see how well you do. Let me know, and we'll talk to you next time for another episode of Code Club.
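For anyone following along without the video, here is a hedged end-to-end sketch of the later steps: the factor ordering, the x-axis labels, the title and caption, the in-plot legend labels, and the theme. It is not the final script from the blog post. The numbers, the title, the Base text in the caption, the label text in pretty, and the blue hex code are all placeholders, and main_plot, total, pretty, and align follow the names used in the episode.

```r
library(tidyverse)
library(glue)
library(ggtext)

# Toy stand-in for the Ipsos table; the numbers are placeholders.
data <- tribble(
  ~country,        ~percent_august, ~percent_october,
  "Total",         77,              73,
  "India",         87,              85,
  "United States", 67,              64,
  "France",        59,              54
) %>%
  mutate(
    bump_august = if_else(percent_august < percent_october,
                          percent_august - 2, percent_august + 2),
    bump_october = if_else(percent_august < percent_october,
                           percent_october + 2, percent_october - 2)
  )

# Hypothetical helper so the same pivot is reused for both data frames.
make_long <- function(df) {
  pivot_longer(df, cols = -country,
               names_to = c(".value", "month"), names_sep = "_")
}

main_plot <- data %>%
  make_long() %>%
  # rev() so the first row of the table ends up at the top of the y-axis
  mutate(country = factor(country, levels = rev(data$country))) %>%
  ggplot(aes(x = percent, y = country, color = month)) +
  # newer ggplot2 releases prefer linewidth over size for lines
  geom_line(aes(group = country), color = "#e6e6e6", size = 1.75,
            show.legend = FALSE) +
  geom_point(size = 2, show.legend = FALSE) +
  geom_text(aes(x = bump, label = glue("{percent}%")), size = 3,
            show.legend = FALSE) +
  scale_color_manual(name = NULL, breaks = c("august", "october"),
                     values = c("#727272", "#2b8cbe"), # blue is a stand-in
                     labels = c("August", "October")) +
  scale_x_continuous(limits = c(50, 100),
                     breaks = seq(50, 100, by = 5),
                     labels = glue("{seq(50, 100, by = 5)}%")) +
  labs(x = NULL, y = NULL,
       title = "Placeholder title",
       caption = "<i>Base: placeholder text</i><br>Source: Ipsos") +
  coord_cartesian(clip = "off") + # let labels draw above the panel
  theme(
    plot.title.position = "plot",
    plot.title = element_text(face = "bold", margin = margin(b = 20)),
    plot.caption = element_markdown(hjust = 0, color = "darkgray"),
    plot.caption.position = "plot",
    panel.background = element_blank(),
    axis.ticks = element_blank(),
    axis.text.x = element_text(color = "darkgray"),
    panel.grid.major.x = element_line(color = "gray", size = 0.1),
    panel.grid.major.y = element_line(color = "gray", size = 0.1,
                                      linetype = "dotted")
  )

# Labels that stand in for the legend, anchored on the Total row.
total <- data %>%
  filter(country == "Total") %>%
  make_long() %>%
  mutate(pretty = if_else(month == "august",
                          "Total Agree -<br>August 2020",
                          "Total Agree -<br>October 2020"),
         align = if_else(month == "august", 0, 1))

final_plot <- main_plot +
  geom_textbox(data = total,
               aes(label = pretty, hjust = align),
               size = 2, box.colour = NA, fill = NA, width = NULL,
               vjust = -0.5, box.padding = margin(0, 0, 0, 0),
               show.legend = FALSE)
```

The sketch spells the border argument as box.colour, which is the name ggtext documents, where the episode says box.color. Printing final_plot or handing it to ggsave renders the figure, and the toy limits of 50 to 100 only work because every placeholder value and bump falls inside that range.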