In my house, I'm in charge of teaching the kids math. Yep, we've been homeschooling since before you were forced to homeschool because of a global pandemic. So what do I like about teaching math? Well, I really like math. I like the idea of multiplying big numbers and adding, and sitting down with a pencil and paper and working through all the great math. One of the cool things about teaching the kids math has been that their curriculum involves some amount of data visualization. So while they are doing things like plotting in a Cartesian coordinate system, they're also doing things like bar plots, which aren't ideal but aren't too offensive. This past year, I taught my 14-year-old how to make box-and-whisker plots, which was pretty cool. They used the range for the whiskers rather than one and a half times the interquartile range, which we've talked about previously. But I also had to teach my kids pie charts. So you know what? In today's episode, I'm going to show you how to make pie charts in R and, really, why you shouldn't make pie charts in R. But before I show you how to make them, I'm going to go through the critique of pie charts and why they aren't ideal. The lay audience out there really loves pie charts, and this filters back into science because we are all part of that lay audience in disciplines other than our own, which for me is microbiology. So a lot of PIs will encourage their trainees to make pie charts of their data, and a lot of times people are compelled to make a certain visual either because their boss or PI forces them to or because their audience is demanding it. In the microbiome literature, we used to see tons and tons of pie charts. I think I once saw a figure in a paper that had maybe 30 or 40 different pie charts. That is a little bit over the top.
Pie charts aren't ideal, but like I was saying, there are ways that we can make them better. And I think that by studying pie charts and why they aren't ideal, we can learn a lot more about making good visuals. In the previous episode, we created a stacked bar chart that I think is about the best stacked bar chart we could make. It shows three different disease status groups: individuals who are healthy, people who have diarrhea and are C. difficile negative, and people who have diarrhea and are C. difficile positive. It shows the relative abundance of the four most abundant phyla of bacteria found in the feces of the participants in the study. We also have an "other" group that pools the rarer phyla in these individuals. If we looked at all of the phyla across these three disease status groups, there'd be 13 different categories represented in the stacked bar chart, which, as we've seen before, is just way too much. So one of the big problems with the stacked bar chart is that you can have too many different categories, too many different phyla or populations that you're measuring the relative abundance of. Because there are so many, it's difficult to discriminate between the different shades in the rectangles of your stacked bar chart. Another challenge of stacked bar charts is that oftentimes you don't have a common basis of comparison. When I created this stacked bar chart in the previous episode, I put the most abundant phylum on the bottom so that it has an anchor at the 0% relative abundance position on the y-axis. Then the second most abundant phylum, the Bacteroidetes, is at the top so that it is anchored at the 100% line. At least for those two phyla, it's easy to make comparisons across the three different groups.
The poor Proteobacteria are stuck in the middle and float wherever the proportions put them. So it's a bit harder to differentiate the relative abundances of the Proteobacteria in these three different groups. It's difficult to make that comparison at the phylum level. If I were to go to a finer level, like say the genus, it'd be even harder because there'd be so many more wedges and populations in here and there'd be so much more variation. Another challenge that we saw with stacked bar plots is that we don't know the n. We don't easily see the data to know how many individuals are represented here. Sure, I could put the numbers along the labels on the x-axis, but that doesn't really give you a good sense of the data. The other challenge with stacked bar charts is that we don't get a sense of the variation in the data. Here we are representing the mean relative abundance across a large number of subjects in the study, but I don't know how much variation there is. I don't know what the range is for these different phyla in the different disease status groups. Why am I going through all this? Well, believe it or not, a pie chart is the same thing as a stacked bar chart. But instead of having a linear y-axis, we have a curved y-axis. So we will learn a new coordinate system today, coord_polar, which is in ggplot2 mainly for demonstration purposes so that you can make pie charts. The documentation is very clear that you should not make pie charts. Go look at the help page for coord_polar or run ?pie in R, and you will see great references on why you shouldn't make pie charts. The first pie chart that we will make in today's episode is a variation on what's called a donut plot, where each of the concentric rings represents a different disease status group. You can imagine that if we only had the outside ring, it would look like a donut, right? Homer Simpson would be very happy.
In this depiction, again, we're only working with the five different groups: the four phyla and the pooled "other". One of the nice things about this depiction is that for comparing the three disease status groups, we have a common axis or common anchor for making a comparison. That's basically at 12 o'clock, if you imagine that this pie chart is the face of a clock. That is useful because then we can again compare at least the Firmicutes and Bacteroidetes across these three different disease status groups. And again, we kind of lose the Proteobacteria in the mix after those two most abundant populations. So I think this highlights, again, that the challenge with stacked bar charts and pie charts is that they are really only effective, if you can say that, when you have a small number of wedges, like say two or three, because otherwise it gets too hard to compare across your different pie charts. One of the challenges with this depiction is that human perception works in terms of area first, not angle. If we look at this outer band, even though the proportion is the same as, say, one of the inner bands, the area is much larger for the same angle. That means anything in that healthy ring on the outside is going to appear much more abundant than it really is, because it fills more of the plotting area. So a real challenge with this type of visualization is that things on the outside automatically get more emphasis because they take up more area than their proportion warrants relative to those inner circles. One of the other challenges with this depiction is that it's not immediately clear what these three circles refer to. So here I have created labels off to the left that are aligned with the three rings to make it clear what they represent.
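To put rough numbers on that area problem, here is a small back-of-the-envelope sketch in base R. The ring radii are hypothetical plot units that I picked for illustration (three rings of equal width), not the radii ggplot2 actually draws, and ring_segment_area is a helper name I made up for this example.

```r
# Area of a ring segment spanning a fraction `prop` of the full circle,
# between an inner and an outer radius (hypothetical helper for illustration).
ring_segment_area <- function(prop, r_inner, r_outer) {
  prop * pi * (r_outer^2 - r_inner^2)
}

# Three concentric rings of equal width; the radii are assumed values.
rings <- data.frame(
  ring    = c("inner", "middle", "outer"),
  r_inner = c(0, 1, 2),
  r_outer = c(1, 2, 3)
)

# The same 25% wedge (the same angle) drawn in each of the three rings.
rings$area <- ring_segment_area(0.25, rings$r_inner, rings$r_outer)

# How much more ink the wedge gets compared with the innermost ring.
rings$relative_to_inner <- rings$area / rings$area[1]

print(rings)
```

Under these assumed radii, the identical 25% wedge covers three times the area in the middle ring and five times the area in the outer ring, even though the angle never changes.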
The other type of pie chart that we'll make today is what you might think of as a simple pie chart, without the concentric circles, without the donut plots. So you won't feel so hungry looking at these data. Anyway, the good thing about these types of plots is that the wedges are no longer scaled by their position in the concentric circles, right? Everything is on the same size basis. I have a friend who once made pie charts where he varied the size of the pie depending on the number of bacteria in the community. I think he thought he was being cute. It was kind of cute, but again, perceptually, I think it becomes very confusing. In contrast to the concentric circle version, it's much clearer in this case what each of the three pies refers to, which disease status group it comes from. I laid this plot out vertically so that, again, we could emphasize a comparison point for the three different pie charts along that 12 o'clock to 6 o'clock line. If I were to lay this out horizontally, it would be much more difficult because I wouldn't have that common reference point across the three pies unless I made that reference point run along the 9 o'clock to 3 o'clock axis on a clock face. So the goal for today's Code Club episode is to create these two pie chart visuals. Again, they are not ideal ways to represent the data. Talk to anybody about data visualization and, aside from perhaps the joke that the flag of Japan is a pie chart indicating the percent of Japan that's in Japan, they'll tell you that pie charts really are quite flawed unless you're going to a lay audience and you only have, say, two or three different groups at most that you're trying to represent within that pie chart. I think we can learn a lot about our tooling, a lot about R and ggplot2, by going through the process of trying to make these pie charts look as attractive as possible.
Let's go ahead into RStudio, and we will take up the code that we generated in the last episode, where we made those stacked bar charts that I was showing you. If you would like to get this code as well, please be sure to check the link down below for a blog post that's associated with today's episode, where you can get the same code that I'm starting with. Also, along the top here, you will find a link to a video that describes how I got everything installed: R, RStudio, the tidyverse package, and the data that we are working with here. Again, we load libraries. We load our data into data frames. We calculate the relative abundance and make the data tidy. Here we're pulling out the phylum-level data. So if I wanted to make the same plot but at the genus level, I could change this level equals phylum here on line 40. But here again, we are getting the average relative abundance for each phylum across the three different disease status groups. We then create a data frame to indicate which phyla should be pooled because their mean relative abundance across all three disease status groups is less than 3%. We then join all that together and make the plot. And here again, we have that stacked bar chart. Coming back to the code, I'm going to change the name of the output file to be the Schubert concentric pie .tiff. I am going to add coord_polar to my ggplot pipeline. coord_polar takes things from a Cartesian coordinate space, so basically linear, to a polar one, where we have an angle around a circle as well as a length. Our length is going to be associated with the x-axis, and the angle, or theta, is going to be associated with the y-axis. So I will go ahead and do theta = "y", with the y in quotes. And I'm getting some error messages here that I think are perhaps coming from some of the stuff in my theming, but I'm not totally sure. So what I'm going to do is run everything up to my coord_polar and see if that works.
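As a minimal sketch of that coord_polar step, assuming a toy data frame rather than the real study data: the column names disease_stat, taxon, and mean_rel_abund, the level spellings, and every number below are placeholders I invented so the example stands alone.

```r
library(ggplot2)

# Toy mean relative abundances in percent. All values are invented for
# illustration; within each group they add up to 100.
toy <- data.frame(
  disease_stat   = rep(c("NonDiarrhealControl", "DiarrhealControl", "Case"),
                       each = 3),
  taxon          = rep(c("Firmicutes", "Bacteroidetes", "Other"), times = 3),
  mean_rel_abund = c(60, 30, 10,
                     50, 30, 20,
                     45, 25, 30)
)

# The ordinary stacked bar chart: one bar per disease status group.
stacked <- ggplot(toy, aes(x = disease_stat, y = mean_rel_abund, fill = taxon)) +
  geom_col()

# The same layers in polar coordinates: y becomes the angle (theta) and
# x becomes the distance from the center, so each bar turns into a ring.
concentric <- stacked + coord_polar(theta = "y")

# print(concentric)  # uncomment to draw the concentric pie
```

The only difference between the two plot objects is the coordinate system, which is the sense in which a pie chart is a stacked bar chart with a curved y-axis.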
And I will keep going until I get the error message. So that worked fine. Let me come down another line: scale_fill_manual, that works fine. scale_x_discrete, that works fine. scale_y_continuous, that works fine. My labs labels, that's fine. theme_classic, that's fine. Then I'm down into theme, and I think the problem is coming from the theme. So I will go ahead and comment this out, and again that works. It's very ugly, so we're going to go about making it look better without all the error messages. The first thing I'm going to do is turn off scale_x_discrete for now so we can get something that's a little more attractive and easier to look at. I'm going to set the y label to NULL; again, y is associated with theta, the angle on the axis. We also have scale_y_continuous here, and that's also being depicted around the perimeter of the pie chart. I don't need that, so I'm going to do breaks = NULL, and we see we get rid of the numbers around the outside. That looks pretty good. What I'm going to start to do now is play with the theming, because I definitely want my phylum names in the legend to be italicized, and I don't want this axis. So let's come back and look at our theming and see if that's the problem. It's the first theme statement that's causing problems. Let's go ahead and leave that out for now and see if we can get the legend stuff to work. That runs without an error, and we also see now that we have our phylum names italicized and a slightly smaller key size so the legend doesn't take up so much space. I'm going to go ahead and add some other things to my theme: axis.line = element_blank(), where element_blank means nothing, and we see we then get rid of the axes. I'll also do axis.ticks = element_blank() to get rid of the ticks there. So that's looking pretty good. Next, I want to come back and format those x-axis labels.
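Those cleanup steps can be sketched roughly like this, again on an invented toy data frame (the column names and numbers are placeholders). One swap to be aware of: I use plain element_text(face = "italic") for the legend so the sketch only needs ggplot2, whereas the episode's own theming goes through element_markdown. The key size of 10 points is just an assumed value.

```r
library(ggplot2)

# Toy data with invented values; see the note above about placeholder names.
toy <- data.frame(
  disease_stat   = rep(c("NonDiarrhealControl", "DiarrhealControl", "Case"),
                       each = 3),
  taxon          = rep(c("Firmicutes", "Bacteroidetes", "Other"), times = 3),
  mean_rel_abund = c(60, 30, 10, 50, 30, 20, 45, 25, 30)
)

cleaned <- ggplot(toy, aes(x = disease_stat, y = mean_rel_abund, fill = taxon)) +
  geom_col() +
  coord_polar(theta = "y") +
  # y is the angle, so these breaks are the numbers around the perimeter
  scale_y_continuous(breaks = NULL) +
  # no title for the angular axis
  labs(y = NULL) +
  theme_classic() +
  theme(
    axis.line       = element_blank(),               # no axis lines
    axis.ticks      = element_blank(),               # no tick marks
    legend.text     = element_text(face = "italic"), # italic taxon names
    legend.key.size = unit(10, "pt")                 # smaller legend keys
  )

# print(cleaned)  # uncomment to draw the cleaned-up concentric pie
```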
Let's go ahead and uncomment this so that we have our labels in there for our x-axis, or the rings, and we see that we've got our three groups. I'm going to go ahead and add axis.text = element_markdown(), and actually nothing happened. So I wonder if this should be axis.text.x or axis.text.y; sometimes the hierarchy of these axis text elements doesn't quite work. With x, that again gives us an error. If I give it y, that doesn't give an error. I'm not totally sure why that worked, except that these labels are positioned where the y-axis typically is, and so I think that works pretty well. One thing that I'm noticing is that these are shifted to the left, maybe a little bit further away from my plot than I'd like. Also, the font is a little bit big, so the text runs together. I will go ahead and do size = 8, and we see that there's a little bit more separation now between the three different labels, which is nice. So two things occur to me now looking at this. First of all, the labels seem a little bit too far to the left; there's a bit of a gap between the labels and the circle. The other thing is that when we made the stacked bar chart, we went healthy, C. diff negative, C. diff positive. As I approach this visual, I start from the outside in, so I think I want healthy on the outside, negative in the middle, and positive on the inside. We can change that very simply by coming back up to where we define disease_stat as a factor. There we laid out the order of the levels, and so I can go ahead and manually change this to make it case, diarrheal control, non-diarrheal control. Then we'll run everything, and now we see that we have healthy, diarrheal C. diff negative, diarrheal C. diff positive, and we're in good shape there. Now we want to move everything back over to get it a little bit closer to the pie. I will go ahead and give margin as an argument to element_markdown, so margin = margin(), and I will then say r
equals, and let's start with zero, and I'll do unit = "lines". That didn't really change anything. What if I do, say, minus four? That brings it a lot closer, right? Well, unfortunately, it's overlapping with the pie. So let's make it minus one instead of minus four, and then again that brings it closer but not on top of the pie. It's also making things right justified. Let's go ahead and make it left justified. To do that, let me put these arguments on separate lines, and I will then do hjust = 0. Again, that makes everything left justified on those axis labels. If I wanted them to be centered, I could make it hjust = 0.5. Let me go ahead and make those bold so that they pop: I will do face = "bold", and there we go. We now have bolded labels for our three different circles. The take-home from this visual that I really want you to get is that a pie chart really is a stacked bar chart in a different coordinate system; it's in a polar rather than a Cartesian coordinate system. Also, the benefit of the concentric circle layout, as I mentioned earlier, is that we can have a common anchor point at the 12 o'clock line on our pie charts. The downside, however, is that this wedge looks a lot bigger than it would if it were in the middle or on the inside, and our brain thinks that size has to do with abundance, and it doesn't. The abundance here is being depicted by theta, the angle of that wedge. The next way of presenting a pie chart that I want to create is to have one pie for each of the three different disease status groups, and I'd like to have them arrayed vertically so that I can do my best to have that common reference anchor point along that 12 o'clock to 6 o'clock line. To create the three different pie charts, we're going to use the facet_wrap function from ggplot2. facet_wrap creates a different panel, or facet, for your data depending on the variable that you tell it to facet on, so we will do tilde
disease_stat, and then we'll do nrow = 3, so we want three rows. I could also do ncol = 3 to put them in three columns, but I want to make it vertical, so we will do three rows with one column. We'll add the plus sign after that, and I'm going to go ahead and change my file name from concentric to vertical. What you see now is that we have our vertical array of three pie charts, and they have varying degrees of doughnut-ness. The diarrheal control and non-diarrheal control are our doughnuts, and then this one is like a Timbit or something like that. The reason it's doing that, again, is that those concentric circles corresponded to the x aesthetic, so we need to fix that x aesthetic to make those pies the same size. We can do that back up here where, instead of x = disease_stat, we'll do x = 1. We now see that we have our three equally sized pies, and they are again arrayed vertically so that we have that common anchor comparison point along the twelve o'clock to six o'clock axis of our plot. We'd like to clean this up a little bit by changing the labels on our facets. Things are a little bit out of order because, for making the concentric pie circles, we changed the order. So I'm going to go ahead and change that back and make sure that our pie charts go from non-diarrheal control down to case. That was back up here where I was defining the factor, so I now want this to be non-diarrheal control, diarrheal control, and case. I now need to change the labeling of these different facet panels. To fix those labels, I'm going to come back to facet_wrap and add the labeller argument, set equal to the labeller function. I then need to assign a named vector to disease_stat, so I will say disease_stat = pretty_names. Let's go ahead and put these on separate lines so they don't run off the edge of the screen for y'all. Now I need to create pretty_names. pretty_names will be taking the
information that I have down here in scale_x_discrete. I don't need scale_x_discrete anymore, so I'm going to go ahead and cut that out, come back up here ahead of my inner join, and paste that in. I will then make pretty_labels, and this is going to be a named vector, where I have names that I can assign to each value of the vector. I'll show you what that means in a moment. So non-diarrheal control will equal healthy, diarrheal control will equal this string, and case will equal this string, and I can clean up the code a little bit. Then if I look at pretty_labels, and say I do pretty_labels and in square brackets and quotes I put non-diarrheal control, the output should be healthy, right? That's what this labeller function does: it returns the pretty label for the facet. Now if we go ahead and run this, I see that I named it pretty_names in facet_wrap, not pretty_labels, so let's go ahead and rename this to pretty_names. Now what we see is that we have our pretty labels across the top. Unfortunately, we also have our x position on our pie chart, so I will go ahead and turn that off: I can do scale_x_continuous(breaks = NULL) and add that in. I can also go ahead and remove this axis.text.y because we don't need that theming anymore. So I got rid of those x-axis labels. The next thing I'd like to do is get rid of that rectangle around the facet label and format that facet label with element_markdown. That is strip.background = element_blank(), which will get rid of the background, and then for strip.text I will use element_markdown(). Wonderful, we now have the formatting of our three different facet labels correct. I would actually like to move them to the left side because I feel like the labels are getting in the way, and maybe it's a little bit confusing: does this label correspond to this pie or the pie above it? I think if we have them on the left, it will be more directly clear what each pie
refers to. Also, the pies could then perhaps be a little bit closer together and a little bit bigger, so it's easier to see those wedges and to make the comparisons. To do that, back up here in facet_wrap, we can also add strip.position, and I'll say "left" in quotes. Now we've got our label on the left, and it's at an angle. To get the formatting to work, we're actually going to use strip.text.y.left, so we're styling the text on the y-axis on the left rather than strip.text itself, the strip text being the text above the plot. We now see that we've got our italics and our nice formatting. Unfortunately, I'm craning my neck to the left to read it. To get that label turned, I'll do angle = 0. That looks really nice. Again, the pies are a little bit bigger and the names are off to the side, so they get out of the way and allow you to visually make a comparison across those three pies. Let's go ahead and make those labels bold with face = "bold". So for pie charts, I think these look pretty good. I do prefer this version of the pie chart to the concentric pie chart. That problem of the human eye trying to guesstimate, so to speak, the area rather than the angle is, I think, a major shortcoming of the concentric pie chart. Here again, we've done our best to line the pies up vertically so we can overcome that problem of not having an anchor point. Also working for us here is that we have so few phyla or taxonomic groupings that we are comparing across the three different disease statuses. There are other things out there that people do to their poor pie charts, like tilting them on an angle to give them a 3D appearance; that also causes all sorts of perceptual problems. They also make them explode, or perhaps they have a wedge coming out at you, and that triggers all sorts of perceptual challenges as well. Anyway, I do prefer the stacked bar charts to these pie charts, I think because it does give you more common
basis points for comparison. Also, we do better at comparing area than angles, as I've already mentioned, and pie charts really are leaning on that angle and forcing you to make a comparison of the angle, whereas the bar chart forces you to look at the area. Anyway, like I said, in coming episodes we'll be seeing other approaches that I think are superior to stacked bar charts and pie charts. But I think it's important to see how you build a pie chart because, again, it forces you to learn a new muscle, this coord_polar, as well as different ways that you can take a substandard data visualization and make it better. How can we get the most out of it? I think we've done a pretty good job of getting the most out of these pie charts, whether it's the concentric version or this vertically stacked pie chart. So I'd really encourage us not to dismiss a visual out of hand. There are situations where it's necessary; again, perhaps your audience, your job, or your boss is demanding it. I know that I am very quick to bash different types of data visualizations, and I'm really glad that I took this on and treated making pie charts as a challenge: how can I make this look good? How can I make this not look like a pile of crap, and make the pie charts look as good as I can make them? I would encourage you to do that with other types of data visualizations that you just personally don't like. How can you make them look the best that they can possibly look, knowing that they'll have their limitations? I hope you were able to download the data, play with it, and work through it with me in parallel. If you want to learn about those other ways of visualizing relative abundance data, be sure that you're subscribed to the channel and that you've clicked that bell icon so you know when those videos are released. Also be sure to check out the materials that I have linked down below. There's a link to a set of materials for minimalR,
and in that tutorial series I go through a variety of ways of analyzing and visualizing microbiome data, from ordination to relative abundance data like this. It's also the basis of the three-day workshops that I teach, which you can register for over at the Riffomonas website. The applications in the minimalR workshop are microbiology data, and the other workshop, generalR, uses non-microbiology data sets. I think both of them are really powerful, and I've gotten really good feedback from people that they really liked them. Anyway, keep practicing and sitting with this material, please be sure to tell your friends about what we're doing here in Code Club, and we'll see you next time for another episode.
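For anyone working through the vertical pie chart from earlier in the episode after the fact, here is a rough, self-contained sketch of the facet_wrap steps. Everything about the data is assumed: the column names, the level spellings (NonDiarrhealControl, DiarrhealControl, Case), the numbers, and the label strings in pretty_names are placeholders of mine, and I use element_text instead of the episode's element_markdown so the sketch only needs ggplot2.

```r
library(ggplot2)

# Toy data with invented values (percent, summing to 100 within each group).
toy <- data.frame(
  disease_stat   = rep(c("NonDiarrhealControl", "DiarrhealControl", "Case"),
                       each = 3),
  taxon          = rep(c("Firmicutes", "Bacteroidetes", "Other"), times = 3),
  mean_rel_abund = c(60, 30, 10, 50, 30, 20, 45, 25, 30)
)

# Factor order controls the top-to-bottom order of the facets.
toy$disease_stat <- factor(
  toy$disease_stat,
  levels = c("NonDiarrhealControl", "DiarrhealControl", "Case")
)

# Named vector: names are the raw factor levels, values are display labels.
# The label strings here are assumptions, not the ones used in the episode.
pretty_names <- c(
  NonDiarrhealControl = "Healthy",
  DiarrhealControl    = "Diarrhea, C. difficile negative",
  Case                = "Diarrhea, C. difficile positive"
)

# Looking up a level by name returns its display label.
healthy_label <- pretty_names["NonDiarrhealControl"]

vertical <- ggplot(toy, aes(x = 1, y = mean_rel_abund, fill = taxon)) +
  geom_col() +                        # x = 1 gives every pie the same radius
  coord_polar(theta = "y") +
  facet_wrap(~ disease_stat, nrow = 3, strip.position = "left",
             labeller = labeller(disease_stat = pretty_names)) +
  scale_x_continuous(breaks = NULL) + # hide the radial position labels
  scale_y_continuous(breaks = NULL) + # hide the numbers around the perimeter
  labs(x = NULL, y = NULL) +
  theme_classic() +
  theme(
    axis.line         = element_blank(),
    axis.ticks        = element_blank(),
    strip.background  = element_blank(),  # no box around the facet label
    strip.text.y.left = element_text(angle = 0, face = "bold")
  )

# print(vertical)  # uncomment to draw the three stacked pies
```

Setting x to a constant is what turns the doughnuts of different sizes into three equal pies, and the named vector passed to labeller is just a lookup table from raw level to display label.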