Hey folks, one of the most undervalued components of ggplot2, I really feel, is the ability to make faceted plots. We've seen these in recent episodes. In the last episode, I made a facet for the average annual temperature over the last 130 years as well as the total precipitation each year over the last 130 years, so the two shared a common x-axis. One of the things I think is so cool about facets is that they're a way to make a multi-panel figure without having to make a whole bunch of different plots and then figure out how to assemble them together. It does take a bit of creative thinking, though, to realize that the facet label that typically shows up at the top of each facet can actually be made into your y-axis label. So that's what I want to do today: explore with you some of the different ways that we can use facets to make a really attractive figure.
The practical question that I have for today's episode is this: for any given day of the year, based on the past 130 years, what's the probability of a precipitation event? The other thing I want to know is, on average, how much precipitation do we see each day of the year? And finally, if there is a precipitation event on any given day, what is the average amount of precipitation?
So we'll have three different facets with three different y-axes: a probability and two different measures of length, the precipitation amounts. On the x-axis we'll have the date within the year. We've seen a lot of these concepts over recent episodes, but I want to bring them all together today to make what I think will be a really attractive figure.
As always, I'm here in RStudio. I have an R script called precipitation_risk.R that I've created. If you want to get what I'm working on today, as well as code/local_weather.R, which downloads the data from your local NOAA weather station so that you can make your own plots, then by all means go down below to the description. There'll be a link to a blog post where you can get everything you need to get up and rolling. There is something you'll have to change in the local_weather.R script to enter your own latitude and longitude, but other than that, you'll be off and running with data for your local area. I'm going to go ahead and load this again. This R script loads the tidyverse and lubridate packages as well as the glue function.
This will create a data frame called local_weather that has the date, tmax, prcp, and snow columns. For this episode, all I'm interested in is the date and the precipitation, so I'll select those two columns to get rid of everything else: select(date, prcp). I'll also add a mutate to get the three different components of the date: day equals the day function on date, month equals the month function on date, and year equals the year function on date. Now we've got date, prcp, day, month, and year. I don't want any rows that have NA values, so I'll do a drop_na on prcp. That gets rid of the days where we didn't have any observations.
The first thing I want is the probability of precipitation on any day and month of the year. To do that, we'll group_by month and day and then summarize across all of the years. So we'll do summarize, and I'll say prob_prcp equals, and here I'm going to do something interesting: mean(prcp > 0). Let's step back and remove that mean for now, so prob_prcp equals whether or not prcp is greater than zero. That returns TRUEs and FALSEs. Say I create a vector x that's got FALSE, TRUE, FALSE, FALSE, TRUE, so two TRUEs and three FALSEs. Several episodes ago, I told you that TRUE is numerically one and FALSE is numerically zero. So if I take the mean of x, it adds up the numerical values of all those TRUEs and FALSEs and divides by the total number of values. We get 0.4, which tells you that 40 percent of the values in x were TRUE. So I can come back here and do mean on prcp greater than zero, and that tells me, for any month and day combination of the year, the probability of precipitation. I get back my month, my day, and the probability of precipitation. For January 1st, the probability of precipitation is about 43.4 percent. Very cool, right? I'll then be able to plot this with the month and day across the x-axis and the probability of precipitation on the y-axis.
Now I want more information. I also want to know the average amount of precipitation I can expect for any given day of the year. So I can do mean_prcp equals the mean on prcp, which gets the average precipitation whether or not there was precipitation that day. We can now see that on average, on January 1st, we get about 2.58 millimeters of rain. These values are in millimeters.
The final metric I want to calculate is, if there is a precipitation event, what is the mean amount of precipitation? So I'll do mean_event equals mean on prcp. prcp is a vector, and I can index into that vector for the cases where prcp is greater than zero: mean(prcp[prcp > 0]). That keeps only the nonzero values and then calculates their mean. Now what we see is that if there's precipitation on January 1st, we get about 6 millimeters on average. Again, that's cutting out all the zeros to say, if there is precipitation, how much are we going to get? Wonderful.
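The three summaries described above can be sketched on a small made-up data frame. This is only an illustration: toy_weather and its values are assumptions standing in for local_weather, not the actual station data.

```r
library(dplyr)

# TRUE counts as 1 and FALSE as 0, so the mean of a logical vector
# is the fraction of values that are TRUE
x <- c(FALSE, TRUE, FALSE, FALSE, TRUE)
mean(x)

# Hypothetical stand-in for local_weather: two calendar days, each observed
# in three different years (made-up precipitation values, in millimeters)
toy_weather <- tibble(
  month = c(1, 1, 1, 1, 1, 1),
  day   = c(1, 1, 1, 2, 2, 2),
  prcp  = c(0, 4, 8, 0, 0, 3)
)

daily_summary <- toy_weather %>%
  group_by(month, day) %>%
  summarize(
    prob_prcp  = mean(prcp > 0),        # fraction of years with any precipitation
    mean_prcp  = mean(prcp),            # average over all years, zeros included
    mean_event = mean(prcp[prcp > 0]),  # average over days with precipitation only
    .groups = "drop"                    # return an ungrouped data frame
  )

daily_summary
```

Because mean_event drops the zeros before averaging, it can never be smaller than mean_prcp for the same day.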
This is still grouped by month, and we can remove that grouping with .groups equals drop. Now that's all gone. I need to make an x-axis, and we've seen this before: we can make a kind of fictitious date by combining a year, a month, and a day. So we'll do a mutate to generate a date using the glue function. I'll use 2022 because that's this year, and it's as good as any other year, I figure. Then we'll do month, hyphen, day, each in curly braces, and I need to put this all into the ymd function to convert it to a date. I'm getting a warning message: "problem while computing date", one failed to parse. That tells me there's one month and day combination that failed to parse. Do you have any idea what that might be? Well, that's the leap day. In here is February 29th, and there was no February 29th in 2022. But if we go back to 2020, that will work splendidly because that was a leap year. So 2020 will be our fake year. We won't actually see 2020 in the final visual. We just need to give it a year so that we can make use of scale_x_date as we saw previously. Wonderful.
We now have our information for the x-axis as well as for the y-axis. But again, I want to put these into panels, with the probability in the top panel, the mean precipitation in the second, and the mean event in the third. What that means is that I need to get these data into two columns: one column for all the values and one column for the column names. We've seen this before, and we can do it with pivot_longer. So I'll do pivot_longer with cols equals prob_prcp, mean_prcp, and mean_event. This puts everything into long format, where we've got the date, the name, and the value. I don't really need the month and the day anymore, so let's go ahead and remove those. I could probably do that before the pivot_longer.
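Here is a minimal sketch of the fake-date and reshaping steps, assuming a tiny made-up summary table (toy_summary and its numbers are hypothetical). It shows why a leap year is needed as the placeholder year and what pivot_longer returns by default.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(glue)

# 2022 is not a leap year, so February 29 cannot be parsed and comes back as NA
# (quiet = TRUE only silences the "failed to parse" warning)
ymd("2022-2-29", quiet = TRUE)

# Hypothetical per-day summaries around the leap day
toy_summary <- tibble(
  month      = c(2, 2, 3),
  day        = c(28, 29, 1),
  prob_prcp  = c(0.40, 0.35, 0.42),
  mean_prcp  = c(2.1, 1.9, 2.4),
  mean_event = c(5.3, 5.4, 5.7)
)

long_summary <- toy_summary %>%
  # 2020 is a leap year, so every month-day combination parses
  mutate(date = ymd(glue("2020-{month}-{day}"))) %>%
  select(-month, -day) %>%
  # by default the old column names go to "name" and the numbers to "value"
  pivot_longer(cols = c(prob_prcp, mean_prcp, mean_event))

long_summary
```

Each original row becomes three rows, one per metric, which is the shape facet_wrap needs to draw one panel per metric.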
So I'll do select minus month, minus day, and pipe that in there. Again, we've got our date, the name of the variable we're measuring, and the value. Now we can feed this into ggplot and start having fun visualizing the data: ggplot with aes x equals date, y equals value, and then geom_line. We want to facet it, so we'll do facet_wrap with tilde name, so that all the values for the same name will be in the same facet. Then we'll do ncol equals one, because I want a column of these three plots with a common x-axis being the date.
We now have our three plots right on top of each other. We still have a fair amount of work to do, but we're getting there. The first thing I notice is that they're all on a common y-axis scale. We can fix that, as we've seen, with scales equals free_y, which puts each panel on its own y scale. I would like everything to start at zero, so down here I'll add scale_y_continuous with limits equals c(0, NA). The zero means every panel should start at zero, and the NA says to let the data determine the upper limit. And so now each y-axis starts at zero. I think it would also be helpful to add a geom_smooth to get a sense of the direction of the data, if you will, and I'll do se equals FALSE.
The probability is here at the bottom, and we see that from January through May-ish or June-ish, the probability of a rain event is about 40 percent, again based on the last 130 years' worth of data. It then dips for July, August, and September and comes back up again in the fall. For the mean amount of rain, the most rain you're likely to get is in June, so on any given day you're going to get more rain in June than you would in any other month. I guess it's not necessarily rain but precipitation, because they take the snow and melt it to get the rain equivalent. But if we look at a rain event, so if it's going to rain, when is it going to rain the most? That tends to be in August, so after July into August and September. Okay, cool.
These are all out of order. I would like to have the probability of precipitation at the top, then the mean, and then the mean event. We can do that by making the name column, the variable that each value belongs to, into a factor. So let's come back up, after the pivot_longer where we made that name column, and do mutate name equals factor on name. Then I'll set my levels to be those names in the order I want them: prob_prcp, then mean_prcp, then mean_event. We'll pipe that in, and now we see the probability, the mean prcp, and the mean event. We've got the order as we like it.
Let's work on the x-axis labels. As we saw in the last episode, we can do scale_x_date with date_breaks. I think we did two months in the last episode, and that worked pretty well. Then we can do date_labels, where again we use a special notation with the percent sign: percent capital B gets us the fully written-out month name. And there we see February, April, June, August, October, December.
Now I want to get rid of the titles we have on the x- and y-axes and move the facet labels to be the y-axis labels. So we'll do labs with x equals NULL, which I'm turning off because it's obvious those are months, and y equals NULL, which I'm turning off because "value" doesn't mean anything and we're going to use the facet labels as the y-axis labels. We've gotten rid of the x- and y-axis titles, so now I can come back to facet_wrap and do strip.position equals left, which takes those strips and puts them on the left. Of course, we now see that they're inside of the y-axis.
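The plot as built up to this point can be sketched as follows. The data here are random placeholder values (toy_long is an assumption, not the weather data), so only the structure of the figure is meaningful: the factor levels set the panel order, and the strips sit on the left.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)

# Hypothetical long-format data: one fake value per metric per day of 2020
toy_long <- tibble(
  date       = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day"),
  prob_prcp  = runif(366, 0.2, 0.5),
  mean_prcp  = runif(366, 1, 4),
  mean_event = runif(366, 4, 9)
) %>%
  pivot_longer(-date) %>%
  # facet order follows the factor levels rather than alphabetical order
  mutate(name = factor(name, levels = c("prob_prcp", "mean_prcp", "mean_event")))

p <- ggplot(toy_long, aes(x = date, y = value)) +
  geom_line() +
  geom_smooth(se = FALSE) +
  # one column of panels, each with its own y scale, strips moved to the left
  facet_wrap(~name, ncol = 1, scales = "free_y", strip.position = "left") +
  # start every panel at zero and let the data set the upper limit
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_date(date_breaks = "2 months", date_labels = "%B") +
  labs(x = NULL, y = NULL)

p
```

Without the factor step, the panels would appear in alphabetical order: mean_event, mean_prcp, prob_prcp.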
I can come down and create a theme function, where we'll put strip.placement equals outside, and that will put the label on the outside of the y-axis. We now have those titles for the facets on the left side, outside of the y-axis text, and we can see those as y-axis titles. The problem is that it's not so obvious how we're going to change each one to a proper title that is a bit more descriptive than prob_prcp. We can do that a couple of different ways. As always in R, there's more than one way to do it. My preferred way is to create a special variable that I'll call pretty_labels. This will be a named vector. I'm going to grab these three levels of the factor, plop them in here, and give them special labels. I don't need these quotes around the names, but they were there and it's going to be too much work to remove them, so whatever. For prob_prcp, I'll do "probability of precipitation". Down here for mean_prcp, I'll say "average amount of precipitation by day", and for mean_event I'll say "average amount of precipitation by event". Okay.
So now we have pretty_labels, and the nice thing about pretty_labels is that I can index it with prob_prcp and that will return the pretty label. Now we can come back down to our facet_wrap. Again, my preferred way to do this is to use that named vector with the labeller argument, to which we assign the labeller function, and then we can say name equals pretty_labels. Again, pretty_labels is the name of our named vector. What the labeller function does is take each name of those three facets and look it up in pretty_labels to get the pretty label, like I have in the lower left corner here. Now what we see is that we've replaced those short names with our pretty names.
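A minimal sketch of the named-vector approach, using a few made-up points per panel (the toy data frame is hypothetical). The names of pretty_labels must match the values in the faceting column, and the argument name inside labeller() must match the faceting variable, here name.

```r
library(ggplot2)

# Named vector: names are the facet values, elements are the display labels
pretty_labels <- c(
  prob_prcp  = "Probability of precipitation",
  mean_prcp  = "Average amount of precipitation by day",
  mean_event = "Average amount of precipitation by event"
)

# Indexing by name returns the matching label
pretty_labels["prob_prcp"]

# Hypothetical data: three made-up points per metric
toy <- data.frame(
  date  = rep(as.Date(c("2020-01-01", "2020-07-01", "2020-12-31")), times = 3),
  name  = factor(rep(names(pretty_labels), each = 3), levels = names(pretty_labels)),
  value = c(0.43, 0.30, 0.41, 2.6, 3.1, 2.4, 6.0, 8.2, 5.9)
)

p <- ggplot(toy, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~name, ncol = 1, scales = "free_y",
             strip.position = "left",
             labeller = labeller(name = pretty_labels)) +
  labs(x = NULL, y = NULL) +
  # move the strips outside the axis text so they read as y-axis titles
  theme(strip.placement = "outside")

p
```

Keeping the short names in the data and the long names in a lookup vector means the labels can be reworded without touching the data pipeline.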
We have probability of precipitation, average amount of precipitation by day, and average amount of precipitation by event. We're going to want to put in some line breaks because some of the labels are running outside of their facets. So I'll come back up here, and I'm also going to add the units for mean_prcp and mean_event, so we'll add mm there as well as here. Let's put a line break right after "average amount of", so we'll do backslash n, and backslash n again in the other label. Very good. So those titles are there for the axes.
Now we want to clean this up and make it look more polished and more attractive. We'll come back down into the theme function, and I can do strip.background equals element_blank. The strip background controls the background of the strip label, so that gets rid of the background. I'm going to go ahead and save this to a file. I'll do ggsave into the figures directory and call the file prcp_prob_amount.png. For the width I'll use three, and for the height I'll use seven. And the directory is supposed to be figures, not figure. I think I can make this a little bit wider so I don't have such overlap in the names, so let's go up to five for the width. I think that looks pretty good. I'm happy with the way that appears. I think the titles are a good size, and they match the labels for the different months pretty well.
Now I want to clean out the background. I think I'll also set expand to zero on the y-axis. We can do that back up here in scale_y_continuous with expand equals c(0, 0). That's good. Now let's go ahead and clean out the background.
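The label and strip changes described above might look like the sketch below. The exact label wording, including the parentheses around mm, and the toy data are assumptions. The ggsave call is left commented out because it assumes a figures/ directory exists.

```r
library(ggplot2)

# "\n" inserts a line break so long strip labels wrap instead of overlapping
pretty_labels <- c(
  prob_prcp  = "Probability of precipitation",
  mean_prcp  = "Average amount of\nprecipitation by day (mm)",
  mean_event = "Average amount of\nprecipitation by event (mm)"
)

# Hypothetical data: three made-up points per metric
toy <- data.frame(
  date  = rep(as.Date(c("2020-01-01", "2020-07-01", "2020-12-31")), times = 3),
  name  = factor(rep(names(pretty_labels), each = 3), levels = names(pretty_labels)),
  value = c(0.43, 0.30, 0.41, 2.6, 3.1, 2.4, 6.0, 8.2, 5.9)
)

p <- ggplot(toy, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~name, ncol = 1, scales = "free_y",
             strip.position = "left",
             labeller = labeller(name = pretty_labels)) +
  # expand = c(0, 0) removes the padding so zero sits right at the bottom of each panel
  scale_y_continuous(limits = c(0, NA), expand = c(0, 0)) +
  labs(x = NULL, y = NULL) +
  theme(
    strip.placement  = "outside",
    strip.background = element_blank()  # drop the gray box behind each strip label
  )

# Uncomment to write the file; assumes a figures/ directory already exists
# ggsave("figures/prcp_prob_amount.png", plot = p, width = 5, height = 7)
```

With ggsave's default units of inches, a 5 by 7 figure gives a tall, narrow layout that suits three stacked panels.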
So we'll do panel.background equals element_blank and panel.grid equals element_blank. Now let's put on our axis lines with axis.line equals element_line. That puts an x-axis on the bottom panel, but it doesn't provide x-axes for the other two panels. I'd kind of like to have a line there as well. What I might try is to come back up here and do geom_hline with yintercept equals zero, and that gets us a line there. I'm noticing that it appears to be either thin or getting clipped. What happens is that if a line goes outside of the plotting window, it gets clipped. We can check whether it's getting clipped easily enough by coming back up and doing coord_cartesian with clip equals off. That turns off the clipping, so anything that's outside of the plotting window will show up. Sure enough, it was getting clipped. We now have our x-axis lines for the three different panels.
So that kind of works, but I feel like it's a lot of ink and it's not carrying a lot of weight. I'd like to think that it splits up the three different plots pretty well, but in the end I don't know that it really does. What I really want is to provide some kind of positioning or context along the x-axis for my audience. So what I'll do instead is draw in a red vertical line for today. Let's remove that geom_hline, and instead I'll put in geom_vline with xintercept equals today. Those lines at zero went away, but my vertical line is missing. Why? Because we made a fake date from 2020, and it is 2022 as I'm speaking to you. So I need to fix that, and I'm going to create a variable for today. I'll come back up to the top here. I like to put all these variables for styling things up at the top, near my library calls.
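The background cleanup and the clipping check can be sketched like this, again on made-up data (toy is hypothetical). With expand set to zero, a horizontal line at zero sits exactly on the panel edge, so half of its width falls outside the panel and is cut off unless clipping is turned off.

```r
library(ggplot2)

# Hypothetical data: three made-up points for each of two metrics
toy <- data.frame(
  date  = rep(as.Date(c("2020-01-01", "2020-07-01", "2020-12-31")), times = 2),
  name  = rep(c("prob_prcp", "mean_prcp"), each = 3),
  value = c(0.43, 0.30, 0.41, 2.6, 3.1, 2.4)
)

p <- ggplot(toy, aes(x = date, y = value)) +
  geom_hline(yintercept = 0) +            # baseline drawn in every panel
  geom_line() +
  facet_wrap(~name, ncol = 1, scales = "free_y") +
  scale_y_continuous(limits = c(0, NA), expand = c(0, 0)) +
  coord_cartesian(clip = "off") +         # let drawing extend past the panel edge
  theme(
    panel.background = element_blank(),   # remove the gray panel fill
    panel.grid       = element_blank(),   # remove the grid lines
    axis.line        = element_line()     # draw axis lines where axes are shown
  )

p
```

Turning clipping off affects every layer, so any geom that extends past the panel will also be drawn in the margins.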
So I'll do today_month equals month on today. That equals sign will work, but it's more R-ish to use the arrow. I'll set today_day as day on today, and make sure I've got those loaded. Then let's do today_date. Again, we're going to use the ymd function with glue, and we're going to give it 2020 because that was a leap year and it will be compatible with all the other data we have. Then we'll put in month, hyphen, day in curly braces. Now, month is a function, so this should be today_month, not month, and today_day there. You should have yelled at me and told me, right? You need to help me out. Come on, folks. So we can now plop today_date into our geom_vline, and it'll put the line there. Good.
Now we have that vertical line. It's black, and it doesn't really stand out, so let's make it thicker and red. I'm also going to put it behind the data line, so I'll put it before geom_line, and I'll do color equals red and size equals two. Okay, that's maybe a little too thick. Let's come back up and do size equals one. I think that'll be a little more toned down.
I'm pretty happy with the way this turned out. I could probably put in a title or something to indicate that the red bar is today, but on the whole I'm pretty happy with the appearance. More than just the story, what I really want to drive home to you is that I didn't have to make three different plots, move them over to PowerPoint or some other tool, clean them up, and then figure out how to assemble them into a composite figure. No, I did this all with one command, facet_wrap, right?
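A minimal sketch of the "today" marker, assuming a made-up data frame spanning the placeholder year 2020. One deviation from the spoken walkthrough: recent ggplot2 releases (3.4.0 and later) use linewidth rather than size for line thickness, so the sketch uses linewidth.

```r
library(lubridate)
library(glue)
library(ggplot2)

# Rebuild today's month and day inside the placeholder year 2020
today_month <- month(today())
today_day   <- day(today())
today_date  <- ymd(glue("2020-{today_month}-{today_day}"))

# Hypothetical data: one made-up value per day of 2020
toy <- data.frame(
  date  = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day"),
  value = seq(0.2, 0.5, length.out = 366)
)

p <- ggplot(toy, aes(x = date, y = value)) +
  # layers draw in order, so adding the marker first puts it behind the data line
  geom_vline(xintercept = today_date, color = "red", linewidth = 1) +
  geom_line() +
  scale_x_date(date_breaks = "2 months", date_labels = "%B")

p
```

Because 2020 is a leap year, this also works when the script is run on February 29.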
And then I was able to move the facet labels off to the y-axis, on the left side and outside of the axis, to make what I think is a really attractive plot. Here I'm showing this with a common x-axis, but that ultimately isn't necessary either. You could also do scales equals free_x to free up the x-axis so that only the values represented in each facet are shown along its x-axis. Maybe we'll show that in a future episode. But again, this is the same idea, just done on the y-axis. I think this is a really unheralded, underappreciated part of ggplot2: the ability to make faceted plots, whether with facet_wrap as here or with facet_grid if you have two variables that you're trying to facet some set of observations on.
Give this a shot, obviously with this data from wherever you live, but more importantly, try to build these types of plots using the data that you have for whatever projects you're working on for your job. I think what you'll ultimately find is that if you start looking for where you can use faceted plots, you will find more opportunities to use them, and you'll really grow to appreciate the power of this technique. Let me know what you think and what your experience is, and I'll see you next time for another episode of Code Club.