Hey folks, dates are one of those things that are surprisingly challenging to work with. If you think about it, some years have an extra day, others don't. There are things like daylight saving time here in the US. Some years we have to add a second to the clock. There's just all sorts of goofiness when it comes to time and dates. Thankfully we have the lubridate package, which comes to us as part of the tidyverse. And so what I'd like to do is use a variety of different functions from the lubridate package to show you some of the breadth of what is possible with dates in R. The specific question that I'm going to work on relates to the climate and weather data that we've been working on in recent episodes. My son has a job or business where he bales hay. He has to cut the hay, it has to sit and dry, and then he has to go and bale it up. And then he has to wait a couple of months for the grass to regrow again. And so he needs it to be dry at certain times. At other times he needs it to be wet. And so it's this constant yin and yang of is it going to rain or isn't it going to rain, and I'd like it to rain or I don't want it to rain, right? So the question that I have is that this year seems wetter than normal years. But I know that my mind is biased and very easily plays tricks on me. So what I'd like to know is, as of today, so it's July 22nd as I'm recording this, has this year been wetter than the other years that we have data recorded for? And so we're going to use some of the tricks that we've talked about recently, with things like group_by and summarize, on the data that we've pulled down from my local NOAA weather station. But we're also going to see how we can use a variety of date-related functions to make an attractive visual, so I can go tell my son whether or not it's been an abnormally wet or dry year. Again, I'm going to do all of this in RStudio.
If you have been following along, then you are familiar with my code/local_weather.R script. This is an R script that will download data from my local weather station, which is just outside of Ann Arbor, a few miles from where I live here in Dexter, Michigan. You can go into that script and you can futz with the latitude and longitude to get data closest to where you live. My data goes back to 1891. And if you want to get these scripts and everything related to this series of episodes, down below in the description is a link to a blog post that will help you get caught up to everything that I have. I'm going to go ahead and load this R script. This R script also includes library calls for the tidyverse, as well as lubridate, as well as the glue package. So again, running this gets me my local_weather data frame, where in the first column we have the date, then tmax, the amount of precipitation, and the snow. I'm interested in the precipitation. So I'll go ahead and do a select function on that. So we'll do select date and prcp. I also want to know the year, the month, and the day. So I'll add a mutate function. We'll do mutate year equals the year function on the date. So the year function comes to us from lubridate. I don't think I've shown this in the past. But if I do year on 2022-04-23, I get the year 2022, right? If I did month on the same thing, I would get 4 for the month of April. And if I did day, I would then get 23 for the day of the month. If I did yday, on the other hand, that would give me the Julian day, the day of the year. So April 23 was 113 days into the year. I don't know why I picked April 23. Today is 7/22, and that gives us day 203. A function I also showed you in the last episode that I think is kind of cool is today, which returns today's date, which is, as I said, July 22, 2022. Again, there are a lot of these great functions in R to extract the day out of a date. You could even do something like wday, so the weekday, out of that.
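As a quick reference, the lubridate accessors mentioned above can be tried on fixed dates. This is a minimal sketch: the variable names d and recording_day are only for illustration, and ymd() is used to build a proper date rather than passing a bare string.

```r
library(lubridate)

# Fixed dates, so the results do not depend on when this is run
d <- ymd("2022-04-23")
year(d)   # 2022
month(d)  # 4
day(d)    # 23
yday(d)   # 113, the day of the year

recording_day <- ymd("2022-07-22")
yday(recording_day)  # 203
wday(recording_day)  # 6 when the week starts on Sunday (the lubridate default)
```

Passing week_start = 1 to wday() would count from Monday instead.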
And this is the sixth day of the week. It's a Friday. And so I believe it is starting on Sunday as the first day of the week. Alright, so we'll go ahead and do the year, then the month equaling the month on date, and then the day being the day on date. And so now I can see that I've got the year, the month, the day, the date. I also have these NA values for prcp. So I'll go ahead and drop those with drop_na on prcp. And I forgot to pipe that in. With the pipe, that definitely got rid of those NA values. I also want to get rid of the 1891 data because it starts in October. So I will then do a filter, year not equal to 1891. And we now see our data starts on January 1 of 1892. So my vision is to have a plot where on the x axis I have the different months, right? And on the y axis, I would then have the cumulative precipitation total, right? And so then I'd have each year represented by a different line. And I will then have a blue line. So those other lines might be gray, with a blue line for the year 2022. And I can kind of see where it is relative to all those other lines. To make that easy, I'm going to go ahead and remove February 29, and just hope that there's not a whole lot that happens on February 29. I think we'll be close enough. It'll be within a few millimeters of rain, regardless. I'll go ahead and add an and to my filter. And then I'll put things in parentheses just to keep things simple. Month not equal to 2, and day not equal to 29. And this then will remove all of those leap year days. Again, I want the cumulative precipitation total by year. And so whenever we hear something like by year or by month, that should tell you that we're going to group our data. So I'll do group_by year. And again, as we saw in the previous episodes, we now have the 131 year groups, 130 plus the year 2022. And then I can do a summarize. And I will then do cum_prcp equals cumsum on prcp. So cumsum is the cumulative sum over prcp.
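One detail in the leap-day filter is worth checking on toy data. Read literally, the spoken condition would be `month != 2 & day != 29`, which drops all of February and every 29th of the month. Negating the combined test, `!(month == 2 & day == 29)`, removes only February 29. The sketch below assumes a made-up data frame called toy_weather with 1 mm of rain every day; it is not the NOAA data from the episode.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in data: 2019 (365 days) and 2020 (a leap year, 366 days)
toy_weather <- tibble(
  date = seq(ymd("2019-01-01"), ymd("2020-12-31"), by = "day"),
  prcp = 1
) %>%
  mutate(year = year(date), month = month(date), day = day(date))

# Literal reading of the spoken condition: loses February and every 29th
too_much <- toy_weather %>%
  filter(month != 2 & day != 29)

# Removes only February 29
no_leap_day <- toy_weather %>%
  filter(!(month == 2 & day == 29))

nrow(toy_weather)  # 731
nrow(no_leap_day)  # 730
nrow(too_much)     # noticeably fewer rows than 730
```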
And so now what we see is that we have our year and our cumsum. I could have done mutate instead of summarize. I think I'd rather use the mutate approach, because if you come back to my output from summarize, you get the year, but we're not also getting the month and the day, right? And so I've got a whole bunch of rows for 1892, but I don't have the month or the day. So I'm going to roll with mutate so that I have the date as well as the cumsum. So now we could go ahead and ungroup our data to remove that grouping by year. And now we want to feed it into ggplot. So on the y axis, I'm going to put the cumulative precipitation, and I'm going to group the data by the year. And on the x axis, I'd like to put the date, but I have 131 years' worth of dates, right? I'd rather have one year's worth of dates, right? So maybe what I could do is effectively turn the year into 2022, and then add to that the month and the day and create kind of a pseudo-date. So let me show you how I'll do that. We'll go ahead then and do mutate new_date. And I'm going to use the glue function to do this. So what I'll do is glue. And in quotes, then, I'm going to put 2022 and a hyphen. And then in curly braces, I'm going to put the data from the month column, then another hyphen. And then in curly braces, I'm going to put the data from the day column. This then gives me a new_date column. I can then feed this into ymd, which is another function from lubridate that will take dates formatted like this and turn them into a year-month-day format. And so now we see a slight difference in the output. Instead of being hyphen 1 hyphen 1, it's hyphen 01 hyphen 01. And instead of being of type glue, it's now of type date. Very cool. Now we can feed this into ggplot. So ggplot, aes. And on the x axis, I'll put new_date. On the y, I'll put cum_prcp. Right. And then we'll do geom_line. And again, because it's a line, I need to tell ggplot what to group the data by.
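The grouped cumulative sum and the pseudo-date described above can be sketched on a tiny made-up table. The data frame toy_weather and its values are hypothetical; the column names follow the ones used in the episode.

```r
library(dplyr)
library(lubridate)
library(glue)

# Hypothetical data: two days in each of two years
toy_weather <- tibble(
  date = ymd(c("1990-01-01", "1990-01-02", "1991-01-01", "1991-01-02")),
  prcp = c(5, 0, 2, 3)
) %>%
  mutate(year = year(date), month = month(date), day = day(date))

cum <- toy_weather %>%
  group_by(year) %>%
  mutate(cum_prcp = cumsum(prcp)) %>%  # mutate keeps every row, so month and day survive
  ungroup() %>%
  mutate(new_date = ymd(glue("2022-{month}-{day}")))  # every year mapped onto 2022

cum$cum_prcp  # 5 5 2 5: the running total restarts for each year
cum$new_date  # Date values that all fall in 2022
```

Because new_date shares one calendar year, the years overlay on a single January-to-December axis, while the year column is still available for grouping the lines.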
And so then we'll do group equals year, because again, I still want each year to be treated separately, even though we're putting it over kind of an annual x axis. Very good. We now have each year being represented by a different line. There is some kind of oddness in the data. Like, I'm noticing there's a line at the very bottom here that looks like it was dry for August and September and October, like there was no rain, which I kind of doubt. So I wonder if maybe there was a sensor problem that year. But again, I want to try to make an attractive plot, not get so hung up on the actual data. But you know, if I was making this for publication, I might go in and double-check that there was actually data being recorded, or see if there was some epic drought whatever year that was. Okay. So what I'd like to do, again, is to make all 130 years' worth of data from 1892 to 2021 gray, and then put my year on top of that in blue. And so what I'm going to do is create a variable that I'll call this_year. And this will be today. Again, the today function will return today's date. And I can then do year on that, so that this_year is of course 2022. And I can then do is_this_year. And that's going to equal year equal equal this_year. And again, the double equals asks, is the data in the year column the same as the value in this_year? And again, if I run this, I should be able to see that yeah, 1892 was not this year. Right. And what I can then do is color the lines by is_this_year. So I'll do color equals is_this_year. We now see each year colored salmon, and this year's data in that teal. It certainly does seem like it's maybe a little bit wetter than average. So cool. What I'd like to do now is go ahead and try to make this figure a bit more attractive. There's a variety of things here, of course: the color, the dates on the x axis. These aren't really all from 2022, and I'd maybe rather just have the month labeled here, right?
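The this_year flag described above comes down to a vectorized comparison. A minimal sketch, using a two-row toy table rather than the weather data:

```r
library(dplyr)
library(lubridate)

this_year <- year(today())  # the current year, e.g. 2022 on the recording date

# Hypothetical table with last year and the current year
toy_years <- tibble(year = c(this_year - 1, this_year)) %>%
  mutate(is_this_year = year == this_year)  # == compares every row to this_year

toy_years$is_this_year  # FALSE TRUE
```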
And I don't need this legend. So let's get to work on making this look a bit more attractive. Alright, so I will start with scale_color_manual, and we'll do breaks equals FALSE and TRUE. We saw this in the last episode. I'll do values of lightgray, and then we'll do dodgerblue. Again, that gets us our lines in the background and our current year there. I'm going to go ahead and maybe make this year a little bit thicker than the others. And so in addition to color equals is_this_year, I'll do size equals is_this_year. And I'll go ahead and put this on a separate line so it doesn't scroll off the right side of the screen. This always looks very funky. And you get a warning message saying that using size for a discrete variable is not advised. So I definitely need to come in here and do scale_size_manual. And I'll do breaks equals c of FALSE and TRUE. And then for the values, for FALSE maybe I'll make it 0.3, and then for TRUE I'll make it 1. And I need this to be in a vector. So I forgot that. So we'll go ahead and add those parentheses. And so now I see that I have a thick blue line on a whole bunch of thin gray lines. Very cool. I can get rid of this legend by adding guide equals none to each of my scales, or I can come back up to geom_line and do show.legend equals FALSE. That got rid of our legend and opened up a whole bunch of space. I'm going to go ahead and, as I usually do, save this to a file. So we'll do ggsave. And I will then do cumulative_prcp.png. And then my width I'll make six inches, and my height I'll make three inches. And I actually want to put this into my figures directory. So we're going to save that. And so that's the figure that we get out. I think I want to make it a little bit taller. So maybe I'll make the height five, because I'm going to add a title to this. So we've got a starting place. Let's go ahead and clean up the x axis. And so ggplot recognizes that the new_date column was of type date. And so it then treated those as dates, right?
So January, April, July, October, January, from 2022 to 2023. Of course, we don't want that to have the year, we just want to have the month. And so there's a variety of ways that we can format this, all using the scale_x_date function. So again, I can add another scale. We'll do scale_x_date. So normally when we have scale_x_continuous or scale_y_continuous or scale_x_discrete or scale_y_discrete, you can give it the limits, the breaks, the labels, right? Well, you can do the same thing for scale_x_date, except you can then also give it date_breaks or date_labels. So let's start with date_labels. So we'll do date underscore labels. And we can give it a special code. So let's head over to the help. And I'm going to search for scale_x_date. And so this gives us all sorts of scales related to dates and times: scale_x_date, scale_x_datetime, scale_x_time, and there's also the y versions, right. And so you see that you can give it date_breaks and date_labels, and that the codes for the date labels are set by strftime. So coming to that help, we can scroll down and we can then see a handy-dandy cheat sheet, if you will, for different ways to format the date, right? So if you give it %a, then you're going to get back abbreviated weekday names across the x axis. %b would be abbreviated month names, and capital %B would be the full month name. And I think %m is the month number as a decimal number, right? And so there's all sorts of different ways that you can format your dates, right? So what we'll do is, again, date_labels. If I do %m, I expect to get numbers across the x axis. And sure enough, I get 01 through 01. And the other that we saw was lowercase %b. So now I have the three-letter abbreviation of the month. And if I do capital %B, I get the full name of the month. And I can add things to this, right? So I don't really want to do this here, but I could add %Y to get the month and the year. And if I do a lowercase %y, I get the two-digit abbreviation for the year.
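Since date_labels takes strftime-style codes, one way to preview what a code will produce before putting it on an axis is base R's format() on a Date, which uses the same codes. A small sketch; month and weekday names depend on the locale, so only the numeric codes have their output shown.

```r
d <- as.Date("2022-07-22")

format(d, "%m")     # "07"   month as a two-digit number
format(d, "%Y")     # "2022" four-digit year
format(d, "%y")     # "22"   two-digit year
format(d, "%b")     # abbreviated month name, locale-dependent
format(d, "%B")     # full month name, locale-dependent
format(d, "%B %Y")  # codes can be combined with literal text such as a space
```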
So there's a lot you can do to format the labels of these different dates. I want the capital %B to get the month. So this is outputting the labels every three months, right? So January, April, July, October, January. Maybe I want it every two months. Again, scale_x_date makes it very straightforward to do that. We can do date_breaks. You can give it special syntax to define the spacing between your breaks. Again, if we come back to the help, and we go to scale_x_date, then we see that date_breaks is a string giving the distance between the breaks, like 2 weeks or 10 years, right? So let's go in here. And let's do 2 months. And now we get our dates every two months. So February, April, June, August, October, December, and it leaves out January at the beginning and end, which I'm cool with, because I think it's kind of weird to have a January at the beginning and a January at the end. I kind of like having it every two months. Again, what I'm trying to show you here with scale_x_date is a variety of ways that we can manipulate the appearance of the dates on the x axis. You could also do it on the y axis if you had dates on the y axis. There are just a lot of really powerful things you can add in there, and I encourage you to really explore that. The next thing I want to do is go ahead and clean up the theming of the plot. I'm going to turn off the grid lines, turn off the background, and add x and y axis lines. Again, we can do that here by adding theme. I'll do panel.background equals element_blank. And I'll then do axis.line equals element_line. And I'll do panel.grid equals element_blank. Maybe I'll go ahead and put all my panel-related arguments together. And I forgot the parentheses for element_blank with all that talking. So it already looks a lot cleaner. One thing I'd like to do is remove this gap at the bottom, below zero. We would never get a negative cumulative precipitation. So I'm going to go ahead and remove that.
And if you've been watching, you know that there's an argument called expand. And so I can come in here to scale_y_continuous, and I can do expand equals c of 0, 0 as a vector. And what that does is it turns off the expansion. So ggplot naturally adds extra space to the bottom and top of the axis. And so now we see that we start down at zero. We could also remove the expansion on the left and right by coming back to scale_x_date and then doing expand equals c of 0, 0. I'm not totally a fan of that. So I'm going to go ahead and go back to not having the expand on the x axis. I kind of like having that extra space. So what I'll do now is go ahead and work on the labels. I think I'm also going to transform the y scale into centimeters. So one way we could do that is with mutate, where we could take cum_prcp and divide it by 10 to get it in centimeters. We could also use labels and breaks to do that. I think I'm going to do labels and breaks with scale_y_continuous. I'll do breaks equals seq from 0 to 1200 by 300. That was what we had before on the current version. And then labels equals seq from 0 to 120 by 30. Right. So effectively, my labels are my breaks divided by 10. I can then add labs. I'll say x equals NULL, because it's obvious those are months. y is going to be cumulative precipitation, and that's in centimeters, right? So we're going to add that. And I think I'll make that y go up to 120. And we can do that with limits in scale_y_continuous, limits from 0 to 120. Do you know what I did? I'm getting warning messages about removing 33,000 rows. And in my plot, I only have a couple of months' worth of data. Well, I made it go to 120. But that's the label. The actual data goes up to 1200. So I need to set my limits from 0 to 1200. Then it should work much better. Right. So now we go from 0 up to 120. And everything looks pretty good.
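The axis settings worked out above can be put together in one sketch. The data frame toy below is synthetic (two made-up years of steadily accumulating precipitation in millimeters), so only the scale and theme calls reflect the episode. The point to notice is that limits are given in the units of the data (mm), even though the labels are shown in centimeters.

```r
library(ggplot2)

# Synthetic stand-in data: cumulative precipitation in mm on 2022 pseudo-dates
days <- seq(as.Date("2022-01-01"), as.Date("2022-12-31"), by = "day")
toy <- data.frame(
  new_date = rep(days, times = 2),
  year = rep(c(2021, 2022), each = length(days)),
  cum_prcp = c(seq_along(days) * 2, seq_along(days) * 3)  # tops out below 1200 mm
)
toy$is_this_year <- toy$year == 2022

p <- ggplot(toy, aes(x = new_date, y = cum_prcp,
                     group = year, color = is_this_year)) +
  geom_line(show.legend = FALSE) +
  scale_color_manual(breaks = c(FALSE, TRUE),
                     values = c("lightgray", "dodgerblue")) +
  scale_x_date(date_labels = "%B", date_breaks = "2 months") +
  scale_y_continuous(breaks = seq(0, 1200, 300),  # break positions in mm
                     labels = seq(0, 120, 30),    # displayed as cm
                     limits = c(0, 1200),         # limits are in mm, not cm
                     expand = c(0, 0)) +
  labs(x = NULL, y = "Cumulative precipitation (cm)") +
  theme(panel.background = element_blank(),
        panel.grid = element_blank(),
        axis.line = element_line())
```

Dividing cum_prcp by 10 in a mutate step would be the alternative mentioned above; then breaks, labels, and limits would all be in centimeters.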
I want to add a title, but I want to be able to say in the title whether we are above or below what we'd expect at this point in the year. So what I'm going to do is go ahead and generate a line through the data to indicate the average. So to add a line for the average, I'm going to use geom_smooth. You could of course go in and calculate the average for each value of new_date and then add that line. I'm going to do it with geom_smooth. I think it'll be close enough. This is looking pretty funky. And I'm realizing that what it's doing is it's fitting a line through each year's worth of data. So I want one line across all years. So again, we can come up here to geom_smooth, and I'll do aes group equals 1. And so that means that I only want one grouping. I'm basically making everything part of one group for the purposes of running geom_smooth. Great. So now we have our smooth line through the data. I think I'm going to go ahead and make that line thinner and black, so it's not so bold or confusing to the audience. So we'll come back up to geom_smooth, and I'll then do color equals black. And then we'll do size equals 0.3. One thing that the smooth line has on it, I think, is the standard error. There's so much data here that the standard error is really small. Of course, you could come up here to geom_smooth and go ahead and do se equals FALSE. And you can't really tell a difference. But anyway, that's something I like to do, because I don't really like having that standard error cloud around the fitted line. Now what I want to add is a title to my plot. I can do that back here in labs by doing title equals, and I'll say, Through July 22, the cumulative precipitation near Ann Arbor, Michigan is above average for 2022. Okay, so my title of course goes off the right side of the screen. I would also like to have my title be left-justified on the plot. And so I can take care of that with two things.
I'll come up here to the top of my theme function and give it plot.title.position equals plot. That will left-justify it. Also, I'm going to then do plot.title. And I will then say element_textbox_simple. And so element_textbox_simple comes to us from the ggtext package. And this will automatically wrap my title to the size of the plotting window. It will also allow me to use markdown to format my title with color or with bolding or any of the other things, using markdown or HTML. But to use that, I need to load ggtext. So I'll come back to the top here. And I will then do library ggtext. And I'll go ahead and save my R script, because ggtext shouldn't be loaded into R yet. So I'll put this into code, and I'll call it cumulative_prcp_plot. And so now it's saying package ggtext required but it's not installed, install or don't show again. Yes, I want to go ahead and install that. So I'm not sure what happened. I had to restart RStudio and run everything again. So I'm going to go ahead and install ggtext. I can then go ahead and run library, as well as my plot. And so now you see that I've got the wrapping of the title. It might be good to have a little bit of a margin below the title. But again, that element_textbox_simple wrapped it for me, and that plot.title.position equals plot left-justified the title to the overall plot, not to the y axis line. I can also then, in my plot.title, do margin. And again, you give it the margin function, and the arguments are t, r, b, l, for top, right, bottom, and left. And so I'm going to put a bottom margin, let's do 10. And so yeah, that gives us a fair amount of space below the title. What I'd like to do is, instead of hard-coding July 22 and 2022, I want to go ahead and programmatically insert that. I'd also like to make above average blue. So let's go ahead and do the above average first. And so again, we can do that by inserting HTML. I'm going to do that with a span tag. So we'll do span style equals.
And then in quotes, we'll do color. I guess it needs to be a single quote, because the outside for the title is a double quote. So I'll do single quote, color colon dodgerblue, single quote, and close out that tag. And then over here, I'll do the closing tag of forward slash span. And so now I see above average is blue. That's great. So now I'd like to go ahead and replace the date elements. And so again, up here I have this_year. Maybe I'll repeat this a couple of times for this_month and this_day. Right. And so again, the day function and the month function are our friends here. Get that loaded. And then I can use glue on the title. So I'll do glue on this and get my closing parentheses. And then because I'm at the end here already, I'll go ahead and replace this 2022 with this_year in curly braces. And then here, I'll replace July with this_month. And then here I'll do this_day. And so I get, Through 7 22, the cumulative precipitation near Ann Arbor, Michigan is above average for 2022. Again, I would like that to be July and then 22. So let's come back up here to where I defined this_month. And again, for month of today, we get 7. And there are a couple of arguments for month. So I can do question mark month to pull that up. And I get the label and the abbreviation. So I want the label, right? So instead of 7, I want July. And I don't want the abbreviation. I don't want Jul, I want July, right? So I'll do label equals TRUE and abbr equals FALSE. And now let's see what this gives us. It should give us July. Sure enough, it gives us July. It's also a factor, again, because months are ordinal. And so now I have this_month. And now if we come back and run it, we should have July in the title. So now we have July 22. But I'd like to have July 22nd. And so thankfully, from the scales package, which is installed along with ggplot2, there is a function called ordinal. So I can take, again, this_day, the day of today, which is 22, and do ordinal on day of today.
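The pieces of the programmatic title described above can be sketched together as follows. A fixed date stands in for today() so the result is reproducible, and the variable name plot_title is only for illustration. The ordinal function is called as scales::ordinal() so the sketch does not depend on the scales package being attached.

```r
library(lubridate)
library(glue)

d <- ymd("2022-07-22")  # fixed stand-in for today()

this_year  <- year(d)                               # 2022
this_month <- month(d, label = TRUE, abbr = FALSE)  # ordered factor; full month name
this_day   <- scales::ordinal(day(d))               # "22nd"

# Single quotes inside the span so the outer string can use double quotes
plot_title <- glue(
  "Through {this_month} {this_day}, the cumulative precipitation near ",
  "Ann Arbor, Michigan is <span style='color: dodgerblue'>above average</span> ",
  "for {this_year}"
)
```

The span only renders as colored text when the title is drawn with a markdown-aware theme element such as ggtext's element_textbox_simple(); with the default element_text() the tag would appear literally.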
And that's complaining that it can't find ordinal. And what we want is scales::ordinal. Again, I could do library scales. Maybe I'll go ahead and do that. Right. And then here we'll use ordinal. And so then this_day should give us 22nd. And it does, that's great. Now when we come and rerun all this with the plot, we now see, Through July 22nd, the cumulative precipitation near Ann Arbor, Michigan is above average for 2022. We'll make that title just a little bit bigger. And then we'll call it a day. I'll come up here and add to my plot.title within the element_textbox_simple. Let's do size equals 16. So that's larger and more striking. I'm pretty happy with the way this looks. Again, I think this all came together really nicely. What I was trying to focus on in this episode, as I mentioned, was how we can work with dates within the tidyverse, using all the goodness that comes to us from the lubridate package, whether it's extracting things like the year, the month, the day, the weekday, or the day of the year, or working with months on the axes. These tools in the tidyverse make it just so much easier to work with dates than trying to do this on your own using base R. Take my word on that one. All right, well, I hope you also can see how we can use ggplot, dplyr, and all these great tools to answer interesting questions in novel ways. Let me know what you think of this. Let me know if you have other ideas for ways that we can visualize this information. A few of you have left comments already, and they have seeded ideas for future episodes. So thank you for the feedback, and know that in the next couple of episodes, I will be kind of digging into some of the ideas that have been inspired by the comments. So thank you very much. And we'll see you next time for another episode of Code Club.