 Hey folks, one of the things I love about living in Michigan is that we have four, count them, four seasons. I actually like winter. I like snow. Believe it or not, there's actually people that live in Michigan that hate snow. I mean, why? That'd be like living in Arizona and hating the heat, right? Anyway, what I want to do in today's episode is build off of some of the things we've been talking about in the last few episodes and kind of repackage them to look at a different set of questions. Namely, I want to look at snow accumulation. I want to do it by year and by month. However, I don't want to do it by a calendar year. I want to do it by what I'm going to call a snow year, right? So you can imagine from July through June, that would be a snow year, whereas from January to December, that would be a calendar year, right? So here in Michigan, as extreme as the weather is, we don't actually get snow in July, believe it or not. And so you wouldn't expect to see any snow in these summer months. But you know, coming into perhaps September, October, November, December, you would start to see snow. So what I want to do in this episode is to build two plots. In the first, we are going to look at the total snowfall by snow year over the past 130 years using data from here in southeastern Michigan. The second plot that I want to make with you is to break that down by month, right? And so on the X axis, we'll perhaps have it go from July to June. And then each line in that plot would be a different snow year with the total monthly snowfall. As always, if you want to get the code that I'm starting with here, go down below in the description. There's a link to a blog post that'll get you everything you need to get going. I'm starting with this code local weather dot r script. That's also in that repository you can find. And this will download the local data for a NOAA based weather station just outside of Ann Arbor. It's about 15 miles from where I live. 
So I consider that local. It goes back to the year 1891. That's great. So that all gets read in and creates a variable called local_weather. That outputs a data frame that has the date, the maximum daily temperature, the amount of precipitation, and the amount of snow by day. Precipitation and snow are in millimeters. All I care about from this is the date and the snow. Also, I'm not interested in those days where we have an NA value. So we'll go ahead and clean this up by doing select on date and snow. And we'll also then do a drop_na. I'll go ahead and put snow in again: drop_na will remove any row where there's an NA value in a specific variable, right? So snow. If I didn't put in snow, then drop_na would remove any row where any column had an NA value. And so now what we see is that our date and our snow no longer have any of those NA values. We're in good shape. I know that I would like to get the year as well as the month. So let's go ahead and pull that out. We'll do a mutate to get year using the year function on date. And we'll also then do month equaling the month function on date. And of course, this gives us our calendar year as well as the month. And I think I'm going to go ahead and call this cal_year to get the calendar year. So now what I want to generate is the snow year. So we'll do snow_year. And I'm going to use an if_else statement to build this, right? So we're going to take the value in the date column. And if that's less than July 1st of that year, then snow_year is going to be cal_year minus one. If it's not, then it's going to be that cal_year, right? So we need to build out the date for July 1st of that year. We can do that with the ymd function wrapped around the glue function. And then within glue, we'll use quotes, and then the curly braces, where we'll put cal_year. So for this year, that would be 2022-07-01.
And so that then will basically make a date variable of July 1st of whatever year corresponds to that date, right? And so if that's true, I think I'm in the right spot here. Yeah, if that's true, then snow_year is going to be cal_year minus one. If it's false, it's going to be cal_year. And so now what we get is 1891 for the snow year for this date of November 1st. That works, right? Let's go ahead and do tail. Because again, I'm here in August. And that should be 2022, right? That's the beginning of the 2022 snow year. So why don't we go back, let's say 90 days. So we'll do tail n equals 90. That should get us, yeah, into May. And so again, with my logic, May 5th, 2022 was part of the 2021 snow year. And as you see, we didn't get much snow in May. Thank goodness. All right, cool. So we have our snow year and our cal year. Let's just remind ourselves what this output looks like. We have the date, the snow, the calendar year, the month, the snow year. All I really care about at this point is the snow year, the month, and the amount of snow. So let's go ahead and simplify things down a bit, because I get very easily confused between all these different year variables and dates and whatnot. So we'll do a select and we'll do month, snow_year, and snow. So that gives us our three columns. Awesome. I'm going to go ahead and call this snow_data to make that a data frame that I'll use to create our two figures. Okay. So again, I'll take snow_data. And the first plot that I want to make is a plot for each snow year of the accumulated amount of snow for that year, right? So we can go ahead and do a group_by on year. And we can then do a summarize on snow to make a new variable. I maybe can call it total_snow, as the sum on snow. So column year is not found. Of course, this should be snow_year, not year. See, I get confused very easily. All right. Naming things is hard, right? So we've got snow_year and total_snow. Awesome.
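The snow-year logic and the yearly totals described so far can be sketched roughly like this. The toy local_weather table and its values are made up for illustration (the real one comes from the downloaded station data), and the column names tmax and prcp are assumptions, so treat this as a sketch rather than the exact script from the episode.

```r
library(tidyverse)
library(lubridate)
library(glue)

# Hypothetical stand-in for local_weather; snow is in millimeters
local_weather <- tibble(
  date = ymd(c("2021-11-28", "2022-01-15", "2022-05-05",
               "2022-08-01", "2022-11-20")),
  tmax = c(1.1, -3.9, 18.3, 30.0, 0.6),
  prcp = c(2.5, 4.1, 0, 0, 3.0),
  snow = c(25, 102, 0, NA, 51)
)

snow_data <- local_weather %>%
  select(date, snow) %>%
  drop_na(snow) %>%                      # only drop rows where snow is NA
  mutate(cal_year = year(date),
         month = month(date),
         # dates before July 1st belong to the previous snow year
         snow_year = if_else(date < ymd(glue("{cal_year}-07-01")),
                             cal_year - 1,
                             cal_year)) %>%
  select(month, snow_year, snow)

# Total snowfall per snow year (note: group by snow_year, not year)
totals <- snow_data %>%
  group_by(snow_year) %>%
  summarize(total_snow = sum(snow))

totals
```

With these made-up rows, the November, January, and May dates all land in the 2021 snow year, and the November 2022 date starts the 2022 snow year.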
We can then pipe this to ggplot with aes x equals snow_year. Got it right that time. y equals total_snow. And then we can do geom_line. Very good. We've got our total amount of snowfall by year. And I can see that 1891 is quite low. It started in November. And I suspect there may have been snow in October and maybe even September. So I'm going to remove that 1891, because I don't totally trust it. The other thing I notice is that we've got 2022, right? Again, I'm here in August. It's like 90 degrees out, at the same time with a bazillion percent humidity. There's no snow, right? So I can go ahead and remove that 2022. I'll come back up here and I'll do a filter for snow_year not equal to 1891 and snow_year not equal to 2022. Again, if you're watching this a year from now, you're going to want to adjust this for your own use. And again, if we run snow_data through this whole pipeline, we get the plot. We have seen this kind of plot before with precipitation, just not the snow component. And I will leave it to you to see if you can take these five lines of code here and make this plot look a little bit more attractive. What I would like to move on to is a different type of plot. It is actually kind of similar to something we did earlier with temperature. What I would like to do is to put the months across the x-axis and then have the y-axis be total snowfall by month. And I want to connect that with a line, with each line representing a different year. Okay. So again, I'll go ahead and take snow_data and bring that down here. And what we can do is go ahead and put it in a ggplot, right? We'll do ggplot aes. So again, on the x-axis, I want the month; for the y, I want the snow. And then for the group, I'll use snow_year. And we'll do a geom_line. All right. So that doesn't look so wonderful. Again, we see the months are going from one to 12, as we'd expect. I want to change that ordering, right? I want it to start at seven and go to six, right?
Seven through 12, and then one through six. To do that, I'm going to make month into a factor that has a different order than chronological or numerical, right? So again, we can do mutate on month, factor month, and then levels. I'm going to make this as a vector. So we'll do seven colon 12, and then one colon six. And again, what this is going to do is create a vector that goes seven through 12, and then one through six. And so now it should plot month on the x-axis in this order, right? I think if it sees it's a factor, it'll plot month as a discrete variable rather than as a continuous variable. And I need to put a pipe at the end of there. So we're getting these spikes, and it occurs to me that I forgot to group by the month and then sum within the month. Why didn't you all say something, right? Again, we've got snow_data where, in this case, 1892, the month of November has four observations, right? And so the spikes that we see over here in the plot are those four observations being represented as a vertical spike. I need to sum those together, right? So again, we can do a group_by on month, and we can then do a summarize, we'll say snow equals sum on snow, and pipe that into everything else. And this should solve that problem. So I'm now getting an error that object snow_year is not found. And when you get errors like this, and you've got a pipeline, even if it's only the six lines I've got here between lines 28 and 33, what I encourage people to do is walk through it line by line, right? So snow_data, good, no errors. And then group_by month, let's run that. Still good, no problems. We see it's grouped by the month. And then we'll run the summarize. And so now what I see is that I've got a tibble with 12 rows and two columns. Basically what it's done is exactly what I told it to do, right? It's summed the total snowfall by month over the 130 years. However, I want each month by year, right?
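Here is a minimal sketch of the two fixes being worked out here: ordering the months from July to June with a factor, and summing daily snowfall within each month of each snow year so the vertical spikes go away. The small snow_data table is made up for illustration, and dropping the leftover grouping with .groups = "drop" is just one way to handle it.

```r
library(tidyverse)

# Hypothetical daily observations (snow in mm)
snow_data <- tibble(
  month     = c(11, 11, 12, 1, 1, 11),
  snow_year = c(1892, 1892, 1892, 1892, 1892, 1893),
  snow      = c(25, 51, 76, 102, 13, 38)
)

monthly <- snow_data %>%
  # group by BOTH variables; grouping by month alone collapses all years
  group_by(snow_year, month) %>%
  summarize(snow = sum(snow), .groups = "drop") %>%
  # a factor with July-to-June levels makes ggplot treat month as discrete
  mutate(month = factor(month, levels = c(7:12, 1:6)))

p <- ggplot(monthly, aes(x = month, y = snow, group = snow_year)) +
  geom_line()

levels(monthly$month)
```

Grouping by month alone would return 12 rows with no snow_year column, which is why the later aes(group = snow_year) fails with an "object not found" error.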
So I need to group by snow_year and month. And so the reason we got that error is that when we come down here and group by snow_year with this as the input to ggplot, there is no snow_year, right? Because we summarized on month, getting rid of that snow_year variable. Now when we run this, again, if we just run those first three lines, we see that we've got the year, the month, and the snow, and it's still being grouped by snow_year. I want to go ahead and remove that, because I find that grouping can sometimes cause undesired side effects down the road. So we can remove that grouping by snow_year by doing .groups equals "drop". And again, if we look at the output of those three lines, we see that our grouping is gone. And if we generate the plot, sure enough, we see a much better looking figure, right? We now see it going from month seven out to 12, and then one to six, with each year being a different line. We can maybe make this look a little bit more attractive by doing color equals snow_year. So now we see the more recent years are the lighter shades of blue, and the lines that are darker are further back in time. This is a pretty epic February, it looks like. I think this was back in the 1920s or so, if I remember right from looking at this earlier. Something I also notice in here is that there are a few lines like this one here. There's also another here, where it doesn't fall back, right? Like there's no October, September, or earlier data from that snow year. So what I'd like to do is make sure that every year and month is represented in the data frame. And so thinking about this, I think what I would like to do is to create a dummy data frame that has a bunch of zeros, okay? And so what I could do is make a dummy data frame with all combinations of years and months, and then a dummy value of zero, right? And so I can join that with my snow_data, and I can then add that dummy value to the snow amount, right?
That way, I can be guaranteed that every month and year has at least zero represented, right? Because we can see that there are some months for some years that don't have any data at all. And so let's see how we can do that. So I'm going to create this dummy data frame. And we can use the crossing function from tidyr, which comes with the tidyverse. So we'll do crossing with year equals 1892 to 2021. Again, I don't have the 2022 snow year, because that's just starting here in July, right? And we removed the 1891 data. And then we'll make a month from one to 12. And so what crossing will do is take all values of year and all values of month and cross them together so that we get 12 months for all those years. And so now what we see is we've got 1560 rows and two columns; for every year we've got every month. I can then go ahead and add my dummy value. So I can do mutate dummy equals zero. And so now if I look at dummy_df, we get our three columns, wonderful. We can then join that together. Well, you know what, I want this to be snow_year, not just year, right? snow_year. And let's reload that. And so then we can do inner_join. And we will then do snow_data with dummy_df. And we'll say by. And then we're going to join it by two variables, right? So we'll do snow_year. And we'll do it by month. And so this then gives us our composite data frame with the month, the snow_year, the snow, and the dummy. So I did an inner join because I always do an inner join, right? But an inner join is going to remove those cases where we don't have a specific month or year. So what I actually would rather do would be a right_join. With this right join, we're going to have NA values for snow for some months. So let's go ahead and do a filter. And we'll do is.na on snow. And so now we see that we've got 225 month and year combinations that were zero, or NA, sorry, for snow, right?
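The dummy data frame and the right join can be sketched as follows. The two-year range and the tiny snow_data table are made up to keep the output small; with the full 1892 to 2021 range, crossing gives 130 × 12 = 1560 rows, as seen in the episode.

```r
library(tidyverse)

# Hypothetical monthly totals with several months missing
snow_data <- tibble(
  month     = c(11, 12, 1),
  snow_year = c(1892, 1892, 1892),
  snow      = c(76, 76, 115)
)

# Every combination of snow year and month, plus a placeholder value of zero
dummy_df <- crossing(snow_year = 1892:1893, month = 1:12) %>%
  mutate(dummy = 0)

# right_join keeps every row of dummy_df; months with no data get snow = NA
joined <- right_join(snow_data, dummy_df, by = c("snow_year", "month"))

# How many month and year combinations had no snow data?
joined %>%
  filter(is.na(snow)) %>%
  nrow()
```

An inner_join on the same inputs would return only the three rows present in both tables, which is why the missing months never show up with it.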
So how do we deal with that? Well, we want to make a new snow column that has zero for these cases, right? So to do that, I think what we will do is a mutate. And I'm going to do snow_dummy. And then we'll do if_else snow is NA, right? So is.na on snow. So if snow is NA, then we will use dummy, or we could just use zero, right? But let's use dummy. We made it, whatever. If it's not NA, then we will use the value of snow. And so now we can see that snow_dummy has the value of zero, whereas snow had NA, right? So instead of calling this snow_dummy, I'm going to write over the snow column. And so we can then see what this looks like. And now we see that those 225 rows that had NAs for snow are now zeros, right? Awesome. So I'm going to go ahead and remove that filter, because we want to look at all the data. And so we can then feed this into the rest of the pipeline. And let's see if those loose ends go away. And so sure enough, we now see that the line that ended right here around November goes down to zero for the rest of the months, the rest of the year. And then there was something over here also. I think we noticed one that was missing some May and June data, I believe, right? Cool. So now we have all the data; even those months that had NA values are now being represented as zeros. That is good. And now what I want to do is go ahead and make this plot look a bit more attractive. Because these lines colored by year aren't really informative, what I'd rather do is turn all of the lines gray, except for 2021, the most recent snow year that we have. I will start by removing that legend, because it's just not helpful. So we'll do show.legend equals FALSE. Let's go ahead into our colors, right? And so what we can do would be scale_color_manual, and we'll do name equals NULL.
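The NA-to-zero step described above might look like this on a small made-up joined table. Using the dummy column inside if_else mirrors what is done in the episode; a literal 0 would behave the same here.

```r
library(tidyverse)

# Hypothetical result of the right join: October has no snow data
joined <- tibble(
  month     = c(10, 11, 12),
  snow_year = c(1892, 1892, 1892),
  snow      = c(NA, 76, 76),
  dummy     = 0
)

filled <- joined %>%
  # overwrite snow: use the dummy zero where snow is NA, otherwise keep snow
  mutate(snow = if_else(is.na(snow), dummy, snow))

filled
```

As a design note, tidyr's replace_na(snow, 0) or dplyr's coalesce(snow, 0) would be shorter ways to get the same column without needing the dummy value at all; that is a swapped-in alternative, not what the episode uses.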
And we need to create a variable that we are going to color by, right? So before, where is ggplot, right here, we colored by the snow_year. And instead I will say is_this_year. And so I need to make a variable is_this_year, right? And so we can come in here into our mutate statement and say is_this_year equals 2021 equals equals snow_year, right? So if the snow_year is 2021, then is_this_year will be TRUE. Otherwise, it's going to be FALSE. Again, if you're watching this in a few months or next year, you'll need to adjust this, right? Cool. So we'll then map is_this_year to color, and that's going to be TRUEs and FALSEs. And so the breaks will be TRUE and FALSE. And then the values are going to be dodgerblue for TRUE, if it's this year; otherwise, it's going to be gray, right? And I don't need to worry about labels because I'm not showing the legend. So now we see 2021 is lit up in blue. And we have all of those other years' worth of data as those gray lines. So before I start changing the theming, I'm going to go ahead and save this to a PNG file. So we'll do ggsave. And we will call this snow_by_snow_year.png. And then I'll do width equals six, height equals four. And I need to go ahead and put this in my figures directory. So this is the size of the final outputted figure. I will use this as the thumbnail for this video, so this should look familiar if you clicked on the video. Now what I want to do is go ahead and get rid of the background grid. I want to make the background white. And I want to make the axes solid. So we can again do that with theme. And we'll do plot.background equals element_blank, which will get the background to be blanked out. And then panel.grid equals element_blank, which will get rid of those grid lines. And then we can do axis.line equals element_line. And that will give us solid x and y axis lines. So I did something funky. Well, we've got our solid lines.
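The highlighting approach just described (one logical column mapped to color, then two manual colors) can be sketched like this. The monthly table is made up, and the commented-out ggsave path is a guess at the file name and assumes a figures/ directory exists.

```r
library(tidyverse)

# Hypothetical monthly totals for three snow years (snow in mm)
monthly <- tibble(
  snow_year = rep(c(2019, 2020, 2021), each = 3),
  month     = factor(rep(c(12, 1, 2), times = 3), levels = c(7:12, 1:6)),
  snow      = c(100, 250, 300, 50, 400, 120, 200, 310, 560)
)

p <- monthly %>%
  mutate(is_this_year = snow_year == 2021) %>%   # TRUE only for the 2021 lines
  ggplot(aes(x = month, y = snow, group = snow_year, color = is_this_year)) +
  geom_line(show.legend = FALSE) +
  scale_color_manual(name = NULL,
                     breaks = c(TRUE, FALSE),
                     values = c("dodgerblue", "gray"))

# Assumed file name and directory; uncomment if figures/ exists
# ggsave("figures/snow_by_snow_year.png", p, width = 6, height = 4)
```

Keeping group = snow_year matters here: without it, all the FALSE rows would be drawn as a single gray line instead of one line per year.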
The background is gray. And the interior is still a different gray, right? And so I think what I did wrong was I used plot.background rather than panel.background. So I made the plot background, that area that was this darker gray, basically transparent, right? Whereas this is the same color it was originally. So let's try this again with panel.background equaling element_blank. And that looks much better, right? So now we have a white background for the panel and the plot. So let's turn our attention now to the x-axis and those months. I think I am going to go ahead and try to center the data to start August 1st, going through December and then January through the end of July. So we'll come back up here. And I think there were a couple of places where we did that. So here, where we're defining month as a factor, we can go eight to 12, and then one to seven. Let's see if that does it. So that moves everything over a month. And that is better centered. I think I prefer that. So now I want to go ahead and put on month labels. And maybe I'll start with September at nine. And so we can do that by doing scale_x_discrete. Again, I made the x variable, the month, a factor. And so it's not a continuous variable anymore, right? And so I can then do scale_x_discrete breaks equals, and we'll do nine, eleven, one, three, and five. And then what we'll do will be the labels. And we can then do month.abb on these, right? And so month.abb is the three-letter abbreviations for each month. And I can then index into that with my different break values. And I put a pipe rather than a plus sign. I'm going to go ahead and remove the x-axis expansion. And so again, back up here in scale_x_discrete, I can do expand equals c(0, 0). Before I run that, let me just show you what that's going to do. There's a gap here between where the blue line starts and the y-axis. And there's a gap between where it ends and the end of the x-axis. expand equals c(0, 0)
will remove that extra spacing, right? So we'll go ahead and run that and see. So now we see that the blue line goes right up to the y-axis. I like leaving a little bit of a gap on the y-axis, I think just because if we didn't, then the line would be right down on the floor. So let's see what that looks like, actually. We can go ahead and do that easily enough. We can do scale_y_continuous. And we can then do expand equals c(0, 0). And so now we see that it's right at the floor. And I can't even see my blue line for those other months, right? So I like having a little bit of space at the bottom there. So I'll go ahead and remove that scale_y_continuous. Actually, I'm going to leave it there, but I'll get rid of this expand. We talked about doing this in a previous episode, where we used breaks and labels, much like we did for scale_x_discrete. Let's go ahead and do that here just to practice doing it again. So again, our values go from zero to 1500 by 500. So we can do breaks equals seq from zero to 1500 by 500. So seq will make a vector of values going from zero to 1500 by 500, right? So these will be our four breaks. And then we can do labels equals seq from zero to 150 by 50, right? And so now we see we've got centimeters rather than millimeters. Let's now go ahead and change our labels on those axes. We can do that, of course, with the labs function. And so x, I'm going to get rid of that with NULL, because I think if you see Jan or Mar, you're going to know that it's January and March, and those are months, right? And then y will be total monthly snowfall, and we'll do cm to indicate the units. So to finish this off, what I'd like to do is go ahead and put in an informative title. And so what we could do is perhaps put in something like, the 2021 snow year had a total of however much precipitation. Why don't we do that, right? So I'm going to go ahead and take snow_data down here.
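Pulling the axis and theme tweaks together, a rough sketch might look like the following. The single-year monthly table is made up, and the helper name month_breaks is introduced here only for illustration.

```r
library(tidyverse)

# Hypothetical monthly totals for one snow year, ordered August to July (mm)
monthly <- tibble(
  snow_year = 2021,
  month     = factor(c(8:12, 1:7), levels = c(8:12, 1:7)),
  snow      = c(0, 0, 0, 180, 250, 560, 420, 30, 9, 0, 0, 0)
)

month_breaks <- c(9, 11, 1, 3, 5)   # label every other month, starting in September

p <- ggplot(monthly, aes(x = month, y = snow, group = snow_year)) +
  geom_line() +
  scale_x_discrete(breaks = month_breaks,
                   labels = month.abb[month_breaks],  # "Sep" "Nov" "Jan" "Mar" "May"
                   expand = c(0, 0)) +                # no padding left or right
  # breaks are in mm, labels in cm; breaks beyond the data range are not drawn
  scale_y_continuous(breaks = seq(0, 1500, 500),
                     labels = seq(0, 150, 50)) +
  labs(x = NULL, y = "Total monthly snowfall (cm)") +
  theme(panel.background = element_blank(),  # panel, not plot, background
        panel.grid = element_blank(),
        axis.line = element_line())
```

Relabeling the breaks converts the displayed units without touching the data; dividing snow by 10 before plotting would be an alternative that keeps the breaks and labels identical.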
And we'll go ahead and pipe that to a group_by on year. snow_year, not year. See, I keep doing that. I'm so used to doing that in all these episodes where we're looking at annual data. Then summarize total_snow equals sum on snow. And that gets us all of the years, right? But of course, I want to filter to get year equals 2021. And so again, man, I can't help myself, can I? snow_year. Great. So now we have that total snowfall. And I'm going to go ahead and save that. And let's go ahead and do a pull on total_snow. Maybe we can also divide it by 10 before we pull it, right? So we'll do mutate total_snow equals total_snow divided by 10. And I notice I didn't pipe that into the pull. All right, so let's try that again. So now total_snow is 144.9 centimeters of snow, right? Great. Okay, so now we can take that and let's come down to our labs. And we can then do title equals the snow year 2021 had a total of, and then in those curly braces, we'll put total_snow, cm of snow. All right, but we need to wrap that in the glue function, right? So glue all that together. And so we see the snow year 2021 had a total of 144.9 cm of snow. I'd like to move this over to the left. And I'm going to want to go ahead and make snow year 2021 that blue color. So let's do that. Again, in theme, we can do plot.title.position equals "plot", and that'll get it moved over a bit. And then we're going to want to do plot.title equals element_markdown. And so that will render HTML and markdown in our title. That comes to us from the ggtext package. So I need to go ahead and make sure I've got that loaded. I'm not sure if I loaded it in that local_weather.R script, but we'll go ahead and do it here. So that's bumped it over to the left. We need to insert the color now, right? And so in here, we can do a span. And we'll then do style equals color. And then we'll do dodgerblue. Close that. And then we'll close out the span tag. Wonderful.
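A sketch of the title step, assuming the ggtext package is installed. The snow_data values are invented so that the 2021 total happens to come out to 144.9 cm, and the exact spacing inside the span's style attribute is my own choice rather than something the episode spells out.

```r
library(tidyverse)
library(glue)
library(ggtext)

# Hypothetical monthly totals (snow in mm)
snow_data <- tibble(
  month     = c(11, 12, 1, 2),
  snow_year = c(2020, 2021, 2021, 2021),
  snow      = c(300, 250, 560, 639)
)

# Total for the 2021 snow year, converted from mm to cm, as a single number
total_snow <- snow_data %>%
  group_by(snow_year) %>%
  summarize(total_snow = sum(snow)) %>%
  filter(snow_year == 2021) %>%
  mutate(total_snow = total_snow / 10) %>%
  pull(total_snow)

# glue fills in {total_snow}; the span colors part of the title
title_text <- glue(
  "The <span style = 'color: dodgerblue'>snow year 2021</span> ",
  "had a total of {total_snow} cm of snow"
)

p <- snow_data %>%
  filter(snow_year == 2021) %>%
  ggplot(aes(x = factor(month, levels = c(8:12, 1:7)), y = snow, group = snow_year)) +
  geom_line(color = "dodgerblue") +
  labs(title = title_text, x = NULL, y = "Total monthly snowfall (mm)") +
  theme(plot.title.position = "plot",       # align title with the whole plot
        plot.title = element_markdown())    # render the HTML span in the title
```

To highlight a different snow year, the year inside filter() and the year in the title text would both need to change, so it may be worth storing that year in one variable and reusing it in both places.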
We now see that we've got an informative title where we're tying the color, the snow year 2021 to the color of that line. And I'm pretty happy with the way that looks. I would love to see what this looks like for your neck of the woods. I'm thinking of my friend Dave Baltris down in Arizona where I don't think they get any snow. I suspect he'll have a flat line, right? Anyway, if you are watching this in a few months when it has, you know, gone beyond say November of 2022, I'd be really excited to see what your plot looks like. Also, I would encourage you to think about how you'd manipulate it so that instead of showing 2021, show the 2022 snow year, right? And again, you can do this with any other year that you might be interested in, you know, perhaps go in and figure out what was this year that had the really high snowfall for February? What did the other months look like, right? Can you turn that line to be Dodger blue and look at the total amount of snowfall for that year? Again, there's all sorts of things you can do to explore this code and learn to use these wonderful tools just so much better. All right. Well, I hope you found this useful. And we'll see you next time for another episode of Code Club.