 Hey folks, in recent episodes of Code Club here, I think I've made a pretty obvious point that I love living in Michigan. I like the snow, and I also like the water. Michigan is well known for water. We have a number of great lakes surrounding us, Lake Michigan, Lake Huron, Lake Superior, Lake Erie. We have many rivers and smaller sized lakes. Of course we have water disasters like what happened in Flint. At the same time, I am struck by news stories coming out about Lake Mead in Nevada. Lake Mead is a man-made lake, human-made lake, that was created as a reservoir to basically give water to the whole Southwest. It is at historically low levels. I think I saw recently that it's at a level that it hasn't been at since Lake Mead was created. They're even finding dead bodies. I mean, what's up with that? Anyway, this got me thinking about the whole concept of drought. Using a little bit of research, you'll quickly realize that there's many ways to define drought. One way that we think about drought is that it's a lack of precipitation compared to what you would normally expect for that place and that time of year. You could say, well, the Sahara is under a drought. If the Sahara has a desert and it doesn't get precipitation, then that's kind of what you would expect. It's not really a drought. There's also other types of drought. You might think of hydrological drought, where there's basically no water in the ground. Agricultural drought, where there's not enough water to support the plants that they're. There's ecological. There's economic drought, right? Well, the type of drought that I'm going to look at in today's episode of Code Club is called meteorological drought. It took me a few times to figure out how to say meteorological. I still can't do it. Anyway, meteorological drought is that idea that for a given time and place, you'll have an expected amount of precipitation. This brings up, I think, a really interesting concept of how do you measure and quantify metrics over a sliding window of time, right? And so we've done a pretty good job of looking by month or by year. But say I want to look at a 30-day interval or a 14-day interval or a 10-day interval or a 100-day interval, right? How would I create a, say, 100-day window and then slide that over the past 130 years' worth of weather data that I have for my local NOAA weather station to look at the total amount of precipitation over those 100-day windows? If you want a copy of the project that I am working with, down below in the description is a link to a blog post that'll get you all the information you need for getting all this code off of GitHub so that you can follow along. If you want to customize it for your neck of the woods, you can go into CodeLocalWeather.r, change the latitude and longitude for where you're living to get the data that's local to where you are. So again, this creates a variable called local weather, which we have seen in a number of episodes now. So this creates our data frame that has the date, the maximum temperature, the amount of precipitation, and the snow. I'm interested in the date and the precipitation. So let's go ahead and select on those columns. So we'll do date and PRCP. This then gets us our date and precipitation. There are NA values in here that I'm going to go ahead and turn into zeros. So to do that, I'll go ahead and then do mutate PRCP equals if else is.NA on PRCP. So if PRCP is an NA value, so if that's true, then I'm going to put in a zero. Otherwise, I'm going to leave the current PRCP value. Again, I'm assuming that that NA really means zero. I might be wrong in that. But I think for our application, that should work pretty well. All right, we have date, we have precipitation. What I would like to do is be able to take any size window and calculate the total precipitation over that window. I don't want to just look at, say, October and then November and then December. No, I want to look at a 30-day window starting October 3rd to about November, I don't know, whether it be like first or second, right? And to get then the total precipitation over that. And then to go to October 4th and keep proceeding, right? So one approach that we could think about would be to add another mutate line. And so we could say mutate and I'll call this one day. And so this column will be a one-day lag of precipitation. And so within the tidyverse, within dplyr, there are a lag and a lead function. So I could do lag on PRCP. This then gives me a one-day lag. So basically, it starts the next day and it puts in an A value. Alternatively, so I'll call this one-day lag. I could also then do a two-day lag, lag PRCP and equals two. And then this, as you'll see, gives me a two-day lag, right? But what I'm more interested in is looking back in time, right? So you never know that you're in a drought until you compare to what's in the past because you can't predict the future. So instead of lag, what we might do would be to do lead, right? And so I'm going to go ahead and copy those lines. And instead of lag, we'll look at lead, lead, lead, and then the function again is lead, all right? And so now we have the lead function. And so we can basically see that one-day lead moves things up a day. And then two-day lead is moving things up two days, right? So again, I could do this for any number of days. And then I could take the columns, say for the lead, and I could calculate the total over those columns. That, though, is going to get really annoying and it's going to be really hard to flexibly change the time of the window that I want to look at. So this is not what I'm going to do, but I wanted to introduce you to the concept of lag and lead and how we might go ahead and use these functions to create a sliding window, perhaps for a smaller number of days. So what we'll use instead is a special package called slider. Slider is a lot like the per package. So per allows you to go one day or one row at a time through your data frame. Slider allows you to create a number of rows that is slid across all of the rows of your data frame. So to load that, we'll go ahead over to packages and I will do install slider and then I'll go ahead and do library on slider and get that loaded. And I'm going to go ahead and save this R script into my code directory and I'll call it drought.r. As you might recall from several episodes back, I'm using a special tool called RN, which keeps track of all the packages and versions that I'm using. You can see that if you come way back to the top where I started R that it says that project desktop climate vis loaded RN 0.15.5. So because I added a new package, then what I need to do is go ahead and do a snapshot using RN. So I'll do RN colon colon snapshot. And this then shows that the following packages will be updated in the lock file. Do you want to proceed? Yes. And so then those new packages slider and I think Slider uses warp are now written into my project. So that if you get ahold of this, then when you load the package, it'll automatically prompt you to install this version of Slider and warp as a dependency. Very good. So let me show you a few of the tricks of using Slider. It has a core function called slide. Again, it's a lot like the tools from per and those functions are called map. So basically you can think of instead of map, I'm going to use slide. So let's create a vector that I'll call x. And we'll say that that's the values from 1 to 10. Again, 1 through 10. And I can then do slide x and I can then do dot before equals 1. And I forgot a special argument, which is what do I want to do with each value of x? Well, what I'm going to do is tilde dot x. So dot x is the value in this first slot of the function called. So if you were to look at the arguments for slide, this first position is dot x. So the second is dot f for the formula or the function you want to run. And so we're going to basically output the value of x. But we're going to slide it using the value before the current value. So we'll see that this outputs a list much like the map function from per does. In this first slot, we get just the value one, very much like what we saw with lead, right? So basically with lead, we got 1 na. This is only outputting the one without the na. But then for the second slot, we get 2 and 1, 3 and 2, right? And so if I were to then change before to say 2, I now get the current value, right? So like 5 plus 4 and 3. And if I come back to slots 2 and 1, we see that we don't have everything there, right? Alternatively, we could kind of recreate the lag approach by doing after, right? So if I do after equals 2, now what I get is I get spot 1 has 1, 2, and 3, 2, 2, 3, and 4, and so forth. But then when we get down to like spot 9, we only get 9 and 10. And 10, we only get 10, right? So say I only wanted those values, those rows, to be outputted where I had, in this case, at least all three values represented. What I could do would be to do comma.complete equals true. And so this will output only the complete sets of values. And so now what we see is that spots 9 and 10 are outputting null, whereas all these others are outputting three values, okay? Here I'm using a formula to output the value of x from that row, right? Well, what if I want to output a function output, right? So we're going to want to look at the sum of those values. So let me go ahead and come back to this one where we used to be 4 equals 2. And again, I'll go ahead and add the dot complete equals true. And instead of tilde dot x, what I'll do will be the sum on period x. Now I get the sum of the three values that day, so 10, and 9, and 8. So the sum of 10, 9, and 8 is 27, 24, 21. So again, this is outputting the values as a list. Alternatively, we could output it as a vector, right? And so I could do slide underscore dbl. Again, it's analogous to map dbl, where we'd have x tilde sum dot x. And then dot before equals 2. So this outputs the vectored version rather than the list version which we saw before. Now we can start thinking about how we might incorporate this into a typical pipeline where we start with a data frame or a tibble, right? So we could start with tibble x equals 1 to 10, right? Again, we get a one column data frame, one column tibble. And then we could pipe that to a mutate. Again, much like we would using a map function. So I'll also call that new column total. And the output of total will basically be this, right? And so we'll go ahead and throw that in there. And so we're gonna do the slide dbl on the x column. And we're then going to go through this. And hopefully for each row of x, we should then get out the sum. But spots one and two should have NA values for the total column. Sure enough, we now have NA's in that total column. So let's come back up to our pipeline. I'm gonna go ahead and grab these first three lines. And so now, again, just to refresh ourselves, this output should have the date and the precipitation for that date. One thing I wanna confirm, though, is that my data is in the proper order. So I can do a range on date. Now I know all the rows of the data frame are in the correct order. And so we're good to go from here. Now what I can do is I can pipe this into my mutate, right? And so I can then do a window prcp. And we'll then do slide dbl. First the x spot will be prcp. And then we'll do tilde sum on dot x. And then we'll do dot before. And let's do a 30 day window. So it's got the current day plus the previous 29. And I only want complete. So we'll do dot complete equals true. And of course we see that the first 10, probably the first 30 values are Na's. We can see this if we go ahead and print n equals, let's do 50. And so again, if we make this window bigger, we see, yeah, that the first sum we get is at 47.9 and all these other values initially are Na's. So I'd like to remove those, right? So what we could do, of course, it would be to then do, yeah. So we'll pipe that in. We'll do drop Na, window prcp. And let's go ahead and pipe that into our print. So again, we see that we start way back up here with 47.9 as the total over those first 30 days. And that it kind of slowly accumulates over time. I'll go ahead and remove this print statement. It's to only output the first 10 rows of the data frame. One subtle thing to notice is that if you're feeding an argument into the function, we need to use this tilde with the sum. I don't need the tilde if I am only doing a function without any explicit arguments going into the function, right? This gives me the same output that I had up above. I'm gonna go ahead and stick with the tilde notation because who knows, I might decide to add other arguments to a function. I mean, the sum function really only takes one value, one argument, so that's fine. But let's go ahead and put this over a couple lines so it doesn't scroll off the side of the screen. And again, we're looking at a 30 day window. We're then dropping the values that are NA's. So those are like the first 29 days or whatever the value of before is. I would like to change date to be the end date. And I wanna get the start date. So let's go ahead and make our start date. We'll do mutate start and we'll then do date minus 29. And then this gives us the start date. And so now I'm going to rename date to be end using the select function. So we'll do select and we'll do start end equals date. So you can use the select function to rename a column. So we can do end equals date. And then I'll do window PRCP. Great, so now we have our start, our end, and the total precipitation in that window. So now we recall that a meteorological drought is having less rain than you would expect for that given location in that time of year. So here in Michigan, as we saw in a previous episode, as we get into July and August, it gets drier than it is say in April and May. April showers bring May flowers. We expect it to be wet in the spring and drier later in the summer. And so what we might wanna know is like, looking at this window that I have here was the month of October in 1891 drier or wetter than we might expect to find over all of the other 130 years that we've been looking at, right? So what I need to do is go ahead and I'm going to extract the month and the day from the end date so I can then aggregate by the month and the day. I'll go ahead and then do a mutate to get month. I'll do end month on, so I'll do the month function on end and then we'll also do end day as the day function on end. And again, that gives me the month and the day. I can then group by, I will group by end month and day. Again, nothing changes except now I have this grouping variable. Again, 366 days because February 29th is in there, right? And then I'm gonna do a summarize. And so I'm gonna define drowdy as being below the 10th percentile for that day of the year. So I'll say threshold as my column and then I'll do quantile. And the quantile then is a function that returns a desired probability level or quanti value for a variable. I'll give that then window prcp and then prob equals 0.1. So that's the 10th percentile. Now what we get back is a data frame with the end month, the end day and the threshold. So what we're saying then is on January 1st, if you go back 30 days, then 90% of the time you would have 23.5 millimeters or more of precipitation in that time, right? So if you have below that, then you're below the threshold. So instead of using summarize, I'll use mutate. And again, the output changes and that we still come back to the full data frame that we had before, but I also have the threshold. And so something I might wanna do is go ahead and filter those windows, that window prcp that's below the threshold. So we'll do window prcp below threshold. And so now I have those days, those windows where the precipitation in the window was less than the threshold, right? And again, that threshold being less than 10% of what you would expect over the 130 years worth of data, and so by this definition, there were 4,739 windows that that criteria was met. So because having these groups in here makes me a little bit nervous because we're continuing to do things within groups, I can go ahead and ungroup the data by using ungroup and then pipe that into the filter. Again, that's removed for peace of mind because sometimes weird things happen that I don't expect when the data is still being grouped. So something that I might wanna do is let's go ahead and get the end year out of this. End year equals the year on end. So let's see if there are any droughts here in 2022. We can do filter end year equals, equals 2022. And we see that back in January and February, there was a brief period of drought. Another year that I recall being really droughty was 2012. And so here we can see in 2012, from basically May forward, actually this is only the 35 rows. So let's go ahead and open it up a bit and we can then do print end equals inf to get all of the data outputted, right? So we can see that we got to a point in the summer of 2012, really, between the beginning of May and August where we were in this droughty type condition. You know, we could look at something like right about here, August 4th, where we were down about 15 millimeters of precipitation over the preceding month. One of the nice things about having a pipeline like this is that I can change the window that I wanna look at or I can change the threshold that I'm interested in, right? So I could perhaps look at 100 day window by putting in 99 before. Let's see what that gives us. And so that then shows that, yeah, this was in 2012 a 100 day drought from basically May 23rd, all the way through August 26th, where over that 100 day window, you know, perhaps there's a period right about here that is perhaps the most extreme where we were about 60 millimeters below what we would expect, right? And so 60 millimeters, that'd be six centimeters, maybe about like two and a half inches, not as bad as they're seeing out west for sure, but certainly extreme for us. And we could perhaps kind of focus things in on really extreme drought events by dropping that to say a 5% threshold, right? So basically we wanna look at the conditions, the droughts, those windows where 95% of the years had more precipitation over that period than what we saw. So again, we see that 2012 drought that we had here in Michigan was a long and extreme drought for Michigan, right? Again, totally different conditions than you see in California. But you know, this drought is defined by a place and a time, cool. So I wanna see if there's anything more recent than 2012 that had that level of droughtiness over a hundred days. And we see, no, really that was the last time that we had that level of a lack of precipitation over a hundred day window. So the final thing I'd like to do is go ahead and plot the data. And so I'm gonna define this variable as drought data and I'll go ahead and remove that tail, right? So now we have drought data with all of the data. We have the precipitation and the threshold. And as we've seen in the past, I want to make on the x-axis the day of the year and on the y-axis the amount of precipitation over that window, right? And here we've got a hundred day window. So I'm gonna go ahead and create a new year, right? A fake year if you will. And so we'll take drought data and we'll do fake date and we'll do again, glue on 2020 because 2020 was a leap year and then we'll do hyphen, tilde, curly braces and we'll do end month, close and then we'll do the day. So we'll then do end day. And so I need to wrap this inside of the yMD function that will turn all this into a date, right? And so I've got that fake date column there. It's kind of off the right side. And maybe what I'll do is go ahead and select to get rid of start and end, right? And so now I can see that I've got window precipitation, the threshold, the fake data, okay? The fake date. And I'll go ahead and pipe this into ggplot now. So do ggplot AES. On the x-axis I want fake date. On the y, I'm gonna put window PRCP and then I'm going to group by the end year because I wanna have a separate line for each year worth of data. And then we'll do geomline. And so this then shows the data but I'm noticing that we've got these weird stretches of straight lines and I recall that back in my code here I left in this filter where we were defining droughts, right? So the plot was only showing those droughts. So I'll go ahead and remove that filter and rerun all this. And so now we get a big mess of data. It's interesting that my eye kind of sees a bit of kind of a sinusoidal curve, right? So it's kind of dry in the winter and then it gets wetter and then it gets dry. I don't know if I'm just seeing that or what. But maybe what we'll go ahead and do is add to this lines indicating the drought, right? The bottom line for the drought. And so to do that, I'll go ahead and add another geomline and for why, I will then do threshold. And I'll then do, I need to put that in AES, right? AES because I'm mapping threshold to Y. And then I'll do color equals red, all right? So then we have our red line at the bottom. And I think we're seeing some weirdness again where we've got extra lines in there. And I think that's because we're really plotting the same data many, many times because you'll recall that drought data has that column of threshold, right? So every time January 8th shows up, we're gonna get that threshold, right? So I'm gonna create another data frame based on drought data. So I'll do select and we want end month and day and threshold. And again, this has a lot of repeated data in it. It's got 47,000 rows but it should really only have 366 rows, right? So what we can do is do a distinct on the data frame. And so now we get 366 rows for one row for each day and I'll call this drought line, right? And get that loaded. So I also need to add the date. So I'm gonna go ahead and add that, mutate line that I had down here, back up here to create a fake date. So now if we look at drought line, we see we've got the fake date as well as the threshold. That's good. Now what we can do is down here in our second genome line, I can then do data equals drought line. And that says basically plot the data in drought line rather than the data in drought data for making this specific line, right? So if we go ahead and run that, we actually get an error message saying that end year not found. It's basically not found in drought line because this AES function is changing the mapping to the YA variable, the Y axis from a window PRCP to threshold. But I don't, it's still expecting there to be a group aesthetic, right? But I don't have end year up here in drought line, right? So the way I'm gonna get around this is by saying inherit dot AES equals false, right? And so what that means is don't use any of these AES mappings down here. So I need to add in X for my fake date, right? So we'll do fake date, Y equals threshold. Now we get our red line. It looks a lot like what we had before when we were plotting that threshold column. But again, it's cleaned up now because we're only looking at one version of that, right? So basically when we plot it a bunch of times, then we're getting all sorts of basically funkiness going for the lack of a better scientific term. So what I'd like to do is I wanna turn those black lines to be gray and I wanna make my 2012 line be blue, right? So make that line really stand out. So to do that, I'm gonna come back up here and I'm going to then do is drought year. But I'm gonna put that into mutate, is drought year equals year, I guess it should be end year, right? End year equals equals 2012. And then we can then do color. We can map is drought year to color. And so again, we see that we've got the red for the threshold, the salmon for all of the other years and the teal line for the drought year 2012. And we've got some really hideous colors, right? So let's go ahead and clean that up with scale, color, manual, breaks of true and false. And we'll then do values equals. And then so if it is that year, we'll do Dodger blue and otherwise we'll do gray. And so now we've got all those other lines. We have our blue line for 2012. One of the things that you'll notice is that there seems to be gray lines going over top of our line. And that's because the lines are put down in order of the year. And so what I'd like to do is basically reorder the lines to put 2012 on top. So I'm gonna come back up into my mutate statement here and I'm gonna make end year a factor in the order of is drought year. So basically I want is drought year. When that's true, I want that to be the last value. So again, true and false, true has a value of one, false of zero. So all the zeros will go first because that's numerical. And then the true. So we can take end year and then do FCT reorder. This is a function that comes to us from the four cats package. And it takes a character or a factor as input. So I'm gonna do factor on end year. And then the second variable we give FCT reorder is the variable that we want to reorder on. So we'll do is drought year. And so now we see that we have our blue line on top of all of the other lines. And we can see that period that it dipped down below the threshold, right? So let's clean this up a bit. I'm gonna go ahead and get rid of the legend. And so we'll go ahead and do show.legend equals false. Again, that gives us more lateral real estate. I'm gonna add some labels. So we'll do labs X equals null because it's gonna be obvious that those are dates. Y is going to be total precipitation over previous 30 days. And that needs to be in quotes. And then we also wanna indicate that that's in millimeters, right? And we've also seen in previous episodes that we can change the scaling on the X axis when it's dates by using scale X date. And here we'll do date breaks equals and then two months. And then we'll do date labels. And in quotes, we'll do the percent capital B. And that capital B is the notation to indicate we want the full month written out. And we'll close that line with a plus sign. And so now we see that we've got February, April, June, August, October, December looking good. And that we've got this window of 100 days or basically windows ending on those days of 100 days where it was very droughty. Looking at this makes me wonder, wow, what's this one down here at the bottom? I'll let you figure out how you can adapt the code to highlight that line to indicate what that was. Cause that seems like the driest year on record really for quite a bit of time. If my eyes are looking at this quite correctly, right? So what I'd like to do is let's go ahead and clean out the back. We can then do theme panel dot background, element blank and we'll also do panel dot grid equals element blank. And then let's go ahead and add axes. So we'll do axis dot line equals element line. Maybe a final thing I'll do is add a title. So we'll do plot dot title equals element markdown because I'm going to use some HTML coding to make that title. And also do plot dot title position as plot so that it's left justified on the plot rather than on that Y axis. And I can come back up here to the title here and we'll then do title equals the summer of 2022 had less precipitation than 95% of previous years dating back to 1892. I can highlight 22 with a span and we'll then do style equals and then single quotes I'll do color equals Dodger blue. And I think that actually needs to be a colon rather than equals, right? And then close out this span anchor here, right? And I doesn't have element markdown because I also need to load library ggtext. And so now I see 2022 is in blue to indicate that blue line. And then I can also add less precipitation than 95% of the previous years by doing another span and we'll do style equals single quotes color colon red. And let's see, let's go ahead back here and we'll put this in a backspan to close out that anchor. And we can see the beginning of that 95. So maybe what I want said of element markdown would be element text box simple. And so we now see that it wraps it also doesn't give us much of a margin on the bottom I find. So before I start mucking with the margin I'm gonna go ahead and save this. I'll do gg save and then we'll do figures drought.png and my width I'll make a six height equals four. Which is good. So let's go ahead and add a margin to the bottom of that and we'll call that a day. So then do margin equals margin B let's do 10. And there we go. That looks pretty good. Again, I'd encourage you to play around with this. There's a lot of different concepts we covered here but what I really wanna focus on is make that sliding window using the slide functions from the slider package. Again, the syntax is very similar to what you might find using the map functions from the per package. I'd encourage you to play around with this relatively simple but I think really powerful set of tools that come to us from the slider package. And again, I'd encourage you to play around with this plot to see if you can't change the size of the window, change the probability, change the year that you're interested in. And of course, do this for your own time period, right? I would love it if somebody in a really drought stricken area of California or anywhere else would tweet at me your version of this plot for where you live. I think that would just be really interesting to see. Something that's in the back of my head is I wonder if in some of these areas where we've been having drought after drought after drought, if you know what, that's just the way it is now, right? It's no longer really droughty. That's just the normal weather patterns for that area. Play around with this. Let me know what you come up with and we'll see you next time for another episode of Code Club.