Hey folks, in today's episode of Code Club, I'm finally going to build out a visual with you to display the level of droughtiness around the world, looking at the amount of precipitation for the past 30 days relative to that same window over the life of the weather stations in those different regions. Looking here in Visual Studio Code, VS Code, you can call it, I read in my precipitation data. Again, this is the amount of precipitation for the past 30 days. The station data then tells us which stations belong to each region. I'm thinking of each region as a whole-number latitude and longitude. So if your latitude was like 30.45, it's going to be 30. Okay. And we pooled these stations together, again, because our eyes can't perceive that small-scale difference. So ultimately, what we do is average the amount of precipitation over that 30-day window for all the weather stations within the same region. And you can see here, that's what's happening, right? We join our precipitation and station data together. We filter to get rid of the first year and the last year, because some of these weather stations perhaps started mid-year or ended mid-year. But we want to make sure that we keep the data we have for the year 2022. Like I said, we group by our latitude and longitude. And then we summarize to get the mean amount of precipitation for each whole-number latitude and longitude. In the last episode, we refactored this code to group by latitude and longitude and then calculate the z-score for each combination of latitude and longitude. The z-score is an observation minus the mean, and that difference divided by the standard deviation, so a z-score is the number of standard deviations away from the mean that your observation sits. So I want to know, for 2022, the last 30 days of precipitation, how does that compare where I am to the last 130 years' worth of data, right?
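As a rough sketch of the z-score calculation just described, here is a toy example using dplyr. The data values are made up and the column name mean_prcp is an assumption for illustration; only the grouping by latitude and longitude and the z-score formula come from the description above.

```r
library(dplyr)

# Toy data (made up): mean 30-day precipitation per region and year
toy <- tibble(
  latitude  = rep(c(30, 42), each = 4),
  longitude = rep(c(-90, -84), each = 4),
  year      = rep(2019:2022, times = 2),
  mean_prcp = c(10, 12, 14, 20, 50, 40, 45, 25)  # hypothetical column name
)

# Within each region: (observation - mean) / standard deviation
z_scores <- toy %>%
  group_by(latitude, longitude) %>%
  mutate(z_score = (mean_prcp - mean(mean_prcp)) / sd(mean_prcp)) %>%
  ungroup()

# The 2022 rows are the ones that would end up on the map
z_scores %>% filter(year == 2022)
```

In this toy data the first region comes out wetter than its own history in 2022 (positive z-score) and the second comes out drier (negative z-score).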
So for that 130 years' worth of data, say, I can calculate a mean and a standard deviation. And then I can see where my observation falls in that distribution. Again, that's the z-score that we're generating here. And then I go ahead and filter the data to only look at those regions where we have at least 50 years' worth of data, and where I have the year 2022, because I want to plot the 2022 data. So we're ready to do all that plotting. I'm going to go ahead and fire up a terminal here in VS Code. And I'll activate my environment. So I'll do conda activate drought and start R, and I'm going to go ahead and load all this and we'll see what we get. So we see that there are about 3,300 different latitude and longitude combinations. And so again, we have latitude, longitude, and our z-score. And so now what we want to do is convert this data frame into a visual. I'm going to add this to the overall pipeline. I will pipe this into ggplot, so the data coming through the pipeline will be what's getting plotted. I'll then use the aes function. It feels like it's been a long time since I made a plot here on Code Club. So for the aesthetics, we're going to use x, y, and then fill. And the x, again, is going to be the position, you can think, around the equator, so to speak. And so that is going to be the longitude. I think we'll see pretty quickly if we're wrong. I don't do a lot of GIS stuff, so it's very possible that I can get these things flipped. And then latitude is the elevation, or the position relative to the equator, north-south. I think I've got this right; I shouldn't second-guess myself so much. And then the fill is going to be our z-score, so z_score. And then we could do a geom_raster, which is a function that we've used in the past to plot temperature anomalies. So we get this beautiful map. We can see, you know, the basics of the United States, Alaska, Hawaii out here. Australia comes in pretty well.
And Europe and Asia and Africa and South America are there, but they're not as densely seeded with data. So that's just the way it is. Again, the data I'm getting is coming from NOAA, which is a US-based agency. I suspect there are perhaps other databases out there that we could get data from that have more data for Europe, Asia, Africa, and South America. I am getting a warning message that raster pixels are placed at uneven horizontal intervals and will be shifted, and I also get it for the vertical. It advises that we should consider using geom_tile instead. And again, that's because perhaps we don't have a full grid's worth of data. So I'm going to simply change geom_raster to geom_tile. And you might be familiar with geom_tile if you've ever made a heat map. So again, a heat map is basically what we're making, right? With a heat map, though, when I typically think of its use with, like, genomics, perhaps on the x axis we'd have different treatments, and on the y axis we'd have different genes. And then the cells are rectangles that are colored relative to, like, their gene expression or something like that. So maybe you've never thought of a heat map as a plot with an x and y axis, but it certainly is. So here we are with our geom_tile version. It looks a lot like what we saw for geom_raster. I feel like some of these continents that we don't have as much data for perhaps look a little bit more sparse than they did before, but we'll run with geom_tile. One thing I want to make sure that we do is ensure that the spacing on the x axis for the longitude is the same as the spacing on the y axis for the latitude. One of the things I see is that for the latitude here on the y axis, one of these spaces between major grid lines is about 50 degrees, whereas on the longitude, it's 100 degrees. So we need these spacings to be the same so that a degree on the x axis is the same size as a degree on the y axis.
So I can add to this, then, a coord_fixed. So this gives us perhaps a little bit more of a squished view of our map. But again, it is more faithful, so that, you know, this spacing here between zero and 50 on longitude on the x axis is the same as the size on the y axis between zero and 50. The next thing I want to turn my attention to is the fill color that we're using to depict the level of droughtiness, the z-scores, right? And so if you look at the scale over here on the right side, you'll see that it's a monotonic change, starting at, like, a dark blue, almost black, going to a light blue. What I'd rather have is perhaps to have blue go to white to red. So we'll use scale_fill_gradient2. So I'll go ahead and add scale_fill_gradient2. And then this pop-up is very helpful to tell me the different options. By default it gives a muted red as the low and a muted blue as the high. I'm actually going to change that. Why don't we try having, like, a yellow as the low and a green as the high? So what I think of green as being is, like, lush-growth green, right? And then, you know, maybe a red or yellow as being dry. I don't want to use red and green because some people can't differentiate between those two colors. So maybe I'll go from yellow to white to green. Let's see what that looks like. So we'll do low equals yellow. And we'll then do mid equals white. And then high equals green. And then we'll do midpoint as zero. So midpoint equals zero is the default, and mid equals white is also the default. I like to leave those in here with my arguments to make it crystal clear to myself when I'm reading back through this later, and to anybody that looks at it later, what I was using for my different colors. So now we've got our color gradient going from green at the wettest to yellow at the driest, going through white at zero. I feel like these colors just get really washed out.
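A minimal sketch of the plot as it stands at this point, using ggplot2 on a made-up toy grid rather than the real station data. The column names longitude, latitude, and z_score follow the description above; the data values are assumptions for illustration only.

```r
library(ggplot2)

# Toy grid (made up): one z-score per whole-number latitude/longitude cell
toy <- expand.grid(longitude = -100:-95, latitude = 30:33)
toy$z_score <- seq(-2, 2, length.out = nrow(toy))

p <- ggplot(toy, aes(x = longitude, y = latitude, fill = z_score)) +
  geom_tile() +    # one rectangle per region, like a heat map
  coord_fixed() +  # one degree on x is drawn the same size as one degree on y
  scale_fill_gradient2(
    low = "yellow", mid = "white", high = "green",
    midpoint = 0   # mid and midpoint are the defaults, spelled out for clarity
  )

p
```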
So instead of me trying to pick pairs of colors that will work well together and will be friendly to people who have red-green color deficiency, I'm going to use the great tool colorbrewer2.org. This is a website that is really designed for people doing GIS, cartography-type data visualizations. And what I can do is put in the number of data classes, three up to nine, I don't know why they have 10, 11, or 12, who knows, and then diverging, sequential, or qualitative. So sequential would be kind of that monotonic change. Diverging would be to have, say, white or a light color in the middle, like we have. And then qualitative would be, say, we've got three or four or five different categories, and we want to give each category a different color. Again, I've got diverging. And I want something that's going to be colorblind safe. And so I think this first option actually works really well, where we have this middle, kind of grayish color that you can see here. It's not quite white. And the hex code is f5f5f5. So I'm going to use this brownish color for dry conditions and the greener color for more moist conditions. I'll go ahead and click on export. And we'll then go ahead and grab these hex codes. There is an R package for ColorBrewer called RColorBrewer. I find that it's not really what I want to mess with. It's easier for me to grab these hex codes and put them where I want them in my R script. So that's what I'll do. So we'll come in here and I'll just plop these here for now. And I think this first one was the low that I want to use, this f5f5f5 is the whitish, grayish color that I want to use, and then this 5ab4ac, I think, was the greenish color that I'd like to use. And we'll go ahead and regenerate the figure. And because so many of the observations are right around zero, it's totally washed out.
So now what I want to do is go ahead and change the color of my background to be black, so that we can then see everything in more stark contrast. So let's do that by modifying the arguments within the theme function. So I'll do plot.background equals element_rect, right? And then the fill for that is going to be black. And then we'll also do panel.background equals element_rect with fill equals black. I know I also want to turn off those grid lines, so I'll do panel.grid equals element_blank, which will get rid of those lines. So with this black background and no grid lines, it does look a bit more attractive, and it's easier to see the individual squares and pixels that we're plotting. The problem that I'm going to come back to, though, is the scale of the colors. There's not a lot of variation in the colors that we have. And as I look at the z-score scale, I see it goes up to 7.5. And so this means that there's a data point in here somewhere that is seven and a half standard deviations away from the norm. And so that's probably a problem in the data, or it's really an outlier, right? And I can't visually look at this and see where that is. Maybe that's, like, right there. I don't know. So I think what I'll do instead is figure out what the bottom of the range is, what the lowest value is. And then I'll go kind of the positive of that, right? And I'll probably end up hard-coding this. So out of curiosity, let's come back in here after the select, before the ggplot, and do summarize. And I'll do min equals min on z_score. And then we'll also do max being the max on z_score. And so yeah, I see that the minimum value is negative 2.5, basically, and the max is like 7.8. I think what I will do is take everything that's less than negative two or greater than positive two and set it to those values, so that my legend has the maximum green and brown colors, whatever these colors are, at those two values.
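Here is a hedged sketch of the black-background theme together with the ColorBrewer colors, again on a made-up toy grid. The hex codes f5f5f5 and 5ab4ac are the ones named in the episode; the low color is assumed here to be #d8b365, the brown end of ColorBrewer's 3-class BrBG diverging scheme, because the transcript never reads that code aloud.

```r
library(ggplot2)

# Toy grid (made up), standing in for the real region-level z-scores
toy <- expand.grid(longitude = -100:-95, latitude = 30:33)
toy$z_score <- seq(-2, 2, length.out = nrow(toy))

p <- ggplot(toy, aes(x = longitude, y = latitude, fill = z_score)) +
  geom_tile() +
  coord_fixed() +
  scale_fill_gradient2(
    low = "#d8b365",   # assumed: brown end of the 3-class BrBG scheme
    mid = "#f5f5f5",   # grayish white named in the episode
    high = "#5ab4ac",  # green named in the episode
    midpoint = 0
  ) +
  theme(
    plot.background  = element_rect(fill = "black"),
    panel.background = element_rect(fill = "black"),
    panel.grid       = element_blank()  # drop the grid lines
  )

p
```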
And then I'll modify the label on the scale to indicate that. Okay, so we'll go ahead and remove this. And we'll then go ahead and add it in here where I have my select. And I'll do mutate z_score equals if_else: if z_score is greater than two, then I want it to be two. Otherwise, I'm going to use z_score. And then we'll repeat this, and if it's less than negative two, then we'll make that negative two. I probably could have done this with a case_when statement, but whatever. And so we'll pipe that in. This looks a lot better than what we just had, where the colors were really muted, because again, we had a really broad scale. And so now what we see is that we have two at the upper end and negative two at the bottom end. And those are the darker colors. Again, I think it's kind of hard to see what's going on in Europe, Africa, Asia, Australia, and South America, because there's just not a lot of data there. I'm not totally sold that I want to only focus on the United States, because I know I have a lot of people watching this from other places in the world. So I don't want to totally exclude you. But I think the colors here look pretty good, and I'm happy with that. Now what I want to do is turn my attention to the legend, so I can make it clear that this green color is for things greater than two, and this brown color is for things less than negative two. So back in here, in my scale_fill_gradient2, I'm going to go ahead and add breaks. And to this, we'll give a vector. And so we'll say minus two, minus one, zero, one, and two. And then for my labels I'll do less than negative two, negative one, zero, one, and then greater than two. I don't know that I need quotes around negative one, zero, and one. I'm going to include them, just to make all these labels consistent. And I misspelled labels; it should be E-L-S. So this gets our labels to be how we wanted them.
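The clamping and relabeling described here can be sketched roughly as follows, using dplyr and ggplot2 on a handful of made-up values. The two if_else calls and the breaks and labels follow the description; the toy data and the low color #d8b365 are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy values (made up), including one extreme outlier like the 7.8 seen above
toy <- tibble(
  longitude = -100:-96,
  latitude  = 30,
  z_score   = c(-2.5, -1, 0, 1.5, 7.8)
)

# Pull anything beyond +/- 2 back to the edge of the scale
clamped <- toy %>%
  mutate(
    z_score = if_else(z_score > 2, 2, z_score),
    z_score = if_else(z_score < -2, -2, z_score)
  )

p <- ggplot(clamped, aes(x = longitude, y = latitude, fill = z_score)) +
  geom_tile() +
  coord_fixed() +
  scale_fill_gradient2(
    low = "#d8b365", mid = "#f5f5f5", high = "#5ab4ac", midpoint = 0,
    breaks = c(-2, -1, 0, 1, 2),
    labels = c("<-2", "-1", "0", "1", ">2")  # end labels flag the clamping
  )

clamped$z_score
```

The same clamping could also be written with a single case_when, or in base R as pmin(pmax(z_score, -2), 2); the two if_else calls are simply the form used in the episode.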
I'm now going to go ahead and remove the legend title. I also want to make the background of my legend black, so that we can see the colors in the same context that we'll see them in the plot. And so then I'll need to flip the color of the text to be white. And so again, we'll do all that here in the theme function. And so we can do legend.background equals element, and I'm going to do blank, because that will make it a transparent background, so it will be the same color as the plot background. And then we'll do legend.text equals element_text. And then we'll do color equals, and I'm going to use this gray color, f5f5f5. And that'll be good. And I need to get rid of the title. So up here in scale_fill_gradient2, I'll do name equals NULL. So the legend here is looking better. One thing that I'll do is move my legend to be down here in kind of the South Pacific, because there's not much data over here. If all the data is over here in North America and the legend is off to the side, then I have to keep scanning back and forth to interpret the colors, right? So if I put the legend down here in the South Pacific, I'll have it closer to where most of the data are. So let's go ahead and do that. And we can do that with legend.position. And to that, I can give a vector, with 140 for the x, and on the y, let's try zero. And so it disappeared. And I'm remembering that these are not the actual x and y positions, but a relative positioning in the plot, right? So if I put zero, zero, then it puts it in the lower left corner, right? So maybe I'm going to go in, let's say, 10%. So we'll do 0.1. And then we'll do 0.3. So let's maybe move it over to the right and down a bit. And so again, I'll do 0.2, and then 0.2. So I think that's a pretty good position for the legend. Again, it always takes a little bit of tweaking to move things and get them exactly where we want them.
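A rough sketch of these legend tweaks on a made-up toy grid. The theme elements and the relative position c(0.2, 0.2) come from the description above; the toy data and the low color are assumptions. Note that the numeric form of legend.position is what the episode uses; ggplot2 3.5.0 and later deprecate it in favor of legend.position = "inside" together with legend.position.inside, so newer versions will run this with a deprecation warning.

```r
library(ggplot2)

# Toy grid (made up), standing in for the real region-level z-scores
toy <- expand.grid(longitude = -100:-95, latitude = 30:33)
toy$z_score <- seq(-2, 2, length.out = nrow(toy))

p <- ggplot(toy, aes(x = longitude, y = latitude, fill = z_score)) +
  geom_tile() +
  coord_fixed() +
  scale_fill_gradient2(
    name = NULL,  # drops the legend title
    low = "#d8b365", mid = "#f5f5f5", high = "#5ab4ac", midpoint = 0
  ) +
  theme(
    plot.background   = element_rect(fill = "black"),
    panel.background  = element_rect(fill = "black"),
    panel.grid        = element_blank(),
    legend.background = element_blank(),              # transparent legend box
    legend.text       = element_text(color = "#f5f5f5"),
    legend.position   = c(0.2, 0.2)  # relative to the plot: 0 = left/bottom, 1 = right/top
  )
```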
I do want to go ahead and get rid of these degrees of latitude and longitude, because I don't think they're super meaningful to most people. So again, we can do that here in the theme function, where I can add axis.text equals element_blank. And so that gets rid of the text on the x and y axes indicating the latitude and longitude. So, interpreting this figure, and looking at the United States in particular for the last month. And again, I downloaded the data at kind of the beginning part of October, so this is basically September into the beginning of October, when I got the data. You'll see that kind of the Midwest, the southern Midwest, was pretty dry. And I think this fall has been pretty dry where I am, kind of up here in Michigan. Whereas it looks like the Southwest of the United States has been pretty wet. And that's after, you know, a pretty severe drought for the past year. Over in Europe, I know they had a lot of drought. It's kind of hard to see what's going on and whether or not they're still in drought conditions. It kind of looks like down and through here, we see some more of the brownish points indicating droughtiness. So I'm going to go ahead and save this figure into a directory. I don't think I have a figures directory yet in here. So maybe I'll go ahead and open up another bash shell. And if we look in the directory, yep, we don't have a figures directory. So I'll do mkdir figures. And then in ggsave, I'll go ahead and save that to figures/world_drought.png. So looking at the PNG, I see that we've got these white borders on the figure. So I think we could remove that border by coming back to plot.background and adding color equals black, and sure enough, that got rid of the border on the top and bottom. One thing I think we could also do is make the aspect ratio of the figure two to one. So here we could do width equals eight, height equals four.
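As a sketch of the saving step, the example below writes a toy version of the plot to a temporary file rather than to the figures directory, so it can be run anywhere. The width of 8 and height of 4, the color = "black" border fix, and axis.text = element_blank() come from the description above; the toy data and the temporary output path are assumptions.

```r
library(ggplot2)

# Toy grid (made up), standing in for the real region-level z-scores
toy <- expand.grid(longitude = -100:-95, latitude = 30:33)
toy$z_score <- seq(-2, 2, length.out = nrow(toy))

p <- ggplot(toy, aes(x = longitude, y = latitude, fill = z_score)) +
  geom_tile() +
  coord_fixed() +
  theme(
    # color = "black" paints the outline too, removing the white border
    plot.background  = element_rect(fill = "black", color = "black"),
    panel.background = element_rect(fill = "black"),
    axis.text        = element_blank()  # hide the degree labels
  )

# Temporary path for illustration; the episode saves into a figures directory
out_file <- file.path(tempdir(), "world_drought.png")

# width and height are in inches by default, giving a 2:1 image
ggsave(out_file, plot = p, width = 8, height = 4)
```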
So this gives us a better depiction of the aspect ratio of the world, again, with latitude and longitude. This legend position is kind of bugging me. I think what I'm going to do is go ahead and make it horizontal and put it down towards the bottom. So we can do that with legend.direction equals horizontal. And so now we've got a horizontal legend. Let's go ahead and move that down a bit. And we'll do that with the y, so I'll go ahead and put that at zero. I can make that legend a little bit thinner by doing legend.key.height and giving it the unit function; I'll do 0.25 cm. And that looks pretty good. The last thing I want to do is go ahead and build in some titles. So we'll go ahead and use the labs function, and we'll do title equals amount of precipitation for, and we'll do start to end. And then I'll do subtitle, and I'll say standardized z-scores for at least the past 50 years. And then for my caption, this will be something at the bottom of the plot, I'll go ahead and do precipitation data collected from GHCN daily data at NOAA. Fix this misspelling here. Of course, all my text is black, so I need to go ahead and change that. And so we'll go ahead and do plot.title equals element_text, color equals, let's use that f5f5f5, right? We'll use this gray color. And then we also want to do the subtitle, plot.subtitle. And then plot.caption. Do the same thing. That looks pretty good. I think I want my title to be a bit larger. And so let's go ahead and add to plot.title, do size 18. I think that size looks pretty good. I now need to add the start and end values. And we'll come back up. And I'm going to say start, and we'll do today. And whoops, it doesn't know what today is, because it doesn't have lubridate. So we'll do library lubridate. And while I'm up here, I'll also do library glue. And we'll go ahead and load these two packages. So now we have start. And so start is that. And so that's actually the end, right?
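A minimal sketch of the date handling for the title, assuming the lubridate and glue packages are installed. The variable names end_date and start_date are hypothetical helpers introduced here for readability; the format string and the 30-day window follow what is worked out in this part of the episode.

```r
library(lubridate)
library(glue)

# today() comes from lubridate; the window ends today and starts 30 days back
end_date   <- today()
start_date <- end_date - 30

# %B is the full month name, %d the day of the month (add %Y for the year)
end   <- format(end_date, "%B %d")
start <- format(start_date, "%B %d")

# glue() fills the values in wherever their names appear in curly braces
title <- glue("Amount of precipitation for {start} to {end}")
title
```

The resulting string can then be handed to labs(title = ...) in the plotting code.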
So this should be end. And then start should be end minus 30. Right? So let's put that after. And so we now have end and start. So to get it in a good format, we can go ahead and use the format function on today. And then we can give it a string of the format we want. So I'll do percent capital B, which will be the full spelled-out name of the month. And then percent d will be the day of the month. And then percent capital Y will be the year. And then for the start, I'm going to grab the same thing I had for end, except I'll do today minus 30. And so now if we run end and start, end and start look good, and I can now plug those into my title. And we'll do start in curly braces and end in curly braces. And I need to go ahead and wrap this in the glue function so that we can insert those values. So that looks pretty good. I think one thing I'll change is to leave out the year. I think if we're doing it from today going back, it'll be pretty obvious what year it is. So let's go back up to our format. I'm going to leave out that percent Y, and this percent Y. And we'll go ahead and rerun everything and see what it looks like. Yeah, I think that looks pretty good. I'm generally pretty happy with the appearance of this figure. We have this saved as a nice PNG. In the next episode, we'll see how we can get that up onto GitHub and get it to process every day. So before we finish, I'm going to go ahead and save my R script. And then I'm going to update my Snakefile. And so we'll do rule plot_drought_by_region. And the input will be my R script, which will be in code, and it will be plot_drought_by_region.R, and I misspelled region, of course, because that's what I do. And then the data are going to be the two files that we read in back here. So I'm going to copy these lines, paste them in here, and then lift out the names of the files. So this is going to be the PRCP data.
And then we'll also have the station data, which again, we can clean up a little bit. And that looks good. And then the output will be figures/world_drought.png. That needs to be in quotes, and drought needs a T. Good. And then we'll go ahead and do the shell command. So I'll copy and paste this down. I find that copying and pasting is a lot easier than retyping things because, as you see, as I type, I make a lot of mistakes. Anyway, so this will then go ahead and generate that. One thing I need is to rename my R script to this. So I'm going to copy that. And then I'm going to rename it with git, so I'll do git mv code/merge_weather_stations.R to that. Run that. And so now we see that we've got the plot drought script there, right? I also want to make this executable. So I'll copy that and do chmod +x on that. Cool. And then I could run code/plot_drought_by_region.R. This runs, hopefully with no problems. And then we can, of course, add this figures/world_drought.png to the input of our targets, right? I'm going to go ahead and test this with snakemake. So I'll do snakemake --dry-run, and then I'll plop in the name of the figure to make sure everything works. Whoops, I'm not actually in my conda environment right now. So I'll do conda activate drought and rerun that. Hopefully everything works. Whoops, it's saying I'm missing a comma, perhaps on line 84. That's certainly possible. Maybe you noticed that. Yeah, I'm missing a couple of commas, right? So they're in there now. We'll go ahead and save that and rerun it. This is why we test things out. Good. That's all present and accounted for. And it says nothing to be done. So I'm going to go ahead and force it, right? And so if I do --force, great. So yep, that works. So now I'm going to go ahead and run it. And I'll do it with one processor, so I'll do -c 1. Wonderful.
And then of course, I can make sure that my plot looks the way it's supposed to. And that looks great. I'm pretty happy with that. Great. So now I can do git status. So I see that there's an Rplots.pdf that is untracked. I don't actually want to track that, because it's kind of a side effect of running ggplot in this type of environment. So I'm going to go into my .gitignore file, and I will add Rplots.pdf to it and save that. And now if I do git status again, it's no longer saying that Rplots.pdf is untracked, but it says I have modified the .gitignore. I'm going to go ahead and git add all the things. So now I'll go ahead, let's do git status, git commit. And we'll say generate visual of drought across the world. git push. And now that will be up on GitHub, which is exactly where we want it to be. Because in the next episode, what I'm going to do is see how we can use GitHub Actions to rerun this entire pipeline every night to get a fresh look at the droughtiness of the world. So that you don't miss that exciting episode, please, please, please make sure that you've subscribed to the channel, give this episode a thumbs up, and tell all your friends about the cool stuff we're doing over here on Code Club. And we'll see you next time.