Hey folks, we've been doing a pretty deep dive on different approaches to visualizing climate change data. A while back I found this page at NASA's Earth Observatory. As you know, I love animations, and I was really intrigued by this plot of monthly temperature anomalies between 1880 and 2016. The data we've been looking at in all the other episodes is normalized to the period from 1951 to 1980. This figure has been normalized to 1980 to 2015, a more recent baseline. What I want to do in this episode is recreate this figure. I'd like to start by getting the static version, but also see if we can push ourselves and get the GIF as well.
Something I have noticed is that if we plot the data we've been working with over the course of a year, it's basically a flat line, with the 1951 to 1980 average right at zero. What we need is additional information that tells us how NASA normalized the data by month. Basically, when they normalized the data to 1951 to 1980, they took the average for January, February, March, April, all the way to December over those years, and then used those averages to calculate the temperature difference for each month of every other year. So it's been done monthly, and we need to know what the adjustment factor is for each of the months.
Looking down through this page at the variety of responses and comments, this figure really got a lot of enthusiasm from the community, from people that I suspect want to be able to recreate it. One of the questions that came up, from Robert, whoever Robert is, was whether the specific data they used is available. They've been working with the same GISTEMP data that we've been working with. But Joshua Stevens, one of the authors, chimed in to say that the seasonal adjustments come from MERRA-2 and can be found here.
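As a rough sketch of the monthly normalization described above: the temps table and every value in it are made up for illustration, and only two months and four years are included. Each calendar month gets its own baseline average, and the anomaly is the observed value minus that month's baseline.

```r
library(dplyr)

# Toy data (invented values, not NASA's): two months across four years
temps <- tibble(
  year  = rep(c(1951, 1980, 2000, 2020), each = 2),
  month = rep(c("Jan", "Jul"), times = 4),
  temp  = c(12.0, 15.8, 12.4, 16.0, 12.6, 16.3, 13.1, 16.9)
)

# Average each calendar month over the baseline period only
baseline <- temps %>%
  filter(year >= 1951, year <= 1980) %>%
  group_by(month) %>%
  summarize(base_temp = mean(temp))

# Anomaly = observed temperature minus that month's baseline average
anomalies <- temps %>%
  inner_join(baseline, by = "month") %>%
  mutate(t_diff = temp - base_temp)
```

Because every month is centered on its own baseline, the seasonal cycle disappears from t_diff, which is why a year of the normalized data looks like a nearly flat line.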
Mick Watson also chimed in to say that the temperatures seem to be peaking at three degrees Celsius, not at two degrees Celsius. Joshua responded that there's a baseline change of about 0.7 degrees, 0.69 or so, over the course of the year that accounts for that difference between three degrees and two degrees. So there are two additional bits of information that we need to bring to the GISS data we've been working with in previous episodes.
Let's start by bringing in this merra2_seas_anom.txt file. I'm going to go ahead and open that up. We get this text file, which I can save into our climate viz data directory like this, and I'll save it without touching anything. Now over here in RStudio, let's go ahead and open up that file, data/merra2_seas_anom.txt. I don't like to manually edit text files; I like to leave them raw, so I'm going to leave this as is. We see that there are three header lines here that we don't want, so we'll need to account for that.
It looks like it's perhaps a tab-separated values file, so we'll do read_tsv on data/merra2_seas_anom.txt with skip = 3. It reads it in, but as 12 rows in one column. What this tells me is that the separators aren't tabs, they're spaces. An alternative to read_tsv that we could use is read_table. We read that in and, sure enough, now we get our month, the seasonal anomaly plus two standard deviations, the seasonal anomaly minus two standard deviations, and the seasonal anomaly itself. We're also getting a whole bunch of warning messages that it's expecting four columns but there are actually five. I think that's because, if we look at our text file, there is a space at the end of the data. That's basically telling read_table there should be another column, but read_table only found four column names.
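Here is a small, self-contained sketch of the delimiter problem just described. The file written below is a hypothetical stand-in (three header lines followed by space-separated columns with invented values), not the real download, so treat it only as an illustration of how read_tsv and read_table differ.

```r
library(readr)

# Hypothetical stand-in for the downloaded file: three header lines,
# then space-separated columns (all values are made up)
path <- tempfile(fileext = ".txt")
writeLines(c(
  "header line 1",
  "header line 2",
  "header line 3",
  "Month seas_anom",
  "01    -1.5",
  "02    -1.2",
  "03    -0.6"
), path)

# read_tsv() splits on tabs, so each line collapses into a single column
as_tsv <- read_tsv(path, skip = 3)

# read_table() splits on runs of whitespace and recovers both columns
as_table <- read_table(path, skip = 3)

ncol(as_tsv)    # one column: the wrong delimiter
ncol(as_table)  # two columns: Month and seas_anom
```

A single wide column after reading is usually the quickest sign that the delimiter guess was wrong.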
I'm not going to worry about that. You could probably get rid of the warnings by specifying the actual column names and giving it five column names, but I don't care. So again, we have this read_table output. One problem is that our month is being read in as a character, and that's because of the leading zero that makes it a two-digit month. So I'm going to go ahead and modify that. I'll start by doing a select with lowercase month = uppercase Month, which will change the name of the column, and we'll also get seas_anom. Very good.
Next we want to modify that month with a mutate. We can change it using as.numeric, which will convert month into a number if it can. If I did as.numeric on "dog", it wouldn't like that. But if I do as.numeric on "01", it'll do its best to turn it into a number, and if it can, it will represent it as a number. Sure enough, we now have month as a number. But I don't really want the number for the month, I want the month abbreviation. So I can do month = month.abb[month]. As we've seen, month.abb is a vector of the three-character month abbreviations, so I can get the abbreviation for each month by using the month number I had previously as the index into that vector. Now we see that we have our month and our seasonal anomalies, along with all those warning messages, which I'm just going to ignore. I'm going to call this month_anom.
This is information that we'll store while we read in our GISS data, which again is normalized to 1951 to 1980. We've seen this before: we'll do read_tsv on data/GLB, blah, blah, blah. For that, we need skip = 1, and na is the three stars, as we've seen. And it's not a TSV, it's a CSV, right?
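The month clean-up described above can be sketched as follows. The raw table is a made-up stand-in for what read_table returned, with only three months and invented anomaly values.

```r
library(dplyr)

# Made-up stand-in for the table read from the seasonal anomaly file
raw <- tibble(Month = c("01", "02", "12"),
              seas_anom = c(-1.5, -1.2, -1.4))

month_anom <- raw %>%
  select(month = Month, seas_anom) %>%          # rename while selecting
  mutate(month = month.abb[as.numeric(month)])  # "01" -> 1 -> "Jan"

month_anom$month

# Text that is not a number becomes NA (with a warning) instead of a value
not_a_number <- suppressWarnings(as.numeric("dog"))
```

Indexing month.abb with the month number works because month.abb is an ordinary character vector of length 12.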
Again, we see that we get 143 rows in one column, and that column has all the things we want, but separated by commas. That's a sign that we used the wrong delimiter. We saw that, of course, when we tried to use read_tsv instead of read_table. So, read_csv, and that's all good. Next we want to select the year, and I'll do lowercase year = Year, and then we want all_of(month.abb). Conveniently, the column headings match the values of month.abb. This gives us our year and our 12 months.
Now we can do a pivot_longer on everything but the year. Our names will go to month, and our values go to, sometimes it's hard to talk and type, you know, values_to t_diff. Great. At the end of this, we can do a slice_tail with n = 10, and we see a bunch of NA values for the months that haven't happened yet. I pulled these data down in April. It's now June as I'm recording this, and I just haven't updated the file. I can get rid of those NA values with drop_na. Again, if I do slice_tail with n = 10, we see that we've gotten rid of all those NA values, and we're off to the races.
Now what I want to do is convert my month into a factor, because for this plot we're going to put the months on the x-axis, and I want them in chronological order, not alphabetical order. By default, ggplot will put them in alphabetical order unless we tell it otherwise. So I can do mutate with month = factor(month), and I'll set my levels to be month.abb. Now when we look at our data frame, we see that instead of being a chr, a character, month is a factor. Next I want to join in the monthly anomaly. So we'll do an inner_join between the data coming through the pipeline and month_anom.
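A minimal sketch of the reshaping steps above, using a toy wide table in place of the GISS file. The numbers are invented, only three month columns are included, and the NA stands in for a month that has not happened yet.

```r
library(dplyr)
library(tidyr)

# Toy wide table shaped like the GISS file: one row per year, one column per month
wide <- tibble(Year = c(2021, 2022),
               Jan = c(0.81, 0.91),
               Feb = c(0.64, 0.89),
               Mar = c(0.89, NA))

long <- wide %>%
  select(year = Year, all_of(month.abb[1:3])) %>%
  pivot_longer(-year, names_to = "month", values_to = "t_diff") %>%
  drop_na() %>%                                      # drop months with no data yet
  mutate(month = factor(month, levels = month.abb))  # chronological, not alphabetical

levels(long$month)
```

Setting levels = month.abb is what keeps the months in calendar order on a plot axis.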
I don't need this period in my inner_join statement; R knows to automatically take what's coming through the pipeline and put it on the left side. I like to put it there just to be explicit for myself. I also like to say by = "month" to be explicit with myself about what's being joined on. So now I have the monthly temperature difference and the seasonal anomaly for that month. I can then combine those with a mutate: I'll create month_anom as t_diff + seas_anom. That gives me the month_anom column, which I can use to generate the line plot. I'll go ahead and save this as a variable that I'll call t_data.
Now we're ready to use t_data to generate the plot. So we'll do ggplot with aes. On the x-axis I'm going to put month, on the y-axis I'm going to put month_anom, and then I'm going to group by year. Then I'll add a geom_line. And my plot is alphabetical, which is strange, because I thought I changed the order with this factor. I think the problem is that I'm joining after I do all that. If I look at t_data, I bet this column went back to being a character. To solve that, I can bring that mutate down below the join. And you know what? I can actually combine these two mutate statements. The key thing is to put the factor creation after joining in the other month data. So this should work now. If I look again at t_data, I see that, sure enough, month is back to being a factor. I generate the figure and, voila, we basically have what they had over on the NASA website.
The next thing I want to do is color my lines to match the average temperature of each year. To get the temperature that we're going to map color to, I need to create another variable. I could have done this using the J-D column from the original data frame.
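The ordering fix described above, joining first and creating the factor afterward, might look like this. Both input tables are toy stand-ins with invented values.

```r
library(dplyr)

# Toy stand-ins for the long temperature table and the seasonal anomalies
t_long <- tibble(year = 2021,
                 month = c("Jan", "Feb", "Mar"),
                 t_diff = c(0.81, 0.64, 0.89))
month_anom <- tibble(month = c("Jan", "Feb", "Mar"),
                     seas_anom = c(-1.5, -1.2, -0.6))

t_data <- t_long %>%
  inner_join(month_anom, by = "month") %>%
  mutate(month = factor(month, levels = month.abb),  # factor created after the join
         month_anom = t_diff + seas_anom)            # add the seasonal cycle back in

t_data
```

Doing both steps in one mutate after the join means the join key is still a plain character column when the two tables are matched.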
But you know what, I think it'll be just as easy to calculate it on my own. What I'll do here is a group_by on year, and then I'll pipe that to a mutate rather than a summarize. Summarize would take each year and give me one row per year; mutate will create a new column and repeat the same value for all the months in that year. I can call this ave and set it to the mean of month_anom. Then I need to ungroup it. Great. If we look at t_data, we'll see that we've got this ave column built in.
In my ggplot, I can then do color = ave. Now I see that I've got a color gradient for the average annual temperature, going from about negative a half up to one. I of course don't like that color scheme, so I'm going to scale it from blue to red with white in the middle. We've seen this a number of times already, but we can do scale_color_gradient2 with low = "darkblue", mid = "white", which of course is the default, and high = "darkred". I'll do midpoint = 0. And I'm going to get rid of that legend, so I'll do guide = "none". Very good.
One small thing I'm noticing, which is what Mick Watson raised in the comments, is that instead of maxing out at about two, it's maxing out at about 2.7. If we come back to the discussion, we'll see that, again, we have values for those 12 months, one through 12. We could do the same thing that we did with this merra2_seas_anom.txt file. I'm perfectly happy to just guesstimate this to be 0.7 on average. I'm not really concerned about differences in the hundredths place. So let's subtract 0.7, which will bring things down so that we're basically normalizing to 1980 to 2015. We can come back here to this month_anom and do minus 0.7.
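To make the mutate versus summarize distinction above concrete, here is a small sketch on invented numbers (two years with three months each).

```r
library(dplyr)

# Invented values: two years, three months each
t_data <- tibble(year = rep(c(1900, 2020), each = 3),
                 month_anom = c(-1.0, -0.5, 0.0, 0.5, 1.0, 1.5))

# mutate() keeps every row and repeats the yearly mean within each year
with_ave <- t_data %>%
  group_by(year) %>%
  mutate(ave = mean(month_anom)) %>%
  ungroup()

# summarize() collapses each year down to a single row
per_year <- t_data %>%
  group_by(year) %>%
  summarize(ave = mean(month_anom))

nrow(with_ave)  # 6
nrow(per_year)  # 2
```

The mutate form is the one that suits plotting here, because every monthly point needs to carry its year's average so it can be mapped to color.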
Now we see that our plot is more within the range that we saw on the website. One thing I see more clearly here is that the line for 2022 is a dark purple, and that's because its average is based on only three months. It's quite negative, making it a darker blue. Let's start by modifying that 2022 value. After this ungroup, I'll do another mutate. We'll take ave and I'll do an if_else. If the year is 2022, the current year, then I want the average value to be the maximum of the absolute value of the ave column. I want the darkest red to indicate the year 2022, and I want it to be as dark as the darkest color in the plot, which in this case is the darkest blue. Hopefully that makes sense. Otherwise, if it's not 2022, we use the regular ave value without correcting for the most current year. Now we can see that we do have that darker red line in there. That looks pretty good, and it will be a pretty good indicator of the year 2022.
Now I want to turn our attention to getting our theming to look more like the web version. Of course, we can do that with the theme function. I want to start with the grid lines. So we'll do panel.grid.minor = element_blank(). These grid lines between, like, zero and two are the minor grid lines, so that will make those go away. Then panel.grid.major.x will be the grid lines for the x-axis, and I want those to go away too, because the only grid lines in the original version are the horizontal ones. So we've gotten rid of all those extra grid lines. I'm going to copy this and modify it to be panel.grid.major.y with element_line. I'll do color = "gray", then linetype = "dotted", and let's do size = 0.25.
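The if_else override for the current year, described above, can be sketched like this on a made-up table of yearly averages.

```r
library(dplyr)

# Made-up yearly averages; 2022 is negative because it only covers a few months
yearly <- tibble(year = c(1900, 1950, 2022),
                 ave  = c(-1.2, 0.1, -0.6))

recolored <- yearly %>%
  mutate(ave = if_else(year == 2022,
                       max(abs(ave)),  # push 2022 to the far red end of the scale
                       ave))           # leave every other year untouched

recolored$ave
```

Because the color scale is centered on zero, giving 2022 the largest absolute value makes it exactly as saturated as the darkest blue line, just in red.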
The default, I'm pretty sure, is 0.5, so let's make it a bit thinner. This is going to be a gray grid line, and the background is gray, of course, so I want to clear out the background. We'll do panel.background = element_blank(). That'll make it a nice white, so it'll be easier to see everything else. Now we've got those grid lines. Of course, the original version had a grid line at negative three, negative two, negative one, zero, one, and two, so we want to add those back in. I'm going to put that up here with my scale_color_gradient2. I'll do scale_y_continuous with breaks = seq from negative three to two by one. So now we have grid lines at every unit.
The original version didn't have x- or y-axis labels, so we can do labs with x = NULL and y = NULL. But it did have a title and a subtitle, so we can add those in. The original title was "Temperature anomaly (°C)", so we'll type Temperature anomaly, open parenthesis, and then the Unicode for the degree sign is \u00B0, then C. I've typed that in so many times, I think I have it memorized. Close parenthesis. Then we have the subtitle, which is "(Difference from 1980-2015 annual mean)". That was all in parentheses, so I need a closing parenthesis.
We've also seen this in previous episodes: the title and the subtitle are aligned on the y-axis. By default they're aligned to the panel, not the plot, and I'd rather they be aligned to the plot. We can fix that very easily with plot.title.position = "plot". Very good. That brings it over to the left. We can also change the formatting. Let's make the title bold, so we'll do plot.title = element_text with face = "bold". Then we'll make plot.subtitle an element_text with color = "gray". We'll also shrink the size a little bit and do size = 10.
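Pulling the color scale, labels, and theme settings from the walkthrough into one place gives roughly the following. The toy data frame is invented just to exercise the styling. One deliberate difference from the episode: recent ggplot2 releases (3.4.0 and later) renamed the size argument of element_line to linewidth, so this sketch uses linewidth and assumes a ggplot2 version that has it.

```r
library(ggplot2)

# Made-up values, two "years" of three months, just to exercise the styling
toy <- data.frame(
  year = rep(c(1900, 2020), each = 3),
  month = factor(rep(c("Jan", "Feb", "Mar"), times = 2), levels = month.abb),
  month_anom = c(-2.1, -1.8, -1.0, -0.9, -0.7, 0.2),
  ave = rep(c(-1.6, -0.5), each = 3)
)

p <- ggplot(toy, aes(x = month, y = month_anom, group = year, color = ave)) +
  geom_line() +
  scale_color_gradient2(low = "darkblue", mid = "white", high = "darkred",
                        midpoint = 0, guide = "none") +
  scale_y_continuous(breaks = seq(-3, 2, 1)) +
  labs(x = NULL, y = NULL,
       title = "Temperature anomaly (\u00B0C)",
       subtitle = "(Difference from 1980-2015 annual mean)") +
  theme(
    panel.background = element_blank(),    # white instead of gray
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),  # keep only horizontal grid lines
    panel.grid.major.y = element_line(color = "gray", linetype = "dotted",
                                      linewidth = 0.25),
    plot.title.position = "plot",          # align title with the whole plot
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray", size = 10)
  )
```

Printing p at the console draws the styled figure; with the real t_data in place of toy the same layers should apply unchanged.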
So I think we've got it pretty close. I'm going to save it now to a PNG in the right dimensions to make sure everything is positioned right and looks the right size. We can do that with ggsave, and we'll write to figures/monthly_anomaly.png. My width I'll make six, my height I'll make four, and my units will be inches. Good, I think those sizes actually work really well.
One thing I have noticed is that we have expansion on for the x-axis. We need to remove that, because in the original version there was no space between the y-axis and January, or between December and the right side. So we can add a scale_x_continuous with expand = c(0, 0). It's complaining because I used scale_x_continuous rather than discrete: x is representing a factor, which is discrete. So we can do scale_x_discrete. Now we see that we've removed that space, but at the same time we've chopped off part of the December label. So what I'd like to do is add a margin around our figure. We can do that by coming in here to the theme and setting plot.margin with the margin function. The arguments are t, r, b, l, like "trouble". For the top I'm going to give it 10 points, the right 10, the bottom 10, and the left 10. If those aren't right, as always, we can adjust. That actually works pretty well, and I'm pretty happy with that spacing. Maybe we could give December a little bit more space, so on the right we could open that up to, like, 15. Yeah, I think that looks pretty good.
I'd also like to put an annotation on here for our 2022 line. The original went to August of 2016, and I think it's that point there. I'd like to put a circle on this, as well as a label. So again, we've got our t_data data frame.
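A short sketch of the axis expansion fix and the plot margin discussed above, again on invented data. The tryCatch is only there to show, without stopping the script, that a continuous x scale fails on a factor when the plot is built.

```r
library(ggplot2)

# Invented data: one year, three months
toy <- data.frame(
  year = 2020,
  month = factor(c("Jan", "Feb", "Mar"), levels = month.abb),
  month_anom = c(-2.0, -1.5, -1.0)
)

base <- ggplot(toy, aes(x = month, y = month_anom, group = year)) +
  geom_line()

# A continuous x scale does not work with a factor: building the plot errors
bad <- tryCatch(
  layer_data(base + scale_x_continuous(expand = c(0, 0))),
  error = function(e) e
)

# The discrete scale accepts the same expand argument
p <- base +
  scale_x_discrete(expand = c(0, 0)) +
  theme(plot.margin = margin(t = 10, r = 15, b = 10, l = 10))  # points by default
```

The extra room on the right (r = 15) is what keeps the last month label from being clipped once the axis expansion is removed.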
I'm going to make another data frame that just has the last month. For this, we can take t_data and do a slice_tail with n = 1, which gives us that final row. I'm going to call this annotation. Now that we've got that data frame, we can add, after our geom_line, a geom_point with data = annotation. Our x will be month and our y will be month_anom. Let's go ahead and add that in. That gives us a red circle. Let's make it bigger. I like to have a big juicy point there so it's easy to see where it is. Let's do size = 5.
Let's now put some text next to it, so it's clear what that point represents. After our geom_point, we can do geom_text, again with data = annotation and aes with x = month and y = month_anom. For the label, let's do "March 2022". Add that on, and it's centered right on the point. Rather than center justified, I find it's easier to be left or right justified, so in this case I'm going to make it right justified with hjust = 1. Now we lose the final 2 in 2022. So maybe we'll forgo setting the position algorithmically and instead nudge it to the left and up a little bit using actual numbers. The x is currently three, so maybe we could do, like, 2.8. Our y is at, like, negative 0.8, so maybe we could make it negative 0.5. I think that looks like pretty good placement. It's analogous to what we saw in the original, where they had a label up here for August of 2016. I'll let you see if you can figure out the filter statement, the repositioning of the points, and all the other stuff to make that original plot. I'm here in March of 2022.
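The annotation layers described above can be sketched as follows. The t_data here is a three-row toy with invented values, so the hard-coded label position (x = 2.8, y = -0.5) only makes sense for this example.

```r
library(dplyr)
library(ggplot2)

# Toy data: the three months of the current year (values are made up)
t_data <- tibble(
  year = 2022,
  month = factor(c("Jan", "Feb", "Mar"), levels = month.abb),
  month_anom = c(-1.3, -1.0, -0.8)
)

# The last row is the most recent month
annotation <- t_data %>% slice_tail(n = 1)

p <- ggplot(t_data, aes(x = month, y = month_anom, group = year)) +
  geom_line() +
  geom_point(data = annotation, aes(x = month, y = month_anom), size = 5) +
  geom_text(data = annotation, aes(x = 2.8, y = -0.5),
            label = "March 2022", hjust = 1)  # right-justified, nudged by hand
```

Numeric x positions work on a discrete axis because each factor level sits at an integer position, so 2.8 lands just left of the third month.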
Again, if you're watching this in the future, by all means, I'd love for you to do this with the data for whenever you're watching this: 2023, 2025, who knows? I think this does a really nice job of recreating the static final frame of that GIF. Hopefully you agree.
What I'd like to do now is very quickly show you how we can actually create that GIF. I'm going to break up my plot, because there's a lot in common between the final frame and the rest of the GIF. I'm going to call this whole plot p. Then I can say p and go on to ggsave, and it'll save that final version. But what I need to pull out is the geom_point and the geom_text, because that's what's different from what we had in the GIF. Then I can add those to p to get the static figure. I forgot to remove the plus sign at the end of the geom_text, so it's trying to add ggsave on, and I get an error message. Let's run this again. Of course, everything looks like it did before. And if I just run p now at the prompt, what I see is the plot without the point and label. Okay.
Now we want to build the animation. For the animation, we can do p + transition_reveal, and we'll reveal over the year. I forgot to load gganimate, so I'll come back up here and do library(gganimate). Now we can do the transition_reveal without any error messages. So we're getting the GIF, but of course we don't have the label for the year. To this we need to add a geom_label. On the x-axis, it's right over July, so that'll be seven. On the y-axis, let's put it at zero. Our label will be the year. We've seen this before, but we want a white background and we don't want the border, so we'll do label.size = 0. And you know what, I think this part needs to be in an aes: the x, y, and label.
Let's see what this looks like without adding the transition_reveal. I think that looks pretty good. One thing I might do is make that bold, so we can do fontface = "bold". Great. Now we can add on the transition_reveal. What we see is the animated lines. But unfortunately, we see what we saw earlier when we were making the climate spirals: the year is not increasing smoothly. It went from 1880 to 1888, but then it sits there for a while before jumping up to the 1970s. What we saw before is that, to deal with that problem, instead of transition_reveal we can use transition_manual, again on year. But transition_manual will show a new line and then hide the other lines. If we want to keep all the other lines, we need to do cumulative = TRUE. That seems to have done the trick.
Now, if we want to output this, I'm going to save it as another object, a, and then call animate on a. Our width I'll set to six, our height to four, units = "in", and res = 300. I'll then save this as a GIF with anim_save, and I'll save it as figures/monthly_anomaly.gif, not PNG. This will take a little bit of time to render, and I won't make you sit around and wait for that. It'll basically be the same thing that we had here.
Of course, if you want to see all of the code that I generated today, you can download it at the link down below in the description. You'll find a blog post that'll give you all the instructions you need to get going. I really hope that you follow along in running this. Again, I pursued this visualization because, as I was looking around for data visualizations of climate change, I thought this one looked really interesting.
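As a hedged sketch of the animation steps above: the data are invented (two years of twelve months), and the rendering calls are left as comments because they are slow and need a GIF renderer installed. Only the construction of the animation object is actually run here.

```r
library(ggplot2)
library(gganimate)

# Invented data: two years of twelve monthly values
toy <- data.frame(
  year = rep(c(2020, 2021), each = 12),
  month = factor(rep(month.abb, times = 2), levels = month.abb),
  month_anom = rep(-cos(2 * pi * (1:12 - 7) / 12), times = 2) +
    rep(c(0, 0.3), each = 12)
)

p <- ggplot(toy, aes(x = month, y = month_anom, group = year)) +
  geom_line()

a <- p +
  geom_label(aes(x = 7, y = 0, label = year),       # year label over July
             fontface = "bold", label.size = 0) +   # bold text, no border
  transition_manual(year, cumulative = TRUE)        # keep earlier years visible

# Rendering and saving (not run here; the output path is just an example):
# animate(a, width = 6, height = 4, units = "in", res = 300)
# anim_save("figures/monthly_anomaly.gif")
```

With cumulative = TRUE each frame adds one more year on top of the ones already drawn, which is what keeps the older lines from disappearing.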
If you find others out there in the wild that you would love for me to take a crack at and show you how to do, please let me know. I can't possibly see all the things that are out there. I've got a few more things in the queue, but I'd love to get your thoughts on visualizations that you think would be really fun to recreate. Even though these skills are being applied to think about climate change, I know they can be applied to the work that you are doing to visualize your own data. Keep practicing, and we'll see you next time for another episode of Code Club.