Hey folks, in this series of episodes I have been looking at different ways of visualizing global temperature data, looking at temperature anomalies across the Earth normalized against a reference date range, say between 1951 and 1980. Again, these are global temperatures, and we've also looked at different latitudinal bands across the globe. Well, that's cool, but I want to know what things have been like where I live. I live in Dexter, Michigan, which is in the southeast corner of Michigan, just outside of Ann Arbor. And you might be interested in the weather history where you live, right? So how would we figure that out? Well, that's exactly what we're going to do in today's episode. I'm going to show you how we can find the closest NOAA weather station to where we live. This will work best if you're in the United States, but NOAA also has weather stations outside of the United States, so don't despair if you're not within the US. Then in subsequent episodes, what I'd like to do is use some of the approaches we used for temperature data at a global scale to look specifically at data from where I live, in my neck of the woods. So if you go to ncei.noaa.gov, that will bring you to this website. Also, if you've ever followed along with my general R tutorial materials, the second module in there looks at a whole bunch of weather data from where I live, right near Ann Arbor, Michigan, looking at temperature and precipitation data. So here we are at the NOAA website, and there's a button here to browse data sets. I'm going to go ahead and click on that. This brings us to Climate Data Online dataset discovery. I'm interested in the daily summaries. Again, these are going to be daily data for a particular station, whereas some of these others are global or more seasonal, and I want daily data. This comes from the Global Historical Climatology Network, GHCN Daily. We'll go ahead and click on that "More" button.
This brings us to a page that tells us about all the different types of data that are available. There's information here on using the GUI, the web browser, to search for the closest region and the data that's available there. I'm using R, though, and I want to be able to automate this for any location. So if you have a latitude and longitude, I want to be able to generate a plot of the precipitation data or the temperature data for that spot over a long period of time. That's my motivation. So what I'm interested in, then, is the download data section. There is the NCEI direct download, and there's an HTTPS server. That HTTPS server is what I want, I think. It brings us to a no-frills webpage with all of the different files that are available. In here is a readme document, and that readme file has a bunch of information about all the different files and directories in this directory. As I look through it, I'm trying to find a file or resource that will help me figure out which NOAA weather station is closest to where I live. Again, they've got a variety of different things in here. I think this first item is basically all of the data; if you want all of the data, that's what you download. Maybe we don't want all of the data, though. I'd like to be a little more focused, and I suspect that's a really big file. Maybe I'm wrong, but anyway, if we look down through here, we can see there's an "all" directory, a "gsn" directory, a variety of directories that hold the individual station files. I would like for things to be aggregated. Also, at this point, I'm looking for a high-level overview of the data, something like a list of weather stations and the types of data being collected at each. Of course, I have a little bit of inside information because I've looked at this before.
Anyway, if we scroll down, we see there are compressed versions of those directories we saw up above. And down here, we see some of that high-level overview of the data. There's ghcnd-countries.txt, the list of country codes and their names. I like this one, ghcnd-inventory.txt, a file listing the periods of record for each station and element. That looks interesting. There's also a list of stations and their metadata, and a list of state and province codes; maybe I'm not so interested in that. The version file, the mailing list, I don't know. So I think what we're going to be interested in is this ghcnd-inventory.txt file. If we come back here, this link is that ghcnd-inventory.txt file, so I'm going to go ahead and click on it. This pulls up a file that has the weather station ID, the latitude, the longitude, the variable being measured, and then the start and end years over which the data were being captured. Of course, if we come back and look at the ghcnd-stations.txt file, this also gives us a fair amount of information about the station itself, but it doesn't appear to say anything about the data being collected. It's got the station ID, latitude, longitude, elevation, the name of the station, and some other information. So I think we'll use the inventory data. What we can do is right-click on that link to copy the link address. The nice thing about this is that the file was last updated on the 14th of July, and as I'm recording this, it is, yeah, the 14th of July, right? So that's from today. This data gets updated periodically, if not every day.
That way, any figures you generate from this data will be current to the last time it was updated, rather than you having to download the file, put it in a directory, and do all those types of manipulations we've seen before. So anyway, now we'll come over to RStudio. I'm going to create a new R script and paste that URL in here, and I'll put it in quotes so it's a string. I'll call this inventory_url. Great. And as always, up at the top I'll do library(tidyverse), and I'll also do library(glue), just to have all those great tools accessible to me. Now, this inventory_url, as we saw, points at a .txt file, which makes me think it's not a tab-separated-values file. We could try it anyway; we've seen this before. If we do read_tsv(inventory_url), it goes up to the website and brings down the data. We can see it downloading here; it takes a few seconds because the file is quite large. And we see that it reads everything in as a single column, so the file is not using a tab as its delimiter. What we want instead of read_tsv is read_table. One of the downsides of using read_table or read_tsv on a URL is that if the file is big like this one, around 725,000 rows, then it's going to be really slow. So we probably don't want to repeat this download a whole lot of times in our script. Once I get it read in, I'll save it to a variable so I can use that variable for downstream processing. So let's go ahead and read it in with read_table. I think this should work. Hmm, I'm getting a warning message: duplicated column names, deduplicated 1949 to 1949_1. I think that's because there's no header row, so the first row of data is being treated as the column names. Let's go ahead and add column names ourselves. We can do that with the col_names argument to read_table, giving it a vector of names. So I'll say station, then lat for latitude and lon for longitude.
Then I'll do variable, for TMAX, TMIN, PRCP and so on, the different variables being collected at each of the weather stations, and then start and end. Let's go ahead and save this as inventory. Great. Let's print inventory to see what it looks like. Now we see that we've got the station, latitude, longitude, the variable, and the start and end years. Excellent. Now what I want to do is figure out my local latitude and longitude so I can find the station closest to it. So here I am; I live outside of Dexter, Michigan. One of the nice things I can do within Google Maps is that if I right-click on any point on the screen, I get the latitude and longitude. So I'll find Dexter, and the first thing in the menu that pops up is a latitude and longitude, in degrees. If I click on that, I see at the bottom that it copies the values to the clipboard. Now I can come into my R session, paste them in, and assign them to my_lat and my_lon. I now want to calculate the distance from that latitude and longitude to the latitude and longitude of each of the weather stations. To find the distance between any two latitude-longitude points, I'm going to do a Google search for "calculate distance between two latitude longitude points", and I want the formula. This exercise might be a bit of overkill; I could probably just do a simple Pythagorean distance between the latitudes and longitudes, assuming a flat surface between where I am and the closest station. But we're scientists, so let's at least act like it, right? We'll click on the first link to see what it gets us. We find this page from GeeksforGeeks, "Program for distance between two points on the earth". I'm going to assume that this is correct.
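Putting those steps together, here's a sketch of the read. The full URL path is my assumption of the link copied from the NOAA directory listing; note that this downloads a large file, so run it once and keep the result in a variable.

```r
library(tidyverse)

# ghcnd-inventory.txt from the NOAA HTTPS server (URL path assumed from the
# directory listing shown in the episode)
inventory_url <- "https://www.ncei.noaa.gov/pub/data/ghcn/daily/ghcnd-inventory.txt"

# The file is space-delimited with no header row, so read_tsv() collapses it
# into one column; read_table() with explicit col_names works instead
inventory <- read_table(inventory_url,
                        col_names = c("station", "lat", "lon",
                                      "variable", "start", "end"))
```

Saving the result as `inventory` means the slow download only happens once per session.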
It'll probably be close enough. What we see down here is a formula for the distance in miles, and we can of course convert that to kilometers. I'm going to copy this formula into my R script and comment it out, so R isn't tempted to run it. Then I'll take inventory and pipe it into a mutate, because I want to create a column of distances; ultimately I'm going to calculate the distance between my_lat and my_lon and every lat and lon in the data frame, and then find the station with the smallest distance, okay? So we'll do lat_r to convert our degrees into radians: lat times 2 times pi divided by 360. And then we'll do lon_r = lon times 2 times pi divided by 360. Again, that gets us our latitude and longitude in radians, which the formula needs. Then we can do d =, we'll use miles for now, 3963 times, let's do arccos. If you're curious about what trigonometric functions are available in R, you can always do ?cos, and that brings up the various trigonometric functions. I see the inverse cosine is spelled acos, not arccos, so I need to fix that. It's acos. And we don't use the square brackets from the formula; in R we always need round parentheses. So we'll do the sine of lat_r times the sine... oh, and I need a closing parenthesis there... the sine of my_lat. That reminds me that I need to convert my_lat and my_lon into radians too, so I'll multiply each by 2 times pi divided by 360 and rerun those assignments so they're in radians. Okay, let's keep track of our parentheses; as always, that's a challenge, right? We need two closing parentheses after that final sine, plus the cosine of lat_r times the cosine of my_lat.
Yep, my_lat. Then keep moving along: times the cosine of the longitude difference. So far the first argument in each pair has been the column value and the second has been mine, and I don't know that the order matters, but to stay consistent I'll do my_lon minus lon_r, okay? I'm not sure we have the right number of parentheses here; I guess one way to find out is to run it and see if we get an error. Yep, we've got one too many closing parentheses, so I'll remove one. If I look up here I can see where the extras are, and one of these closes the mutate, I believe. Yeah, that should work, great. So now we have this distance, d, and again, that's in miles. The page says we can multiply by 1.609 to convert it, and then it'll be in kilometers. It would be good to double-check that this formula is doing what we think it does, so let's come back to the webpage. I suspect down here they've got some code. Yeah, let's use their example to see if, when we run our formula with their numbers, we get basically two kilometers. I'm going to copy that, put it up here, and strip it down. Their code is in, I think, JavaScript or C++ syntax. For my lat_r I'll plug in their lat1 and comment out everything else, then lon1 for lon_r, and then my_lat gets lat2 and my_lon gets lon2. Make sure you've got all of these loaded. Good. Now let's give this a run. Oops, I see I misspelled one of the variable names. And it occurs to me that their example values are in degrees rather than radians, so I need to convert them to radians by multiplying by 2 pi divided by 360; again, there are 2 pi radians per 360 degrees. I also don't need those semicolons. Let's try this and, very good, we get the value they had on the website. We got 2.01.
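The formula we just assembled can be wrapped up as a small function for checking. This is a sketch of the spherical-law-of-cosines calculation from the GeeksforGeeks page; the function name earth_distance is mine, 3963 miles is the Earth's radius used there, and multiplying by 1.609 converts to kilometers.

```r
# Great-circle distance between two points given in degrees, returned in miles
earth_distance <- function(lat1, lon1, lat2, lon2) {
  # convert degrees to radians: there are 2*pi radians per 360 degrees
  lat1_r <- lat1 * 2 * pi / 360
  lon1_r <- lon1 * 2 * pi / 360
  lat2_r <- lat2 * 2 * pi / 360
  lon2_r <- lon2 * 2 * pi / 360

  # spherical law of cosines with Earth's radius of 3963 miles
  3963 * acos(sin(lat1_r) * sin(lat2_r) +
              cos(lat1_r) * cos(lat2_r) * cos(lon2_r - lon1_r))
}
```

A quick sanity check: two points a quarter of the way around the equator should come out to 3963 * pi / 2 miles.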
I think they've got 2.004, so maybe a little bit of rounding error. Okay, our formula works; we're confident in it. I can now remove all that practice code and put our own values back in, and of course rerun all of this. Good. Now I see that I've got distances for the different stations. Wonderful. What I can do next is arrange this output data frame by distance, so I'll do arrange(d), and this gives me the weather stations that are closest at the very top. If I wanted the opposite direction, I could do arrange(-d), and there I see the furthest weather station from me, on the other side of the world, no doubt. But I want the one that's closest to me, so I'll go back to arrange(d). What I'm noticing, however, is that while the nearest station is probably in Dexter, about 0.3 miles away, its record only goes back to 2015. I would sure like it to go back quite a bit further. So I'll add a filter and say start < 1950, because I want a pretty long series of data. You know what, maybe I'll make that 1960, and then arrange(d) again. I'm picking 1960 because I remember the episode where we counted the number of stations in each latitude band over time while doing that global temperature change analysis over the last century or so, and the number of stations really picked up around 1960. Let's try that. What we find is a station that started in 1916, about 11.7 miles away, but we see that its record ended in 1935. So I'll add another filter and say end > 2020, because we want to keep it current. This brings us a weather station that is about 19 kilometers, not miles, 19 kilometers away, and its record runs from 1891 to 2022. That gives me at least 130 years' worth of data.
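The distance-plus-filtering logic can be illustrated offline with a toy inventory. The station IDs, coordinates, and years below are all made up for the demonstration, and my_lat/my_lon are example coordinates near Dexter, not the exact ones from the video.

```r
library(tidyverse)

# Toy stand-in for the GHCND inventory (all values invented for illustration)
inventory <- tribble(
  ~station,      ~lat,  ~lon,   ~variable, ~start, ~end,
  "US_NEAR_NEW", 42.35, -83.90, "TMAX",    2015,   2022,
  "US_FAR_OLD",  42.10, -83.50, "TMAX",    1891,   2022,
  "US_DEFUNCT",  42.30, -83.70, "TMAX",    1916,   1935
)

# Example home coordinates, converted to radians up front
my_lat <- 42.34 * 2 * pi / 360
my_lon <- -83.89 * 2 * pi / 360

closest <- inventory %>%
  mutate(lat_r = lat * 2 * pi / 360,
         lon_r = lon * 2 * pi / 360,
         # spherical law of cosines; 1.609 * 3963 gives kilometers
         d = 1.609 * 3963 * acos(sin(lat_r) * sin(my_lat) +
                                 cos(lat_r) * cos(my_lat) *
                                   cos(my_lon - lon_r))) %>%
  filter(start < 1960, end > 2020) %>%  # long-running, still-active stations
  arrange(d)
```

Here only the long-running, still-active station survives the filter, even though another station is physically closer.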
Your mileage may vary in terms of how much history you have access to. Now what I want to do is capture this station's ID. To do that, instead of arrange(d), I'll do top_n(n = -1, d) to take the row with the smallest distance. We see that we get a table with 12 rows and nine columns, because there's a tie, right? The same station appears once for each of the 12 variables being recorded there. I'm interested in the station ID itself, because I want to use it to get the file for that specific weather station. Hopefully that makes sense. So I can do distinct(station), which gives me my station, and then pull(station) to get the station ID as a value I can save in a variable I'll call my_station. If I print my_station, I get that weather station ID. Great. With my_station in hand, I want to download the daily data for that station. Looking back at the NOAA site, I can go to this by_station directory, and I see it holds all the different weather stations stored as .csv.gz files. I'm going to assume that the first bit of text in each filename is the weather station ID. So I'll right-click on one of these and copy the link address, and then down here I'll assign that long URL to station_daily. Of course, I'm not interested in this specific station; I want my station. So I'm going to remove the station ID from the URL, put my_station in curly braces in its place, and then wrap the whole string in glue(), right? The glue() function will build the URL for me based on my_station. Now if I print station_daily, I see that I've got that URL. This is a .csv.gz file, a compressed CSV file, and read_csv() can read these gzipped files even when they're remote like this one. So I can do read_csv(station_daily), and we see our data.
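The glue() step can be sketched like this. The station ID here is a hypothetical placeholder (use the one you pulled from the inventory), and the by_station path is my assumption of the directory shown on the NOAA server.

```r
library(glue)

my_station <- "USC00200032"  # hypothetical GHCND station id for illustration

# glue() substitutes my_station into the {curly braces} in the template
station_daily <- glue("https://www.ncei.noaa.gov/pub/data/ghcn/daily/by_station/{my_station}.csv.gz")

# read_csv() can decompress a remote .csv.gz directly, e.g.:
# read_csv(station_daily, col_names = FALSE)
```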
Unfortunately, we don't have any headers. I know what some of these columns are, but I'm not totally sure. So if I come back to the directory listing and search for "readme", I see a readme-by_station file at the very bottom. Clicking on that, I see all the elements in these files: the station ID, an eight-character date, the element type, which is going to be our variable, the value of that element, and then some flags that I'm not super interested in. So again, I can use col_names to name the columns station, date, variable, and value, and then a, b, c, and d for the flags. I'm ultimately going to get rid of a, b, c, and d, because I can do select(station, date, variable, value) and keep just those four columns. Before we go, I want to get this data frame into good shape. We could probably get rid of the station column too, because we know it's all from the same station; that leaves the date, the variable, and the value. I would like each of the variables in a separate column, and I only want certain variables, right? So we'll do a pivot_wider with names_from = variable and values_from = value. And that needs to be an underscore, not a hyphen. That gets us a really wide data frame with 26 different variables. I'm not interested in most of these; I'm going to be interested in TMAX, PRCP, and SNOW. So I can do select(date, TMAX, PRCP, SNOW), which gives me back those four columns. Something I'm just going to assume is that an NA value for precipitation or snow should really be a zero, so in the pivot_wider I'll add values_fill = 0, and now we've got zeros plugged in for those NA values. I might be wrong, but I think it's a pretty decent assumption. The final thing I want to take on is the date.
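Here's a minimal offline illustration of the pivot_wider() step, with made-up values standing in for rows of the by_station file, including the values_fill = 0 assumption for missing precipitation.

```r
library(tidyverse)

# Toy long-format rows mimicking the by_station layout (values invented);
# note there is deliberately no PRCP row for the second day
daily <- tribble(
  ~date,    ~variable, ~value,
  20220713, "TMAX",    289,
  20220713, "PRCP",    25,
  20220714, "TMAX",    294
)

wide <- daily %>%
  pivot_wider(names_from = "variable", values_from = "value",
              values_fill = 0)  # assume a missing value means zero
```

After the pivot, each variable is its own column and the missing PRCP reading becomes a 0 rather than an NA.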
Here we'll do mutate(date = ymd(date)). ymd() is a function that comes from the lubridate package, so I'm going to come back up to the top and do library(lubridate) to get that loaded. ymd() is a parser that takes that eight-digit number, so today would be 20220714, and parses it into a proper date, 2022-07-14. And there we go: we've got our date column, TMAX, PRCP, and SNOW. I'm going to call this local_weather and save it. I'll also save this script into my code directory and call it local_weather.R. I would strongly encourage you to do this with your own latitude and longitude; this should work pretty well. Let me know how it goes down below in the comments, whether or not you're able to pull up your local weather station. I think you should be able to, even if you're not here in the United States. If you plug in a latitude and longitude that you pull off of Google Maps like I showed you, even if you're on the other side of the earth, you should still be able to find your closest weather station; this is a pretty comprehensive network of weather stations. They might not all go back as far as mine does, but hey, it would be awesome for you to work with your own data and get a better sense of how climate change is affecting where you live. Well, like I said, practice with this, tell your friends what we're up to here, and we'll see you next time for another episode of Code Club.
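As a recap, here's the download-and-tidy portion of the episode gathered into one function. This is a sketch, not the episode's exact script: the by_station URL path is my assumption from the NOAA directory listing, values_fill = 0 encodes the guess that missing precipitation and snow mean zero, and the function is not run here since it downloads data.

```r
library(tidyverse)
library(glue)
library(lubridate)

# Build the by_station URL, read the headerless .csv.gz, keep the columns of
# interest, spread variables into columns, and parse the 8-digit dates
get_local_weather <- function(my_station) {
  glue("https://www.ncei.noaa.gov/pub/data/ghcn/daily/by_station/{my_station}.csv.gz") %>%
    read_csv(col_names = c("station", "date", "variable", "value",
                           "a", "b", "c", "d")) %>%
    select(date, variable, value) %>%
    pivot_wider(names_from = "variable", values_from = "value",
                values_fill = 0) %>%  # assume NA precip/snow means zero
    select(date, TMAX, PRCP, SNOW) %>%
    mutate(date = ymd(date))          # 20220714 becomes 2022-07-14
}
```

Called with a GHCND station ID, this would return the local_weather data frame built in the episode.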