Hey folks, over the past few months we have been looking at different ways of visualizing and analyzing climate change data. More recently I've been drilling down into data from a NOAA weather station outside of Ann Arbor, Michigan, about 10 miles to the east of where I live here in Dexter, Michigan, to give us a sense of temperature and precipitation for southeastern Michigan. Most recently I have been looking at drought. Drought has been gripping the world, especially the southwest United States and Europe, over the past summer. There have been really shocking images of reservoirs and rivers at historic low levels. Well, when I make precipitation plots for this past summer, our precipitation here seems pretty normal, right? So a viewer chimed in on a comment thread and said, yeah, I wonder what the drought would look like over the course of the entire world. Sure, there'd be hotspots, say in the southwest US or in central Europe, but what about the rest of the world? That got me thinking that this would be a really fun project to work on with you all, to try out some of the different tools around reproducible research on this interesting question.

So what I want to do is take about 10 episodes or so to use tools from reproducible research that we perhaps haven't touched in a while, as well as introduce some new ones, to share with you what I call my stack: the collection of tools that I use to engage in reproducible research. The goal is to download all of the data from NOAA, aggregate it by longitude and latitude, and then look back over some window, say a month or two weeks or some number of days, and ask: how typical is the amount of precipitation over that period relative to all the data we have for that degree of longitude and latitude?
Then we could calculate some type of statistic, maybe a z-score, for the past month, and use that z-score as a color attribute to make a raster image of the globe. We've done bits and pieces of all of this over the past couple of months, but we've never put it together in a single project, so that's what I want to do with you. We are going to use a whole bunch of tools that we've talked about: Git, conda, something new called Snakemake, R of course, and GitHub Actions, which will allow us to rerun our analysis every day and post a new image to a website showing the current state of drought around the world.

To get going, I'm over here in GitHub at github.com/riffomonas. You need to create your own account on GitHub. I've got other episodes way back in the archive, and I'll try to put a link up here for how you can get set up with GitHub on your own. I'm going to post this from the riffomonas account, although I also have my own personal account. I'll start by clicking this plus sign to create a new repository, and I'll say that the owner is riffomonas, since I've got a whole bunch of accounts in here. The repository name I'm going to call drought index, and I'll describe it as a project to practice reproducible research practices while studying the state of drought around the world. I misspelled practices, so we'll get that right.
I want to make this public because I want everyone to be able to see my code. I'm going to add a README file. I'll also add a .gitignore file, so I'll type R in here, which doesn't narrow it down too much, but if we scroll down we see that there is an R template for the .gitignore file. For a license I'm going to put in an MIT license. That is a fairly permissive license that really just requires that if you want to use the code, you've got to give me attribution. I'll go ahead and create the repository. This creates a web page for the repository that, as we see, has three files: .gitignore, LICENSE, and README.md. The README.md file is shown right here. Okay, very good. So we have a version of our project already up on GitHub that is public and that anyone can see.

I now want to get this down on my local computer, so I'll click this Code button, the green button, and to the right of the link there's a copy icon, so I'll click that. I now turn to my terminal application. I'm doing this on a Mac, but you could just as easily do this on Windows, which has a Linux subsystem that you can use very easily, and you can also do this on a Linux computer. I would strongly encourage you to move your analyses to a Linux-based system if you're going to be doing any kind of serious data science in the future. The tooling is just a lot easier to get hold of in a Linux type of environment than in a Windows type of environment. Of course, a Mac is a Unix-like environment, much like Linux. I am in my home directory; we know that because I have that little tilde there. I'm also using bash. The default for Mac is zsh, but I'm using bash because that's what I've always used and I've never really felt a reason to change. That being said, when you get a new Mac the default is zsh, so it takes a little bit of figuring out to go back to bash. Maybe at some point I'll switch to zsh, but for now we'll stick with this. I'm going to go ahead
and put my project directory on my desktop so it's easier for me to find. I'll do cd Desktop to change directories to the desktop. I can then use the git commands to get the project down. So I can show you what I'm using, I'll do git --version, and this is the version of Git that comes with the Apple computer, 2.32.1. I think this is a fairly recent version of Git, so I'm pretty happy with that. I don't think there are going to be any meaningful differences between what I might install using something like Homebrew and what I already have here on my computer.

Great. To get the project from GitHub down onto my laptop, I do git clone and then Command-V to paste in the link that I copied on the website. As it says in the output here, it clones into drought index, creating that directory on my computer. Now I can do cd drought index and ls, and I see the LICENSE and the README file here in drought index.

I want to orient you to a couple of things that you may have forgotten since the last time I talked a lot about Git. On my computer I have things set up so that the branch I'm on is shown in parentheses next to the directory name. I'm on the main branch, and it's green, which says that everything is committed. We'll see this change from green to red; red means there are things that still need to be committed, that aren't being tracked, or that haven't been updated in the repository. I don't think I'm going to go off of the main branch, although I might before the whole project is done, so we'll typically see this listed as main. The other thing you'll notice is that I have two files here, LICENSE and README.md, but there were three, right? There was a .gitignore. A little bit of trivia to know is that if a file name starts with a period, like .gitignore, it's going to be hidden here. What you can do is ls -a, and now you see all of the hidden
files, those files that start with a period, as well as LICENSE and README. We see .git, the directory that makes this directory a Git repository; you don't want to touch that. Then we have .gitignore, which tells Git what file types to ignore, and of course we have the LICENSE and the README.

Over the course of this project I'm going to mainly be using a tool called Visual Studio Code. You've seen me use this in previous episodes. It's a tool that I'm trying to learn and figure out how to use. It's an alternative to RStudio. I don't know that it's better or worse; it's different, I'm trying to learn it, and people seem interested in learning along with me. I have a command line tool installed called code, so I can do code, space, period. This opens my drought index directory in Visual Studio Code, and you can see here that I've got the .gitignore with all the different things that it's going to ignore, and my LICENSE, which says copyright 2022 Riffomonas. Maybe here I'll put my name, so Patrick D. Schloss, and I'll save that. Then my README: I'll reformat this a little bit, so I'll do README as the title, and I'll say it's a repository for a project to practice reproducible research practices while studying the state of drought around the world. Okay, simple enough. I'll save that, and we might add more information to this as we go along.

One of the cool things that you'll already see is that in the Explorer here in Visual Studio Code there are some M's, which tell me that those files have been modified. I can use various tooling over here like GitLens and GitHub, but I'm not super familiar with how to use those and I don't want to spend a lot of time messing around with them just yet. One thing that I can do is Control-backtick, the key below the Escape key on my keyboard. That will open a terminal on the right
side of my screen. I can then right-click on the word Terminal and say Move Panel to Bottom, and that moves my terminal to the bottom. I think this will make for a better viewing experience for you all: instead of having it off on the right, I have it down below, everything is stacked vertically, and the screen isn't too horribly spread out. We see here very much what I had before. If I do ls I see my LICENSE and the README. If I do git status I see that I've modified the LICENSE and the README. I'll do a git add on LICENSE and README, and my prompt now goes to this dark green color, so they're added. If I do git status I see that those changes have been staged, so they're ready to be committed. I can then do git commit and I'll say customize default files, and now I'm back to green again if I do git status. I'm ready to push, so I'll do a git push, and that's then pushed up to GitHub. If I come back to GitHub and refresh the screen, I now see the customize default files commit for both the LICENSE and the README, and that my README file here has been modified.

Very good. We've basically gone around the world, as it's called: we created the repository, we pulled it down to my local computer, we made some changes, and then we pushed them up. What that means is that everything is firing on all cylinders for using version control with our project, so that we can make modifications on our local computer and get them up onto GitHub. This serves a variety of purposes. One is that it will make it easier for us to get this content up onto its own web page. GitHub has a feature called GitHub Pages; if you look at my Riffomonas website, that's all hosted on GitHub, so we'll do the same type of thing for the final product that we're working on. Also, a big part of reproducibility is transparency, and so by making all my code
publicly available to you and to anybody else, you can get that code yourselves and use it and muck around with it and riff on it to do your own project.

The next thing I want to do is start building out our project by adding the software that we are going to be using. A number of episodes back I talked about a cool tool called conda and its friend mamba. Conda and mamba come to us from the Python world, but you can use them with any type of command line tool, like R. So we are going to create an environment for this project, which I'll call drought, where we can dictate what tools we want installed for our analysis. I'll create a new file that I'll call environment.yml, and this is the environment file that I can use to create a conda environment. The first thing I'll do is name, and I'll say drought. This is written in YAML format. We'll then do channels, so I'll create a list of the channels that I want to include, which basically tells conda and mamba where to look to find the software that I want to install. I'll do conda-forge, I'll do base, and then I'll do r. Now I need to give it dependencies, so I'll do dependencies and the same type of list, except that I need to give it the names of the software that I want to install. For R, I know I'll be using r-base, and we'll also do r-tidyverse. We can also put in the version that we want, but I'm not totally sure which version I want, so I'm going to build this environment with r-base and r-tidyverse, it'll tell me what versions it's installing, and then I'll update this file so we can set the versions that we want. Okay, I'll save that. I can then do mamba env create and then -f environment.yml. So again, this is mamba env create, then I give it the name of the file, and it will install these two dependencies for me.
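
As a rough sketch of the environment file described here, the commands below write an unpinned environment.yml from the shell. Listing only the conda-forge channel is an assumption made for this sketch (the episode also names base and r as channels), and the mamba and conda commands are left as comments because they need those tools installed and a network connection.

```shell
# Write a minimal, unpinned environment file.
# Assumption for this sketch: only the conda-forge channel is listed.
cat > environment.yml <<'EOF'
name: drought
channels:
  - conda-forge
dependencies:
  - r-base
  - r-tidyverse
EOF

# Show what was written
cat environment.yml

# With mamba installed, the environment would then be built and checked with:
#   mamba env create -f environment.yml
#   conda env list
```

Leaving the versions off at first matches the approach in the episode: build once, read the versions that were resolved, then pin them in the file.
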
Great, so it's created the environment for me called drought, and I can do conda env list. I see that we've got this drought environment that I can activate, as it says up above, with conda activate drought. If I do conda env list again, I now see that drought has the star, indicating that the environment is activated. I can also do echo $PATH, and it tells me that the first thing in the path, the first place that it's going to look for programs, is my drought environment.

If I go back up further I can see what exactly was installed. These are alphabetized, so I'm looking for r-base, and let's see, r-base is right here, and it's actually using 4.1.3. So it's using a slightly older version than what I've already got installed on my computer; I currently have 4.2.1, I think. The difference between 4.1.3 and 4.2.1 is very small, but one of the differences has to do with the base R pipe and the ability to put the data flowing through the pipeline into a specified position. I'm not going to be using the base R pipe, so I'm not really concerned about this difference in versions. But again, the value of using something like conda is that I can specify the versions I want. So for r-base I can put up here =4.1.3, and if I scroll back down to r-tidyverse, that's 1.3.2, so I'll put =1.3.2 and save that.

For the time being I'm going to remove the environment and then recreate it with those versions specified, just to make sure everything works. I can do mamba env remove --name drought, but I'm in the drought environment, and I can't remove an environment I'm in. To get out of the environment I need to do conda deactivate, and now I can do the mamba env remove to remove that drought environment. It's removing all those packages. If I then do conda env list, I now see that that
drought environment is gone. But again, I can go back through my previous commands looking for mamba env create -f environment.yml, and this will rebuild a drought environment with those two versions of r-base and r-tidyverse. Then I can do conda activate drought, and if I do conda env list (I can't spell that for some reason) I now see I'm in drought. If I do R, I see that I'm using R version 4.1.3, and this is from back in March rather than the June version, so I know that it's using the version I wanted. I can also do .libPaths() to get the paths that it's looking for libraries in, and it's looking for all of my R libraries in my miniconda drought environment, under lib/R/library. Then I can do library(tidyverse), and I now have those tools like ggplot2, purrr, dplyr, tibble, tidyr, readr, all those great tools, at my disposal for this project. Cool. I am going to quit out of R now because I'm not going to do anything in R today, and I can again do conda deactivate. If I do conda env list I see that I've got my drought environment here. It's not activated, but when I come back to this next time I will be able to activate it.

Very good. If I look in my directory I see that I've got my LICENSE, my README, and my environment.yml. I'd like to create a few other directories to help me organize this project. I'll make a code directory, a data directory, and a visuals directory, and you'll see now up in my Explorer tab I've got code, data, and visuals. I need to put something in them, because if I do git status I see the only thing not tracked is environment.yml. Git doesn't care about code, data, or visuals because there's nothing in there to track. So what I'm going to do is use the touch command to touch a README file in those three different
directories so that I can commit those three README files, even if they're blank, which will help me preserve the directory structure. So I can do touch code/README.md data/README.md visuals/README.md, and now if I do git status I see I've got code, data, environment.yml, and visuals all waiting to be added and committed to my repository.

I'm going to do this in two steps. First I'll add the environment file, so I'll do git add environment.yml and git commit, and I'll say create initial drought environment. I misspelled that, so I can do git commit --amend and, let's see, environment, okay, save that and quit out of there. I just used nano to edit the commit message. No doubt there are far easier tools to do that with in Visual Studio Code, and maybe we'll get to that in another episode, but for now I want to get everything committed and moved along so we can get to the next episode. Now if I do git status I see that the environment file has been committed, and I've got data, code, and visuals that need to be committed as well. So I'm going to do git add period. git add period will stage everything in those three directories. That's often kind of dangerous, so be careful. I'll do git add period, and if I do git status I now see that I've added those three files. I'm actually going to add the data directory to my .gitignore file, because I don't want to save the big files that are going to be in the data directory to my repository. Okay, so let's commit these: we can do git commit and I'll say add project organization. Now I'm going to modify my .gitignore file. To do that I'll come up to the top here, type data/, and save that, so now we're going to ignore anything that happens in that data directory. Again, if I do git status I
see that the .gitignore file has been modified, so now I can do git add .gitignore and git commit, and I'll say ignore data files. All right. If I were to say touch data/test.txt, I can then do ls data and I see that I've got test.txt, but you'll notice my main here is green, and if I do git status I see I don't have anything that needs to be committed, even though I've got test.txt. That's because I modified the .gitignore file. In the next episode we're going to be downloading a really large file that I don't want to have committed to my repository and pushed up to GitHub, because GitHub kind of limits the size of your repositories and I don't want anything too gnarly and large pushed up there. So I will remove data/test.txt, and I will then do a git push to push everything up to GitHub.

Now, if you want to learn more about how we can download files in a reproducible, automated way, be sure to check out the next episode, which I'll show a link to over here on my right. Be sure to check out that next video as we work through this project to create a global map of a drought index. Tell your friends about what we're doing here on this cool new project, and I'll see you next time for another episode of Code Club.
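
As a recap, the directory scaffolding and the ignore rule from this episode can be sketched as a short shell session. This is only an illustration under a few assumptions: it runs in a throwaway repository created with mktemp rather than in the real project, the commit identity is a placeholder, and the .gitignore starts empty here instead of coming from GitHub's R template.

```shell
set -e

# Work in a throwaway repository so the sketch never touches a real project
demo=$(mktemp -d)
cd "$demo"
git init -q

# Empty README files give Git something to track in each directory
mkdir code data visuals
touch code/README.md data/README.md visuals/README.md
git add .
# The name and email below are placeholders for this demo only
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q -m "Add project organization"

# Ignore everything that lands in data/ from now on
echo "data/" >> .gitignore

# A new file in data/ is ignored...
touch data/test.txt
git check-ignore -q data/test.txt && echo "data/test.txt is ignored"

# ...while the README committed before the rule was added stays tracked
git ls-files data
```

Because data/README.md was committed before the ignore rule was added, Git keeps tracking it; only new, untracked files under data/ are skipped.
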