Hey, folks, I'm in the midst of a series of episodes where I'm going to try to build a visual depicting the amount of drought across the globe. We're taking this in small steps, about 10 episodes or so. What I really want to highlight, in addition to our programming skills, is the ability to engage tools that will enhance the reproducibility of our project. Because ultimately, what I want to be able to do is rerun all my code every night to get a fresh download of the world precipitation data, and then compare what's happened over, say, the last month to that same period across the last 130 years.

So here in Visual Studio Code (I might accidentally call it Virtual Studio Code; I don't know why I do that, but it's Visual Studio Code), I have this driver.bash script that we ended off with in the last episode. You'll see that there are four commands being run here with two different bash scripts. The first, on lines 5, 9, and 12, is the same script, get_ghcnd_data.bash, and I'm giving it different files that I want to download. Then we have this line, code/get_ghcnd_all_files.bash, that basically lists out all of the names of the files in ghcnd_all.tar.gz, right?

And so you could imagine that if I, say, modified this file, where I'm listing out all of the files in this archive, I wouldn't want to have to rerun driver.bash, because the only thing that changed was line 6. I wouldn't want to have to redownload everything on line 5 for this gigantic archive that's about three gigs and took 10 or 15 minutes to download. I wouldn't need to get the inventory or the stations files either. So this puts us in a bit of a predicament, right? How do I make sure that I am running the right code, the right script? Because a couple of things might happen. First of all, I could modify get_ghcnd_all_files.bash, and so I'd want to rerun that.
Alternatively, it's also possible that another day has passed since I downloaded that large tar.gz file. I would download it again, and it would now be newer than the output of this get_ghcnd_all_files.bash script, right? So then I would want to make sure that I rerun that script. What we're running into is a problem of dependencies and keeping track of our dependencies. At this point, we have four lines of code. It's really not that hard to keep track of all of our dependencies. But we can begin to see how this might quickly get out of control.

And so what we're going to do in this episode is replace driver.bash with a special file called a Snakefile, which is run by Snakemake. Now, if you're a long-time viewer of this channel, you know that in the past I've used a tool called GNU Make, or just make, right? And I'm a big fan of make. But the people in my lab finally convinced me that make is from the 70s, hasn't changed a whole lot, and is kind of kludgy to use for data analysis. So they have convinced me to use Snakemake instead. I've been using it for a recent research project, and boy, do I really like it. I just like it a whole lot more than make. It's a lot more intuitive.

So in this project, we're going to use Snakemake just to get exposed to some of its tools and power. There's far more in Snakemake than what I will cover in this episode, so I'd strongly encourage you to go to the Snakemake website and check out all their great tutorials. It takes maybe an hour or two to go through the tutorials and get a sense of what's possible with Snakemake.

So I'm going to come over to the Explorer here in Visual Studio Code and click on the new file icon. Here I'll type Snakefile. This will create my new Snakefile. The Snakefile is a collection of different rules. And so we'll start by typing rule, and then the name of the rule.
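Before the first rule goes in, here is a rough sketch of the bookkeeping that Snakemake is about to take over: deciding whether an output is older than any of its dependencies. This is only an illustration in plain shell, not how Snakemake is implemented. The stale_demo directory, the needs_rebuild and report helpers, and the timestamps are all made up for the example; the file names are my reading of the names spoken in the episode.

```shell
# Scratch files with illustrative timestamps (not the real project files).
mkdir -p stale_demo/code stale_demo/data
touch -t 202209081043 stale_demo/code/get_ghcnd_all_files.bash
touch -t 202209081122 stale_demo/data/ghcnd_all.tar.gz
touch -t 202209081136 stale_demo/data/ghcnd_all_files.txt

# needs_rebuild TARGET DEP...
# Succeeds (exit 0) when TARGET is missing or any DEP is newer than it.
needs_rebuild() {
  target=$1
  shift
  [ -e "$target" ] || return 0
  for dep in "$@"; do
    if [ "$dep" -nt "$target" ]; then
      return 0
    fi
  done
  return 1
}

# report TARGET DEP...  prints what a driver script would have to decide.
report() {
  if needs_rebuild "$@"; then
    echo "rerun: $1 is out of date"
  else
    echo "skip: $1 is up to date"
  fi
}

listing=stale_demo/data/ghcnd_all_files.txt
script=stale_demo/code/get_ghcnd_all_files.bash
archive=stale_demo/data/ghcnd_all.tar.gz

# Both dependencies are older than the listing at this point.
report "$listing" "$script" "$archive"

# A fresh download makes the archive newer than the listing built from it.
touch -t 202209081140 "$archive"
report "$listing" "$script" "$archive"
```

With four commands this check is easy to write by hand. The point of moving to a Snakefile is that the same comparison gets made for every rule without anyone maintaining it.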
What I'm going to do is make four different rules here. The first is rule get_all_archive, followed by a colon, right? So now I have a rule called get_all_archive. This is going to have inputs, so I'll say input, and it will also have output, right? It's also going to have parameters, so we'll have params. And we will also have shell, which is what's going to be run at the shell.

The only thing that Snakemake needs to keep track of as an input is the name of my script. That is the only input, so I'll go ahead and put that here. I'm going to name it, so I'll say script equals that, and I'm going to put it in quotes. My output will be the name of this file, right? So ghcnd_all.tar.gz. I'll go ahead and copy that. This is going to be in the data directory, so I'll add data/ in front of it, and I'm going to put this in quotes. I don't need to name the output. I don't necessarily need to name the input either, right? You'll see where I'm going with this as I lay out the different inputs, parameters, and the shell command. I could say archive equals that. But let's leave it there for now and we'll come back.

Then my params are the parameters that I'm passing into my bash script, right? The only parameter is that ghcnd_all.tar.gz. I'll say file equals this. I'm again putting everything in quotes because it's being treated as text. A Snakefile is really a Python script. I know, like, no Python. So I can do this and you can too. Again, the amount of Python that you need is really minimal.

Then for the shell, what I like to do is use triple quotes, three quotation marks on each side, to define a chunk for running the shell command, right? What I can then do is use curly braces around input.script. So whatever input.script holds is going to get inserted in here when it runs.
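Written out, the rule being described looks roughly like the sketch below, shown here with the params entry already added to the shell command. The rule name, script path, and file names are my best reading of the spoken names, so treat the exact spellings as assumptions. The block writes the rule into a scratch directory (snakefile_sketch_one, a made-up name) so it doesn't touch a real project.

```shell
mkdir -p snakefile_sketch_one

# Quoting 'EOF' keeps the shell from touching the curly braces below.
cat > snakefile_sketch_one/Snakefile <<'EOF'
rule get_all_archive:
    input:
        # the script itself is a dependency: editing it should trigger a rerun
        script = "code/get_ghcnd_data.bash"
    output:
        "data/ghcnd_all.tar.gz"
    params:
        # the argument handed to the script; not tracked as a file
        file = "ghcnd_all.tar.gz"
    shell:
        """
        {input.script} {params.file}
        """
EOF

cat snakefile_sketch_one/Snakefile
```

When the rule runs, Snakemake substitutes the values in curly braces, so the command that reaches the shell would be code/get_ghcnd_data.bash ghcnd_all.tar.gz.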
And then I will also need to add the params, right? So I could say params.file. Here we can see that I don't really need the .script or the .file, right? I don't really need to name it file here or script here, because I could use plain input, since that's the only value within input. You'll also see that I never call on output.archive, so I'm going to remove that name, and I'll also remove that file equals from the params.

To run our Snakefile, we of course need Snakemake. So I'm going to go into my terminal, which here in Visual Studio Code I can do with Control-backtick. That's the key below Escape. It brings me to my terminal window, where I've got some bells and whistles. The one thing you might notice is that my main is in red. Main is the branch I'm in, and if I do git status, we'll see that our Snakefile has not yet been added to the repository. Red means danger, green means we're good to go. By the end of this episode, we will have our Snakefile added to the repository and we'll be ready to move on with everything else.

The first thing we need to worry about, though, is getting into our conda environment, which I call drought, and installing Snakemake. So we'll do conda activate drought, and then conda env list. I see now that drought has a star on its line, telling us that the environment has been activated. Now I want to install Snakemake. To install Snakemake, we'll also need to get Python, because, as I mentioned, a Snakefile is really a Python script of sorts, right? To install Snakemake, I'll do mamba install. Mamba is pretty analogous to conda, but it runs a lot faster, so for installations and creating environments I like to use mamba. We'll then do -c conda-forge -c bioconda. Bioconda is where Snakemake lives.
And conda-forge is a general-use channel where other dependencies for bioconda packages might live. Then I can put snakemake at the end, so the full command is mamba install -c conda-forge -c bioconda snakemake. Very good, it all installed. Along the way, it does prompt you for whether you really want to install everything that it needs as a dependency to load Snakemake. I said yes, so we're good to go.

I do need to update my environment file, however. Up here, after base, I'm going to put - bioconda. I also want to add my Snakemake dependency, so I'll add - snakemake. And I want to make sure I've got the right version, so let's scroll back up through all this output, and we'll see that snakemake here is 7.14.0. I'll copy that and say snakemake=7.14.0, and I'll save that. Before I'm done with this project, I'd probably remove the environment and then recreate it from this file to make sure that everything is using the versions that I said it was. But for the time being, installing via conda or mamba like I did here should be good enough. So I'll close that environment file. We now see that it's been modified over here in our Explorer. When we commit the changes to the Snakefile, we'll also commit the changes to our environment file.

Now we are ready to use Snakemake. We have a Snakefile with a single rule in it. What I could do is snakemake --dry-run get_all_archive. This does a dry run: it basically runs through my Snakefile and makes sure everything is formatted properly. It's not going to actually run anything, but it's going to make sure that everything looks good, and it's going to do that using the get_all_archive rule. Of course, I only have one rule at this point, but we'll leave that in there just to be explicit. So we see that there's actually an error when we ran the dry run.
So it's good that we did the dry run to make sure everything looks good. It's saying that the params object has no attribute file, right? Down here, I used params.file, but up here, I have params with no name, right? So I could add file equals that, or, what I think I'll do here, use params with nothing after it in the shell command. Now if we run the snakemake dry run again, everything goes through swimmingly, no problems. It tells me that there's basically one rule that would need to be run to get everything up to date. As an alternative to removing the .file here, I could say file equals all that. If I save that and redo my dry run, this also works just fine.

An alternative to --dry-run that I also like to use is -np. So I can again do snakemake, and I'll replace --dry-run with -np. The n is basically "don't run it," and p prints out the content of the shell statement. So we see in the output here the line of code that's going to get run at the bash terminal. Again, this takes about 12 minutes to run, so I'm going to hold off on running it for now.

I want to add another rule. We'll minimize the terminal a bit, and I'm going to create a rule called get_all_filenames. That's going to be the next line in my driver.bash, right? So that's this. I'm going to bring this down here, and we need to get this rule into the right format. For the input, I'll say script equals this, right? That's my script. I also have an input file that is a dependency, so I'll say archive equals. And again, if I come back, that's this ghcnd_all.tar.gz, and it's in the data directory, right? So that is a dependency, even though when we run the script, we don't pass it that file as an argument, right? Now we'll do output. The output of this, if we go to our all-files script here, is this data/ghcnd_all_files.txt, right?
And that needs to be in quotes. We don't have any parameters going into this specific bash script, so I can leave out the params directive. Again, input, output, params, and shell are what are called directives. I don't need params, but I do need shell. So I'll do shell, and then again I'll use the triple quotes to define the body of the code that I want to run. Having this header and footer of three quotes allows me to have multiple lines in a shell statement. If you had a single line, you could wrap it in a single pair of quotes. So I'll use curly braces around input.script. And yeah, that should do it, that should run it, because as we saw with our driver.bash, this was a single bash script, and we don't explicitly feed the archive to it. So that should be good.

Maybe what I'll do is show you what it would look like without the triple quotes, right? We'll remove all that and save it. Now if I want to run get_all_filenames, what I can do is snakemake -np get_all_filenames. And I get an error message, right? This is the most common error message that I get. If you've been using make, because I taught you how to use make a while back, you'll know that with make the most common error is using spaces instead of tabs. Well, the most common mistake that I have ever made with a Snakefile is forgetting commas, right? At line 15, it says there's a syntax error, perhaps you forgot a comma. Sure enough, I forgot a comma. Within a directive, so within input here, I've got two different things that I'm setting as input dependencies, right? I need to separate those with commas. Again, this is a pretty common mistake that I make, so it's important to put a comma between these two lines.
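For reference, the second rule with the comma in place might look like the sketch below. The rule name get_all_filenames and the paths are my renderings of the spoken names, and snakefile_sketch_two is a made-up scratch directory.

```shell
mkdir -p snakefile_sketch_two

# Quoting 'EOF' keeps the shell from touching the curly braces below.
cat > snakefile_sketch_two/Snakefile <<'EOF'
rule get_all_filenames:
    input:
        script = "code/get_ghcnd_all_files.bash",  # comma between entries
        # listed so the rule reruns when the archive changes, even though
        # the archive is never passed to the script as an argument
        archive = "data/ghcnd_all.tar.gz"
    output:
        "data/ghcnd_all_files.txt"
    shell:
        """
        {input.script}
        """
EOF

cat snakefile_sketch_two/Snakefile
```

Entries inside a directive behave like the arguments of a Python function call, which is why a missing comma shows up as a syntax error rather than as a missing file.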
The other thing I'll point out is that if you name one of the arguments within a directive, it's cleanest to name all of them, because any unnamed ones would have to come before the named ones. Okay, so we'll save that and try this again. Let me go down here into the terminal. You'll see that when we ran this, it needed to run get_all_archive and get_all_filenames, right? To run get_all_archive, it's going to run this line like we saw before. Then it's going to run get_ghcnd_all_files.bash to get the second file, right? So this is showing us the dependencies that need to be updated and fulfilled before moving on. One thing you'll notice is that when I ran snakemake -np with get_all_filenames, that's actually the second rule, right? Because the archive is a dependency, Snakemake knows it needs to get that dependency as well, and that is generated by get_all_archive. We can see in the output that it goes through and generates both of those files.

Now we'll add another rule. This one I'll grab for the inventory. So I'll say rule get_inventory, right? Then we'll do input, and our script will be this, right? Actually, that's not all script. This part is going to be a param, right? But that's our script. There's no other input, just the script. Then our output will be data/ and this file, right? Again, that needs to be in quotes. Then we have params, and our param is ghcnd-inventory.txt. Here I'll again put this as file equals all that, right? Then we can do shell. Again, I'll use the triple quotes, because that's my typical convention rather than putting things on a single line. I try to anticipate that these types of files and rules might evolve as I add more to them. Probably not, but I like to leave that opportunity open. So I'll use those three double quotes. Again, we can do input.script.
It then takes the params, so I'll do params.file and save that. Then we can again do snakemake -np get_inventory. That tells us that to run this, there is a single rule that needs to be run (the inventory doesn't depend on that big archive), and that this would be the code that it would run.

Now for our fourth rule. We'll go back to driver.bash, where we get our station metadata, right? Let's copy and paste that in there. We'll call it get_station_data. Again, we'll do input with our script, and we'll cut this up here. Then our output will be data/ and this ghcnd-stations.txt, right? We'll also need a params. We're starting to see that a lot of these rules are fairly analogous. We'll do file equals that ghcnd-stations.txt file. Then we can do shell, and again it's the same thing that we had up here, so I'll copy and paste this down and clean it up a bit. Great. Now we can double-check that it works by doing snakemake -np get_station_data. Again, we see that it runs without an error message, so we're in good shape.

Great. So now we've got these four rules, right? We know that get_all_filenames depends on get_all_archive. But if I wanted to get the inventory file and the station data file too, I would basically have to run this three times, right? I could do get_all_filenames, and that would also run get_all_archive. But it wouldn't get me the inventory file or the stations file. What I want to show you is this. If I come back to my terminal, I have been doing snakemake -np and then the name of the rule. If I did -np without the name of a rule, let's see what would happen. Well, we find that it's only running one rule, get_all_archive, which happens to be the very first rule of the Snakefile, right?
So what we could do is say rule targets. This is going to be a listing of the targets that I want Snakemake to generate, right? These are going to be my output files, so I'll put those in here, again separated by commas. I don't need to name them because I'm never going to refer to them explicitly. Right? So I'll grab the all-files listing as well, and my inventory. Then we'll also need our stations file here, right? All right, so put that comma in, and let's get this all lined up. Good.

Now, if I do snakemake -np, it will run targets because targets is the first rule in the Snakefile. And I'm getting an error at line 2: expecting rule keyword, comment or docstring inside a rule definition. I forgot to put input. In a rule, we need at least one directive. So I can do input, colon, right? These are the inputs to the targets rule. There's no output per se. Basically what it's going to do is say, we need to satisfy the targets rule, and the dependencies for targets are these four files. Now we need to go look for rules to generate those four files. And those are what's found everywhere else in the Snakefile.

Okay, so we'll save that and rerun snakemake -np. We now see at the very top here that there are five jobs in total: our four get rules, as well as the targets rule. As we scroll down, we see the different scripts that get run to satisfy those rules. To run it for real, I can do snakemake -c 1. The -c 1 means use one core, one processor. There are ways to parallelize what's going on here, but for now I'm going to stick with one to keep things nice and simple. I'll run that. This again tells me what jobs, what rules, need to be executed, and it's going to start going through all of those. Okay. This will take maybe 13 or 14 minutes to run.
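While that runs, here is a sketch of how the top of the Snakefile could look with the targets rule in place, followed by the two download rules that share a script. Rule names and file names are my renderings of what is said in the episode, so the spellings are assumptions. The other two rules are left out for brevity, and snakefile_sketch_three is a made-up scratch directory.

```shell
mkdir -p snakefile_sketch_three

# Quoting 'EOF' keeps the shell from touching the curly braces below.
cat > snakefile_sketch_three/Snakefile <<'EOF'
# First rule in the file, so it is what runs when no target is named.
# It has inputs only: asking for it means "make sure these four exist".
rule targets:
    input:
        "data/ghcnd_all.tar.gz",
        "data/ghcnd_all_files.txt",
        "data/ghcnd-inventory.txt",
        "data/ghcnd-stations.txt"

# (get_all_archive and get_all_filenames would sit here as well)

rule get_inventory:
    input:
        script = "code/get_ghcnd_data.bash"
    output:
        "data/ghcnd-inventory.txt"
    params:
        file = "ghcnd-inventory.txt"
    shell:
        """
        {input.script} {params.file}
        """

rule get_station_data:
    input:
        script = "code/get_ghcnd_data.bash"
    output:
        "data/ghcnd-stations.txt"
    params:
        file = "ghcnd-stations.txt"
    shell:
        """
        {input.script} {params.file}
        """
EOF

grep -c '^rule ' snakefile_sketch_three/Snakefile
```

With a file like this, snakemake -np previews every job needed for the four files, and snakemake -c 1 runs them on a single core.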
Then we'll come back and I'll show you some more cool things we can do with Snakemake.

So we see that it got through all five of the rules. We can go back through here and see how long it took to get each of those files, but it's all good. If I again run snakemake -np, it should now tell me that nothing needs to be done, that all of the requested files are present and up to date. If I do ls -lth on data, we see that all of the files were created after 11:22 on September 8. If I do ls -lth on code, we'll also see that both of these files were last saved prior to 11:22, right? So this get_ghcnd_all_files.bash was last modified at 10:43 today, September 8, as I'm recording this, which is older than all of those.

What I want to do is a small experiment using touch. So I could do touch data/ghcnd_all.tar.gz. What this will do is take this timestamp of 11:36 and make it something like 11:39 or 11:40. If I do ls -lth on data, I now see that yeah, it's newer than ghcnd_all_files.txt. So what would happen now if I do snakemake -np? Well, what it should do is run the rule to create ghcnd_all_files.txt. So we run snakemake -np, and we see that it needs to run get_all_filenames, which is what we predicted. Again, I'll do snakemake -c 1 to extract those names from that archive. By default it ran the targets rule, and it's saying that of the four files we have, this is the only one that needs to be updated. All right, so now if I do ls -lth on data, I see that the all-files listing is newer than the archive, right?

What if I had modified something in code? If I come into get_ghcnd_all_files.bash and add a comment in here, "extract file names from archive", I'll save that.
Now if I do ls -lth on code, I see that the script was modified at 11:43. With ls -lth on data, I see that ghcnd_all_files.txt is older, right? So now if I do snakemake -np, it tells me that a dependency of the get_all_filenames rule has been modified, and the rule needs to be run, right? Again, I can do snakemake -c 1 to use that one processor. It's going to run the rule to extract those names, and we'll be good to go. Just to prove it to ourselves, if we do ls -lth on code and on data, we see that 11:43 was when the bash script was modified, and that the text file was then generated at 11:44, so it's more recent. And again, if I do snakemake -np, it tells me that there's nothing to be done. All of the requested files are present and up to date.

The last thing I'd like to do with you is show you how we can visualize the DAG, the directed acyclic graph. Whenever we have run these commands, the first line of output says that it's building the DAG, right? So what is the DAG? Again, this is a directed acyclic graph, meaning that you can think of each of our dependencies as being nodes on a graph, on a pipeline. The connections between those nodes, between those files, are directed, right? We have input going to output. We can visualize that DAG using a program called dot, and we can get dot from a tool called Graphviz. So again, I'll do mamba install -c conda-forge graphviz. That installs Graphviz, and again we can scroll back up and look at the actual version. Graphviz is 5.0.1. I'm going to add that to my environment file, to my dependencies. So we'll add graphviz like that, and then I need to put an equals sign and the version, graphviz=5.0.1, and save that.

All right. Now what I can do is take snakemake and give it the argument --dag, and I'll give it targets as what I want to build the DAG of, meaning the rules and the dependencies required to build out all those targets.
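To give a feel for what gets handed to dot, the block below writes a hand-made graph for this workflow in Graphviz's DOT notation. It is my own approximation of the shape of the pipeline, based on the rules described in this episode, and not a copy of what Snakemake emits. The rule names are my renderings of the spoken names, and dag_sketch is a made-up scratch directory.

```shell
mkdir -p dag_sketch

cat > dag_sketch/workflow.dot <<'EOF'
digraph workflow {
    // One node per rule. An edge points from the rule that produces a
    // file to the rule that lists that file as an input.
    get_all_archive -> get_all_filenames;
    get_all_archive -> targets;
    get_all_filenames -> targets;
    get_inventory -> targets;
    get_station_data -> targets;
}
EOF

# With Graphviz installed, this file could be rendered the same way:
#   dot -Tpng dag_sketch/workflow.dot > dag_sketch/workflow.png
cat dag_sketch/workflow.dot
```

Reading the edges forward answers the question "if this changes, what has to rerun?": a new archive invalidates the file listing, while the inventory and stations downloads are untouched.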
Then I'll pipe that to a program called dot. Dot will take the graph notation and render it as a PNG if I give it -T and then png for the format. Then I need to redirect this output to a file called dag.png. So the whole command is snakemake --dag targets | dot -Tpng > dag.png.

I can now see how these different files depend on each other. We've got get_all_archive, the rule to get the archive, and that is fed into get_all_filenames. Then that, along with get_inventory and get_station_data, feeds into targets. We have a very simple DAG at this point, but it all feeds into targets. So we can see that if I were to modify something in get_all_archive, Snakemake would need to rebuild that as well as get_all_filenames, and then targets, right? It's nice having this PNG file to help visualize what's going on in Snakemake. Of course, as the workflow gets more complicated, the graph gets a lot more complicated and can be pretty hairy. For the time being, I'm going to remove the dag.png file because it's not really critical to the progress of this story. But know that it's a useful tool for visualizing what's going on. You can give it any target you want; you don't have to give it the targets rule. You can also give it file names, if that's interesting to you.

So we have some stuff to clean up. I'm noticing that when I ran Snakemake, it created a .snakemake directory. This keeps track of a lot of information about running Snakemake: log files, metadata, all sorts of other stuff that I don't want to worry about. I also don't want it in my repository, because it can get really big. It's also unique to my computer, so it serves you no use to have my copy on your computer. So I'm going to add .snakemake to .gitignore. I'll come down here and create a new section for Snakemake, and I'll say .snakemake and save that.
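The same edit can be made from the terminal. The sketch below appends the entry only when it isn't already there, so running it twice doesn't leave duplicate lines. It works in a made-up scratch directory (gitignore_sketch) so it won't modify a real repository, and the "# Snakemake" section comment is just one way to label the entry.

```shell
mkdir -p gitignore_sketch
ignore_file=gitignore_sketch/.gitignore
touch "$ignore_file"

# -x matches whole lines, -F treats the pattern as literal text, so the
# dot in .snakemake is not read as a regular-expression wildcard.
if ! grep -qxF '.snakemake' "$ignore_file"; then
  printf '%s\n' '# Snakemake' '.snakemake' >> "$ignore_file"
fi

cat "$ignore_file"
```

Inside an actual repository, git check-ignore -v .snakemake reports which .gitignore line is responsible for ignoring the directory.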
Now that I've saved that, you'll notice that .snakemake is grayed out, which is the convention to indicate that git is not tracking that directory. So I'll close that. If we look at our git status, we see we've modified a fair number of things. I'll do git add on the Snakefile, .gitignore, code/get_ghcnd_all_files.bash, right, and then my environment YAML file. So those are all staged. I can now do git commit -m, and I'll say "initialize Snakemake workflow".

And I realize I still have my driver.bash file in here. So I'm going to remove that and then amend my commit. I'll do git rm on driver.bash. If I do git status, I see that it's been deleted. I can do git commit --amend, and I'll save that and quit out. Now git status is clean, and I no longer have that driver file. The function of the driver file is now going to be carried out by our Snakefile, right? That will help us keep track of our dependencies as we work through the rest of the episodes in this project.

In the next episode, I am going to show you how we can add an R script to our Snakefile. We can make an executable script that is actually run in R rather than bash like we saw in today's episode. So that you don't miss that, make sure that you have subscribed to the channel and you've clicked the bell icon for notifications. I've also put a link over here to that video, so you can click on that when you're done with this one.