Hey, folks! Many of you will remember that before I started talking about how to visualize climate change data, I spent a fair amount of time talking about the ecological and statistical problem of rarefaction: whether or not you should be rarefying your data, and what the alternatives to rarefaction are. If you haven't seen that series, I'll put a link up here in the video so you can go check it out, but wait until you finish this episode before you do that. Anyway, I have taken that analysis, which I did with one dataset in a kind of demonstration format here on YouTube, and I've been trying to formalize it and turn it into an actual paper.
Well, because there's been a span of time between when I made those videos and today, some of the packages that I use have changed. vegan is one of them: some of the arguments, some of the interfaces, and some of the output changed between the version I used back at the beginning of the year and the new version they released in the meantime. Also, the analysis for the paper I'm writing runs on a high-performance computing cluster, and the cluster had a version of breakaway that was actually older than the version I had on my local computer. So I hope you're getting the idea that the versions of these packages can matter. A new version can change the interface, meaning the arguments that we give to the functions. It can change the functions themselves, adding some or removing some. It can change the algorithms under the hood and how those functions actually work. And it can, of course, change the output of those functions. If any of that changes, it can cause our code to break. I know that I am not great about documenting the versions of the software that I'm using.
I might be doing an analysis on the computer here on my desk using one version, say R version 4.1, while up on the cluster it might be R version 4.2, and I might have different versions of ggplot2 in the two places. Most of the time, this doesn't really matter. But if we scan forward months or years, it certainly can. So I'm going to spend today's episode talking to you about a cool package called renv that keeps track of your R package versions so that you always have the correct versions of the packages for your project, in your project. It also lets you share your project with somebody else, and they will then have the same versions of the packages that you're using. This is really good for improving reproducibility, because it keeps track of exactly which packages you're using in your R environment.
For today's episode, I'm going to be building off the project that we've been developing for visualizing changes in climate data. If you want to get a copy of this project, go down below in the description to a link to a blog post that will get you everything you need. I don't know that you necessarily need a copy of my version of the project to follow along, but if you want one, that would certainly be awesome.
All right, so we are in my session of RStudio. You'll know that I'm in the climate viz directory for that project because up here in the upper left corner of the console it has the path to my climate viz directory. I also have a second RStudio session running at the same time that is in my home directory. I'm going to use these two instances of RStudio to illustrate how renv works and how we can use it for a project. So first things first, let's go ahead and install renv. I'll go up here to Install, and then I'll type renv to install the renv package. Wonderful.
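If you'd rather do that install step from the console than from the RStudio Install button, a minimal sketch might look like the following. It assumes an internet connection, and the CRAN mirror URL is my own choice for the example, not something shown in the video.

```r
# Install renv once per R installation if it is not already available.
# Assumption: the cloud CRAN mirror is reachable; any mirror would do.
if (!requireNamespace("renv", quietly = TRUE)) {
  install.packages("renv", repos = "https://cloud.r-project.org")
}

# Confirm which version of renv ended up installed.
renv_version <- packageVersion("renv")
print(renv_version)
```

The requireNamespace() check just avoids reinstalling renv every time the script is run.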
So that's installed. One thing I could do would be to type library(renv). Alternatively, if you don't want to run that library command for some reason, you can type renv, colon colon, and then the name of the function that you want to run. The first function that I'm going to run from renv is init, which initializes this project. Again, I'm in my climate change project. So we'll go ahead and run that. Running renv::init() took a while. I'm going to come right back to the top and do a little bit of a dissection of what happened when I ran it.
The renv::init() function starts by initializing the project and discovering the package dependencies. It basically looks through all of the R scripts in my project directory tree. I'm in my project root directory, this climate viz directory I've got here, so it looks at all the files in that directory, as well as the files in all of its subdirectories, for any place where I called library. It uses those calls to identify each package that needs to be installed, along with all of its dependencies. So when I call library(tidyverse), renv::init() says, ah, we need to install the tidyverse packages.
Okay, the next thing it does is copy packages from the cache. It identified 27 packages that are dependencies of this project and that I already have installed on my computer. Basically, what we have on our computer is a library directory, typically off of your home directory, where all of our packages get installed. What renv does is create a library of packages within our project. We'll see that here in a moment. But instead of re-downloading and installing all those packages that we already have on our computer, it's basically copying those over into our project library.
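To get a feel for that discovery step without touching a real project, here is a small sketch that uses renv::dependencies(), the exported renv function for scanning files for package references. I'm assuming it reflects the same kind of scan that renv::init() performs. The directory and script names are made up for the demonstration, and renv needs to be installed.

```r
# Build a throwaway project containing one script (hypothetical file name).
demo_dir <- file.path(tempdir(), "renv_scan_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(
  c("library(tidyverse)", "x <- dplyr::tibble(a = 1)"),
  file.path(demo_dir, "plot_data.R")
)

# Scan the directory for packages referenced by the scripts in it.
# The script is only parsed, never run, so tidyverse need not be installed.
deps <- renv::dependencies(path = demo_dir, quiet = TRUE, progress = FALSE)

# One row per reference found; the Package column holds the package names.
print(unique(deps$Package))

# In a real project, the initialization itself is just:
# renv::init()
```

Note that the scan picks up the dplyr:: prefix as well as the library() call, so both ways of using a package get recorded.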
Next, it goes through and identifies the packages that I didn't have, or perhaps had older versions of, and installs newer versions of those or gets them from scratch. I had a handful of packages in here that were a bit out of date, so it goes through and installs or updates all of those for me. Then it says at the end, the following required packages are not installed: lattice and mgcv. Consider reinstalling these packages before snapshotting the lock file. So I'll go ahead and install those. What we can do is renv::install("mgcv"). That installed. And then we can also do lattice, as it suggested. So now we've got everything installed and we've resolved all those problems.
What we see from the renv::init() output is that it wrote to what's called a lock file. This is in my climate viz directory, my project directory, and there's a file now called renv.lock. If you look down here in my files and open renv.lock, we see all of the different packages that are required for this project. Many of these, like blob, I've never heard of. That's getting installed because it's a dependency of something else, so I don't need to worry about it. And again, you can come down through here, and there are hundreds of lines of dependencies specifying the package name, the version, where it got it from, and some information like the hash, so that when it downloads this package again in the future it knows it got the right version, because it will have the same hash identifier. Anyway, I'm not going to go through all of that, because that's 1343 lines of stuff that I don't need to bore you with. Okay, so let's go ahead and close that and look at some of the other things that were created.
We can see in our git staging area that a variety of files were created. As I already mentioned, there's renv.lock. There's also a .Rprofile file and the renv directory. The .Rprofile file is run when R is started, so the first thing that R does is run the activate.R script from the renv directory. If you go into renv, we see that there are a variety of different files in here, including activate.R, which has a whole bunch of code for making sure you have loaded the right packages and the right versions. You'll also see library and staging. If you look at the .gitignore file in here, it tells git to ignore library, local, cellar, lock, python, and staging. So it's ignoring the library and the staging directories. The library is where we have our R packages, and here you can see all the different R packages that go with this project. Again, this is a library directory that's off of my climate viz directory. The .gitignore is nice because I don't want to be committing all those packages up to GitHub.
So I'll go ahead and close this .Rprofile file and take a snapshot of my project with renv::snapshot(). The output tells me that three packages are going to be updated: Matrix, lattice, and mgcv. You saw me just install lattice and mgcv with renv::install(), and Matrix must have been a dependency of those. Do you want to proceed? Yes. And so now it's updated the lock file. One thing to note is that the snapshot basically gives a save, if you will, of the current packages in your project. So I could do renv::restore(), and that will take all my packages back to the previous snapshot.
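Pulled together, the install, snapshot, and restore steps from the last few minutes could be sketched like this. These calls rewrite the project library and renv.lock, so as a precaution of my own (not something done in the video) the sketch only runs them in an interactive session, and it assumes you are inside a project that renv::init() has already set up.

```r
# Packages that renv::init() reported as missing in the video.
missing_pkgs <- c("lattice", "mgcv")

# Guard: only touch the project from an interactive session with renv available.
if (interactive() && requireNamespace("renv", quietly = TRUE)) {
  renv::install(missing_pkgs)  # install into the project library
  renv::status()               # report differences between library and lock file
  renv::snapshot()             # record the current library in renv.lock

  # Later, to return the library to what renv.lock records:
  renv::restore()
}
```

Calling renv::status() before a snapshot is optional, but it gives a preview of what the snapshot is about to write.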
And so the snapshot is a way to take a time capsule, if you will, of the way your project looks. I told you that this creates a library path within your project. The .libPaths() function gives you a listing of the different directories where R looks for packages. What you'll see is that the first place it looks is in my climate viz project. The second place it looks is on my computer. This is the system library, so packages that are available to everyone using my computer. Because it puts the project library first, R is going to look there first for, say, ggplot2 or dplyr, or any package that I'm using in my project. So it's basically putting that before everything else.
If I come back to this other session of R, which again is in my home directory and not under the control of renv, and run .libPaths(), I see I've got three paths. The first is off of my home directory. /Users/pschloss is my home directory, which is this tilde forward slash. Then I've got the system-wide, computer-wide library. And the third is a library that the University of Michigan tacks on as well. What you can see is that when I'm outside of my project and not under the control or watch of renv, it's going to grab my packages from my home directory library. Whereas if I'm in my climate viz project, under the watchful eye of renv, it's going to grab all those packages from the library that's within climate viz. So that's really cool, right? For every project I have on my computer, I can have a separate library that keeps track of the versions of all the packages I'm using in that project. I might be using three different versions of ggplot2 across all those different projects. You might be thinking, well, wouldn't you always want the latest, greatest version of ggplot2? Yeah, well, maybe.
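If you want to compare the two sessions yourself, a small base R sketch like this one can be run in each. It lists the library paths in search order and counts the installed packages in each library. The variable names are mine, and the paths you see will differ from the ones in the video.

```r
# List the directories R searches for packages, in order.
paths <- .libPaths()
print(paths)

# The first entry is where R looks first when loading a package and
# where newly installed packages go by default.
first_library <- paths[1]

# Count how many packages are installed in each library.
pkgs_per_library <- vapply(
  paths,
  function(p) nrow(installed.packages(lib.loc = p)),
  integer(1)
)
print(pkgs_per_library)
```

Inside an renv project, I'd expect the first path to point into the project's renv directory. Outside of one, it is typically a user or system library.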
But sometimes there are only small incremental changes when they release a package, so I don't really feel the urgency to update it and then worry about what else might change when I run my scripts. To help illustrate how these paths work and how they differ between the two instances of R that I have going, I can do find.package("ggplot2"). I see that when I'm in my climate viz directory, it's finding ggplot2 within that project library. If, on the other hand, I come over here to my home directory session and do find.package("ggplot2"), what I find is that the version I'm using outside of the climate viz project is actually coming from that workspace library, which was the third path in the output of .libPaths(). Okay, so hopefully this illustrates how you can tell where R is grabbing a package from. If I were to do find.package("dplyr"), it's probably the same place. Yeah, it's the same place. Again, if I did it in my climate viz session, I'd get dplyr here... and I spelled it wrong. There, it finds it in my project library.
Let's go ahead and install a couple of other packages. You've seen me do this with renv::install(), where I put in mgcv. Well, I want to install the wesanderson package, which is a series of color palettes inspired by Wes Anderson's movies. What you can do instead is install.packages("wesanderson"). The renv environment is smart enough to know that install.packages() is the same as renv::install(). Again, we see very similar output to what we saw when we did renv::install("mgcv"): it's downloading and installing wesanderson and moving it into the cache. And so now I can do find.package("wesanderson") and see that it is in my project library.
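Because find.package() throws an error when a package isn't installed, a small wrapper can make the comparison between sessions a little smoother. This is only a sketch; where_is() is a helper name I made up, not part of R or renv.

```r
# Return the library a package would be loaded from, or NA if it is
# not installed in any library on the current search path.
# where_is() is a hypothetical helper name.
where_is <- function(pkg) {
  path <- find.package(pkg, quiet = TRUE)
  if (length(path) == 0) NA_character_ else dirname(path[1])
}

where_is("stats")        # a base package, so always found
where_is("ggplot2")      # project library inside an renv project
where_is("wesanderson")  # NA wherever the package is not installed
```

Running the same three lines in the project session and in the home directory session shows which library each session resolves a package from.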
If I come to my home directory session and do find.package("wesanderson"), it says there's no package called wesanderson. So wesanderson as a package only exists within my project. It doesn't exist out here in the rest of R on my computer. That again highlights that we can install packages, and versions of packages, only within the project that we're working on.
So I've installed that wesanderson package into my project. But if I'm in my renv.lock file and search for wesanderson, I don't see anything there. That's because I haven't taken a snapshot of the project yet. So again, I need to do renv::snapshot(). This then writes to the lock file. I'll go ahead and close the renv.lock file, reopen it, and do another search for wesanderson. And nothing is there. That is most likely because I haven't actually used wesanderson anywhere in my project. So I will create a new dummy script in my code directory and save it as dummy Wes Anderson. In here, I'll do library(wesanderson) and save that. Then I'll go ahead and source that dummy script from the code directory. It runs and everything is good. Now let me do another snapshot. Now it's saying the following packages will be updated in the lock file: wesanderson, from nothing to 0.3.6. Do you want to proceed? Yes, I'll do that. It's now been written. If I close the renv.lock file, reopen it, and do a search now for wesanderson, there we go. It's been added to my renv.lock file.
So now I need to be sure to commit all these changes I've been making, especially after I take those snapshots and the renv.lock file gets updated. I need to commit that change so that whenever somebody gets ahold of this in the future, including myself, we have the correct version of those files. So I can now go ahead and commit.
For my commit message, I will say, initialize renv control of project. We'll commit that, I'll close that, and then I will push it up to GitHub. Now on GitHub, I can see that I've got the renv files and directories all pushed up to the repository.
What I want to do now is simulate what would happen if you were to get a copy of this project. So I'm going to copy this link to the repository and create a new project. We'll do that from version control, and it's git. We'll put the link there. I'm going to call this climate viz test, and I'm going to put it in my home directory. Then we'll click Create Project. So this is my new climate viz test project. As you can see, there's some new output here when R started. It says bootstrapping renv 0.15.5. That's the version of renv I'm using. It downloaded and installed that, loaded it, and then said the project library is out of sync with the lock file, use renv::restore() to install the packages recorded in the lock file. So I'll go ahead and do that. This now says that we're going to install all these packages. Do you want to do that? I will say yes. This went very quickly because, of course, I already have these installed on my computer. They're stored in the cache, where renv can look for packages, and it just went ahead and copied all that stuff over. So again, if this were you using this version of climate viz, you would be good to go with the same versions of the packages that I've been using. More immediately, for my own use, I can develop on my laptop and I can also develop on my high-performance computing cluster. I can push things back and forth with version control, using the renv.lock file to keep track of the versions I'm using.
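Since renv.lock is a JSON file, you can also check what version it records for a package without scrolling through it by hand. Here is a rough sketch that assumes the jsonlite package is installed. locked_version() is a helper name I made up, and the tiny stand-in lock file only exists to keep the example self-contained. A real renv.lock has many more entries and fields.

```r
library(jsonlite)

# Look up the version of a package recorded in an renv lock file.
# locked_version() is a hypothetical helper, not an renv function.
locked_version <- function(pkg, lockfile = "renv.lock") {
  lock <- fromJSON(lockfile)
  entry <- lock$Packages[[pkg]]
  if (is.null(entry)) NA_character_ else entry$Version
}

# Stand-in lock file for illustration only; a real renv.lock also has
# an "R" section and more fields (such as Hash) for every package.
demo_lock <- tempfile(fileext = ".lock")
writeLines(
  '{"Packages": {"wesanderson": {"Package": "wesanderson", "Version": "0.3.6", "Source": "Repository", "Repository": "CRAN"}}}',
  demo_lock
)

locked_version("wesanderson", demo_lock)  # "0.3.6"
locked_version("ggplot2", demo_lock)      # NA, not in the stand-in file
```

From the root of a real project you would leave off the second argument, for example locked_version("wesanderson"), and could compare the result on your laptop and on the cluster.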
If I want, I can go ahead and update the version of a package I'm using, or I can downgrade it to an older version. Again, renv is really nice as a way to keep track of the versions of the packages that you're using for your project, and to maintain that consistency and reproducibility at the software level. So I see renv as a great gain, because not only can we keep track of our code and keep it reproducible, but we can heighten the reproducibility of that code by also linking it to the actual package versions that R is using. It's one thing to say that I'm using dplyr version whatever. It's another thing to have renv enforce that. I write the version down, effectively, in that lock file, and then R gets that actual version of the package. That enforcement is really nice, and it means I don't have to worry about what versions I'm using.
One place where this does fall down, however, is on things outside of R. A lot of R packages depend on things outside of R. One example that's often used is R Markdown documents, which depend on Pandoc. Pandoc is not part of R, so renv doesn't install Pandoc. That's a challenge. The other thing it doesn't really manage is the version of R itself. While the lock file says what version of R I'm using, renv doesn't actually install whatever version of R I had been using. So if you're depending on other software outside of R, outside of the R package environment, you're going to have to come up with another solution for that, or just manually keep track of those versions.
Please make sure that you subscribe to the channel, because in the next episode I'm going to show you yet another approach to keeping track of different versions of your software. This is going to use a tool called conda, and its companion mamba. They work together.
But that will allow us to keep track of all the software across our computing environment, not just our R packages. Again, I think renv is really powerful, but it does have these weaknesses that I think conda really helps to solve. That being said, conda isn't perfect, and it has a little bit more friction in getting going and maintaining. So I hope that you join me for that next episode. Go ahead and try to use renv on your own projects. And we'll see you next time.