Hey folks, welcome back for another episode of Code Club. I'm taking a pause on developing new data visualizations from climate change data to spend a little bit of time talking about principles of reproducibility. We've spent a lot of time talking about writing reproducible code, right? We make these R scripts, and I've talked about putting them up on GitHub. As I said in the last episode, that's great, but it's only part of the story. Another important part of the story is keeping track of the software we're using. In the last episode, we talked about a package called renv, which keeps track of our packages for us. One of the challenges with renv as a tool is that it only keeps track of our packages. It doesn't actually keep track of R itself. It won't keep track of, say, what mothur version you're using, or what other tools you might be using to run your project and write your paper. The example I used was that when you create an R Markdown document, it gets rendered into a Word document, HTML, or PDF using a tool called Pandoc. Well, renv can't keep track of Pandoc. So what are we to do to keep track of things like our R version, our Pandoc version, and perhaps also our R packages?

Well, in today's episode, I'm going to share with you another strategy, which is to use a set of tools from conda. Conda is a package manager. Initially, I think, it was developed for Python, but like all good things, it's no longer just for Python; it also incorporates things for R and all sorts of other packages. The mothur software package that my lab develops is written in C++, and you can get the latest version of it from conda to run on your computer, which makes the installation super easy because conda keeps track of all the dependencies you'll need. Before we jump into installing and using conda, I'm going to create a branch based on the state of the repository before I brought in renv.
As we've seen in previous episodes, I can go to my history on GitHub and find the last commit before I initialized the renv project. I'll click on these angle brackets, and now I'm browsing the repository at that state. You can see there's nothing in here about renv, right? So now I can create a branch from this position. I'll call it conda, so I'm going to create branch conda from this commit. Great. Now I have the conda branch as well as the main branch. Coming over to RStudio, I can go to my Git tab and do a pull. That brings down the new conda branch for me. Then, instead of main, I want conda, and so it has switched to the new branch conda.

I restarted RStudio because when I initially started it for today's episode, it had renv, and so we had a little bit of dialogue down here in the console. I restarted it because I no longer have renv in my directory, and I no longer have that .Rprofile file that we had before. Now that we've restarted R, that goes away. Of course, I still have this renv directory in here, because in renv there are two directories, library and staging, that the .gitignore file was ignoring. I want to ignore renv on this branch, so I'll go to my .gitignore and add renv there. I'll save that and close .gitignore. Go ahead and refresh, and now I see that I've got a modified version of .gitignore. I'll stage this and commit it with the message "Ignore the renv directory from that branch". Okay, so we'll commit that and close. Now everything is cleaned up and we're ready to move on and use conda. Because in conda I'm going to install R as well as a variety of different packages, that's going to make things a little bit funky here in RStudio, right?
I could try to install conda and all the different stuff here in RStudio using the terminal window, but I think that's just going to get a little too messy. So I'm going to quit out of RStudio and do everything in a terminal. I'm going to start at the conda website; docs.conda.io will bring you here. I want to go to Miniconda. There's Anaconda and there's Miniconda. With Anaconda, as I understand it, you basically download everything, even stuff you might not need. With Miniconda, you get the minimum and then install stuff as you need it. I've always used Miniconda, and that's what we're going to use here.

Looking down through this page, we see a variety of links for installing the version that you need for your computer. I'm doing this on a Mac. Basically, everything I'm saying here for a Mac will also work well for Linux. If you're working on Windows, things are going to be a bit different, if only because not all software is developed for Windows. All the bioinformatics software we use is, at a minimum, going to be supported on Linux, and often also on a Mac. So the safest approach is to do all this in Linux. I've got a Mac, so I'm going to do it on a Mac. But if you're on Windows and would like to see how this might work on a Windows computer, let me know and maybe we can figure something out.

I'm going to start with the Miniconda3 macOS Intel pkg version. This is a GUI, a graphical interface. I'll download it and double click on it, and this gives me the dialogue that steps me through everything. You could do the same thing at the command line; I'm just doing this with the GUI to show you all what it looks like. For the command line version, you would download a version that ends in .sh.
And then at the command line, you would type bash followed by the name of that .sh file. That's often going to be useful on a Linux computer, because you're not going to have ready access to a graphical interface like I do here on my Mac. All right, so this went through without a hitch. I'll click Close there, and I can then delete the installation package.

Now if I do conda --version, very good: it says conda 4.12. I know there's actually a 4.13 out there, so I'm going to update that now. To do that, we'll do conda update -n base -c defaults conda. What this means is that I want to update conda, the last word here. The -n base is the name of the environment that I want to update it in. When we start a new terminal, it automatically puts us into the base conda environment, and so we're going to modify the base environment. The -c defaults means get conda from the defaults channel. Okay, so there's stuff that can be updated, and I think it's going to go up to 4.13, so we'll say yes. Again, we can do conda --version, and we see we're at 4.13. Excellent.

Now we want to install Mamba. Mamba is a tool that does basically the same thing as conda; it's a rewritten version of conda. Mamba runs a lot faster, which is great for building environments and installing software. There is one special place where we'll still need to use conda, but Mamba is great, and so we need to install it. We'll do conda install -n base -c conda-forge mamba. The conda-forge part is a different channel: we used defaults to get conda, and we're going to use conda-forge to get Mamba and a few other tools. Again, it wants to tell me what's going on.
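The base-environment setup steps narrated above can be collected into a short shell sketch. The run helper and the DRY_RUN variable are hypothetical conveniences added here and are not part of conda: by default the script only prints each command, so it is safe to try on a machine without conda. The commands themselves are the ones spoken in the episode.

```shell
# Hypothetical dry-run helper: prints each command instead of running it.
# Set DRY_RUN=0 to actually execute (requires conda to be installed).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Check which conda version is installed.
run conda --version

# Update conda itself inside the base environment, from the defaults channel.
run conda update -n base -c defaults conda

# Install mamba into the base environment, from the conda-forge channel.
run conda install -n base -c conda-forge mamba

# Confirm that mamba is available.
run mamba --version
```

With DRY_RUN=0, conda will still prompt for confirmation before changing anything, just as it does in the episode.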
So I'll say yes. Now I should be able to do mamba --version, and I've got mamba 0.24 with conda 4.13. Great. So now we've got our base environment all set up. What I want to do next is create an environment for a project, which is this climate visualization project. To create that environment, we're going to use Mamba. We'll do mamba create -n viz, so we're going to create an environment called viz. Then there are three channels that we're going to use frequently. The first is -c conda-forge. Then we'll do -c bioconda; bioconda is a channel that's got a lot of bioinformatics software. And then we'll do -c r, which has a lot of R packages. We're going to install the R software itself, but we'll also get packages like, say, tidyverse or plotly from the r channel, whereas R itself, I think, comes from conda-forge.

Now what I want to give it are the names of the software I want. The R software that we want is r-base, and I'm going to say =4, which means give me that version of R. Four is kind of the base level, so maybe we'll do 4.1. Then we'll also add a package, so let's do the tidyverse. For all of the R packages, you get the conda name by writing r- and then the name of the package: r-tidyverse, r-plotly, and so forth. If we're not sure, we can always Google it. I'll search for "conda tidyverse", and this shows me r-tidyverse. I'll click on this first link, and it shows me that in the r channel the package is r-tidyverse. If I were to search for dplyr, I would see that there's r-dplyr. And if I were to search for mothur, the software my lab develops, you can see it's available under bioconda. Great.
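To make the shape of that create command concrete, here is a small sketch that assembles it from its parts and prints it without running it. The variable names (env_name, channels, packages, cmd) are illustrative only; the environment name, channels, and package names are the ones given in the episode.

```shell
# Pieces of the command as described in the episode.
env_name="viz"
channels="conda-forge bioconda r"     # order matters: earlier channels are listed first
packages="r-base=4.1 r-tidyverse"     # append =VERSION to a name to pin it

# Start with the subcommand and environment name, then add one -c per channel.
cmd="mamba create -n $env_name"
for ch in $channels; do
  cmd="$cmd -c $ch"
done

# Finish with the package specifications.
cmd="$cmd $packages"

# Print the assembled command; paste it into a terminal to run it for real.
echo "$cmd"
```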
So I'm going to do r-tidyverse, and I'm going to put in the version: 1.3.1, which is the current version of the tidyverse. It asks me to confirm the changes. There's a whole bunch of stuff that it's got to install to get the tidyverse. Again, the tidyverse is a meta package, a package of a bunch of packages, and all those packages have dependencies, so we're getting all that stuff. We'll say yes.

That took a few minutes to install everything. At the bottom here, it says that to activate this environment you use mamba activate viz, and to deactivate it, mamba deactivate. I've never done what they say. Typically, this is the only place where I use conda: I'll do conda activate viz. The way I was taught was that this is the only place you use conda. If you've done it with mamba, let me know what you find. I'm going to stick with what I've done before, because I know it works. So I'll do conda activate viz.

Now I can look at the environments that I have with mamba env list. I see I have two environments: the viz that I just made, as well as base. And I can see that I'm in viz because there's this star here. What I want to show you is how these environments differ. If I do echo $PATH, that shows me all of the directories that my operating system searches for the software I might run from the command line. It goes in order, and what you'll find is that the very first thing in here is my Miniconda environment viz. If I do which R, it shows me that the version of R I'm using is in Miniconda. And if I do R --version, I see that this is using 4.1.3. If I do conda deactivate and then mamba env list, I now see I'm in base, right?
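The point that the shell searches the directories in PATH in order, and that activating an environment puts its bin directory first, can be tried out without conda at all. The sketch below is a simulation under stated assumptions: it builds two throwaway directories, each holding a fake R script, and shows which one is found depending on what comes first in PATH. The sandbox layout and variable names are made up for illustration.

```shell
# Build two throwaway directories, each holding a fake "R" executable.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/system/bin" "$sandbox/envs/viz/bin"
printf '#!/bin/sh\necho "system R"\n' > "$sandbox/system/bin/R"
printf '#!/bin/sh\necho "viz R"\n' > "$sandbox/envs/viz/bin/R"
chmod +x "$sandbox/system/bin/R" "$sandbox/envs/viz/bin/R"

# "Deactivated" situation: only the system-style directory is on the front.
base_path="$sandbox/system/bin:$PATH"

# "Activated" situation: the environment's bin directory is prepended,
# which is the effect the episode observes after activating viz.
viz_path="$sandbox/envs/viz/bin:$base_path"

# Look up R under each PATH (in a subshell, so the real PATH is untouched).
which_base=$(PATH="$base_path"; command -v R)
which_viz=$(PATH="$viz_path"; command -v R)

echo "before activation: $which_base"
echo "after activation:  $which_viz"
```

The first match wins, so the same command name resolves to a different file once the environment's directory is at the front of the list.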
And if I again do echo $PATH, I see that the first element in the path is Miniconda, but it's the base bin directory. That's basically where conda and mamba are stored, whereas up here I had envs/viz/bin. So there's a little bit of a difference. If I then do which R, it's pulling that from /usr/local/bin/R. And with R --version (remove that space), I see I've got 4.2. So again, we're creating different environments that use different pieces of software. It's a lot like what we saw for renv in the last episode. The main difference, however, is that renv only works on R packages, whereas conda and mamba work with all software and all packages.

Let's go back to our viz environment. We'll do conda activate viz. Again, we do mamba env list, and we can see that we're in our viz environment. Then if I type R to go into R, I can see that. Then I can do find.package on tidyverse. What I find is that it's actually using the version of tidyverse that I have installed system-wide on my computer. If I do .libPaths(), I see that my Miniconda library is getting listed second, after the system-wide library. That's not good, so let's quit out of that.

I want to create an .Rprofile in my climate_viz project, so I'll move there with cd Desktop/climate_viz. Again, you can see the invisible files by doing ls -a, and I see here that I don't have a .Rprofile file. I'll create one with atom .Rprofile. I'm doing this in Atom, but you could do it in any text editor; Atom is just easier to work with than nano. The key is that it needs to be a text editor. We'll call .libPaths, which is a function, and I'm going to give it one argument, a vector of paths. So I'll give it the c function.
Inside that, I'll call .libPaths() again, and I want the second one, since that was my Miniconda library. Again, you do what works for your system. You may not have to do this if you didn't get things flipped. I've been on Linux systems where this doesn't happen; the Mac seems to be doing it. I don't know why. But anyway, you want to get your paths in the right order. We'll save this, come back to the terminal, and start R. Then we can do .libPaths(), open and close parentheses, and we get the right order. Now we can do find.package on tidyverse, and very good: it's grabbing it from our viz environment. Again, this takes a little bit of work to get things set up.

So what if we wanted to install another package, like gganimate? Let's quit out of here and install it. I'll do mamba install -n viz, since again we're installing into the viz environment. For the channels, I'm going to put all of them in: -c conda-forge -c bioconda -c r. It looks through these channels in order. And then r-gganimate. It goes through the same process and asks if you want to change things. Yes. Again, I can come into R and do find.package on gganimate, and it's in the proper location. Just to make sure everything works, we can do library(tidyverse). That's good. Then library(gganimate), and that works as well. It's suggesting adding gifski or av and then starting over.

This raises an important point. I have found that if you want to install a package in R when you're using mamba or conda, you don't want to install it from within R. That's a key difference from renv. Also, I have found that it does not work well to mix renv with conda or mamba. I've run into big headaches with that. Maybe you'll have a different experience, but I've really struggled, right?
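The .Rprofile edit described above might look like the sketch below. The exact line typed in the episode is not recoverable from the narration, so the R expression here is an assumption: it swaps the first two library paths so that the second one (the Miniconda library on the speaker's Mac) is searched first. The project_dir variable is a temporary stand-in; point it at your real project directory instead.

```shell
# Stand-in for the project directory; replace with your own project path.
project_dir=$(mktemp -d)

# Write the .Rprofile. The quoted 'EOF' keeps the shell from expanding
# anything inside the R code.
cat > "$project_dir/.Rprofile" <<'EOF'
# Assumed reconstruction: put the second library path (the conda
# environment's library in the episode) ahead of the first (system-wide).
.libPaths(c(.libPaths()[2], .libPaths()[1]))
EOF

# Show what was written.
cat "$project_dir/.Rprofile"
```

Before adopting this, run .libPaths() in R from the activated environment and check which position the environment's library actually occupies. If it is already first, as the speaker reports on some Linux systems, the file is not needed.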
So if I want gifski, I need to install gifski into my viz environment. I'll quit out of here. We could keep installing things. I think we also had the plotly package and maybe some other packages that we used along the way. This process of installing things at the command line works, but it doesn't lend itself very well to recreating an environment to run an analysis. What I'd rather do is create my environment from a file that I can store in my project, and that's what we're going to do next.

I'll use Atom to open up a text editor, and I'm going to save this as a configuration file called config.yml. It'll be a YAML file. On the first line, I'll put name: viz. Then I'll do channels:, tab over, and list conda-forge, bioconda, and r. I think we might also want defaults in here. Great. Then I want dependencies:. Now I'll put R=4.1, which specifies the version, and tidyverse=1.3.1. We could also add gganimate; I forget what version we had. If you're not so concerned about the version, you can leave that off. We could also put in plotly. And I realize that I'm putting in the R package names, not the actual conda names. So these should be r-base, r-tidyverse, r-gganimate, and r-plotly. I'll save that, because what we're going to do is use this file to create a new environment.

First I need to get rid of my viz environment. I'll do mamba env list, and I see I've got viz there. So I'll do conda deactivate, and again mamba env list shows me that I've got my base environment and viz, but viz is no longer active. I can then do mamba env remove -n viz. What that's going to do is remove the viz environment from my entire system. In -n viz, the -n gives the name of the environment.
This then says it's going to remove all packages in the environment, which is listed out there. Very good. And if I do mamba env list, all I have is my base. So now I can do mamba env create -f config.yml. And I forgot that I can't just tab over; I need a hyphen to start each item for it to be proper YAML. So we'll add those, save, come back, and try again. And there we go; I was missing those hyphens. Again, I can do conda activate viz, and I'm now in my environment. I can do mamba env list and see the activated viz environment. As we've seen, if I do which R, it's there, and R --version is again 4.1.3. We're in good shape. If I fire up R, because I changed that .Rprofile, and that doesn't go anywhere, that stays. If I do .libPaths(), I now see I've got my Miniconda path first and my system-wide library path second, so that if I do find.package on tidyverse, it pulls up my Miniconda version.

Okay, and then we would continue to work. If you wanted to add another package, you could come back to your config.yml file, add your dependencies like we did here, tear down the environment, recreate it, and then proceed. Alternatively, you could use that mamba install syntax at the command line to add individual packages as you go; just make sure that you're also updating this config.yml file. Back in the terminal, of course, I see that my conda branch is shown in red, telling me that there are things that have changed. The beauty of this is that now I can commit these changes for my .Rprofile and my config.yml file, so that you can get the exact software that I have: not just the R packages, but also the R version.
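Putting the pieces together, the corrected configuration file (with the hyphens that were initially missing) might look like the sketch below. The file name, environment name, channels, and package names follow the episode; the temporary project_dir is a stand-in for the real project directory, and the rebuild commands are only printed here, not run.

```shell
# Stand-in for the project directory; replace with your own project path.
project_dir=$(mktemp -d)

# Write the YAML file. Each list item starts with a hyphen.
cat > "$project_dir/config.yml" <<'EOF'
name: viz
channels:
  - conda-forge
  - bioconda
  - r
  - defaults
dependencies:
  - r-base=4.1
  - r-tidyverse=1.3.1
  - r-gganimate
  - r-plotly
EOF

# Commands to rebuild the environment from the file (printed, not run).
echo "mamba env remove -n viz"
echo "mamba env create -f $project_dir/config.yml"
echo "conda activate viz"
```

Keeping this file under version control alongside the .Rprofile is what lets someone else recreate the same environment from the repository.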
If we're doing other analyses, say you're using mothur, or you're using Pandoc or any other of these pieces of software, you will be telling your end user what they need and helping them get it, so they can run the analysis just as you have it. I find it's a bit more work than renv, but I feel better about it: I'm creating an environment that somebody else can use directly, and we can really control all of the software, not just the R packages. To me, that's a win. So if you're doing an analysis that only uses R, then cool, use renv. But if you're using more than just R packages, then I think it really is worth using conda.

Something that we'll talk about in a future episode is a really powerful tool called Snakemake, which works really well with these types of environments. Again, it used to be that conda was known for only working with Python packages, but really you can maintain so much more software using conda and mamba than just Python. I think it's a really great asset for a reproducible workflow, and I would encourage you to work with it. It's not necessarily easy; there's a little bit of friction to get going. But there are a lot of people out there doing things that can help you out if you run into a problem. Certainly, leave me a note down below in the comments if you have any questions or things you'd like me to follow up on in a future episode. All right, well, like I said, give this a shot, and we'll see you next time for another episode of Code Club.