All right. Hey. Hi, welcome. This is Level Up Your Scientific Coding. This is the first of a series of three webinars that we're going to produce, and this one's on version control using Git and GitHub. This webinar is produced by CSDMS, the Community Surface Dynamics Modeling System. I'm Mark. I'm a research software engineer at CSDMS. Hi, I'm Benjamin. I'm a postdoc at CSDMS. All right, so Git, so version control. Version control is a good thing. Hopefully, in this webinar, we can give you a notion of why it would be useful for you as a busy scientist to use version control, and also give you a good idea of where to go for more information on using it. Yeah, and let's be fair, Mark. You will need some time to familiarize yourself with using version control. But once you get the hang of it, you will not regret your investment. Right on. It's something that I use every day, and I rely on it as well. All right, so here's how we're going to do our webinar. We want to try to answer three questions: why, where, and how. For the why part, we want to motivate why you, as a grad student, a postdoc, a researcher, or a professor, would benefit from using version control. Then we'll show where you can get more information. Other people have done a really nice job, so we can point to the things that they have shown. And last, because we kind of need to show this anyway, is the how: we'll do a brief live demonstration of using Git and GitHub, and we'll cross our fingers that it works out okay. Oh, and just so you know, we're going to try to focus more on the why part, so maybe 15 minutes for the why, maybe five minutes for the where, and maybe 10 minutes for the how.
All right, so let's do a little throwback to me as a grad student. One of the things I definitely remember is making a piece of code for an assignment or for a paper, and it's going well, and the deadline is approaching. Then all of a sudden you run into this smart-ass saying that his code is five times faster than yours, because he vectorized his for loops or something. You go back to your code and start changing everything, and then you realize: my code isn't working anymore, and the deadline is in one hour. You don't want to end up in that kind of situation. So probably now you're going to say, no worries, I make versions. That's what I did myself for a very long time, so you end up with final code version 2.9, final code version 2.8. Okay, you submit your paper, and then you get it back and you want to rerun your code, and you're like, wait a second, is it version 2.8, which was my final one, or did I experiment a bit, and is there maybe a version 3.2 that I need to use? All of that causes a lot of trouble and generates a mess overall. So that's one of the things where you can really use version control in a very efficient way. I totally did that too. Yeah, the versions. Another thing is that you probably want to move from making default Excel figures to more well-designed graphs. An easy way to do that is using R, MATLAB, Python, whatever, any computer language, to come up with a standard default design and maybe even your own style of making figures. You can keep all those designs in a central place on GitHub and use them when you produce another paper. And then a third thing, and that's not a small thing: most of you will probably at one point move to the private sector, and being able to manage your code with a version control system is a highly desired skill, isn't it, Mark? Totally, I got a job doing it.
Oh, and for the figures too, it's also nice if you submit a paper with figures in it, you get reviews back, and the reviewer is like, I don't know about that figure. It's really easy, because you can go back, modify your script, and regenerate the figure. And then another perspective is that of a postdoc. I've got a user story hovering somewhere between grad student and postdoc. I started my PhD back in the day developing a numerical code to simulate river incision. At one point I was presenting this at a symposium, and I was asked by Wolfgang Schwanghart, a German guy who has a lovely tool set for doing all kinds of GIS analysis in MATLAB. It's called TopoToolbox. It's a wonderful tool. He asked me, Benjamin, would you like to implement your code in this GIS toolbox? I said yes, why not? From that we developed an entire landscape evolution model, which is now actually included by default in TopoToolbox. Lots of people are using TopoToolbox, and without even knowing it, they have this landscape evolution model sitting there on their very own computer. So for me that was a very easy way to distribute and share my scientific ideas. As a postdoc, of course, like we said, it's good to organize and back up the stuff you do in terms of publications. Another point on this slide is productivity, and you might ask, how does this make me productive? Because you have to be, sometimes. The reason is that once you have a stable version, you can go exploring in separate branches, as Mark will explain in a minute, and actually save time, because you can always go back to that one working version of your software.
And then it's a way to showcase your work for jobs, like we discussed. One thing I find very important is that it's a collaborative tool. As a postdoc you probably move from one place to another, you bump into new people, you start new collaborations, and it is a very easy way to share and develop code among collaborators. Mark? All right, so next, why would you want to learn version control if you're a researcher? That's kind of my position right now in my career, or maybe someone who's a research scientist at a government agency. I've got some cool collaborations with some people at USGS right now, for example. So again, to take a step back, we tried to make these slides and put bullets on each of them, and obviously a lot of the reasons are true for all these different kinds of users. So, collaboration again. Just like postdocs, researchers also collaborate. As I mentioned, I'm collaborating with some people at USGS, and it's great: we're using GitHub, we each write code, and we send pull requests so that we can review the code. It's a great way to work together on a project. And also for writing papers as well. Secondly, helping others access your work. I like to go to AGU, for example, the American Geophysical Union Fall Meeting. You run into people. Maybe I'm presenting a poster, I'm talking to someone who has a problem, and it's like, oh, actually, I have some code, I have an idea that could possibly help with your work. It's really easy for me to just give them my card and scribble my GitHub page on the back, and say, go here and take a look, I can help you. The third bullet here, which I think is kind of fun, is that GitHub and version control are useful for reporting, maybe to funding agencies. You know, NSF gives me money to do things.
I can show them what I did. It's up there. It's public. It's on the web. It's also useful for showing my supervisor: here, this is what I've done. It's all there. I don't really have anything to hide. All right, so that is for a research scientist. Our last why slide is from the perspective of a professor. Now, I tried to get some of the professors here at CSDMS to come visit us for this slide. But let's put on your virtual professor hat. Virtual professor hat, yes. Looks good. Yeah, there you go. Sweet. I collected some stories from them, but they're all busy teaching. They're all busy teaching or at conferences, so they're doing very good professor things right now. All right, so a couple of stories I collected from them. First of all, tracking team projects. Benjamin and I work at CSDMS. It's headed by Greg Tucker, who's a professor here. There's a whole bunch of software that we develop, and it's all tracked through GitHub. Greg is also a lead PI on the Landlab project, and they have a really big and very active GitHub organization where they track all their work. So Git and GitHub are very useful for organizing those projects. The next bullet here was suggested by Irina Overeem, the assistant director at CSDMS, who is also a professor. It's useful as a collaboration tool in class, so students can submit homework, for example, and do things through GitHub. The next one, and this is a little note to me, is the time machine. This is a story that came to us, I can't remember if it was through Irina or through Eric, from Laura Moore, who's a professor at North Carolina. I'm paraphrasing because I can't remember the exact quote, but she said it's like a time machine, because she has students that she advises, and they do their work, they write their papers, they get their degrees, and then they move on.
Oftentimes, or at least in the past, that work may have been lost; it may have gone with the student. But now, using GitHub, for example, all that information is still there, so that the next group of students who come into her group can use it. They're not starting at ground zero. They already have something to work with that they can add to, like an archive. Yeah, that they can build on. So they're building this knowledge base. Okay, and can you travel forward in time with this time machine? That would be totally cool. I would love that, actually. All right, so this is our last why slide, but hopefully we've shown, through these four groups of people who are scientists, why it would be useful to use GitHub. All right. Yeah, and we're trying our best to give a little grasp of what Git does and what GitHub is. But obviously, there are a lot of good resources out there where you can learn more. For example, Software Carpentry is very nice because they bundle lessons on several topics. They have lessons on the Unix shell, and they for sure have lessons on version control and GitHub. So definitely go there if you want a step-by-step procedure on how to do things. And obviously there's the GitHub Help and the Git documentation, where most of the commands are clearly explained. If you Google for a Git cheat sheet, you might even find a nice document with some short commands and little explanations. I downloaded that. It's going to help. Oh, you did? I totally got that. Yeah. Fantastic. And then there's the final one, Stack Overflow, which is a platform where you'll probably end up if you Google anything you're worried about. It's very nice, and I'd even recommend making an account and contributing or asking questions, because it's surprising how fast people respond to questions if you're really stuck with something. Benjamin has a much higher rating than me on Stack Overflow.
I think I have a rating of one. Oh yeah, you can earn medals. Yeah. But it's kind of funny. Again, I use Git and GitHub daily, and if I have a question, and I'm getting better, I kind of know things now, I'll often Google it and look for a Stack Overflow post on what I'm looking for. Maybe there are a couple, and I will almost always find a good answer that way. It's a great resource. Okay, so that's the where. Again, if you want to learn more, these are some good places to start. All right, so next is the how part. I'm going to show a short demo of how to use Git and GitHub. This is where the webinar can get tricky, so I'll cross my fingers. All right, I'm going to give this a shot. Let me start with the GitHub page for a repository that I've set up that has an example inside of it. I should mention, too, that this is my web browser, and I like to reverse the colors because it makes it a little easier for me to see. So just so you know, if you go to this page, you'll have a white background. All right, so you can see what this GitHub page looks like. I have some instructions for how to use it. I have some data, output, slides. The one thing that's interesting here is that you can see there's a Python file. I'm going to click on that, and it gives a preview of the code. This is some code that we'll be looking at and using a little bit in our example, so that's why I wanted to preview it. All it does is read from a netCDF file that's in that data directory, prepare the data, and then create a figure, and that figure is saved as a PNG file. What's fun about this is that this is actually old code that I had. I used this code to interview for this position at CSDMS, to show that I knew a little bit of Python.
Because now I look at it and think, oh, I would have done things differently, but it's cool because, again, I had this stored in version control. And I'm really surprised that it still ran in Python 2.7. Which got you access to CSDMS. That's right. Exactly. Yeah, right. Okay, so that's just a little bit about what's in the repository. Next I want to show you how you can access this information. You can see that this repository is under the CSDMS organization, so it's basically read-only for people outside of CSDMS. Because I'm in CSDMS, I can write to it, and so can Benjamin, but other people can't. So with this example I want to show a kind of standard workflow for working with Git if you want to look at and use someone else's code. If you're just looking at the code, that's one thing, but if you actually want to modify it, we have to do a little bit more. All right, so let me show how you go about modifying some code in this example. To do so, I'm going to click on the Fork button at the top right of the page. It will ask me where I want to fork it, and I want to fork it to my organization. Okay, that'll take a second. And so now you can see, my GitHub login is mdpiper, so now it's in my organization. This means that now I can edit this code if I want to. So go ahead and make your own fork if you want to follow along. Yeah, you can totally follow along with this if you want. All right, so the next step: this repository lives up on GitHub's servers, wherever they may be, somewhere in the cloud, and I want to get a local version of it. To do that, I'll click on the Clone or download button right here. You can clone either with HTTPS or using SSH keys. For this example, I think it's easier to just use HTTPS.
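A minimal sketch of the clone step that follows the fork, assuming Git is installed. The real command takes the HTTPS URL copied from the fork's page on GitHub. To keep the sketch runnable offline, a throwaway local repository (the hypothetical `remote-fork`) stands in for the fork, and the directory name `level-up` is an assumed spelling of the repository name.

```shell
set -e
work=$(mktemp -d)   # scratch directory so nothing touches your real files
cd "$work"

# Hypothetical stand-in for the fork on GitHub: a small local repository.
git init -q remote-fork
git -C remote-fork symbolic-ref HEAD refs/heads/master   # call the first branch master, as in the demo
echo "Level up example" > remote-fork/README.md
git -C remote-fork add README.md
git -C remote-fork -c user.name="Example User" -c user.email="user@example.com" \
    commit -q -m "Add README"

# With a real fork, the first argument would be the HTTPS URL copied from the clone button.
git clone -q remote-fork level-up

cd level-up
ls              # the same files that are in the repository
git remote -v   # "origin" is the place you cloned from
```

The name `origin` is what Git calls the cloned-from location by default, which is why the later push in the demo targets `origin`.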
All right, so click on the copy button, which just copies that link. Now I'm going to go to my terminal. So here's a terminal. I'm on my local machine. It's a Mac. You can see right now there's nothing in this directory. I want to clone this repository, so I just paste in the URL that I copied from the web page. Oh, and I should mention, too, that Git is already installed on my machine. I think it comes by default on Macs, and it's easy to install on Linux. And you can get Git for Windows: there's an application, and there's also a command-line tool on Windows like what I'm using. Yeah, and also, to fork the code like we just did, you need an account on GitHub. If you're totally new to it, probably what you want to do is go to one of the websites we advertised on the previous slide and look at how to make a profile and how to install Git. And I would really recommend the graphical user interface as well. If you're not so familiar with the terminal, that's equally fine. But you can do a lot of nice things using these terminal commands, so it's worth investing in learning how to do it. Right. Yeah, when I started using Git I actually started with the desktop version, the GUI, basically, and it did everything that the command line does, but I eventually migrated to the command line because I think I can do things a little faster than with the desktop application. But then of course you have to learn all these shell commands. All right, so it's now cloned. So now it's down on my machine. If I do a directory listing, you can see there's this directory called level-up. If I change into it, you can see that on my machine I now have all the same files that are in the repository. All right, so maybe before I do any more Git commands.
Let's run the code to see how it works. All right, so again, the file here is this Python file. It's actually Python 2, so I give some instructions in the README on how to set up an environment where you can run Python 2 and then run the code. I like to use Anaconda, actually; we're big fans of it here at CSDMS. It's an easy way to set up a Python environment. You don't have to do this. If you're familiar with Python, you can set up a virtual environment yourself using whatever other technique you want. So I've included a conda environment file, and you can see that's my first statement here. I'm pretty sure I did this already yesterday, so I can skip the step of making the environment, and I'm just going to activate it. All right, so now you can see I'm in my level-up environment. At this point I should be able to just run this program. It's just a script, so I call python on it. All right, so there's a warning. It's because this is old code. But now you can see it produced some output. There's a PNG file. Let's take a look at that. All right, cool. Yeah, so it actually works. I was really surprised that it still worked. It makes me happy. All right, so great. It's showing some 500-millibar geopotential heights. There are some problems, though. One problem I spotted: look at the date on this. So the time machine would be in the future. Yeah, right. This is NCEP reanalysis data. This isn't a future forecast, so that year is very much wrong. So let's use version control to try to fix this. The idea is, let's fix this error and then send a pull request back up to the CSDMS level-up repository. This is kind of a workflow that you could use if you're working with someone and you find a problem in their code, for example: you can submit a fix for it. Oops. Come back here. All right, so I need this and that.
All right, so let me show a few Git commands to show how we go about doing this. Let me just clear my screen to start. The first command, which I use all the time, is git status. This tells me a little bit about what's going on inside the repository. You can see that none of the tracked files have changed, but there is this additional new file that isn't tracked. We can ignore that; it's not going to go into the repository. So what I can do next is change the file. Let's repair that issue in the file. I'm going to open up the Python file with an editor, the best editor, Emacs. So there are a couple of things. Here's the issue that I found. When I looked at this, I looked at the netCDF file with ncdump; basically, I just looked at the header. So the dates don't start there. Well, at least they didn't at the time. If I fix this, that should give me the correct starting point for the data inside the file. I could probably change my documentation. Yeah, I'll probably do that. Yeah. And this, oh, this is actually a good point. This is something that I feel strongly about now: don't put comments like this. I should not have these comments there, because if you hadn't pointed that out to me, the comment would have been out of date with the code as well. So I tend to shy away from writing comments like this nowadays. It's something I've learned over the years. Now I want to make one other little change. This is in the function that creates the visualization. You can see it has a day-of-year parameter, and this was set to a day sometime in the summer. But I think it would be more fun to make it February 18, which is today. So this will be February 18, 2010. So 10 years ago, which is kind of neat. Okay, so I've made two little changes, and I've saved them. Let's get out of the editor.
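A small sketch of what `git status` reports before and after an edit like the one described here, assuming Git is installed. The scratch repository, the file names `script.py` and `figure.png`, and the file contents are all hypothetical placeholders, not the files from the demo.

```shell
set -e
work=$(mktemp -d)   # scratch directory
cd "$work"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity, local to this scratch repo
git config user.email "user@example.com"

# One tracked file, committed to the repository.
echo "print('first version')" > script.py
git add script.py
git commit -q -m "Add script"

# Running the script leaves behind an output file that Git is not tracking.
touch figure.png
status_before=$(git status --short)
echo "$status_before"    # only the untracked figure is listed, marked with ??

# Edit the tracked file, then check again.
echo "print('second, corrected version')" > script.py
status_after=$(git status --short)
echo "$status_after"     # the script now shows up as modified, marked with M
```

The `--short` flag is only there to keep the output compact; plain `git status`, as used in the demo, reports the same information in longer form.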
All right, and now let's check the status of the repository again. Now you can see that there has been a change in one of the files tracked by the repository. Git is telling me that the Python file has been modified. Let's see what exactly changed. If I use the git diff command, oops, I can use it on the Python file, and it shows me exactly what changed inside the file. You can see it changed in these three locations: the year, the comment about the year, and then the date. So that's kind of a neat thing. You can see exactly what changed inside of a file. All right, so I would like to save these changes into the repository. This is a point where I take a step back just for a second. Again, if I show the status, you can see that I'm on the master branch of the repository. Another way of seeing this is with the git branch command. It tells me that right now I'm on the master branch. It's typically not a good idea to make changes directly to the master branch, because the master branch is kind of where it all started. It's a better idea to make a feature branch. This is where you can explore and make changes without having those changes in the master branch. And you haven't committed yet, right? Exactly. Even though the file has changed, I haven't committed that yet. So I'm going to make a branch, then I'll commit the change to the branch, and that's what I can use to push back to the CSDMS repository. All right, so I'm going to make a branch. One common convention is to name the branch with your GitHub handle and then the purpose of the branch. This one is to fix the year error, so that could be a nice branch name. All right, if I use git branch again, I can see that I now have a new branch. It's not active yet, though: note the little asterisk, which helps me out, so the master branch is still active.
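The diff and branch steps can be sketched as follows, assuming Git is installed. The scratch repository, the file name `script.py`, and its one-line contents are hypothetical, and the branch name `mdpiper/fix-year-error` is an assumed spelling of the handle-plus-purpose name used in the demo.

```shell
set -e
work=$(mktemp -d)   # scratch directory
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in the demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "day_of_year = 200" > script.py       # hypothetical contents: a day in the summer
git add script.py
git commit -q -m "Add script"

echo "day_of_year = 49" > script.py        # February 18 is day 49 of the year
git --no-pager diff script.py              # removed lines start with -, added lines with +

git branch                                 # lists master, marked with an asterisk
git branch mdpiper/fix-year-error          # creates the branch but does not switch to it
git branch                                 # two branches now; the asterisk is still on master

current=$(git rev-parse --abbrev-ref HEAD) # name of the active branch
echo "$current"
```

Creating a branch and switching to it are separate actions here, which is why the asterisk stays on `master` until a checkout happens.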
So the next step is another command, the git checkout command. With that, I can check out my feature branch. You can see that in this feature branch it's giving me a little hint here that there's a modified file. If I do git branch, you can see now that the feature branch is active. If I do git status, you can also see that the feature branch is active. All right, so now I want to save the changes I made into this feature branch, and the way we save changes is with the commit command. So I use git commit, and I'm going to save the changes from the Python file that I changed. Okay, I can just hit return. This pulls up my editor. And I need to say something like, I'm going to look at my notes just for a second, I had a better way of saying this. Okay, so I want to have a short descriptive message that explains what I changed. And I'm going to say that I also changed the default day. The idea when you make these commits is that you want to be brief, but you want to explain what you've done so that someone could later browse through the history of the repository and understand what you did. Okay, so I'm going to save this. All right, so Git tells me that it made the changes. If I call git status again, there are no changes left to commit. There's again this file that we're not going to include, but I have no uncommitted changes in my repository. Okay, so at this point I've made the changes I want, but they're on my local machine. I want to push the changes back up to the mdpiper level-up repository, which is again somewhere in the cloud. The command for that is git push. I want to push up to origin. That's where I downloaded from; that's my mdpiper repository. And I push the branch, mdpiper/fix-year-error. Okay, cool. So Git pushes it up there.
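A hedged sketch of the checkout, commit, and push sequence, assuming Git is installed. An empty bare repository on disk (the hypothetical `origin.git`) stands in for the fork on GitHub so the sketch runs offline. The file name, its contents, the branch spelling, and the commit message are illustrative assumptions, and `-m` supplies the message on the command line instead of opening an editor as in the demo.

```shell
set -e
work=$(mktemp -d)   # scratch directory
cd "$work"

# Hypothetical stand-in for the fork on GitHub: an empty bare repository.
git init -q --bare origin.git

git init -q level-up
cd level-up
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in the demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git remote add origin "$work/origin.git"

echo "day_of_year = 200" > script.py       # hypothetical file and contents
git add script.py
git commit -q -m "Add script"

# The state at this point in the demo: an uncommitted edit and a new, inactive branch.
echo "day_of_year = 49" > script.py
git branch mdpiper/fix-year-error

git checkout mdpiper/fix-year-error        # switch branches; the uncommitted edit comes along
git status --short                         # script.py is listed as modified
git commit -q -m "Fix year error; also change the default day" script.py
git status --short                         # prints nothing: no changes left to commit
git push -q origin mdpiper/fix-year-error  # send the branch to the remote called origin
```

After the push, the branch exists in both places with the same tip commit, which is what lets GitHub offer a pull request from it.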
You can see over on the web page that there's a little message saying, hey, there's a new branch. Let's do something kind of neat here. I'm going to go back to the upstream repository. So again, mdpiper level-up is the origin, and the upstream repository is CSDMS level-up. And you can see, this is cool, it noticed that my downstream repository has a change. From this, I can make a pull request: I can try to put the changes that I made back into this repository in the CSDMS organization. So I'm going to click the button here, Compare & pull request. This is going to be the start of a pull request. One thing that I haven't really talked about well yet is how to differentiate Git and GitHub. Git is the software tool that we're using. GitHub is a company, owned by Microsoft, that wraps Git into a service that we can use. And a pull request is actually something that GitHub provides. But it's a really cool thing. What I like about it is that, basically, I'm going to try to make an argument for why my code should go back into this upstream repository. You have to argue why it's a good idea for someone else to include your code. To land you a job, in this case. Exactly, to give me a job in this case, right. All right, so here, since we made this one little change, I think it's a useful title. I can add a little more here, so I can say something like, this pull request (PR) makes two simple changes to adjust the output figure. I usually write a little more than that, but just to save time. I try to argue why I want to make these changes. Okay, so I can create the pull request. And then this would go to the owner of the repository, and they could decide if they want to bring the code into their repository.
Oh, you know what I never showed, that I was going to do? Do you want to show what you actually... I forgot about that. So this is kind of the end of the Git part. Let's just go back really quickly to make sure that it works. So if I, sorry about that, if I run Python, and if I open up the resulting figure. All right, there we go. We can see this is the right date now, 2010-2-18. What do we actually see there? This is the 500-millibar level, so this is, you know, halfway up the atmosphere, basically, and it's driven by temperatures. You can see there's what we call a ridge, because the warm air is higher up there. Yeah. So probably in Southern California it's really nice, and probably even in Colorado it's pretty nice. Cool. All right, so that was the demonstration. Let's go back to our slides. So that's about it. Let's wrap up. We told you why, from different perspectives, you might be interested in using version control. We also pointed you to resources where you can find more details, so definitely go there and check them out. And then Mark gave this nice example of how to do things, and the code for this is also at the link, github.com/csdms/level-up. And then I would like to invite you guys to our next webinar, which is on unit testing. If you don't know what it is, for sure join in, and if you do know what it is but don't know exactly how to do it, also feel welcome to join. All right. Yeah. We have chat questions. Okay, all right. We're going to try to find that now. So, I guess that's it then, so thanks for watching. Okay, so what we'll do next is try to answer some questions. So in order to get to the chat, there we go. Found it. Sweet, thank you, chat. Okay, so we've got one question in the chat. All right. So the question is: is it a better practice to create a branch off of master before making changes to the files, or can you make the changes and then create a branch off of master?
Is it equivalent? That's a really good question, actually. That's something that Benjamin and I talked about yesterday when we were doing a practice run. I think my usual workflow is to create a branch ahead of time. I was a little bit out of sorts there, yeah. So I usually try to make a branch ahead of time, because I know that I'm going to make changes and I want to organize them. You can do it either way, but I think it's better to create the branch first, because I know that sometimes, when I start making changes, I forget that I'm still on the master branch, and then I commit to the master branch. I've totally done that before. But you can still make a branch afterward, and then that change goes over. If you start from the master branch, it goes over. There's a whole lot of very interesting graph theory behind Git as well. But anyway, it's okay to do it afterward, though I think as a best practice it would be better to do it beforehand. Yeah, I would recommend the same: for sure, make a branch, and then start committing stuff to the branch directly. Yeah, and stuff like this is covered in some of the resources that we listed as well, definitely in the Git documentation.
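The "branch afterward" case from this answer can be sketched as follows, assuming Git is installed. The scratch repository, the file `notes.txt`, and the branch name `late-branch` are hypothetical. The sketch edits a file while still on master, then creates the branch, and the uncommitted edit goes over to the new branch while master stays where it was.

```shell
set -e
work=$(mktemp -d)   # scratch directory
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in the demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "First version"
master_before=$(git rev-parse master)

# Start editing while still on master...
echo "version 2" > notes.txt

# ...then create a branch and switch to it in one step; the edit goes over with you.
git checkout -b late-branch
git commit -q -m "Second version" notes.txt

master_after=$(git rev-parse master)       # unchanged: master never received the commit
branch_tip=$(git rev-parse late-branch)    # one commit ahead of master
echo "master: $master_after"
echo "branch: $branch_tip"
```

This only covers edits that have not been committed yet. If the commit has already landed on master by mistake, a branch created afterward will contain it too, but master keeps it as well unless it is moved back separately.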