Good. Hello, I'm Adam, and today I'm going to be talking about Python application building and version control, two different topics. We're going to start off with a little bit of application building, using command-line parsing with a module called argparse. Then we'll get into how we can break up our code into different modules and packages to make things a lot cleaner and neater. I'll very briefly talk about debugging and testing; I don't have a whole lot about that. And then in the second half of the class I'll be talking about version control, specifically with the Git version control system.
So, starting off with argparse: you should all hopefully have access to this module already. What it allows you to do is call programs that you make with Python from the command line, so you can send off this standalone code to anybody and just give them a set of instructions for how to run it. You can specify certain options in a nice format, and it will parse all of these options for you. So the goal is to build standalone code based in Python with command-line options and keywords. The argparse module has been built into Python 2.7 and above, and it's also in EPD, so you should all probably have it. If you don't, you can just install argparse yourself. I also have a sample code that you can download from bSpace that we'll go through a little bit, and the breakout session is on there as well, so go ahead and download that at some point. So argparse allows for user-friendly command-line interfaces and leaves it up to the code to determine what the user wanted. You can be kind of flexible in how you specify these different options, and the code will parse them in a nice way. It can also automatically generate help messages if you screw up, if you type something that it can't recognize.
It'll barf and say, I don't understand what you're trying to say here; here are the options that I'm allowing you to give; please try again. There's also a quick note on an older module that some of you may have seen before called optparse, which is basically a similar thing, but it's not 100% compatible, and I think it's being edged out in favor of argparse. So the first step for setting up argparse is to create a parser object and tell this object what arguments to expect, which can then be used to process the command-line arguments at run time. The parser class is called ArgumentParser, and it takes several arguments to set up the description used in the help text of the Python program. So, starting off simply, all you do is import argparse, set up your parser object using the ArgumentParser class, and give it a little description if you want. Then what we do with our parser object is go through and define the different arguments that we want the parser to expect and be able to handle. We use the function add_argument in the parser class, and there are six different supported actions, which I'll get into on the next slide. Once you've defined all these arguments, the command-line entry is parsed by passing the sequence of arguments that you give on the command line to the parser object. These will just be taken from the command line by default.
So, going through the different actions that can be defined: there are six built-in actions that can be triggered. The simplest one is just store. If you give it a value, it will store that value into some variable, and if you want, you can convert it to a different type. By default, things on the command line are just parsed as strings, but you can say, I expect an integer here, so please try to convert it to an integer.
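The three steps just described (create the parser, declare the expected arguments, parse the command line) can be sketched as follows. This is a minimal illustration, not the sample file from bSpace: the flag -n, the destination name count, and the description text are made up for the example, and parse_args is handed an explicit list so the snippet runs without a real command line.

```python
import argparse

# Step 1: create the parser object; the description shows up in the help text.
parser = argparse.ArgumentParser(description="Toy example (hypothetical)")

# Step 2: declare an argument. "store" is the default action, and type=int
# asks argparse to convert the command-line string to an integer.
parser.add_argument("-n", action="store", dest="count", type=int,
                    help="a number to store (hypothetical flag)")

# Step 3: parse. With no list given, parse_args() reads sys.argv[1:];
# an explicit list is passed here so the sketch is self-contained.
results = parser.parse_args(["-n", "42"])
print(results.count)

# A value that cannot be converted makes argparse print a usage message
# and exit, which shows up in Python as SystemExit.
try:
    parser.parse_args(["-n", "hello"])
    conversion_failed = False
except SystemExit:
    conversion_failed = True
print(conversion_failed)
```

Passing a list to parse_args is only a convenience for trying things out in an interpreter; in a real script you would normally call it with no arguments.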
And if it can't convert it to an integer, then it will give you a nice, friendly error message saying that it can't understand what you're trying to give it. Another action is store_const. This is slightly different in that it's just a single flag. If this flag is set, then a constant will be stored that's predefined, hard-coded into the parser program. store_true and store_false just save the appropriate Boolean value, depending on whether you want this variable to be assigned True or False. append is a neat one: you can save values to a list. So you can specify several different values and they will all be appended to this one list variable. And similarly to store_const, you can specify append_const, which will append predefined values that are hard-coded in the program to the list.
So this is the file that I uploaded to bSpace, which we're going to go through piece by piece to see these different argument actions in practice. Here we went through the steps of importing the argparse module and setting up our parser object, parser = argparse.ArgumentParser, and gave it a description. Then we added all these different arguments, giving each one the flag that we expect from the command line, the action that this argument is supposed to perform, the variable that the value is going to be assigned to, which is this destination, and a little help string. Josh, do you have a laser pointer, by chance? Do you want to say the difference between store and store_const? Yeah, so store takes the value that you give on the command line. For instance, here the command line would be python, whatever this file is called, -s to denote that this is the flag I want to parse, and then a value. So I could do python code.py -s variable_name, and then the string variable_name will be stored to the destination simple_value. What store_const does is use a predefined constant, which is hard-coded into this code base.
So all you would set here is this -c flag; you wouldn't give it a value afterwards. All you'd say is python code.py -c, and then this value to store would be assigned to the destination constant_value. And we'll go through what these different things do. So I've gone through here and defined a bunch of these different actions. We've got store_const; we've got store_true and store_false, which are the Boolean ones; append, which appends things to a list, with the destination collection, which is the variable; append_const; and --version. Then we assign all the results from the parsed command line with parser.parse_args. These go into the results object, and they are all attributes of this object, named by their destination. So if something was stored with the -s flag, it will be put in results.simple_value, and then we can print out simple_value. Similarly results.constant_value, results.boolean_switch, results.collection and results.constant_collection.
There's also this nice help argument. The code that I uploaded was called argparse_action.py, which is what we just saw. If I do, on the command line, python argparse_action.py -h or --help, it will print out all of the different arguments that it can accept, along with the little help string that you defined in the code for what each flag does. So here we see -s SIMPLE_VALUE will store the simple value you specify into this variable. So, going through these different options, starting off with store: we give the command-line argument python argparse_action.py -s value, and, as you remember, at the end of the code we printed out all the different variables that we're storing here.
So the string 'value' gets stored to simple_value. We didn't do anything with constant_value, so that's still None. boolean_switch was defined to be False by default, collection is still empty, and the constant collection is still empty. Going on to store_const, to delineate the difference between this and store: here we just give it the flag -c, and what will this do? simple_value is None because we haven't defined it. So what will the destination constant_value be stored as? If I just give it the flag -c, the constant 'value to store' is hard-coded as the constant that we want to store to this destination. So as we go to the next line, constant_value, you see it has been assigned 'value to store'. That make sense? No, that can be anything. Absolutely. Yeah, that's hard-coded. So this is for a case where, for some reason, you want a flag that will only do one thing. Say there's a string that is later used in your code and you want to be able to have a flag saying, please assign this string to this value, or this number to this value. Then this will allow you to do it with a simple flag rather than having to input the value you want. So if you, the coder, know that there's only one value that makes sense for this particular variable, then you can set it up this way. Yeah, going back to Josh's point that we can specify what we want it to be: there's an option here, I think it's called type, where you can specify whether you want it to be an int or a float or a string or something. By default it's just a string. Yeah, I can imagine, maybe. For what? Oh, for the constant store, for this guy. Yeah, you can. Well, this one expects that you give it a string input.
So if you look back in the help message, it would say that to run this you do python program.py -s and then type a string, and it will try to parse that string and assign it to this variable simple_value. Correct. And if it can't convert it, then I think it will give you a nice error message saying, I don't understand that input; I expected an integer. Okay, so store_true and store_false are just Boolean switches. Here we see we can give it a default value if we want, so here the default is False. If we set this flag -t, it will change this Boolean value to True. So you see boolean_switch is now printed out as True. append can take a series of entries on the command line, and it will append all of them onto a particular list. So these will all go to the list collection. Here our command-line input is python argparse_action.py -a one -a two -a three. Then these strings one, two and three will all be appended to this list collection. So when we print out the list at the end here, you see that they have all been assigned to this collection list. And similarly with append_const, here we define, hard-coded, what the different values that we can append are. So again, these are just single flags; we can't specify a value afterwards. These will append the different predefined constant values to a list. So here we can do python argparse_action.py -B -A, and this will append the predefined constants to this constant collection list. Good to know. Yeah, I think it will do that. So it will throw an error; actually, if we print it out, it will show you what error handling it will do. We can override that. We can say that, in the case that something has been set to a constant by two different flags, take the second one. You can set all of that stuff. And finally, you can specify this version flag, which will just print out the current version of your code.
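Pulling the actions together, here is a sketch of what a file like the one being walked through might contain. It is a reconstruction for illustration, not the actual bSpace file: the destination names follow the spoken names, while the constant strings, help texts, the capital -A and -B flags, and the version string are assumptions. The argument lists are passed to parse_args directly so the behavior can be checked from an interpreter.

```python
import argparse

parser = argparse.ArgumentParser(description="Demo of the built-in argparse actions")

# store: keep whatever value follows the flag (a string unless type= is given).
parser.add_argument("-s", action="store", dest="simple_value",
                    help="store a simple value")
# store_const: the flag takes no value; a hard-coded constant is stored instead.
parser.add_argument("-c", action="store_const", dest="constant_value",
                    const="value to store", help="store a constant value")
# store_true / store_false: Boolean switches sharing one destination.
parser.add_argument("-t", action="store_true", dest="boolean_switch",
                    default=False, help="set the switch to True")
parser.add_argument("-f", action="store_false", dest="boolean_switch",
                    default=False, help="set the switch to False")
# append: each occurrence of the flag adds its value to a list.
parser.add_argument("-a", action="append", dest="collection", default=[],
                    help="add a value to the list")
# append_const: each flag adds its own hard-coded constant to a shared list.
parser.add_argument("-A", action="append_const", dest="constant_collection",
                    const="value 1 to append", default=[],
                    help="add the first constant to the list")
parser.add_argument("-B", action="append_const", dest="constant_collection",
                    const="value 2 to append",
                    help="add the second constant to the list")
# version: print the version string and exit.
parser.add_argument("--version", action="version", version="%(prog)s 1.0")


def show(argv):
    """Parse an explicit argument list and print every destination."""
    results = parser.parse_args(argv)
    print("simple_value        =", results.simple_value)
    print("constant_value      =", results.constant_value)
    print("boolean_switch      =", results.boolean_switch)
    print("collection          =", results.collection)
    print("constant_collection =", results.constant_collection)
    return results


show(["-s", "value"])
show(["-a", "one", "-a", "two", "-a", "three"])
```

Running the same file with -h would print the generated help listing, and --version would print the version string and exit.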
So if you update your code and send a new version to someone, you can say that this is version 2.0 or 2.1 or whatever you want. So that's the argparse stuff. Does anyone have any questions on this? We're going to be going into a breakout session where you'll be able to deal with all these different things and try them out for yourself. You can set nargs to a number like 2. Let's say you want to give a latitude and longitude and you want to say --position, and you want to force that to take two values. You can also set nargs to a question mark.
So now we're going on to modules and packages. This is a nice way of being able to better organize your code once it starts becoming large and unwieldy, because you just can't keep everything in single standalone code files all the time. You're eventually going to want all your code to communicate nicely and to put things in different folders that make sense. Say I want to group all my plotting code together, and then group another block of code for some other thing that we're doing with databases or something like that. Then we can communicate back and forth between all our different code bases in a nice, easy-to-use format. Functions from other codes, made for different reasons, might be usable elsewhere. Say we make a plotting code for one thing, and then we eventually start another project and realize that the plotting code that we wrote for our old project could be used for this new project. Then we can call that module and reuse it over and over again. So it's useful to break up our code into modules and packages. A module is just a file containing predefined functions and variables, and it must have a .py extension. So this is similar to what we've been doing all along with the Python standard library.
So we can import these modules just by saying import plotting. Or, well, we've done everything with matplotlib and the like, so you do import matplotlib or from matplotlib import pyplot. These are just instances of functions within modules that we can import into our code base. In order for these modules to be called correctly, we need to set up our paths correctly, so that if we're trying to call the modules we've created from anywhere else on our computer, and we're not in the right folder, Python knows where to look for our code base. So you have to set up your PYTHONPATH in your environment variables so that Python knows where to look. I'm not going to go through all this, but you can read it at your leisure. You basically just need to set the PYTHONPATH environment variable to the correct path to where your code lives. So if you have a bunch of Python code in the directory path-to-your-code, then you just set this as your PYTHONPATH, and when you start an instance of Python you'll be able to import all of the different code that you've written in a nice, easy way. You can also check all this within Python and append to your path if you want to do this on the fly. You probably don't need to do this very often, but if you're getting path errors or something like that, you can at least check that your path is set correctly within Python and see where it is looking. So if you import the sys module and then print sys.path, it will list all of the places that Python is currently looking for code to find the modules and functions that you've predefined. So here there are two different paths set, which I've called predefined Python path and path defined by environment variable.
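The check described here can be done in a few lines. This is a small sketch: the directory my_code under the home folder is a made-up example path and does not need to exist for the snippet to run.

```python
import os
import sys

# Every entry in sys.path is a place Python searches when you import something.
for entry in sys.path:
    print(entry)

# Entries that came from the PYTHONPATH environment variable, if it is set.
env_entries = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
print("From PYTHONPATH:", env_entries)

# Appending on the fly only affects the current Python session.
extra_dir = os.path.join(os.path.expanduser("~"), "my_code")  # hypothetical directory
if extra_dir not in sys.path:
    sys.path.append(extra_dir)
print(extra_dir in sys.path)
```

Because the change to sys.path lives only in the running interpreter, anything meant to be permanent still belongs in the environment variable.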
So it will list all the ones that are prebuilt into Python, where it's looking within its own structure, and also the ones that you've defined yourself. You can also append to this list if you want to add a new path within the current instance of Python. You basically just add to this list, and it will be added to your path. But this goes away once you close down this instance of Python. So if you want something to be permanently in your Python path, you'll want to set it up in your environment variable. Yeah, of course. The thing to also recognize is that you may have a code that you're writing that's got its own set of directories. In the code that you run, you may not want to modify the PYTHONPATH directly, but you might want to look at an environment variable called MY_BIG_CODE_BASE or something, and then append, inside of Python, something that uses that as a starting point, so that you actually tell Python while you're running, oh, and by the way, I have another code base which I want you to look at. You may not want to have to directly hard-code your PYTHONPATH and add all of the different codes that you're going to be running. You may want to do that with a separate variable. Oh, great.
So now, briefly, on to packages. If you have this path all set correctly, then your code can be broken up even further into reasonable folders and imported as necessary. You can import either all of the modules within a package, or functions or classes within the modules. Really, all you need to do for this is break up your code into separate folders. Say you have an analysis folder with two different modules, data_cleaner and data_generator, and then you also have a plotting folder with the modules histogram and scatter_plot.
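To make the folder layout concrete, the sketch below builds a throwaway copy of it in a temporary directory and then imports from it. The folder, module, and function names follow the lecture's example, but the function bodies are stand-ins invented here (nice_hist returns a string instead of drawing a plot), and only one module per folder is created. The empty `__init__.py` file in each folder is what marks it as a package.

```python
import importlib
import os
import sys
import tempfile

# Build a scratch directory holding two small packages.
root = tempfile.mkdtemp()
files = {
    "analysis/__init__.py": "",  # empty file: marks the folder as a package
    "analysis/data_generator.py":
        "def generate_data(n=5):\n    return list(range(n))\n",
    "plotting/__init__.py": "",
    "plotting/histogram.py":
        "def nice_hist(data):\n    return 'hist of %d points' % len(data)\n",
}
for rel_path, source in files.items():
    full_path = os.path.join(root, *rel_path.split("/"))
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "w") as handle:
        handle.write(source)

# Make the scratch directory importable for this session only.
sys.path.insert(0, root)
importlib.invalidate_caches()

import analysis.data_generator as dg          # import a whole module, shortened to dg
from plotting.histogram import nice_hist      # import a single function

my_data = dg.generate_data()
print(nice_hist(my_data))
```

In a real project the folders would simply sit in a directory that is already on your Python path, so none of the file-writing code would be needed.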
All you need to do is place a blank file, with nothing in it, called __init__.py within each folder, and then you'll be able to import the different functions from the modules in that folder. So if we want to import all of the functions from the file data_generator, all we need to do is import analysis, which is the name of our package, dot data_generator, which is the name of the file, and then we can shorten it if we want as dg. And say there's a function nice_hist within the histogram.py program. Then we can import just that individual function if we want: from plotting.histogram import nice_hist. Then we can call the functions that we've imported, for example my_data = dg.generate_data(), where generate_data is a function within the data_generator program, and then we can do nice_hist(my_data). So this is just a nice way to break up your code into a bunch of different packages for cleaner use. And very briefly, I won't go too much into this, but there's a tool called distutils to package up all of your code base into a nice format, to bundle it up for easy use by others. You can create a setup.py file using distutils, which allows others to install your code in the standard fashion, where they just do python setup.py install and it will install all of your code base together. This is a bit more complicated than anything I want to get into, but I just wanted to throw that name out there so you could look into it if you want to. Do you have anything more to say about distutils, Josh? Yeah. Okay.
So, briefly, about debugging and testing. The standard technique, or maybe not standard, but the technique some people use to debug code, inserting a bunch of print statements inside your code to see what's going wrong, can be inconvenient if your code takes a long time to run.
So there's this module pdb, which is an interactive source debugger that allows you to step through all the different parts of your code and see what's happening at every given step. Variables are preserved at each breakpoint, and you can step through lines of code and see what variable is assigned at what place, to see what really is going on within your code. This is especially easy in IPython, because if your code crashes, it spits out a nice error message, and you can just type debug and it will put you into this ipdb environment, where you can step through and see what the code is doing wrong. So I think I might want to give a little demo of this. Let me just mirror this so I can see what I'm doing. Yeah, I was just going to put an error in the code so it will crash, and then try to step through and see where it crashed. But I realize this might not be the best one to do for this. Oh, really? I just raise exceptions. Right. So save this. So all this code is doing is, let's do print here, taking these two variables, trying to convert them into integers, and then printing them. So I got a ValueError here: invalid literal for int with base 10, hello. So all this is telling me is that it doesn't understand this variable. This int of a string is crashing; it can't do this action. So if I do debug, it will bring me into this ipdb environment, and I can see what the different values that are stored in my code are. So I can type just print string or print number string. Why is that not there? Oh, is that? Oh, yeah, maybe. So I can go through these different variables and see what they are. I can see that the number string is assigned the value 5, which Python will be able to handle and turn into an integer. But it can't turn this string hello into an integer. So we can see that's why it's crashing.
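Outside of IPython, a comparable post-mortem session can be started with the standard pdb module. The sketch below recreates the kind of crash shown in the demo; the function and variable names are guesses at the demo code rather than a copy of it, and the interactive part is switched off by default so the snippet runs straight through.

```python
import pdb
import sys


def print_stuff(number_string, word_string):
    number = int(number_string)   # "5" converts without trouble
    word = int(word_string)       # "hello" raises ValueError
    print(number, word)


def assign_and_print():
    print_stuff("5", "hello")


def run_with_post_mortem(func, interactive=False):
    """Call func; on a ValueError, optionally open the post-mortem debugger."""
    try:
        func()
    except ValueError as err:
        print("Caught:", err)
        if interactive:
            # Opens a (Pdb) prompt at the frame that raised. There,
            # `p word_string` prints a variable, `u` and `d` move up and
            # down the call stack, and `q` quits the debugger.
            pdb.post_mortem(sys.exc_info()[2])
        return err
    return None


error = run_with_post_mortem(assign_and_print)
```

Calling run_with_post_mortem(assign_and_print, interactive=True) from a terminal drops into the debugger at the failing line, which is roughly what typing debug after a crash does in IPython.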
I'll just go back to my code and get rid of this. Exit out of ipdb with just Ctrl-D, and then I can do this and it will run and print correctly. Yeah. Oh, yeah, how's that? Okay, that's probably, yes. Def, right. So, going back and making the code crappy again, assigning our variables here. So I do run untitled.py and, sorry, import untitled. Yeah, this will work. Untitled. Oh, right, I've got to call it in here. Right. There we go. Untitled. I've got to reload here, and now untitled.assign_and_print. There we go. Now debug. So yes, we are here within the print_stuff function. We can also go up and down through the different functions with u and d, to see where in the different lines of code these things are being assigned. Does that make sense? So especially if you have a really complex code that's calling tons of different functions, you can step up and down through all the different parts of the code and see where exactly a particular variable is being assigned. Since I stepped up here, I see that the string hello is being assigned within this assign_and_print function. Sorry, within this print_stuff function. No, it's being assigned in the assign_and_print function and then being printed in the print_stuff function. So I can go back and change the string to something else, reload it, and then print, and it'll work fine. So that just gives you a nice, easy way to go back and debug your code on the fly. Okay.
So now I was going to have a breakout session. Were you guys able to access the breakout zip? Okay. So there's this file breakout10.zip. Within this, once you unzip it, there are a couple of directories and also a file called breakout10.py, which is currently empty. What I want you to do here is not to modify the other files in the other folders, but you will need to use them.
So you want to use the modules and packages stuff that I talked about before to import things from these different folders in order to run breakout10.py. And then I want you to build up a command-line parser which allows the user to specify how many data points to generate. One of the folders has a data generator file, and one of the other folders is plotting, and you'll be able to define how you want to plot it. So basically I want this command-line parser to be able to specify how many data points you want to generate. Down here you see the flag -n 200, so that's saying I want to generate 200 data points. Then there's this flag -t, which will say whether I want it to plot a filled-in histogram or an outlined one. You should also be able to specify the title of the plot with this capital -T flag, and then have the plot be generated. Make sense? Okay. Have fun.
So I'm just going to go really quickly over what the answer should look like here. To start off, we had this breakout10 folder with our file breakout10.py and these two subfolders, data_gen and plotting. Get rid of these. So you see, in order to be able to import data_gen and plotting into my breakout10.py, I had to put this __init__.py into each of these folders. Then I built up breakout10.py, which imported these things. I'll go through this line by line in a sec, but I just want to demonstrate that this works, hopefully. So if I do python breakout10.py --help, I'll get a nice, friendly help message. Maybe. No. There we go. It says how to use the code and what all the different arguments are. So we see that I have this flag -t, which, if I set it, will make it true to use the unfilled histogram. If it's -f, then do not use the unfilled histogram.
-n with the variable data_n will store the number of data points to retrieve from our data generator function. And then the capital -T title will store the title for the plot. So if I do this command, python breakout10.py -t, this will be an unfilled histogram; -n 200, so I'll get 200 data points; and then -T my awesome title. You see I had to put this in quotes, because otherwise it would take it as three different arguments. Then I press enter and I should hopefully get a nice unfilled histogram of random data with the title my awesome title. Let's see if I add a -f at the end. See if Chris is right, that it'll take the last one. Yep. So even though I had both -t and -f, it'll take the last flag that you give it. Let's see if I give it an argument that I didn't specify it could take. It crashes and says unrecognized argument -g, and then gives me a nice little instruction again on how to actually use the code. Sure. You should do that for your homework, every time your code crashes. Yeah. Okay.
So, yes, here's the code that I wrote to achieve this. I placed these blank __init__.py files in the data_gen and plotting folders. Then I import the function ran_data from the data_gen.generate_data module, and import the outlined histogram from the plotting.hist_outline module. So now I have those tools at my disposal from those different folders, everything communicating nicely. I also import the regular histogram, import argparse, and import pylab for plotting purposes. I set up my parser and add the argument -t, which is a store_true action. This is a binary switch, setting unfilled histogram to True. With the -f flag, it'll set unfilled histogram to False. The -n is just a simple store action. You can set a default; here I have a default of 10 data points. It's sent to the destination data_n, and you see that I specify the type as integer.
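The parser portion of a solution along these lines might look like the sketch below. It is an approximation, not the file shown in class: the destination name unfilled_hist and the help strings are assumptions, the default title follows the one mentioned in the walkthrough, and the data-generation and plotting imports are left out so the snippet runs on its own.

```python
import argparse


def build_parser():
    """Build a parser with the flags used in the breakout demo."""
    parser = argparse.ArgumentParser(
        description="Generate random data and plot it as a histogram")
    # -t and -f write to the same destination, so whichever flag comes
    # last on the command line decides the final value.
    parser.add_argument("-t", action="store_true", dest="unfilled_hist",
                        default=False, help="use the unfilled histogram")
    parser.add_argument("-f", action="store_false", dest="unfilled_hist",
                        default=False, help="do not use the unfilled histogram")
    # -n is converted to an integer; 10 points are used when it is omitted.
    parser.add_argument("-n", action="store", dest="data_n", type=int,
                        default=10, help="number of data points to generate")
    # -T takes the plot title; quote it on the command line if it has spaces.
    parser.add_argument("-T", action="store", dest="title",
                        default="no title", help="title for the plot")
    return parser


# Equivalent to: python breakout10.py -t -n 200 -T "my awesome title"
results = build_parser().parse_args(["-t", "-n", "200", "-T", "my awesome title"])
print(results.unfilled_hist, results.data_n, results.title)

# Adding -f at the end overrides the earlier -t.
overridden = build_parser().parse_args(["-t", "-n", "200", "-f"])
print(overridden.unfilled_hist)
```

From here the script would pass results.data_n to the data generator and choose between the two histogram functions based on results.unfilled_hist.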
So it'll try to convert it into an integer. And then here's another store action, -T, with the default no title and the destination title. I take all the results from the parser into the results object. Then I get my data from the function ran_data, feeding to it results.data_n from the parser, which is just the number of data points I want. Then, if results.unfilled_histogram equals True, we do the unfilled histogram plot. Otherwise, we do the regular histogram plot. Make sense? Any questions? Yes? Yeah. So let's do that. Oops, I still have that bad one. Oh, sorry, thanks. Yep. So there are just 10 data points there. And also, since it's trying to convert it into a number, if I feed it something that it can't convert into an int, then it should just exit with a nice error message: invalid int value blah. Are you within the, yeah, I didn't change my path. I just did this all within the directory that breakout10.py was in. And you're still getting, okay, a lot happens. Let's do another one of these. Okay. Okay.
Now, on to Git. So, the goals of this talk. You guys, or many of you at least, may have seen Git introduced at the boot camp, and I'm sure some of you already use some kind of version control. But I think it'll be worthwhile to go through this again, especially for those who aren't actively using version control in their everyday lives. So if you're not using version control, I should hopefully convince you that all of you should start using it now, give a little cookbook method for how to use a very specific set of Git-centric tools to get started, and then point you in the right direction for more. So, going back to this video game analogy, let's say we want to play the Google 8-bit version. In very old-school video games, there were no save slots, and if you died, that was just game over. So that sucked. You had to start all over again. Then we eventually got to one save slot. But there were pitfalls with this.
Like, what if we wanted to go back to a previous save state? What if we wanted to revisit a special part of our game over and over again? Or go down a different path that we could take? Or worse, what if the game is unwinnable? What if we go down some dungeon that we can't escape from, and then we save our game, and then we can't go back and fix it? But then comes new technology. We get video games with multiple, frequent saves, which gives us security to avoid lost progress, and the ability to return to earlier states and choose to go down new paths. So this is kind of the mentality that we want. Does anyone recognize this game? Is that, is it? Could be. Okay. Is that Half-Life 2? Okay. That looks like my miracle. This is the game of life. So yes, it saves very often, and this is the mentality that we want. We want to save early and save often and have all of our previous save states available for us to go back to. So this is the idea behind version control. It allows us to manage all the changes that we make to programs, documents and other computer files. It gives us all these different checkpoints in the evolution of the source code. So if we have a particular place where we want to branch off and explore new routes with our code, we can do so while keeping all the old code the same. It's also a nice way to back up, store, restore and synchronize all of our code, to see what the different changes in the history of the code were, and to allow for collaboration on a single code base with multiple users. And finally, to maintain sanity. So here's a nice little plot illustrating this. We have this code quality per development hour, which I got from Peter Williams here, a grad student in our department. As time goes on, we want to be able to feel free to experiment without fear of breaking our existing code. Here's a very embarrassing slide of what I used to do for primitive version control.
I basically just saved a bunch of different instances of the same code over and over again, just calling it a different letter each time. This way I maintained the ability to go back to an old version of my code if I wanted to, but it quickly spiraled out of control as I made lots of different changes. This was in my undergrad, don't worry. This was before I saw the light. I also had some Fortran in there; it was even worse. So we want to avoid that. I mean, that's still better than nothing, but we want to be able to do this in a better way.
So right now I'm going to go through a bunch of terms that are commonly used in version control, specifically with Git. We have the repository, or repo, which is this sort of central database that stores all the versions of the files and keeps track of all their changes. So this repo contains all of the past versions of your files and notes all the different changes that were made between them. The server is the computer storing this repository. A client is the computer connecting to the repository. A working copy is a local directory of this repository where we make all the different changes to our files. And the word trunk means the primary location for all the code in the repository. Now, some different actions we can do. To add is to begin tracking a file by putting it into the repository; we do that in a special way. A revision is a particular version of a file, whether it's version one, two, et cetera, and the latest of these revisions is called the head. To check out is to download a file or files from the repository. So if we want to get a particular file or a particular bunch of files from a repository, we do something called a checkout. To check in, or to commit, is to upload the changed file to the repository after we've made changes to it, and to give a nice little commit message saying what we've changed.
We can do an update or sync, which will basically swap the different files and make sure that our current working directory is up to date and meshing nicely with the current repository. And there's also revert, where we say, screw it, I don't like anything that I've done. I'm going to revert back to the repository state and get rid of all the changes that I've made locally. Some more advanced things that we can do: one is to branch. This is kind of like going down a different path, say going down a different cave in your video game. We basically create a separate copy of a file or folder for private use, and then we can experiment with it and play around with it, and decide after the fact whether we want to keep developing or commit the changes. We can find the differences between two files using the diff command, to see what has changed between two revisions. To merge or patch is to apply the changes from one file to another, bringing it up to date. A conflict, obviously, is when changes contradict each other. This can sometimes happen when you're working with another person on the code and you both try to edit the same part of the code. If you then try to merge those things together, it will create a conflict, saying, I can't merge these, you've both tried to edit the same part. And if this happens, you'll have to resolve the conflict. So there are two main types of version control systems: centralized and distributed. Centralized version control systems are ones like SVN and, what else, CVS, I think, was centralized. And the distributed ones are sort of newer, like Git and Mercurial. With the centralized ones, all of the revisions exist on a single server. So we have this nice computer somewhere that contains all the different revisions that have ever been made to this code.
And then any user that wants to access this code has to check out a particular version, make changes to it, and then commit the changes back to the central server. So all that exists on the user's computer is just the current version of things. What this requires is communication with the central server at all times: if you want to be able to commit new changes or pull in changes that other people have made, you need to be able to reach the central server. With a distributed framework, you clone the entire repository. So you take everything and put it onto your local machine. On your local machine, you have the whole history of everything that has been done to that repository, and you basically have everything at your disposal locally. So you don't really need communication with a central server, since everything is local. You can, you know, make changes and commit them to your local repository while you're flying on a plane and have no internet access. And then later on, if you want to push to a central repository so other people can access your code, you can still do that, with these push and pull actions to push to and pull from a central repository somewhere. But it's not necessary. You can do everything locally if you want, which is really nice. These are just illustrations of this. I'm not really going to go through these. So, in my opinion at least, and I think in a lot of other people's opinions, distributed is the way to go. I used to use SVN, which was a centralized VCS. But I quickly got turned over to Git, because it has these advantages of a distributed version control system. With a centralized system, history modification is difficult. It's hard to go back and change history should you want to. And as I said before, you require network communication at all times if you want to do any actions.
And because of this network communication, everything is a lot slower. But with a distributed version control system like Git, everything can be local, and everything is a lot faster because of it. This is a list of a few different brands of version control systems: CVS back in the 90s, Subversion (SVN), which is what I used to use, and then a bunch of these new ones that came out around 2005. And the one I'm going to be talking about today is Git. Where did that name come from? Does anyone know? Pretty awesome. So, I'm going to, what's that? Getting started. Getting started? We'll go with that. Yeah, it is. Yeah, I think so. Do you know why he named it that, though? Okay. Why are we even having this conversation now? Oh, what? Okay. So, yeah, now we can just go through the different steps of using Git. You should hopefully all have it on your computers now, so you can kind of follow along with this as we go. Version control of your files is very easy to start up locally. All you need to do is initialize the repository, so you go into the particular folder that you want to start version control in, add some files to it, and then commit the changes. So, we can start off by going into a folder called MySoftware, which we now want to put under version control. We initialize the repository using the command git init, and that should print out a nice little message saying that the initialization was successful. If we then do ls -a in that directory, you should see that an entry called .git has been created, along with all the other files that you may already have in that directory. That is what keeps track of all the history. This is, I think, a folder holding all the history and modifications and such, and all of the past versions of the code that have been created under this repository. So, if you don't have any files in this folder yet, go ahead and add a few files in there so we can add them.
And then if you want to just add all of the current files in the MySoftware directory, we can do git add . (git add followed by a dot), which will add everything that's in that directory to... No, you need to specify that you want this particular file to start being version controlled. So, if you just have a file in there that you didn't do git add on, then it will just stay as a normal file. It won't be version controlled. Any changes you make to it won't be tracked. So, you need to add files as you go along. If you create a new file, you need to do git add on that file. And then you do git commit -m. The -m is to give the message of what you've changed. So, you do git commit -m, and here the message is just "initial import of all my files". And then it prints out a nice little message about all that. Did people get that to work? Are there any problems with that? Yeah? Yeah, no, no, sure. So, add is a type of change that is to be made. Say I do git add file.py. That means I want file.py to start being handled under version control. And so, that is a change that I've made. And then if I want to commit that change, I need to do git commit -m and then the string "added file.py". So the commit is what you do after every change that you make. Then if I were to go into file.py and add a few lines of code, I would commit again with git commit -a -m and the message "added some lines of code" (the -a picks up changes to files that are already tracked), and hopefully with a more descriptive message about what those lines of code do. But that's the idea. So, the action commit is to commit any changes that you've made to any files in that directory, or the addition of new files to that directory. Does that make sense? Yeah, so if you're sort of adding a file, you make a good file and you realize you want it to be a written message. Right. So, we usually present them as this line and this line.
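The init, add, commit sequence described above can be tried end to end in a throwaway folder. The following is only a sketch that drives the git command line from Python with subprocess; the helper names git and first_commit_demo, the file contents, and the placeholder user name and email are assumptions made for the example, and it quietly does nothing if git is not installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside `repo` and return what it printed."""
    command = [
        "git",
        # Throwaway identity so the commit works on a machine with no git config.
        "-c", "user.name=Example User",
        "-c", "user.email=user@example.com",
    ] + list(args)
    done = subprocess.run(command, cwd=str(repo), check=True,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    return done.stdout


def first_commit_demo():
    """Run init, add, and commit in a scratch folder; return the one-line log."""
    if shutil.which("git") is None:
        return None  # git is not installed, nothing to demonstrate
    with tempfile.TemporaryDirectory() as scratch:
        repo = Path(scratch) / "MySoftware"
        repo.mkdir()
        (repo / "file.py").write_text("print('hello')\n")

        git(repo, "init")        # creates the hidden .git folder
        git(repo, "add", ".")    # start tracking everything in the folder
        git(repo, "commit", "-m", "initial import of all my files")

        # Editing a tracked file does not stage it; -a picks the change up.
        (repo / "file.py").write_text("print('hello')\nprint('again')\n")
        git(repo, "commit", "-a", "-m", "added some lines of code")

        return git(repo, "log", "--oneline")


log_text = first_commit_demo()
if log_text is not None:
    print(log_text)
```

The log should list the two commit messages, newest first, each preceded by an abbreviated commit hash.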
So, the basic workflow is, I mean, it's just very nice and easy. You modify your code and documents. You commit the changes, and then you repeat. That's really all there is to it for a very basic Git workflow. You can. If you... Yeah. So, you can. Let's say you've made changes to file a and file b and you only want to commit the changes to file b. Then you do git commit b.py, say it was b.py, and then -m and the text message of what you've changed in file b. That will leave all the changes that you've made to file a uncommitted, for you to commit at some later date. Oh, yeah. I'm going to go into graphical user interfaces at some point. I've not used git gui, but... Yeah, I'll give an illustration. Unless you really like using command lines, all of this can be done a lot more easily with a GUI. But I think it's useful to go through the basics on the command line at the beginning. So, yeah. We can go through some of the different actions that we can do here, starting off with the add action. So, I'm creating this file README on the command line: echo "this is the readme", redirected into the README file. And now I want this README file to start being version controlled. So, I do git add README. This is saying, instead of adding everything that has changed, I specify that I want to add just this particular file. And then git commit -m "added a readme file". And now it has been committed. I can also remove a file from... Do you have a question? Yeah. Yeah. If you change the README again, do you need to add it again? No. No. Then you just do git commit README -m. Okay. So, I said git commit -m "change readme again", but I didn't say git commit README, and it's not going to commit. I think you'd... Yeah, I think you'd need to specify what file you want to commit. Or use -a. Or use -a. Yeah. Okay. Yeah. Because if you use -a, then it will commit all the changes. Yeah.
So, the opposite of this: if I wanted to stop a file from being version controlled and delete it, I would remove it with git rm uselessfile.txt, or do a recursive remove of a directory with git rm -r. Kind of annoying, but if you want to rename a file, then you need to sort of do a... It sort of does a delete of the file and then adds it again. But you can shorten this with just the git mv command. I think the move is exactly the same. I'm not 100% sure, but I think it is actually the same as just adding and deleting in terms of how it logs it or whatever. So, all of these commits that you have made are identifiable by these things called SHA-1 hashes. These are long strings of numbers and letters that identify each particular commit. If you do the command git log, you will get all of the recent changes that have been committed to the repository, along with a message about each of them. So, you get the commit hash, the author who did the committing, the date, and the comment that they associated with it. We can also see what the current status of our repository is by doing git status. This will let us know if any changes to particular files have been made, or if files have been added but not yet committed. So, this git status command will say, you know, you have made changes to file X and file Y, you have added file Z, but they have not been committed yet. That lets you know what files have yet to be committed, so you can go and do that. There's also git diff, which is a way on the command line to see the differences between your current working copy of a file and the version of the file that you got from the repository. So, you can see what changes you have made. Here we see that after I do git diff on this file, mystuff.py, I have removed this line and added these lines.
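The kind of output git diff shows, with removed lines marked by a minus and added lines by a plus, can be reproduced in miniature with Python's standard difflib module. This is only an illustration of the unified diff format, not Git's own implementation, and the two versions of mystuff.py below are invented for the example.

```python
import difflib

# Hypothetical contents: the committed version and the edited working copy.
committed = [
    "values = range(10)\n",
    "total = sum(values)\n",
    "print(total)\n",
]
working = [
    "values = range(10)\n",
    "total = sum(values)\n",
    "mean = total / len(values)\n",
    "print(total, mean)\n",
]

diff_lines = list(difflib.unified_diff(
    committed, working,
    fromfile="a/mystuff.py", tofile="b/mystuff.py",
))
print("".join(diff_lines))

# Lines starting with a single "-" were removed; a single "+" means added.
removed = [line for line in diff_lines
           if line.startswith("-") and not line.startswith("---")]
added = [line for line in diff_lines
         if line.startswith("+") and not line.startswith("+++")]
```

Here one line is reported as removed and two as added, with the unchanged lines shown as context around them.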
However, as I said, unless you really like working from the command line, there are GUI tools which make this process a lot easier. The one I have used and had a lot of success with is SmartGit, but I'm sure there are many others that people might be able to recommend to you. I gave a link to a bunch of different GUIs that you could download, but I found SmartGit to be pretty good. I think it works on Mac, Linux, and Windows, so it's pretty flexible in terms of that. So, I was going to go through a little demo now of how to use SmartGit for doing the things that we were talking about. Yes, SmartGit runs on Mac. And on everything, yeah. Did that work? Okay, anyway. So, here you see my base directory is this Q repo directory, and I have within it a bunch of different folders. You can also put papers and stuff under version control. But here, let's go through some of my software directories. As you can see, I treat all these different directories as modules. I've got the __init__.py in there so I can easily share my code back and forth between them. Within SmartGit, I can pull up a log to see what changes have been made to a particular file and when. So, here I have the commit messages for each of the commits that I made to this particular file. And you can also scroll through, and they have a little differencing view here that shows what changes have been made between two versions. So, it looks here like I just modified a couple of keywords and added a few lines of code there. Now let's say I open up this file, make some changes to it, save it, and close it. SmartGit will notice that this file has changed. It will show up with a little orange icon here, saying that this file has changed and needs to be committed. So, if I want to see what I changed, if I am not quick with my commit, I can go ahead and look at the changes and see what I actually changed here.
This pulls up a little GUI showing the differences between the file that I got from the repository and my local copy. You can see it just shows here that I added a nice comment. So, that's fine. Now I want to commit this file. So, I just click commit and then hit commit. And there we go, everything is committed and we see that my... Sorry, where's that? Oh, that just... What is the index state? I don't know. Did that show up as changed when I changed gridplot? That might be... Is that if the repository version has changed? No? I don't know. I'm not sure what the index state is. Right, there's a file that you can specify within your Git repository, called ignore. Right. So, there's this .gitignore file, and if you put it in your main Git repository, it says, I don't ever want to worry about any of these types of files. So, I never want to be able to commit a .pyc file, because that's just, you know, compiled whatever, or bibliography files, or build files. I also want to ignore some of these image files that constantly get created. This just allows me to say that these files are unimportant to me and I don't want them to be version controlled. So, if you put this .gitignore file in the main directory where the version control is happening, then you can ignore all those different types of files. Is there a good way to carry around data files in Git? I have files that I want to carry along with the repository, but that I definitely never want to be version controlled. Can you add a file to Git in a way that doesn't version control it? Yeah. I want to be able to carry them all around, but definitely never version control them. Yeah, so you just don't want to have all the different copies of the file? Well, just don't commit any changes.
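To illustrate what the ignore file described above accomplishes, here is a small Python sketch that filters a list of filenames against wildcard patterns using the standard fnmatch module. It is a simplification: real .gitignore files support more syntax (directory patterns, negation with an exclamation mark, and so on), and both the pattern list and the filenames are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns of the kind you might list in a .gitignore file.
ignore_patterns = ["*.pyc", "*.aux", "*.log", "*.png"]


def is_ignored(filename, patterns):
    """True if the filename matches any of the wildcard patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


files_in_folder = [
    "gridplot.py",
    "gridplot.pyc",
    "paper.tex",
    "paper.aux",
    "figure1.png",
]

# Only these files would be offered for adding and committing.
worth_tracking = [name for name in files_in_folder
                  if not is_ignored(name, ignore_patterns)]
print(worth_tracking)
```

With these patterns, only the source file and the paper's .tex file remain as candidates for version control.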
The way you might do it right now... Just by the nature of what you're asking for, essentially you want a hybrid: a centralized repository that's serving the big data, and a distributed system for all the code and all the stuff around it. Which is not ridiculous; we might actually go in that direction. But likely what you want to do is have a file that says, execute this if you want the code and then the data, and it actually goes to some central repository to pull in the data the first time. I don't think it's all that different. There's something where I have my data on my computer, I just give somebody else a bundle of my repository, and I give them my data directly. I was just having fun doing it. Well, right now I have a kind of I like all for one. No, no, it's not a solved problem. But something like what you're doing is probably the best way to do it. You particularly get into problems with that when doing shared papers and stuff, because those images can get out of control too. Like when you make a lot of changes to, you know, image files or EPS files or something like that. You're constantly making changes to them, but you don't want every version of that file to be saved, because then you just get huge repositories. I don't know. Get a light? Is that? Yeah, I use get a light as well, but I don't know if you can specify what files, or just what folders. Yeah, right. Alright, let's just have one more of these. Okay. Just like on YouTube.com you can select your playback resume. And yes, full screen mode is available if desired. YouTube collection doesn't require the internet. You don't lose interactivity. Just fill out the comment form and place it in one of the self-addressed standard topics. Or you can throw a red thumbs up or a red thumbs down. Your feedback will be sent through REC. So you can maintain the dialogue that you use.
YouTube collection isn't just a standard but it's all of you. Every YouTube video uploaded ever. As soon as you sign up, we'll dispatch a fleet of 175 YouTube trucks as well. The necessary video models will arise through the REC. Steve Gray, who are in an area with low overpasses will be delivered about half an hour. Since approximately an hour of video is uploaded to YouTube every second, you'll receive a new truckload every week. And if you sign up now, we'll do the entire duty of the library to a friend or relative of your choice, totally free. Visit YouTube.com for a new YouTube collection. Order now and select from the number of free adults you think are trending videos on laziness. Or, weatherproof exterior needy shopping for when your collection gets too large for being inside of your home. Order the YouTube collection on a simple, economical, convenient way. A whole new way you enjoy your years of love. Amazing. It reminds me of Gmail Paper a couple of years ago. Was that last year? I don't know. Yeah, they had a lot this year. I think there were several others, like Google and Google. Yeah, should be. Self-driving cars are real. Okay. Now on to collaboration. So Git, or any version control system, makes it really easy to work on a single code base or document with multiple people contributing. Or similarly, as is usually the case for me, one person developing the same code on multiple computers. So I have a bunch of code that I want to run on different computers, or be able to edit from my laptop, or from a work desktop, or from my home desktop, or whatever. It also backs up my code as a side effect: every time I commit and push to a central repository somewhere, all that code is saved in case my laptop breaks. But we also just want to avoid emailing different versions of a paper or of a code base or something like that back and forth to each other.
So this way allows us to easily see what has changed and when, and to have all the merging of the different changes done behind the scenes. And if any conflicts arise, which hopefully they won't unless you're both working on the same line of code or the same paragraph of a paper or something like that, then we can deal with those merge conflicts as they come up. In a distributed version control system like Git, cloning is the standard operation to get files. As I mentioned before, you clone the entire repository onto your local computer, into a particular folder on your local disk. And as I said, cloning is useful for backups even when you're not collaborating. In order to clone a repository from somewhere else, you do this git clone action. So you do git clone, then the address of the other computer, and then the path to the files containing the repository that you want to clone. From then on, you can pull the state of the files from this other computer to the one you're working on now. So if someone makes a change on the other computer and commits it, you can pull those changes to your local computer to get the latest updated version of that code. So on the computer that you're working on now, you commit your changes, then do git pull from the other computer into your current working version. Now, these projects have to be hosted somewhere in order for you to be able to communicate with them from anywhere. Because Git is a distributed system with everything being local, in order to get any changes that you have made back to a place where other people can access them, you use the push command, so that you can push all of your changes to a repository somewhere that people have access to, and they can then pull your changes.
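The clone, commit, pull, push cycle described above can be simulated on a single machine. The sketch below, which again drives the git command line from Python, uses a local repository with no working copy (created with git init --bare) as a stand-in for the shared server, and two clones named laptop and desktop as the two collaborators. All folder names, file contents, and the placeholder identity are assumptions for the example, and the demo is skipped if git is not installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(cwd, *args):
    """Run one git command in the folder `cwd` and return what it printed."""
    command = [
        "git",
        "-c", "user.name=Example User",        # throwaway identity
        "-c", "user.email=user@example.com",
        "-c", "pull.rebase=false",             # pulls merge rather than rebase
    ] + list(args)
    done = subprocess.run(command, cwd=str(cwd), check=True,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    return done.stdout


def collaboration_demo():
    """Two clones share work through one central repository."""
    if shutil.which("git") is None:
        return None  # git is not installed
    with tempfile.TemporaryDirectory() as scratch:
        root = Path(scratch)
        # Stand-in for the shared server: a repository with no working copy.
        git(root, "init", "--bare", "central.git")

        # The "laptop": clone, add a file, commit, push.
        git(root, "clone", "central.git", "laptop")
        laptop = root / "laptop"
        (laptop / "gridplot.py").write_text("print('plot')\n")
        git(laptop, "add", "gridplot.py")
        git(laptop, "commit", "-m", "add gridplot script")
        git(laptop, "push", "origin", "HEAD")

        # The "desktop": clone, edit, commit, pull (just in case), push.
        git(root, "clone", "central.git", "desktop")
        desktop = root / "desktop"
        with (desktop / "gridplot.py").open("a") as handle:
            handle.write("# added a nice comment\n")
        git(desktop, "commit", "-a", "-m", "add a comment")
        git(desktop, "pull", "origin", "HEAD")
        git(desktop, "push", "origin", "HEAD")

        # Back on the laptop: pull to receive the desktop's commit.
        git(laptop, "pull", "origin", "HEAD")
        return git(laptop, "log", "--format=%s")


subjects = collaboration_demo()
if subjects is not None:
    print(subjects)
```

After the final pull, the laptop's log should contain both commit messages, which is the one-person-on-two-computers situation from the lecture in miniature. With a real server, the local path central.git would be replaced by an SSH or hosting-site address.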
And in order to avoid confusion, you only want to push to something called a bare repository. A bare repository is just something that has no working copy; it just contains all of the changes that have been committed to Git. So it's not a place on someone's computer where they're actively making changes. Its sole purpose is to act as a folder containing the latest updated version of the code. You can set up one of these bare repositories on a central server that maybe everyone has SSH access to, so that they can push to and pull from the central server. There are also many free hosting sites if your code is public or open source, and one of the most popular of these is called GitHub, if you guys have heard of that. As part of the homework, we'll have you create a GitHub repository so you can play around with all this. It is a Git repository, but it has no working files, so it doesn't have a working copy of the latest changes. No, there's a special way to create a bare repository. You can just... I can't remember, but that sounds right. But yeah, basically you just don't want to make any changes to the files in the bare repository. You just want to leave it be, and the way you communicate with this bare repository is by pushing and pulling. So once you get this bare repository created and hosted on a separate server, you can clone the repository to your local disk. The first action you always want to do is pull, just in case someone has made changes to the repository, and then push any new changes that you have committed in your local repository to the server. So here I'm creating the folder my clone, which is where I want my repository to be. I do git clone and then the path to the repository. It can be git@github.com (which no longer exists, sadly) and then the path to the Git repository, and it will clone all of that information onto your local disk. Then you can make changes to the files in this repository, you know, change a file, then do
git commit -m "long descriptive commit message". Once again, you pull to get the latest version of the code, just in case someone has updated the code while you were working on yours. This is just a safety step to prevent merge conflicts. You do git pull. If there are any merge conflicts, you resolve them as necessary; otherwise you can just do git push to push the changes that you have made locally to the central repository so that others can access them. So conflicts sometimes arise. Generally, Git will happily deal with multiple people working on the same file. Git should be able to merge the changes automatically if, for instance, Alice was working on line 10 of the code and Bob was working on line 50 of the code, and they both made changes to those lines, both committed those changes, and both pushed to the repository. Then Git should be able to merge them correctly, as long as they are careful about the order in which they do these operations. But if the same line is edited by multiple people, human intervention is required to resolve the conflict. Say Alice and Bob both tried to edit the same line: one said "person one is the best person", the other said "person two is the best person". Git will create this file which has both of the options in it, and then you just need to fix the merge conflict and commit the corrected version. So once again, the overview of this workflow is: pull the latest version from the central repository, modify the code and documents in your local repository, commit your changes as you make them, pull again to resolve any merge conflicts that might have occurred while you were working on your code, then push your changes to the repository, and repeat. So once again, I'll give a quick demo of how to do this sort of workflow with SmartGit. I'm going to SSH into my home computer. Ignore all the errors that it tells me, it'll be fine. Okay. So I cd into the place where my repository is,
called Q repo, and now I want to do git pull to see if any changes have been made. So we can pretend that I am another person on my home computer here. Excuse me, I didn't actually push any new changes. So remember, before, I made changes to my repository in the gridplot code. Now I can push these to the central repository so that others can access them. And now if I do git pull on my other computer, it notes that there has been a change to gridplot.py: one file changed, one insertion, zero deletions. I can do git log to see what happened. Here we see I added a nice comment. Very nice. So let me make a change on this computer. Added another comment. I never use that, whatever. So now I can do git commit gridplot.py -m. There we go, that is committed. Then I can do git pull, just in case someone has made changes while I was working on that one file, which nobody should have. Then I do git push. And now, back on my laptop, I can go through here and do a pull and let it crunch through. Okay. Now I can take a look at the log and see that I actually did add another comment on another computer. So if this was someone else... Oh, actually, you can see that I have two different author names here. "Adam and Morgan" is my laptop, "Adam Morgan" is my desktop at home. So this is an example of how you can have two different people, or one person on two different computers, collaborating on the same code. Right, any questions about that? So there are a few more complex things that we can get into. I did not have this formatted correctly. These are different things that you can do with Git that I'm not going to go into in too much detail, but I just wanted to let you know that these options are out there. If you want to go back and undo some changes that you've made to a particular file, say you've committed something and you screwed up, or you just want to ignore any changes that you've made after a particular commit, you can do that with something called git reset --hard. You specify which commit you're going back to with the
last few digits, or sorry, the first few digits of that SHA-1 hash that we mentioned before, which each of these commits is identified by. So this reset --hard will load an old commit and delete all commits newer than the one just loaded. This is kind of a dangerous thing to do unless you really know that that's what you want. If you want to check out an old version of the code and be able to play with it and see how it ran, you can do git checkout and then specify the first few digits of the hash. This will load an old commit, but any new edits will be applied on top of that old commit and will go down a different branch, so the other edits that you made after this old commit will still be accessible. And there's the idea of a revert: say you make a lot of changes to your local copy but then realize you don't want to do anything with them. You can just revert, and it will go back to the latest version from the repository. (In Git itself, throwing away uncommitted changes is done with git checkout on the file or with git reset --hard; the command named git revert instead makes a new commit that undoes an earlier one.) So say you made a bunch of changes, or maybe you made a bunch of changes when you were drunk, and you don't want to commit them and let people know that you were drunk when you made those changes. Just revert back to the old repository. You can get them from the log. So if you do git log and then scroll down to where you had nice commit messages, you can look at the commit messages, or you can do a diff. Or if you go back to SmartGit and go through the log, it will show all the different commit messages, and if you double click on one of those, it will pull up the state of the file as it was at that time. So you can see what the state of the file was, and you can go back to that. With SmartGit, again, it's much easier, because you can just say, I want to check out this file, click, and it will check out that file. But this is what you do if you want to do it from the command line. There's also branching, if you want to
take things down a new path but leave the current version of the code as it is. Rather than modifying the code and committing to the same code base over and over again, you can make what's called a branch and just go down a different path with your code. Say you want to test out a new algorithm. You can switch to a new branch and do development on this new algorithm without affecting the other branch of the code, the main one that people are working on. And then after the fact, if you decide that you want to merge that branch back into the main part of the code, you can do so with this merge. I don't really do any of this very often, but it's out there. You give a branch a particular name. The master branch is the main branch for that code base, but you can make a new one for testing purposes or something and start playing around with that. So there's lots more that you can do with Git as well. You can get help on all the commands. You can browse through all your history. You can change history if you want: if you want to change a commit message that you've made before, you can do that with something called rebase. You can also work with other version control system tools. I used to use Subversion, so all of my code base was version controlled under Subversion, but Git allows you to take that code base and import it all into the Git framework. That was really nice, because I wanted to start using Git, but I had already made all these nice commits and had all the history and stuff, and it allowed me to take all that from the old code base. There are advanced collaboration features, which I know nothing about, and other things, like you can tag a version of your code with a particular version number, and there are also some shell script tools that you can use. So in conclusion: version control is awesome, and you all should do it. Git is a free, actively developed, distributed version control system, which I highly recommend. Graphical user interface tools like
SmartGit make it even easier to use. There are online repositories like GitHub which allow for free backup of your code base and easy collaboration, although it's only free if it's open source. So if there's any code, or say papers or something like that, that you don't want to be publicly accessible, then you probably don't want to use the free version of GitHub. But there are paid options available, so you can store things on the internet, or you can just set up a server that everyone who's collaborating has access to, and then do everything through SSH. And remember the mantra: pull, modify, commit, pull, push, repeat. Here are some links to obtain all the stuff that I mentioned before. There's the Git Magic book, which I think is a great online resource introducing everything if you're new to Git. Cool. So, any questions about Git? Okay, so I'll just go into the homework now. There are three homework problems here that I've set up. The first one is to go to one of these free online hosting services, such as GitHub, and begin Git version control of your homework for this class. You can create one folder for each of your homeworks, so homework 1, homework 2, homework 3, et cetera, and you can add all the previous homeworks as well. As you make commits on your local clone, on your laptop or on your desktop, you can push all the changes to the online repository. So from now on, or at least for this week, you can turn in your homework by sending an email to us with instructions on how to clone this repository. So create this repository on GitHub, put all your homework files into it for this week and for previous weeks, and allow us to clone the repository from GitHub to our local computers. Homework number two is to begin version control on one of your own coding projects, separate from this class, just so you can start getting in the habit of doing this for your own work. It can be research code, or a journal article, a project, a thesis, or everything that you do, which is the ideal
solution. You don't need to send this code to us; just send us a copy of the Git log so we know that you're actually doing it. All we'll be looking for here is that you're making a number of commits and are actually using this for development, with the goal of continuing to use Git, or some other version control, in your everyday life. You can tell them about GitX: when you see the one line, you don't want it to say "I changed that"; you want to be as descriptive as possible. If you're being really good about commenting, you'll have a space after that and then you'll write a longer description of what it is. But when we say descriptive commit messages, we'll be happy if you just make it a bit more than "I just changed this" and say how it affects the functionality. And then finally, back to the argparse and module stuff. We're going to be reusing some of the old code that you guys developed. Was this the last homework, or two homeworks ago? Three homeworks ago; I think it was homework five, the database homework. Chris has uploaded the latest prediction data from Intrade onto B space, so you can download this prediction data into the homework folder for this week. So this is homework seven or eight? Okay, so this is homework eight. So download the latest prediction data into this homework folder and, without moving the base code from your homework five folder, so leaving all of the modules and stuff that you put in that folder the same, load the necessary functions from that code into a new module in this week's homework folder. Then set up a parser with argparse allowing the user to retrieve information from the database in a user-friendly way. So keep all your homework five code in a separate folder, set it up so that you can import functions and modules from that folder to do the various database stuff that we want, and then set up argparse so that we can call, say, a script called election_predictions.py, and we can call this from the command
line, where -c Obama -date March 28th would print out Obama's closing value from the prediction data on March 28th. Also include an option, -p or --plot, which will show a plot of all the predicted values for the candidate over time, with the value at the specified date highlighted on that plot. And just include all the necessary checks to make sure that the user's path is set correctly to import the code, so that when we try to run this on our computers everything will work correctly. Is that clear? Okay, great.
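
As a rough illustration of the command line interface described for the third homework, here is a minimal sketch, not a full solution. The long option names (--candidate, --date, --code-dir), the default folder "../hw5", and the helper names add_code_dir and build_parser are all assumptions made up for this example; the database lookup and the plotting are left out entirely.

```python
import argparse
import os
import sys


def add_code_dir(path):
    """Put `path` on sys.path if it is an existing folder.

    Returns True on success and False if the folder does not exist,
    so the caller can print a helpful message instead of crashing
    on a later import. (Hypothetical helper for this sketch.)
    """
    path = os.path.abspath(path)
    if not os.path.isdir(path):
        return False
    if path not in sys.path:
        sys.path.insert(0, path)
    return True


def build_parser():
    """Build the parser; option names here are illustrative assumptions."""
    parser = argparse.ArgumentParser(
        description="Look up a candidate's prediction value on a given date.")
    parser.add_argument("-c", "--candidate", required=True,
                        help="candidate name, e.g. Obama")
    parser.add_argument("-d", "--date", required=True,
                        help="date to look up, e.g. 'March 28'")
    parser.add_argument("-p", "--plot", action="store_true",
                        help="also plot the predicted values over time")
    parser.add_argument("--code-dir", default="../hw5",
                        help="folder holding the earlier homework code")
    return parser


# Parse an explicit argument list so the sketch runs without a real
# command line; a real script would call parse_args() with no arguments.
args = build_parser().parse_args(["-c", "Obama", "-d", "March 28", "-p"])
print("candidate={0} date={1} plot={2}".format(
    args.candidate, args.date, args.plot))

# Check the import path before trying to load the older modules.
if not add_code_dir(args.code_dir):
    print("Cannot find code folder: " + args.code_dir)
```

In a real script the final check would probably exit with an error rather than just print, and the functions imported from the older folder would then do the lookup and the plot.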