Welcome back to the Riffomonas Reproducible Research Tutorial series. Today's tutorial focuses on how we can organize our projects to make files easier to find and to improve the reproducibility of our analyses. Along the way, we'll practice using Amazon's EC2 service, we'll develop our command line skills, and we'll use Markdown to improve our documentation. As a foretaste of the next tutorial, we'll also begin to use version control. I think many of us have a habit of organizing our projects across multiple directories, sometimes strewn across a hard drive or even different computers. Alternatively, there's the temptation to have a single directory where you dump all your raw data, your processed data, your code, your figures, your images, and your text — in short, no organization at all. As you can appreciate, this makes it really hard to find anything or to know where you are in your data analysis workflow. But hopefully by now you've read William Noble's "A Quick Guide to Organizing Computational Biology Projects." If you haven't, stop the video now and go read it — it's really important. You can find a link to it on the Reproducible Research homepage within the Riffomonas.org website. We'll be adapting the structure that Noble describes in that paper for an analysis that we'll work on for the rest of the tutorial series, and many of the ideas he describes will be the motivation for future tutorials in the series. This tutorial has a lot going on, so feel free to take it in chunks. You'll notice that some of the tools we've already discussed, things like Markdown and AWS, are prominent here, so we'll get extra practice. You'll also notice that other tools, things like Git and scripting, are introduced but perhaps not dealt with in great depth.
You'll become a Git expert in the next tutorial and a scripting expert in the tutorial after that. By seeing the material multiple times, layering on new information each time, and seeing the tools in different contexts, you'll learn the material that much better. So if you feel a bit lost, that's fine — stick with it; it's part of the program, and it's there to help you learn better. Join me now in opening the slides for today's tutorial, which you can find within the Reproducible Research Tutorial series at the Riffomonas.org website. Before we get going on today's tutorial on project organization, I'd like you all to do a little exercise with me that will be pretty critical for subsequent steps within this tutorial: can you restart and log in to the EC2 instance that we created in the fourth tutorial, where we discussed using high-performance computers? Go ahead and pause this video, and when we come back, I'll show you how I go about logging in to the instance. Okay, so did you get it? Well, let me show you what I would do. I'm going to open up a new tab here in my browser, and the web address is aws.amazon.com. I'm going to sign in to my console with my email address and my password. It has EC2 in my recently visited services — again, you could type EC2 in here, but I'll go ahead and click EC2 — and I'll then click on the link for running instances. It says I currently have zero running instances, because we stopped the instance before. You'll see my instance here, and it says the instance state is "stopped," so I want to go Actions, Instance State, Start. Are you sure? Yes. This might take a couple of seconds to fire up, and what we're looking for is the public DNS address down here, the public IP address. If we're a little antsy, we can hit the refresh button and, voila, there it is. So now I want to open up my terminal.
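Before switching over, here is the shape of the login sequence we're about to type in the terminal — a sketch only: the key file name and the host address below are placeholders, not values to copy literally.

```shell
# connect to the EC2 instance over ssh
# (key path and host are placeholders — use your own key pair and
#  the public DNS address shown in the EC2 console)
ssh -i ~/.ssh/mykeypair.pem ubuntu@ec2-XX-XX-XX-XX.compute-1.amazonaws.com

# once connected, start tmux so long-running work survives a dropped connection
tmux
```

The `-i` flag points ssh at the private key that matches the key pair registered with the instance; `ubuntu` is the default username on an Ubuntu AMI.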
Again, if you're using Windows, you're probably going to be using the Git Bash terminal window. Here, remember, I can type `ssh -i ~/.ssh/mykeypair.pem ubuntu@` and then copy and paste the public DNS address over. Hopefully this works. Are you sure? Yes. And, aha, we're connected. It says there are six packages that can be updated; I'm not going to worry about those for now. If I type `ls`, I see that 300.jpeg file that's there, and if you were doing some other exercises to practice moving things around, you might see other things there as well. I'm going to go ahead and delete that file with `rm 300.jpeg`. Okay, great — this is how we go about logging in to our EC2 instance. Now, do you remember what that other command was, the one we could use if we're afraid our network connectivity is going to break while we're running something? Do you remember what it was called? Right — it's called tmux. Typing `tmux` brings up a tmux session, and we know it's a tmux session because of the green bar at the bottom of the screen. Great, so we're going to leave this here. I'm going to leave the instance running — on a per-minute or per-hour basis it's not that expensive, and it's probably more expensive, in terms of our time, to log out and log in over and over again. Just leave it open. Okay, so we'll go back to our slides now. Hopefully this exercise was something that you remembered; perhaps you wrote down a note to yourself on how to do it. Again, by doing it numerous times over the course of this tutorial series, it'll come to feel like second nature. So, the goals of today's tutorial are to evaluate different organizational strategies for their ability to foster reproducibility.
We're going to use bash commands to take a fairly chaotic project and give it a more logical structure. We're going to use a project template that will help us organize our project and hopefully improve its reproducibility. And we're going to use what we talked about in the documentation tutorial, as well as this tutorial, to start a new project — it's one thing to retrofit an old project, but it's much easier to start fresh on a new project that incorporates these ideas. Then, finally, we're going to demonstrate how we can use Git and GitHub to make our new project public. As I said in the introduction, before we go any further, it's really important that you read this article. William Noble did a great job of laying out a lot of really solid principles and recommendations for how we can organize our projects. And it's not only about organization — it also talks about things like documentation and automation, things we'll come back to in future tutorials. So if you've gotten this far and you haven't been listening to me: stop, go back, and read this paper. So what did William Noble have to say? As I said, the paper discusses far more than organization. He talks about automation. He talks about how project organization is itself a form of documentation, one that helps make written and scripted documentation easier to maintain. If you know where things are, it's easier to maintain your project. If you've got a directory that holds your reference files, it's very easy to see which reference files you're using; whereas if all those reference files are scattered into a big garbage can of data in a single directory, it's very easy to get lost and not know which files you're using. And, as I said, this paper is going to help provide an outline for the rest of these tutorials, touching on things like organization, literate programming, scripting, automation, and version control.
So we'll come back to this paper frequently. For this tutorial, I want to look at this figure from the Noble article. We're not going to use this exact structure, but it really helps us think about how we might want to organize our own projects. As he says, the core guiding principle is simple: someone unfamiliar with your project should be able to look at your computer files and understand in detail what you did and why — and most commonly, that someone is you. So if you wanted to find the worm data for this project, it's pretty clear from this structure where it is. You might imagine collecting data on different days, so there might be other directories within that data directory, and for each day there might be data on yeast, worm, or other model organisms. Again, we're not going to use this exact structure, but it's helpful for thinking about the principles we talked about in the previous tutorial: having a directory for our documentation, for our manuscripts, for our data — both raw and processed — a directory for our source code, and perhaps a directory for our results. Some of the big bullet points in the Noble article were: record every operation that you perform; comment generously; avoid editing intermediate files by hand (I would say do not edit intermediate files by hand — we should be using scripts for that); keep raw data raw; develop a driver script to centralize all processing and analysis; use relative paths, relative to the project root; and make the script restartable. We've already talked about commenting generously, and we've talked about keeping raw data raw. Here we're going to take on one or two of these other bullet points, and in subsequent tutorials we'll take on the rest. So let's compare two projects that we've already looked at.
Let's think about the file organization. We're going to look at the example from Meadow et al. in Microbiome, and the article from Sarah Westcott and myself that was published in mSphere. As we look at these, we want to ask ourselves: what do we notice about the structure of the two projects? How long does it take us to find the code for figure 1? Where is the main document, and where would the data go? I'm going to open these two repositories in separate windows. Here's the GitHub repository for Meadow et al., and here's the repository for Westcott and Schloss. So what do you notice about the structure? Well, Meadow et al. have a directory for figures. They also have something like a cache/html directory for the surfaces analysis — I'm not really sure what that's about. The figures directory, I assume, holds the figures, and indeed here are some of the figures that were in the Meadow et al. paper. That's nice, that's good. But everything else is kind of dumped into a single directory: code, text files, data, all in one place. Now, I don't want to bust their chops too badly — this is a pretty small project; there are maybe ten files here, and you don't need a lot of directory structure for a small number of files. If you look at the Westcott and Schloss repository, you'll see there's a directory for code, for data, results, and submission, plus a LICENSE, a Makefile, and a README file. You could then also look at the overview diagram in the README, as we talked about last time, to figure out where different things are. So Westcott and Schloss is more complicated — it's a bigger project with more datasets — and it needs that structure, whereas Meadow et al. perhaps doesn't so much. So, if we want to find the code for figure 1 in Meadow et al. — I think we did this yesterday — we might say, well, let's look in functions.R.
If we scroll through here, nothing really jumps out at me as being figure 1. Maybe it would help for us to know what figure 1 is. If we look in the figures directory and assume this is figure 1: it's an ordination where they've got different colors for different types of samples, and they also have different sized points — I forget from the paper whether or not the size of the points matters. Let's look at another file — this surfaces .Rmd file. Here is the R Markdown file from Meadow et al. for their surfaces analysis, and if we scroll through, it looks promising that this will have the information we need to generate figure 1. So let's see — keep scrolling — we're looking for something that looks like it might be making or plotting an ordination. "Create an ordination of these sources combined with surface samples": there's a PCA command to build the PCA and to plot the results. We're not going to go into what each line of code does, but needless to say, here is the code that I suspect generates figure 1. It took a little bit of digging, but we found it. One of the nice things about R Markdown is that it allows you to embed code in with text. If we look now at the Westcott and Schloss repository, let's see if we can find figure 1 there. If we go into code, there are a bunch of files, a lot of scripts. Do we find anything here for figure 1? Well, there are scripts to build various figures — I suspect it's one of these five, but I'm not sure which. So let's go back out. I remember from the documentation earlier that if we wanted to build the paper, we'd run `make write.paper`, and I know that make means it uses the Makefile. I'll talk about Makefiles in a future tutorial, but if I scroll through this file, these are all the instructions for building all of the datasets.
So as we scroll through here, we might look for something that says figure 1 and see what code is used to build it. Scrolling down... ah, build figures. Okay, this tells me how to build the performance figure, though I'm not sure whether that's figure 1 or not. I suppose what I could do is open up results/figures. Let's do that: performance.png was the first one — and this is the first figure from that paper. Okay, so performance.png is the file that we want, and the script in code that builds the performance figure is here. So here, now, is the code for building the performance figure. One of the things you might notice here is that there are no comments — bad Pat! But we have found the code for figure 1 for both studies. So, where is the main document? If we go back to Meadow et al., I believe this surfaces .Rmd file is the main document. What they're kind of providing here is a notebook for how they made the figures and how they did their analysis. For Westcott and Schloss, if we come back to the home directory for the repository, we might go to submission, and we see there's a Westcott OptiClust mSphere 2016 .Rmd file. This is more than likely the main document — the manuscript that was submitted — and, as we learn about literate programming in a future tutorial, we'll see how code is used to fill in values and plots when generating the final document. So again, it didn't take very long to find the main document for either repository. The next question is: where would the data go? For Meadow et al., it appears the data come in here — there's this source habitats blast class file; I'm not sure what that is. It's not a CSV or another generic file format, so I'm not exactly sure what's going on.
But this is perhaps giving the source, and then the abundance and taxonomic ID. They also have a file called Rdata, which is a commonly used way to save data from an R session. But I don't immediately see how I could add new data to this project if I had data from other surfaces that I might want to compare to theirs — say, data from a university classroom at the University of Michigan. It's not immediately clear how I would do that, although their purpose here is mainly to describe how they did their analysis, not to describe how you would interact with it. Then, if I look at the Westcott OptiClust repository, under data, the raw data, I believe, are in here — things like the human and soil datasets; here, for example, is the soil metadata. What we notice is missing from this, and also from the Meadow et al. dataset, is sequence data: these projects analyzed sequence data, but there's no sequence data in here. So something we might ask is: why don't we see those files in here? Why aren't the reference files in here? Let's talk about that issue of how we deal with large files. GitHub cannot easily handle large datasets: each file is capped at 100 megabytes, and each repository is capped at 1 gigabyte. So many of our datasets cannot be hosted on GitHub, and we have to use some tricks to get around that — mainly, we need to provide instructions on how to get those files and how to generate intermediate files. We need to provide those instructions with our code, and that's part of what we're going to be doing today as we set up a new project. So, again, we generally only post the data files that are critical to the analysis, or metadata, and our scripts should document where data go and how they get there. We'll talk more later about using Git and GitHub to handle large datasets.
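To make the idea of "instructions that travel with the code" concrete, a README might carry a block like the following — a sketch only, since `<data-url>` and the file name are placeholders, not the real location of any dataset:

```shell
# fetch the raw sequence files and park them in data/raw
# (<data-url> is a placeholder; substitute the real hosting location)
wget <data-url>/raw_sequences.tar
tar xvf raw_sequences.tar -C data/raw  # unpack straight into data/raw
rm raw_sequences.tar                   # keep only the extracted files
```

Anyone cloning the repository can paste these lines into a terminal and end up with the same files in the same place, which is exactly what the repository itself cannot store.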
So, we've talked about Noble's organization, and as I've been doing a lot of projects with microbiome data in my lab, we've come up with a system of organizing our projects that works pretty well for us. This is the overall structure that you saw in the Westcott and Schloss repository: we have the home directory for our project; that README file; maybe a file about contributing; a license; a citation file describing how we want the project cited; a directory for all the files for submission; and a directory for all of our data. Within data, maybe we have a references directory, a raw directory, and a directory that contains output from mothur — if you're using QIIME or some other program, you could have a directory for that instead. You might also have a directory for processed data: clean data that won't be altered. Then you might have a directory for code, a directory with different results — whether tables, figures, or pictures — and an exploratory directory, where you're trying out different types of analysis that you want to keep track of, perhaps as some kind of computational research notebook. And then, finally, an executable Makefile that helps run the overall project. Hopefully you can look at this structure and agree with me that it makes sense — that you could come into it without knowing much about the project and figure out where things are. So now I'd like you to participate in a project with me that I'm calling the bad project. If you haven't seen this Lady Gaga parody from the Zhang Lab at Baylor, it's pretty awesome. We can all relate to inheriting a bad project, or perhaps starting our own bad project. You can imagine that perhaps a former student or a former postdoc has left, and you've now inherited their project; it's up to you to make heads or tails of what they've done and to turn it into a manuscript. And you look at it, and it's just a disaster.
They've got kind of the garbage-can organization, where they've dumped everything into a single directory — and you've just watched this tutorial, and now you want to organize it into something logical. So what we'll do is download a dataset into a new directory on our EC2 instance that we'll call badproject, and we're going to use the recommended directory structure to move files into the appropriate directories. Some hints: we can use the command `mkdir` to create a new directory, and we can use `mv current_file directory/` to move a file from the project's root directory into our new directory, where `directory` is the name of the directory and `current_file` is the name of the file we want to move. Also, don't forget to include the slash after the directory name, because that slash makes clear that it's a directory. So let's go ahead and do this, and I'll help you get set up. We're going to return to our tmux session and type `wget https://` followed by the rest of the GitHub URL, which we copy and paste. When we run this, we want to make sure that we're on our EC2 instance — it's best to get into the practice of using that rather than using your own computer. If we run that and now type `ls`, we see the archive is in our home directory, and we know it's the home directory because there's a tilde in the prompt. Similarly, if we type `pwd`, we see that we're in /home/ubuntu; your username on this instance is ubuntu. We can then decompress the archive by typing `unzip v0.1.zip`, and this explodes it out. If we type `ls`, we now see that we've got badproject-0.1, and we still have the archive in the home directory. Let's go ahead and get rid of the archive with `rm v0.1.zip`, and let's change the name of our badproject directory to get rid of the -0.1: `mv badproject-0.1 badproject`. So now, if we cd into badproject and type `ls`, we see a whole bunch of files.
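The setup sequence condenses to a few lines. Here is a runnable rehearsal of the same pattern that fabricates a local archive first, so no network access or GitHub URL is needed — the names below are stand-ins for v0.1.zip and badproject-0.1:

```shell
# fabricate a stand-in release archive
# (the real one came down via wget and was a .zip)
mkdir -p badproject-0.1
echo "placeholder" > badproject-0.1/solution
tar czf v0.1.tar.gz badproject-0.1
rm -r badproject-0.1

# the pattern from the tutorial: decompress, delete the archive, drop the suffix
tar xzf v0.1.tar.gz           # (for a .zip archive: unzip v0.1.zip)
rm v0.1.tar.gz                # remove the archive once it's unpacked
mv badproject-0.1 badproject  # rename to drop the version suffix
ls badproject                 # shows the project files
```

The decompress/remove/rename rhythm is the same whether the archive is a .zip or a .tar.gz; only the unpack command changes.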
One of the things you might notice with tmux is that if you scroll up, you lose a lot of the output. What I'd like you to do now — and I'm going to shut up and let you do it — is create the directory structure that we talked about for a microbiome analysis. If you come back to this general structure for a microbiome study, you'll see that we probably need directories for submission, data, data/references, data/raw, data/mothur, data/process, code, results, and exploratory, plus a Makefile — we want to create that directory structure and move our files into those directories. So again, I'll just briefly show you something we could do: `mkdir code` gives me a code directory, and I'm going to move plot_nmds.R into code. If we look at `ls code`, we see that the code directory now has plot_nmds.R in it. So I'll tell you to now pause the video, go off on your own, and give this project some organization. Great — hopefully you have something like this. I did add a little Easter egg: if you type `cat solution`, you'll see my organization. You can see where I put, say, these fastq.gz files into data/raw, or the HMP mock community fasta into data/references. This is not meant to be the perfect way to do it; this structure is one that my research group has found works pretty well for us. Every project is a little bit different, so it's a starting point that helps us organize our projects — it's not meant to be a straitjacket to force you to do things a specific way. It's a tool to help you organize things. And the key, again, is that somebody who comes into your home directory, who comes into badproject, should know where they can find different things, or should know what's in data/references.
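The exercise pattern can be laid out in one runnable chunk — a sketch: the directory names follow the microbiome structure above, and plot_nmds.R here is a stand-in for whatever script you're filing away.

```shell
# create the recommended directory tree in one shot
# (-p creates parent directories as needed and tolerates existing ones)
mkdir -p data/references data/raw data/mothur data/process \
         code results exploratory submission

# move an analysis script into code/ (the trailing slash marks a directory)
touch plot_nmds.R    # stand-in for the real script
mv plot_nmds.R code/
ls code              # the code directory now lists plot_nmds.R
```

Repeating the `mv` step for each file — fastq.gz files into data/raw, reference files into data/references, and so on — is the whole exercise.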
So if we go from that big garbage can of data to typing `ls` now, you can see there are directories called code, data, results, and submission, plus a couple of files — a license, a Makefile, and a README — and it's much easier to get a sense of where things are. Like I said, this is a structure that works really well for my lab, and it's become fairly standard for my research group. To help with this structure — and because it gets kind of redundant to do the same setup over and over again — I've created a template that I call new_project. We can download new_project, which is at that link: we return to our home directory, download the zip file for the latest release of new_project, decompress it, and use it to start our own analysis. What we're going to work on in this tutorial, as well as in subsequent tutorials, is trying to regenerate a paragraph and an NMDS plot from the Kozich study that was published in AEM. If you're not familiar with that paper, you might go back and look at it: it's the paper where we describe our method for sequencing 16S rRNA genes using Illumina's MiSeq, published in AEM in 2013. Something I like to do is name my directories — name my repositories — to include the last name of the first author, a very brief one- or two-word description of the paper, the journal it was published in, and the year. All right, so I'm going to go ahead and click on this instructions link to go to this tab, and here are the instructions on how to use the template. We're going to download the latest release to the directory and decompress it. Okay.
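Putting the template steps and the naming convention together gives a short sequence like this — a sketch: `<release-url>` is a placeholder for the template's release link, and the archive and directory names follow the pattern described above.

```shell
# grab and unpack the new_project template (<release-url> is a placeholder)
wget <release-url>/0.11.zip
unzip 0.11.zip
rm 0.11.zip

# rename per the convention: FirstAuthor_Description_Journal_Year
mv new_project Kozich_reanalysis_AEM2013
cd Kozich_reanalysis_AEM2013
```

The rename is purely cosmetic, but a consistent FirstAuthor_Description_Journal_Year pattern means you can find a project years later without opening it.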
If I click on this in a new tab, I see that the latest release is here, with a link for the source code. If I right-click on that to copy the link, I can then go back to my home directory — `cd ~` to get back home — clear my screen with Ctrl-L, and type `wget` followed by the pasted link to download it. If I type `ls` now, I see the 0.11.zip archive that I just downloaded, as well as my badproject directory. I now want to unzip this archive: `unzip 0.11.zip`, and a whole bunch of stuff spits out. If I type `ls`, I now see that I have the archive, badproject, and new_project. I'll remove the archive, and I'll rename new_project to be Kozich_reanalysis_AEM2013 (reanalysis — I can't spell!). Then I'll cd into the Kozich directory and type `ls`, and we see the structure that we've been talking about. There are a few extra files in here: the instructions, which we were looking at on the GitHub site, plus a license for the project and a license for the new_project template itself. I'm going to go back to the instructions here on the website, just because it's a little bit easier to read. We've already done the step where we renamed the directory to Kozich_reanalysis_AEM2013. It then tells us to open the README document, change the first line to reflect the title of your research study, and update the content from that section to the end — you can, but are not obligated to, keep the acknowledgements section, and you should keep the directory tree. So I'll do `nano README.md`. I'm going to get rid of the first line — in nano you can type Ctrl-K to delete a line — and I'm going to replace it with something like "Kozich reanalysis project"; I'll save the abstract for later. This is the overview that we've talked about. I'm going to leave it for now, but I'll want to make sure that I update it as I update my directory structure. And then here are the dependencies and how to
regenerate the repository, plus how I'm going to be doing the analysis. There's other stuff in here that you can update and customize for your own purposes. Do you remember how to save this? Right — we do Ctrl-O, Enter, Ctrl-X to get out. If we look back at our instructions, we next want to replace the license from the template's repository with the license for our project. The current LICENSE is a public-domain license for the template repository — the template itself was put into the public domain — but your project isn't in the public domain. So we'll open the new_project license file in nano and see that it's an MIT license. We'll leave the details for now — in a subsequent tutorial we'll come back and talk about the license again — but we want to make this new_project license the real license for the project, so we'll do `mv new_project_license.md LICENSE.md`. Great: if we do `ls`, we now see that we have only one license file in here. Next, what we'd like is for this project directory to be under version control, and to do that we're going to use Git — without really knowing a whole lot about Git yet, but that'll be okay; as I said in the introduction, we'll be talking about Git in the next tutorial. To get started, we type `git init .` — that's git init, then a space, then a period. I'm going to do Ctrl-L first to get a clear screen, then `git init .` and hit enter. We can then do `git add .` and then `git commit -m "initial commit"`. What these three commands did, we'll talk about more in our next tutorial. We'd now like to connect this to GitHub, so I'm going to go to my profile on GitHub and create a new repository: I'll click on the plus sign at the top of the screen, choose New repository, and name my repository Kozich_reanalysis_AEM2013. This repository name does not need to be
the same name as your directory over here; it's just kind of helpful to make things align. We'll go ahead and keep this as a public repository, and we don't want to initialize it with a README or a license because we already have those. Click Create repository. We already have an existing repository, so we want the commands to "push an existing repository from the command line." GitHub has a nice little copy-to-clipboard button here, so I copy that, paste it over in my terminal, and hit enter. It asks for my username — pschloss — and my password, so I enter my GitHub username and password. I have now pushed — we'll talk more about what that means in the next tutorial — the content from this repository up to GitHub. So if I come back to GitHub and hit refresh, I now see that I've made a new repository. This is a bit more sophisticated a repository than the one we made for our paper airplanes. There you go — high five, good job! But you're not done yet: I have an exercise for you, to go get the raw data and the reference files that we're going to need to run this analysis. Ideally you'd work with a partner, but I realize that you might be working individually, and that's fine. Hopefully you're working with somebody who has some mothur experience — there's going to be a mothur tutorial series, and we will not use much mothur at all in here or talk much about mothur, so it's not super critical. It would be good for you to be familiar with the original manuscript, the original paper — that is important. So, to help you find it: if we click on this link in a new tab, it will open at the AEM website. This is the manuscript, and you're going to need to look at it to figure out what kind of references I used and where the raw data are. I was not a good kid — I did not put them in the SRA at the time — so you're going to want to look through here and make some judgments about where the data are, which references you need, and what software you need. So
what I'd like you to do, then, is get the data. As a hint: at the website you'll go to, there will be a link for genomes/metagenomes — you want those data. You'll want to get the reference files, you'll want to get mothur, and you'll also want to take the text for the README: the figure legend for figure 4, and the paragraph. So you're going to obtain these files and put them in the correct locations on your EC2 instance, and I'd like you to do your best to document what you are doing. If I were to repeat everything we've done with this new_project template and wanted to create this Kozich reanalysis following your approach, I'd need instructions from you as to what you did. Some helpers: we've already used a tool called wget, which is useful for pulling down large data files to the current directory (there's another tool called curl; I learned wget, so I like wget). `tar xvf` will unpack a .tar file, and `tar xvzf` unpacks a .tar.gz or a .tgz file. `rm` will remove files — delete them. And if you're making a Markdown file, like a README file, you can have text rendered as code on GitHub by putting the code inside two sets of three backticks. A backtick is the lower character on the key with the tilde — the key right above Tab on the left side of your keyboard. So, when you're done — wait, don't get fancy and make a GitHub commit or push; we'll do that in the next tutorial, so please wait for me. If you'd like, add me or your PI as a collaborator on this project; you can do that by going to Settings, then Collaborators, and entering the handle. Mine is pschloss — which is me, and that's kind of silly because I'm already collaborating with myself — but if you want to add me, or your PI, or somebody else, that's great. So look at this list of things that you need to do, go ahead and do them, stop the video, and when you're done, come back and I'll show you what I did. Hopefully that exercise
Hopefully that exercise wasn't too challenging, and hopefully you were able to get the sequence data, the reference files, mothur, and the text for the paragraph, and to put that into the README file. In addition, I hope you were able to document what you did and the various steps you took to obtain those files. I'm going to share with you what I wrote for my code, and again, it's not critical that yours matches exactly what mine looks like. What I'm trying to illustrate with my README file (this one here is in the code directory) is how I obtained the Linux version of mothur. Hopefully you can see from this bash code that it would allow me to copy and paste these lines into a terminal to get mothur installed into my code directory. Then if I ran code/mothur/mothur with the version flag, I would get the version number. Similarly, here are the instructions to get the fastq.gz files from the website where we posted the data. You can see there's a wget command here and a tar call to decompress the tar file; we then got rid of that tar file, because I had put all the data into data/raw. There are many ways you might get to the desired output of having all your fastq.gz files in data/raw; this is how I did it. The key point I want to emphasize is that we can use code, along with text explanations, to describe to somebody how they can achieve what we did. So if you looked at my three lines here, you could say, "Oh, Pat, you have a bug on line 2; you need to use this parameter instead of that parameter." It's transparent: I, or somebody else, can see exactly what was going on. The advantage of writing it as code, with these three lines of wget, tar, and rm, is that I could copy and paste them into my terminal, generate those files, and have them put automatically into data/raw without me having to do anything else. They're instructions for the computer, and they're instructions for somebody else who might come along in the future.
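The wget/tar/rm pattern just described can be sketched in a self-contained way. Since the real download URL isn't stated here, the wget line is shown commented out with a placeholder, and a locally built archive of fake fastq.gz files stands in for the real data; the unpack-and-clean-up steps are the same ones the README describes.

```shell
# Destination for the raw sequence files, as in the project template
mkdir -p data/raw

# In the real workflow this would be a wget of the posted data, e.g.:
# wget http://www.example.org/raw_data.tar.gz   # placeholder URL

# Stand-in for the download: build a small tar.gz of fake fastq.gz files
mkdir -p staging
echo "@read1" | gzip > staging/sample1.fastq.gz
echo "@read2" | gzip > staging/sample2.fastq.gz
tar czf raw_data.tar.gz -C staging .
rm -r staging

# Unpack into data/raw and delete the archive, mirroring the README's
# wget / tar / rm sequence
tar xvzf raw_data.tar.gz -C data/raw
rm raw_data.tar.gz
```

Anyone (including the computer) can rerun these lines and end up with the same files in data/raw, which is the whole point of documenting the steps as code.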
Here are the instructions to get the references I'll use. I used a SILVA reference alignment file that I got from the mothur website, as well as the RDP reference taxonomy that I also got from the mothur website. If you struggled a little bit to find these on the mothur website, again, it's not a big deal; that comes with familiarity with mothur. And as I'm saying this, I'm realizing I'm assuming that somebody knows how to use mothur and where they can get these reference files. If people had this code and could see it in my repository, they would know exactly where I got trainset version 14 from this code in my README file, which wasn't immediately obvious from looking at the manuscript. And then, finally, I took that paragraph about scaling up from the Kozich paper and put it into the README.md file in the home directory of my project. So again, don't feel like you have to use the exact same approach I used to documenting this, using bash code to get the files and put them where they belong. The goal is to get the files and put them where they belong, by whatever means necessary; but hopefully those means are reproducible, so that somebody else could come along and do it as well. Something you might do is try deleting all of the files that you just downloaded, rerun your instructions, and see if you can do it without getting any error messages. Here are some exercises to work on. I don't want you to enter these into your AWS instance, but think about and write out, perhaps by hand or in a document: the commands to create a new directory; how to move from your home directory to some other directory; how to create a new git repository; how to get data from an internet-based resource; how to move a file from one directory to another; how to rename a file; and how to decompress a file ending in .zip, .tar.gz, and .tar. If you can do all of these things, then you'll be in really good shape for the future tutorials.
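One possible set of answers to those exercises, sketched as shell commands. Every path and file name here is made up for illustration, and the network and archive commands are shown commented out since there's nothing real to download or unpack; the point is the shape of each command.

```shell
# Create a new directory and move into it
mkdir -p my_project
cd my_project

# Create a new git repository in the current directory
git init

# Get data from an internet-based resource (placeholder URL):
# wget http://www.example.org/reference.tar.gz

# Move a file from one directory to another, then rename it
mkdir -p data/raw code
touch notes.txt
mv notes.txt data/                  # move
mv data/notes.txt data/README.md    # rename

# Decompress files ending in .zip, .tar.gz, and .tar:
# unzip archive.zip
# tar xvzf archive.tar.gz
# tar xvf archive.tar
```

Writing these out from memory, rather than copying them, is a good check that the command-line habits are sticking.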
Before we close, we want to remember that we need to stop our instance, so we need to come back to our shell. If we type exit, that will get us out of our tmux window and back to our home directory; if I type exit again, that window will close. Then, if I come back to my EC2 Management Console, I can do Actions, Instance State, Stop, and I'll say yes, stop. Well, that was a lot of work. I really hope that you took the time to pause the video and engage with the material: to fix the bad project template and to organize the data and reference files that we'll need for the Kozich reanalysis effort. I know there's a temptation to watch these videos straight through; I know when I watch other people's videos, I do that myself. But you really get so much more out of the experience by doing the work in parallel with me. Once you feel like you've understood the material and are pretty confident in your skills, take a look back at a project you're working on for your own research and think about how the files are organized. I don't mean to imply that you've got a bad project, but perhaps you can replicate that exercise with your own project to improve the organization. You can also go back and generate some README files to describe how and where you got the data and reference files for your own project. This is the second time that we've worked with Git. The first time was with the paper airplane example, where we used Git on GitHub, even though you perhaps didn't really feel like you were using Git. In the next tutorial, we'll do a deep dive into Git; that tutorial will give us the skills we need to keep track of the progress of our analysis. Until then, keep practicing, and we'll talk to you soon.