If you were one of the brave souls who followed me last year as I went from an empty directory all the way through a completed project that I ended up publishing as a manuscript, you know that I am a stickler for organization. Well, in this episode of Code Club, I am going to show you my philosophy for organizing projects and three different ways that we can get our project into good organizational shape. We'll do that using the graphical user interface, which is Finder on my Mac or Windows Explorer if you're on Windows. We'll also do that using exclusively R commands, and then we'll also do it from the command line, and I'll tell you why you might pursue one approach over another.
So why am I such a stickler for organization? Well, invariably in every project there's going to be a period of time where the project will pause, right? So as I'm recording this, I am coming back from the holiday break where I took maybe a month off, and I'm not quite sure what I was doing before that break, right? Well, if I can look at the organization of my project, if I can look at the directory at the root of my project and very easily see where everything is, then it makes it a lot easier to come back into the project. I can see where the code is. I can see where the raw data is. I can see where the results are. It makes it much easier for me to navigate the project. And so if it's easier for me to navigate my own project a month later, then imagine how much easier it is for someone else to navigate your project when they're coming to it fresh.
A number of years ago, I had a graduate student or postdoc. I forget exactly who it was. I just remember this event. I went into their project directory and there were thousands of files. I'm serious, thousands of files. And it was impossible for me to work out what was going on in that directory. There was no organization.
Imagine if instead I had come into that directory and I could see things like code, data, raw data, processed data, the manuscript directory, results, tables, figures, right? If I had separate directories for each of those and perhaps had a README file at the root of the project, it would be so much easier for me to then approach the work that somebody else had done. So that's what we are going for.
I realize that the project we're currently working on only has maybe seven or so different files. So it's not that big of an organizational nightmare yet, but it's not too hard to believe that if we kept maturing this project a bit, it might get a little bit more hairy to look at all these different files if we don't first impose some type of organization. So you can see here my distances directory. This is the directory where I'm keeping track of all the files that relate to the project that we're currently developing here. And I've got a couple of different types of files in here, right? So I have a few different distance matrices. These are the .dist files. We have some code. It's not hard to imagine that at some point we might have a results directory where we might put our figures, and a submission directory where we put all the stuff that we might want to submit when we're ready to submit the manuscript for publication and for peer review.
So what I would do in a very easy case like this is use the Finder window to organize it, right? So we could do a new folder, and I'll call this code. And I'll go ahead and put readmatrix.r into that code directory, go ahead and create another directory that I'll call data. And I can go ahead and move these files into data. And I'll go ahead and create a directory called results. So one of the things to keep in mind is that while I don't tell you in every episode that I am committing the changes to a Git repository, I am.
And so one of the challenges of using Git with a project like this, where we have directories that are empty, is that Git really only keeps track of files and the path to those files. Git's not going to know how to keep track of the history of the results directory if there's no file in it, right? So Git will keep track of the history of a file, not the history of a directory. So I'm going to go ahead and copy this README file into results. And I'll go ahead and open this. So this is the README file opened in the Atom text editor. And again, it doesn't matter where you do it as long as it's a text editor. I'm going to go ahead and delete everything in it. So it's an empty README file now. And so now I have a file in my results directory that is at least a placeholder for this directory. And so if I expand all these directories, you can see the organizational structure, right? So I have code in my code directory, data in the data directory, and that README file in the results directory holding the place. Also, at the top level of the project, what I call the project root directory, you'll see these three directories, as well as an Rproj file, the license, and the README file.
So the next approach that I want to take is showing you how we can use commands from base R to go ahead and organize our project. If you look down at the Files tab here of my RStudio window, you'll see that I've reverted everything back to the way it was. So you might be asking yourself, why would you ever want to do these file and directory manipulations in R versus the graphical interface? Well, first of all, you don't always have the benefit of a graphical interface. For example, my lab does a lot of work up in the cloud on a high performance computer at the University of Michigan that does not have a graphical interface. And so all the manipulations we do have to be either at the bash shell prompt, so basically in Linux, or from within R.
Alternatively, sometimes you might also want your scripts to generate directories that are then used to deposit new data, or to perhaps get a listing of all of the files in a directory that you then want to feed into some script that might synthesize all that data together. And so it's not totally ridiculous to think that you would like to use R to do file and directory manipulations. So let's go ahead here in R and I will show you how we can recreate that directory structure that I showed you using the Finder window.
So I think the first useful function to get under our belt is list.files(). These are the seven files that we saw previously. And so now we want to go ahead and create some directories that we can then move these files into. To create directories, we can do dir.create(), and then I can put "data" in quotes. And I can do dir.create("code"). Again, that needs to be in quotes. Then dir.create("results"). And again, if I do list.files() now, I see that I have these three directories in here: code, data, and results. And I can look at the contents of any of those directories by doing list.files() and then giving it "data", right, so that will show me the contents of the data directory. And the actual argument for this is path, right, so I can say path = "data". And there's nothing in there.
So we have a couple of different approaches for moving files into the individual directories. The first approach that I'll use is file.rename(). And so that is going to rename a file. And so you might be thinking, how are we going to rename it to get it to move into that other directory? Well, I could take, in quotes again, "readmatrix.r". So that's the original name. And we're going to rename it to "code/readmatrix.r". And if I do list.files() with the path again being the code directory, I now see that that file is in there.
And if I do list.files() without any argument, I get the current directory listing, and I no longer see that readmatrix.r script in the project root directory. So that's one way to do it.
Another approach would be to copy the file from the current directory into a new directory, and then delete the copy from the current directory. I'll do file.copy() on README.md, and we're going to copy it into results. So again, if we do list.files() on results, we see README.md. And if we do list.files(), we see that we still have README.md in our project root directory. Now what I could do if I didn't want to keep that README.md would be to do file.remove() on README.md. But I don't want to do that, because I really want to have that README file. And actually what I'm going to do is go ahead and remove the README.md file that's in my results directory, the one I just created, because I don't want that README. I want a blank README. If you recall what we did just a minute ago, we created a blank README file. So I'm going to go ahead and remove that. And now if I do list.files() on results, I see I don't have anything there. But what I could do would be to do file.create(), and then I could say "results/README.md". I do list.files() on results, and I see that that file is there. If I come over to results and open up the README, again, it is a blank file, which is what I wanted, because I want a file that's there as a placeholder for Git to keep track of.
Again, if we look where we're at with list.files(), we currently still have those three distance files. We want them to be in our data directory. Now imagine instead of three, we had 303, right, or 33. I wouldn't want to do this file.rename() over and over again, or copy and delete over and over again. I want to minimize the number of steps. So what I might do instead would be to use list.files().
And I'm going to get the listing of all the files that satisfy the criteria I'm interested in. So I could say pattern = "dist". So this will list all of the files in the project root directory where I'm at, or whatever path I might give it, that have dist in the name, right? And so I see dist in distances, as well as these three other distance matrix files. So I might want to make my pattern a bit more specific. And so what I might do would be to put a period in front. But this pattern is what's called a regular expression, and a period in a regular expression will match any character, not just a period. So if I want to match the period itself, I can do "\\.dist", that is, backslash, backslash, period, dist. I now get these three distance matrix files, and I can assign these to a variable that I'll call dist_files.
And now I can take those dist_files, and I can copy them, and then delete them from their current location. If you have more advanced skills with regular expressions, you could probably do the same thing with file.rename(). I will leave that to you as homework. But I'm going to do it here with file.copy() and file.remove(). So how would we do it with file.copy()? Well, I'll do file.copy(), and then from will be dist_files, and to will be "data". Right. So I get three TRUEs saying it worked. I do list.files() on data, and I get those three distance files, which is good. Of course, if I do list.files(), because I copied them, I still have them in my project root directory. But I can do file.remove() on dist_files, and I get all TRUEs. And again, if I do list.files(), I now see that those three distance matrix files are gone, but I still keep distances.Rproj. Very good. So now I have the same setup for my project that I had previously when I used the graphical interface.
Now, there is a downside to this approach. If we look in the upper right corner, we'll see what happens. Git thinks that I deleted all of these files that I moved.
And so I've broken the history, I've broken the timeline of commits for all these files by moving them into new directories. And so, while that's not the end of the world for some of these files, it's not so desirable. I'd prefer to do it in a manner that allows me to keep track of the version history of all these files. To do that, though, we need to be able to work from the command line interface. And to do that, we are going to use the terminal window in RStudio. So to get a shell window, to open a terminal, we can come to Tools, and then New Terminal. And this then opens up a new terminal. It opens it up for me in my distances directory. I'm on the main branch. Again, I have some extra things added to my bash environment to tell me the weather (it is cold out), as well as to tell me what branch I am on in my project. You probably don't have any of these unless you watched that previous episode I made about a year ago on how to customize your bash environment.
But I'm here in distances. And if I run ls, I see the contents of my directory. I did go ahead and get rid of all those directories and put all the files back into the main project root directory so I can show you how to do this with the command line. Of course, I'm using the terminal window here within RStudio, but know that this is the same as if I were using bash or a bash window to log in to, say, a remote computer like I have for our high performance computer. And so this would be the same type of thing using a shell-like environment.
So to create the directories I want, I could do mkdir code, right? And if I do ls, I see those things listed. If I do ls -F, I see now which names in here are directories. Code now has a forward slash after it; that -F argument lets me see what's a directory. If I want to make multiple directories at the same time, I could of course do mkdir results data. I now see that I have code, data, and results in my directory as directories themselves.
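The directory-creation steps just described can be sketched as a short bash script. This is a minimal sketch rather than the exact session from the episode: it runs in a temporary directory, and the stand-in file names (including sample_a.dist) are assumptions made up for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so the sketch never touches a real project.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Stand-in files; these names are hypothetical, not the real project's files.
touch readmatrix.r README.md sample_a.dist

# Create one directory, then two more with a single mkdir call.
mkdir code
mkdir results data

# -F appends "/" to directory names so they stand out from regular files.
ls -F
```

Note that the flag is an uppercase -F; on most systems a lowercase -f changes the sort behavior instead of marking directories.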
Now, I want to be able to move my distance files to data, my code to code, and to put a README file inside of my results directory. But again, I want to be able to do this in a way that respects the history of the files. So to do that, I am going to use the mv command, short for move, but I'm going to use it within Git. So watch how I do that. I can do git mv readmatrix.r code. So this is going to do a Git move of readmatrix.r into code. If I do git status, no longer do I see that it thinks I deleted the file; instead it sees that I renamed the file. And if I look up at my Git window here in RStudio, you'll see that it no longer has that delete, but it recognizes that I've renamed the file. And so this way, again, we can keep track of the history of the file. Of course, I can do the same type of thing where I do git mv *.dist data. The *.dist will match those files that end in .dist, and I can move them into data. Again, if I do git status, I now see those four files that I have renamed. And of course, if I do ls data, I see those three files. And if I do ls, I now see that I only have three files and three directories here in my project root directory.
The next thing that I want to do is go ahead and create that empty README file in the results directory so that I have a placeholder for results. You'll notice that when I ran git status, it wasn't trying to keep track of results, because there's nothing in results for it to keep track of. I mentioned this earlier, right? So what I can do is use a command called touch. So touch will touch and bring into existence a new file. So I can do touch results/README.md. And this, again, will touch it, create it. And if I do git status now, it says, oh, there's an untracked file in results, right? So this is a new file for Git to keep track of. And if I do ls on results, I see the README.md. And if I do cat on results/README.md, I see that it's an empty file.
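Here is a minimal sketch of the git mv and touch steps above, run in a throwaway repository. The stand-in files (sample_a.dist, sample_b.dist) and the demo identity passed to the first commit are assumptions for illustration only. Under the hood, git mv is essentially a move plus staging of that move in one step.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository with stand-in files (all names here are hypothetical).
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
touch README.md
echo "# placeholder script" > readmatrix.r
echo "0.1" > sample_a.dist
echo "0.2" > sample_b.dist
git add .
git -c user.name="Demo" -c user.email="demo@example.com" \
  commit --quiet -m "Initial commit"

mkdir code data results

# Move the script and stage the move in one step.
git mv readmatrix.r code
# The shell expands *.dist to every matching file before git sees it.
git mv *.dist data

# An empty placeholder so Git has a file to track under results/.
touch results/README.md

# Staged renames are marked "R"; the untracked placeholder appears under "??".
git status --short
```

The empty results directory never shows up in git status on its own; it only appears once the placeholder file exists inside it.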
Okay, so we'll go ahead and add that to our Git repository. I'll do git add results/README.md, then git status. We see that we have those four renamed files, as well as the new file. And if I then do git commit, and for the message I'll say organize project, I now see that everything has been committed. And if I do git status, I see everything is good, and it's ready for me to push those commits up to GitHub. If I come over to RStudio's Git tab here and click refresh, it says everything is good to go. And again, we're ready to push those commits up to GitHub.
Go ahead and see if you can use these three approaches to organize a project that you're working with. Again, it's not so important how you do it, but that you do it. Keep practicing with these concepts. And we'll see you next time for another episode of Code Club.
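For reference, the staging and commit steps from the end of the episode can be sketched as follows, again in a throwaway repository with a hypothetical stand-in script and a demo identity that are not part of the original project. The final git log --follow call is an addition not shown in the episode; it is one way to check that a file's earlier commits are still reachable after the move.

```bash
#!/usr/bin/env bash
set -euo pipefail

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
# Identity set locally so the commits work on a machine with no Git config.
git config user.name "Demo"
git config user.email "demo@example.com"

# One tracked file with a prior commit (hypothetical stand-in).
echo "# placeholder script" > readmatrix.r
git add readmatrix.r
git commit --quiet -m "Add script"

# Reorganize: move the script and create an empty placeholder.
mkdir code results
git mv readmatrix.r code
touch results/README.md

# Stage the placeholder, then commit the move and the new file together.
git add results/README.md
git commit --quiet -m "Organize project"

# Prints nothing when the working tree is clean.
git status --short

# --follow asks git log to continue through renames of a single file.
git log --follow --oneline -- code/readmatrix.r
```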