Today is an exciting day. You know why? Because we get to start a new project. New project day is always my favorite day. And we're gonna work on this project together, as we've done in the past here on YouTube with our Code Club episodes. Now don't worry, I don't think this is going to lead to a paper. I don't know, maybe you thought that was fun, but that seemed kind of long and drawn out. I'm gonna try to keep this much more focused and keep it to a month, maybe a month and a half. I don't think it'll go two months.
So I have an ulterior motive here. I don't have enough time to do all the things I need to do, and there are things I want to do, right? Making these videos is something I want to do. I really enjoy making these videos and interacting with people around the videos. But I also have to make a report to give to a collaborator so we can present it to a board that we're trying to get data from. If you know my story, I've been trying to get this data set now for about six or seven years, and I have like no hope that it's gonna come, so I kind of feel like this is a bit of a waste of time. But it's not a waste of time if I make a video with you all.
So along the way, we're gonna work with a data set that we've actually published with many times now. I think we've used this data set in probably six or seven, maybe eight papers. The data set was originally generated by a former graduate student in my lab named Neil Baxter, who was trying to use the gut microbiome to predict whether or not somebody had a lesion in their colon, that is, to detect whether they had adenomas or carcinomas. Now, my collaborator and the people we're trying to get this data from have additional questions and want to think about the data from a variety of different approaches. So I figured, let's redo it to suit the questions more specifically the way they want.
Also, over the years, our thinking about how to build these types of models or classification schemes has changed as well. Recently, a number of graduate students and postdocs in my lab and other labs have put together a package called mikropml. mikropml is a package for R to analyze a variety of different types of data using a variety of machine learning approaches for building classification and regression models. So if that all just went way over your head, then be sure you're subscribed to this channel. Because again, over the next few weeks, I'm going to go through this project, go through mikropml, and show you all the great things it's capable of doing. And I guarantee you'll learn a lot. So definitely be sure you're subscribed and click on that bell icon so you know when the next episode is released and you can keep learning.
I also want to do this project entirely in RStudio, or at least as much in RStudio as I can. You'll see today I also use Safari, my browser. Anyway, I want to do as much as I can in RStudio because I know that one of the daunting things when someone joins my lab, or perhaps when you started watching that other video series, was like, oh my gosh, Pat has so many tools going on here, right? He's got bash, he's got git, he's got RStudio, he has Atom, he has make, he has all these things, right? So I want to try to keep it simple and show you how much we can do in RStudio. That being said, this kind of violates one of my rules. This isn't the way I generally do things, right? I don't generally do everything in RStudio. But I see a pedagogical reason for doing as much as we can here in RStudio. Along the way, we'll identify some friction points, and when we do, I might say, you know, this would be much easier using this other tool. But we'll try to do as much as we can here in RStudio.
One of the things we'll do today is launch a project, create the file and directory organization, and link that to a repository on GitHub so we can keep the whole project under version control. And like I said, we'll do all of that in RStudio. So I'm excited to get going. Let's dig into this new project.
I'm here at the Riffomonas account on GitHub. GitHub, if you're not familiar with it, is a website dedicated to social coding. Isn't everything social these days? Anyway, it's a website that makes it easy to post your code and to get access to other people's code. All of the content for these Code Clubs is available through this Riffomonas account on GitHub. If you're an academic, you can get an account for your lab for free. Actually, anybody can get an account for free, but you have to make all of your repositories public. If you're an academic and get an account, then with that educational account you can have private repositories, if that's how you want to roll, right? I'm going to make everything public so you all can follow along and get the code along with me.
I'm going to go ahead and create a new repository. I'll link up above to an episode that I did almost a year ago now showing how to set up git and set up a GitHub account. A repository is basically a directory that has all of my code, all of my text, as well as the history of any modifications I make to them, right? Right now I don't have a repository, so I'm going to go ahead and make it here in GitHub. As part of Riffomonas, I'm going to call my repository mikropml_demo, with mikropml spelled M-I-K-R-O-P-M-L. I'm going to make it public. And I'd like to add a README file and a .gitignore, and the .gitignore I'll use is the R template.
So they'll throw things in there that I would then want git, the version control system, to ignore, so it won't keep track of them as part of that history chain, right? And then I'm going to choose a license. I'm going to put this under the MIT license, which is a fairly permissive license that allows anybody to get my code and modify it as they see fit. I only ask that they then give attribution back to me. Okay. And so then I'll click Create repository. And now I have my mikropml_demo repository here off of the Riffomonas GitHub account. So we're in good shape.
This is living up in the cloud somewhere, right? And I need to get this down onto my computer. So what I'll do is click on Code, and then click on this clipboard icon to copy the link, or the address, of my repository. Coming over to RStudio, I can then do File, New Project. And I'll use this third option, Version Control. Previously, we did either New Directory or Existing Directory, but we're going to do Version Control. It's going to be a Git repository. I'll go ahead and put in that link that I copied, and I'm going to create this project as a subdirectory of my Desktop. I'll have it living on my Desktop. You can put this wherever you want. Again, you could click Browse and then move it wherever you'd want. But I'm going to leave it on my Desktop, somewhere that's convenient for me to get access to. And I'll say Create Project. This then relaunches RStudio. I see in the upper left corner here that my working directory is now Desktop/mikropml_demo. In the lower right corner, I now have all of the files that were up on GitHub. And so we're in good shape.
So the next thing I want to go ahead and grab is the data that we're going to use for this demo. Back in Riffomonas, I actually have a directory called raw_data right here.
This is the data that I use with my minimalR instructional materials and that we also used for the visualization episodes I did previously. So anyway, we can come to the releases. We're at 0.03. That's the latest release. If you're watching this in the future, go ahead and grab the latest release. It should work with the code from 0.03, so whatever version you get should be good. I'll then go ahead and grab Source code (zip). This will then download it. So now I have my mikropml_demo directory and my raw data directory. I'll go ahead and rename the download to be raw_data. Also know that if you're working in Windows, everything should be the same. Let's hope for that, right? Of course, how you rename directories or create directories, things like that, will vary by the platform that you're on.
I'm going to then drag this raw_data directory into mikropml_demo. And then if I open that up, I now see that I have raw_data, my license, all that. I'm going to go ahead and create another directory in here that I'll call code. That will be the directory that my code lives in. And I'll also create another directory that I'll call processed_data. I try to keep my raw data separate from my processed data and keep my code separate from that as well. I'll also create another directory called documentation. Right. So any documentation that I write, those reports or summaries, will go into that documentation directory. So I think I'm in pretty good shape for what my project directory looks like.
I now look over again in that lower right corner of RStudio and I see all of those directories there. For some reason, documentation is there twice. Let me just go up a level and back. And I see that there's only one version of that. Something funny must have happened in the interface here in RStudio.
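For anyone who prefers a terminal to a file browser, the directory layout described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the underscore spellings (`processed_data`, `mikropml_demo`) are inferred from the spoken names, and the scratch location from `mktemp` is hypothetical, standing in for the Desktop used in the episode.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch location so the sketch can run anywhere; in the episode
# the project lives on the Desktop instead.
project_dir="$(mktemp -d)/mikropml_demo"

# Create the three directories made by hand in the video. raw_data/ is left out
# here because it comes from the downloaded release, not from mkdir.
mkdir -p "$project_dir/code" \
         "$project_dir/processed_data" \
         "$project_dir/documentation"

# List what was created.
ls -1 "$project_dir"
```

Keeping code, processed data, and documentation in sibling directories mirrors the separation described in the episode; nothing about the names is required by R or git.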
One thing I need so that I can keep these directories up on GitHub is some file within each of the directories. I'm going to go ahead and create a couple of new R scripts. I'll save this one as README.md, and I'll save it into processed_data. I'll use .md. Yep. Then another that I'll save as README.md, and I'm going to store this one in my code directory. So use .md. And then I'm going to create another one that will live in my documentation directory. I'll clean up these windows. So now all of these directories have a README file in them, which makes sure that there's something physical, so to speak, or digital, inside each of those directories. Then I can keep that directory structure under version control, so that when I push this back up to GitHub, to that mikropml_demo repository in Riffomonas, the directory structure will be retained.
Now, in the upper right panel, there's a tab here for Git. You'll see that there are a variety of files and directories present. But if I put my mouse over the status, it says untracked. That means that git is not keeping track of these. Now, I've got raw_data, processed_data, documentation, and code, as well as my RStudio .Rproj project file. I actually don't want to keep track of raw_data. Raw data tends to be really big, and it just makes the repositories get kind of bloated. So as a general rule, when I'm pursuing a project, I don't commit my raw data. In my README for the overall project, I'll probably put a link to that raw data directory.
So let's go ahead and see how we can ignore that raw data. I can do that by going into this file called .gitignore. You'll see in here there are a variety of files that GitHub put in to ignore for a kind of generic R project. Down here at the bottom, I can go ahead and put raw_data followed by a forward slash, raw_data/, as the directory.
I'll go ahead and save that and close .gitignore. Hitting refresh, raw_data then goes away. But now I see that .gitignore has an M, which means that it's been modified. So .gitignore is being tracked. Git saw that modification, and it's ready for me to perhaps do something with that. One of the nice things is that if I click on .gitignore and do a diff, and make this a bit bigger, I can see what's been changed in my .gitignore file, right? This green means it's been added.
So this graphical interface for working with version control is nice. I've never used git within RStudio. Actually, I have. And I found the whole process a little bit wanting, and it was just painful, right? It wasn't as easy as doing things from the command line. I found that in its effort to make things simple, it actually made things a lot harder. So I've always done things from the command line. But like I said, for this project, I'm going to do everything from within RStudio, and we'll see if things have gotten better. And I want to learn how to do it, right? Because I want to see if perhaps it's easier, and because it'd be great then to help other people learn how to use version control through a nice interface like RStudio. You have everything, your programming, your coding, your text, your directories, your version control, all in this one piece of software, and that obviously has certain advantages.
Okay, so I can then go ahead and click these checkboxes to stage the files. If you're doing this on the command line, this would be the same as git add. So those are now all staged. And I can then click Commit. Now we're ready to write a commit message indicating what is happening at this stage. So I'll go ahead and write "Add project organization to directory". Okay, and I can then click Commit. Again, this is doing a git commit if you're running things from the command line. And now everything is in good shape.
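If you are following along from the command line rather than the Git tab, the placeholder READMEs, the `.gitignore` entry, and the stage-and-commit steps correspond roughly to the sketch below. It is not a record of what was run in the episode: the scratch repository, the file contents, the `data.tsv` stand-in, and the inline "Demo User" identity are all hypothetical, and the underscore directory names are inferred from the spoken names.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch project so the sketch runs on its own; in the episode
# this is the cloned mikropml_demo directory on the Desktop.
project="$(mktemp -d)/mikropml_demo"
mkdir -p "$project/code" "$project/processed_data" \
         "$project/documentation" "$project/raw_data"
git init --quiet "$project"

# Git does not track empty directories, so give each one a placeholder README.
for dir in code processed_data documentation; do
  echo "Placeholder for $dir" > "$project/$dir/README.md"
done

# Stand-in for the large downloaded data that should stay out of the repository.
echo "pretend this is big" > "$project/raw_data/data.tsv"

# Add the directory to .gitignore; the trailing slash matches directories only.
echo "raw_data/" >> "$project/.gitignore"

# Equivalent of ticking the Staged checkboxes in the Git tab.
git -C "$project" add .gitignore code processed_data documentation

# Equivalent of the Commit button; the identity is set inline for this sketch only.
git -C "$project" -c user.name="Demo User" -c user.email="demo@example.com" \
  commit --quiet -m "Add project organization to directory"

# Show the files git is now tracking; nothing under raw_data/ should appear.
git -C "$project" ls-files
```

Running `git -C "$project" status` before the `add` step would show the same untracked and modified states that the Git tab displays as status icons.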
And then I can click this Push button to push things up to GitHub, and close that. Now if I come back to my browser and hit refresh, everything that is in RStudio on my local computer is mirrored up in my repository, my mikropml_demo repository off of the Riffomonas account. And we are in good shape.
I strongly encourage people to follow along. If you would like to follow along, you can do just that. Down below in the description for this episode, you'll find a link to a blog post that goes with this video, giving you instructions for how you can get exactly what I have. Because this is the first episode, what you can do to get caught up instantly is the same thing I did in RStudio with New Project, Version Control: you can clone from that repository, plug the information in here, and get exactly what I have as well. And we'll all be at the same place. Again, I'll put a link across the top here to a previous episode that I made on installing git. You might need to go into the command line to get that installed. But other than getting things installed, you should be in good shape.
Also, if getting that installed is a little bit beyond where you are in your skills, don't worry about it. Running git is not a critical component of setting up and running this project. It is a very nice tool for making your code more open and available, so that if you're writing a paper, anybody can go and see your code. We did that on that previous project, right? The other nice thing about having everything under git is that if you happen to accidentally delete everything, you can very easily get it back by pulling things back down from the repository, and you'll basically be good to go. So don't feel like you have to go ahead and install git and do everything through GitHub, but I do think it'll make your life a lot better long-term.
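The "delete everything and get it back" scenario can be sketched on the command line roughly as follows. To keep the example runnable offline, a local bare repository stands in for the GitHub remote; that stand-in, the scratch paths, the file contents, and the inline "Demo User" identity are all assumptions made for illustration. With the real project you would clone the address copied from GitHub's Code button instead.

```shell
#!/bin/sh
set -eu

# Scratch area; everything in here is a hypothetical stand-in for the real setup.
work="$(mktemp -d)"

# A local bare repository plays the role of the GitHub remote.
remote="$work/remote.git"
git init --quiet --bare "$remote"

# First clone: make one commit and push it, as in the episode.
# (Cloning an empty repository prints a harmless warning.)
git clone --quiet "$remote" "$work/mikropml_demo"
mkdir -p "$work/mikropml_demo/code"
echo "Scripts live here" > "$work/mikropml_demo/code/README.md"
git -C "$work/mikropml_demo" add code/README.md
git -C "$work/mikropml_demo" -c user.name="Demo User" -c user.email="demo@example.com" \
  commit --quiet -m "Add project organization to directory"
git -C "$work/mikropml_demo" push --quiet origin HEAD

# Simulate the accident: the whole local project is deleted.
rm -rf "$work/mikropml_demo"

# Recover by cloning from the remote again; the committed file comes back.
git clone --quiet "$remote" "$work/mikropml_demo"
cat "$work/mikropml_demo/code/README.md"
```

Only work that was committed and pushed comes back this way, which is one more reason to ignored-but-important data such as raw_data/ somewhere else that is backed up.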
I certainly get a lot out of having all my code up on GitHub. And again, everything that you see me do, all my websites, the papers that we publish, the stuff that I do in these episodes, ultimately all lives up on GitHub as a way for me to make it accessible to you.
Anyway, like I said, please be sure that you're subscribed to the channel down below so that you know when the next video is released as we keep trucking through learning more about how to apply machine learning techniques to microbiome data. If you have any questions, by all means ask them down below in the comments, whether about machine learning or about your experience with it and the tools you've used. We're going to be using this package, mikropml, that my lab has written, and we want more people to hear about it. So I'm going to give you a heavy dose of mikropml over the coming episodes. I really do hope that you come back and that you feel free to ask any questions that you have, and I'll do my best to answer them as we march through the next few episodes. Anyway, please be sure to tell your friends about what we're doing here in Code Club and this project that we've just started.
Ah, it's always such a great day to start a new project, right? It just has that shiny new car feel to it. We'll see how long that sticks around. Anyway, I'm sure it'll stay for a while. So go ahead and get everything set up, and if you have questions, let me know. We'll see you next time for another episode of Code Club.