Hey folks, welcome back for another episode of Code Club. I'm so glad that you're with me as we start this new series of episodes where I will be building a new R package. I have never made an R package before. Someone asked me recently, "How long have you been writing in R?" and I think it's been about 15 or 16 years, and somehow I've never managed to make an R package. I think it was because I was too scared to make a package, because I'd seen all sorts of complaining on social media about the process of submitting a package and the feedback, and it just seemed like way too much. But thankfully Hadley Wickham and Jenny Bryan put out this great book, R Packages, and they've been building a lot of great tools around package development in R. The devtools, usethis, testthat, and a variety of other packages all come together to make it a lot easier to write your own R package and submit it to CRAN.

So we're going to do that. Over the next series of episodes I am going to be working with you all to build a set of tools in a package that will read in DNA sequences and then output a classification of those sequences, so we know who those sequences came from. Now you might be saying, "Who cares?" Well, I care. But anyway, I had to pick something to make a package on, and so this is what I'm going to do. One of the motivations for this, besides wanting to learn how to make an R package, is that a very popular tool in the microbiome and microbial ecology field, the Ribosomal Database Project, went offline last summer, and so I would like to create a way to make its functionality more broadly accessible.
Yes, the classifier is available in my tool mothur, in other tools like QIIME 2, and probably some others, but it would be nice to have an R package that can do it, because then maybe it would be easier to post it up to a new website, perhaps a new version of the Ribosomal Database Project, the RDP. So that's what we're going to do. Please don't be scared off by the topic. I had to pick something. It was either this or sheep, so pick your poison, and maybe we'll get to those sheep before we're all said and done.

All right, so we're going to basically do what we did in the last episode, but applied to a new project. You may have noticed in the previous episode that I had a few problems around the devtools check() function. I got that squared away. It turns out I had something installed on my computer in a place where it wasn't supposed to be. Odds are good you would never have that problem, so we'll move right along.

The first question is: what are we going to name this thing? I have come up with a list of possible names. Again, what this tool is going to do is take sequence data and output a classification for it. So we need a name, because the name is what everyone's going to be calling it forever. I've previously had tools like DOTUR and mothur and SONS, kind of a family theme there. I've had TreeClimber and s-libshuff; I think that's about it. Picking a name is really hard. Sometimes people will pick an acronym. I hate acronyms, because the acronym is always just garbage; it doesn't mean anything. mothur is not an acronym. DOTUR, D-O-T-U-R, is Distance-based OTUs and Richness. See what I mean? It's really contrived. mothur doesn't mean anything, except that it was a nod to my daughter and to my wife.

So, picking a good name. We can't have any symbols in the name for an R package. I prefer to have everything be all lowercase, because you get to something like LEfSe, which has all sorts of crazy
capitalization. It's kind of like SpongeBob SquarePants capitalization, and so I'd prefer everything to be lowercase. There's a general trend in R package names to have an "r" in them. That's not always adhered to, but it's pretty common, like readr, tidyr, dplyr. We also want the name to describe what the package is doing. The other aspect is that we don't want it to already be used, and we don't want it to be too common. If I'd named mothur "mother", with an e instead of a u, Google would probably never find it until it got super popular, and it probably wouldn't get super popular because no one could find it. You know what I mean?

So I've got nine or so names here, most of which I don't really like at all, and I want to go through why I do or don't like them.

I like phylotyper, because phylotyping is what we do when we classify sequences. If I take a sequence and classify it and say, "Oh, that's an Escherichia," or, "Oh, that's a Bacillus," we have phylotyped it. So phylotyper uses that "r" motif, and it also hopefully does what it says.

mothur2: eh, I'm not such a fan of that. mothur is far more than just classification, and I'm not ready here to build out a whole new version of mothur as an R package, but maybe we'll get to that someday.

classifyR: that's possibly a good name, so I'll put that up the list. classifier is probably too generic; it's not going to be very findable via Google.

16s: actually this is not allowed, because as I understand it you can't start a package name with a number. You can have numbers in a package name, but one can't be the first character. 16r is another idea I had. R actually is the successor to S: there was an S programming language, and then it was followed up by R. It's kind of the opposite of C and C++.

And rdp: I could do that, but I feel like I'm stepping on their toes, and I don't want to do that. rdptools: kind of the same idea. riffomonas,
being the name of this channel: eh. I would like Riffomonas to be far more than classifying sequences. So again we have this tension between being super specific and super generic. Something like 16s or 16r, or even putting that in the name, I feel would be too constraining, because I know people use the classifier, within mothur at least, to do things other than 16S. So I don't want to be too restrictive.

I'm going to go ahead and take these names over to Google and see what we find. With ClassifyR, this shows up already as being a package in Bioconductor, so that's not going to work. I guess we could look and see what it is doing. I'm not totally sure what this is doing, but it's not classifying sequences, so that's fine. But again, the package already exists, and that's going to cause confusion. We don't want to do that. What happens if we do classifier? This gets us a whole bunch of stuff on classification. What if we did classify16s? We get papers, so there doesn't seem to be anything that pops up here.

If we do phylotyper, I see that there is a paper that was published on a tool called Phylotyper, an in silico predictor of gene subtypes. It turns out that this is actually a Python package. If we go to their GitHub repository, we see that it's really a Python package that I think uses R, if I understand it right. One of the things that stands out to me is that this has not been touched in seven years. So this is on the border of "I like it" and "I don't like it", and it has me a little bit cautious about using phylotyper as my package name.

We can go to the CRAN website at cran.r-project.org. CRAN is the archive of R packages; it's the Comprehensive R Archive Network. We can go to Packages, and then we can look at the table of available packages sorted by name. Let's go to "classify" and see if we have anything here by
classify. There's a bunch of things that have this: allergy classify, let's see, ClassifyR... So the ClassifyR hit was a Bioconductor page. If we look for classifier, that's going to be up in the c's, so we'll come down. I don't really like this name, again, because it's such a generic thing to Google; it's not very unique. So classifier doesn't show up, but really, if I'm going to do classifier, I want "classify" with an R.

So let's look at phylotype. Again scrolling down here, I guess I could search "phylot". There's no phylotype. There's phylotate, phylotools, phyloTop, and these all have to do with phylogenies, so that's different enough. These other phylo packages up here also have a lot to do with evolutionary things, so it gets us in the right ballpark. You'll see phyloseqGraphTest here. phyloseq itself, which is another popular tool within the microbial ecology world, is actually over in Bioconductor. Bioconductor is another repository where you can get R packages. I'm not totally sold that I want this to be in Bioconductor. I don't want my first package to be a Bioconductor package, because my sense is that Bioconductor is even more of a pain to get through than CRAN.

So I kind of like phylotyper, and I think I will run with that. If you don't like that name, or if you have a better idea for a name, let me know. Until we publish this to CRAN, I think we have a lot of latitude in what we call this, but for now my working name for this project will be phylotyper.

In the last episode I used the create_package() function from devtools. This time around I'm going to do something slightly different, mainly to show you a different way to do it. We'll do File, New Project, and then from here I'll go ahead and do New Directory, and then I want a new R Package. So it's going to be a package, and it's going to be
called phylotyper. I guess we could always also drop that e; it could be phylotypr. That's just too confusing. I don't have any source files at this point, and I'm going to put it on my desktop, and I'll go ahead and have it open in a new session. So we'll create that project. This then relaunches R in RStudio.

Already I see a small difference between using the create_package() function and using the dialog in RStudio, and that is that it creates a hello.R script, which is stored in my R directory, where we keep all of our code. Otherwise it's what we saw previously. We have the .gitignore. The .Rbuildignore file ignores the things that we don't want to end up in the final package that goes up to CRAN. We have a preliminary DESCRIPTION that we'll modify in a bit. There's a directory with documentation that they've already created, with hello.Rd. There's a NAMESPACE that, again, we don't want to touch ourselves; we want to let devtools and usethis modify NAMESPACE for us. There's the .Rproj file that keeps track of a lot of the settings that we're using. And we already saw that the R scripts, the R code, go into the R directory. Very good.

I want to get a variety of things set up in the rest of today's episode, so that when we come back we're ready to get going with coding. The first thing I'll do is use_git(), because I love using version control and think everyone should be using it too. It's complaining that it could not find function "use_git", and that is because I haven't loaded devtools. One of the tricks that I showed in the last episode is that we can create a .Rprofile file that automatically loads devtools whenever we start our project. So we can instead use usethis::use_devtools(). This usethis:: is a construction that we'll be seeing as we go about programming in our package. It allows me to use the use_devtools() function from the usethis package without calling library(usethis). So
this is opening up my global .Rprofile file. This is the .Rprofile file that I have in my home directory that tells R which CRAN repository to install packages from. This isn't where I want to put it. I want to put it within my project directory, and so to do that I'm going to go ahead and create a new text file. I'll call it .Rprofile. So that's there, and I now see it here as well. I'm going to go ahead and copy the code. I guess it had already copied it for me, it said, so I can just paste it. What this means is that it will require devtools. require() is a lot like library(), except that if the package isn't installed it returns FALSE with a warning instead of throwing an error. suppressMessages() means that it'll do it quietly. Okay, so I'll go ahead and save this, and now whenever I restart R those changes will take effect. For this session I'll go ahead and do library(devtools), and next time we go through this it'll load automatically.

Okay, where were we? Oh yeah, use_git(). So I do use_git(), and if I open this up a little bit bigger: there are eight uncommitted files; is it okay to commit them? I agree, so I'll do number one. Again, remember it usually gives us these options in different orders with different wording, so you have to be paying attention to what you're doing. It created a commit with the message "Initial commit". If I come over to my Git window and press refresh, I see that everything has been committed, and if I look at the history window, I see that it's got one commit already logged.

I now want to get this up to GitHub, and so I'm going to use use_github(). I could leave this empty. I'm going to go ahead for now and put this into my riffomonas account. If you're doing this on your own in parallel to me, don't give any arguments to use_github(). I'm going to do organisation = "riffomonas". It will then go through a dialog to set this all up. Is it okay to commit them? It made a change to the DESCRIPTION file. That is okay, and now
here we go: we now have our GitHub repository all set up for our phylotyper package. Awesome. We're out there. That's really cool.

Now we have to do some stuff. There are two things that this page makes me think about: number one, I need to put in a license; number two, I need to put in a README file. Let's start with the license. We'll go ahead and do use_mit_license(), a function that, again with no arguments, creates a LICENSE and a LICENSE.md. It puts the full license markdown file, LICENSE.md, into our .Rbuildignore file, because when we submit this, CRAN doesn't want the full text of the license. They use a much simpler version of the license, which is this LICENSE file here. We can go ahead and commit these changes, of course, so I'll do that now with the message "use MIT license". Commit that, close it.

The other thing I want to do is the README file. We'll go ahead and do use_readme_rmd(). This creates the Rmd file and puts that Rmd file into my .Rbuildignore. It creates this default README file, and it's got all sorts of good stuff in there that we'll come back and play with later. For now, what I want to do is build the markdown file, so I'll do build_readme(), again with no arguments. Now we see that we have both README files here. This README will then show up on our repository when we go back to GitHub. So that's looking great.

I'm going to go back to my DESCRIPTION file now, and we'll go ahead and edit this to make it a little bit more informative about what's going on. I'll say "Implementation of tools for classifying DNA sequences". I'm noticing that the author and maintainer information here is a little bit different from what we had with the regexcite package from the previous episode. Let me go ahead and pull that up so I can show you what I mean. I think I'm going to go ahead and grab this instead. So I'll grab these, and hopefully it won't cause
any problems. It's also grabbed an older version, right? We'll talk about version numbers later, but this is a much more preliminary version number than what we had previously. I'll go ahead and change my email address to my professional email, pschloss@umich.edu. I've got my ORCID iD that I'll pop in here. And then a description: "Use four spaces when..." okay. So I'll go ahead and do "Package for classification-based analysis of DNA sequences. Primarily implements naive Bayesian classifier from the Ribosomal Database Project". Cool. All right, that's good enough for now. Everything else there looks good, so I'll go ahead and close these files.

Let me go ahead and run document() to update all the documentation that there might be. These changes are all related to some preliminary documentation, so I'm going to go ahead and stage these and commit them as "preliminary effort to write documentation". Okay, commit that.

The other thing that I want to do is set up the infrastructure, the skeleton, for tests. To do that we'll use use_testthat(). Now we see that we have a tests directory. We have this tests/testthat.R script that we should not be modifying, so I'll go ahead and close that. And then we have our tests/testthat directory, within tests, that is currently empty because we haven't created anything that we want to test. Let's see what changed in the DESCRIPTION file. Yeah, it's added the testthat stuff. Okay, so that's all good. I don't know that I need to run document() again just because we changed the DESCRIPTION file, but... so it updated the phylotyper documentation. Okay, I guess it doesn't hurt to try.

The other thing that we can do is run load_all(). load_all() will load everything, all the R scripts in the R directory, into the current session. That works without any errors; basically we're looking for R errors. And then we can do check(). This is where I ran into problems last time. This goes through without an issue,
which I'm happy to see. I'm getting one error, and I think that error is with testthat, because I don't have any tests, so it's failing on that. I think what I'll do is go ahead and do use_test(), and the file in here is hello, so I'll do use_test("hello"). This then creates tests/testthat/test-hello.R, and then I can do test(), and it tests it. It passes. And now if I retry check()... so that got rid of the error. I do have one note, saying "Malformed Description field: should contain one or more complete sentences." Let me go back up to my DESCRIPTION file and see if I need two sentences: "This package primarily..." oh, and "We will likely include other methods of classification and possibly some methods of visualizing the data." Okay, so we're going to save that.

Let's try this check() again. The R Packages book suggests running check() frequently, because you don't want these problems to fester. And although it's only a note, and notes may not be the kiss of death for submitting to CRAN, we want to get those notes resolved as best we can. Again, I think this documentation is really going to evolve as we go through, so I'm not too worried about it at this point. Sure enough, that took care of the note, and now I don't have to worry about it.

Okay, so now we've got that all saved. I'm going to go ahead and add these changes to my repository. These were built around tests, so I'll go ahead and write "create testing infrastructure using testthat". There's another R-based testing infrastructure besides testthat, which I think is RUnit, but testthat, I think, is becoming a lot more popular these days. So I'll go ahead and commit that, close it, and then we'll go ahead and push this to the repository. Refreshing, I now see that we've got a lot more stuff in here, and sure enough we have our README file that has been converted. You'll see this plot is automatically generated. So this is looking good as a skeleton for our project, phylotyper. Again, if
you like that name, let me know down below in the comments. If you don't like that name, let me know down below in the comments, and definitely give me some suggestions. It's not too late to change the name, but it's kind of growing on me.

As I go through this, I hope you're also thinking about your own work and packages that you might want to create. If you can develop your package in parallel to my package, I think that would be really awesome, and it would really help you to reinforce the concepts and techniques that I'm going to be talking about in this series of episodes. All right, take care, and we'll see you next time for another episode of Code Club.
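
For anyone developing a package in parallel, the console calls from this episode can be collected in one place. This is only a rough sketch under some assumptions, not a definitive recipe: the wrapper name setup_package_skeleton() and its arguments org and first_test are hypothetical, made up for illustration. In the episode each call was typed one at a time at the console, with commits made in between, so bundling them into a function is just a way to keep the sketch from running anything until you ask it to. The calls themselves come from usethis and devtools.

```r
# Hypothetical wrapper (not part of devtools or usethis). Defining it does
# nothing; calling it from inside a new package project runs the episode's
# setup steps in order. Most steps are interactive and have side effects:
# commits, edits to DESCRIPTION and .Rbuildignore, and a new GitHub repository.
setup_package_skeleton <- function(org = NULL, first_test = "hello") {
  usethis::use_git()                       # initialise git and offer the initial commit
  usethis::use_github(organisation = org)  # org = NULL creates the repo under your own account
  usethis::use_mit_license()               # writes LICENSE and LICENSE.md
  usethis::use_readme_rmd()                # writes README.Rmd
  devtools::build_readme()                 # renders README.Rmd to README.md
  devtools::document()                     # regenerates the documentation files
  usethis::use_testthat()                  # creates tests/testthat.R and tests/testthat/
  usethis::use_test(first_test)            # creates tests/testthat/test-<first_test>.R
  devtools::load_all()                     # loads the code in R/ into the session
  devtools::test()                         # runs the testthat tests
  devtools::check()                        # runs R CMD check
}

# The project-level .Rprofile discussed in the episode looked roughly like
# this (shown as a comment so this sketch does not attach anything itself):
#   if (interactive()) {
#     suppressMessages(require(devtools))
#   }
```

Calling setup_package_skeleton() with no arguments follows the "don't give any arguments to use_github()" route, while setup_package_skeleton(org = "riffomonas") mirrors what was done on screen. Running the steps one by one at the console is still the safer choice, because several of them stop to ask questions.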