If you're like most biologists, when you finally sit down to write up your research findings, you're gonna fire up Microsoft Word and start typing. Well, today I'm gonna show you a different way, using R Markdown, that will really enhance the reproducibility of your research.

Hey folks, happy new year. Here's hoping that 2021 is better than 2020. I'm Pat Schloss and this is Code Club. In each episode of Code Club, I try to apply principles of reproducible research to an interesting biological question. Right before the break, I summarized all of the work that we had done in the previous six months or so. We had talked about the resolution of an amplicon sequence variant or an operational taxonomic unit, and what was perhaps the best resolution to use as a biological unit for inference, right? There's probably no perfect unit, but how can we balance the trade-off between splitting a genome into multiple bins and lumping multiple species together into a common bin? So we outlined it, we talked about journals that we might wanna go to, and we talked about the format of the manuscript that we might wanna use. We're thinking about a short-format paper, maybe around 1,200 to 1,500 words, with two figures to summarize everything that we had done.

Well, now we're ready to write that up and convert that outline to actual text. How will we do it? Well, we're not gonna write it directly in Microsoft Word. We're gonna use a really cool R package called R Markdown that will allow us to output our rendered document to HTML, Microsoft Word format, or my favorite, a PDF. So let's start talking about how we're gonna write our paper. R Markdown has been a complete game changer for my research group. For the past four or so years, all of the manuscripts that we have written were originally written as R Markdown documents, and the output of those documents is what we then submit to the journal. Why do we do that?
Well, you can imagine having a table of values, right? Perhaps there are 20 or 50 or 100 values in that table, and say you change something upstream in your data analysis pipeline. Well, you now have a new set of 100 values, or even 20 values, that you need to update in your table. What a pain in the ass, right? We've already talked in previous episodes about an R function called kable, from the knitr package, that allows us to generate tables. Well, wouldn't it be cool if you could put that code into a script or some type of file so that you don't have to update that table manually? Well, that's exactly what R Markdown does. It allows us to render tables based on data coming into an R Markdown document.

Even more important, however, is that I can embed R code within a sentence. So if you see something like "the average number of rrn copies per genome in Escherichia is seven," you can know that the seven wasn't typed out as a seven. It was written as R code. And so again, if I change something upstream, when I render my R Markdown document, that number gets filled in. Perhaps I add some other genomes that come out in the most recent version of the database, whenever that's released. Then that number will get updated automatically. I don't have to worry about manually updating all the numbers. I did that in all my papers before using R Markdown, and it's a royal pain in the butt. And invariably, I miss something or I make typos when I'm copying and pasting in the numbers. So R Markdown takes care of that for me.

This is a very different way of writing a manuscript for most of you. It's gonna require a different way of thinking about your references. Going back even further, it's gonna require a different way of thinking about writing. Writing in a text editor versus a word processor really will put the emphasis on the text you're writing rather than the formatting.
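To make the inline-code idea from this passage concrete, here is a minimal sketch run from the shell. The file name inline_demo.Rmd, the scratch directory, and the four copy numbers are all made up for illustration; in a real manuscript the value would be computed from the project's data rather than from a typed-in vector.

```sh
# Write a tiny R Markdown file into a scratch directory (hypothetical file name).
demo_dir=$(mktemp -d)
cat > "$demo_dir/inline_demo.Rmd" <<'EOF'
---
output: html_document
---

The average number of *rrn* copies per *Escherichia* genome is `r mean(c(7, 7, 6, 8))`.
EOF

# Render only if R is on the PATH; the inline expression is evaluated at render time.
if command -v Rscript >/dev/null 2>&1; then
  (cd "$demo_dir" && Rscript -e 'rmarkdown::render("inline_demo.Rmd")') ||
    echo "render failed: are rmarkdown and pandoc installed?"
fi
```

The source file never contains the number itself, only the expression inside the backticks, so changing the data and re-rendering is enough to update the sentence.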
So along with the text you're writing, it'll change how you think about inserting your references. It'll change how you think about your editing process. Do you use Microsoft Word? Do you use Track Changes? Well, how do you take Track Changes and put them back into an R Markdown document? I'm just gonna tell you, it's not easy. And so that's what I'm really excited to share with you over the next month or so as we go through developing this manuscript and eventually submitting it to a journal. It's not bad. And I find that for all the pain points and differences from using Microsoft Word to write a paper, it's really worth it to know that I've got a reproducible workflow that takes data as it comes in and as it's updated, and then updates the output manuscript. And I've also found that people respond really well to knowing that we use this reproducible process. So it's not easy, it's not familiar, but the more you practice with it, the more familiar it will become, and the easier it will become too. And you'll really see the power of this type of approach.

So let's get to it. We'll go over to our terminal. I will go to our project root directory, which I'm already in. And let me go ahead and open up my submission/manuscript.Rmd file that we developed in the last episode before the break. So this was the outline that we created in the last episode. And this is really written in Markdown, not R Markdown. The difference is that R Markdown has R embedded into the Markdown. And something that will set it apart as an R Markdown document is what we call the YAML header, Y-A-M-L. It starts with three hyphens and closes with three hyphens, okay? And so what we can then say would be output: and then html_document. I'm gonna save that.
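As a rough picture of what the top of the file looks like after this step, here is a sketch that writes a stand-in manuscript into a scratch directory. The two section headings are placeholders, not the actual outline from the previous episode.

```sh
# Scratch copy only; the real file in this project is submission/manuscript.Rmd.
demo_dir=$(mktemp -d)
cat > "$demo_dir/manuscript.Rmd" <<'EOF'
---
output: html_document
---

## Abstract

## Introduction
EOF

# The header is the block between the two lines of three hyphens.
head -n 3 "$demo_dir/manuscript.Rmd"
```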
And I'm gonna come back to my terminal. You could do this in RStudio, but I find that with my workflow it's easier to work in a text editor that I like and am comfortable with, and to do things from the command line. I showed you a little bit about how to run R from the command line several episodes ago, so we can go ahead and type R to fire it up. I'm gonna need to do library(rmarkdown). You should have rmarkdown installed if you installed the tidyverse. And then you can do render("submission/manuscript.Rmd"), let this run, and it outputs manuscript.html. I can then come to my documents and open up my submission directory, where I see I now have manuscript.html, and you see we have a web page of my outline, which looks pretty nice, right? It's not perfect, but who cares?

Now, I don't typically work in HTML when I'm working on my manuscripts. I prefer to work either in a Word file or, actually, in a PDF. I've got my handy-dandy iPad that I like to open my PDFs on and then write on them by hand. I even like working with actual paper. So I'm one of those dinosaurs who likes to print things out and edit, which I like even better than Track Changes in Word. But the nice thing about R Markdown is that we have this flexibility. So you could output things as an HTML document like this. Alternatively, we could put in word_document, save that, and then re-render. And again, if we come to our Finder, we now see that we have manuscript.docx. Open that up and we see that we have a Word file, right? There are ways to get the Word file formatted the way you like using something called a reference file. I haven't found using those reference files to be very satisfying, frankly. It's just a pain in the butt. So we'll output the Word file because I know that a lot of people like working in Word, but I also want to output things as a PDF. So I'll minimize that.
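The rendering steps in this passage are typed into an interactive R session. A non-interactive sketch of the same two calls is below, using Rscript -e on a throwaway file. One difference to note: instead of editing the header to say word_document, the second call passes output_format to render(), which overrides the header for that call only. The scratch directory and body text are placeholders.

```sh
demo_dir=$(mktemp -d)
printf '%s\n' '---' 'output: html_document' '---' '' 'Outline text.' \
  > "$demo_dir/manuscript.Rmd"

if command -v Rscript >/dev/null 2>&1; then
  # Uses the format named in the YAML header, so this writes manuscript.html.
  (cd "$demo_dir" && Rscript -e 'library(rmarkdown); render("manuscript.Rmd")') ||
    echo "render failed: are rmarkdown and pandoc installed?"

  # Same file, but asking for Word output for this call only.
  (cd "$demo_dir" &&
    Rscript -e 'library(rmarkdown); render("manuscript.Rmd", output_format = "word_document")') ||
    echo "render failed: are rmarkdown and pandoc installed?"

  ls "$demo_dir"
fi
```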
And so to get a PDF document, I can say pdf_document and save that. To get this to work, you need to have a package in R called tinytex. Tiny-tex? Tiny-tech? If you've got R Markdown, you don't need to run library on tinytex. It'll run for you. This is a small version of LaTeX that works really well with R Markdown. And so I'd strongly encourage you to install tinytex (I'm not sure how you pronounce these things). It works really well for rendering PDFs. So know that you have to have tinytex or some other installation of LaTeX to convert things to a PDF. Okay. So if I put in pdf_document and then I do render, this will generate manuscript.pdf right here. And you can see I've got a PDF, right? I'm already up to two pages. Wow, look at how much I've typed. All right. And so this is a very generic format for a LaTeX document. You look at this and you say, oh yeah, they're showing off, they wrote that in LaTeX. So maybe we are showing off a little bit.

So like I said, I would like to output both the Word document and the PDF document. What I can do is put a new line under output: and write word_document: default, and then on the next line pdf_document: default. And these lines are indented to start. I think that's important. I can then render submission/manuscript.Rmd, and I need to add a comma and output_format = "all", and that will output both formats of the manuscript for me. You'll see here that it output manuscript.docx and manuscript.pdf. And if we look at the output here, if I sort by date modified, I see that my .docx and my .pdf were generated at the same time. The other thing I see here is that I've got manuscript.tex as an output file. Normally it doesn't quite happen that way. Instead, what I'll need to do is add an argument to pdf_document on the next line.
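Here is a sketch of the two-format header and the matching render call, again on a scratch file with placeholder body text. The indentation under output: is two spaces; YAML does not accept literal tab characters for indentation, so an editor that inserts spaces when you press Tab is what you want.

```sh
demo_dir=$(mktemp -d)
cat > "$demo_dir/manuscript.Rmd" <<'EOF'
---
output:
  word_document: default
  pdf_document: default
---

Outline text.
EOF

# The PDF step needs a LaTeX install; tinytex::install_tinytex() in R is one way to get one.
if command -v Rscript >/dev/null 2>&1; then
  (cd "$demo_dir" &&
    Rscript -e 'library(rmarkdown); render("manuscript.Rmd", output_format = "all")') ||
    echo "render failed: are rmarkdown, pandoc, and LaTeX available?"
fi
```

Without output_format = "all", render() builds only the first format listed in the header.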
So I'll get rid of that default and I'll add keep_tex: true. That will keep the .tex file rather than deleting it for me. So again, if I save that and render with output_format = "all", it outputs everything for me. And if I look in my Finder, I see that I've got those three files, and instead of 11:14, the time is now 11:15. So everything got updated. I'm personally going to get rid of this manuscript.html, but know that if you wanted to output an HTML document, you would add html_document: default like that, use output_format = "all", and all three of these would get generated.

Yihui Xie, who's the developer of R Markdown, actually encourages people to work in Markdown or in HTML, something that's kind of a low-formatting output. Because at this point, you really want to put your emphasis on writing text and not on formatting things. Formatting should be the last thing you do. That being said, how things look is really important to me as I'm working on them. If something's not super attractive to me, then that's not making me feel good, right? It's not helping my writing process. So again, I like to see the PDF so I can see that I'm making progress, that I'm cranking through and putting out pages. You do you, but know that there are different schools of thought on that.

Okay. So we've got our PDF document as a type of output here. If we look at the PDF, there are a couple of things I notice that I may or may not like. One is the font. I prefer a sans serif font, so one without the serifs. I also like double spacing, and I also like one-inch margins. This actually looks like it's pretty close to a one-inch margin, but we can set those things. The other thing I like to have for manuscripts is line numbers. We're really getting ahead of ourselves putting in line numbers, but it's easiest to do some of these formatting things now rather than later. The other thing I like to do is put in page breaks between my sections.
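The keep_tex change from the start of this passage would look roughly like this in the header. As before, this writes a scratch file with placeholder body text rather than touching the real manuscript.

```sh
demo_dir=$(mktemp -d)
cat > "$demo_dir/manuscript.Rmd" <<'EOF'
---
output:
  word_document: default
  pdf_document:
    keep_tex: true
---

Outline text.
EOF

# keep_tex sits one level deeper than pdf_document because it is an option of that format.
sed -n '2,5p' "$demo_dir/manuscript.Rmd"
```

Note that pdf_document: now ends with a colon and has no default after it, since its options are spelled out on the following line.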
Psychologically, this gives me a boost because I can go from a one- or two-page paper like we have here to a six-page paper. I've already written six pages. Wow. Look at that. I also like to have a title page. So what I'm going to do today is set up those things with you and show you what they look like and how we go about doing it. It may seem scary to think about using LaTeX to do all this, but trust me, I know very little LaTeX. What I am showing you is the extent of my knowledge of LaTeX. And frankly, what I do most of the time is copy and paste the code from one manuscript to another to get the formatting I really like. Also know that you can easily Google a lot of these things. You could Google "R Markdown double spacing PDF" and you would get the solution that I'm going to share with you today.

Okay. So to get our one-inch margins, I'm going to insert another line here into the YAML and do geometry:, and we will say margin=1.0in. And to make things look different, let's do a half-inch margin and see what that looks like. We'll render that. And then if we look at our manuscript, we see that we do have smaller, half-inch margins here. Okay. But I want one-inch margins because I think those are easiest to look at. And again, we can render that and look at the output, and we see we've got our one-inch margins. Excellent.

The next thing I'm thinking about is my font size. I like to have an 11-point Arial or sans serif font. I know people hate Arial or Helvetica. I like it. That's kind of what I write in. It makes me feel good to write in that format. Know that if you want to change your font, there are many ways to do that. Again, Google is your friend. So I can say fontsize: and then 11pt for 11 point. And then to tell it what font I want to use, kind of the easiest way is to have a line called header-includes.
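With the margin and font size added, the header might look like the sketch below. The geometry and fontsize lines sit at the top level of the YAML, not under output:. The scratch directory and body text are placeholders.

```sh
demo_dir=$(mktemp -d)
cat > "$demo_dir/manuscript.Rmd" <<'EOF'
---
output:
  word_document: default
  pdf_document:
    keep_tex: true
geometry: margin=1.0in
fontsize: 11pt
---

Outline text.
EOF

# Show just the two new settings.
grep -E '^(geometry|fontsize):' "$demo_dir/manuscript.Rmd"
```

Changing the first of these to margin=0.5in gives the half-inch margins tried in the passage.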
And this then is where we can insert special LaTeX packages we want or special LaTeX commands we want to run. Here we're going to set these off again with an indent and now a hyphen to mark each entry. For the first one, we're going to say \usepackage{helvet}. And then we will say renewcommand. I need a backslash. So \renewcommand*, then \familydefault, and then, again in curly braces, \sfdefault. Again, I don't know what this does. I know this works to get me the sans serif font that I want. Okay. And again, I can save that. If you've never run LaTeX before or used these packages before, sometimes the tinytex package will fetch these LaTeX packages for you. So know that your output might look a little bit different from mine because it's going to have to install some of those things. So it didn't update. And I see here that I've got head-includes rather than header-includes. Let me see if that solves the problem. It would have been nice for it to give me an error message, but it didn't. So now I get my sans serif font and things are looking good. Okay.

The other thing that I like to have in my manuscripts is for them to be double spaced. And so we're going to again do \usepackage, this time with setspace, and then \doublespacing. Let's run that. And now we see we've got double-spaced text. And I wonder, if I do \triplespacing, what will happen? Yeah, it's unhappy. It doesn't like triple spacing. Again, if you want triple spacing or one-and-a-half spacing, you can look via Google at how to set that, but we'll stick with double spacing, and that will look good. Great. So we're using an Arial, or Helvetica, 11-point font. We've got our double spacing. The other thing I want to include is line numbers. And so we will do \usepackage with left in square brackets and then lineno in curly braces. So this is putting my line numbers on the left side.
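The font and spacing entries described above could be written as in this sketch. To keep it short the header lists only pdf_document, and the body text is a placeholder. The key name has to be spelled header-includes exactly; as the passage shows, a misspelled key is ignored without any error message.

```sh
demo_dir=$(mktemp -d)
cat > "$demo_dir/manuscript.Rmd" <<'EOF'
---
output: pdf_document
geometry: margin=1.0in
fontsize: 11pt
header-includes:
  - \usepackage{helvet}
  - \renewcommand*\familydefault{\sfdefault}
  - \usepackage{setspace}
  - \doublespacing
---

Outline text.
EOF

# Needs R, rmarkdown, pandoc, and a LaTeX install that has helvet and setspace.
if command -v Rscript >/dev/null 2>&1; then
  (cd "$demo_dir" && Rscript -e 'library(rmarkdown); render("manuscript.Rmd")') ||
    echo "render failed: check rmarkdown, pandoc, and the LaTeX install"
fi
```

For what it's worth, \familydefault is the font family LaTeX uses for body text and \sfdefault is its sans serif family, so the second entry switches the whole document to sans serif, which the helvet package has set to Helvetica.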
And then we say \linenumbers for it to insert those line numbers in the left-hand margin. Again, you don't have to know much LaTeX to get what you want. You could probably do a lot more sophisticated things if you knew LaTeX, but I'm not looking for big sophistication. Right. So you might say, well, how do I change the font size of the line numbers? I don't care. This looks good enough, and I'm ready to move on. Right. So we've got line numbers. We've got double spacing. We've got the font. And this is looking good. Also, you'll notice that it automatically puts in page numbers for us. So I'm happy there.

So the next thing I want to think about is a title. You could go ahead and put the title in your YAML. I prefer to put it here in the body. You'll notice that in the exploratory data analysis we did, we put the title and the author into the YAML header. I'm going to put them here in my manuscript instead, and I'm going to use a single pound sign. To be a little bit cheeky, and because I don't have a better title at this point (the title is probably the last thing that you should worry about writing), I'm going to do a play on the title of the paper that said amplicon sequence variants should replace operational taxonomic units in marker gene analysis. And instead I'm going to say should not. So: Amplicon sequence variants should not replace operational taxonomic units in marker gene data analysis. That's going to be my title. One thing to make this a cover page is that I'll say \newpage, and that will insert a page break between my title page and my abstract page. So again, if I render this, I'll see a big title across the top along with a page break. And you can see I've got my title and then a page break here. Something else I would like to have in here is a running title. ASM journals, at least, require a running title.
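Putting the line-number entries together with the title and page break from this passage gives something like the sketch below. The Abstract heading is a placeholder for whatever follows the title page, and the header is trimmed to the entries relevant here.

```sh
demo_dir=$(mktemp -d)
cat > "$demo_dir/manuscript.Rmd" <<'EOF'
---
output: pdf_document
header-includes:
  - \usepackage[left]{lineno}
  - \linenumbers
---

# Amplicon sequence variants should not replace operational taxonomic units in marker gene data analysis

\newpage

## Abstract
EOF

# The title is an ordinary Markdown heading; \newpage is raw LaTeX passed through to the PDF.
sed -n '8,12p' "$demo_dir/manuscript.Rmd"
```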
And these, if you think about a paper typeset as a PDF, are kind of the brief title that goes at the top of every page. And so again, because I'm not trying to come up with the perfect title or running title at this point, I just want a placeholder. I'll put ASVs versus OTUs. Then I also want to put in my name, so I'll say Patrick D. Schloss. If I had other co-authors, I'd list them there as well. And I could think about having a dagger or some other symbol to indicate to whom correspondence should be addressed. And then I will put my address, which will be Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan 48109. And that would be good.

So a question that we might have here is how to format these things. You can get a dagger symbol by using \dagger. So let's render that and see what this all looks like. Again, none of this matters for getting us writing. It makes us happy about how things look. The dagger actually, I think, needs to be in braces, opening with a dollar sign and a brace and then ending with a dollar sign, because that's math notation. Again, these are things that I've picked up over time in using this type of stuff. So it's output. Let's see what it looks like. And we see we get that dagger. So that looks good. To get it to be superscript, I think I want to put a caret inside here, which will give me a superscripted dagger next to my name. And if we look here, we see that we've got the superscript dagger next to Pat Schloss.

The only other thing I notice is that my address, which I typed on several lines, comes out run together. I can change that by putting two spaces at the end of each of the lines in my address. We run that. Yeah. Why is it getting rid of that? I wonder. I guess if I put those on separate lines... These are kind of the quirks of working with Markdown. And so there, then, it puts my address on separate lines. This is all compressed.
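Here is one guess at what the author block looks like in the source. The exact characters typed on screen aren't recoverable from the audio, so the bold "Running title:" label, the ${^\dagger}$ spelling of the superscript, and the file name title_page.md are assumptions. The address lines are written with printf so that the two trailing spaces, which are invisible in most editors, show up in the format string.

```sh
demo_dir=$(mktemp -d)
{
  printf '%s\n' '**Running title:** ASVs versus OTUs' ''
  # Math mode ($...$) with a caret gives a superscripted dagger after the name.
  printf '%s\n' 'Patrick D. Schloss${^\dagger}$' ''
  printf '%s\n' '$\dagger$ To whom correspondence should be addressed' ''
  # Two spaces before the newline force a line break within the paragraph.
  printf '%s  \n' 'Department of Microbiology and Immunology' 'University of Michigan'
  printf '%s\n' 'Ann Arbor, Michigan 48109'
} > "$demo_dir/title_page.md"

# cat -A (GNU) or cat -e would mark line ends so the trailing spaces can be seen.
cat "$demo_dir/title_page.md"
```

Some editors strip trailing whitespace on save, which may be why the first attempt in the passage appeared to have no effect.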
Again, we're kind of into the weeds of things that don't matter a whole lot, but we can change the spacing between our different lines with \vspace. So I'll say 10 millimeters between the title and the running title. And then maybe here as well. And then maybe I can play around with some of this. Let me put this up here. And again, none of this stuff is particularly important. Let's see what this looks like. So we see that we've got spacing that looks pretty decent. Maybe I'll put 20 millimeters, or two centimeters. The other thing I'll put towards the bottom, in bold, is observation format. That will cap off my title page, knowing that the title and running title are not by any means finalized. So it looks good. Maybe I'll just put a colon at the end of this. Oh, it's so easy to futz with stuff when it doesn't really matter. All right. So let's render that and then leave it alone. And again, that looks pretty decent. You could play with all sorts of formatting on here. But let me just tell you that no journal is going to care how your cover page is formatted. I do it because I care, right? It's how I want it to look, because I'm going to be working in this environment a lot.

Something else that maybe I should add to my contact information is my email address, so pschloss@umich.edu. A special way to do this so that it's clickable within a PDF is to use some special LaTeX. So \href, then in curly braces mailto: followed by the address, and then my email address again in a second set of curly braces. The benefit is that if we look at the output, we see that it's now clickable, and if I clicked on this, it would open an email to me. All right. So that's all good.
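A sketch of how \vspace and \href might sit on the title page follows. Where exactly each \vspace goes, the wording around the email link, and the file name title_page.md are guesses for illustration; only the commands themselves come from the passage.

```sh
demo_dir=$(mktemp -d)
cat > "$demo_dir/title_page.md" <<'EOF'
# Amplicon sequence variants should not replace operational taxonomic units in marker gene data analysis

\vspace{20mm}

**Running title:** ASVs versus OTUs

\vspace{20mm}

Patrick D. Schloss${^\dagger}$

$\dagger$ To whom correspondence should be addressed: \href{mailto:pschloss@umich.edu}{pschloss@umich.edu}

\vspace{20mm}

**Observation format**

\newpage
EOF

# First braces of \href: the link target. Second braces: the text shown in the PDF.
grep -F '\href' "$demo_dir/title_page.md"
```

These are LaTeX commands, so they affect the PDF only; the Word output won't pick up the spacing.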
The other thing that I like to do to give my manuscript some structure is to put these \newpage commands on separate lines between the different sections, to break up the text. So I'm going to do that between each of my sections to give the manuscript a little bit of structure and spacing. And again, if I render this and then look at the output, voilà, I have an eight-page paper. Look, we're just cranking out text. If only we looked at the page count and not the actual contents of the pages. Okay. So again, this is the seed, the nucleus of our manuscript, and it's in good shape. And now we're really ready to pay attention to writing the text and writing out our story.

Okay. So I'm going to close that. I'm going to go ahead and quit out of R. But one thing I need to grab here is my render statement, because I'm going to make a rule in my Makefile. I will come to the bottom, and I'm going to create a rule for submission/manuscript.pdf, and also for the .docx. It's going to depend on submission/manuscript.Rmd. That's the only prerequisite that we have so far. I'm then going to put in the instructions, which will be R -e and then something in quotes. The -e says to execute what follows in quotes in R. So this is kind of like Rscript, except we're not putting this in a script. We're rendering it directly from the command line in R. I'll show you what this looks like. We're going to do R -e, and then we can paste in that render command. You'll notice that we've got conflicting quotes, so I'm going to put this in single quotes. And then let's go back to our terminal and do make submission/manuscript.pdf. It says it's up to date. So maybe I need to just kick this to get it to trigger again. And so it runs, and it says could not find function render. If you recall, the first thing we did when we opened up R was to do library(rmarkdown).
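The rule being assembled here, with the library(rmarkdown) call that the missing-function error points to, might look like the sketch below. It writes a throwaway Makefile into a scratch directory with an empty stand-in for the manuscript, then asks make for a dry run. The layout of the real Makefile is not shown in the episode, so treat the exact spacing as an assumption.

```sh
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/submission"
touch "$demo_dir/submission/manuscript.Rmd"   # empty stand-in for the real manuscript

# First line: the two targets and their prerequisite.
# Second line: the recipe, which make requires to start with a tab (\t in the format).
printf '%s\n\t%s\n' \
  'submission/manuscript.pdf submission/manuscript.docx : submission/manuscript.Rmd' \
  "R -e 'library(rmarkdown); render(\"submission/manuscript.Rmd\", output_format=\"all\")'" \
  > "$demo_dir/Makefile"

# -n prints the command make would run without actually starting R.
if command -v make >/dev/null 2>&1; then
  make -n -C "$demo_dir" submission/manuscript.pdf
fi
```

The single quotes around the R code are what let the double-quoted file name and "all" sit inside it without conflict.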
And so we need to add that here: library(rmarkdown), close parenthesis, and then a semicolon. We save that and run it. Then this is going to regenerate all of our files for us. And if we look at submission, we now see that we've got the PDF file, the .tex file, and the .docx file, all generated for us. Something I'm wondering about, now that we've closed the old version of the manuscript, is whether the Word version has the line breaks, because that's kind of a LaTeX thing. Yeah, it has that. But it doesn't respect our spacing or line numbers or things like that, because that's all from LaTeX rather than from Word. So again, if you used one of those reference files, you'd basically put the formatting you want into the reference file, generate the Word file using it, and you would then get the spacing and the font and the line numbers and all that stuff. We don't have that in the Word version here.

Invariably, because I like working with the PDFs, I don't worry about the Word file at all until I'm ready to submit. And in fact, often not even then. If I submit to an ASM journal, they don't need a Word file from the beginning. They'll take a PDF version, and so I'll give them a PDF version. Then if they like it and accept it and need a revised version, as we always need revisions, I'll generate a .docx file at the very end. And then I'll go through and manually update the formatting so it looks halfway decent. But again, we're a ways away from needing to worry about that. So we really just worry about generating the PDF.

Okay. So again, hopefully today gets you a little bit more comfortable and familiar with using R Markdown. We'd seen it previously when generating those exploratory analysis scripts. It's going to be largely the same as we go forward now. I like looking at the PDF because it looks like a manuscript to me, right?
And I can see it grow and I can see it develop at the same time. Don't get so overwhelmed by formatting that you lose track of the real job, which is writing text and pounding out a paper that is good, that you can then go back and revise, and only then worry about formatting. Papers are not rejected because of formatting. Typically they're rejected because they don't tell a story or the science isn't convincing. So that's what we're going to start talking about in the next episode.

So play around with R Markdown. See if you can use it with your own projects. Please also be sure to tell your friends about Code Club and what we're doing here. Beyond this project, I have some really exciting things planned for February and beyond, so be sure that your friends know about this. It's great to have something like this come out regularly to reinforce your New Year's resolutions to learn more data analysis skills and to improve your own reproducible research practices. All right, so until next time, see you again for another episode of Code Club.