Welcome back to the Riffomonas Reproducible Research Tutorial series. I hope you were able to make it through the last tutorial, where we added an R script to our driver file to generate a figure with an ordination plot. I'm constantly amazed by the growing number of resources that are available in both R and Python to make our data analyses more reproducible. One of these new resources is R Markdown. This is a blend of Markdown, which we've already discussed when talking about documentation, and R code, which we discussed in the previous tutorial. R Markdown has revolutionized how my research group approaches scientific publishing. Emerging from the Python ecosystem, there's a similar tool called Jupyter Notebooks. Both R Markdown and Jupyter are examples of literate programming, in which narrative text is blended with computer code. The results are pretty marvelous. Have you ever had to update numbers in a manuscript after changing a parameter, using a different statistical analysis, or adding more data? It's painful. More than once, I've had errors slip into my manuscripts because I forgot to update all of the numbers. And tables with dozens of values? Ah, what a pain. Literate programming removes that pain. If you look at the papers from my research group that are posted on GitHub, you can find R Markdown documents that contain the R code behind any summary statistic or p-value. The tables are generated by R code. Furthermore, combined with the idea of a driver script, which we've been developing in this tutorial series, not only can you find the code for the summary statistic, but you can track it all the way back to a raw data file. If you remember back to the introduction for this series, I mentioned an April Fool's joke we played, introducing a write.paper function into mothur. Well, R Markdown is basically that function. I can't wait to show you how to write research reports and manuscripts using R Markdown.
Join me now in opening the slides for today's tutorial, which you can find within the Reproducible Research Tutorial series on the riffomonas.org website. Before we get going and discuss literate programming with R Markdown, I have a brief pop quiz for you that will hopefully jog your memory of how we can work with R from our pipeline. So the question is, how would you run a function, for example plotNMDS, which we saw before in our R script that is also called plotNMDS.R, at the bash shell prompt? Take a couple of moments and see if you can look back through your notes, or if you remember off the top of your head, how you would run this function from this R script without first going into R. Hopefully that jogged your memory and you had some recollection of doing this in the previous tutorial. Recall that we can use R -e to execute a string of commands from the bash command line. Those commands need to be embedded within single or double quotes, and the quotes need to match. The individual commands can be separated with a semicolon. So here, bound within double quotes, we have two commands: a source function and a plotNMDS function. The source function loads the code from code/plotNMDS.R. That file name is wrapped in single quotes. Then we run plotNMDS, which takes a .axes file that is also wrapped in single quotes. And that's a long file name. So again, if we run this from the bash command prompt, we don't have to go into R, but the code will still get run and then it will quit out of R. This is really nice if we want to automate our pipeline, because we can run R code without having to manually go into R. For today's tutorial, I hope to help you learn how to express a manuscript as an extension of a programmatic analysis, to really see that the text we type, our narrative, can be fused with programmatic analyses. And this is what we call literate programming.
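Going back to the pop quiz for a moment, here is a minimal sketch of the two commands that go inside the quoted string. The script written to a temporary file is a made-up stand-in for the real plotting script, and the file name example.axes is a placeholder, so treat this as an illustration of the source-then-call pattern rather than the actual pipeline code.

```r
# Hypothetical stand-in for the plotting script: it only defines a function.
script <- tempfile(fileext = ".R")
writeLines(c(
  "plotNMDS <- function(axes_file) {",
  "  paste('would plot', axes_file)",
  "}"
), script)

# These are the two commands that would sit inside the quoted string,
# separated by a semicolon: load the function, then call it.
source(script)
result <- plotNMDS("example.axes")  # placeholder file name
print(result)

# From bash, the same two steps are handed to R as one string, along the lines of:
#   R -e "source('path/to/script.R'); plotNMDS('path/to/file.axes')"
# The outer double quotes wrap the whole string; the inner single quotes wrap the file names.
```

Because the whole string is run non-interactively and R then exits, a line like the one in the final comment can sit in a driver script next to the other pipeline commands.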
Within R Markdown, we use code chunks and inline code to insert information into the text, so we'll talk about how to do that. We'll also implement advanced R Markdown features, including citations, figures, and tables, to make it more of a polished manuscript. And then we'll talk about YAML and the YAML front matter that can be used to control the format of your output. So looking at this paragraph that you've already seen and copied into your main readme file, this is the paragraph called Scaling Up from the Kozich analysis. You'll notice that there's a variety of numbers in here that we had to calculate somewhere, right? Down here, there's an error rate of 0.07% for the two mock communities, another 0.01% for the curation, and we had 14,094 sequences. All of these numbers would need to be updated if we changed the data set, if we changed how we calculated error, or if we changed any steps in the pipeline. If you look at this one paragraph, there are maybe a dozen or so different numbers that are being generated elsewhere. Here in these red rectangles are those numbers: the number of samples, a citation number, numbers of sequences, numbers of reads per sample, p-values, and so forth, all of which we had to generate somewhere and then insert into the text. Similarly, here's a table from that same paper with maybe 100 or so different numbers. If we changed the pipeline or added another run of data, we would have to update this table. That could be a royal pain, because you're sure to introduce all sorts of errors in doing that. So to think about what goes into a paper beyond the narrative, we have things like counts (the numbers of things), calculated values, p-values, references, figure numbers, and the actual figures and tables themselves. Each of these can be added to a manuscript using literate programming.
Literate programming is the idea that we can merge code with text generation and formatting. Literate programming was developed and championed by a world-renowned computer scientist named Donald Knuth. Currently there are several modern options. As I mentioned in my introductory remarks, there are Jupyter and R Markdown, from the Python and R environments, respectively. From these types of literate programming, we can get many types of output, whether it's a Markdown plain text file, a PDF, a Word .docx file, or HTML code for rendering on a website. Thinking about R Markdown, a lot of applications have come out of the rmarkdown and knitr packages, which are two packages we'll use that work together. People have used R Markdown to write books and to create slide decks, blogs, and interactive websites. It's been really powerful. I liken it to a cookbook. If you go onto Amazon and look at the reviews for a random cookbook, invariably you'll find reviews that say, this cookbook described how to make these types of cookies, but the cookies taste awful. Well, think about a slide deck where you're teaching R or you're showing somebody your code and your figure. It's like giving someone the cookie and the recipe used to make the cookie, right? We can have the plot or the p-value and the code right there with it, so we know how those values were calculated or how those images were generated. I have taught entire semester-long courses using slide decks built with R Markdown. And the reason we're here discussing this today is that I now write manuscripts entirely in R Markdown. Again, if we think about this paragraph and the various values that we might want to calculate using R, we can now look at this example of R Markdown. Not all of the numbers have been converted to R Markdown, but you might look over here and you'll see these back ticks with R and then R code embedded.
This R code, when it's rendered by knitr and rmarkdown together, will spit out a number telling the reader how many pairs of reads there were per sample. This is going to be pretty foreign to you right now, but by the end of this tutorial hopefully you'll understand what's going on here. Hopefully you recall that we previously used Markdown in the paper airplane example and in our readme files for providing documentation. R Markdown is the idea that we can take that Markdown and embed R code into it to generate text, tables, and figures. The rmarkdown package for R uses the knitr package and other goodies to convert R Markdown files into a variety of formats. You can see the schematic from RStudio, where you write in R Markdown, use knitr to convert that to Markdown, and then a program called Pandoc converts that Markdown into a variety of formats. This whole pipeline is fairly opaque to you. All you need to worry about is writing the R Markdown and setting the output you want, and you'll get it. That is all done using knitr and rmarkdown. The outputs we can get include Markdown for basic text. Again, if we were to look at this on a site like GitHub, which automatically converts Markdown to HTML, that would serve our purposes. Alternatively, we can generate HTML-based websites. This is really nice for lab reports and affords you limitless formatting options using HTML, CSS, and JavaScript if you want. You can also have things output as a .docx file. This is great for manuscripts. We find that it's easier for our collaborators to work with the .docx file than, say, the R Markdown file. The formatting is a bit limited. You can format using a template .docx file, where in that template file you provide the formatting that you want: things like which font you use, what size, double spacing, line numbering, things like that. We can also output PDFs.
I find that this is my preference for manuscripts, but I find that it's harder for my collaborators to work with, because they tend to want to get into the text and mess with it and do things like track changes and all the things they're used to doing. But there is then limitless formatting via LaTeX. You don't need to know LaTeX to generate a PDF, but it does allow you to have more options in formatting. There are various helpers within the new project directory template that we'll talk about later. So again, there's a range of outputs for papers. You need to find what works best for you and your collaborators in terms of the output. We can do all of this through RStudio. They also have a nice notebook-like interface and various helpers to make things easier. But we're not going to use RStudio. I'm sorry. We're trying to emphasize the importance of having a workflow that's automated and can run without our intervention, and having to go into a graphical interface kind of limits that. I'm also assuming that you'll likely want to run everything on a cluster and may not be able to run a graphical interface on your server. I know that there's a huge barrier to using a graphical user interface on my local Flux cluster or even on Amazon. So what we're going to do is return to our Kozich analysis. We're first going to log into our instance and start FileZilla, so it'll be easier to transfer files back and forth and see what's being generated. Previously, in the tutorial on organization, you copied the Scaling Up paragraph into your readme file. We're going to move that now to our submission/practice.Rmd file. This practice.Rmd file is a new file that we'll generate and use for practicing various aspects of R Markdown. So I've been able to log into my instance, and I'm going to move to my Kozich_ReAnalysis_AEM_2013 directory. And we're in the right place.
So again, I'm going to open my readme file, and I see my Scaling Up paragraph here. I'm going to copy that, Ctrl-X out, and run nano submission/practice.Rmd. I paste, and that gets me my Scaling Up paragraph. I forget whether my readme also had the figure legend; it doesn't look like it. So I'm going to open submission/practice.Rmd back up and copy in the figure legend for figure four. Again, this is a bit artificial, because rarely would we want to go back and add the R code afterwards. You certainly can, but it's much easier to write the R code as you're actually writing the paper. Okay, we'll save this and exit out. Next, we'd like to look at our .Rprofile file. If we do ls, you'll notice we don't see an .Rprofile file, but if you do ls -a, here in the bottom right we have a .Rprofile file. This is the file that is loaded whenever we run R. We talked in the last tutorial about how we don't want to put a lot of stuff into this .Rprofile file. So let's see what's in there. We'll do nano .Rprofile, and it says library(knitr) and library(rmarkdown). Then it sets paths for where to put things from knitr and from rmarkdown, and it helps to normalize our path and to get things in the right place. So everything looks good here. One other thing I'll add is that the AWS instance you have already has knitr and rmarkdown installed, so you don't need to install those packages. We'll go ahead and reopen our submission/practice.Rmd file, and we're going to start the process of converting this into an R Markdown document. At a minimum, to make it an R Markdown document, we need to add what's called a YAML header. A YAML header is denoted by having three hyphens on the first line, some information, and then three hyphens to close the YAML header. In between the two sets of three hyphens, we need to add information that tells rmarkdown and knitr, and eventually Pandoc, what exactly needs to be done. So in here we'll put output, colon, space, html_document.
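As a minimal sketch of what that file looks like at this stage, the R code below writes a tiny R Markdown file to a temporary directory. The paragraph text is a placeholder rather than the real Scaling Up text, and the render step only runs if the rmarkdown package and Pandoc happen to be available, so this is an illustration under those assumptions and not the tutorial's own file.

```r
# A minimal R Markdown file: a YAML header between two lines of three hyphens,
# followed by ordinary Markdown text.
rmd_lines <- c(
  "---",
  "output: html_document",
  "---",
  "",
  "Placeholder paragraph standing in for the Scaling Up text."
)

# Written to a temporary directory here; in the tutorial the file lives in submission/.
rmd_file <- file.path(tempdir(), "practice.Rmd")
writeLines(rmd_lines, rmd_file)

# Render only if rmarkdown and Pandoc are installed on this machine.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, quiet = TRUE)
}
```

The header is the only thing that distinguishes this from a plain Markdown file; everything after the closing hyphens is treated as document text until a code chunk appears.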
Our YAML headers will get a little bit more complicated as we go along, but it's important to note that we never want to put a tab in our YAML header. Any kind of indentation along the left margin, we want to do with spaces. So we'll go ahead and save and quit this. Then from our command line, we can render this with R -e, double quote, render, open parenthesis, single quote, submission/practice.Rmd, single quote, close parenthesis, close double quote. Remember, we don't need to put library(rmarkdown) or library(knitr) in here, because those are automatically loaded as part of our .Rprofile file. We'll run this and it runs pretty quickly, and we see output created practice.html. Now I need to connect to my AWS instance in FileZilla, and I need to update my IP address, which I'm going to get from my AWS instance. Copy it to the clipboard, come back over here, paste that in, connect. Yes, I'll trust them, okay. Kozich reanalysis, submission, practice.html, opening that up. I see I now have an HTML-formatted version of the Scaling Up file. That's great. So at a basic level, we see that it works. Let's add a title, date, and author to our YAML. I'm going to reopen my submission/practice.Rmd file and add title, and I'll say Reproducing Kozich et al. For author, I'm going to put my name in, and you put your name in: Pat Schloss. For date, I'm going to say April 24th, 2018. I'll go ahead and save this and quit. Then, back in bash, I'm going to render this, and I can then go to FileZilla. I can refresh, and refresh here. Maybe I need to double click on it again. Reopen local file. Discard local file, then download and edit the file anew. Okay, so now we see we have a title, Kozich et al., my name, and my date. It nicely puts in the information that we added to the YAML header. So I'll return to my Rmd file. And if we look down in here, we see that there's a couple numbers here.
The text reports 4.3 million pairs of sequence reads from the 16S rRNA gene, with an average coverage of 9,913 pairs of reads per sample, and 95% of the samples had more than 2,454 pairs of sequences. What I'd like to do is automate those calculations in my Rmd file. To do this, we'll need to create what's called a code chunk, and we'll also need to make use of what's called inline code. A code chunk is denoted in R Markdown by three back ticks, a curly brace, an r, and a closing curly brace. The code chunk then ends with another set of three back ticks. If you have a bunch of paragraphs in your paper, you can have code chunks scattered throughout your document. Because our document is pretty small here, we're only going to have one code chunk. And because this isn't an R course, I'm going to copy the code from the slide deck into the code chunk here. I'm going to add some spacing so we can differentiate what's going on. We've got our shared file name. We're reading in our shared file with the R code. We're counting the number of sequences in each row. Then we're getting the sequence counts in millions by adding up the number of sequences and dividing by a million, the average sequence counts using the mean function, and then the percentile sequence counts in this final line of the code chunk. This is the code that's going to be used later in the paragraph, embedded into our sentence. So this 4.3 million is going to come from this million sequence counts variable. These three variables hold the information that's going to go into this paragraph using what I'm calling inline code. All right, so 4.3 million pairs. I'm going to remove that 4.3 and put in a back tick, an r, and another back tick. This means that between the back ticks we're going to put R code that's going to be run and inserted directly into the sentence. So I'm going to say million sequence underscore counts.
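To make the calculation inside that code chunk concrete, here is a minimal sketch of it. The small data frame is an invented stand-in for the shared file (one row per sample, one column per OTU), and the variable names are spelled out from how they are read aloud, so both are assumptions and not the slide deck's exact code.

```r
# Toy stand-in for the shared file: one row per sample, one column per OTU.
# The real chunk would read the shared file from disk instead.
shared <- data.frame(
  otu1 = c(5000, 7000, 1200),
  otu2 = c(3000, 4000,  800),
  row.names = c("sampleA", "sampleB", "sampleC")
)

# Number of sequences in each row (sample).
sequence_counts <- rowSums(shared)

# The three values that the narrative text pulls in through inline code.
million_sequence_counts <- sum(sequence_counts) / 1e6                  # total, in millions
average_sequence_counts <- mean(sequence_counts)                       # mean per sample
percentile_sequence_counts <- quantile(sequence_counts, probs = 0.05)  # 95% of samples exceed this

# In the R Markdown text, a value is then inserted with inline code, e.g.:
#   We generated `r million_sequence_counts` million pairs of sequence reads ...
```

Keeping the calculations in the chunk and only the variable names in the sentence is what keeps the paragraph readable.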
Similarly, down here we have the average coverage. I'm going to replace this with back tick, r, back tick, and inside the back ticks I'm going to put average sequence counts. And then in here, where 95% of samples had more than 2,454 pairs of sequences, I'm going to do, again, back tick, r, back tick, and inside the back ticks I'm going to put percentile sequence counts. All right, so see how we did that? We have a code chunk up here that reads in the file and does the calculations. Then down here in the narrative part of the text, we can call on the variables that were defined in the code chunk so they're inserted directly. Now, we could have put everything from this code chunk directly into the inline code, but that would get really painful to read and difficult to edit later. So, again, we use the code chunk to define variables that we then insert into the text. I'm going to go ahead and save this and re-render. But we get average sequence counts not found. So I'm going to check that out again. And I misspelled sequence up here in my code chunk. Save that, re-render. Everything worked. I'll come over to FileZilla, refresh, reopen. I'm going to discard and then download the file anew. Aha! Now what we see is that we have our code chunk. And then in this sentence, where we put our R code, we see we generated 3.86 million pairs of sequence reads, with an average coverage of 1.07, blah, blah, blah, times 10 to the 4 pairs of reads per sample. Okay? So the information is here, but the formatting leaves a little to be desired. So how well did we reproduce the previous results? Well, the numbers are pretty close. The Kozich paper was not originally generated using an R Markdown document. There were two data sets available, so it's possible we grabbed the wrong data set or that I used a different one. There was one data set with the metagenomics and one without the metagenomics.
We may have grabbed a different one from the one I used originally. We could go back and check that out. There have also been software changes to mothur over the years that might cause some differences. There's also some randomness at different steps in the pipeline. And it's also possible that the total number of sequences in the paper was really the number of raw sequences, whereas what we have here is the number of curated sequences that made it through our pipeline. These are all things that it would have been nice to document the first time around. We have since gone back and regenerated this paper using reproducible practices. The numbers are pretty close, and the gist of the story doesn't change. So this is really cool, right? There's still a bunch of things we'd like to do with this before we'd submit it to a journal or even share it with a collaborator. So let's first think about what we've done. We've created a code chunk that reads in the file and does some processing of the data. When we submit it, we probably don't want to see the code. We'd like to hide that. But for our PI or select collaborators, we might want to show the code so that we have proof of our methods and they can see what we've done. In the inline code, the number of significant digits is a bit over the top. So what we're going to do first is look at how we can stop the code chunk from being displayed. We still want it to run, but we don't want to see it. If we go back into our R Markdown document, inside our curly braces after the r, we can write echo equals FALSE. What echo equals FALSE does is not echo the code. Echo equals TRUE is what we currently have, where it echoes the code back to the screen. So we quit out and re-render. Now the code chunk is gone, but the results of our R code are still in the sentence. There are some other code chunk options that are useful. Include tells whether the code and results should be in the output. It will still run the code.
Echo tells whether the code, but not the results, should be in the output. It will still run the code. Message, TRUE or FALSE, tells whether messages that are generated by the code should be in the output. And warning, TRUE or FALSE, tells whether warning messages that are generated by the code should be in the output. In the slide deck, there's a link to many other options that are available for formatting figures, caching, and making dependencies between code chunks, but for most of what I do in my papers I'm mainly using echo, message, and warning in my code chunks. One of the things that we tend not to think about when we use R a lot is how to format text or numbers. Normally we're happy to pull a p-value or a summary statistic out of R, but with R Markdown we really want to pull it into another document, so we need to think about how we format things. So returning to submission/practice.Rmd, if I scroll down to where I embedded my R code, I probably want to round this number that we had as output. We can do this with the functions round and format. I can say format, round, million sequence counts. For round, I'm going to round to one decimal place, and then for format I'm going to say nsmall equals 1L, close parentheses. What this will do is return a number with a single digit to the right of the decimal point. So if we save, render, and open that up, we see that we've got one digit to the right of the decimal point, 3.9, which is what we were hoping for. So that's a lot tidier.
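Here is a small sketch of that round-and-format combination. The input value is a placeholder chosen to resemble the unformatted output, not the real result, and the variable names are assumptions for illustration.

```r
# Placeholder value, not the actual number from the analysis.
million_sequence_counts <- 3.8612

# round() keeps one digit after the decimal point; format() with nsmall = 1L
# guarantees that one digit is always printed and returns a character string.
tidy_value <- format(round(million_sequence_counts, 1), nsmall = 1L)
print(tidy_value)

# nsmall matters when rounding lands on a whole number:
without_nsmall <- format(round(4.02, 1))               # trailing zero is dropped
with_nsmall    <- format(round(4.02, 1), nsmall = 1L)  # trailing zero is kept
print(c(without_nsmall, with_nsmall))
```

The same expression can be placed between the back ticks of an inline code call, which is what is being typed into the sentence here.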
We can do the same type of thing with the other numbers that we generated. If we go back to nano and look at our average sequence counts, we can do the same thing, where we round it to one decimal place: format, round, comma 1, comma nsmall equals 1L, close parentheses. And we'll do the same thing here with our percentile, where we'll do format, round, 1, 1L. Quit, render, and we see much nicer formatting. We have 10,735.1 pairs as the average, instead of something in scientific notation, and then 2,788.9 pairs of sequences. So in this sentence we had three numbers, 3.9, 10,735.1, and 2,788.9, that were all generated from R. If we return to our R code and look at this, we might notice that it isn't very DRY: we have the same format, round, then the number, and the same parameters over and over again. What we'd like to do is remove this so we don't have to repeat it, but so that our numbers are still formatted the same way every time we output them. What we'll do is go up and create another code chunk. We can name our code chunks, so I'm going to name this one knitr_settings, and I'm going to say eval equals TRUE, so we're going to evaluate what's in here. Our echo is going to be FALSE, and we're going to say cache equals FALSE. Let me close that with three back ticks. In here I'm going to set several chunk options, so we can set our chunk options globally. I'm going to say opts_chunk, dollar sign, set, parentheses, tidy equals TRUE, and I'm going to copy this several times so I don't have to keep typing. So we'll do tidy equals TRUE, where tidy refers to how the code is formatted; echo equals FALSE, and we've already seen echo, which controls whether the code chunk is output; eval equals TRUE; warning equals FALSE; and cache equals FALSE. You might want to run this once with warning equals TRUE just to make sure you're not getting
any problematic warning or error messages when you run your analysis. If you look at the slide deck, there's a function called inline_hook that I'm going to copy and paste into here. This is a function that is run every time knitr processes a piece of inline code. Wherever you see that back tick, r, and then R code, it runs this function on the contents within those back ticks. You'll see that there's information in here on what to do. Some of this we don't need to worry about: if it's a list, unlist it and spit it out as a vector. But if it's numeric, then it does different things. The first test is asking, is it an integer? If it's an integer, it will format it with commas, no decimal, and no scientific notation. If it's not an integer, it will again use commas, it will use one digit after the decimal point, and it won't use scientific notation. What this allows us to do is go back into our inline code and simplify it. The other thing is that this code chunk would be a great place to load any libraries that we need, so we could source any utilities or call library on any packages that we're using in here. So again, we can come down to our text and remove these format and round function calls from here and here. We can save this, exit out, render it, and we see that we have the same formatting of our text, but the text is much more DRY. Some other things we'll want to deal with include figures and tables, references, and then other output formats, and that's what we're going to spend the rest of this tutorial discussing. My preference in working with figures in a manuscript is to not generate the figure within my Rmd file. If you're using an Rmd file as a notebook, like we saw in the Meadow et al. paper, then it makes sense to leave it in there. But normally when I'm submitting a manuscript I have to submit the
figures separate from the text. Also, sometimes the figures require a fair amount of code to generate, and I like to encapsulate that away into a separate file. But we can still have the figures show up in the output from our Rmd file, and that is done using Markdown code. So let's go back into our practice.Rmd file, and I'm going to come to the bottom, where I have my figure legend. Under that, we can use Markdown to insert an image. I'll type and then I'll talk. The syntax is as you see here: we use an exclamation point and square brackets, and inside the square brackets is the description. If you're familiar with HTML, this is like the alt attribute for an image tag. Then, in parentheses, we give the path to the figure. I'm going to change my description to be Figure 1, and then the path to the figure is going to be a little bit weird, because we need to give it a path relative to our submission directory. So we're going to do ../results/figures/nmds_figure.png. If we save this and render, hopefully that figure will now be inserted into the output of our Rmd file. And sure enough, there is our figure. Now, I could get rid of the text inside the description, and that Figure 1 will go away. I don't really want to see that. And again, it's gone now. Okay, so that's how I tend to insert figures if I'm going to put figures into a manuscript. Again, when you submit a manuscript, frequently you're not going to put the figures in with the figure legends, but if you're doing it for a report to give to your PI, or for a draft, then it's nice to be able to put the figures in with the manuscript. Next, we're going to add a silly table to our R Markdown document. I'm going to put this at the end, because normally I have tables at the end of my manuscripts. And because this isn't necessarily a tutorial on R, I'm going to go ahead and create a code chunk and copy the code over from the
slide deck. So there's my code chunk frame, and I'll add table one as the name of my code chunk. The names aren't critical, but sometimes they're helpful for debugging when things go wrong, because the error message will tell you in which chunk the problem happened. I'm going to go ahead and copy and paste this code over from the slide deck. You can see what's happening: we make a vector for the days, the number of samples per day, the mean sequences per day per sample, and the total sequences per day, and then we have a loop that goes through and counts these things. Where we get to the table, it makes a data frame that has four columns: the unique days, so what are the days of the study; the mean sequences per day per sample, so how many sequences do we have per day per sample; the total sequences by day; and then the number of samples we have by day. This then gets us to kable, which is the function, I'm sorry, in R Markdown that will build a nice table for us. We give it the name of our data frame. Next comes digits, which is the number of digits to the right of the decimal point that we're going to use in formatting our table. We're not going to use any row names. Our column names are going to be day, seqs per sample mean (the br is a bit of HTML to put in a line break), sequences per day total, and number of samples. Finally, we tell kable how to align our columns. We can give this as a string, where each letter in the string tells knitr whether a column should be left, center, or right, wherever you want the number positioned horizontally within the table. So if we save this, render it, and then load it from FileZilla, we see that we get our table. If you recall, we made the first column left, then center, right, center. Okay. And also we said zero digits to the right of the decimal point for the day, one to the right for column two, and zero and zero for the
other two columns as well. So we can put in some nice formatting. Again, this is all HTML, and if you know some CSS you could format this to look a bit more polished, or a bit more like you would like it to look. But again, this is a bit of a silly example, telling us for each day of the mouse study how many sequences we have per sample, the total number of sequences across all of our samples for that day, and the number of samples that we have. So you see that on day 302 we had one sample, but on day zero we had 12 samples from our 12 different mice. So this is how we can put a figure, and now a table, into our R Markdown document. The next thing that we want to be able to do is to put references inside of our R Markdown document. One of the things that we need to tell it is what type of formatting we want to use. There's a variety of styles that will work with R Markdown, Word, and LaTeX, and we can find these in what's called the CSL GitHub repository, which you can find a link to in the slide deck. We commonly use the style guides for the ASM journals, and there's a copy of one called mbio.csl in the submission directory. So if we ls submission, you'll see here there's mbio.csl. We can see the CSL repository by going to the GitHub repository citation-style-language/styles, and you'll see in here any number of journals represented. You can search for the one you want up here, because there are so many of them. So if we search for American (did I spell American wrong? Nope, that's right), we see one for Applied and Environmental Microbiology and a variety of the ASM journals in here. If we wanted Applied and Environmental Microbiology, we would copy this file, pop it into our submission directory, and name it. This is what we've done, but with mbio.csl. Returning to our R Markdown document, we'll need to make a couple of changes. We'll add to our YAML header: we'll say csl, colon, mbio.csl, and then we also
need to tell it where our bibliography is, so we'll add bibliography: references.bib, and we'll save that. But now we need to create references.bib, and we'll also have to insert our references into our text here. So we'll quit out of that and we will create a reference file. We'll do nano submission/references.bib, and by default we put in here the mothur paper, so Schloss 2009, and this is the type of formatting that we need to represent our papers. I want to get two others in here. One is the original paper that these data were taken from, so I will go to PubMed, and the authors on that were Schloss, Schubert, Zackular, and that's this one. I'm going to copy this DOI number, and there's a nice tool called doi2bib.org, and we can give it a DOI, a PMCID, or an arXiv ID. If I hit enter on that, I get nice BibTeX-formatted material, so I can copy that, go back in here, and paste that in. So now I've got these two. The other paper that I want, if I go back to PubMed, is Clayton MK and Yue JC, and this is the thetaYC paper. So we need to get the DOI number. I'll click on the paper, there's the DOI, and I can copy this DOI and go back to doi2bib. I can copy that now into my references.bib file, and now I have three papers represented in my bib file.
To cite these in our manuscript, we now want to go back into submission/practice.Rmd, and we want to replace this 18 with our reference. So we'll do square bracket, at sign, Schloss 2012, and this is coming from our references file. If you scroll down to where we had Schloss 2012, this is the name that we're going to cite. Similarly, if we wanted to cite the thetaYC paper, this Yue 2001 would be what we'd cite. So let's go ahead and put that in, and scrolling to the bottom, we have this 28, and we're going to replace that with at sign, Yue 2001. I'll quit that and we can then render it. This runs through,
we then open up this, and we see Yue and Clayton has a 2 and the original mouse study has a 1, and if we scroll to the bottom we see our references. So maybe we'd want to add a heading. In nano, at the bottom of the submission file, we could put in double pound References, so we have References and then the list of our references. And that's pretty sweet, right? So we can do references, we can do tables, and we can put figures into our R Markdown documents.
As I mentioned before, there are a lot of different output formats that we can get from our R Markdown documents, and this is all going to be set in that YAML header material. We've already seen how we can output HTML. As I mentioned, we can output Markdown, Word document files, PDFs, and many others. Each of these formats has many, many options that are constantly being updated and added, so I'd encourage you, if you're interested in that, to click on those links and go to the R Markdown site, where they have really great documentation on how you can customize the output further. I'm going to give just a brief description of how we can do different things. We've already seen HTML, using output: html_document. With Markdown we can say output: md_document. We can also do a Word document, so word_document. It's a little bit beyond what I want to do right now, but we have the ability to generate a template to customize the formatting. It can be a little bit limiting: there might be ways to figure out line numbers, but it's not trivial, and there are some problems with table formatting, bulleted lists, and no forced line breaks. But at the same time, people you're working with are going to want a Word file. So what we generally do is generate the Word file knowing that we're going to send it to collaborators, manually format it to get the fonts and the spacing and everything the way we want it, and send them the Word file. That maybe takes five minutes to take what we
get as a Word file out of knitr and convert it into what we want to send to our collaborators. PDF is also a powerful tool, but you first have to have a full installation of TeX on your system to render it as a PDF. I think it offers the best possible output look, but there's a bit of a learning curve, and there's a variety of things that you're going to have to add using LaTeX. We have some helpful things built into the new project template that we're going to be using here in a minute. We can also generate output that's a combination of various types: PDF, Word, Markdown, whatever you want. Perhaps one collaborator wants a PDF and one wants a Word document. You know, go crazy. I'll often render a PDF and a Word version for me and my collaborators, respectively, because I like working with a PDF. I like the way it looks; it feels good to me. Of course, others want a Word document so they can do track changes. I still like to get a pen and mark things up manually.
So we also have this new project Rmd template. You'll notice in the submission directory there is a file called manuscript.Rmd, and we will render that as we did for practice.Rmd, and hopefully you'll see what's similar. What we're going to do is copy what we had from that scaling up paragraph into the results section of our manuscript and re-render it. So again, if we do ls submission, we see down here that we have manuscript.Rmd. I should add that TeX comes pre-installed with this Amazon image. Like I said earlier, if you want to render to PDF on your own computer, you have to install TeX yourself, and there are instructions on how to do that on the R Markdown website that's linked in the slide notes. So let's open up manuscript.Rmd, and we see that there's some built-in YAML material here for you. There's also this set of knitr settings that's already here, and we've got a running title and a space for your authors' names. This is the cover page;
you could customize this the way you want. There's an abstract and various sections of the paper, so if you were writing a paper you could take this template and populate all these sections. And then we've got figure legends, references, and so forth. So what we're going to do is copy the material from practice into manuscript.Rmd. I'm going to use cat to output submission/practice.Rmd, and this will make it easier for me to copy and paste. I'm going to grab this code chunk all the way down to here, copy it, do nano submission/manuscript.Rmd, and throw this in my results section. Then I'm going to take my figure four and my figure, copy that, put this down at the bottom, and replace this here. Great. Save that back out. So again, what we've done is take our scaling up paragraph and put it into the results section of this manuscript.Rmd file. I'm going to save it, and now to render it, again we can do R -e, double quote, render, parenthesis, single quote, submission/manuscript.Rmd, single quote, parenthesis, double quote. We run that and it will generate manuscript.pdf for us. Going to FileZilla, we see that we now have manuscript.pdf. We'll download this, and we see we get "name of study"; we could update this stuff if we wanted, of course. And here in our results and discussion is the scaling up paragraph that has the code we added as inline code, and here's our figure. This is a PNG file, so the quality is pretty poor, but we could update that and fix that as well. So again, we've seen how to output an HTML file as well as a PDF file, and it looks pretty nice. We've got line numbers, and I've got that cover page that we could update and fix a bit to get it ready for submission.
As a closing note, we can also put code into our header material. So here's a way to put whatever today's date is in as the date: we're using R's Sys.time to get the time, the date, that will then go into the date for our report. We can also feed in
parameters, treating the R Markdown document like a function, where we might have, say, grade reports that we're doing for a class or some monthly report that we're doing to update a project. We can run that by feeding it specific data files to generate reproducible reports where the underlying data are changing and that's then reflected in the output of the report.
We've really done a lot today with R Markdown documents, thinking about how we can use them to make research reports as well as manuscripts that embed code into our science narrative. Of course, what we're going to want to do before we quit is to commit as well as log out of AWS. But before you do, I would really encourage you to edit your scaling up paragraph further to automate the printing of another number or two from that paragraph. You might also see about generating a figure using the data in the shared file that we imported in that code chunk. Another question for you to think about: if you have code in, say, a utility script called code/utilities.R or something else like that, where would you run the source command to load the functions from that utilities file?
I feel strongly that R Markdown and other literate programming tools are a huge step forward in improving the transparency and reproducibility of any analysis. Not only can someone rerun the code to generate the paper, but they can see the code used to generate the numbers in the manuscript within the context of the actual text describing the relevance of the results. We saw R Markdown documents before when we looked at the Meadow paper published in the journal Microbiome. If you look back at that file, you'll notice they used R Markdown to generate what you might consider a research notebook type of document. That's a great way of walking people through an analysis in a less rigid manner than we typically see in a manuscript. My preference is to also make
those types of documents, but to use them more in a data exploration phase of a project, and to use the tools we talked about today to prepare a polished manuscript that's ready for submission. Next time we'll go all in and actually write our own write.paper function that allows us to start from an empty directory and end with a PDF version of our manuscript. We'll be using a tool called make that will allow us to make our driver script more sophisticated and will allow us to restart the pipeline at any step in the workflow. Talk to you next time.
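For anyone following along without the slides, here is a minimal sketch of the table step described earlier in this tutorial. It is not the code from the slide deck: the sample counts, the object names (samples, day_summary), and the exact column labels are made up for illustration, and it uses tapply instead of the loop mentioned in the video. The kable arguments (digits, row.names, col.names, align) are the ones discussed above, and the call is skipped if the knitr package is not installed.

```r
# Hypothetical per-sample sequence counts; in the tutorial these would come
# from the shared file imported in the earlier code chunk.
samples <- data.frame(
  day    = c(0, 0, 0, 1, 1, 302),
  n_seqs = c(1200, 1500, 900, 2000, 1000, 3100)
)

# Summarize by day (tapply orders its groups by increasing day).
unique_days       <- sort(unique(samples$day))
mean_seqs_by_day  <- as.vector(tapply(samples$n_seqs, samples$day, mean))
total_seqs_by_day <- as.vector(tapply(samples$n_seqs, samples$day, sum))
n_samples_by_day  <- as.vector(tapply(samples$n_seqs, samples$day, length))

# Four columns, as described: day, mean seqs per sample, total seqs, samples.
day_summary <- data.frame(
  day        = unique_days,
  mean_seqs  = mean_seqs_by_day,
  total_seqs = total_seqs_by_day,
  n_samples  = n_samples_by_day
)

# Build the table only if knitr is available.
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(
    day_summary,
    digits    = c(0, 1, 0, 0),   # digits right of the decimal, per column
    row.names = FALSE,           # no row names
    col.names = c("Day",
                  "Seqs per sample<br>(mean)",   # <br> is an HTML line break
                  "Seqs per day<br>(total)",
                  "Number of samples"),
    align     = "lcrc"           # left, center, right, center
  ))
}
```

Inside an R Markdown document this would sit in a code chunk, and you would drop the print call so that knitr places the table in the rendered output directly.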