"I'm super excited to tell you about significant digits in R," said nobody ever, until me, because that's what we're going to talk about in today's episode of Code Club.
Hey friends, welcome back for another episode of Code Club. Maybe I overstated my enthusiasm about significant digits in R, but as we saw in the last episode, when we do a calculation we might get a result out to, say, eight significant digits, which really overstates the precision and accuracy of a lot of our calculations. That many digits is far too many when we're inserting those numbers into a sentence, as we do with inline code in an R Markdown document. So in today's episode we're going to use the format function to trim those values to a more reasonable number of significant digits. We'll also dig into the third paragraph of the results section to insert some more values and see if we can't move this manuscript a little bit further along.
Here we are in our project root directory. I'm going to fire up RStudio by opening my .Rproj file. We've seen this so many times over the past, I don't know, 50 or so episodes. I'm going to go to my submission directory and open manuscript.Rmd. Here we are back in the cozy confines of our R Markdown document.
In the last episode, we took on the first paragraph of the results section to insert values. Moving along, in the second paragraph we ask what threshold you need to get one ASV or one OTU per genome. Again, the problem with ASVs is that we tend to split a single genome into multiple bins. So if we're going to cluster ASVs into OTUs by a distance threshold, what threshold would give us a single OTU back?
I did that in the exploratory data analysis several episodes ago, but I realized we only did one iteration of it. So in the next episode I'm going to come back and we'll do 100 or so iterations so we can have greater confidence in the numbers we have here.
This third paragraph, though, looks at the output of our receiver operating characteristic (ROC) curve. You may recall that we did a whole bunch of episodes on this. The final episode on that topic looked at the thresholds where we get a balance between sensitivity and specificity, as well as the point on the ROC curve that is closest to perfect classification. I've got values plugged in here. What we want to do is remind ourselves how we did this in the last episode and plug in the values from our previous analysis.
To create the code chunk, remember that we open with three backticks followed by curly braces with an r inside, and we close the chunk with three backticks. I'm going to go back to our exploratory directory, where the very last thing we looked at was that ROC curve analysis. I'll open that up, and you'll see that we plotted the sensitivity and specificity values on the ROC curve for a 3% distance, again looking at the balance point as well as the point closest to perfect classification. We need to read in our sensitivity and specificity data, so I'm going to copy this into the code chunk in manuscript.Rmd. Then there's one called balance. I've already done this work, so there's no need to retype it. I'll paste it in, then copy and paste distance as well, and voila, we're good. And actually, you know what? I think I can run all of the code chunks up to this code chunk, right?
One of the nice things about RStudio is that if I click this downward arrow with the line under it, it runs all the chunks above that point. It takes a moment or two to run, and we're good. Then we can run this chunk by hitting the play button. If I look at balance, I get the balance data frame. If I look at distance, I get the distance data frame. Awesome. Again, the value of keeping track of all of our code in those exploratory data analyses is that we can reuse it elsewhere.
All right, so here we have: we first identified the thresholds where the sensitivity and specificity are most similar to each other. That's the balance situation. The best thresholds were blah, blah, blah. So we're going to start with balance. If we look at balance again, we get this data frame. I'm not so concerned about the min diff or the sensitivity and specificity columns. I want the threshold. So I'll create balance_v19, which takes balance, filters for the v19 region, and then pulls out the threshold column with pull. If we look at balance_v19 we get 0.06, which is what we had here, right?
Okay, so I'll copy and paste this down for our other regions and edit the names to get v4, v34, and v45. With those all generated, I can come down into my text and, using inline R code, write r balance_v19 between backticks. To save on typing, I'll copy and paste that for the others and update them to v4, v34, and v45. I'm missing a backtick here. Yep, that's good. And I think these labels in the text should be V3-V4 and V4-V5. So that looks good. We can do the same thing with distance: I'll copy these lines down and replace balance with distance, bam, bam, bam.
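The filter-then-pull step described above can be sketched as follows. This is a minimal sketch, assuming the dplyr package is installed. The data frame is a hypothetical stand-in: only the column names region and threshold come from the episode, while the region labels and the values are illustrative placeholders, not the real results.

```r
library(dplyr)

# Hypothetical stand-in for the `balance` data frame read in from the
# exploratory analysis; region labels and values are illustrative only.
balance <- data.frame(
  region = c("v19", "v4", "v34", "v45"),
  threshold = c(0.06, 0.045, 0.06, 0.04)
)

# Keep the row for one region, then pull the threshold out as a plain number
balance_v19 <- balance %>%
  filter(region == "v19") %>%
  pull(threshold)

balance_v4 <- balance %>%
  filter(region == "v4") %>%
  pull(threshold)

balance_v19
```

Because pull returns a bare vector rather than a data frame, the result can be dropped straight into a sentence with inline code.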
If we run all of these and double-check, we get 0.055, and distance_v4 is 0.035. Great. Then we come down and again write backtick r distance_v19 backtick, and I'll copy these again and update the names to v4, v34, and v45. Again, I'll put hyphens in V3-V4 and V4-V5. We'll do a whole bunch of editing later before we're all done and ready to submit.
I'll save this and come back out to my terminal, where I can run make submission/manuscript.pdf. That reminds me, I need to add a dependency. Back in RStudio, this file that I read in is data/processed/rrnDB.roc.tsv, and if that file gets updated, I want to regenerate the output. So let me fire up Atom, go to my Makefile, and come down to the rule for my Rmd documents. Here we go. I can paste that in here. Now whenever rrnDB.roc.tsv gets updated, that will trigger make to regenerate either the PDF or the docx version of the file. Nothing's changed, so everything's up to date.
Let's see what this looks like in submission/manuscript.pdf. Let me make this bigger so you can see it nicely. Where are we, down in this paragraph? I forgot to set echo = FALSE, so the code is showing. But we see that we've got our values in here, 0.045, 0.04, and we're in good shape. So we need to turn off that echo. Let me come back to RStudio, where I could add echo = FALSE. Actually, you know what, I'm going to take this off, because I'd like to show you how to set it so that every code chunk has echo = FALSE. I'll remove echo = FALSE from this other code chunk as well. Then up here in this first block, I can add opts_chunk$set(echo = FALSE).
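For reference, the global chunk option mentioned above looks like this when written out. This is a minimal sketch that assumes the knitr package is installed; the requireNamespace guard is my addition so the snippet does nothing on a machine without knitr.

```r
# knitr is the engine that renders R Markdown documents
if (requireNamespace("knitr", quietly = TRUE)) {
  # Hide the source code of every later chunk unless a chunk overrides it
  knitr::opts_chunk$set(echo = FALSE)

  # Check the current default
  knitr::opts_chunk$get("echo")
}
```

In a manuscript this call would normally sit in the first (setup) chunk of the .Rmd file.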
What this does is set echo to false automatically for all of the code chunks except this first one, because this chunk has to run before the option takes effect. You saw that there's no echo option on this code chunk down here, and none on this other code chunk either. So I'll save that, we'll make it, and then we'll see. I hope that our code chunk isn't echoed. Let's see what happened up here. Yeah, our code chunks have been hidden. Good. So when we add another code chunk for the middle paragraph of the results section, we won't have to worry about echo = FALSE. Excellent.
Okay. One thing that I don't really like about our output is that this is 0.04, which is one significant digit, and then 0.045, which is two significant digits. And back up here in this section, we have one, two, three, four, five, six, seven, eight significant digits for some of these numbers. The other thing I notice is that when I output an integer that's greater than 1,000, I'd like to have a comma, right? So 15 comma 614 genomes. It just makes things look a little bit nicer. You know, ASM journals have great copy editors, so if I didn't put in a comma, they would. But I want the reviewers to think that I did a professional job writing this, that I cared enough to put in a comma and to trim my numbers to a proper number of significant digits.
So how do we do that? That's the bulk of what I want to talk about with you today. We'll come back up to the top of our R Markdown document, and I'm going to create a function called inline_hook. It's going to be a function that takes a value x, and then we'll have those curly braces.
Then I can do knitr_hooks$set(inline = inline_hook). What this means is that if knitr, which is the engine behind R Markdown, encounters one of those sets of single backticks with the r in it, it calls my inline_hook function right here. It automatically runs this function on x, where x is the value generated by that inline code. So what I could do is use paste0. paste0 glues together bits of text, so we could give it star star, comma, x, comma, star star in quotes, and that will put pairs of stars, which is bold in Markdown, around the number.
If I save this, let's make sure it works. We'll make this and, ah, it's complaining: knitr_hooks not found. I had knitr_hooks and what I meant was knit_hooks. So we'll rerun this, and it got past that first code chunk, so I think we're in good shape. We open this up, and what we see is that all of the values that came from inline code are now bolded, okay? So it worked. But I don't necessarily want my inline values to be bolded. I want to be able to format them. We might keep the bolding for now, though, because it's nice to know what came from inline code and what didn't.
We've seen ifelse before, at least I think I've talked about it in a previous episode. We can also use an if block. So we write if, and then is.numeric(x). If x is a numeric value, we're going to want to output it differently. Otherwise we'll just keep x, so I'll say formatted equals x. And so we can keep that bolding, the output will be formatted wrapped in the bold stars. So what are we going to do here for the is.numeric case? Now we need to step back a little bit and talk about the format function.
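Pulling the pieces described so far together, the hook at this stage might look roughly like the sketch below. The function and variable names follow the episode (inline_hook, formatted); the numeric branch is still a placeholder, and the requireNamespace guard around the registration is my addition so the snippet also runs without knitr.

```r
# Sketch of the inline hook before any numeric formatting has been added
inline_hook <- function(x) {
  if (is.numeric(x)) {
    formatted <- x  # placeholder: the format() call goes here next
  } else {
    formatted <- x  # non-numeric values pass through unchanged
  }
  # Wrap the value in double stars, which Markdown renders as bold,
  # to flag everything produced by inline code
  paste0("**", formatted, "**")
}

# Tell knitr to run the function on every inline value (note: knit_hooks)
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::knit_hooks$set(inline = inline_hook)
}

inline_hook(19)     # "**19**"
inline_hook("v19")  # "**v19**"
```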
Let me clear my screen here so it looks a little bit cleaner. Let's say x is 3.196, and let's do format(x). format takes x and basically puts quotes around it. It turns it into a character string, but it's far more powerful than that. I can give it several different arguments, one being digits. If I say digits = 2, that returns 3.2. Now, digits sets the number of significant digits, so 3.2 is two significant digits, but that's not two digits to the right of the decimal point. It's dropping the trailing zero, and I'd like it to be 3.20. So we can add nsmall = 2. That's the minimum number of decimal places that should be shown to the right of the decimal point. This means we'll have at least two digits to the right of the decimal point along with our two significant digits. So now we get 3.20, right?
Let's think about format(0.03, digits = 2, nsmall = 2). What do you think we'll get? Make a guess, make a prediction. We get 0.03, which is a little bit surprising. If we use 0.035 we get 0.035. And if we try 0.03 with digits = 3 and nsmall = 2, there we go. If we use nsmall = 3 with digits = 2 we get, yeah. Right, so that gives us two significant digits to the right of the decimal. That looks good.
So we can put a format call like this inside our inline_hook function. In the is.numeric branch, we can do format(x, digits = 2, nsmall = 2), and again we'll say formatted equals that. Let's save that, run it, and see what we get back. While this is running, remember to subscribe to the channel and like this video. Click on that notifications icon so you know when the next episode drops and you can see how this manuscript continues to develop over the course of this month. Okay, so let's see.
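The console experiments above can be replayed with the short sketch below. It uses only base R. digits is the number of significant digits to show, and nsmall is the minimum number of digits after the decimal point.

```r
x <- 3.196

format(x)                          # "3.196", a character string
format(x, digits = 2)              # "3.2"
format(x, digits = 2, nsmall = 2)  # "3.20"

# Small values: nsmall pads with zeros but never cuts significant digits
format(0.03,  digits = 2, nsmall = 2)  # "0.03"
format(0.035, digits = 2, nsmall = 2)  # "0.035"
format(0.03,  digits = 2, nsmall = 3)  # "0.030"
```

The 0.035 case shows why the thresholds later come out with different numbers of decimal places: two significant digits of 0.035 already need three decimal places, and nsmall only sets a minimum.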
We now see that we have 19.00, which is not ideal because those are integers, not fractions. But we see that this value was something like 0.597 and now it's 0.60, right? So we've shortened those significantly. And again, we've got this weird thing going on with our integer values. If we come down here, we see that we've got 0.06 and 0.045. That's less than ideal, but I'm not so worried about it. Maybe we could convert those to percentages. That would make a lot of sense because up here I talked about 3%, right? So let's convert those to percentages, and then maybe we don't have to worry so much about it, although I guess then we're going to have an extra zero at the end.
So let's come back down to where we did balance and distance. Where'd you go? We can multiply these by 100, and multiply these by 100, save that, and make that. I'm really more concerned about having eight significant digits than about whether I have that extra digit at the end. Looking here, we see that we have 6.00 and 4.50. That's not the end of the world, but it's not ideal. I'd really rather have only one digit to the right of the decimal for these. This reminds me that I should put in a percent sign. We can do that here, and here, and there, there, there, and there. And maybe I should say "best distance thresholds." So that looks pretty good. Again, I'm not totally wild about having that extra digit there, and for now I'm not sure how I might change it to one digit after the decimal here but two back up here.
Anyway, what I'd like to move on to now is these integers. Some of these integers have two zeros after them, and I'd also like to put in the comma for my thousands. Coming back to RStudio, I'm back up at the top where I had my inline hook.
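The rendered values discussed above can be reproduced at the console with the hook as it stands at this point. This is a sketch rather than the exact code on screen; the inputs are the kinds of values mentioned in the episode (a proportion, a whole number, and two thresholds after multiplying by 100).

```r
# Inline hook with a single formatting rule for every numeric value
inline_hook <- function(x) {
  if (is.numeric(x)) {
    formatted <- format(x, digits = 2, nsmall = 2)
  } else {
    formatted <- x
  }
  paste0("**", formatted, "**")
}

inline_hook(0.597)  # "**0.60**"  eight-digit proportions are trimmed
inline_hook(19)     # "**19.00**" whole numbers pick up unwanted zeros
inline_hook(6)      # "**6.00**"
inline_hook(4.5)    # "**4.50**"
```

One rule for all numbers trims the long decimals but pads the whole numbers, which is the motivation for treating integers separately.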
I can set up another if statement. What I want is a way to recognize when something should be treated as an integer. So I can take the absolute value of x minus round(x). If we did round(pi), we'd get 3, right? If you did round(pi, 2), you'd get 3.14, okay? So we do round(x) and ask whether that difference is less than a value called machine epsilon, which is .Machine$double.eps. This is the smallest number that, when you add it to one, gives you a value slightly greater than one. Let's test this. If I add it to one and ask whether that equals one, that's false. But if I divide it by two, then it's true, right? So it's right at the cusp: if I add a value sufficiently smaller than machine epsilon to one, I get one back, okay?
So if this difference is less than machine epsilon, then I'm going to treat x as an integer. Otherwise, we'll treat it like we did here. I'll borrow this format call, and for the integer case I'll say digits = 0 with no nsmall. I can then add big.mark = ",", which means put a comma at the thousands place. I don't have any values in the thousands that are floating point numbers, but I'll put big.mark there as well. That looks good.
So let's run this and see what it does to those numbers, the number of species and the number of genomes in our database that we reported in our manuscript, as well as the number of copies, right? Some of those were getting extra zeros at the end. We come down to our paragraph and we see we've got 15 comma 614 and 4 comma 774. It looks like I typed it and inserted that comma intentionally. I also don't have crazy significant digits for 19 or 1 or any of these other integer values.
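The machine epsilon test and the integer branch described above can be sketched as follows. This is a minimal sketch, not the exact code from the episode: the episode also passes digits = 0 in the integer branch, which I have left out here (an assumption on my part that the default is enough for whole numbers of this size), relying on format printing a whole number without decimals.

```r
# Machine epsilon is the gap between 1 and the next representable number
1 + .Machine$double.eps == 1      # FALSE: the sum is slightly greater than 1
1 + .Machine$double.eps / 2 == 1  # TRUE: the addition is lost to rounding

inline_hook <- function(x) {
  if (is.numeric(x)) {
    if (abs(x - round(x)) < .Machine$double.eps) {
      # Treat as an integer: no decimals, comma at the thousands place
      formatted <- format(x, big.mark = ",")
    } else {
      # Treat as a decimal: two significant digits, at least two decimals
      formatted <- format(x, digits = 2, nsmall = 2, big.mark = ",")
    }
  } else {
    formatted <- x
  }
  paste0("**", formatted, "**")
}

inline_hook(15614)  # "**15,614**"
inline_hook(4774)   # "**4,774**"
inline_hook(19)     # "**19**"
inline_hook(0.597)  # "**0.60**"
```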
That looks good. One last thing that kind of annoys me is that these bacterial names are not italicized. So let's see if we can italicize those by coming down here. I think I got those by doing $species, so let's look for where we put in the species name. Right here, for example, I'm going to put a single star on either side. Interestingly, if you have three stars on either side, it's bold italics. We're already bolding it, so if we add an extra star, it'll be bold italics, which will be okay. Then before we submit the manuscript, we'll turn off that extra bolding.
But where is the other one? I'm looking for those $species. So right here, and here. That looks good. This one also, right? And looking down through here, there's "similarly an E. coli genome." So this one up here was the Mycobacterium, and this is the E. coli for comparison. There's another E. coli here. Those two won't be bolded because they weren't generated by inline code. Also, up here in my intro, I had an example where I talked about Staphylococcus, so I'll star those.
One thing that I'm not quite sure how to do, although I'm sure there's a way, is this: we use Mycobacterium twice in that paragraph. The second time, we probably should abbreviate it to M. something, but then it wouldn't be reproducible. I'm sure there's a way to count the number of times you've seen Mycobacterium up to that point, but that's way too advanced for what I'm trying to do here and for what I do in my manuscripts. So let's make that. I think we'll be in good shape, having inserted the code into that third paragraph and formatted our numbers to look nice, and we'll be ready to take on that final piece, the middle paragraph of the results section.
And so we can see Mycobacterium tuberculosis and Metabacillus litoralis are italicized. They look good. E. coli is italicized here as well. Everything looks really nice.
Before I go, I just can't let this pass. I've got 6%, 4.50%, 6%, 4%. It's just too messy, too wonky. It's showing 6% without the zeros because it's treating it as an integer rather than as a decimal. I'm probably going to have to format these things myself, right? And you know what, instead of writing format over and over again, I'm going to make a little function that I'll put up here and call format_pct. It's a function of x and my_digits, where my_digits equals one is the default. Then it calls format with x, digits = my_digits, and nsmall = my_digits. Let me save that and load it into this session, and then I'll take format_pct down to where I had those percentages. Where was that? Right here, I can do format_pct(balance_v19), and let's see what this looks like. That's 0.06, which is not right, and that's probably because I never re-ran all this with the times 100. If I re-run this, that gives me 6.0. So I think that will work. Good.
Now we'll replicate that. We'll wrap format_pct around balance_v4, then balance_v34, and the same here and down here. This is not as elegant a solution as the inline code hook, but sometimes functional is more important than elegant. And I couldn't let you go thinking that I thought that earlier output was the way to do things. So we'll save that, make it, and see if the formatting renders properly. If one of my graduate students or postdocs gave me that, I wouldn't let it fly. So I shouldn't let it fly myself.
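The helper described above can be sketched in a couple of lines. The name format_pct and the my_digits argument follow what is said in the episode; the exact spelling on screen may differ.

```r
# Format a percentage with a fixed number of digits after the decimal point;
# my_digits defaults to one
format_pct <- function(x, my_digits = 1) {
  format(x, digits = my_digits, nsmall = my_digits)
}

format_pct(6)    # "6.0"
format_pct(4.5)  # "4.5"
```

Because format_pct returns a character string, the value is no longer numeric by the time it reaches the inline hook, so the hook's numeric formatting is skipped and the string is used as is.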
And if we zoom in, yeah, now we see that our distance thresholds have one digit to the right of the decimal, which is right, and that our numbers up here still have two digits to the right of the decimal. One thing to note is that these numbers are still bolded, because the output of format_pct still passes through the inline hook, right? When it gets there, it's not numeric because it's coming in as a string. So it comes out as formatted unchanged and then gets placed inside the double stars. Very good. Now we've got it in a place that I really like, and we'll take it from there in the next episode.
In the next episode, we'll finish off our results section by plugging numbers in where I have Xs here. We'll also have to create an R script, and in that episode I'll talk about where I like to do the heavy lifting in my computation. Do I do it here in my R Markdown document, or do I try to do it outside of the R Markdown document? That's why you need to subscribe to the channel and click on that bell icon, so you know when the next episode comes out and you get the exciting conclusion to that question.
All right, I'll go ahead and commit this, but before I do, I'll say goodbye to you. Thanks again. I know you have lots on your plate and you're all very busy, so it really means a lot to me that you spend this much time working through R Markdown and reproducible research with me, watching me go on about significant digits and how exciting they are, and more importantly, how exciting reproducible research is. So please tell your friends about Code Club. I'd love to expand the number of people watching these episodes because I think there's a lot that people can get out of them. I know you've already gotten a lot out of them, so why don't you share that with your friends?
Anyway, till next time, keep practicing and we'll see you for another episode of Code Club.