All right. Well, thank you very much, everyone. Today I'm here to talk about R Markdown driven development. Many of us in the open science and open data communities already know the value of working in a code notebook like R Markdown or Jupyter, and the value that can bring in terms of literate programming, where we weave together both code and narrative elements of an analysis and create a wide variety of interesting analytical outputs. However, the thing I'm particularly passionate about with R Markdown is its ability to be a prototyping engine for more sustainable analytical tools. My contention is that any time you've created an analysis in R Markdown, you've actually gone a long way toward creating a really mature analytical product that's custom fit for your domain-specific workflow. What I want to talk about today is how to turn this latent, implicit tool into an explicit one.

The way this works is that, by the time you've completed a single analysis, if you think about it, you've already done many of the really, really hard parts of building a package. On the design side, you've understood user requirements, crafted a sane workflow, and, by virtue of your own analysis, have a complete and really compelling example. On the development front, you've created the right tools to help you along the way, figured out how to make them play nicely together, and you have working and hopefully well-tested code. So what I want to talk about are five main steps in what I call R Markdown driven development, for taking your open data analysis and turning it into a product that supports more reusability and accessibility. As I go through this process, keep in mind that it can have many different outcomes, including a single file, a project, or a package, and there's no better or worse on this spectrum. It simply matters how you expect your users to take advantage of your work in the future.

But first, let's start at the beginning. The first step in refactoring a code notebook is simply to remove the elements that probably should never have been there in the first place: things like hard-coded variables, plain-text passwords, local file paths, and unused pieces of code. For hard-coded variables, one trick I like to use in R Markdown is converting variables into parameters. For example, if I'm generating a report month over month, I might at some point have to filter my data set for a specific date range. That can be very fragile, because I might forget to change it, or change it inconsistently throughout my document. Instead, I can use parameters to turn my whole analysis, my whole R Markdown script, into one mega function, where I highlight all of the variables I might want to change at the very beginning, to promote consistency. This is also a great way to protect my credentials, because with parameters I can insert dummy values and only supply the real values at runtime, so I never have to store passwords, API keys, or other secret material in plain text. We also want to remove local file paths, because leaving them in ensures that our project will work on no one's computer but our own, and probably not even our own should we ever rearrange or reorganize our files. One gold standard is using relative file paths, which describe the location of your files only relative to the project's working directory.
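Just to make that concrete, here's a rough sketch of what a parameterized report can look like; the field names, file names, and values are purely illustrative. The YAML header of the R Markdown file declares the parameters, with harmless dummy defaults for anything secret:

```yaml
params:
  start_date: "2020-01-01"   # the date range to filter on, instead of hard-coding it mid-analysis
  end_date: "2020-01-31"
  api_key: "DUMMY-KEY"       # placeholder only; the real secret is supplied at render time
```

Then, inside the document, chunks refer to params$ instead of literal values, and the real values only appear when the report is rendered:

```r
# inside a code chunk: filter on the parameters rather than hard-coded dates
# (`sales` stands in for whatever data set the report uses)
monthly_sales <- dplyr::filter(
  sales,
  date >= as.Date(params$start_date),
  date <= as.Date(params$end_date)
)

# at render time, supply the real values, e.g. a key stored in an environment variable
rmarkdown::render(
  "monthly_report.Rmd",
  params = list(
    start_date = "2020-02-01",
    end_date   = "2020-02-29",
    api_key    = Sys.getenv("MY_API_KEY")
  )
)
```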
In R in particular, you can go one step better by using the here package, which builds file paths that are especially resilient across different operating systems and platforms. The final thing we want to remove, as hard as it can be, is our darlings; sometimes we have to kill them. No matter how fun some of the rabbit holes we went down during the analysis were, we all have code in our scripts that we didn't actually use and that doesn't add to the final analysis we're delivering. It's probably time to remove that and clean things up.

So, once we have only what we want in our R Markdown, the next step is thinking about where we want it to be, by rearranging chunks. Literate programming is great at capturing our thought processes, but thought processes are rarely linear, and as a result, neither is the code we wrote. Some heuristics I use to make this transition are letting infrastructure and heavy-lifting computation chunks rise to the top of my document, and letting the chunks that make plots and tables and help communicate results sink down to the bottom. This has multiple benefits. It helps us clearly define our dependencies, because things like reading in data and external files and loading libraries naturally end up at the top of our scripts. Since that code is doing most of the work, it's also the most likely to break, and it can be nice to front-load our errors: if we fail, we fail early. By letting the more communication-oriented pieces sink to the bottom, we're more likely to notice repetitive code that we're using to produce different plots, tables, and analytical views, which helps us reduce duplication in the next step. And finally, this also helps us consolidate the narrative pieces of our document, so when we're working with less technical collaborators, it's still easy for them to jump in and figure out exactly where they need to edit the manuscript.

Now, one exciting thing is that this rearranging is also a great way to start adding more of an interface to your single file. For example, two tricks you can use within the RStudio IDE are naming your code chunks and commenting within a code chunk with four little dashes after each comment. This creates the nice table of contents that you can see at the bottom of the screen, which helps users easily jump between pieces of your report and find the right places in the script automatically. It also serves as a great additional incentive for the code commenting that we all know we should probably be doing anyway.

As I mentioned before, one benefit of that rearranging is that we start to get to know our own code better and find patterns within it. The next thing we can do with those patterns is reduce a lot of the duplicate code in our notebook by defining functions. On the left-hand side of this example, you can see that in some exploratory data analysis I might have just started copying and pasting code to make scatter plots to understand the relationships between different variables. However, this has a couple of problems. If I want to, say, change the point size, I have to go in and change multiple pieces of code. It's also not very literate, in that my intent, what I'm actually doing, isn't totally clear. On the right-hand side, instead, I define a single scatter plot function and call it three times.
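As a rough sketch of that before-and-after, using the built-in mtcars data as a stand-in for a real analysis, and with the four-dash section comments mentioned a moment ago:

```r
library(ggplot2)

# Helper functions ----
# one function captures the shared plotting logic, so a change like point size happens in one place
scatter_plot <- function(data, x_var, y_var, point_size = 2) {
  ggplot(data, aes(x = {{ x_var }}, y = {{ y_var }})) +
    geom_point(size = point_size) +
    theme_minimal()
}

# Exploratory data analysis ----
# the intent now reads clearly: three scatter plots against mpg
scatter_plot(mtcars, wt, mpg)
scatter_plot(mtcars, hp, mpg)
scatter_plot(mtcars, disp, mpg)
```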
Now I only need to change my code in one place for my own use, it's easier for a collaborator to borrow that function for a different purpose if they want to, and, even better, in the exploratory data analysis section you can read off my intent very clearly: I'm trying to make those three scatter plots. Again, returning to the concept of building more of a user interface for our tool, we can then think about commenting and documenting our functions, even within the single R Markdown script, in the same way you would for an R package. As you can see in this example, we can use the roxygen2 package to create a code documentation skeleton, similar to what you'd use inside a package, to define each expected input and output of our functions. Since this is the type of documentation our users are very used to looking at, it can go a long way toward helping them learn and engage with pieces of your code.

At this point we've gotten to relatively clean code, and we can also start thinking about code style and navigability. There are great tools for this in R, including the lintr and styler packages, which can check your code against a configurable style guide and either call out violations of that style guide, as you see on the right-hand side, or, in the case of the styler package, actually go ahead and fix those code style issues on your behalf. Besides code style, there's also writing style to think about, and for that we can use the spelling package to help catch typos in the narrative pieces of our report, just to keep things as clean and polished as possible.

So at this stage of R Markdown driven development, we've still only worked in one single file, and we have a very sustainable, well-engineered file. For some use cases, this may actually be the perfect place to stop. If you're in a low-resource environment without a formal version control system or repository, single files can be the easiest to share between collaborators, easy to compare across versions by simply running a diff in the terminal, and easy to automate or refresh with a single click of a button. However, they can be lengthy and somewhat intimidating for new users to inherit, and they enable some anti-patterns because they don't give us all the flexibility we need. So if we're looking for something a little more extensible, we may want to consider breaking our file apart into a more robust analytical project. The key here is to define a standardized file structure and think about how we can break out different components of our R Markdown document into different folders in that project. For example, the functions I talked about defining could go in a scripts or functions folder. The data we're using for the analysis could go in a raw data folder, and any subsequent data artifacts created by those computation chunks can be saved in an output folder. The analysis as a whole can live in an analysis folder, et cetera, with other types of documentation or external files you might collect along the way. As a slightly more technical view of this, we can return to the exploratory data analysis example I talked about before.
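Before we come back to that example, here's roughly what this documentation and style tooling can look like on the hypothetical scatter plot function from before; the file name is also just an example:

```r
#' Scatter plot of two variables
#'
#' @param data A data frame containing the variables to plot.
#' @param x_var,y_var Unquoted column names to map to the x and y axes.
#' @param point_size Point size passed to geom_point(). Defaults to 2.
#' @return A ggplot object.
#' @examples
#' scatter_plot(mtcars, wt, mpg)
scatter_plot <- function(data, x_var, y_var, point_size = 2) {
  ggplot2::ggplot(data, ggplot2::aes(x = {{ x_var }}, y = {{ y_var }})) +
    ggplot2::geom_point(size = point_size)
}

# check code style, tidy it up automatically, and spell-check the narrative text
lintr::lint("analysis.Rmd")
styler::style_file("analysis.Rmd")
spelling::spell_check_files("analysis.Rmd")
```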
So instead of using different chunks to define my function and then call it, I can save that function definition in a separate R script, and simply reference it and ask R to go execute that code for me using the source function, as shown at the top. One place I particularly like to use this trick is when working with data: instead of having databases queried or APIs called directly from my R Markdown document, we can have one script that reads data from a database and saves it to a raw data folder, and another script that reads in that data, does some of our wrangling and model-fitting pieces, and saves those intermediate data artifacts, and then we only read those artifacts into the R Markdown. This is very beneficial because it eliminates the direct link, the direct dependency, between data stores we don't necessarily control and our final R Markdown report. For example, if I find a typo in my R Markdown, I can refresh and rerun that document without having to worry about whether my database or my API is currently online and accessible.

Again, at the project stage there are a lot more optimizations we can put into play. We talked about this concept of having an interface, and defining that standard file structure in an organizational setting can have really huge benefits, because as different collaborators move between projects they have an intuitive sense of where to navigate in a project to find certain types of material, again like data sets or function definitions. There are many great R packages to help you do this, such as usethis, ProjectTemplate, and starters. Similarly, for dependency management, it's a great time to think about adding in the renv package. The benefits we've achieved by taking this extra step are making our project a lot more flexible and extensible to different use cases, and easier to steal and borrow small components from, like a single function, for other uses. At the same time, with this approach we've still captured a lot of the initial context of our analysis, in that we're really still focusing on the one specific instance of the problem we were thinking about. That can sometimes muddy the line between analysis and general-purpose code and make it hard to fully extend or generalize our work, and we haven't fully taken advantage of all of R's developer tools.

With all of these concerns, we can think about taking the final step of turning our project into a package. The magic of R packages is that a lot of it really is as simple as saving files in the right place. For example, those functions from our scripts folder could move into the package's R folder. The R Markdown itself can serve multiple purposes, both as a template embedded in the package that interactively walks users through the code, and as a vignette that, at a high level, excites users about learning your package by giving them a really interesting and enticing example. Similarly, even the data we used along the way, if it's properly protected and anonymized, could go into the package as example data. All that's missing is documentation, which, if you recall, we actually already wrote a few steps ago, and unit tests, which we probably should have talked about far earlier in the process if we want to trust any of our analysis, even when it's in a single file.
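To make the source pattern and the staged data pipeline concrete, here's a minimal sketch; the script, folder, and object names are hypothetical, and `con` stands in for an existing database connection:

```r
# 01-pull-data.R: query the database once and cache the raw result
raw_sales <- DBI::dbGetQuery(con, "SELECT * FROM sales")
saveRDS(raw_sales, here::here("data-raw", "sales.rds"))

# 02-wrangle.R: read the cached data, do the analysis-side processing, save the artifact
sales <- readRDS(here::here("data-raw", "sales.rds"))
# ... wrangling and model-fitting steps would go here ...
saveRDS(sales, here::here("output", "sales-clean.rds"))

# inside the R Markdown: no live database dependency, just load what's needed
source(here::here("scripts", "scatter_plot.R"))          # function definitions
sales <- readRDS(here::here("output", "sales-clean.rds"))
```

And the final conversion to a package can be sketched with the standard development tooling, again with a made-up package name:

```r
usethis::create_package("salesreportr")            # hypothetical package name
usethis::use_r("scatter_plot")                     # functions move into the R/ folder
usethis::use_vignette("monthly-sales-analysis")    # the original R Markdown becomes a vignette
usethis::use_testthat()
usethis::use_test("scatter_plot")                  # unit tests for the extracted functions
devtools::document()                               # builds man/ pages from the roxygen2 comments
devtools::check()
pkgdown::build_site()                              # a browsable documentation website
```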
There are many great R packages to help you with these last few steps. usethis can help make sure you get all those project assets into the right folders. devtools can automatically generate the documentation in your man folder from the roxygen2 comments. testthat provides a very friendly user interface for writing tests. And with pkgdown, you can go one step further and create a very user-friendly, navigable website to help other people learn how to use your analytical product.

Now we've really completed the spectrum of different types of outputs. Packages, of course, are the formal mechanism for distributing code and tools at scale, and they're something other R users are really comfortable learning how to use, so those are some great advantages. However, we've now abstracted very far away from your specific analysis toward a very general class of problems, which sometimes can be great, but sometimes may actually be a bridge too far if you think future users will be more focused on reproducing your work than extending it. You may have different goals, but regardless of what they are, following the steps of R Markdown driven development can help you get a lot closer to making a sustainable and empathetic data product, both for yourself and for anyone who wants to use it in the future and benefit from all of your labor.

So, thank you very much for your time today. I'm happy to take questions on Slack, and I'd also mention that I have a couple of blog posts up on this topic if you're interested in learning more. And finally, thank you so much to the organizers of csv,conf. This has been a phenomenal experience and a great conference despite all the challenges of going virtual. Thank you very much. Thank you.