Okay, welcome everybody. Our next talk is by Karthik Ram: Reproducible Notebooks with holepunch. This talk is also recorded, and Karthik will be available to answer questions in the chat as well as at the end of the talk. Thank you.

Hi there. I'm Karthik Ram from the University of California, Berkeley, and today I want to talk to you about reproducible notebooks with a package I have developed called holepunch. So let's imagine you've completed a project and you've shared all of the code on GitHub. Let's also imagine for a second that the code runs, and not just on your machine: a reader is able to clone your repository and render all of the outputs from your notebooks. Great. But will this hold true over time? Quite likely not. There are many reasons why. A few common ones are that others don't have access to your data, or that you've made API calls to services that are now unreachable. The more likely reason, though, is that the dependencies have changed over time. The R package ecosystem is continuously evolving: functions can change their behavior or simply become deprecated. So someone else with the same set of packages, but different versions of them, might not be able to run your code and get the exact same output. There are two things you can do to alleviate this. One is to carefully go through your R code and R Markdown files and document all of your dependencies. The other is to create a Docker container that contains the exact versions of all of the packages used in your analysis. Frankly, both of these are very time consuming, and not something everyone has time to do for every project. But there are ways to make this a little bit easier. So I'd like to introduce you to the concept of a research compendium.
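The first option, documenting dependencies by hand, can be partly automated. As a rough sketch (written in Python rather than R, and not part of holepunch), the code below scans a project's .R and .Rmd files for library(), require(), requireNamespace(), and pkg:: references. The names find_r_dependencies and scan_project are hypothetical, and a regex scan like this will miss dynamically loaded packages and may pick up names that only appear in comments.

```python
import re
from pathlib import Path

# library(pkg), require("pkg"), requireNamespace("pkg", ...): capture the package name.
CALL_PATTERN = re.compile(
    r"""\b(?:library|require|requireNamespace)\(\s*["']?([A-Za-z][A-Za-z0-9.]*)"""
)
# Namespaced calls such as dplyr::filter() or stats:::internal_fn().
NAMESPACE_PATTERN = re.compile(r"\b([A-Za-z][A-Za-z0-9.]*):::?[A-Za-z._]")

R_SUFFIXES = {".R", ".r", ".Rmd", ".rmd"}


def find_r_dependencies(text: str) -> set[str]:
    """Return package names referenced in a chunk of R or R Markdown source."""
    found = set(CALL_PATTERN.findall(text))
    found.update(NAMESPACE_PATTERN.findall(text))
    return found


def scan_project(root: str) -> list[str]:
    """Collect package names from every .R/.Rmd file under root, sorted."""
    deps: set[str] = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in R_SUFFIXES:
            deps |= find_r_dependencies(path.read_text(encoding="utf-8"))
    return sorted(deps)
```

Calling scan_project(".") from a project root gives a starting point for the dependency list, not a complete or version-pinned one; pinning exact versions is still the job of the Docker image.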
The original idea, proposed by Robert Gentleman and Duncan Temple Lang, was that you can ship a collection of data, code, and text together as a compendium, which can then be easily shared, managed, and updated. If you're interested in learning more about research compendia, I gave a longer talk on this topic at the 2019 RStudio conference, and I've linked to that talk in the slide. It turns out that the R package structure is ideally suited for a compendium. This is because R packages contain a file called DESCRIPTION, which is composed of simple key-value pairs in a file format called the Debian control format. If you'd like to turn a collection of code into a compendium, the simplest thing you can do is add a DESCRIPTION file similar to the one you've seen in many R packages. It does not have to be comprehensive; it just needs a few required fields. In this example, I have a package of type Compendium. I've named it, given it a version number, and listed a few dependencies, including one that is available only on GitHub. The nice thing is that with this file, someone can easily install all of these dependencies using the devtools package. But so far we have only listed the dependencies, not their exact versions. And this is where Binder comes in. Binder is an open source project that makes it really easy to share analyses that live in any type of notebook. It's worth noting that mybinder.org is just one instance of that project; there can be many instances of Binder. If you enable Binder on a collection of code and add a badge to your GitHub repository, anyone can click that badge and be dropped into an RStudio Server session in their browser. All of the dependencies have already been installed and the code is ready to run. There's no local installation required; it's really that simple. So to recap, a badge on your README launches a new instance of Binder.
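Because DESCRIPTION uses the Debian control format, it is easy to read programmatically: each field is a "Key: value" line, and lines that start with whitespace continue the previous field. Here is a minimal Python sketch of that parsing, assuming a single-record file. parse_dcf and split_packages are hypothetical helper names, and the example fields (including a Remotes entry, which devtools and remotes use to locate GitHub-only packages) are illustrative rather than the file shown on the slide.

```python
import re


def parse_dcf(text: str) -> dict[str, str]:
    """Parse a single-record Debian control file, such as an R DESCRIPTION."""
    fields: dict[str, str] = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines would separate records; DESCRIPTION has one
        if line[0] in " \t":
            # Continuation line: append to the most recent field.
            if key is None:
                raise ValueError("continuation line before any field")
            fields[key] = (fields[key] + " " + line.strip()).strip()
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"malformed line: {line!r}")
            key = name.strip()
            fields[key] = value.strip()
    return fields


def split_packages(field: str) -> list[str]:
    """Turn 'dplyr (>= 1.0.0), ggplot2' into ['dplyr', 'ggplot2']."""
    names = []
    for entry in field.split(","):
        name = re.sub(r"\(.*?\)", "", entry).strip()  # drop version constraints
        if name:
            names.append(name)
    return names


# Illustrative DESCRIPTION for a compendium (hypothetical names and versions).
EXAMPLE = """\
Type: Compendium
Package: myanalysis
Version: 0.1.0
Imports:
    dplyr (>= 1.0.0),
    ggplot2,
    somepkg
Remotes:
    someuser/somepkg
"""

fields = parse_dcf(EXAMPLE)
print(fields["Type"], fields["Version"])
print(split_packages(fields["Imports"]))
```

Note that the parsed Imports field records only minimum versions at best, which is exactly the gap the talk points out: listing dependencies is not the same as pinning them.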
Binder then looks for a recent Docker image, and if it's not able to find one, it takes several minutes to build a new one. Once fully launched, it drops you into an RStudio Server session with everything ready to go. So how do you set up Binder for your R project? There are a few different ways to do this, but I believe the simplest is with the holepunch package. The workflow is very simple. You start by loading the holepunch library. Then you write two files, a DESCRIPTION file and a Dockerfile, and add a badge to your README. At this point, you commit and push your code to GitHub, and finally ask Binder to build a Docker image from the Dockerfile. So the workflow is: write a DESCRIPTION, write a Dockerfile, generate a badge, and build on Binder. These map nicely onto four functions in holepunch: write_compendium_description(), write_dockerfile(), generate_badge(), and build_binder(). I'll walk through them now. You start by writing a DESCRIPTION. You can stick to the defaults, but you might want to name your package, describe it, and add a version number. You then create a Dockerfile. One thing you might want to change is the maintainer; everything else you can leave at the defaults. It's interesting to note that holepunch picks an appropriate Docker image to start from. It chooses one that already comes with RStudio Server, the JupyterHub components that Binder needs, and the tidyverse, to really speed things along. holepunch also looks at the date when you last modified an R script or R Markdown file in your project and puts that date into the Dockerfile. This date maps onto the version of R used in your Dockerfile. You can, of course, override it with any date you'd like. The last step is to generate a badge. The hub option here defaults to mybinder.org, which, as I said, is one instance of Binder.
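A Binder badge is just a Markdown image that links to a hub's launch URL, which is why the hub option can be swapped. The Python sketch below builds one, assuming the /v2/gh/owner/repo/ref launch path that mybinder.org uses for GitHub repositories, the urlpath=rstudio query that opens RStudio instead of Jupyter, and that other hubs serve the same badge_logo.svg image. The function names and the someuser/myanalysis repository are hypothetical; this illustrates the idea and is not holepunch's generate_badge() implementation.

```python
from urllib.parse import quote

MYBINDER = "https://mybinder.org"


def binder_launch_url(owner: str, repo: str, ref: str,
                      hub: str = MYBINDER, urlpath: str = "rstudio") -> str:
    """Launch URL for a GitHub repo on a BinderHub (assumes the /v2/gh/ scheme)."""
    base = hub.rstrip("/")
    # Refs such as 'feature/x' are URL-encoded so the slash isn't read as a path.
    url = f"{base}/v2/gh/{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}"
    # urlpath=rstudio asks the hub to open RStudio Server rather than Jupyter.
    return f"{url}?urlpath={urlpath}" if urlpath else url


def binder_badge_markdown(owner: str, repo: str, ref: str, hub: str = MYBINDER) -> str:
    """README badge: a Markdown image that links to the launch URL."""
    badge = f"{hub.rstrip('/')}/badge_logo.svg"  # assumes the hub serves this image
    return f"[![Binder]({badge})]({binder_launch_url(owner, repo, ref, hub)})"


print(binder_badge_markdown("someuser", "myanalysis", "master"))
```

Pointing at a different public hub only changes the base URL passed as hub; the rest of the link stays the same.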
You can leave it at this default or swap it out for any of the many other publicly available Binder hubs. The very last step is to build the Binder. This is optional, because clicking the badge also runs this step. So those are the steps for going from a collection of R code on GitHub to a live, executable notebook that anybody can run. It's great for showcasing small examples from your paper, code examples for teaching, and use cases and tutorials. Binder does have some limitations, though. It only has one gigabyte of RAM, so you can't run computationally intensive examples. It will time out after 10 minutes of inactivity. And you still have to find a way to have all of your data read in at analysis time, so you can either commit small data to GitHub or read data in from elsewhere. Binder is a really nice way to get more visibility for your work, and if you're interested in learning more, I've put in links to the documentation and to the slides. Thank you very much.

Thank you. So we have a few questions. The first question is: will writing a Dockerfile make launching Binder instances faster? Packages like the tidyverse are installed from source on Linux and take forever to install.

Thank you. Yes, it will actually make it go faster, and that's part of the reason I wrote the holepunch package. You're welcome to skip holepunch entirely and paste your GitHub URL into Binder, but, as you say, it's going to try to build all of the tidyverse from source. The particular image I start from already includes the tidyverse and all of its dependencies, RStudio Server, and Binder and its dependencies. So at most it's going to install a few extra packages, which really speeds things up. And build_binder() does this asynchronously, so it won't hold up your R terminal. You can just continue doing other work.
And when it's ready, it'll just launch the server for you.

The other question we had is: does holepunch apply to any R project, or only to R packages?

You would normally not do this for R packages, because you're not trying to create a virtual instance of your R package for someone to run through. You would do this for code examples instead. I don't know of anybody who has turned on Binder for a package directly, but if you've got examples in a GitHub repo, that's where you would turn this on.

Next: my repository already has a Dockerfile for IPython notebooks. Should I create a separate Dockerfile for running the Rmd files?

This is an interesting question. I'm not sure, but if your Dockerfile can be modified to install the Binder elements and the RStudio elements, then it's probably fine to use the same Dockerfile. You can also have multiple Dockerfiles in the same repo, let Binder look at only one, and keep the second for local work or whatever else you might do. The package writes all of the Binder-specific configuration to a hidden folder called .binder, which will not interfere with your standard Dockerfile. The only issue is that with a standard Dockerfile that lacks those other elements, Binder may launch but won't be able to run RStudio.

Wonderful. Thank you so much, Karthik, this is a really cool package. And we're going to move to the next session. Thanks.
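The reason a Dockerfile in .binder does not collide with one at the repository root comes from how repo2docker, the tool Binder uses to build images, looks for configuration: if a binder/ or .binder/ folder exists, it reads configuration from there and ignores files in the root. Below is a simplified Python sketch of that lookup rule as I understand it; binder_config_dir and binder_dockerfile are hypothetical names, and the real tool considers many more configuration files than a Dockerfile.

```python
from pathlib import Path
from typing import Optional


def binder_config_dir(repo_root: str) -> Path:
    """Directory repo2docker reads configuration from (simplified sketch)."""
    root = Path(repo_root)
    present = [d for d in (root / "binder", root / ".binder") if d.is_dir()]
    if len(present) > 1:
        raise ValueError("both binder/ and .binder/ exist; repo2docker rejects this")
    # A binder/ or .binder/ folder takes precedence; root-level config is then ignored.
    return present[0] if present else root


def binder_dockerfile(repo_root: str) -> Optional[Path]:
    """Dockerfile Binder would build from, or None if another buildpack would be used."""
    candidate = binder_config_dir(repo_root) / "Dockerfile"
    return candidate if candidate.is_file() else None
```

Under this rule, a repository with a root-level Dockerfile for local IPython work and a .binder/Dockerfile written by holepunch would have Binder build from the .binder copy, while a plain docker build in the root keeps using the original file.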