Hello, everybody. Our next speaker is Vincent Major, a researcher at the NYU Grossman School of Medicine. He works in predictive analytics, applying machine learning and deep learning techniques to electronic medical records. Vincent has a pre-recorded lightning talk, but he will be around for questions afterwards. Take it away, Vincent.

My talk is on Docker, package snapshots, and packrat. So I'm not going to spend too much time on the motivation, but there's an increasing need for reproducible research in almost everything we do in medicine, but also increasing expectations from journals for data and code, and also increasing use of cloud computing, especially with PHI. And R has some particular challenges, particularly when we have different R projects running on our laptops, where it's pretty easy to accidentally break something by updating a particular package. So the goal of this work is to bring in a solution to maximize reproducibility without sacrificing the interactive RStudio experience that we've come to love.

So there are quite a few hurdles to reproducibility, but we're going to control as many factors as we can, particularly the operating system and the libraries in that system, the R version, and the package versions for a particular project. If all three of these things are controlled and specified, your analysis should always be reproducible. So my recommendation is that being proactive is better than being reactive: investing some time up front to make sure you have everything organized will definitely pay off in the long run. For two years I've been using a solution that combines Docker, package snapshots, and an R package called packrat that manages my packages for me.

So the reason we use Docker is that Docker really has complete control over the operating system. It doesn't matter if we're running on my laptop or a server or AWS, I can specify a base operating system.
If we select Linux, like Debian, for example, then I can install the dependencies that I need, OpenSSL, libxml2, etc. Then I can install R, a particular version, and install all of the R packages that I need. Now what this means is everything is isolated and runs together as a single entity, which means I can pick up this Docker container and put it somewhere else, on a different server, a different cloud platform, or my own laptop, and it will run identically.

So a lot of this hard work has been done for us by a great team called Rocker, and in particular their version-stable images. What this means is that for many of the R versions of the last few years, for example 3.4.1, there is a set of images built exactly for that R version, which is really great. In particular, on top of the base image is one that adds the interactive RStudio, as well as multiple other options. It's very easy to pull this image from Docker Hub: docker pull rocker/rstudio:3.4.1. And then when it comes time to test it out, we simply run that container, and when we go into the browser we see a fresh install of R and RStudio. We can see here in the session info that the R version is 3.4.1.

So the great thing that the Rocker developers did is they changed the default CRAN repository to a dated Microsoft snapshot at MRAN. So you can see here if I install the package dplyr, for example, the URL that it goes to is actually not CRAN, but a Microsoft snapshot dated 2017-09-28, which is the last day that R version 3.4.1 was the latest version.
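A minimal shell sketch of the pull step described above, assuming Docker is installed and the rocker/rstudio:3.4.1 tag named in the talk is still published. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences added here, not part of Docker or Rocker; by default the script only prints the commands. The second command is one possible way to check which repository the image is configured to use.

```shell
#!/bin/sh
# Sketch only: prints the Docker commands unless DRY_RUN=0 is set.
set -eu

R_VERSION="3.4.1"                     # R version used as the example in the talk
IMAGE="rocker/rstudio:${R_VERSION}"   # version-stable Rocker image tag

# Hypothetical helper (not part of Docker): echo the command in dry-run
# mode, otherwise execute it unchanged.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Pull the image pinned to this R version from Docker Hub.
run docker pull "$IMAGE"

# Ask R inside the container which repository install.packages() will use.
# Per the talk, this should be a dated MRAN snapshot rather than live CRAN.
run docker run --rm "$IMAGE" Rscript -e 'getOption("repos")'
```

Running it with `DRY_RUN=0` executes the two commands for real; changing `R_VERSION` selects a different pinned image, provided that tag exists.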
So in addition to this, I recommend doing a few things: mounting a folder of source code at a really obvious place inside the Docker container (more on that later); adding your user ID, so that any actions taken by the Docker container look like you were the user that took them, which helps with any permissions issues downstream; and initializing a packrat project, which will control and organize all of the packages used for this project within the mounted source code directory. What this looks like is a run call here with the user ID, the password, and the mount, and then once you're inside RStudio, you install packrat and initialize a project.

So I've got some code online that you can follow, but I'm also going to work through a demo. I am on a server here. If I make a directory for an example project and go into that directory, you see I have nothing in here. But then if I run with the runtime arguments I just showed you, the user ID, the example password, and the current directory, I can go to my browser, and I see a login page. The login is rstudio, and the password is the one I just set. And we'll see we log into a fresh version of RStudio, and there's nothing in here.

Now, of note, we log in as a user called rstudio, which means when we get the working directory, we're actually at that user's home, /home/rstudio. But I mounted my source code at a different location. So I will change the working directory to that location, /work. I see there's nothing in here. So if I go back to my terminal and I touch a test.R, I'll see it pop up straight away here on the right. Similarly, if I create a new R script, say script.R, I will see that it's also outside the container in my source code directory. So the great thing about moving everything into a /work location is that cognitively we know everything in there is going to be copied outside.
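The run call and packrat setup described above could look roughly like the following shell sketch. It assumes the Rocker image reads the `USERID` and `PASSWORD` environment variables and serves RStudio on port 8787; the `run` helper, `DRY_RUN`, the `example` password, and the file name `packrat_setup.R` are placeholders introduced here for illustration, not taken from the speaker's online code. The Docker command is only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: prints the docker command unless DRY_RUN=0 is set.
set -eu

IMAGE="rocker/rstudio:3.4.1"
PROJECT_DIR="${PROJECT_DIR:-$(pwd)}"              # host folder holding the source code
HOST_UID="$(id -u)"                               # your user ID on the host
RSTUDIO_PASSWORD="${RSTUDIO_PASSWORD:-example}"   # placeholder; choose your own

# Hypothetical helper (not part of Docker): echo in dry-run mode, else execute.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Start RStudio Server in the background:
#   -p publishes RStudio's port 8787 on the host,
#   -e USERID / PASSWORD are assumed to be read by the Rocker image,
#   -v mounts the project folder at /work inside the container.
run docker run -d \
  -p 8787:8787 \
  -e USERID="$HOST_UID" \
  -e PASSWORD="$RSTUDIO_PASSWORD" \
  -v "$PROJECT_DIR":/work \
  "$IMAGE"

# Write the packrat steps to a script in the mounted folder, so it appears
# under /work in RStudio and can be run from the console there.
cat > "$PROJECT_DIR/packrat_setup.R" <<'EOF'
install.packages("packrat")             # from the image's default repository
packrat::init("/work", enter = FALSE)   # create the packrat project in /work
packrat::on("/work")                    # switch this session to the project library
print(.libPaths())                      # should now point inside /work/packrat
EOF
```

Passing `enter = FALSE` to `packrat::init()` is a choice made for this sketch so that turning the project on is a separate, visible step, mirroring the order used in the demo.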
So the next thing to do is install packrat, and then we use packrat to initialize a project at the same location, the project location. We see it creates a folder and some other details here. And if we check the library paths right now, they'll be the standard Linux ones, but if we do packrat on, to turn on the project, we see those library paths are updated to this location. So if we go into the library folder for Linux, 3.4.1, we see the only package installed right now is packrat itself. So if we install a test package, color rule, we will see in the printout here that it actually is installed into this location. And it came from the MRAN 2017-09-28 snapshot, and we see it here on the right.

So to really give a further example of this, I have another image running. Here it is. So if I log in again. So this is sort of what I do each day when I go to work. So here I am in a blank RStudio because I've refreshed the session. And I have actually stopped this container and brought it back up again. So what I have is an example script and a packrat project here at /work. So if I check the library paths, and call packrat on to turn it on again, the library paths have updated. Now if I go to my example script, straight away I can load ggplot2 and load dplyr and run my analysis just like it ran yesterday, even though I stopped this container and brought it back to life again. So hopefully that was helpful for everyone, and I'll take any questions. Thanks for listening.

Hi, we're already over time, but if there are any quick questions, I can take them now.

Yes, do you have any opinion about the new renv package instead of packrat?

Yeah, I was actually not aware of that at all. I will definitely be looking into that a little further. I think it's very interesting that it's from one of the same developers, and learning from mistakes, I think, is a good example of what the R community is like anyway. I would definitely look into it.

I'm from the University of California.
And another question: can people run R notebooks and Rmd files?

Yeah, once you get into RStudio, everything works in the normal way. So I do use a lot of R notebooks in this exact system. The only issue is that sometimes, when the container goes down for whatever reason, it tries to bring the R notebook back up, but it's frozen, not knowing which set of libraries it's using. So you just have to be patient with it and then turn packrat back on, and then everything sort of comes back to normal.

Excellent. And maybe we have time for one more question. A user says it's a naive question, but does Rocker support R version 4 and higher?

Yes, I believe there are some links on the Docker Hub page for Rocker saying that they've split their repositories into a different place starting at R version 4.0. I think they decided that instead of continuing to build on what they've had for years now, they're going to start fresh in another repository. So just find that link; it will be on there somewhere.

All right. Well, thank you very much.

Okay. Thanks, everyone.