I think we can start our session right now. Welcome everyone. I'm happy to chair this session, with a lot of interesting talks coming up. First it will be Marcin on Shiny proof of concept to production applications in eight steps. Then we've got Alex K. Gold on reliably reproducible project packages. And then we've got Iñaki Ucar, whose name I hope I pronounced somewhat correctly, on binary R packages for Linux: past, present and future. And with that I'm off the floor. The stage is yours, Marcin.

So hello everybody. My talk is about Shiny: proof of concept to production in eight steps. My name is Marcin Dubel and I have been working with Shiny for about six years. These are the areas I want to highlight in this talk, so that you can check whether each of them is covered in your application and adjust it to be a production-grade, genuinely useful tool for your users. I will wrap these steps up in a story that you might find familiar from your own companies and projects.

Let's imagine that we have a data science team in our organization and we are about to create an excellent Shiny application. One member of our team, I'll call him Peter, is a Shiny developer intern, and his task is to deliver a proof of concept of the application while the rest of the team shows the idea to the stakeholders and tries to find the budget for the project. Our proof of concept is ready and we get the budget, but Peter has ended his internship. Now our task is to turn this proof of concept into a regular, reliable, production-grade Shiny application. So let's see what obstacles we may find and how we should overcome them to deliver great value for our users.

First of all, we need to get the code base, so Peter sends the team his application in a zip file, which bothers us, because it turns out not to be the version we are looking for. It's not the latest one, but we can't find anything else. Peter should have used version control and a repository. The exact tool depends on what your organization uses, whether GitHub, Bitbucket, GitLab, or whatever, but the code should be under version control so that we can share it easily and see all the versions and the whole history of changes. What should be kept outside of the repository, though? Definitely the data. Ideally you should use a database, but if you are using flat files, store them somewhere else so they don't bloat the repository. Another likely mistake of Peter's is storing credentials in the repository: use .Renviron files instead, or the credential-storage features of the platform you are deploying to. So code sharing is our first lesson learned: maintaining a production-ready application as a team requires a lot of collaboration, so we need to share the code easily.

So we've got the code from Peter. It is maybe not the best version, but it is what it is. And we cannot run it; there are errors. What we found out is that some paths refer to files that live directly on his local computer, which is super bad, because we cannot access those files. We need to translate those paths to be relative to the project root, which is how paths should always be handled in R applications.
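A minimal sketch of those two fixes, project-relative paths and credentials kept out of the repository; the file names, environment variables, and Postgres backend are hypothetical:

```r
library(here)   # resolves paths relative to the project root

# instead of an absolute path like "C:/Users/peter/Documents/sales.csv":
sales <- read.csv(here("data", "sales.csv"))

# credentials live in .Renviron (git-ignored), never in the code:
con <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname   = "sales",
  user     = Sys.getenv("DB_USER"),
  password = Sys.getenv("DB_PASSWORD")
)
```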
Okay, that's solved, but we still have errors in our application, caused by our environment being different from what Peter used on his computer. Tools like renv, which set up a shareable state of the packages and versions used in the project, are key to being able to work together. Going even further, for deployments I would recommend creating Docker images: containerize the application and pin all the system requirements, the R version, and everything else the application needs. So the development environment is our lesson learned number two.

We've got the code, we are finally able to run it, and we see that the app Peter created is super simple, just ui and server files, so great, it will be easy for us to maintain. But when we dig deeper we see it's not quite that story, and implementing any changes or adding new features to a production application right now would be super painful. When there is a lot of business logic in your application, consider using a package structure so the functions can be reused, extract your UI parts into separate Shiny modules, which are much easier to maintain, and use R6 classes to encapsulate the ideas and concepts of your application into objects (see the module sketch just after this step). If you're familiar with my shark attack application: the trash, for example, is one instance of such an object, with its own methods for how to collect trash and different parameters for the types of trash. So a clean code structure is our lesson learned number three.

Speaking about project structure: the folder structure of a Shiny application is much more than just R code. You will need other languages like CSS and JavaScript, and you should not write them inline in your R code; that is not good practice, because it becomes hard to find those places and to maintain such code. Keep all these dependencies in the www folder so that they are loaded with your application, and use minified files. On the right is an example folder structure from one of my applications: there is just one minified file for all the CSS styles, which lets the app load much faster, because it loads one file and not dozens of them. My other recommendation is to extract blocks of text into external files. Applications often contain long business descriptions, and they make the code hard to maintain, hard to follow, hard to understand. It's also great because if business users adjust those texts, they can just go to a text YAML file, for example, and fix the descriptions without going into your code and messing something up. So how to use external resources is our lesson learned number four.
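Returning to the code-structure step for a moment, here is a minimal Shiny module sketch of the kind recommended above; the names and data are hypothetical:

```r
library(shiny)

# UI part of the module: all input IDs are namespaced
speciesFilterUI <- function(id) {
  ns <- NS(id)
  selectInput(ns("species"), "Species", choices = NULL)
}

# Server part: returns the filtered data as a reactive
speciesFilterServer <- function(id, data) {
  moduleServer(id, function(input, output, session) {
    updateSelectInput(session, "species", choices = unique(data$species))
    reactive(data[data$species == input$species, , drop = FALSE])
  })
}

# In the app:
#   ui <- fluidPage(speciesFilterUI("filter"))
#   server <- function(input, output, session) {
#     filtered <- speciesFilterServer("filter", my_data)
#   }
```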
Next step: we all agree that tests are super important. Of course Peter didn't implement them. But maybe there is already continuous integration? The quotation above might sound silly, but CI is actually a great indicator that you are on the right path to production. Always start a project with continuous integration and at least a test structure; this allows you to build things up step by step. And it is now easier than ever, because there are plenty of ready-made tools, for example in the GitHub Actions marketplace, that you can just plug in and use. Once you have done this for one project, it is super easy to reuse templates and boilerplate CI and test code in other projects, so this is simply the thing to start with. Then add your tests incrementally, keep your continuous integration green, and make sure your app works as it should. So continuous integration is our lesson learned number five.

Okay, but what if our functions work fine, yet the stakeholders are not satisfied with what the application produces? It might be that we implemented correct functions, but they are doing the wrong job, or that what we feed into the application is not fine. Here I would like to present two tools that help you make sure the business logic is right. First, I recommend the drake package. Its main benefit is that it produces a graph of your application's workflow. Here you can see a real graph produced for a client project where a lot of logic lived in an Excel file that I needed to transfer into a Shiny application. Using drake I could present this workflow as a nice graph, where blue triangles are functions and green circles are data objects, and then sit down with the client, go through the logic together, and check that everything was implemented as it should be. But you can always put wrong data into an application and it will happily produce wrong output, so make sure that you validate your data, the inputs to the application, for example with the data.validator package, which helps you check that the data is okay; you can also automate this in your workflow or when checking database updates. So the business logic check is our lesson learned number six.

Now, a big problem we've noticed in a lot of applications: everything works fine when Peter presents it with a data sample he mocked on his machine, but once we deploy it, it's not so great. It is so slow. There can be three reasons for that: problems with the data, the code, or the deployment. For data, I recommend loading only what is needed. I see a lot of applications that load all the data, and then the user's first step is to filter it down to the subset they will work with. So use those filters and load the data after filtering; use efficient data libraries, since data frames are not always the best; and push filtering, selecting, and grouping down to the database to do that work there (see the sketch below). For code, keep heavy calculations outside of Shiny: if something can be pre-calculated, put it in the database or behind an API, and use Shiny just as a display, as a front end. Also keep lightweight interactions in the browser, where judicious use of JavaScript is good for performance, and make the code efficient: check whether every for loop is needed and whether there are duplications or anything else that can be optimized.
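A minimal sketch of pushing that work to the database, as suggested above; the connection details and table and column names are hypothetical:

```r
library(DBI)
library(dplyr)
library(dbplyr)   # translates dplyr verbs to SQL

con <- dbConnect(RPostgres::Postgres(), dbname = "sales")

tbl(con, "orders") %>%
  filter(region == "EMEA") %>%        # runs in the database, not in R
  group_by(customer) %>%
  summarise(total = sum(amount, na.rm = TRUE)) %>%
  collect()                           # only the small aggregated result reaches Shiny
```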
As for deployment: the server environment is different from your local environment, so make sure you deploy your application early, to avoid any surprises just before the release and to check that everything works fine. Also prepare the architecture for the expected number of concurrent users; it can be surprising when a lot of people suddenly start accessing the app. You can use tools like shinyloadtest to check this early, or just run a stress test with your colleagues. So performance is super important, and this is our lesson number seven.

Okay, so we've delivered something that we think is a great production app, but our users do not agree. Remember that user adoption is key to the app's success. To help achieve it, invest in design work: start with mockups, even before the proof of concept, to go through the key features with your users, adjust, and find out what's needed. While you are developing, collect feedback often. Of course you should organize sessions with the users, but from our experience that can be difficult to arrange. What is very, very fruitful is the hallway test: by that we mean simply gathering colleagues who do not work on the app, asking them to click through it, and finding out whether everything is intuitive. So app styling and user experience are key for success, and this is our lesson learned number eight.

Okay, so those are the steps. I covered them only briefly, and I know I rushed a little, but you can find plenty of materials on each of them yourself. The key takeaway is that these areas matter, and also that you don't need to wait until the proof of concept is done: starting with all of these areas in mind is actually the best approach. That will let you move to production, maintain the application, and expand it with new features really smoothly. Thank you very much. I'm looking forward to the discussion, either here during useR! or on any platform you like. Appsilon kindly asks you to fill in their feedback form so we can share our knowledge better. Thank you very much; I hope this helps you deliver great Shiny applications. We all love this tool, so I'm glad if I can be helpful.

Yeah, thanks a lot to you, Marcin. That was a really interesting and insightful talk, as we already saw on the Slack, where a couple of questions came up. I'll try to relay a few of them to you, and hopefully you'll also have time to check the questions on Slack and answer there. The first question, from Ashley, is: what do you use to minify CSS and JS files? Okay, so for CSS I definitely recommend Sass. It is a tool that not only lets you minify the files but also write your styles better: it allows variables in your CSS and things like that. So Sass, the SCSS tooling, is great for CSS. For JavaScript it's harder; I use a solution provided by my colleagues on the project, I'm just using a template, which connects to the other topic I covered, so I don't think much about it, I just use what they've produced. But I will check and come back on Slack with the answer. All right. Yeah, we also heard about Sass in the R Markdown talk, which was really useful, so that seems to be a good option.
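A minimal sketch of the Sass-based minification described in that answer, assuming the sass package and a hypothetical styles/main.scss source:

```r
library(sass)

sass(
  sass_file("styles/main.scss"),                        # SCSS source with variables etc.
  options = sass_options(output_style = "compressed"),  # minified output
  output  = "www/main.min.css"                          # single file served with the app
)
```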
The second question is from Steffi: what does data.validator do? Can you elaborate on the data validation a bit? Okay, so you might be familiar with the assertr package, which allows you to create simple rules for your data set: whether the values in a column are within a specific range, take only certain values, are not missing, or whether there are dependencies between columns or rows. data.validator wraps up those well-known assertr rules and produces a really nice report. The one you saw is just an HTML document, so you can easily send it, or it can be sent automatically by a tool like the RStudio Connect scheduler, and it gives even business users a quick overview of what's happening. There is also a summary component that you can use in your application to check some logic: for example, if there are no errors, update the data; if there are errors, don't, because something is wrong with it. Thank you very much.
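To illustrate that answer, a sketch of the kind of assertr rules that data.validator wraps into its HTML reports; the columns and bounds are illustrative:

```r
library(assertr)   # also re-exports the %>% pipe

iris %>%
  verify(nrow(.) > 0) %>%                              # table-level rule
  assert(within_bounds(0, 10), Sepal.Length) %>%       # values within a range
  assert(in_set("setosa", "versicolor", "virginica"),
         Species) %>%                                  # only allowed values
  assert(not_na, Sepal.Width)                          # no missing values
```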
There are more questions, but one of them has already been answered quite well by Dirk, who is on the next talk's team, and I would like to start the next talk on time, so we'll stop the questions here. There are a couple more coming in; I will transfer the questions from the Q&A to the Slack, and we will continue the discussion there. So the stage is set for the next talk, by Iñaki Ucar. We heard from our keynote speaker Jeroen just a few minutes ago that Linux binaries are a hard thing, so let's learn about the hard work of Iñaki. The stage is yours.

So, I'm honored to be here today presenting this work with Dirk Eddelbuettel, who I believe doesn't need any introduction. Let me get straight to the point. I don't think we need to belabor the terrific work done by the CRAN team during the last 20 years or so, in which CRAN has grown spectacularly, as you can see in the graph here. What I'd like to highlight, however, is that many of these almost 18,000 packages use compiled code. We used to say that R is like a glue language, and developers use these capabilities to offer better performance, integration with third-party libraries, and so on. So how many packages use compiled code? Really a lot, as you can see in the graph below. If you start from a clean library and pick a package at random, you have an 80% chance of requiring compilation, either directly, because the package itself requires compilation, or indirectly, because one of its dependencies does.

But compiling is, let's say, troublesome. Setting up an appropriate toolchain and libraries takes a lot of time, and when something goes wrong, debugging the issue requires skills. So, for instance, what if I show you something like this? Windows and macOS users would be asking themselves what the big deal is, because CRAN, with great pain I must say, already provides up-to-date binaries for them. But there are probably some Linux users in the audience who remember the last time they needed to go for a walk while this was installing, or could have fried some eggs on their computer. And nowadays many Windows and macOS users are also Linux users, in the cloud or because they use CI/CD systems powered by Linux. So we can do better, and we need to do better.

See, for instance, this example of a binary installation of the whole tidyverse, around 80 packages, in just 36 seconds on a stock Linux system. But there are of course a number of challenges here. There are many Linux distributions out there, with different philosophies, conventions, update cycles, and software stacks. We need to manage system dependencies. We need to scale at the same pace as CRAN does and accommodate new platforms. And we lack integration in Linux systems: we can install packages from source within the R session, but we also need a system package manager to install the system dependencies we require, which in turn requires admin privileges.

There is a lot of previous work we build on. Linux distributions themselves have been shipping R and a number of packages basically forever; here you see there are almost 400 packages on Fedora, with full dependency management and multi-tenancy, meaning you can provide a coherent stack of packages to multiple users on the same machine. But due to the, let's say, stringent policies and guidelines that official repositories implement, for good reason I must say, this doesn't really scale. And there's no integration whatsoever: you can install a handful of packages with the system package manager, but the rest must be installed from CRAN with install.packages() within your R session.

There are also a number of projects striving to bring binary repositories of CRAN packages to various Linux distributions. The oldest and most successful one is cran2deb, which most of you know, and which Michael Rutter uses to maintain the c2d4u repositories, with about 5,000 packages for several R versions. Also, Detlef Steuer maintains autoCRAN, with an impressive 16,000 packages for openSUSE. These projects improve scalability, as you can see from the number of packages they provide, but we still have no integration, just as with the official repositories.

Also, last year RStudio released the extremely successful public Package Manager, which all of us have been enjoying during the last year. RStudio Package Manager brings binary packages for many distributions, but most importantly, it has integrated binary installation into the Linux workflow, so that install.packages() behaves the same on a Linux system as on Windows or macOS. However, it only brings us the packages, not the system dependencies: we still need to install those ourselves using the system package manager, and version mismatches may happen. All of this is fantastic work. At the same time, the open source community cannot just sit and hope for the best from RStudio. As we said, we can and we should do better.

So, meet cran2copr and bspm. First of all, let me show you a primer. On Fedora Linux, you can enable the iucar/cran Copr repository and then install this mysterious CoprManager package, and bam: you go to your R session, you install packages, whatever, and you get binary packages from the system repositories, along with their dependencies, for free, completely transparently. So how does this work? Well, cran2copr uses the Fedora Copr build system to build the packages and publish the binary repositories.
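A sketch of that primer, with the one-time shell setup as comments; the repository and package names are as given in the talk, so check the cran2copr README to confirm them:

```r
# One-time system setup on Fedora (shell, as root):
#   dnf copr enable iucar/cran
#   dnf install R-CoprManager
#
# Afterwards, in any R session:
install.packages("units")   # resolved as a binary system package,
                            # system dependencies included, transparently
```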
cran2copr is currently available with automated daily updates via GitHub Actions. And we bring the missing piece from the previously surveyed projects: the integration we always hoped for. So what is the magic behind this? The magic is called bspm, which stands for Bridge to System Package Manager. It is a multi-distribution integration that currently supports DNF and APT, covering a large portion of the Linux ecosystem. cran2copr's CoprManager is just a branded version of this bspm package. bspm is available on CRAN, as a Debian package, and in the Rocker project, and what it does is provide all the other projects we saw before with this desired integration.

This is how the magic works. When bspm is installed as a system package, it provides a service that can talk to the system package manager for you. If enabled, when you call install.packages(), bspm tries to install your packages from the binary system repositories, along with their dependencies. If you are a normal desktop or server user, this goes through a D-Bus interface, whereas if you already have superuser permissions, it just takes the direct route. And if any package is not present in the binary repositories, install.packages() transparently continues the installation from source, from CRAN, as usual.

A quick example: RSPM on the left, bspm on the right; on both systems we try to install the units package in an R session. If I play this, you can see both systems calling install.packages(), and both are equally fast. But at the end, a system library is missing on the left side, whereas on the right side, with bspm, since we are pulling the packages from the system repositories, everything is set up and you can start using your package right away. Another example: what if we want to install the whole geospatial stack? The geospatial image in the Rocker project installs 25 packages (sf, sp, raster, and so on), which require 139 additional package dependencies and 69 system dependencies; the download is 300 megabytes, 800 megabytes installed. This takes just 68 seconds with this system. Finally, these are the links for the projects. I would like to thank all the people using and enjoying them, providing feedback, and reporting issues. Thank you very much. That's all I've got.
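For completeness, a minimal sketch of using bspm directly, outside the CoprManager branding; it assumes bspm is already installed as a system package (for example, via the Debian package mentioned above):

```r
bspm::enable()              # route install.packages() through the system package manager
install.packages("units")   # binary from the system repos, system deps included;
                            # anything missing there falls back to source from CRAN
bspm::disable()             # revert to plain source installation
```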
Yeah, thank you very much for this inspiring talk, and obviously also for the work. As there are not many questions at this point, I have a question myself. But first I would like to say, even though it may be difficult for people who are not deep into binary packages and build systems to understand how relevant this is: everybody and their grandmother is now taking their first steps with GitHub Actions, where they don't even have to install a Docker runtime or understand the architecture. Through GitHub's great documentation they start using GitHub Actions, which uses Docker under the hood, and they might just run install.packages() in some YAML file and then see the effect of this important work. So this is more to emphasize the importance of this work and to say thanks for it. And now there is a question coming in: I somehow inspired people, and Elio is asking whether this can be extended to take binaries from RSPM.

No, they are completely different systems. I don't know the exact setup RStudio uses to build all those packages, but I imagine, because that's how I would do it, that they run Docker instances of the various distributions, pull the dependencies, build everything there, and then put the final binary packages into the repositories. They are not compatible with what we do, because we are ultimately building a system repository: an RPM repository and an APT (deb) repository. We are not shipping just the R package but an RPM or deb file that contains all the logic to install it into the user's system and to pull in all the dependencies that are needed. So they are not interoperable in that sense.

If I understood correctly, does that mean you provide something that Windows, for example, doesn't have? If I look at the spatial community, they very often have dependencies at the system level, and on Windows it can be quite a pain to look these dependencies up and install them from all sorts of sources. So this is a pain you take away from people, and it could become, especially for the spatial community, a nice thing to follow. Is that correct? Not quite. For Windows, what CRAN does, and Jeroen can elaborate because he provides all that Windows stack with great pain, is ship Rtools with all the dependencies that the CRAN packages need. Of course, if a new package arrives on CRAN requiring an additional dependency, it must be incorporated into the Rtools bundle, and so on. Here, we can potentially rely on all the system libraries available in these distributions. Yeah, so it's more extensive than the bundle they produce for Windows. Exactly, because we are relying on the package management systems that are already there.

Okay, Dirk was weighing in on Elio's first question, and here's another question from Elio; I think there is time for it: what's the probability that trying to install a random package will need compilation? That was on my first slide: if you pick a package at random from the whole of CRAN, you have an 80% chance of requiring compilation, starting from a clean library. If you already have some packages installed, you might pick a package that doesn't require compilation directly and whose compiled dependencies are already on your system. With my setup, it depends on whether the package is available: the most-used packages are all there in cran2obs and cran2copr. If you hit a very rare or very recent package, you might hit a source installation, but on Fedora, I think I haven't hit a single source installation in the last year and a half.
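Out of curiosity, a rough sketch of how one could estimate that 80% figure from CRAN metadata; this is a reconstruction, not the speaker's code:

```r
# share of CRAN packages needing compilation, directly or via dependencies
db <- available.packages(repos = "https://cloud.r-project.org")
direct <- db[, "NeedsCompilation"] == "yes"
deps <- tools::package_dependencies(rownames(db), db, recursive = TRUE)
needs <- vapply(rownames(db), function(p) {
  isTRUE(direct[[p]]) ||
    any(direct[intersect(deps[[p]], rownames(db))], na.rm = TRUE)
}, logical(1))
mean(needs)   # roughly 0.8 at the time of the talk
```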
Oh, I'll jump on the bandwagon, because this is actually interesting for a use case that I have. We use Docker a bit for exotic setups, especially around reporting, TeX, and so on. What we don't do is the typical thing people often do on GitHub Actions, where they take an image from Rocker or somewhere else and install things on the fly, because it's just that fast. Instead we built a Docker image in our own registry, hosted inside our institution, and every once in a while something in our internal packages changes and we have to rebuild that image, which takes quite a while. So this would be a great toolchain to reduce the build time of that Docker image, right? Yes: keep the image as small as possible and install the binary versions as you need them when you run the CI or CD deployment. Yeah. All right.

Someone is asking, because I also think you didn't mention it: is it possible to make the slides available for this talk? Yeah, I'll put them in the chat and on Slack. That would be perfect. The Slack will stay up for a while after the conference, and I think this is a topic that is highly relevant but takes some people a moment to step back and digest, so good questions will keep coming up. That's also an advantage of this format: people can go back and ask something that didn't come to mind right away. And I see Dirk is in there answering things as well. Well, I think we can go on with the next talk; anyone interested in this one, please continue the discussion in our Slack community. Next up is reliably reproducible project packages. Alex, the stage is yours.

Awesome, thanks so much. Let me get my screen shared here. Okay, great. Thanks so much for being here, folks, I really appreciate being able to speak with all of you. I'm going to be talking today about reliably reproducible project packages, which I think ties right into the last two talks we've heard, so I'm super psyched to go here. In case you want to tweet at me for any reason, my Twitter is @alexkgold: if you like this, if you don't like it, whatever, tweet at me. And these slides, as well as slides from other talks I've given, are available at alexkgold.space/speaking.

Just a little bit about me: I am a manager on the solutions engineering team at RStudio. Before I came to RStudio I was a data scientist and a data science team lead. As a solutions engineer at RStudio I help our customers deploy, install, configure, and use our professional products, along with the open source tooling in both R and Python, which means I spend a lot of time talking with folks about how to design R and Python projects in ways that make them reproducible and reusable.

So let's talk a little about why we care about reproducibility. It is, of course, about sharing: we want to be able to share our projects with other people contemporaneously, and also with future us. We need to be able to share with the future, and future us does not answer the phone, so you have to be a little careful when sharing with them.
But I think reproducibility is a slippery topic; it's a little hard to know what we mean when we say it. To me, some synonyms are 'self-contained' and 'portable'. What I mean is that for an analysis to be reproducible, it should stand alone: you shouldn't need a lot of other things around it to make it run again. And portable means it can move across people, across machines, across time. That's what it means for an analysis to be reproducible, and I thought the first talk in this block did a great job covering some of the pieces: you make your code reproducible with version control, which is great, and you make your data reproducible, for which there are lots of approaches, such as using a database. As you can guess from the title of this talk, we're not going to focus much on those. We're really going to talk about the project level, the package level: how do you make your project's packages reproducible?

So let's talk about what happens when we use packages in a project. Real quick, in case you're interested: conceptually, everything I'm talking about here is true for Python as well. Some of the details are different, and the code you would actually type is quite different, but conceptually, at the level of the schematic of what's going on, it's all exactly the same.

Let's say I have a project, and inside that project I call library(A) to use some functions from package A. I have a library of R packages on my machine, and my R session goes into that library and finds A. Let's say, for the sake of argument, that A has a dependency, C, and both are version 1.0. I've got A and its dependency C 1.0 in my library; love it, it works great. Now when I call library(B), it goes and gets B, which let's say is also version 1.0. This works great. But sometimes it doesn't, so let's talk about how this falls apart. One thing to keep in mind is that by default you have one library on your machine for all of your projects. It changes per version of R, but in general it's just one library, so if I start mucking around with something over here, I can affect another project. One of the things we'll talk about is how to get rid of that complexity.

The first way that a shared library gets mucked up is when you try to share your project with another person. You did a great job: your code is under version control, you share it with them, you share the data. But their library looks totally different: they have version 0.8 of A, they have 0.7 of C, and they don't have B at all. It's a mess. And then they try to run your code without any preparation. What happens? They call library(A), it loads that version 0.8 of A, and maybe that works, maybe it doesn't; things changed between 0.8 and 1.0.
Then they call library(B), and that doesn't load at all, so the code just won't work without more steps. We've got to solve that problem somehow to reliably reproduce our projects. It happens even to just me, over time: let's say some time has passed, I'm working on other projects and not paying much attention to this one, and, as I mentioned, we only have one library. Maybe in that time my package versions have changed: A has gone to 1.2, C has gone to 1.2, and all of a sudden I can't get back to my library's previous state. When I call library(A), I load A 1.2 and C 1.2, and again, maybe that works, maybe it doesn't, and that's not a state I like to be in when I'm trying to run a project.

The last way this tends to be really fragile: let's say I'm coming back to a project after some time, and even if my library hasn't changed, maybe I'm doing some more work. Before, I didn't do any graphing or modeling in my project, and now I'm adding that, or any kind of extra package. Say I'm going to install package D. install.packages() will go to a repository, whether CRAN, Bioconductor, or RStudio Package Manager, get package D, and bring it into my library. But because time has passed, it gets the version of D from right now, not from when I was originally working on my project. Maybe that's D 1.2, and maybe D 1.2 has a dependency on C 1.2. Then when I go back and call library(A), maybe A is fine with C 1.2, or maybe it breaks. Again, this is just not a state I want to be in, so we want to work in a way that cannot end up in this state.

So what are the solutions? Number one, and this has been mentioned a couple of times already, is renv. renv is an R package that allows you to create an isolated project library. And number two is to use a snapshot repository from RStudio Package Manager. Let's talk real quick about how this solves the problem. Basically, renv makes a project-specific library: instead of having just my system library, which changes with every project, so that working over here messes things up over there, renv creates a project-specific library that only changes when I make changes inside this project. So when I call library(A) or library(B), it always goes to my project-specific library and gets the packages just for my project, untouched by other projects.

Step two: we need a way to share this across people and across time, and for that we use a lock file that stores the packages and their versions. We create the lock file, recording each package's version, and save it in the project so it can go into version control; it's a very small file, so it's not a huge thing to add. Then, going back to my friend's crazy library over here: if they run the renv::restore() command, renv does the same setup, an isolated project library, goes to the repository, and gets the right versions of the packages, specific to this project. That's great.
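A minimal sketch of that loop with renv, as described:

```r
renv::init()                # create an isolated project library and renv.lock
install.packages("dplyr")   # install into the project library, not the user library
renv::snapshot()            # record package versions in renv.lock; commit that file

# a collaborator (or future you) clones the repo and runs:
renv::restore()             # reinstall the exact versions recorded in renv.lock
```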
The last thing we're going to do, to avoid that package-D installation problem, is switch from a generic repository to a dated repository: the repository as of a particular date. Let's say I lock the repository to today. No matter when in the future I go back and run install.packages("D"), I'm going to get the packages as of today, which means D will be version 1.0, it will play nicely with C 1.0, and I don't have to worry about things breaking in the future.

So let's do a little live demo of what this actually looks like. Let me pull up RStudio Workbench; if you don't know, RStudio Workbench is a rebranding of RStudio Server Pro, but everything I show here is exactly the same in open source RStudio Desktop or Server. The first thing I want to show is one little bit of the magic under the hood: if I'm in R and I type .libPaths(), it gives me the paths to the libraries my session is using. Notice that I'm not in an R project and I'm not using renv, so .libPaths() shows two paths. The first is a path specific to my user, my user library, tied to the version of R (actually just the major.minor version of R, not the full three-component version), and then there's also a system library. So now if I type library(dplyr), that works: it just loaded dplyr, no problem. Easy. Awesome. The problem is that we're still subject to all of the issues I was just talking about.

So let's talk about how we might use renv. Here I am inside a project I've created called user2021, very creative name, I know. I'm going to run renv::init(), which brings me into an renv project. It did a couple of things: you can see I'm now inside this project, and it created a couple of directories, this renv folder, which is the actual library for my packages, and this renv.lock file. Now if I do library(dplyr), it fails. This is actually a good thing, because now I'm in an isolated, safe project environment where the only packages present are the ones I explicitly install, and things going on elsewhere on the server don't affect me here, which is really nice. The magic under the hood, of course, is that if I look at .libPaths(), the renv folder inside the project is set as the project's library, plus a temp directory that you don't really need to worry about.

Now, I do want dplyr, so I'm going to run install.packages("dplyr"), because I use dplyr in pretty much every project. And what you just saw is that it installed ridiculously quickly; binary installs are fast, but that was something else. The reason, as you can see here, is that renv maintains a cache across all of my projects: if you have the same package in multiple projects, renv links it in, so I don't have to worry about long install times, even at binary speeds, across different projects, which is really, really nice. So that's step one: I've got my isolated project library, safe from anything else going on in other projects, or from what other users on the server are doing, or anything like that.
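The demo, reconstructed as a sketch; the exact paths are hypothetical:

```r
.libPaths()
#> [1] "/home/alex/R/x86_64-pc-linux-gnu-library/4.1"   # user library
#> [2] "/opt/R/4.1.0/lib/R/library"                     # system library

renv::init()                 # switch to a project-specific library
.libPaths()
#> [1] "/home/alex/user2021/renv/library/R-4.1/x86_64-pc-linux-gnu"
#> [2] "/tmp/Rtmp.../renv-system-library"               # temp path, safe to ignore

install.packages("dplyr")    # near-instant: linked from renv's global cache
renv::paths$cache()          # where that cross-project cache lives
```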
Now let's look at the lock file, which is an important component here. Here's my lock file. It keeps a record of my R version, which is nice: I know which R version I used, and in fact, if I try to load this project with a different version of R, renv gives me a warning so I'm aware of it. It records the repositories I was using, and then it records the packages used in my project. One thing you'll notice is that it didn't add dplyr. That's because, by default, renv waits for me to explicitly snapshot the code in my repository, and right now there's no code in it. So let me add a library(dplyr) statement and save the file; we'll just call it code.R, really creative names here. Now if I run renv::snapshot(), it scans the code in my project for the standard ways of specifying packages, a library() statement or a double colon or whatever, records where they come from, asks whether it's okay to write them to the lock file (I'll say yes), and writes them there. So now I've got not just renv but all of these other packages recorded: the package name, the version of the package, and where it came from. It does this for CRAN, GitHub, Bioconductor, RStudio Package Manager, wherever you're installing from.

Now, if I want to share this with somebody else, I can share it via version control. The really nice thing renv does here, which is pretty smart, is that by default it does not share the actual libraries, that is, a ton of R package code. Instead it shares a couple of small settings files, but mainly this renv.lock file. You can see in the commit here that mostly what's going up is the renv.lock file, the same file I was just showing, and that's a very lightweight file, not all my R packages. If somebody else wants to get back to the same place I was, doing the same thing I was doing, they just run renv::restore(), and that brings them back to the state documented in the lock file. So that's part two: the lock file lets us share with other people, which is great.
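For illustration, an abridged renv.lock with the fields just described; the versions and repository URL are hypothetical:

```json
{
  "R": {
    "Version": "4.1.0",
    "Repositories": [
      { "Name": "CRAN", "URL": "https://packagemanager.rstudio.com/all/latest" }
    ]
  },
  "Packages": {
    "dplyr": {
      "Package": "dplyr",
      "Version": "1.0.7",
      "Source": "Repository",
      "Repository": "CRAN"
    }
  }
}
```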
Okay, so now part three: making sure those package installs stay consistent over time. I'm going to go to packagemanager.rstudio.com, the public version of RStudio Package Manager; there is also an on-prem version for folks who need to host it behind a firewall, but this one is public and free to use. In case you're curious, the 'all' and 'cran' repositories there are both just CRAN; we have two for historical reasons. There's also a Bioconductor repo if you're a Bioconductor person. You'll see that by default the URL points at 'latest', which means you always get the latest set of packages, and that often works fine. But one of the things I can do is go up here to the calendar and pick a dated version. The last time we pulled in CRAN packages was on Wednesday, so as of 8pm these are the packages that were available on CRAN. You can see that instead of 'latest' at the end of the URL, it's a frozen snapshot as of, say, July 7. Now, if I copy this URL, paste it into my lock file, and restart R real quick, you'll see that my repos option is now that dated repository, the one specified in my lock file. One other thing you'll notice, if you have eagle eyes and know a bit about this: usually with RStudio Package Manager you have to specify your Linux distribution in the URL if you want those Linux binaries we were talking about in the last talk. But renv is Package Manager-aware, so you don't need to: I pasted this in without the 'bionic' part, and renv added it because it identified the build of Linux I'm on.
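A sketch of pinning a dated snapshot by hand, outside of renv; the URL shape follows the Package Manager calendar shown above, with the date hypothetical:

```r
# freeze the repository to a calendar snapshot instead of "latest":
options(repos = c(CRAN = "https://packagemanager.rstudio.com/cran/2021-07-07"))
install.packages("dplyr")   # installs the version as it existed on that date
# inside an renv project, the same URL goes into the Repositories
# field of renv.lock, as in the demo
```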
Okay, great, that's what I wanted to show off. These are the big takeaways: use renv. It's a great tool that will make your life easier and your projects more reproducible. If you're in Python, you can accomplish basically the same thing with virtualenv or pipenv, the two recommendations I tend to make. And use a snapshotted repo; in terms of when to snapshot, I tend to do it when I'm reaching a pause point in my project. If you want these slides, they're at alexkgold.space/speaking. So thanks so much, everybody. I don't know if there are questions, but...

Yeah, thanks Alex, that was really inspiring and really a lot of information to digest. And indeed there are a lot of questions; I'm not sure we can cover them all, so let me get to them quickly, and it would be great if you could also look them up on the Slack, because there really are quite a few. Jeremy wants to know: is it wise to put the renv.lock file onto GitHub under version control? He tried to put the renv folder on GitHub but failed because there was too much to commit. I think you kind of touched on this already, but can you elaborate on what might go wrong here? Yeah. The thing that really needs to go into Git is the lock file. You can put the folder in if you want, but I definitely would not recommend putting in more than the default: by default a .gitignore is created in there that ignores all the actual package files, and you should stick with that. If you share it with somebody else, say I did this on a Bionic Linux server and you share it with somebody running on a Mac, then if they have the lock file, they can restore the packages, no problem. If you try to share the actual package files, things get real weird. So I wouldn't recommend that.

We've got another one, a set of questions that I think are quick to answer, because time is running out. Where is the renv cache stored? And related to that, does it also cache custom repositories, for example one hosted with miniCRAN? Where does that stuff go? Yeah, great question. The cache is stored at the user level, by default in the user's home directory, and you can redirect it to other places if you want to, for some reason. And renv will pick up anything that is listed in your repos option, so if you have a miniCRAN repository in your repos option, it will pick that up by default. renv sets its own repositories to whatever the repos option is at the time the renv library is initialized, but you can obviously change that after the fact with renv::modify(). Yeah.

So I think it's time to move on, actually. I think the most important part of this talk, and I see that from the reactions on Slack, is pointing people to renv, because there are still a lot of people who don't know it. It would be great if you could connect and help others find their way to renv and get started, because that would help them a lot. Thanks for the talk.