Okay, so yeah, today I will be talking a lot about Shiny applications. I personally am a big fan, and I guess you also find them useful in your projects and throughout your career. I have worked for several pharmaceutical companies and built Shiny applications for them. It is really nice to play around with your scripts and analyses in the form of interactive Shiny applications. But playing with a Shiny application and seeing your analysis interactively is one thing; a much more complicated matter is making it production-ready, a productionalized Shiny application. For that, you need to introduce a lot of different components and take care of many different areas of how you build Shiny applications. Today I would like to present setting up a reproducible R/Shiny project environment. We will focus on how to make a production-ready application, how to start, and what is required to make sure it will work everywhere, every time, so that you can actually trust the results. My name is Marcin Dubel, I work at Appsilon, and building reproducible, production-ready Shiny applications is what we do every day.

Coming back to our map of the areas of Shiny applications, we will pick out a few of those that we want to take care of today. The others are also super important, but we will focus on making sure that our application is a reproducible production application. We will start simple, with code sharing. I bet everyone is already there: you are not sharing your code as zip folders or versioning by file naming, and you are all familiar with the Git-based repository tools like GitHub, Bitbucket, or GitLab. This is kind of the 101 of making sure your application is reproducible: have your code in a repository.
However, you can ask yourself whether you are really making use of those tools. I see this a lot: okay, we have Git, but we are putting everything into the main branch without checking what is there, without splitting our work into branches, without meaningful commits, and without doing reviews. The basis of making sure your application will be reproducible and production-ready is to make sure that the process of building it is actually reliable.

But okay, that is the very basics. Let's move on to what we should keep outside the repository. Definitely all the data files; for pharmaceutical companies and for bioinformatics projects this is quite typical, if only because the data is so huge that it is impossible to store it in the repository, so usually there are databases or file-sharing services for that. Also, and this is not always so obvious, we cannot store credentials in the repository. For local development we usually use the .Renviron file; this is the file where you set your environment variables, for example the credentials for the database. Or, if you are using RStudio Connect, now Posit Connect, which huge organizations often have a license for, it has a really nice feature for storing your credentials. So that is the obvious, security-driven reason why we don't want to keep these in the repository; we want them to be secure.

Okay, speaking about reproducibility, we need to talk about different environments. By that I mean we would like to make sure that whatever I am building on my machine will work on my machine in the future, will work on other machines, and will work on the server when I deploy it.
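As a small sketch of the local-development setup described above: the variable names (`DB_HOST`, `DB_USER`, `DB_PASSWORD`) are hypothetical placeholders, not a required convention.

```r
# .Renviron (kept out of the repository, e.g. listed in .gitignore):
#
#   DB_HOST=db.internal.example.com
#   DB_USER=shiny_app
#   DB_PASSWORD=do-not-commit-me
#
# R reads .Renviron on startup; the application then pulls the
# credentials from the environment instead of from the source code:
db_host <- Sys.getenv("DB_HOST")
db_user <- Sys.getenv("DB_USER")
db_password <- Sys.getenv("DB_PASSWORD")
```

This way the same code runs everywhere, and only the environment supplies the secrets.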
Okay, so the first, again a baby step: we don't want any local, non-relative paths to something on my computer, so that when I move my project around, it still works. Always start by using R projects and use paths relative to the project root. That will ensure that when I move the project around, it still works. However, I want to make sure it works not only when I move it around my computer; it should also work in time, when there are different versions of packages, and when I move it to my teammate's computer. For that, and I am pretty sure you have already heard this because it has been a game changer for Shiny applications for a few years now, you should be familiar with the renv package. It allows you to separate your project environment, and here I mean the versions of the packages you are using, from the rest of your R setup, and it allows you to share this state by committing it to the repository, so that your teammates can clone it and restore the environment with the same package versions. Setup is super simple and restoring the environment is super simple. That allows you to move in time on your own machine, so you can come back after months or years to the same project, restore the package versions that were used, rerun it, and expect the same results. Likewise, your teammate can take your project, restore the environment, and expect the same results. However, this only deals with the versions of our packages, and there can be other dependencies we are using: system dependencies, or the R version itself, which renv does not solve. There is a warning that a different R version was used, but it does not solve that problem.
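The renv workflow described above boils down to three calls; a minimal sketch:

```r
# Run once in a new project: creates a project-local package library
# and the infrastructure renv needs.
renv::init()

# After installing or updating packages, record the exact versions
# in renv.lock and commit that file to the repository.
renv::snapshot()

# A teammate (or future you) clones the project and recreates the
# same package versions from renv.lock.
renv::restore()
```

Note that renv records the R version in the lock file and warns on a mismatch, but it will not install a different R for you; that is where Docker comes in.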
For this, Docker images and containers are a really nice solution: you can share the exact same environment, with all the system dependencies, the R version, and the packages. Usually you use them combined: you build your Docker image, and one step in building the container is to restore the environment based on what is in the renv.lock file. This file, the instruction specifying which package versions to restore, is called the lock file. Nowadays, with Connect and the other RStudio/Posit products, things are often much simpler, because there is a server already predefined: the Workbench and Connect servers are the same for all users, so you don't need to care about containers. You do still need to keep the renv lock file for your project, but all the system dependencies should be taken care of.

Okay, so that is the environment. When we talk about environments, you have probably heard a lot about needing a production environment and a development environment, and that we would like them to be the same, so that whatever we build on the development environment works the same when we deploy it to production. And yes, this is crucial, and it should be solved with all the tools I presented on the previous slides. But I would also like to talk a little bit about configuration. What I see in a mature, reproducible production application is how it behaves in non-production environments, how it is configured. I usually find it useful to have five different setups for the application and to store in a config file how it should behave in each. Of course we want the environments to be the same in terms of package versions, system dependencies, and R version, but we usually want them to behave differently, for example to use different data.
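Combining Docker with renv, as described above, might look like the following Dockerfile sketch; the base image tag, system libraries, and port are assumptions to adapt for your own project.

```dockerfile
# A minimal sketch of a Shiny image that restores the renv environment.
FROM rocker/r-ver:4.3.2

# System dependencies baked into the image (an example set).
RUN apt-get update && apt-get install -y --no-install-recommends \
    libcurl4-openssl-dev libssl-dev && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Restore the exact package versions recorded in renv.lock.
COPY renv.lock renv.lock
RUN R -e "install.packages('renv'); renv::restore()"

COPY . .
EXPOSE 3838
CMD ["R", "-e", "shiny::runApp('/app', host = '0.0.0.0', port = 3838)"]
```

Copying `renv.lock` before the rest of the sources lets Docker cache the package-restore layer, so rebuilding after a code-only change is fast.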
So there is a production database with real data, which we may not only be reading from but also writing to, and it definitely shouldn't be the same data that we use for tests or development; that can be specified in the config. There are also sometimes features that should differ between environments. For Shiny projects for pharmaceutical companies, we usually added a big warning in non-production environments: this is only a test, don't rely on the data you see here. We really don't want decisions to be made based on the fake data we use for development, and if someone accidentally ends up on the wrong instance, we want a big banner displayed for that environment.

Okay, so these are the five environments we find useful. Of course, production: here users, stakeholders, and developers, all the parties involved, consider this a working application, and we all believe it is stable, reproducible, and something we can rely on. There is usually also a test deployment, where stakeholders and developers think it works, but users are about to test it. For very mature production applications we put the release candidates on test, and once a release candidate passes all the tests, automated tests and user tests, we can merge it into production. There is also a development environment, where developers think it works as specified in the tasks, but now whoever manages the application, some business units or subject-matter experts, test it before it is shared with the users. And there is also a sandbox environment, which we found super useful: you can deploy there just to check particular features, just to check how things behave on a deployed server. It should be the same as development, but it is always good to test.
And what I usually suggest including as well is a config for offline mode, for when there is no database connection, so that you can keep working without being blocked because the database is down; this is also useful for other purposes. Let me explain a little why these configs are super useful. As I said, you usually want the application to be able to point at different data, which allows you to separate your application from your data layer and test them independently. It also usually speeds up development. My experience from projects in pharmaceutical companies and bioinformatics projects is that performance is not the main goal: we want the results to be good, to be reliable, and we can wait for them a while; they don't need to be blazingly fast. But for development you can use a smaller data set, just some fake results, to make building these applications much faster, and you can modify how the app behaves based on the config. You can of course test your new features carefully on the dev or sandbox environments, and your automated tests can be faster: in the same way, you can just plug in the fake data, make sure everything works, and test the data separately. Here we can focus on the application itself. Yes, we can have a reproducible app state on the mock data. If we are talking about very dynamic data sets that may change a lot, it would be difficult to set up unit tests on top of them, or front-end tests that display comparable results every time, and if something is wrong, we won't know whether the new data is broken or the application has stopped working. If, on some config like dev, test, sandbox, or offline, we always use the same mock data, we will be able to test just the features of the application, and we will be able to reproduce the results every time.
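With the config package, the per-environment behaviour described above is usually just a YAML file; the keys, flags, and database names below are hypothetical examples, not the setup from the talk.

```yaml
# config.yml -- hypothetical keys; non-default sections inherit from default
default:
  show_test_banner: true
  use_mock_data: false
  db_name: "app_dev"

test:
  db_name: "app_test"

production:
  show_test_banner: false
  db_name: "app_prod"

offline:
  use_mock_data: true
  db_name: null
```

In the app, `config::get()` reads the section named by the `R_CONFIG_ACTIVE` environment variable (falling back to `default`), so the deployment environment, not the code, decides which setup is active.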
And usually, in such projects, the data source is something you don't want to expose. Having the offline config ready, for example, lets you just share your application, present it at a conference to new stakeholders or new developers; or maybe the developers themselves shouldn't be allowed to see the actual production data and can only work on the mock data. That is really helpful. And what I mean by having different configs is usually as easy as having a YAML file that specifies: for this config, use these views from this database with these credentials, and for the other configs, switch between different view names and credentials, whatever you specify. You can also inherit between the different configs, which makes this super efficient.

Okay, sorry, I see I am running short on time. Two more things: CI and tests. My recommendation is to start each and every project with the CI and test structure. I won't go into detail about how to actually build your tests, you can do that later, but use some automated CI tool like GitHub Actions, use templates so you repeat the same setup every time, make sure the build is green, and later add your tests incrementally. Also, since we discussed separating testing your application from testing your data and your logic: here I can recommend the targets package, the successor to the drake package, for building your logic as a pipeline and seeing how it looks, and also the data.validator package, with which you can automate testing your data. So for example, if there is a dynamic database, before uploading new data to the production server you can run tests on the data side, test it separately, and only if the tests pass is the data sent to your production application to be used.
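The data-validation gate just described (check the incoming data, promote it only when the checks pass) might be sketched as follows. This assumes the data.validator and assertr packages; the file paths, column names, rules, and the structure of the results table are hypothetical illustrations.

```r
library(data.validator)  # Appsilon's data-testing package
library(assertr)         # supplies predicates such as within_bounds()

report <- data_validation_report()

# Hypothetical incoming data set awaiting promotion to production.
new_data <- read.csv("incoming/results.csv")

validate(new_data, name = "Incoming results") |>
  validate_if(!is.na(patient_id), description = "patient_id is present") |>
  validate_cols(within_bounds(0, 1), response_rate,
                description = "response_rate is a proportion") |>
  add_results(report)

# Promote the data only if every rule passed; otherwise keep the old
# data and produce an HTML report to inspect what failed.
# (The "type" column of the results table is an assumption here.)
if (all(get_results(report)$type == "success")) {
  file.copy("incoming/results.csv", "production/results.csv",
            overwrite = TRUE)
} else {
  save_report(report)
}
```

The same gate can be run under different config files, so the dev and test environments validate their own mock data the same way production validates the real data.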
If something is wrong, you can delay this process and send an email notification to check the data, in case something is wrong there, and you can also apply this to the different config files.

Okay, so we have our Shiny application, and we have all those parts covered. Some of them we discussed today; others you can learn about from different materials, talks from Appsilon, or maybe at other conferences. With this, you should be able to start building a reproducible production application. I think our last takeaway note is that you should have this in mind when you start: have the tools ready from the very beginning, because later it will be difficult to introduce all of these concepts. Thank you, and you can reach out to me if you have any questions. Thank you for today.

Thank you. Can everybody hear me? I can hear you. Perfect, thank you. I was having so many technical issues, so I appreciate everyone's patience. Thank you so much for your talk. I didn't see any questions in the chat, but I was having technical issues, so I'm not sure if there were any; I didn't see any in the Q&A either. Beth, I'm not sure if you noticed any questions for our speaker. We have about one minute before the next talk. Nope, I didn't see anything. All right, thank you so much. Next we're going to bring up a recorded talk presented by Jacqueline Gonniff of RStudio; I don't believe she was going to be available for questions. Thank you.