Okay, good evening. Can I get a thumbs up that you can hear me? Okay, great. Thank you. Hi, my name's Russ Hyde. I'm a data scientist at Jumping Rivers, and my colleague Keith Newman will be answering questions in the online chat. The topic of today's talk is Shiny applications: how you make them more efficient, how you make them more enjoyable to work with as a developer, and how you can make them a little bit nicer for your users. We're going to talk about streamlining, which is about making your app more efficient, and about automation, which is about removing those tasks that are boring or repetitive for you as a developer. Jumping Rivers, where we work, is a data science consultancy in the UK. We work on the whole stack of data science projects, machine learning, Shiny applications and so on, and also on the data engineering and infrastructure side.

If there's a phrase that encapsulates the ideas behind this talk, it's that data doesn't stand still, or even sit still. Over the past two years, we've been working with an application that was originally developed by Dean Attali, together with Roberto Pastore's group at the World Health Organization. This is a Shiny dashboard for presenting COVID-19 vaccination counts. Within the application, you can see things like vaccine uptake in specific countries or specific age groups, or even in subpopulations such as the health workers of a given country.

As you can probably imagine, the data upon which this app is based is a moving target. From week to week, new vaccines are being administered across the world, and that data gradually feeds into the app. The data in this particular application, the counts of vaccine doses, originally came from three main databases, of which TESSy was the main one the WHO was involved with.

This image illustrates how a data source might evolve over time. The simplest change is that, on a week-by-week or day-by-day basis, you get new releases of data, and an application that aims to present that data wants to stay in sync with the most recent version. If you're integrating multiple datasets, there may be a mismatch between when one dataset gets updated and when another does, and your app has to be able to account for that. Similarly, the actual structure of the data can change: new columns may be added as time goes on, or the encoding of specific columns may change. And finally, something we're dealing with at the moment: as time goes on, you might find that you come to prefer working with one dataset over another, so your preferences change too.

This is a typical Shiny app, and it's the structure of a Shiny app that would always be in sync with that data. You have some raw data, which for a project like this would be stored online somewhere. You import it into the R process where your application is running, do a bit of processing to generate some processed data, and then present that within your dashboard.
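As a rough sketch of that naive structure, where every session downloads and processes the raw data itself (the URL and column names here are invented for illustration, they are not the WHO project's real ones):

    library(shiny)
    library(readr)
    library(dplyr)

    # Hypothetical raw-data location and columns, for illustration only.
    raw_data_url <- "https://example.org/vaccination-counts.csv"

    ui <- fluidPage(
      titlePanel("Vaccine uptake"),
      tableOutput("uptake")
    )

    server <- function(input, output, session) {
      # Every session repeats the (expensive) download ...
      raw_data <- reactive({
        read_csv(raw_data_url)
      })

      # ... and the (expensive) processing.
      processed_data <- reactive({
        raw_data() |>
          group_by(country, age_group) |>
          summarise(doses = sum(doses_administered), .groups = "drop")
      })

      output$uptake <- renderTable(processed_data())
    }

    shinyApp(ui, server)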
That would always be in sync, because whatever the raw data is at the time a user starts their session, that's the data that gets processed and presented. The only problem is that transferring data is quite expensive, it can take a long time, and processing data can certainly take a long time, depending on the source and what you need to do with it. So if you end up with multiple users all working with the app at the same time, you end up with multiple sessions downloading and processing the same data, doing the same tasks in a way that's quite inefficient. There are ways to solve that problem.

Now, data is slow in a lot of different ways: the transfer of data from one place to another can be slow, and the processing of data can be slow. There are lots of places in your source code where you could improve the efficiency of your application by rewriting a function here or there. But sometimes it's better to do a little bit of measurement first, a bit of profiling of your code: if you need to speed up an application, spend some time getting information about how fast it actually starts up. There's a tool called Google Lighthouse, which runs within your browser and gives you a report on how fast your application starts, how quickly the images appear on the screen, and so on. And there's a tool in R called profvis, which does a visual profiling of your source code. If you run a Shiny app inside a call to the profvis function, then when you close the app you get a graph that illustrates how much time the app spent running each function, and you can drill into further detail of how long a given function ran for and how much memory it used. It's important to measure, because there's no point optimising code unless it's going to bring an appreciable improvement to your actual project.
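A minimal sketch of that profvis usage, assuming your app's source lives in a directory called app/:

    library(profvis)
    library(shiny)

    # Launch the app under the profiler; interact with it in the browser,
    # then stop the app. profvis then opens a flame graph showing how long
    # each function ran and how much memory it allocated.
    profvis({
      runApp("app")
    })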
Those are changes you could make to the source code, but there are also changes that don't involve getting deep into the weeds of rewriting functions. An incremental improvement over that initial app I showed is this: in the initial app, the processing of the data was going on in every user session, but you don't necessarily need to do that. You could do the processing in one user session and store the processed data; then, when any other users come in, their sessions download the processed data and skip the processing step. You have to do some work to ensure that the processed data stays in sync with the raw data, so that if the raw data becomes newer, the processing pipeline runs again. It's a slight improvement, but what it means is that for some users your app might run really slowly, while for most users it'll run quite fast.

Another issue with data is that transfer is quite slow. Here are some illustrative figures I measured on my own machine a couple of weeks ago: the speed at which data can move from the hard drive to the processor is about three gigabytes per second, whereas network transfer, going by a typical internet speed test, is about a thousand-fold slower than that.

So, following on from that: if you can minimise the amount of processing that you do, and minimise the amount of data that you need to transfer across the wires, you will speed up your application. Another incremental improvement is to transfer only the data that the user actually needs. In the previous figure, the raw data was being downloaded in full for every user's session, and in some cases it was getting processed and in others it wasn't. You don't necessarily need all that raw data if you're downloading processed data as well. There may be a partition of the raw data that's only used to generate the processed data, and if you can identify it, users don't need to download it every time they connect to your app; they can use the processed data instead. That was another slight improvement we gained, but it's still not particularly brilliant, because some users will still have a poor experience: it takes a long time to run the data processing pipeline, even though most users won't see that.

The final change you can introduce is simply to take the processing outside of the Shiny app. Rather than downloading the data you need and running your data processing inside the app, you move it to something like a scheduled task. On a daily basis you run your data processing against whatever the most recent version of the data is, generate a processed dataset that's uploaded to a data source, and then whenever a user interacts with your app, that data gets downloaded and used. Now there's no data processing going on in the application whatsoever.

The problem with doing this kind of thing is that you get an integration problem. You've now got data processing going on elsewhere; for us, that was on GitHub Actions, which I'll mention in a couple of slides' time. These processes run in a data centre somewhere, rather than in the running application, so you have to manage that additional complexity in the architecture of the app. And the way you can do that is by using automation tools.

So, automation. I'm going to talk about GitHub Actions. These are little scripts that run in your source code repository, and they're quite often used during development cycles. If you write a bit of new code and make a pull request, if that's how you work, you can have a little workflow that runs on GitHub to check that the style of your code is okay and to run any automated tests. Even for a Shiny application, you can deploy automatically from GitHub, and I'll show you an example of a workflow for that in a minute. We used exactly the same process here: we defined a workflow script that runs on a specified schedule, takes the data, does a bit of processing, and then stores the result so that the app can download it and run as efficiently as possible. There are other ways of running automated tasks; we chose GitHub Actions simply because it was simpler, because the actions run right next to the code.
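As a rough sketch of what such a scheduled job might run (the URL, file paths and column names are illustrative, not the WHO project's real ones):

    # process_data.R -- run on a schedule (e.g. a GitHub Actions cron job),
    # not inside the Shiny app itself.
    library(readr)
    library(dplyr)

    raw <- read_csv("https://example.org/vaccination-counts.csv")

    processed <- raw |>
      group_by(country, age_group) |>
      summarise(doses = sum(doses_administered), .groups = "drop")

    # Publish the much smaller processed file somewhere the app can fetch it
    # cheaply; the app then does no processing at all.
    write_csv(processed, "processed/vaccination-summary.csv")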
Within these workflows you specify when the workflow should run: should it run whenever you add new code, should it run on a daily basis, should it be possible to manually set off a new run; and you specify the tasks themselves.

Deployment is one of those tasks that can be difficult. If you don't have an automated way of doing it, it can lead to inconsistencies: if I deployed from RStudio on my own machine, I might get slightly different packages installed than you would if you deployed from your machine, even if we were deploying to the same account, and that can lead to inconsistencies in the deployment. When you're deploying Shiny apps, there is a way to do it in an automated fashion from GitHub. You need to get some account tokens: if you're working with shinyapps.io, like the app we were working with here, you can get your shinyapps.io username, a token, and a secret value. But these are secret values; you don't want anyone to be able to see them in your source code, so you need to store them in GitHub securely. GitHub provides a way of storing secret values that are only available when these workflows run; for this application, we added a username, a token, and a secret.

A typical workflow script looks like this. You define when the script should run; this one runs whenever new code is added to the main branch of the repository. Then you specify a few steps. You don't really need to know the details, but basically you pull in the source code, install R, and install any packages that your app depends upon. In the second half, we do the actual deployment, which is just two function calls, written much like you'd write them in an R script: a function from the rsconnect package that sets your account info, and a second function that deploys a given app name to the shinyapps.io server (a sketch of those two calls is shown below). Variables like secrets.SHINYAPPS_NAME pull in the secrets that you've stored in GitHub. That's a way of automating the process of deploying an application, which simplifies your life: if you're making a lot of changes to a project, it can be easy to forget certain steps, like making sure the tests run before you deploy, unless you've got that stuff automated.
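A minimal sketch of those two rsconnect calls, assuming the GitHub secrets have been exposed to the job as environment variables (the variable names and app name here are illustrative):

    # deploy.R -- run by the workflow after R and the app's package
    # dependencies have been installed.
    rsconnect::setAccountInfo(
      name   = Sys.getenv("SHINYAPPS_NAME"),
      token  = Sys.getenv("SHINYAPPS_TOKEN"),
      secret = Sys.getenv("SHINYAPPS_SECRET")
    )

    rsconnect::deployApp(
      appDir      = ".",
      appName     = "covid19-vaccination-dashboard",  # hypothetical name
      forceUpdate = TRUE
    )

setAccountInfo() registers the account with the R session, and deployApp() then bundles the app directory and pushes it to the server.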
In summary, we've been working with this COVID-19 app, and it has gone through a series of evolutionary changes to improve its user-friendliness and the speed of its data processing, because people are making decisions based on this application, so it has to be usable. It's best to simplify your app as much as possible: if you're writing a data-presentation app, try to do as little processing as possible, and pull as little data as possible into the app, because both of those tasks can be really slow. There are tools for analysing your code: profvis, and Lighthouse if you want to gauge over time whether the changes you're making are improving the start-up time of your application.

And then, to simplify your life, you can automate the boring and repetitive stuff using these GitHub Actions. There's a repository, maintained I think by the RStudio team, that has multiple actions for things like linting and testing. One problem with this is that if you move processing and other work out of your app, the architecture of your whole project may become more complicated, and it becomes a little bit more of an issue to keep it all in your brain.

Anyway, that's my talk. I'd like to thank the WHO, and particularly Roberto Pastore, for letting us work with this application over the past two years. And if you want to know a little bit more about the topics we've talked about here, there are a couple of blog posts available on our website. Thank you for listening. I'll take any questions if there are any.

Thanks a lot, Russ, that was a great talk. There's one question in the chat, from Eric: have you explored other alternatives to GitHub Actions for these workflows? He's seen that GitLab offers a similar service, but he's curious whether there are any others you've tried.

Yeah, I have used a lot of different services for automation. Our firm typically uses GitLab for almost all of our internal work, and though the tools they provide are slightly different to GitHub's, they're perfectly usable. There are other alternatives outside of GitHub, like CircleCI, and you can also use things like Posit Connect or cloud services such as Azure for running processing on a schedule. So there are loads of tools out there for doing this stuff.

Awesome, thanks. I don't see any more questions in the chat, so we're still a little bit ahead of schedule, but thank you so much, Russ, that was an awesome presentation.

No problem, it was my pleasure.

I think that profvis tip was definitely a nugget. I make a lot of Shiny apps and I haven't done that yet, so I might be scared to see what my code is actually doing. Check that out.