Hello everyone, my name is Andreas Oteriades and I'm a data scientist in the NHS, the National Health Service in England, in the United Kingdom. I'm employed by Nottinghamshire Healthcare NHS Foundation Trust. I'm extremely excited to be here with you today, and very keen to share my experience with scaling up Shiny and text mining applications for big decision making in the National Health Service. Without further ado, let's kick off. So why do we do what we do? Well, it's a very simple problem: simple in nature, but quite complicated to solve. We get thousands and thousands of records of feedback from patients, so we need to find fast, efficient and clever ways of summarizing what it's all about: what makes people angry or sad or happy, what they're mostly concerned with, and how we can surface the key information first and then translate it into something that decision makers can use to prioritize improvements in the health service. Part of this also has to do with labelling. We want to know what people talk about, so we read the whole dataset, all the records of patient feedback, and we label it. This is something that not all trusts do, and whether you do it or not, it's a massively time-consuming process. So we want to introduce the clever stuff, like text classification, to at least semi-automate the process and speed things up.
At the same time, we want to take this general text mining approach in our work for exactly the reason I mentioned before: to surface sentiments, the most commonly mentioned topics and everything else that is hiding in the text, and to illuminate it, using bar plots, network diagrams, word clouds or whatever else we can use, in order to translate those complex findings into something that the non-specialist, the non-data-science manager, can use to improve services within the NHS. And quite importantly, I need to mention that we're doing this not just for our trust, but for the whole of the NHS and beyond. So we're developing a free, open-source-licensed solution. The point about licensing is pretty important, because open source software gives users the freedom to run, copy, distribute, study, change and improve it. That said, just putting the code on GitHub without a license does not make it open source, and it does not give users the right to modify or even run your code. The copyright holder is usually the person who wrote the code or the organization they work for. So it's pretty important to release whatever tool you want to release with the appropriate license, so that everybody can use it, or make improvements and give them back to the community. Also, because we're not doing this just for our trust, it's very important to mention that we have user groups that we work with, and test and rollout trusts who try our solution and tell us "we like this, we don't like that, we would like the other". We get this extremely helpful feedback, which helps us make something that people will actually use in practice. Time to talk about scaling up, and let's begin the conversation with golem. So how does a Shiny app look before golem, or if you don't use golem?
So let's say that a more professional, bigger and more complicated Shiny app would look enormous: an enormous server file with thousands of lines of code, including everything in there, from data calculation and manipulation and the creation of new variables, to passing this into ggplots, and then, further down the line, your reactive stuff. So you would actually be mixing the more static elements of manipulating datasets with the reactive elements that are dictated by what the user wants, for example a user who would like to filter a table down to a number of particular lines. This makes Shiny extremely difficult to debug, because you cannot really follow what's going on when you're mixing reactive and non-reactive elements. The only way to test what's going on is to take the whole thing outside the reactive context; while all of it is in there, it's difficult to detect where the problems are. For the same reason it's also very difficult to test: how do you put tests in your code to make sure that, for example, the user is inputting the right format of variable, and things like that? And how can you collaborate when you have this massive tailor-made script that probably makes no sense whatsoever to anybody else? It probably wouldn't even make sense to you yourself if you left it for a few weeks and then went back to keep developing it, so you cannot even collaborate with yourself in this sense. It's also very difficult to deploy: there are so many different ways of deploying, and I don't think Shiny on its own is particularly strong at making deployment easy for you. And finally, you cannot really use it as a template, as a framework where you can pass in different datasets and produce the same thing for a very different dataset. I will come back to this later.
So a Shiny application ideally should be modular. Now, modules are not unique to golem (and now I will actually start talking about golem more), but the thing with modules is that golem makes it very easy to use them. It builds the module structure for you, it names the modules automatically, and the names are consistent with the R file names that contain them. There are in-code instructions for what to put where: this module goes here in the server, the UI part of it goes there in the UI. So the framework itself has instructions for what to put where. And I would say it saves you from the trouble of breaking the whole thing because you've mistakenly added a line space in the skeleton, which Shiny is notorious for: a teeny tiny blank line can break the whole thing, and you can spend hours and hours trying to fix it, which is just extremely frustrating. So the thing with golem is that one can write consistent code from day one, because of this nice way that the modules are structured. Now, I think that a Shiny app should also be strict as to where the business logic is or isn't. It shouldn't be where the reactive elements are; it should be independent, say in a utility script with your tailor-made functions, or in another package that has functions that do what you want to do. So you call the functions, you do your calculations, and then you pass the results into your reactive environment. The Shiny app should also be documented for that reason, but in terms of the functions, not the app itself. The apps themselves are pretty much self-explanatory (they are apps, they're interactive), but I'm talking about the functions.
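To give you an idea, here is a rough sketch of the module skeleton golem sets up for you; the module name and the rendered content are illustrative, but the naming convention and the UI/server pairing follow the golem pattern:

```r
# golem::add_module(name = "summary_table") creates R/mod_summary_table.R
# with a UI/server pair named consistently with the file that contains it.

mod_summary_table_ui <- function(id) {
  ns <- shiny::NS(id)
  shiny::tagList(
    shiny::tableOutput(ns("table"))
  )
}

mod_summary_table_server <- function(id, data) {
  shiny::moduleServer(id, function(input, output, session) {
    output$table <- shiny::renderTable(head(data))
  })
}

# The generated file also carries in-code instructions telling you where
# each piece goes: mod_summary_table_ui("summary_table_1") in app_ui.R,
# mod_summary_table_server("summary_table_1", data) in app_server.R.
```

Because the namespacing is handled by `shiny::NS()` inside the module, the same module can be dropped into the app several times without its inputs and outputs colliding.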
When your business logic is nicely structured as a family of functions, you can also document it very nicely, and you can test those functions, with testthat for example. It's much easier to debug anyway, because when an error occurs in your Shiny application, you know where it occurs, you know which function is responsible for it, and you can go back and fix it. It also makes your work more shareable, because everything is organized in such a nice way that it's easy, or at least much easier, for people to catch up with your work and understand what you're up to. And ideally the app should be agnostic to deployment, which is something that golem is extremely strong at. It has built-in functions for deploying on RStudio Connect, shinyapps.io, Shiny Server and Docker, but it also uses a YAML file as a control panel where, in our case, you can deploy in different locations on the server for different trusts. To give you a very simple example, consider that you want to create summary statistics tables for three different datasets: the iris dataset, the penguins dataset and mtcars. For iris, you want to summarize, say, sepal length for each species; for penguins, flipper length for each species of penguin; and for mtcars, you want to summarize the weight for different numbers of gears. So in your YAML you define a grouping variable and the variable you're going to run the statistic on, and then you have a nice framework, a golem that can work with all three datasets. We were saying before that we would like to make the solution available to everybody, to all trusts; well, that's where the YAML is very strong, because we can use different datasets to do the same thing.
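A minimal sketch of the idea, assuming a configuration like the one a YAML control panel would hold (the keys and the helper function here are illustrative, not our actual configuration):

```r
# What the YAML would contain, shown as an R list: for each dataset,
# a grouping variable and the variable to run the statistic on.
config <- list(
  iris   = list(group = "Species", value = "Sepal.Length"),
  mtcars = list(group = "gear",    value = "wt")
)

# One generic function then serves every dataset the same way.
summary_table <- function(data, group, value) {
  aggregate(data[[value]], by = list(data[[group]]), FUN = mean)
}

# The same framework works for very different datasets:
summary_table(iris,   config$iris$group,   config$iris$value)
summary_table(mtcars, config$mtcars$group, config$mtcars$value)
```

In a real golem app the list would come from the config file (for example via `yaml::read_yaml()`), so switching trusts or datasets means editing the YAML, not the code.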
And we can also use these advantages of golem when it comes to deployment, to deploy in different locations on the server for different trusts at the same time, so we can host it for them. Well, I've spoken about golem already, but just to give you the formal definition from its website: it's an opinionated framework for building production-grade Shiny applications. What's more important here, and that's why I have these three lines of code for you, is that all golem Shiny applications are R packages, which obviously makes it easy to test, manage dependencies and deploy, and you can just download the package. In this case, this is our dashboard, the one where we do sentiment analysis and text classification for our trusts. You download it, you install it, and then you run the app, and that's it: beautifully, it shows up on your computer, and it runs with whatever dataset you want to feed it. Okay, enough with golem for today. I think I was pretty enthusiastic about it, and I'm equally enthusiastic about reticulate. Just to give you a brief background of why reticulate is important and why we're so excited about it in this team: in the NHS, much of our stuff is R-oriented. But when it comes to machine learning, there are demonstrably a lot of advantages to using Python over R. It's particularly well suited to deploying machine learning at large scale, and it generally tends to be much faster than R. So our approach is: okay, we are R-oriented, but Python can offer something much better than R, especially in text classification. So let's make a Python library to make our solution available to Python people (open source licensed, as I mentioned before), but let's also make an R wrapper, pxtextmineR, that makes the pipeline available to R people as well.
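Because the app is a package, installing and running it can be as short as three lines. The repository path below is hypothetical, but the pattern is the golem convention of exporting a single `run_app()` entry point:

```r
# Install the dashboard package from GitHub (repository path is
# illustrative, not necessarily the real one):
remotes::install_github("some-org/experiences-dashboard")

# Load the package and launch the Shiny app. golem packages
# conventionally expose run_app() as the single entry point,
# and the app then runs with whatever dataset it is configured for.
library(experiencesdashboard)
run_app()
```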
Now, I'm not going to spend any time at all on this slide; I just put it here because I thought it would be nice for you to go back in your own time and read the summary of some key differences between state-of-the-art machine learning packages in R and Python, and where I think each wins and loses. If you ask me, I think scikit-learn wins, but take a look in your own time; it's a good reference. Going back: I'm not going to spend as much time on reticulate as I did with golem, as I think things are pretty straightforward with reticulate. For me, it's mostly important to give you an example of how the magic happens, because it really opens up so many possibilities for R, being able to access Python in the background and do stuff with Python. I've summarized the whole process in three basic steps. On the left-hand side, you see the definition of the .onLoad function that many packages use, where you import your Python modules and have them there waiting to be called by a function, in this case the function on the right-hand side. So, as you can see in step number one, what I'm doing there is telling R to import the factory pipeline from pxtextmining, which is the one that builds and fits the pipeline, and have it there waiting until it's called. Now, on the right-hand side, I'm creating the R version of the factory pipeline, factory_pipeline_r, which has all the arguments that the Python factory pipeline has. Reticulate takes care of all the conversions by itself. So, for example, theme = NULL will be converted to None by reticulate, the character vector in the learners argument will be converted to a list, and so on and so forth.
So the only thing we need to do in step two is to call the factory pipeline that we imported in the .onLoad method on the left. Once we do that, the only thing left is to run it. It's an R function now, so we pass the arguments to it in step three and we run the pipeline. I mean, that's it. It's magically simple. It's incredible. Take-home messages. To begin with, I think we have a fantastic toolkit which we can use to scale up and make nationwide impact. If you ask me, golem and reticulate are going to be game changers. With these tools, we can fulfill our vision, which is to do data science for the betterment of public services, not just for us, but for the whole of the NHS. So far, we've proved the concept: to ourselves, to begin with, that we can actually do this; to our funder, NHS England, who are pretty excited about what we do; and to our partner and rollout trusts, whose feedback has been extremely constructive and to whom we are very much obliged. And what's next? Well, we want to make nationwide impact. We want to engage with as many trusts as possible, but we also want to spread the word in blogs and at conferences like this one, where there are a lot of clever data scientists who could take our solution, fork it on GitHub, make something even nicer, more efficient, more clever, and put it back and make it available for everybody in the NHS. And we also want to do some deep learning stuff. We have text, and we have a new switched-on data science intern, Oluwasegun Michael Apeyoye, who has been very keen to try zero-shot classification and everything else that deep learning is about, so the future is bright. Thank you so much for listening.
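The three steps above could be sketched roughly like this; the module path, function name and arguments are illustrative of the reticulate pattern, not the exact pxtextmining API:

```r
# Step 1: in .onLoad, import the Python module with delay_load = TRUE,
# so it sits waiting in a package-level variable until first used.
pxtextmining <- NULL

.onLoad <- function(libname, pkgname) {
  pxtextmining <<- reticulate::import(
    "pxtextmining.factories.factory_pipeline",
    delay_load = TRUE
  )
}

# Step 2: an R wrapper that mirrors the Python function's arguments.
# reticulate converts the types automatically: NULL becomes None,
# a character vector becomes a Python list, and so on.
factory_pipeline_r <- function(x, y,
                               learners = c("SGDClassifier"),
                               theme = NULL) {
  pxtextmining$factory_pipeline(x = x, y = y,
                                learners = learners, theme = theme)
}

# Step 3: it's an R function now, so you just pass the arguments and run:
# pipe <- factory_pipeline_r(x = feedback_text, y = labels)
```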