Hi everyone, our next talk is by Marcus Adams, who is an associate director of engineering at Merck. His interests include effective digital visualization, reproducible research and analysis, and convincing his coworkers of the diverse, flourishing world beyond Microsoft Excel. Today he's going to be talking about deploying a GxP Shiny application.

Thank you. My name is Marcus Adams, and I do work for the pharmaceutical company Merck. When a lot of people think pharma, they think research, but I actually work in our manufacturing division, so we don't just research the drugs, we make them as well. And I realize there are a few differences between medicine and pharma. Apparently in medicine it's not uncommon to call conferences meetings, which is confusing to me, because that means I attended 15 conferences this week, and that would be a lot. But I think there are plenty of similarities between us, especially when we're talking at a conference about Shiny development.

So within the manufacturing division, I actually work in the technical operations digital and data COE, which is of course a lot of words, so I'll explain it a different way with help from our mascot Rex here. Like our prehistoric friend, for whom many things are out of reach, we were finding that the data was just out of reach for a lot of our engineers and scientists. So day to day, a lot of what we do is try to get it into their hands and make them happy and much more productive. The idea, and I've heard it in a lot of talks over the last two days, really is about getting the right data to the right person at the right time.

And so in that spirit, I'm going to show you probably the world's most boring Shiny app, but it is an app that is in production. So we have this here, we have this great search box, and we're going to paste in our search. We could have uploaded it from a file if we felt fancy. We're going to lock it and check the syntax, then we click, and there's a lot of waiting here, right?
Progress bar. You know, I'd sing, but in the spirit of medicine, first, do no harm. I won't. We'll wait patiently. And then once it's done, we click and open up our PDF. It's right there. We have our table of contents. We have some process control charts, some summary statistics, then a histogram. And it's kind of boring, I get it. It's kind of like finding out your child's going to go to a state school for college. I mean, you're a little disappointed at first, it's not Ivy League, but you remember what's important: they got accepted, and it's about to save you a lot of money.

And so for us, these are reports that we have to run quarterly for every manufacturing line in our global manufacturing network. So this saves us tens of thousands of hours, which translates to millions of dollars in productivity. And more importantly, our engineers and scientists can spend those tens of thousands of hours doing something much more productive than running ggplot code themselves. And really, it's what our sites needed. This is exactly what they told us. They don't want to be a part of this fancy machine learning, their term, not mine, until they can actually get past these reporting details.

Things like CPV, which is exactly what you just saw. It's called a Continuous Process Verification report. It is a GMP report, which, if you're not familiar with that, means it adheres to good manufacturing practices. Basically, there are a lot of regulations we have to go through to make sure it's up to snuff and it's rigorous. We will put this in front of regulatory agencies. We will make release decisions off of this. And most importantly, these decisions have the potential to impact patients around the world. So we have to do a lot of due diligence. And so when we think about having an app that supports all this, it's in production. And it's in production with a lot more around it.
It's much more than just that core functionality of creating those automated reports. You know, that's R Markdown 101, but there's a lot more to it. And I think about this from the perspective of the classic computer science book by Fred Brooks, The Mythical Man-Month. He breaks software out into four quadrants. In the lower right-hand corner is the programming systems product; that's what we're going to call production. But in the upper left-hand corner, you have a program. That is what he says you crank out in a weekend in your garage. But this is 2021, so no one does that in the garage. I guess the equivalent would probably be what you crank out in your basement over the weekend with six two-liter bottles of Code Red, right?

But if you want to develop that, that's the seed of your software and of a production application. So you move to the right, and it becomes a programming system. You integrate it with databases. You integrate it with Epic, if you're unfortunate, apparently. You integrate it with clinical databases and registries. And that's three times as much work. And then if you want to make it a programming product, you start testing your code, you generalize your code, you document your code, and that's going to increase your work by 3x as well. And so by the time it becomes this production app, it's nine times the amount of work it took you to create just that kernel your app is based around. And that means, I'm sure we can all do the math here, only one ninth of your app, about 11%, is the fancy recurrent neural network. The other 89% is what it takes to create a reliable, secure, and maintainable production app.

And for me, I don't come from an application development or computer science background. I'm a chemical engineer by training. And so really thinking about it in this way was something unfamiliar until I joined this project and we went down this road.
And I take great solace in the fact that I'm not alone in that. A Burtch Works study in 2019 found that only about 21% of data scientists come from a computer science background. The rest of us come from statistics, business, engineering, natural sciences, medicine. And so for me, turning an app into a production app really meant changing the way I thought about it. There was so much more, and I had to expand my thinking. I didn't really think about production considerations before. I just thought a lot about features and what it could do. And so that's what I want to share with you the rest of today: how I had to approach it and change my thinking to get to that production app.

And so part of that is just understanding this concept of production. What is production? I think Joe Cheng, the CTO of RStudio, put it really well in a 2019 keynote. He said production is software environments that are used and relied on by real users, with real consequences if things go wrong. This is not your proof of concept. This is not your prototype. This is not your sandbox where, oh, it's a nice-to-have, that's really cool. People build their work day around this. They come to rely on it day in and day out. It has real outcomes in the real world. And so to be able to do that, it's much more than your code. It's a production environment. It's not just your code in there alone. And I know there's a lot there, but it can be done. It's been done before in many different ways. I'm just one of many examples.

And so this idea of a production environment, of course, implies that there may be other environments. And you may have more or fewer. You may have a QC environment. You may just have test and prod. The idea is you separate your environments based on your requirements. And as you move from left to right, from development to production, the environments become more stable, more tested, and more controlled.
I almost think about it now as: there aren't really production apps. There are only apps in a production environment. And it evolves, and it becomes much more rigorous. And it's really about the requirements you need to satisfy to get into each of these environments and put something out there. And you have to define those requirements. I can't do that for you. There are a lot of great talks and workshops out there that cover how to scale it up. But there are a lot of other requirements. For your production environment, maybe you have minimal requirements and low expectations, like eating American chocolate. Or maybe you're putting out something that's a regulatory app, like us. So you have to have things like audit trails. You have to have timeouts. You have to have strict change control. It's something you have to define.

And one thing I've learned is your production is not going to be the same as Merck's production. My production at Merck is not even going to be the same as other apps at Merck. And I'm going to venture that nobody in the audience right now is going to have the same production as Google. I mean, I guess if somebody out there works for Verily, on their health side, maybe. But I didn't have a million concurrent users. It's great that Shiny has been shown to scale up to 10,000 concurrent users. But I might have 100 users. And I guarantee you, they're not all concurrent. It's very different when you're writing an application for internal-facing users versus out to the public. You have a much more controlled environment.

And so that's where you have to move away from these abstract requirements into the specifics. There's a lot of debate about: can Shiny support production, can R support production? And you're never going to win that argument until you get to the specifics. Then you can start addressing those specific concerns and the specific requirements you need to satisfy to move into a production environment.
And to give you an example of one of the challenges we faced: we had our three environments. We had three service accounts. We had three databases and three passwords. So depending on where that app was, it had to access different data sets and behave a little bit differently. But our global enterprise security requirements say that we can't store passwords unencrypted, in plain text, at rest. And I'll say that credentials and secret management is a deep topic. I'm not an expert on it. To quote someone much smarter than I am, it's turtles all the way down. But I'll say this is how we addressed it for us in our specific environment.

Now when Shiny gets a request, when your user goes to that URL, you have that subdomain. So for us, it was something like cpv-dev.merck.com, cpv-test.merck.com. And so based on that subdomain, we can actually determine what environment we should be operating in. And this is great if you're behind a load balancer, because you don't have to know which environment that particular server is actually serving. It comes from the URL. And so from that, we use the config package, and we can retrieve the appropriate account ID, the appropriate database, and the appropriate secret name. And with that, we can go to the vault we created using the secret package, use the server's private key to decrypt that password, and then go retrieve the data.

And because this is asymmetric cryptography, public key cryptography, we can actually encrypt the password and commit it to our Git version control. And then with a pull request, we can trigger a deployment and rotate our passwords very easily. And you actually get this least privilege: fewer people have to know what that password is. We're not sending it around in IM, we're not sending it around in emails. It's just done through that pull request. And so there are a lot of requirements, and this is just one of them.
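The hostname-to-environment lookup described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code from the talk: the `example.com` hostnames, the bare `cpv` production subdomain, and every account, database, and secret name are hypothetical. In a real Shiny app the hostname would typically come from `session$clientData$url_hostname`, the settings list would live in a `config.yml` read with the config package, and the password would be fetched from an encrypted vault rather than held in memory like this.

```r
# Hypothetical per-environment settings. In practice these would sit in
# config.yml; only the NAME of the secret is stored, never the password.
settings <- list(
  dev  = list(account_id = "svc_cpv_dev",  database = "cpv_dev",  secret_name = "cpv_dev_db"),
  test = list(account_id = "svc_cpv_test", database = "cpv_test", secret_name = "cpv_test_db"),
  prod = list(account_id = "svc_cpv_prod", database = "cpv_prod", secret_name = "cpv_prod_db")
)

# Map the subdomain of the requested hostname to an environment name.
# The subdomain-to-environment table is an assumption for illustration.
env_from_hostname <- function(hostname) {
  subdomain <- sub("\\..*$", "", tolower(hostname))  # keep text before the first dot
  known <- c("cpv-dev" = "dev", "cpv-test" = "test", "cpv" = "prod")
  if (!subdomain %in% names(known)) {
    stop("Unrecognized hostname: ", hostname)  # fail loudly instead of guessing
  }
  unname(known[subdomain])
}

# Look up the settings for whichever environment served the request.
settings_for <- function(hostname) {
  settings[[env_from_hostname(hostname)]]
}

settings_for("cpv-test.example.com")$secret_name
```

Failing on an unknown hostname, rather than defaulting to one environment, is a deliberate choice here: a misconfigured server then cannot silently read production data.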
And so you'll find that as you build out these requirements, you're going to need help. This was different for the team that I worked with, right? There's 9x the work. You're probably not going to do it alone. You know, just here, listening to these talks, there are a lot of clever people out there. And Brian Kernighan, you know, he wrote the first book on the C programming language, so he probably knows something, and you should listen to his advice. Probably a lot of people out there are going to need help debugging their code, if nothing else. So you're going to need at least one other person.

For me, in my past experience, I'd been doing a lot of this app development just kind of on my own, solo. But all of a sudden, when we were working on this CPV production app, we had an entire team under this umbrella working on the app. I couldn't even put this presentation together by myself. I had to go out and talk to different people to pull some screenshots and data and everything like that. It is truly a team effort. And your team may change in size, and hopefully, given COVID, you're not all standing this close underneath the umbrella. But for us, we started this project pre-COVID, so we didn't have social distancing.

But the idea is you're going to need help. Most of them aren't even going to be developers. You have people doing user acceptance testing. You're going to talk to your domain experts. You're going to have your end users. They're all part of these quality checks. They're going to be part of this bigger team. And so even for just that small piece, the developers, and we had about six of us for that first pass, you really have to change your coding habits. And I like to think that I'm a pretty decent coder, even for future Marcus, but you really want to think about cleaning up your code and being a little bit more formal with it. It's kind of like when you return to the office after being away and working remote: you're going to clean up.
You're going to put on real pants. You know, if you're really ambitious, you're going to shave. So you just kind of have to remember that golden rule of coding. And there's a lot to that, but to talk about three parts of it, I'm just going to say, first of all, have Git. You have to have some kind of version control. That is non-negotiable. Because if you're going to go demo your app, I guarantee you that five minutes before, somebody is going to save over the file, and they are going to break your demo. And part of that is also having this Git flow, which doesn't really get talked about a lot. You might see it on GitHub if you go out to R packages and their maintainers. I mean, you'll see the Git, but have some kind of Git flow. For us, we have a master branch; that's what's getting deployed. Our development branch is what we're testing. And while we're developing features, they may not work yet. We use Git flow. There are others out there. You've got to figure out what works for you, but have something so you can control how your program comes together.

Document. I think this is something we all know we should do, but doubly so for production. roxygen makes it really easy to do this. You add comments right by your functions, and it'll generate the documentation. It's right there. Tests are also your documentation. They're also great for making sure your code still works as you make changes. There is unit testing, with packages such as testthat and tinytest. But there's also UI testing with shinytest, load testing with shinyloadtest, and user acceptance testing. And this is all to make sure that as you're making changes and iterating on your application, you're not breaking your previous work. And I can tell you this has been essential for us in making updates and improvements to the application.

And then lastly, because you're working on a team, you really have to divide and conquer your code. There's a lot there.
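As a small illustration of documenting and testing together, here is a sketch of a function with roxygen-style comments and a matching unit test. The function `summarize_parameter()` is hypothetical and not taken from the application in the talk. The test block assumes the testthat package and is skipped when it is not installed.

```r
#' Summarize a numeric process parameter
#'
#' @param x Numeric vector of measurements; `NA` values are dropped.
#' @return A named numeric vector with elements `n`, `mean`, and `sd`.
#' @examples
#' summarize_parameter(c(9.8, 10.1, 10.0, NA))
summarize_parameter <- function(x) {
  stopifnot(is.numeric(x))
  x <- x[!is.na(x)]  # drop missing measurements before summarizing
  c(n = length(x), mean = mean(x), sd = stats::sd(x))
}

# Unit tests double as executable documentation of the expected behavior.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("summarize_parameter drops missing values", {
    result <- summarize_parameter(c(1, 2, 3, NA))
    testthat::expect_equal(result[["n"]], 3)
    testthat::expect_equal(result[["mean"]], 2)
    testthat::expect_equal(result[["sd"]], 1)
  })
  testthat::test_that("summarize_parameter rejects non-numeric input", {
    testthat::expect_error(summarize_parameter("a"))
  })
}
```

In a package, the function would live under `R/` and the tests under `tests/testthat/`, so they run on every check rather than only when someone remembers to run them.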
And R is a functional programming language. Now, a lot of people coming from object-oriented programming may think that's weird and strange, but it actually lends itself very well to this division of labor. You create your packages and your modules. And you can hand them off and build these modules, not just one big monolithic 60,000-line script file that says run. And that for us was very critical to our architecture; we pushed that very heavily.

At the top level, we have our application. It's on the left here. And below that, we had a bunch of packages, some of the usual suspects, dplyr, et cetera. But we also created four internal packages. We had our Mantis DBC, which connected to our data lake. Mantis is our data lake for the manufacturing division. We had our CPV reporter, which did a lot of the R Markdown compiling. We had the accelerator, which exported the raw data so people could do other analysis. And then PPXQC was kind of our workhorse. It did a lot of the statistical process control. It did the charting. It did the run rules for us. And so because it's all modular, when it breaks, there's one place to fix it.

And I've moved off this project, mostly, as much as anyone can move off a project and still stay at the same company. I've been told by our business owners and our technical owners for this application that this has been key: we started with version 1.0, and we're now getting ready to release 4.0. And the beauty of this goes even further than that: once you've put in this effort, you can actually reuse it. And so this Mantis DBC package connecting to our data lake, we actually distributed that out to hundreds of our users at Merck. And so now they don't have to know things about JDBC and connection strings, and it makes it a lot easier. And so it might not have as much in savings, but I can tell you there are probably just as many users of this package as there are of the app.
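To make the "small functions in a package" idea concrete, here is a sketch of what a few statistical process control helpers could look like. It is an illustration only and not the internal package from the talk: the function names are hypothetical, the limits are the textbook individuals-chart limits (center line plus or minus 2.66 times the average moving range), and the run length of 8 follows the Western Electric convention (other rule sets use 9).

```r
# Individuals (I) chart limits from the average moving range.
# 2.66 is the standard constant 3 / d2, with d2 = 1.128 for ranges of size 2.
control_limits <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 2, !anyNA(x))
  center <- mean(x)
  mr_bar <- mean(abs(diff(x)))  # average moving range between neighbors
  c(lcl = center - 2.66 * mr_bar, center = center, ucl = center + 2.66 * mr_bar)
}

# Run rule: flag any point outside the control limits.
beyond_limits <- function(x, limits = control_limits(x)) {
  x < limits[["lcl"]] | x > limits[["ucl"]]
}

# Run rule: flag each point that is at least the n-th consecutive value
# on the same side of the center line.
same_side_run <- function(x, center, n = 8) {
  side <- sign(x - center)                   # -1 below, 0 on, +1 above the center
  position <- sequence(rle(side)$lengths)    # position of each point within its run
  side != 0 & position >= n
}

# Limits from a baseline period, then applied to new measurements.
baseline <- c(10, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1)
limits <- control_limits(baseline)
beyond_limits(c(10, 11), limits)
```

Because each rule is its own function with plain vector inputs and outputs, each can be tested, fixed, and reused independently of the Shiny app that calls it.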
And so this idea of modules and packages really leads to my last point here: you're going to build it up, and it's not one and done. You can iterate on this as well. So this idea that you're going to put out your application on a DVD, like Civilization II, ship it, and be done with it, that's an antiquated model. Really, we've moved toward this software-as-a-subscription model, right? Your apps get updated. I think there was a presentation yesterday that talked about the wonders of web applications. It's always up to date because you can make updates. And you have a chance to add to it later.

What you really have to do is think about it as a life cycle. You know, a lot of people get from plan through deploy and they stop, and they break the chain, but it has to go out in the wild. And as a life cycle, that also means at some point you have to think about retiring it, but really remember those links in the chain around operation. You can't just toss it over the fence. Somebody has to maintain it. Somebody has to do the hot fixes. There's a product management piece to it. Users have to be trained on it. They have to understand the documentation. You have to advertise. In a large organization like Merck, you can't take it for granted that people just know it's there.

And then as part of the cycle, for your next planning cycle, you have to collect feedback. You have to solicit feedback. And by not having all the features out at once, you can build on your success. And I can tell you that if you build something small and useful, you'll get a lot more funding and more sponsorship to do the next generation. And you'll leave your users wanting more, and that will get them engaged, because you'll be surprised how quickly everything becomes normal. They'll figure out, oh, this is the status quo, and they want more.
And to that point, I feel almost guilty showing that demo earlier, because it really doesn't do justice to the current state. We started out with version 1.0 with basic input. And we were ruthlessly prioritizing these features. So when we got done with 1.0, we already had features lined up for the next two releases. In version 2, we brought in some people who are much better at UI design than us. We added this live preview, this data verification report. In 2.1, we brought in in-app help as well as some statistical transforms and customization of precision. In 3.0, we actually moved our entire backend architecture. Our data lake in general, I mentioned Mantis earlier, switched architectures, and we also moved from Shiny Server Pro to RStudio Connect. And we're very close to releasing 4.0 now. We've added data sources and a new graphical user interface. We've been pulling in qualitative data, not just the quantitative data. We actually have an in-app query builder. So looking back at that demo, you remember I pasted a search; you can now actually build that interactively.

And this didn't happen all at once. It happened in chunks. It evolved over three years. It wasn't just one big splash. And so we look back, and we've come a long way. It's really great and rewarding to look back. And I know this isn't Neverland, but you take a view from atop a mountain. It's a great view, but it takes some effort to get up there, especially since this is Mount Katahdin at the end of the Appalachian Trail. It takes work, very analogous to a production app. It takes effort. It took effort to learn things I didn't know. It took effort to do a lot of this work behind the scenes. It's not that core functionality that all the users see. It took effort, and I think we all know this, to work with others. I'm not saying my colleagues are difficult. It's just that there's a lot of coordination there. And I'll say from experience, all this effort, it's worth it.
To go back to that definition of production, it's worth it to see others use and rely on what you've created. And with that, I'll just leave you with a quote from one of my favorite fictional doctors, who said, there are no magical fixes, because nothing in this world that's worth having comes easy. And putting an app in production, I'm not going to lie, it's not necessarily easy, but it is definitely worth it. So thank you.

Great. Thanks, Marcus.