Hi everyone, and welcome to the R Adoption Series. Thanks everyone for attending today. I'll get started — I think there might be a few more people trickling in as I talk, but we've got a lot to cover today, so I don't want to waste too much time. Before getting into the topic of today, I thought I'd talk a little bit about the R Adoption Series itself. Hopefully you've all seen the main website, which is where you would have signed up, on rconsortium.org — someone could add that into chat — where you can see the link to the latest webinar and also recordings of the previous webinars. There have already been two: R training strategies at Janssen, and scaling R at GSK. The R Adoption Series is sponsored by the R Consortium. Just a bit about the R Consortium: their mission is to support the R community and the R Foundation, and to develop infrastructure to ensure long-term stability of the R ecosystem. They do that in two main ways: financial support for infrastructure projects, and working groups to enable industry collaboration — this series is one of them. The vision is an R ecosystem advancing 21st-century computational statistics and data science. In terms of the R Adoption Series, the scope is aimed at those leading R adoption initiatives. It's a cross-pharma initiative looking at individuals and groups across different pharma companies working on adopting R, but it's really open to all to present and to view. The focus is very much on the "how to": we're trying to look into how we've done it and what we've done so far, so we get some real insight into what different companies are doing, and hopefully we can have a really productive conversation that allows us to get ideas from one another. The typical format is a presentation plus a focused discussion, either a panel or breakouts. The next talk will actually be pretty soon — these have normally been spaced about two months apart, but we've squeezed another one in before the end of the year, so that will be on December the 10th. The title is Speaking Different Languages: Clinical Statistical Modeling in a World with Choice, and that's going to be Michael Rimler and Mike Stackhouse. They'll be talking about the CSRMLW group, which is a collaborative working group between PHUSE and the R Consortium looking at statistical methods and how their implementations vary between different languages. There'll be details of that on the website I just showed you, on the R Consortium site, which will be going up pretty soon — I'm not quite sure of the date, but pretty soon after this webinar is finished. One note before I get into the talk and introduce ourselves: we are going to have a breakout today, but because of how Zoom works, right now we're in a webinar format where you've got a Q&A box and chat and only panellists can talk, which means breakouts don't work. So at the end of this session, where we're presenting to you, we're going to switch to a different Zoom link and there we'll split into the breakouts. What you're going to need to do is follow this link — don't do it right now, but you may want to copy that link into a document just in case you lose it, and you'll need that password as well. I'll be repeating all this at the end of the talk and we will share it in chat as well, but I just wanted to highlight this right at the start in case you sort of doze off and suddenly go, oh, where's everyone gone? So we will be switching to a different room at the end.
Okay, so getting into the topic for today: we're going to be talking about R package management at Roche. Just to introduce ourselves, there are three of us talking today. My name's Kieran Martin. I've been at Roche for six years, working out of the Welwyn office in the UK. I'm now the R Enablement Lead, focused on the adoption of R within my department, which is PD Data Sciences. Tad, if you want to introduce yourself. Yes, of course. My name is Tadeusz Lewandowski. I've been at Roche for 13 years, in the industry 20 years, and my role is pan-pharma collaboration product lead, in the same department as Kieran. Adrian? Yes, so my name is Adrian Waddell. I've been at Roche around five and a half years. I started what became the NEST project around three years ago, and currently I'm the chief engineer of the NEST project. Okay, so our high-level agenda for today: we're going to start off by giving you a view of how we manage packages at a wider level — what we used to do, what we're currently doing, and what we want to be doing in the future — and, as we discuss this, thinking about what the needs of individual users are, particularly in a regulatory environment, given the complexities added by managing R packages in that context. Through that, we're also going to discuss internal packages and how we might want to manage those. That's going to lead us into a case study, which Adrian has already mentioned, on NEST, which is a suite of packages that Tad and Adrian will talk more about. And that will lead into next steps in developing and enabling co-creation and collaboration, and some thoughts and ideas on how you can build these sorts of software engineering teams around R packages. As I mentioned earlier, we're going to finish with breakout sessions, with three different groups on three topics which I'll mention later. So what I've put here is a very simplified view of our journey so far. As you'll notice — I don't think I've done it too much in these slides, but I am a bit addicted to making puns on the letter R; I think it's one of the best things about the R language, it's very easy to make puns on — you'll see it's "R journey" there. What I've got here is three rectangles which represent the past, present and future. While R has been around at Roche for quite a long while, it was often used in a more ad-hoc way: people would have a laptop version of RStudio where they'd do a bit of analysis, or they'd install it on a server. The first effort towards getting R adopted on a wide scale was probably the rectangle on the left here, where we built a centralized RStudio server aimed at exploratory use. That was primarily aimed at things like R Shiny apps and biomarker exploration — anything outside the GxP context. So people could use R to start doing analysis, but the intention was to keep this out of the GxP world. When it comes to R packages, one key part of this was that it was not just a centralized RStudio server that everyone went onto; it also had a shared collection of global packages. So whenever you logged on to the server, you would have access to the same suite of packages as everyone else. But because it was an exploratory environment, we also gave users a lot of flexibility to install any additional packages in their home areas as well.
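To make that shared-library setup concrete, here is a minimal, hypothetical illustration (the paths are made up, not Roche's actual ones) of how a shared site library plus a per-user home library typically appear in an R session:

```r
# Hypothetical illustration of a shared site library plus a user home library.
.libPaths()
#> [1] "/home/jdoe/R/x86_64-pc-linux-gnu-library/4.1"   # user's home library
#> [2] "/opt/R/4.1.0/lib/R/site-library"                # shared global suite

# library() searches these paths in order, so a package a user installs in
# their home area shadows the centrally managed copy of the same package --
# which is exactly how reproducibility starts to drift between users.
find.package("dplyr")   # shows which copy would actually be loaded
```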
So focusing on that for a second, that leads me into talking about reproducibility. This is what the environment looks like, both in the exploratory world and in the validated one: we have a Unix server with R on top of it, which reads data from some storage, and the code is managed via GitHub. The outputs are then sent either to R Shiny or down to our various reporting processes, which we're not going to worry about too much today. One advantage of having a global package suite is that it allows consistency across projects. I think one of the biggest challenges with R package management, particularly when you have a user base who are used to SAS, is that there can be so much variety and difference between different instances of someone's use of R. Ultimately, if one person is using SAS and another is too, it's likely that code I write in one place will work in the other — there might be a difference in the version of SAS you're using, but you're not calling on lots of different libraries. Whereas anyone who's been using R for a while knows that anyone can install any number of libraries, in any number of different versions, and as soon as you start doing that, you end up with environments that are quite hard to replicate. So our first step towards getting around this was having this global suite of libraries, which was fixed and unchanging. That makes projects much more reproducible, because now it's much more likely that if I come onto your project and grab your code, it's going to work for me, because I'm using the same shared suite of R packages. It also makes publishing to Shiny a lot easier, because the Shiny server had a connection to those global packages; provided you were just using those, there was a good chance that the Shiny app you'd made locally would also run on the server. However, there are disadvantages to this approach of shared global packages. One is that the central package repository needs to be really big: particularly in an exploratory setting, if you're covering loads of different use cases, that ends up being a pretty large collection of packages, which is not insignificant and does need maintenance. Also, if you want reproducibility, you've got to pin versions. What I mean by that is that, typically, for a particular version of R you can only really point at one global library. So if you want a new version of dplyr, you're not going to get it in that version of R; the only way to release new packages to users in this kind of model is to make an entire new R installation for them to work in, because the old one is pointing at an older version of dplyr. And the last one, which is maybe obvious, is that users are going to break reproducibility over time. One thing that's definitely true with R packages, particularly with a wide and diverse user base, is that there are really varying levels of knowledge of exactly how R libraries work and what the impact of installing a library actually is. Even someone who's been using R for a while might not necessarily know what exactly happens when they install a new version of a library. They won't know that when you do install.packages, you're going to get the latest version from CRAN.
And you're also going to get all the dependencies if you don't control for that — particularly if you're using a function like install_github, which goes ahead and installs the latest dependencies for you. And then they're not even necessarily going to know that different R versions will be calling on different package libraries as well. You can manage some of this with user education, by making people more aware of it, but it can be a challenging problem. So I guess our first step towards resolving this, which is kind of where we are presently, was when we moved to allowing GxP work to be done in R. For this, we basically took that first step — the centralized RStudio server — and created a validated version of it. In many ways the setup was quite similar to the diagram I showed earlier, where we had a Unix server with R and a shared collection of packages; the main difference was that these packages were validated for GxP use. One additional move we took here was that we made it more difficult to install additional packages. Unlike in the exploratory world, where you could basically install any additional packages you wanted and break reproducibility, we set it up so that if you wanted to add additional packages, you had to be aware of what you were doing and understand that those packages wouldn't be validated. That leads into something I want to talk about around education versus enforcement. When it comes to setting up processes for any kind of live system, like package management, there's a tendency — and I've certainly been in organizations that have followed this — to build too many controls on the user, to make it very, very hard, or almost impossible, to deviate from the process. There are a couple of problems I see with that. The first is that if you restrict people too much, you're almost always going to find exceptions to the rule. You can say: everyone gets this fixed set of packages and they can never use anything else — and you will immediately get three people who have a really pressing need to deviate from that. If you set up your system in such a way that it's impossible for them to do that, you're making it much harder for everyone to do their work, and you're making a system no one really wants to use. That leads to the other problem: if you restrict too hard, users are just going to find ways around you. The classic example is that if you make your RStudio server really horrible to use, they'll just download RStudio locally and work there, and then suddenly all the effort you've put into reproducibility and GxP is completely lost. I'd like to say that this won't happen much, but users absolutely will do this; they'll find ways to get around what you do. So our goal when building this was to build a process which could be followed and which was well motivated. We did make it harder to install packages — that is, when you typed install.packages, it wouldn't work — but we built processes and guidance so that if you needed to install something, which users often did, you would be able to. So we didn't completely stop this, but we made it so that if users did it, they'd do it in a much more controlled and much more reproducible way. One thing to mention here is the process of validation of packages as well. The obvious thing is that validation has a cost: at the very least, it's a cost in terms of the time to validate any particular R package.
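As an editor's aside, here is a small, hedged sketch of the difference between an ad-hoc install and the more controlled kind of install described above; the package name and version number are purely illustrative.

```r
# Ad-hoc: what most users reach for first.
install.packages("dplyr")
# -> pulls the *latest* dplyr from CRAN, plus any missing dependencies,
#    into the first writable library on .libPaths(); rerun a year later,
#    the same script can quietly produce a different environment.

# More controlled: request an explicit, known version (version is illustrative).
remotes::install_version("dplyr", version = "1.0.7",
                         repos = "https://cloud.r-project.org")
packageVersion("dplyr")   # confirm what actually ended up installed
sessionInfo()             # record the full package set alongside the analysis
```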
And, depending on the process you follow, that can be quite an expensive cost. Because of that, you can't include every single package — it's just not really possible — and users always want more packages. This again is a bit tricky to balance, because from experience we've definitely had people asking for R packages they didn't really need, where they could easily replicate the functionality with a different R package we already had. But there are always going to be cases where some statistical method only exists in a particular R package, and when that's the case, there's really nothing the user can do other than use that R package, or just grab the code from its source. The other thing I want to mention around validation is that you have to be careful when you talk about validating R packages, because you really need to validate packages in the context of an ecosystem. A package has lots of dependencies and reverse dependencies: it depends on a bunch of other R packages, it depends on the version of R you're using, it depends on the operating system and the underlying installed libraries — C libraries, for instance — on that system. So you really can only validate packages in a context, and it's important for users to understand that validating a package completely on its own probably isn't going to be sufficient for most purposes. The next thing I want to talk about is internal packages. Packages get developed internally for lots of different purposes, and while I think I, and Roche in general, are moving more towards open sourcing our packages, there's always going to be a place for a package that is purely internal to the business, simply because there are packages where it doesn't really make much sense to open source them. One example is API access: we have lots of internal APIs for things like GitHub, databases, metadata — all sorts of things where that's an internal system that no one else in the world wants to access other than people at Roche, but where it can be very useful to develop an R package to make that access easier, and we certainly have lots of examples of that. Interestingly, those packages, if we want to use them in a GxP context, can be a little hard to validate by an external party, because the external party doesn't have access to our APIs, so they can't really test the package very thoroughly. And there are obviously other internal uses — one obvious one is tools for internal clinical trials, and the NEST team is actually going to cover both this and statistical methods to some extent, plus tools specific to the kind of reporting work we're doing. One challenge we face — and this is actually true of external packages as well, but definitely of internal packages — is that developers have a variety of backgrounds. They may not be familiar with many software engineering principles, or any at all, when they develop R packages. If we're lucky, they may have read something like Hadley Wickham's book on R packages, so they have some kind of basis for what good principles might be, but even that isn't guaranteed. I know of statisticians who develop R packages who were coding in R well before those kinds of resources, before roxygen2, existed, so they haven't necessarily followed the best practices when building those packages.
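Purely as an illustration of the internal API-wrapper idea mentioned above — the endpoint, token variable and response shape below are invented for the example, not a real Roche service:

```r
# Hypothetical wrapper around an internal metadata API (everything here is
# made up for illustration; only the httr calls themselves are real functions).
get_study_metadata <- function(study_id) {
  resp <- httr::GET(
    url = paste0("https://metadata.internal.example.com/api/studies/", study_id),
    httr::add_headers(Authorization = paste("Bearer", Sys.getenv("METADATA_TOKEN")))
  )
  httr::stop_for_status(resp)
  httr::content(resp, as = "parsed", simplifyVector = TRUE)
}

# Usage inside the company; note that an external validator could not run this,
# which is the testing difficulty described above.
# meta <- get_study_metadata("BP12345")
```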
So one thing we have to think about is how we encourage these kinds of best practices and how we get people up to speed on what we think our packages should ideally look like. I guess there are three levels of quality control — there are probably more than this, but this leads us into the future state as well. The minimal one, which can be useful in a context where you maybe don't need full GxP validation and just want a basic check of quality, is that the package passes the CRAN checks. That's a really useful thing to get people to do in general: CRAN checks are not that stringent, but they do check for some basic code hygiene in your package, they can catch some bad practices, and at the very least, if you have any tests, the CRAN checks will check whether those tests actually run successfully. The heavyweight approach — which is what we took when we built the validated server — is to validate externally: we can pay external companies to help us validate these R packages. However, there is a new approach that we are hoping to move to really soon, which is basically building automated checking of quality measures. We want to do something a bit more robust than just CRAN checks, but we also don't want to go fully heavyweight and add our own additional tests. Instead, what we want to build — in fact, what we have built — is a series of automated checks that look at code quality. If you're interested in this, there was actually a talk at R/Pharma by Coline Zeballos all about automating R validation and the tooling we're building to move our validation into an internal setting. So that takes me to my penultimate slide before we move on to the NEST project in detail: thinking about the future. When it comes to R packages in particular, but also the environment, we want to move away from this centralized server, which is quite constrained in how it works and has to be one-size-fits-all, and move to a situation where we take advantage of containerized technology to build validated and exploratory images. That gives us a lot more flexibility to move quickly and to switch from one environment to another. The idea is that images will contain perhaps a small selection of packages, and then we'll have an internal package repository for installation, and we'll make use of tools like renv to manage packages. That way we get the full suite of reproducibility, because we have the operating system controlled by the image and the packages controlled by the package manager plus renv. That will allow us to be pretty reproducible while also giving users a lot of flexibility in how they use these systems. Just a word on the package manager: this is something we're moving to rather than something we're actively using. Rather than having shared, already-installed packages, we instead have a global repository with managed snapshots of packages, giving users flexibility to determine which snapshots they're going to use — which allows more flexibility while protecting reproducibility. And it's really easy to integrate internal packages into this: we can just add them to the package manager alongside the external ones. The tool we think we'll be using is RStudio Package Manager, which we think will be quite a flexible and useful tool. This is really a view of the future, though, so I'm not going to go into too much detail today.
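A minimal sketch of the snapshot-plus-renv idea, assuming an internal package manager instance; the repository URL and snapshot date are placeholders, not a real Roche endpoint:

```r
# Point installs at a frozen, dated snapshot served by the package manager
# (placeholder URL).
options(repos = c(CRAN = "https://packagemanager.example.com/cran/2021-11-01"))

# renv gives each project its own library and a lockfile of exact versions.
install.packages("renv")
renv::init()               # set up the project-local library
install.packages("dplyr")  # resolved from the pinned snapshot above
renv::snapshot()           # write the exact versions to renv.lock

# Later, on another machine or inside a Docker image built from the same base:
renv::restore()            # reinstall exactly the recorded versions
```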
So yeah, I will now pass over to Tad. Thank you, Kieran. Thank you, Kieran. And yeah, as part of this story of how we manage R packages, we want to focus on one of our internal projects, called NEST. I want to thank all of the people who have been with us on this journey — this project now has quite a substantial team, which is really fantastic — and we'll show you first a little bit of history and what it was about. Next slide, please. So first of all, the NEST project from the beginning had in mind that, with everything we were doing in this project, we wanted to challenge the status quo in study reporting and study results delivery, internally but also externally. This is a really important aspect and I hope we'll touch on it today. The way we're challenging the status quo is through careful design of the teal Shiny framework, which is easy to use and user friendly, and — which is quite unique — it connects the exploratory analysis capability, which is where we started, with the regulatory one. This is what we will show you today. Most importantly, from the beginning we were applying CI/CD automation and agile methodology, and in the end we have various teams and products, and all together it consists of around 20 R packages with extensive documentation and automation logic. A little more detail — next slide, please. So a little bit of history. The project was formally founded around three years ago, but proofs of concept using Shiny for exploration of study results started around 2016, maybe a little earlier. teal itself started at the beginning of 2017, when we were thinking about how we can help our scientists better understand the data, and that's how the first teal app started, for our translational medicine scientists. We then followed up with the next use case, where we built a customized app for RNA-seq analysis — I think, if I remember correctly, 800 — and it really showed that we can speed things up. Can you click, please? And this shows that from the beginning we were thinking about and challenging the status quo: on the left side you go from the sprawl of different analyses to one interactive suite, where a user can really nicely navigate the filtering and the encodings and have everything under control, together with the Show R Code generation. Can you click, please? And then, over time, we built a lot — starting at the end of 2017 we ran a proof of concept on our studies at database lock, where we proved that we can really augment our normal static analyses with this interactivity. That really proved the concept, and from the beginning of 2019 we started the project at a small scale. Can you click, please? And this represents the different visualizations we added more and more of over time — starting from the Kaplan-Meier, there was more on the study results implementation, up to more generic applications. Can you click, please? Yes, and then over time, while developing all of those Shiny applications, we also paid attention to the documentation. Can you click, please?
And this represents our latest snapshot of that documentation, but each release always had extensive documentation — not only for the R packages but also end-user facing, with the teal modules, with examples, our TLG catalog, the biomarker catalog, and many more examples that people could really take and implement in a study. Can you click, please? And then you can see that from 2019 we really started our journey as a project: we implemented the scrum methodology, and you can see here the different sprints. I have to say that one of the teams, the core team, will hit sprint number 50 at the end of this year, which shows the long, long history of this. And, as you can see here, we always had sprint planning, we have a daily scrum, a scrum of scrums, we use DevOps, we use automation and everything. You can see that the first release was in April 2019, followed by the other releases, and since 2020 we have had releases every two months. And now we are getting to the stage where we're thinking about and preparing this project so that we could maybe collaborate on it, because we believe there is a lot of value in it that could change the industry. Next slide, please. Okay, Adrian, over to you. Good, thank you, Tad. So I'm going to talk about a couple of our products, and then more about the organization and the documentation around them. If you go to the next slide please, Kieran. These are four of the main products. As Tad mentioned, there are around 22 packages; these are just some of the core ones for end users. rtables is a generic framework to create tables, and tern, which extends rtables, creates the tables used for clinical trial analysis — it also does graphs, and in the future listings. Chevron is a templating package where you can very, very quickly create a table, given data that follows a standard. And teal is, as Tad mentioned, the Shiny-based interactive dashboard toolkit. At the bottom you see — and I'm going to show this later, live — how the code changes as we go from rtables to tern to Chevron. rtables is very general: you can split by variables, you can analyze, you can summarize row groups and so on. tern adds additional layouting instructions that implement statistical methods and are geared towards clinical trial analysis. And Chevron links these deeply into the standards, so it becomes data-standard — that is, data-format — dependent, meaning we now assume ADSL follows CDISC standards, and only very, very few arguments are needed in the Chevron template functions to create the table.
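For readers who want to see the progression Adrian describes, here is a hedged reconstruction; `adsl` stands in for a synthetic ADaM subject-level dataset (for example from the scda package used later in the demo), and the tern function name follows the roughly 2021-era API (later releases renamed `summarize_vars()` to `analyze_vars()`):

```r
library(magrittr)
library(rtables)
library(tern)   # assumed available alongside rtables

# Plain rtables: generic layouting -- split columns by arm, analyze AGE.
lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE", afun = mean)
build_table(lyt, adsl)

# tern: the same skeleton, but with clinical-reporting analysis functions.
lyt_tern <- basic_table() %>%
  split_cols_by("ARM") %>%
  summarize_vars("AGE")
build_table(lyt_tern, adsl)

# Chevron (not sketched here) wraps layouts like these into standard templates
# that assume CDISC/ADaM data, so only a handful of arguments remain.
```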
Good, if you could continue. So, the products themselves: the reason we created them, which will be part of the discussion in the breakout rooms, is that there are certain use cases that are important for the pharmaceutical industry and clinical trial reporting. For example, here is how we designed rtables to distinguish between row types. You see here how we separate the rows of an rtables table into label, content and analysis rows, which is important when you start pagination — when you split the table up and print it across multiple letter-size pages. We want to make sure that if you break, for example, before BACK PAIN so that it goes onto a new page, then you print GASTROINTESTINAL DISORDERS and the total counts again, so we keep the context of the analysis rows by repeating the label and content rows on each page as needed. There are also constraints in the pagination algorithm, for example the minimum number of siblings in the lowest analysis context (see the sketch after this passage). If you say it needs at least two siblings on both sides of a break, then you cannot make a page break after ABDOMINAL DISCOMFORT, because that would mean the three lines above it — the content rows and the label rows — would be there for only that single line and would then be repeated on the next page. But if the minimum number of siblings is two, you could, for example, paginate before BACK PAIN. Good, so that's one of the special cases with tables. Then teal — you have seen a screenshot before — those are Shiny web applications. We provide a framework where, with relatively little effort, you can configure your modules for the particular datasets you have, and then the users of those Shiny apps, or teal apps, can open the app and start meaningful analysis immediately without uploading data. The person who sets up the application can also choose how exploratory it is, or how much we would like to guide the end users. Good, so those are some of the products. I'm now going to talk a little bit about what we set up around those products, which is also very important when you start developing packages. Documentation is probably one of the most important things. We have very extensive documentation, which was really important to get users on board with our tools without using developer time — essentially they can get on board independently. We also provide trainings, but the documentation is very extensive. You see we have different releases — the release is in the upper right corner; releases are tagged with the date when we released, approximately every two months. And at the top you see the different components of our documentation: almost every user guide is a separate web page, the TLG catalog is a separate project, there is the biomarker catalog, the API references are the pkgdown documentation for all the packages that we release to the users, and so on. So there's a lot of documentation behind the overall documentation site. I'm going to show a couple of them. If you continue, Kieran — so here's the general user guide. On the left-hand side it talks about tern, it talks about how to get started, how you can install it or how you can load a pre-installed release of the NEST packages; then lots of information around teal, from very simple use to more advanced use of creating teal modules; some news; and data access notes that are essentially Roche-specific. Good, if you go on.
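A small sketch of the pagination constraint described above, using the rtables pagination API as I understand it (argument names per recent rtables releases; `tbl` would be a built table like the adverse-events example):

```r
# Split a built rtables table across pages; label and content rows are
# repeated at the top of each new page so the analysis rows keep their context.
paginate_table(
  tbl,
  lpp = 20,          # lines per page
  min_siblings = 2   # require at least 2 analysis rows on each side of a break
)
```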
And moving on: in the API references we show the pkgdown websites, where we give much more detail on how to use the functions and the package — everything with vignettes under Articles, and references for every exported function, documented with examples. We provide that to the users rendered nicely; the documentation is also part of the package installation itself, but with pkgdown you get extra advantages — for example, you see the output when the example code is run. Good, if you can go to the next one. The TLG catalog is a product we started before starting Chevron, because the idea was that when we switch the TLG space — the tables, listings and graphs space — from SAS to R, we should first build all of the outputs that have been documented in SAS and provide them as templates in R. But we didn't want to do templates at the beginning; we wanted to create those general layouting instructions for rtables that are relevant to clinical trial data. So the TLG catalog was our way to build output by output, and then build general computational elements to get to those outputs. The TLG catalog shows the code, then the outputs, all the variations, and if there's a teal module it will also tell you which teal module you can use to recreate that standard output. Good. So that's what is seen by the stakeholders — the stakeholders being statistical programmers, biostatisticians, or anyone who would like to use our tools. In terms of how we enable the developers to do their jobs efficiently, I'll give some examples now. So as Tad has mentioned, we do DevOps — which doesn't mean that one team does it all; we have multiple teams — but we follow that infinite cycle of planning, coding, building, testing, releasing, deploying, operating and monitoring, which feeds back into the next plan. So essentially being agile: looking at what the users want, reassessing what we have done, improving it in the next sprint or release, deploying it and looking for feedback again. That has been very good; it adds incremental value to the business, we get feedback, and it leads to better products. Yeah, good, if you can go to the next.
The teams are organized — DevOps is part of the strategy, but the way we actually organize ourselves is using scrum. We have extensive overall backlogs containing cards in GitHub; those are issues that say what should be done, and every issue is an amount of work. Those get refined and prioritized, and then we plan sprints, which are fixed time slices — two weeks, for example — where we say one team works on some of those cards. That means we set a high-level goal for a sprint, we move the cards that need to be addressed to meet that goal into the sprint, we check that the amount of work required for those cards doesn't exceed what the team is capable of delivering in the two-week sprint period, and then we keep the sprint backlog locked during the sprint, meaning no other cards will be added. So that's a very, very quick summary of how scrum works. Then we have the daily stand-ups, and at the end of a sprint a retrospective where we say what went well and what didn't go well in the team, and we also present to the stakeholders and get feedback. We've started doing the stakeholder presentations less often as the product has become more robust and more used throughout the company; we don't do it after every sprint. Good, next slide please. So those cards in GitHub are issues, and issues are an amount of work. Once the work is done, they usually end up as changes in the code base, which means the person who works on that issue makes a pull request and says: this is how I do the work described in the issue. Then we require somebody to review it — what you see on the left-hand side is a "review required" — and we require the automated checks to pass; the automated checks currently run in GitHub Actions. So those are the two requirements: checks need to pass and somebody needs to review it. Here you see some of the communication in the pull request — the automation bot posts comments automatically, and the reviewer and the person who made the pull request then discuss whether it's good or whether changes are needed. Good, thank you Kieran. So that's essentially, at a very, very high level, how we organize the development side. I do have three examples.
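For orientation before the live demo that follows: a rough sketch of the workflow, assuming the open-source staged.dependencies package that the NEST team published; the project layout and branch handling are as described in the demo, but treat the exact calls as illustrative.

```r
# Each internal repo carries a staged_dependencies.yaml naming its
# upstream and downstream repositories.
library(staged.dependencies)

# Build the dependency graph starting from the current project
# (this clones the connected repositories).
dep_table <- dependency_table(project = ".")

# Install all upstream packages in the right (topological) order.
install_deps(dep_table)

# After making local changes, check whether downstream packages still pass.
check_downstream(dep_table)
```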
I'm going to start with one short example — do we still have time? I think we still have time, Kieran? Yes, we do. Good. So I'm going to show one tool that we developed in-house, where we separate the space of packages needed to install NEST into internal and external dependencies: the external ones are what's needed to install our internal dependencies. That separation has been very useful for automation, but also for collaboration, and it reduced the importance of the Docker image — where everything is installed and pre-configured — because now it's very, very easy for people to install our environment themselves, and making a change that modifies, say, three repositories at once becomes possible with staged dependencies. Good, so I'm going to share my RStudio session here — if I'm not... here, good. So what you see here is RStudio; we are in the Chevron package, on the branch main. First I'll show staged.dependencies: each of the internal dependencies has this staged_dependencies.yaml file, which says what the current repository is, where it is hosted, and what the upstream and downstream repos are. Downstream we don't have anything at the moment, because Chevron is the most downstream in that dependency tree — of course there are other packages that are siblings, and there may be more downstream packages. Good. So what you can do with that: we provide an RStudio addin where you can say install all the upstream packages; check and install the downstream packages, so if you make changes you can ask whether they affect the downstream packages; check and install the different versions; and only test downstream dependencies. Those are the main features of staged.dependencies. It has part of its origin in projects like Bazel that do similar things, but it's optimized for R packages — fairly simple and fairly effective, in my opinion. So I'm going to show you very quickly, within a project — instead of running the RStudio addin, I'm going to run the code here. In the first line I say: give me the dependency table for the current project (otherwise I could specify one). What it does now is clone all the packages, and it actually creates a dependency graph that includes all the connected nodes. And if you look at this — maybe I can make it a little bigger, and again a little bit bigger — then you see the whole thing. Yes — so right now, right now you only see... I'm not seeing anything at the moment, I'm just seeing your RStudio screen, it's just not doing anything. Okay, give me a second — I resized it; I think that confused things. Good, sorry, I resized RStudio. So when you look at this you see 19 packages — it built the dependency graph. In those dependency graphs, if there's no link to a node starting from Chevron, you don't see it; so there are 22 packages, but we didn't make links to all of them, so you only see the subset that is connected from Chevron — and that gets into the topic of entry points and so on. You see we're currently in Chevron; those are the upstream packages; those others are not upstream packages but are connected on the dependency graph. So that's some information — it works with GitLab and so on; those GitLab packages you currently can't see. What you can do then is essentially say: I would like to install all the upstream packages for Chevron, in the right order,
which you see here — it gives you the order. It will be quick because I've already installed them before; it skips a package if it's already installed — similar to remotes::install_gitlab() — and it shows you what has been installed. So that works for developers, so that they always get the latest version of the packages. There's a cascading of branch names, meaning you get kind of a monorepo feature where you can make changes to multiple repositories at once simply by a naming convention for branches. What we have also added is that, without cloning the repository, you can install it from somewhere with this part here, with this code: you essentially just point at where in GitLab and what branch, and then it does it automatically, so you don't need to clone the project initially. Okay, so now we have it installed, and now I'm going to show you some examples of the R packages. Chevron loads tern and rtables, so I don't have to load those packages. I take synthetic data, so what you see here is not sensitive patient information — it's all completely made up. Give me a second — oh good, I need to load that package, apologies — uh-oh, let me fix it, it's of course the synthetic CDISC data — good, apologies, it's scda; just a bit of live-demo nerves here. So we have ADSL — it's an archive, we don't create it on the fly. You can look at ADSL: that's all made-up data; we have the variable names, variable labels, quite a few of them. So the first thing is very vanilla rtables: we make a basic table, we split it by arm, we analyze the variable AGE with the function mean, and what you get is a very, very simple table. Then in tern we add business logic, with summaries and so on — so now you can say summarize the variable AGE, and if you build that table — the important piece is: this is rtables, this is tern — you get a table like that. And then in Chevron we become data-standard dependent, but therefore it's much less code, and you get the title and footnotes if those are in the standard documents — no footnotes here, but some titles. Good, I think this is good from my side. Tad, if you want to show — if there's still time we can continue. Yes, of course. Okay, I'm sharing the right screen — can you see my screen? Yeah. Assuming you want to show the teal app? Yes, yes, of course. So this is an example — thank you, Adrian, for showing how we did it and how we are actually working on those different packages, but I wanted to come back to the story of how we can use it. First of all, this is an example of a really ad-hoc module, where I can utilize packages that are actually external to us — like dplyr, plotly, ggplot2, or survival for the analysis. And even here I used an external package where I added encodings: I can switch between the different endpoints. Or I can compare here with our internal Kaplan-Meier module, which keeps the same curve but offers way more in terms of encodings — yes, the endpoints, there is a faceting capability, comparisons and so on. Plus, it wasn't shown up to here, but I was talking about the Show R Code generation: all the teal modules that we have come with those capabilities by default, whereas with those ad-hoc modules you can always add things on top and build your own story. So this is where I come back to the point that we are challenging the status quo of how results are delivered, because the end user can use these things to really create their own view: they can decide which of the, let's say, endpoints the user
can use, what the faceting variables are — here we give two of them, but we could give all of them by default — and, let's say, what the features of this are, and so on. Something that maybe wasn't shown: we can always use the filtering system, which is really intuitive; we can change from one filter to another and so on. So that's one thing, but what I really wanted to show is the amount of code. This, for instance, is the rtables one — this is really more custom, where we just copy and paste our R code and we're just rendering the output — whereas if we use the predefined module, it's really this amount of code, where, as I said, we can decide what the variables are to navigate, really tell the story, and give the end user not all the possibilities but deliberately limited possibilities if we want to. And of course the other example with ggplot2 — you can see a really small example, this amount of code — and similarly for something like plotly, and also dplyr here. But there are other examples I can show you. I'll try to stop this one and I will run — hopefully it works, sorry, I need to — this is another example where I used our suite of different functions, and I was building this, and you can see here it looks like a very complicated thing, but in fact running such an association analysis is really just defining a reference and defining variables. And we can very easily do this: in effectively one row we just define what the dataset is, what the variables are, what the choices are, what the pre-selected value is — and if I come back here, this is all the user needs to do, and then they can really navigate. Of course we can change things here, and what is important, we still have this Show R Code generation. So this is the example going from the ad-hoc one to the one where you can really go through it, have more analysis code, or combine this with plotly and so on. So this is how we can really go and tell our story — not only exploratory but also regulatory. Thank you, and I will stop sharing and come back to the presentation. I can click share — there remain a couple of slides — in a second. Yes. So actually, can you share — I think this is not optimal — can you see? Is that good enough? I'm not sure if we should present — can you not click present? I mean, I can — let me see, does that work? Yes, thank you. So that finishes the demonstration. We are now looking into industry collaboration. There's also collaboration, for example with RStudio around rtables and table rendering, as was announced at R/Pharma 2021, and we're very happy about that. The package there takes any table — a standardized table representation, which can of course come from rtables or other packages — and maps it to different output formats like RTF, HTML, LaTeX and so on. We're very thankful for that, because different pharma companies have different requirements for their rendering specifications. And, as Kieran has mentioned, we are moving towards open sourcing more of our packages in general; at the moment, these are the open-sourced ones, and there's also admiral and other packages from Roche that have been open sourced — the ones here are less study-specific. And maybe, to wrap up the software development part before going to the breakout session: I think there's lots to say, lots of good things, about making tools that are geared towards the users and
the particular use case. There are also — and that's what I'm focusing on here — some important points: when you do software development, you want that software to be maintainable, scalable and extensible, and for that you need a particular set of talents, you need the tools, the infrastructure and so on. It usually takes quite a bit of time to do that — we can talk about it in the breakout room, but it's not a two-month kind of exercise. Generally, if you don't design your packages carefully, the integration becomes more complex, and it usually leads to downstream costs: short-term savings tend to end up as downstream costs once the software is released and users start working with it, because there are many more end users than there are developers. In the pharma context, I also think it's important to say that tool development generally has a slower pace than the projects that deliver the core business, and that needs some discussion with the leadership team and portfolio management. And the last thing, on the shift towards the use of R: I think it's very important not to forget that there are many more users than developers of the tools, and some of those users already deliver a lot — asking them to change technology needs a lot of support, change management and patience. So those are all the kinds of learnings, or watch-outs, when you embark on such a transition. And with this, thank you very much for attending the presentation part. Maybe Kieran and Tad, if we have anything to add? Yeah, the main thing is just about the breakout session. Just a reminder, in case anyone didn't hear this at the start: to move into these breakout sessions we're going to have to move into a separate meeting — this Zoom is going to end — so someone will put the link and the passcode into the chat. Please make sure you copy and paste that and make sure you've grabbed it, because when this Zoom ends you won't have that link anymore. I'll give you all a minute to pick that up, and then we'll end this meeting and start the breakout session meeting. There will be three different breakout sessions on these topics, from myself, Adrian and Tad, and we'll probably just auto-assign people to them. But please do copy that link and get that passcode, because there's basically no way for me to give it to you outside of this meeting — if you don't get it now it will be very difficult, basically impossible, for you to join the breakout session. So again, it's in the chat right now, or you can type it from the screen — make sure you've got it in a notepad or in your browser already, and make sure you've got that passcode, so you're all ready. So yeah, while we wait and to fill the time, there's a question which I think Kieran might be able to answer: just for context, can you talk about what you mean by large scale when you talk about the servers — how many devs do you support, how many users, more or less? So it's an interesting question, because directly in Data Sciences it's probably around a thousand users, but a lot of the platforms end up being used more widely than just our department, so I'm not actually sure of the exact number for the wider user base. I know, for instance, that the validated user base has around 400 users, but the wider exploratory environment has been used by people across the company, so it's probably quite a lot of people. I think it's a transition phase — not everybody uses R yet; if everybody were to use
R, then it's in the thousands — so it's probably a safe bet that three to five thousand people would have to use the system; is that fair to say, with contractors and all the externals and so on? Yeah, yeah — I mean, if everyone was doing nothing but R, then our numbers would probably go up to that. So, a question from Andy: since you started this journey, the whole industry's attitude to R has shifted significantly and technology has moved on — what would you do differently if you started again today? I can take this question. To be honest, with how we started — we started from the exploratory analysis — I wouldn't do anything differently, because over that time we learned a lot from the end users about what they want and what is useful, and on an ongoing basis we built those tools with them; we were getting lots of feedback along the way. And the same thing holds today: we have built lots of different tools around that, but truly we had all the principles of careful design — this is what Adrian was saying — think about it from the beginning. That was our approach, and in the end we got to where we are today. Adrian, anything to add? On the tooling and technology: we used to use Jenkins and we moved away from Jenkins — I'm very happy we did that; I think GitLab CI and GitHub Actions are much better automation tools in terms of transparency about what happens and so on. From an organizational point of view there are also the other questions: how do you plan, how do you design it, what's the Git workflow and so on — I think that has evolved incrementally; we look back and we try to improve, I mean, we are agile. So I think the most important thing is generally what Tad said: don't write a piece of software and make it immediately a regulated GxP product before having tested it — exploratory to regulatory is the right journey, in my opinion as well. Yeah — Kieran, anything?
Yeah, I think it's an interesting question. If we'd had the tooling we have now at the start — particularly when we think about the platform and the setup — maybe we'd have done some things differently, but the exploratory-to-regulatory path I definitely agree was the right way to go, because it got people more comfortable with the tools and it also gave us a chance to build a user base who would be prepared and ready to work in this regulatory context. In general, a lot of this journey has been about us as an organization getting more and more comfortable with using R. So are there parts of our system and architecture which we would change? Yes — and that's kind of what the future part was saying: if we were starting from scratch, we'd ideally aim for something much closer to what we have right now. The other thing, which is less of an R package question and which we didn't really get into in these slides, is technical debt: we have to be mindful of the technical debt we have in our code base. Our user base is largely SAS users; were I starting a company from scratch where all our developers were using R all the time, I would definitely approach some of the ways we're architecting and thinking about things differently, because there are ways you approach problems in SAS that you don't really need to worry about in R, and probably vice versa as well. But that's still a growing thing — R is definitely becoming much more popular at Roche, but there's still a majority of SAS users as well. Do we have any more questions in the chat? Do we still have time, or are we switching to the breakout group? Yeah, it might make sense to switch over to the breakout group. So one last reminder: hopefully everyone who wants to join the breakout group has now got that link, so please do copy the link and make sure to get the passcode — that passcode 170825 — and that URL. And then — do we have to end this meeting first? I think I can... okay, good. Thank you everybody, we'll see you over in the breakout. Great, thank you.