Thank you. Welcome, everyone. I'm going to share my screen and we'll go ahead and get started. Thanks, everyone, for joining. My name is Ben Arancibia, and Yoni Sidi and I are going to be talking about the Software Engineering Working Group and the mmrm package. A couple of things on today's agenda: first, we'll give an introduction and overview of what the Software Engineering Working Group is. We'll talk about mixed models for repeated measures and why fitting them in R has been a problem. We'll do a little deep dive into the mmrm package, including why it is not just another package. We'll talk a little bit about long-term perspectives and compare the mmrm package to SAS, and Yoni is going to take us through a demo. Then we'll close it out and talk about next steps. Before we get started, I just wanted to say a big thank you to our sponsors, like RStudio, PSI, and others. If you signed up for this webinar, you know that these webinars are free; they're accessible on the RStudio and R Consortium webinar pages. So a big shout-out to everyone helping to make this possible.

A quick introduction: my name is Ben Arancibia. As I mentioned, I'm a Director of Data Science at GSK, and I sit within our statistical data sciences innovation hub. Yoni, if you want to introduce yourself. Hello, my name is Yoni Sidi. I'm a Director of Modeling and Simulation at Sage Therapeutics, inside the data science lab. Great. So let's talk about software engineering and biostatistics. As I'm sure people are aware, open source software has gained increasing popularity within biostatistics over the past two decades or so, and that has brought some really interesting pros, but also some real cons with this new way of working.
On the pros side, we see rapid uptake of a lot of new statistical methods, and a lot of new opportunities for collaboration and innovation. One of the major cons, though, is the huge variability in software quality. For anyone working in the open source world, that is the big thing we always have to think about when introducing new software or new packages into our environments. Reliability is key, and efficiency and maintainability look very different in this new world we're entering. Developing high-quality software with good coding practices, reproducible outputs, and self-sufficient documentation is critical going forward in how we inform clinical and regulatory agencies.

So how do we deal with these issues? To address quality assurance for our packages and create high-quality statistical software, a group came together recently and created the Software Engineering Working Group (SWE WG), to look at our packages from a statistical point of view. To give you a little background, it's an official working group of the ASA Biopharmaceutical Section. It was formed in August of 2022, and it's a cross-industry collaboration with more than 30 members from over 20 organizations. I've included a link to our homepage, which is where we store information about members and the packages we're working on, as well as things like webinars and documentation on how we think about software engineering, especially as it relates to statistical packages. So what are the goals of the group? Our primary goal is to collaborate to engineer R packages that implement important statistical methods, to fill in those critical gaps.
Our second goal is to develop and disseminate best practices for engineering high-quality open source statistical software. That secondary goal is something we're beginning to look at in depth more frequently, because there's a real need in the industry to think about these big questions: how do we engineer these packages as we adopt new open source ways of working?

Some activities we've undertaken recently: the first R package, mmrm, was published on CRAN in October of 2022 and updated in December. The goal of the group is to establish this package as a new standard for fitting mixed models for repeated measures (MMRM). We have been developing and adopting best practices for software in the mmrm package, and it's been open sourced. The package itself can be found in the openpharma GitHub repository, where you can see how we're going about development, as well as any ongoing discussions related to the package. It's currently under active development to add more features, so if you have ideas on what should be added, or topics you'd like to discuss about the package itself, that's a great place to start.

So why do we need a package for MMRM? Why was that the first package we looked at? Mixed models for repeated measures (MMRM) are a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials. The problem is that there was no great R package for MMRM. Initially we thought the MMRM problem could be solved with a combination of lme4 and lmerTest, but we learned that this approach failed on large data sets: it was slow and did not converge. Then we looked at nlme, and found that it does not give Satterthwaite-adjusted degrees of freedom, it has convergence issues, and with emmeans the results are only approximate.
From there we tried to extend glmmTMB to calculate Satterthwaite-adjusted degrees of freedom. So before creating a new package, we looked at glmmTMB: we tried to improve on that existing package and extend it, but unfortunately it did not work, and we had to think about long-term maintenance and responsibility for a package. So we came up with an idea. Because glmmTMB always uses a random-effects representation, we cannot have a truly unstructured model; we only want fixed-effects models with a structured covariance matrix for each subject. The idea, then, is to use the Template Model Builder (TMB) directly, which also underlies glmmTMB, to code exactly the model we want. We do this by implementing the log-likelihood function in C++ using the libraries that TMB provides.

What are some of the advantages of TMB? It's a fast C++ framework for defining objective functions (Rcpp could have been an alternative interface). It has automatic differentiation of the log-likelihood as a function of the variance parameters, so we get the gradient and the Hessian exactly and without additional coding. And this can be used from the R side with the TMB interface and plugged into optimizers.

So why is it not just another package? There's a lot of ongoing maintenance and support from the pharmaceutical industry, it's supported by the American Statistical Association, and the package is part of that mission. But to emphasize: our goal is to push out and define practices for engineering high-quality open source statistical software, with real thought behind them, so that we can establish an industry-wide viewpoint on how to build out open source statistical software.
From here I'm going to hand it over to Yoni, who will take us through a comparison of SAS and R for MMRM modeling. Thank you.

Thank you. So we're going to go through a high-level comparison of SAS and R. To run an MMRM model in SAS, it's recommended to use PROC MIXED or PROC GLM; PROC GLIMMIX can also do this. Fewer model assumptions are applied in PROC MIXED than in PROC GLM, primarily in how one treats missing observations. We will compare PROC MIXED to the mmrm package under the following characteristics and attributes: the documentation, the covariance structures, unit testing, degrees of freedom methods, the estimation methods themselves, and how you can compute contrasts.

With regard to documentation: we'll be sharing these slides afterwards, and each of these bullet points is actually a link. We constructed a side-by-side comparison of the documentation the mmrm package has compared to PROC MIXED. For those familiar with SAS, there's the standard homepage of the PROC MIXED documentation, which goes through the usage, the theory of how the models are estimated, the different covariance structures, and the degrees of freedom options. The mmrm package has a similar layout to its documentation: there are vignettes for basic usage, a large vignette detailing the estimation methods being used, the covariance structures are listed in the documentation, and finally there are separate vignettes for both the Kenward-Roger and the Satterthwaite degrees of freedom.

One major advantage of mmrm is that the unit testing is transparent.
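As a concrete illustration of the basic usage just mentioned, a minimal fit might look like the following. This is a sketch based on the package's documentation, using the fev_data example data set that ships with mmrm; treat the formula and variable names as illustrative rather than as the vignette verbatim.

```r
library(mmrm)

# Fit an MMRM with an unstructured covariance by visit within subject.
# us(AVISIT | USUBJID) declares the covariance structure; fev_data is
# the example data set shipped with the mmrm package.
fit <- mmrm(
  formula = FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data
)

# Standard methods work as usual.
summary(fit)
```

By default the model is estimated by REML with Satterthwaite degrees of freedom; both choices can be changed through arguments to mmrm().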
In the GitHub repository itself there's the testing folder for the package, and for every change and commit to the package we run through all of these unit tests, with the results reported transparently in the repository. We use the testthat framework with the covr package to communicate the coverage of the testing. There's a link at the end here that takes you directly to the folder for the unit tests. We do note that the integration tests of mmrm against SAS PROC MIXED are set to a tolerance of 10^-3. There are obviously differences between the SAS estimates and the mmrm estimates, but we feel that level of tolerance is acceptable at this time.

Comparing the estimation methods: both can be used for ML and REML, so those two methods are comparable between the two languages.

Covariance structures: SAS has a lot of different covariance structures. It has 23 non-spatial covariance structures; mmrm has 10, nine of which intersect with SAS. The one in mmrm that is not among the SAS options is the homogeneous ante-dependence structure. SAS has 14 spatial covariance structures, compared to a single spatial covariance structure in mmrm. We do, though, accept feature requests as issues on the GitHub repository; we already have one open at this time, and we're adding a feature right now. So if you feel that the 10 structures we have now are not enough, feel free to open a feature request or a pull request and it will be accommodated.

In detail, which covariance structures are available? This is a comparison table for the ones available in mmrm: unstructured, Toeplitz, compound symmetry, autoregressive, ante-dependence, and spatial exponential. In nearly all cases there are homogeneous and heterogeneous versions of the covariance structures.
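To make the comparison table concrete: in mmrm the covariance structure is selected purely by the term used in the model formula. A minimal sketch, assuming the structure abbreviations documented by the package (us, toep/toeph, cs/csh, ar1/ar1h, ad/adh, sp_exp):

```r
library(mmrm)

# Heterogeneous Toeplitz covariance:
fit_toeph <- mmrm(
  FEV1 ~ ARMCD * AVISIT + toeph(AVISIT | USUBJID),
  data = fev_data
)

# Homogeneous ante-dependence -- the structure available in mmrm
# but not among the PROC MIXED options:
fit_ad <- mmrm(
  FEV1 ~ ARMCD * AVISIT + ad(AVISIT | USUBJID),
  data = fev_data
)

# Inspect the estimated per-subject covariance matrix.
VarCorr(fit_ad)
```

Swapping one structure for another changes only the formula term, which makes it straightforward to compare fits across structures.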
The unstructured covariance can be unweighted or weighted, similar to PROC MIXED. And again, there's the homogeneous ante-dependence covariance matrix, which is available in mmrm but not in PROC MIXED at this time.

Degrees of freedom methods: there's overlap between PROC MIXED and mmrm here. There's Satterthwaite, Kenward-Roger, and Kenward-Roger Linear. Just a note that the Linear method is not equivalent to the KR2 setting in PROC MIXED; the documentation there is not that straightforward, but when we did the testing we saw that it's not the same setting. To get containment, between-within, and residual degrees of freedom, you do have to go through emmeans, which does support them; they are not natively supported within the mmrm package at this time.

Finally, contrasts and LS means. There are native built-in contrasts within the mmrm package: you can use the functions df_1d and df_md. There are also S3 methods that are compatible with emmeans, so any settings you have in the model fit will be inherited through emmeans. That means if you choose, say, Kenward-Roger degrees of freedom, emmeans will understand and inherit that degrees of freedom method and give you LS means corrected for Kenward-Roger. You can get LS means differences through the pairs method of emmeans. By default, PROC MIXED and mmrm do not adjust for multiplicity, whereas emmeans does. That's a little nuance to look out for: to get the same results as PROC MIXED when using emmeans, you need to set the multiplicity adjustment to none.

Great, thanks. So, to go back to our working group: what is the Software Engineering Working Group's long-term perspective?
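A hedged sketch of the contrast workflow just described, assuming the df_1d helper and the emmeans support behave as documented; the contrast vector below is purely illustrative (its length must match the number of model coefficients):

```r
library(mmrm)
library(emmeans)

# Fit with Kenward-Roger degrees of freedom; emmeans will inherit this.
fit <- mmrm(
  FEV1 ~ ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data,
  method = "Kenward-Roger"
)

# One-dimensional contrast via the package's native helper:
# here, testing the second coefficient against zero.
ctr <- numeric(length(coef(fit)))
ctr[2] <- 1
df_1d(fit, ctr)

# LS means and pairwise differences. Set adjust = "none" to match
# PROC MIXED, which does not adjust for multiplicity by default.
emm <- emmeans(fit, ~ ARMCD | AVISIT)
pairs(emm, adjust = "none")
```

Because the degrees of freedom method is carried on the fit object, the emmeans results here use Kenward-Roger without any extra arguments on the emmeans side.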
From a long-term perspective, we view software engineering as a critical competence in producing high-quality statistical software, and a lot of work needs to be done on the establishment, dissemination, and adoption of best practices for engineering open source software. For our industry specifically, this is a new transition that a lot of us are moving into, so we're trying to figure out the best way to do it. That is our goal as a group: to think through what best practice actually means. And beyond improving the way software engineering is done, our hope is that this improves efficiency, reliability, and innovation within biostatistics organizations across the industry.

In terms of what's next for mmrm and the Software Engineering Working Group: as I mentioned earlier in the presentation, one thing we've started to work on is preparing public training materials to disseminate best practices for software engineering in the biostatistics community. At the beginning of February, a face-to-face workshop will take place in Basel, Switzerland, with a focus on open source software for clinical trials. We're organizing conference sessions with a focus on statistical software engineering at JSM and ASA/FDA workshops, and putting together a video series on best practices for software engineering; a link to that video series is included. Finally, the working group is beginning to look at and work on some new packages, such as sasr, HTA, and a Bayesian MMRM. If you have an interest in working on these topics, please come work with us; we're an open group. Information can be found at the ASA Biopharmaceutical Software Engineering Working Group homepage, at this link. So that is really what we wanted to cover today.
If you have any questions, please post them in the comments section. While we wait for comments to come in, I'll quickly show the existing GitHub repository for the mmrm package. This is where we do a lot of our work on mmrm. We have an issues log where we have conversations about what needs to happen in terms of enhancements, bugs we might find, and documentation we need. Even if you don't feel comfortable developing features, there are always needs in terms of documentation; if you'd like to contribute, that's a great first issue. Also, to show the vignettes that Yoni mentioned: on the mmrm package website, under Articles, we have a lot of material on the details of how we do model fitting, covariance structures, an introduction to mmrm, and some other details about the package. Within our vignettes you can see how to use the package, what outputs to expect, some control functions, and other things like that. One additional vignette in production right now that will be useful is a direct comparison of PROC statements with mmrm statements, and also with other R packages that can run MMRM models, such as gls and lme4. That is being worked on right now, but it is in the plan for the future, and it will be a good resource for new users of the package. That's a great call-out, and as mentioned, since it's all open and transparent, if you want to be part of the conversation, here's the issue to contribute to.

Great. I haven't seen any questions come in about the working group or the package, so we'll give it another couple of minutes just in case. Cool, so we have one: do we have any plans, or are we already working on, comparing performance between mmrm and PROC MIXED under different scenarios, like covariance structures, correlations between repeated measurements, et cetera, through simulation?
Yes. As stated before, in the unit testing we run direct comparisons to SAS PROC MIXED on data sets and simulations that have to agree to within a 10^-3 tolerance, across all of the things mentioned in the question. Great, and as mentioned during the presentation, all of our tests are in the GitHub repository, which is what I'm sharing here on the screen. Any other questions we can help answer? All right, if there are no additional questions, we'll keep this webinar relatively short. Thank you, everyone, for joining; we appreciate it, and we appreciate you bearing with us as we dealt with some technical issues. If you have an interest in joining the group, I will put a link to the primary website in the chat so that everyone has access. Please join us. I think it's really exciting to start thinking through ways to establish, for the industry, an approach to building out statistical software packages. I'll post the website here in the chat if you have any interest in joining. Thank you so much, and we'll end the broadcast. Thanks, everyone. Great, thank you.