It's very nice to have you here. I hope you're well seated with a cup of coffee or tea; that's the big advantage of online events, so let's all enjoy it. It's my pleasure to host today's event from the R Adoption Series. My name is Coline Zeballos. I'm an R Strategy Lead at Roche Informatics and also an active member of the R Consortium. To give you a little background before I continue about this event: in April I gave a talk at the DIA FDA conference about submissions in R, and afterwards my colleague from Roche, Ning, suggested that we also bring similar content to you here at the R Adoption Series. So here we are today, and we're very lucky to have three FDA collaborators with us, which is very exciting.

Before we start, a couple of quick slides of formality. We want to thank the sponsors who make an event like this possible: the R Consortium, PHUSE, PSI, and today the BBSW. For those joining the R Adoption Series for the first time, it's aimed at anybody who has initiatives in the R world, so feel free to go to the website and contact Andy Nicholls, who leads the organization of these events. The next event will be in September; we don't know exactly when, but check the website for updates. Last thing for this slide: you can find recordings of previous webinars on the webpage, so follow the link here. We will share the slides after the conference.

Today's session starts with a short opening from me. Then Ryan and Hesu from the FDA will each present for about 20 minutes, and we'll have a panel discussion for the rest of the session. Feel free to post your questions in the chat; Ning will be moderating the panel discussion and collecting your questions so that we can ask them live and get responses from our FDA speakers. Also important to note: we have a third FDA collaborator, Paul, who is not speaking but who will be part of the panel discussion.

Without further ado, I'd like to ask Ryan to come forward. Ryan is a senior statistical reviewer at the FDA with experience in phase 3 and 4 safety- and efficacy-focused reviews, and he holds a PhD in biostatistics from Yale University. Ryan, welcome, and over to you.

Thank you, Coline. Can you hear me? Yes, okay. Hello everyone, my name is Tae (Ryan) Jung, and I'm a senior statistical reviewer in the Office of Biostatistics, CDER, FDA. Thank you, Coline and Ning, for your kind introduction and for inviting me to this exciting event. In this presentation I would like to share my recent review experience from the regulatory approval of a real-world evidence supplement, a submission in which R was used for the entire package, covering both the real-world data management and the data analysis.

A disclaimer: this presentation reflects my own views and should not be construed to represent FDA views or policy, and there is no relevant financial relationship for this presentation. And this is the statistical software clarifying statement: FDA is software agnostic and does not encourage nor require use of a specific software package; however, the software packages used for statistical analysis should be fully documented in the submission, including version and build identification.

Here's the outline of today's presentation. Due to the limited time, I'll briefly walk you through an overview of the Prograf real-world evidence supplement and the submission.
For the study design and analysis in detail, I will refer you to the sponsor's study published last June. Then I'll go through the pre- and post-submission discussions between FDA and the sponsor regarding the use of R for this regulatory submission. At the end, I will show how R Markdown supported the regulatory review and wrap up with the lessons learned from a regulatory perspective.

As you may know, in July 2021 a new indication was approved for tacrolimus, under the brand name Prograf. This was CDER's first acceptance of an observational, non-interventional study as an adequate and well-controlled study providing the primary support for a finding of substantial evidence of effectiveness. It showed that real-world data is no longer limited to a supportive role but is able to serve as primary evidence that meets FDA regulatory standards.

Here's the timeline of the Prograf approval. The 21st Century Cures Act gave birth to the 2018 real-world evidence framework, and the sponsor, Astellas, started planning to obtain approval for a new indication for Prograf. According to the sponsor, they got governance project approval in April 2019 with no data and no protocol. After their search for real-world data sources and drafting of protocols, the sponsor had the first Type C meeting with FDA to ask about the adequacy of the real-world data selection, the proposed outcomes and exposures, and the statistical analysis plan. After our comments, they went back to adjust their plans and came back to a second Type C meeting to discuss the dose recommendation, exposures, and data source. At the Type B pre-NDA meeting, we discussed a new data science practice for the submission, because there were several issues that we don't expect in RCT submissions. Surprisingly, it took only 20 months to prepare a submission package for the sNDA, starting with no data and no protocol, whereas conventional large phase 3 trials take many years and a huge amount of money from conception to completion. This sNDA was granted priority review with a six-month clock, since there was no FDA-approved immunosuppressant drug product for lung transplant recipients. In the later slides, I will focus on the communication between FDA and the sponsor about using R.

Now let's start with the clinical and regulatory background. Tacrolimus is indicated for the prophylaxis of organ rejection in adult and pediatric patients receiving allogeneic lung transplantation, in combination with other immunosuppressants. Under the brand name Prograf, tacrolimus was originally approved by FDA in 1994 for the prophylaxis of rejection in liver transplant recipients, followed by kidney transplant in 1997 and heart transplant in 2006, based on RCT evidence. Although an RCT for lung transplantation had not been submitted to FDA, this drug has been widely used as the mainstay of the immunosuppressive regimen in most transplant recipients, not only for the approved indications but also off label. So the sponsor designed a non-interventional study to analyze treatment and outcomes for patients who received a lung transplantation, compared to a historical control in the absence of combination immunosuppressant therapy. The primary endpoint was a composite endpoint of graft failure or death within one year post-transplant. The study used the Scientific Registry of Transplant Recipients data, hereafter referred to as SRTR data, on all lung transplantations in the United States between 1999 and 2017.
The study population of interest included adult and pediatric patients on tacrolimus immediate release in combination with mycophenolate mofetil as the therapy.

Now let's dive into the SRTR real-world data. SRTR is a national transplant registry, made available under a data use agreement to external researchers, so it's a public data source, and it includes outcomes for all transplant recipients in the United States since 1987. For the Prograf real-world evidence application, the sponsor submitted the relevant SRTR standard analysis files (SAF); that's the raw data from which the final dataset was built, which I'll explain more in the later slides. The SAF was restricted to 1999 through 2017, a cutoff based on the increasing use of tacrolimus since 1999. Here's the visual flow of transplant data, starting from local hospitals to the SRTR registry, supplemented by CMS and the Social Security Death Master File. The Centers for Medicare & Medicaid Services (CMS) supplements the end-stage renal disease information, so that part doesn't really apply here in the lung transplantation case. For the detailed data collection process and data management, I refer you to the SRTR public website.

This is a snapshot of the public data dictionary of the SRTR standard analysis files. The SAF is the real-world data from SRTR, for which the raw data is provided in a transport format. The final dataset of interest submitted to FDA was generated from the SAF, and it included about 20,000 subjects with 453 variables in one final single dataset. So you might assume the sponsor didn't submit the dataset in CDISC format, which is the FDA standard, and your assumption would be correct.

Here's a rough comparison between a conventional RCT NDA or BLA submission and the Prograf submission. As usual, both have common documents such as study reports and define documents, and the FDA electronic submission gateway was used to submit data regardless of the type of submission. Unlike a typical NDA submission, the Prograf real-world evidence submission had several challenges because of the complex characteristics of the non-interventional real-world data. Because it is real-world data, the variable name lengths were longer than the FDA standard allows, so SAS transport (XPORT) version 8 was requested by the sponsor, whereas the FDA standard is version 5. And this is another long story that I cannot address here due to time, but the irregular format of real-world data was not compatible with the FDA standard. As I mentioned earlier, to my knowledge and for the first time, the sponsor used R programs for the entire submission, whereas SAS is the prevalent language in FDA submissions.

Now let's look at the communication between the sponsor and FDA before and after submission. Before the submission, the sponsor and FDA had multiple rounds of communication. The first discussion was opened at the Type C meeting, where the sponsor asked whether a hybrid of SAS and R is acceptable, and we said it's okay. In the Type B meeting in August 2020, the sponsor asked whether using R Markdown is okay and whether programs submitted in RMD and HTML formats would be acceptable for review. We agreed with the program submission using R Markdown but asked the sponsor to render the final output into both HTML and PDF formats to give easy access to other review teams. We also asked them to specify alternative packages if applicable. And importantly, we asked the sponsor to submit sample data of 200 subjects with the relevant programs and documents.
And this was not only because of using R, but also because of several submission issues with using SAS transport version 8. The sponsor submitted the sample data and code with the relevant documents in November 2020, and we assessed those sample materials and found no issues at the time.

During the NDA review, we sent out a couple of information requests because we experienced some difficulties. First, installing the companion R package: some compatibility issues occurred with this package, and the sponsor had not disclosed a GitHub repository allowing for manual download. So we asked the sponsor to submit the GitHub address or clarify the required R version, and to provide a reference manual for the package. Another information request was sent in April because we experienced difficulties with the sponsor's suggested LaTeX editor in the analysis stage. We do our own independent programming first, and I didn't notice this until I tried replication using their code: I found the analysis part was using a LaTeX editor that didn't work well on our computers. This LaTeX editor was used to produce better fonts for the outputs, so it was not strictly necessary, but it was used for the whole analysis part, and I couldn't move on. After I spent a couple of days trying to solve this with the tech team, it was not solvable for unknown reasons, so I asked the sponsor to submit the whole set of RMD programs in a LaTeX-free environment.

Now I would like to share the real practice of real-world data management using R from the Prograf submission. First, SRTR provides the raw SAF data files in SAS format only; that's their rule. The sponsor converted the SAS files to CSV files and then loaded them into R to create the dataset of interest. As you may know, there is a direct way of converting SAS data to R data, but the sponsor worked this way in order to check the quality between the transformation steps, because the data had longer variable names and used transport version 8, and we were concerned about any truncation during the transformation. Their quality checks included assessing any truncation of data and comparing simple statistics, for example checking whether the mean and the standard deviation are exactly the same before and after transformation. It's not a special tool, but it was useful to confirm that at least they were maintaining the same data. The majority of the programming was performed in R Markdown, which provided the R code as well as the comments and intermediate results in a unified framework. As a statistical reviewer with zero knowledge of the SRTR database, I think this was a great practice because it provided a transparent data management procedure and executed the code efficiently. Formatted SAS variables were converted into corresponding CSV variables composed of both unformatted and formatted values, and the sponsor imported the CSV files into R to create the dataset of interest. So that's the procedure of hopping from XPT to CSV and then finalizing with R data.

The sponsor's analysis data support all protocol- and statistical-analysis-plan-specified objectives. The final analysis dataset includes one record per patient containing all variables used in the tables and figures, and it includes efficacy, safety, and baseline characteristic variables, but not PK/PD variables.
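To make the conversion-and-check workflow described above concrete, here is a minimal sketch in R, assuming the haven and readr packages and hypothetical file names (the sponsor's actual programs were R Markdown documents, so this is only an illustration, not their code):

    library(haven)   # read SAS transport (.xpt) and .sas7bdat files
    library(readr)   # read and write CSV files

    # Step 1: import the raw SRTR-style SAS data (file name is hypothetical)
    saf_sas <- read_xpt("tx_lu.xpt")

    # Step 2: export to CSV, then re-import into R, as the sponsor did
    write_csv(saf_sas, "tx_lu.csv")
    saf_csv <- read_csv("tx_lu.csv", show_col_types = FALSE)

    # Quality checks described in the talk: no truncation of names or records,
    # and identical simple statistics before and after the transformation
    stopifnot(identical(names(saf_sas), names(saf_csv)))
    stopifnot(nrow(saf_sas) == nrow(saf_csv))
    num_vars <- names(saf_sas)[vapply(saf_sas, is.numeric, logical(1))]
    for (v in num_vars) {
      stopifnot(isTRUE(all.equal(mean(saf_sas[[v]], na.rm = TRUE),
                                 mean(saf_csv[[v]], na.rm = TRUE))))
      stopifnot(isTRUE(all.equal(sd(saf_sas[[v]], na.rm = TRUE),
                                 sd(saf_csv[[v]], na.rm = TRUE))))
    }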
As the analysis data were not in an FDA standard format, they of course clarified that they did not do a conformance check, and although they did some quality checks when transforming CSV to R files, as I said, that was just checking whether the numbers matched using simple metrics. Like an RCT NDA or BLA submission, the applicant provided an ADRG, the analysis data reviewer's guide, and a looks-like, feels-like define document. These documents were very helpful for understanding the data structure and the data management process.

Here's how the analysis dataset was created. Starting from the SRTR SAF, which comes as SAS transport version 8, the imported data was stored in a SAS library with the .sas7bdat extension. Next, the sponsor exported to CSV files. I think this step might be unnecessary, because the data can be transformed directly from SAS XPT to R data using available packages; however, it appears the sponsor wanted to make a safer choice during the data transformation. Then you can see the CSV data are fed into the relevant R Markdown programs shown in the tables on the right. At each interim stage you can see the flow of the output data, which becomes the input for the next stage, and so on. It's like stacking a cake step by step to subset the SAF data and build the study-specific dataset. The review team considered this an excellent data processing practice, because this streamlined framework provides super clear and easy-to-follow management procedures. Even though we had zero knowledge of this registry data, this process really helped us understand the complex data structure and the data management process of the real-world data.

The R Markdown output in HTML format looked good, like this, and was helpful during the review. Here are some snapshots of the data management and analysis output. You can see there is a navigation panel to go over the different sections, which was quite convenient when we had to communicate with clinicians; we didn't need to go back through multiple documents to show the analysis results. I think this worked well for communication purposes. Here's another page of the output.

Through this Prograf application, we learned the importance of transparency for real-world data. This is because real-world data sources are disparate and non-uniform, and thereby difficult for us to understand comprehensively at the beginning. For this submission, the real-world data structure was really complex, and we had no prior knowledge of or experience with this registry data. Although reviewers usually don't start a review from data management, I had to do that this time, so R Markdown played a very important role, especially in the data management, because it provided a transparent and streamlined process for review. The HTML output allowed easy navigation through the results and supported interdisciplinary collaboration.

We also experienced a lot of challenges when using R Markdown for the review. So my recommendations would be: provide all the necessary documents to aid the review process at one time; submit all relevant analytic packages; and avoid complex features like unnecessary add-ons, because those features may not be compatible with the FDA computer environment. The goal should be making the code analyzable and executable, and that's totally fine for us as reviewers. As I described, simple coding practices are just fine.
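As a small illustration of the kind of navigable HTML report described here, plus the PDF copy FDA asked for, an R Markdown YAML header along the following lines would produce both formats (a minimal sketch with illustrative names, not the sponsor's actual configuration):

    ---
    title: "SAF data management - stage 1"
    output:
      html_document:
        toc: true          # section navigation panel in the HTML output
        toc_float: true
      pdf_document:
        toc: true
    ---

    # From the R console, render every declared output format at once:
    # rmarkdown::render("stage1.Rmd", output_format = "all")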
And my last message: if you plan to use any R programming for your submission, discuss it with FDA prior to submission, and sample data and code will be very helpful for the reviewers, to prepare us to switch from the SAS world to the R world. I'm also aware of the ongoing efforts on R regulatory submissions, and our next speaker, Hesu, will present on this exciting topic. So I think this is all I have today, and I'll be happy to take any questions from the audience. Thank you very much.

Thank you, Ryan, for this talk and for sharing this detailed work on a specific real-world data use case. We rarely get this internal view, so from the industry side, really, thank you for that, and see you in the panel discussion. And everyone in the audience, before I introduce our next speaker, don't forget to progressively send your questions over in the chat on the right; it's under the stage tab, in Q&A. Our next speaker is Hesu. She's a statistical analyst in the Office of Biostatistics at the FDA. She has a bachelor's degree in statistics from UC Berkeley and a master's degree in biostatistics from Northwestern University. Welcome Hesu, the floor is yours.

Can you hear me? Okay. Hello everyone. Thanks for introducing me and thanks for having me today. My name is Hesu Cho. I'm a statistical analyst at CDER, FDA, and I reviewed the R Consortium R-based Pilot 1 submission. So I'd like to talk about open-source software and share my review experience today. Also, I'd like to thank Paul Schuette for making his slides available for me to use.

Here is the standard disclaimer: this presentation reflects the views of the author and should not be construed to represent the FDA's views or policies.

Ryan showed this statement earlier, but since it is important, let me start by reading the 2015 statistical software clarifying statement again. FDA does not require use of any specific software for statistical analysis, and statistical software is not explicitly discussed in Title 21 of the Code of Federal Regulations. However, the software packages used for statistical analysis should be fully documented in the submission, including version and build identification. So for FDA submissions, a sponsor can use any valid software for statistical analysis, but should clearly specify the version of the software as well as all the packages and libraries that go into it. As noted in the FDA guidance E9, Statistical Principles for Clinical Trials, the computer software used for data management and statistical analysis should be reliable, and documentation of appropriate software testing procedures should be available. Sponsors are encouraged to consult with FDA review teams, and especially with FDA statisticians, regarding the choice and suitability of statistical software packages, and to do so at early stages of the product development process. So if a sponsor has any concerns or questions regarding using open-source software, FDA highly recommends that the sponsor consult with the FDA review team early on.

Then, what are proprietary and open-source software? There are clear guidance documents stating that there is no need to submit in any particular software. However, FDA submissions have been mainly based on the SAS language, which is proprietary software. Proprietary software is non-free or closed-source software, and it can be expensive; its source code can only be examined and modified by the original owner of the software exclusively. SAS and Stata are examples of proprietary software.
On the other hand, open-source software is publicly accessible. It is distributed with its source code, making it available to anyone with the relevant knowledge for inspection, modification, and enhancement. Examples are R and Python, and R is what I will focus on in this presentation.

Then, what are the reasons to use open-source software, and what are the challenges, especially with R? There are many benefits and challenges, but I just highlighted some of them here. First, anyone can install and use R for free. Think about it: if you have to pay a couple of thousand dollars for a one-year license for proprietary software, or pay extra license fees to add more functionality, you might just want to choose open-source software because it's free. So the cost of software is one reason to use open-source software. Second, innovation. Adaptability to new trends is much faster with open source. For example, R packages can be developed by anyone in the R community, and if you don't see a solution, you can create and distribute your own R package. Anyone can contribute to optimizing R packages and resolving errors if there are any. That's why well-known machine learning algorithms are mostly available in R, and when we think about how to handle unstructured EHR data, which is a hot topic these days, R gives more flexible and innovative methods to deal with this new trend. Increasingly, interactive data visualizations and dashboards created with R Shiny apps are getting common as well. So innovation is another reason people use open-source software. Third, training and familiarity. With this trend and the shift from SAS to R, an increasing number of schools are teaching R over SAS, so recent graduates are more likely to be familiar with R than SAS, and the pharmaceutical industry is also investing heavily in R. So these are the reasons to use open-source software.

However, there are some challenges with open source. First, perception of quality. Some people consider open source more secure and stable than proprietary software, because anyone can spot and rectify errors that might have been missed by the original developers or publishers. However, other people think differently: they see security threats or potential vulnerabilities in open source, because there is a lack of controls, and they have some mistrust of the open-source community. Second, validation. The reason FDA submissions are mainly based on SAS is that SAS is validated software, whereas open-source software is not always perceived to be validated. Third, support. What would you do if you have a question about R? You may go to Stack Overflow and find a solution, as there is a huge R community out there with discussions and questions. However, open source does not have any dedicated technical support, so when you face a technical challenge that you cannot get answered on Stack Overflow, the lack of formal support could be frustrating. And lastly, dealing with legacy code. Many companies have legacy code, reused over and over, written in proprietary software, so the ability to switch to different software might be somewhat limited, especially in big pharma companies.

Now, let's talk about my review experience with R Pilot 1. The R Consortium group submitted an R-based test submission package to FDA in November 2021, last year.
The objective was to test the concept that an R-language-based submission package can meet the needs and expectations of FDA reviewers, including assessing code review and analysis reproducibility. The R Consortium group wanted to showcase the feasibility of submitting R code, so it was based on a small simulated clinical trial dataset with very simple analyses. Evaluating FDA acceptance of system and software validation evidence was not in the scope of this pilot. All datasets, code, and documents are publicly available using the link here.

The main components of the submission were ADaM datasets, a PDF report with four analysis outputs, an analysis data reviewer's guide (ADRG), analysis output programs as R files, and a sponsor-developed R package in a TXT file. Just to note: ideally the SDTM datasets, the ADaM generation programs, and the ADaM datasets are all commonly submitted to FDA, but in this pilot we just received the ADaM datasets, and that seemed sufficient for this submission. We may look into running an additional pilot in the future that provides a more comprehensive package.

In the ADRG, the analysis data reviewer's guide, the sponsor clearly specified the R version (they used R version 4.1.0) and all package names, each version, and a description. We also asked them to provide detailed instructions to execute the analysis programs in R, so they explained how to install the open-source R packages and the sponsor-developed R package; those details are in the ADRG document.

These are the four analysis outputs. The first table was a summary of demographic and baseline characteristics; they also provided a Kaplan-Meier plot; and the other two tables were change from baseline at certain weeks using an ANCOVA model. Different open-source packages were used when generating these four outputs, to test a wider range of use-case scenarios.

The R Consortium used R version 4.1.0, but using R version 4.1.1 FDA was able to run the submitted code and confirm the submitted tables and figures. And using FDA-developed code, FDA was able to independently generate the tables using the submitted data. So there were no major issues with accessing the code or with reproducibility, but there were some minor issues. The first was rounding. When you compute a 95% confidence interval in R, there are many different packages and approaches you can choose, and R also gives you the flexibility to choose either the approximate value of 1.96 or the precise quantile. This inconsistency in the way the 95% confidence interval was calculated caused a minor discrepancy (there is a small illustration after this passage). Another thing we pointed out was that each table and figure should stand alone. This is not particularly applicable to this submission, but for other submissions: some important information, such as the specification of the ANCOVA model, was not given in the table, and each table and figure should be intelligible without referencing the text or the code. So these were the things that we followed up on with the R Consortium group, and then we completed Pilot 1.

We might find more potential issues in the future, but one thing we discussed was package dependency. When the analysis gets complicated or uses lots of different packages, package dependency could matter, so maybe submitting a package dependency chart might be useful in the review process.
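Here is the small base-R illustration of the rounding point mentioned above, with made-up numbers; each choice of quantile is defensible, but mixing them across independently written programs produces last-digit differences in the reported confidence limits:

    # Hypothetical sample: change from baseline in one arm
    x  <- c(-2.1, -0.4, -3.3, -1.8, -0.9, -2.6, -1.2, -2.9)
    m  <- mean(x)
    se <- sd(x) / sqrt(length(x))

    m + c(-1, 1) * 1.96 * se                            # approximate z value
    m + c(-1, 1) * qnorm(0.975) * se                    # precise normal quantile
    m + c(-1, 1) * qt(0.975, df = length(x) - 1) * se   # t quantile, as a linear model would use

    # The three intervals differ slightly, which is enough to cause
    # minor discrepancies between independently programmed tables.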
As a result, we received the electronic submission package in eCTD format, we constructed and installed the submitted sponsor-developed R package, and we installed and loaded the open-source packages used in the submission. We reproduced the analysis results independently and shared potential improvements for submission deliverables and processes via written communication. In conclusion, we were able to complete the first publicly available R-language-based submission, and we highly recommend that sponsors communicate with FDA about using open-source software. This concludes my presentation, and thanks for listening.

Thank you, Hesu, thank you very much. We'll now go to the panel discussion. Let me briefly introduce the other FDA collaborator, who is not a speaker but will be part of the panel discussion: Paul Schuette. He's a mathematical statistician and scientific computing coordinator at the FDA, and he has a master's degree and PhD from the University of Wisconsin-Madison with a specialization in probability theory. I hand over to Ning Leng to host the panel discussion. Enjoy, everyone.

Thank you very much. Hi Paul, hi Hesu, hi Ryan. Thank you so much for the insightful presentations; I really enjoyed learning from you. In the next hour or so we will go over a couple of questions. We have some prepared questions, which I'll go over first, and we will also start addressing questions from the live Q&A in the chat box. I'll start with some questions just trying to understand what an FDA reviewer's or FDA staff member's daily life looks like when you review a package. Could you maybe elaborate a little bit on that: what system you are using, whether you are using a laptop or a server, and how you retrieve R packages, from CRAN, from GitHub, or proprietary packages submitted by the submitters? Ryan or Hesu, do you want to start?

Yeah, I can start first. Can you hear me? Well, that was a lot of questions in one, and I may not catch them all, but I'll just start with my daily life. When we receive a review, I go and look into the ADRG and the define document first, before I get into the datasets. I check whether all the necessary things are submitted, read the ADRG to understand the data, and start playing with the data. Mostly SAS is used for our reviews, but for the Prograf submission it was R Markdown, and I hadn't used R Markdown for a long time; I used it during my doctoral studies, but that was more than five years ago. So I had to refresh my memory by looking into old school documents and materials, and also used YouTube to refresh my memory, and it worked. So I started playing with the sponsor-submitted R Markdown documents along with the data, and that was kind of fun. And what was the other question?

I guess, can you elaborate a little bit on how you retrieve, say, a CRAN package or a GitHub package? Well, my computer, and most of our computers for reviews, are just laptops, and we have R, RStudio, and SAS installed on our laptops. So it's not different from you: we just install a new package from CRAN or update it from CRAN. But the companion package in the Prograf review was not available on CRAN, and they didn't have a GitHub repository.
So that's why we asked them to submit the package through the electronic submission gateway, and it worked that way. But except for that kind of package, I think our activities are very similar to yours.

Gotcha, yeah. So basically, different FDA staff, different FDA reviewers, use their own company laptops, so the package versions can differ from one staff member to another. Yeah. One special transition we are having is that we are moving SAS from the desktop to a SAS analytics remote grid environment, so it's a new practice for us, and maybe Paul and Hesu will be more knowledgeable about this; I'm still learning how to use that environment, but we're still using R on our laptops with the Windows system. Cool, thank you for sharing.

Yeah, I can just echo what Ryan said; I'm in the same situation. I'm just using my FDA laptop to run the R code and the packages. I also start by reading the ADRG that the sponsor submitted, because that's where the sponsor provides all the specific information, such as what version they used and what packages they used. Once I check their versions, and I think the sponsor should use a recent version of the software and packages, if the version installed on my laptop is not hugely different, I just use my version first to replicate the results and then investigate further if there is any discrepancy. And yeah, that's how I start analyzing the sponsor's code. Thank you.

Maybe a follow-up question: do you always rerun the sponsor's code, or do you sometimes just go with independent programming directly? For me, I usually start with the coding by myself, because once I see how they did it, that influences how I'm going to review. So I just start with my independent review, replicating the results first. Yeah, same here. I do my independent coding first to replicate the sponsor's results, and then go to the sponsor's code and get surprised, because their coding is always more professional than mine and of better quality. I go over their code line by line, and if I see anything unusual I flag it and report it to my supervisor for discussion. So we do both: independent coding from the reviewer's side, and also replicating the whole code from the sponsor. Thank you, that makes a lot of sense.

Ryan, in your presentation you also mentioned that you asked the submitter to specify some alternative packages to use. Can you elaborate a little bit on why that was needed? Yes, that's a very good question, because depending on the package, the estimates could be, not completely different, but a little different, based on the standard deviation calculation, the variance calculation, and some differences incurred by rounding. So we tried to specify all available options during the meeting to learn about their packages. If they are using a specific package for an analysis — this time it was simple, because it was just the survival package for Kaplan-Meier estimates — but we asked, if they were conducting any other analysis, that they specify all possible packages that implement the same analysis method. And I tried them all to see how different they are.
And if they're not really different, we try to accept the sponsor's package; if not, we flag it, have a discussion internally first, and ask the sponsor to modify the package if they're not using the right package. So that's kind of our practice during the review. Thank you for sharing.

I totally agree. I feel like many times there is not one right answer; there are different ways to implement things, and it's good to try out different methods, just to make sure we are not in a corner case for the estimation. I guess then, as you said, and as Hesu also mentioned about the slight rounding difference, it's probably normal to see that the FDA independent coding result is slightly different from the sponsor's, right? And if the difference is not that big, and basically if the clinical conclusion doesn't change, then it shouldn't be a big issue? I think so. But whenever I see a discrepancy, I report it to the statistical reviewer. I don't make a decision by myself; we talk internally first and see whether it is critical or whether we should address the issue to the sponsor. We sometimes, many times, see discrepancies when we use different packages, or between the result of coding in R versus coding in SAS. Rounding issues can be very frequent, but as long as it's not a critical issue, I think it should be okay. Thank you.

Yeah, thanks for sharing that, and maybe let me switch the topic a little bit. I think, Ryan and Hesu, in both of your presentations you mentioned pretty close collaboration with the submitter. In Ryan's case, there were two or three pre-submission meetings talking about the submission format, and Hesu said that for the R Consortium pilot there were definitely a lot of meetings with Paul and yourself before the submission. Do you have any general guidance for a sponsor who wants to submit R code or R Markdown in a future submission: what information to provide during the pre-submission meeting, how early, and how to help the FDA reviewer review those R-based submissions?

I can start, and maybe Ryan can add more. A sponsor can start the conversation early on and provide the version of R and the list of packages, and explain the intent to use open-source software and the reasons why the sponsor wants to use certain packages, things like that. Also, individual reviewers may or may not be familiar with R, so if they are not familiar with R, they can request help from a statistical analyst like me, and FDA can prepare for the application review ahead of time as well. So I think establishing agreement between FDA and the sponsor would be the important thing.

That's a good answer. And of course, we are getting a lot of help from the Division of Analytics and Informatics, Hesu and Paul's office; it was the Analytics and Informatics Staff, and now it became a division. My recommendation would be to clarify the goals for using R and to specify the plan as early as possible, so that FDA can be prepared with the best available reviewer for R communication. For myself, I worked on a couple of NDAs for six to nine months and then completely forgot about using R, and then I had to switch back to R to do an R review, and that's not an easy transition.
So I think if you can notify FDA that you will use R, they will try to find the best available reviewer for the submission. And I believe there are no topics off-limits for discussion, so be as open as you can. In our discussions, the sponsor was very specific: they said they wanted to use the tidyverse, they wanted to use a couple of specific packages for data management — can you accept this? So they tried to get my agreement on very specific packages, to see whether I was knowledgeable about them or okay with using them. And I think that was great practice and very honest communication, because we know what we know, and we try to learn if we don't know something about their plan. So I think that kind of communication is very important. Thank you, thanks for sharing.

Yeah, thanks for sharing the practice of reaching out to Paul and his team. So basically, it sounds like even if the stats reviewer for a program is primarily a SAS programmer, that reviewer may get extra analytical support on R programming from Paul and his team, right? Yes. Oh yeah, very much so. Cool, awesome, thank you.

And another question. Ryan, in your presentation you mentioned that in the very beginning you had some challenges installing the proprietary package submitted by the company, and later on you asked them to upload it to GitHub. I just want to hear from you whether, from the FDA reviewer's or FDA staff's point of view, it's easier to work with an open-source package compared to a proprietary package, whether verification is easier, and whether you perceive a package as more trusted if it's out there and other people are using it.

Yeah, well, if it's on GitHub or can be installed from CRAN, it's a more widely and frequently used package than the sponsor's own package, and it's easy to download. When I download the survival package or the ggplot2 package, I don't need to do any kind of inspection of those packages, right, because they are well-known packages. But a company package, a proprietary package, is a different animal. This time I was lucky, because their companion package was just formatting the tables in a better way, but I had to check their package line by line to see whether any unknown sources were included in it. It was pretty easy, but if a package comes in with a difficult algorithm — I know some companies are developing their own in-house, complex algorithms for analysis — I think everything has to be super clear, to make the reviewer understand the process and the logic before they run it. Thank you.

There's also the R Validation Hub that we can use for most general packages. Company-developed packages like Ryan is talking about in some sense serve the role that macros serve in SAS, so we regard those as perhaps a little different. It's when we have more complex analytic procedures in proprietary packages that we have increased levels of concern, and occasionally, but not often, we're starting to see that, particularly with the integration of machine learning methods. Thanks for sharing. I see in the Q&A there is a question about validation, basically.
I think, Ryan and Paul, as you mentioned, for commonly used packages such as dplyr and ggplot2, it seems FDA reviewers and staff trust them pretty well, right? And I saw that there are a couple of packages that have validation documents available to the public, from the R Foundation and maybe from the R Validation Hub. So for those kinds of commonly used packages, in general you'll believe they are of pretty good quality? Yeah, if we go back to the statistical software clarifying statement, the testing procedures for using those should be well documented, and one could argue that the R Validation Hub is one type of documentation showing that those are available. When we get into proprietary packages, it's up to the sponsor to document appropriately that their software does what it claims to do, that it's fit for the intended use. Thank you, and thank you for sharing the R Validation Hub; I think that will be a very good resource for any companies who are developing their in-house or proprietary R packages. I think they have pretty comprehensive guidance there on good practices around testing, et cetera.

One thing that I can add: if the packages provided by the sponsor are new packages, one way that I use is to try them with a different programming language, such as SAS. So I use SAS and R together for cross-checking and see if both provide identical results. That's one way to validate an unknown or new R package. Yeah, that's the same practice for me too. Sometimes I compare the results, especially for the primary analysis results that I'm concerned about; when R is used, as it was this time, I run all the analyses through SAS as well to see whether the results match. That was kind of a new practice, because R was used this time. Yeah, thanks for sharing. Within industry we sometimes do that as well; if it's a new methodology, we might program it using two different languages. Cool.

I see that another highly rated question from the Q&A is about whether FDA will share guidance around, maybe, prerequisites for R-based submissions. Ryan and Hesu shared some learnings from your experience, such as dependencies and how to submit a proprietary package. Do you foresee that FDA may come up with such guidance, or do you foresee that guidance or examples may come from a cross-functional, cross-industry working group such as the R Consortium submissions working group?

FDA would probably prefer the R Consortium working group model. Developing a guidance, publishing a guidance, and collecting comments is quite involved, and tends to be focused on really high-profile issues such as how to conduct clinical trials during the COVID pandemic interruptions. So I'm not sure that R submissions rise to the level of a guidance; we would probably prefer to work through a non-profit organization such as the R Consortium. Yeah, go ahead. No more comments. Yeah, thank you. So, just a kind of shout-out for the working group: the R Consortium working group is open to everybody, so if you have specific ideas on how such guidance should look or what questions we should work on, feel free to join the working group. You can find the information on the R Consortium website.
Going through the questions in the Q&A, I see there are also questions about SAS and R hybrid submissions, for example, maybe using one language for data generation and another language for TLF generation. I just wonder whether you have any concerns about that. Are there any challenges? Do you have any tips for people who want to do a hybrid submission? Go ahead, Ryan.

Yeah, so I think it is better to submit in one language to make a streamlined review, not switching back and forth from SAS to R to SAS. But for a specific purpose, I think it would be okay. At one point the sponsor suggested a hybrid of SAS and R, and we accepted it because we thought it would cause no problems in our review. Our goal is not really reviewing the code — I mean, whether the code or the package is well made or not — it's the science contained in the data. So if they are making correct science with their data, their gear, their tools, whatever language I think is okay, but most of us are familiar with SAS and R, so those will be the two major languages being considered. As a reviewer, I think one language is easier to deal with, but if a hybrid can make a better solution, why not accept it? So I may accept it. Paul, please.

Yes, we have hybrid workflows internally. It can be more challenging now, as Ryan and Hesu alluded to, because we're moving from PC-based SAS support to server-based, so going back and forth is a little more challenging right now. But if we can use one tool consistently in one stage and another for a different stage, that's probably preferable to intermixing the two freely. What I mean by that is, if one performs the analytics and the data munging in SAS, one could still perform the visualizations, for example, in R, and we've seen those types of workflows even internally for almost 10 years. So I think that type of formulation is probably more acceptable. Where we run into issues is if we have a SAS macro that calls specific R programs, or vice versa; there could be implementation issues on our end.

Well, based on my experience, when the whole review was submitted in SAS, some information request questions were answered using R; those sponsor answers were addressed using R, especially for graphics, because R provides better graphics, good colors. I think that was one reason the sponsor illustrated their responses with R programming, and they also submitted the code. But for most of the general review process, they tried to keep to one language: the Prograf submission kept to one language, and the other RCTs kept to the one SAS language for the whole review. Thank you. Thank you for sharing those considerations. So it sounds like if a sponsor wants to submit a hybrid, it would be nice if they can use different languages for independent tasks or independent tracks; nested ones will cause some headaches. Yeah, you're right. Yeah, cool.

And I see there are a couple of questions about the R environment and using an renv lock file to manage package versions, and also a question about Docker, basically container-based solutions. I wonder whether you want to share a little bit on how FDA staff manage packages right now. And Paul, I know you shared a little bit on the IT restrictions at FDA in the working group meetings.
Could you maybe share a little bit on how package management may be done at FDA in the future? Sure. So there is a pilot called Fiddle that's looking into providing a more enterprise-level R system based on some of the RStudio products; that's more where the package management exists. Most of our reviewers are at a slightly different level, where things are basically being done at the laptop level. So there is the possibility of evolving to an enterprise approach for package management sometime in the future. Docker is a problem right now. We've had early attempts to use Docker, but they were not terribly successful; part of it was that the officially approved version of Docker that we have at FDA was not consistent with what sponsors were providing as test cases. More recently, Docker has gone to a different subscription model, so even to test it we would not be able to use a free version. So because of that, I think Docker is probably not a suitable platform at this stage. There are ongoing discussions within the R Consortium working groups about whether a different container platform, such as Podman, might be more appropriate. Thank you, Paul. Yeah, if anyone wants to contribute to a future container-based pilot, please feel free to join the R Consortium working group.

Another related question I saw asks whether it's okay to use a snapshot of CRAN or MRAN to specify the package versions. From the R Consortium Pilot 1, we did use MRAN, so it sounds like that could be a good practice for people to specify package versions? Very much so. And we run into that problem in the proprietary world even: different builds of SAS have different capabilities, and in some cases we've had to say, oh, this is not working because we need TS1M5 or something like that instead of what we were using. So some of those types of issues exist both in the proprietary and in the open-source world. Thank you. Cool.

Maybe to switch the topic a little bit: when talking about R, Shiny is on everybody's mind. I just wonder, from the FDA side, are you using Shiny? Do you have any Shiny apps developed for review purposes? And have you seen any Shiny-based submissions for an investigational product?

Yeah, I can start first. I haven't seen any Shiny app submission in an official review, but we have some internal developer groups who are developing apps for internal use, and Paul may add more about that. And yeah, as a reviewer, limited to my perspective and experience, I think a Shiny app can be very helpful if those interactive, dynamic features can be demonstrated for a specific analysis, to address questions, say, in an information request. But I consider static tables and graphs still good enough for a streamlined review process so far. But who knows how the future will change. Right.

So we are developing Shiny apps for internal use to help out with some things; these tend to be more specific cases that are not addressed otherwise. We have yet to see a Shiny app as part of a submission, but as Hesu and Ryan were pointing out, part of their standard workflow is that they attempt to perform the analysis independently, without looking at the sponsor's code, and so a Shiny app needs to complement that workflow.
We need to be able to believe that the sponsor's formulation is correct in the first place before we'll look into specific results. In some ways this is me talking, as opposed to FDA, but I think some of the interactive and visualization-type features may be better received by clinical reviewers; they might be the ones who would benefit more than statistical reviewers in some cases, just because we're more focused on providing that independent review activity.

Yeah, I can just echo what Paul said. I'm using R Shiny as well for my internal research, and an R Shiny app with a submission would be helpful and useful as a supplement. If it is not a supplement, then from a regulatory review perspective we have to verify whether it provides accurate and reliable results, and that means we have to understand the process behind it. So I think it would increase the review complexity if it's going to be the primary thing that we have to verify. Right.

There is one thing that could help, although we're not necessarily advocating it entirely: Joe Cheng came up with the shinymeta package, which allows one to produce R code that would replicate a specific Shiny snapshot, for example. So those types of tools are something folks may want to consider. Thank you, very good point. Kind of for documentation purposes, we don't want to rely purely on an interactive interface. Very much so.

Yeah, just adding a comment: for the Prograf submission, the R Markdown HTML output was quite useful when we communicated with the clinicians. They liked the new format, and it's easy to scroll over all the results or just move by tabs; it made our communication more effective. So if Shiny shows up, it could play that kind of role. Also, clinicians are curious about what happens if the level of a serum marker goes up or down, so if the answer can be provided in a dynamic way, that could be another way to use Shiny as an independent tool for review, to address specific questions. Yeah, one thought is that maybe it would be helpful if sponsors or companies open-sourced their tools: instead of submitting a Shiny app, maybe open-source the Shiny tool or analytical tools, so that FDA reviewers can use those tools, if they are trusted, to implement a Shiny app themselves for their independent review, and that can then be used for information sharing with clinicians, et cetera. Yeah, cool, awesome.

Yeah, and another question. Ryan, you mentioned that in your review an alternative data format was used: basically CSV was generated and R data was generated. I think many R users find that the XPT file required right now is not particularly friendly for R. So I just want to hear your thoughts: in the future, do you foresee that the XPT requirement might change, or that additional data formats could be used for supplemental purposes, et cetera?

Well, I first want to clarify that when we were doing this review, from December 2020 to July 2021, the FDA real-world evidence draft guidance documents for registry-based or claims data had not been published yet; they were published in November and December 2021. So during the time of the review, the sponsor had a lot of freedom to try a lot of things, and we had no specific guidelines at the time.
So we had a lot of discussions with different offices, the Office of Computational Science and also the real-world evidence subcommittee and the relevant domain divisions and offices, about how to handle these data issues. And the issue I addressed is that, because it's real-world data, they were using the raw data, not data transformed into the standard format; it has longer variable names, longer lengths in the data. So if we transformed that into the FDA format, XPORT version 5, a lot of the data was going to be truncated; that's why they were requesting version 8. Based on the discussion, we thought version 8 is only beneficial for lifting the limits on characters; however, it could further increase the file size, and we were concerned about the audit trails. But after the discussion, although FDA cannot require version 8, accepting it was a different issue, so we decided to accept version 8 after reviewing the sample data. That's why we asked the sponsor to submit the sample data, so we could play with it on our end. So that's why we accepted version 8 data and worked with R Markdown as a tool. I believe in the future there will be more submissions coming with raw data in that irregular format, but I believe the new guidance has some recommendations to use the standard format. I don't know whether that is finalized, but probably there will be a more systematic approach to handling those irregular types of real-world data in the future.

We can also point to the Study Data Technical Conformance Guide for a lot of the standard issues in that area. As Ryan pointed out, we can't accept more recent versions of XPT than version 5. I think we might have lost Paul. Yeah, maybe while waiting for Paul to rejoin us: another related question I saw in the Q&A is that, as Ryan mentioned, right now the real-world data guidance probably recommends people to use the CDISC format, and the questioner was also pointing out that the CDISC format may not be especially R-friendly either. So basically, is there any thought on how we might influence the CDISC group to accommodate some of the challenges with using R? Yeah, CDISC is again a consortium of itself, and most companies have representatives on the CDISC boards and development groups, so you guys have a say. Yeah, I feel like it's a similar story for the ADRG: when we did the R Consortium submission Pilot 1, the ADRG format was very SAS-focused as well, and we're hoping to work with the PHUSE working group to maybe update some of the sections there. Yeah.

Paul, I think we lost you for a second. Do you have additional thoughts on the data formatting question? Oh, I've been at FDA since 2008; nobody was happy with XPT files when I came on board. There have been several pilots that didn't really take off, and we're still using XPT. So I don't know that anything has come along that has gained widespread acceptance and will be allowed to replace it. That doesn't mean we should stop trying, but for whatever reason, there are issues involved. So hopefully things will improve with the subsequent version 8 and the issues Ryan was talking about with the length of the variable names. The version 5 XPT file format is not terribly efficient when it comes to transmission or storage. Thank you for sharing that.
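As a small illustration of the variable-name issue behind the version 5 versus version 8 discussion, here is a sketch assuming the haven package and a hypothetical data frame with registry-style names (not the sponsor's data or code): transport version 5 limits variable names to 8 characters, so long real-world-data names either must be truncated or the file must be written as version 8.

    library(haven)

    # Hypothetical analysis data with long, registry-style variable names
    adrwe <- data.frame(
      REC_TX_PROCEDURE_TYPE_CODE = c(1, 2),
      CAN_LAST_SERUM_CREATININE  = c(1.1, 0.9)
    )

    # Names longer than 8 characters cannot be carried in XPT version 5
    names(adrwe)[nchar(names(adrwe)) > 8]

    # Writing as SAS transport version 8 keeps the full names
    write_xpt(adrwe, "adrwe_v8.xpt", version = 8)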
Yeah, and feel free to let us know if industry can help with any evaluations or any information; I know there are a lot of people who may want to establish a working group looking into that. Cool. Yeah, go ahead, anything to add? Oh, well, basically I think that's still an ongoing issue that we've tried a couple of times. There was one attempt to look at XML as a replacement for XPT that didn't lead anywhere that I'm aware of. So if there is a better method that industry wants to propose, I think people are open to it, but there seem to be deficiencies that have been pointed out with the alternatives to XPT so far. Thank you, thank you for sharing that.

All right, I see we only have seven minutes left, so maybe I will close with the last question. My last question to everyone, to Ryan, Hesu, and Paul, is: how can we help make your life easier when you are reviewing R-based submissions? You shared a few tips in your presentations already; maybe the last question is just any final tips on how we can make your life easier when we submit R-based submissions.

I'll go first. I believe the Prograf submission was kind of an excellent data science practice, because it provided all the necessary documents with good explanations. But what could be even better is annotating the code, especially for the proprietary package, because there was actually no detailed description in the code, so I had to understand it line by line, which takes a lot of time. Good descriptions will be very helpful for reviewers to save time and also to understand the sponsor's perspective on using those packages. Yeah, and I echo Ryan. Because we have to evaluate the submission independently and replicate the results by ourselves, make it easy to read and be clear about what version you used and what packages you used, and follow good coding practices, like commenting on what the code is doing. That would make it easier for us. And then, as I said, communicate with the FDA review team early on, so that we can plan ahead and be aware of it. That would make it easier.

I'd echo my colleagues: good documentation, good programming practices. A while back, PHUSE put out a white paper on some of that, and our colleagues in CDRH have a good machine learning practices document for those who are moving into that space. So I think that's one of the key things: having code and processes that are well documented definitely helps on our end. When we have to figure everything out and become what I used to call human compilers, it becomes much more time consuming, and that becomes a real problem when we're on a high-priority review clock. Thank you very much.

Thanks everyone for sharing. With that, I guess I'll close this session. Thank you so much for the great presentations and also for the transparent discussion here. It's very nice to learn about your day-to-day work, and it helps us make sure that we prepare our submission packages accordingly. And thank you everyone for joining us today. The recording will be available on the R Consortium R Adoption Series website, probably several days later. All right, thank you, and I hope everybody has a good day. Thank you very much. Thank you so much. Bye. Bye.