And we're going to start again, moving on to the next panel, whose topic is EMR and clinical phenotyping: challenges and new opportunities. Again, with the general motif of an eMERGE presenter, a reactor, and a summary, our presenter is Josh Denny. Are you there, Josh? You're unmuted, Josh. Ah, great. Can you hear me now? Okay, now we can hear you. Great. You were muted by the organizer, which means you're not able to unmute yourself, I think, at that point. So Brandy, why don't you just drive the slides? Some of these slides are going to be a bit of a rehash of what has already been presented, so I'll go through them quickly, and then we can move on. So Brandy, why don't you go ahead and advance? Next slide.

So this comes from our original charter, condensed a little bit. Our goals were to develop, validate, and implement about 27 EHR phenotypes, which would bring the total between eMERGE 1 and eMERGE 2 to over 40 phenotypes for genomic study across eMERGE sites. The total gets a little larger when you count things like hemochromatosis and other projects that have been engaged. The model for each phenotype was that a lead site develops and validates it, one to two other sites deploy, validate, and revise the algorithm with their lessons learned, and then it is deployed across the network. eMERGE 2 used the same model, but everything used existing genotype records, as opposed to the specific cohorts used in eMERGE 1. And of course, we wanted to investigate ways to preserve privacy and to promote the algorithms we use across the network and among other sites. The second goal was to improve the process of EHR phenotyping itself. And we had this wrinkle that we were thinking about implementation here, too, and as we've gotten into the eMERGE-PGx project, cases where phenotyping might actually feed into clinical care. Next slide. So this just summarizes where we are now.
The three colors here represent status. What was originally greenish on my screen looks a little different now; it represents phenotypes that have been done, with GWAS at various stages of completion. The yellowish color shows the phenotypes expected to be done next, and the rest are in development. So we're making good progress on the specific eMERGE 2 phenotypes. Some additional phenotypes have been explored beyond this set, in some cases even with formal algorithms and validation, before being recognized as not feasible. So part of the process, I think, is investigation and then revision as needed, to see which phenotypes can actually be done. Next slide.

So overall, what we've learned can be summarized as four important parts of a phenotype algorithm for accurate case and control definitions. Almost all algorithms include billing codes of some sort as a necessary but not sufficient first step, and then validation of those billing codes with medication data, lab and test result data, and clinical note data. Sometimes we employ varying degrees of natural language processing and text mining, as Rex mentioned. We've also done some investigation into machine learning and active learning techniques for specific phenotypes. One thing not represented here is that some algorithms include temporal elements as well. Usually they are combinations of Boolean logic operating on these different components. Next. This was already covered by Rex, so I'll skip it. Next slide.

Rex and Dan both mentioned PheKB. Right now we have about 66 phenotypes in various stages of development, and 73 implementation evaluations have been published. One of the nice aspects is that it has extended somewhat beyond eMERGE as a tool for others as well. Next slide.
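As a rough sketch of what such a rule-based algorithm looks like in practice, combining billing codes, medications, and labs with Boolean logic (the code lists, field names, and thresholds below are purely illustrative, not an actual eMERGE algorithm):

```python
# Sketch of a rule-based case/control phenotype algorithm combining
# billing codes, medications, and labs with Boolean logic.
# All code lists and thresholds here are illustrative, not from eMERGE.

T2DM_ICD9 = {"250.00", "250.02"}          # example type 2 diabetes billing codes
T2DM_MEDS = {"metformin", "glipizide"}    # example diabetes medications

def is_case(patient):
    """Case: billing code AND (medication OR abnormal lab)."""
    has_code = bool(T2DM_ICD9 & set(patient["icd9"]))
    has_med = bool(T2DM_MEDS & {m.lower() for m in patient["meds"]})
    has_lab = any(hba1c >= 6.5 for hba1c in patient["hba1c"])
    return has_code and (has_med or has_lab)

def is_control(patient):
    """Control: no billing code, no medication, all labs normal."""
    no_code = not (T2DM_ICD9 & set(patient["icd9"]))
    no_med = not (T2DM_MEDS & {m.lower() for m in patient["meds"]})
    labs_normal = all(h < 6.5 for h in patient["hba1c"])
    return no_code and no_med and labs_normal

patient = {"icd9": ["250.00"], "meds": ["Metformin"], "hba1c": [7.1]}
print(is_case(patient))   # True for this illustrative record
```

The key point is the structure: billing codes alone are necessary but not sufficient, so cases require corroborating medication or lab evidence, while controls must be clean on every component.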
And so this slide summarizes the 73 pieces of implementation data we have on PheKB right now. Each X represents one site's implementation data, and we've broken out the positive predictive value (PPV) by primary-site and secondary-site implementation, for cases and controls. The red diamond represents the median PPV in each of those buckets. I think one of the great things we see here is that, by and large, the algorithms have performed well at the secondary sites as well as the primary sites. So it illustrates that transportability is possible, sometimes with variation in the algorithm, of course. I've highlighted one of the outliers, drug-induced liver injury. This highlights that for very rare phenotypes, lower positive predictive values can be tolerated, and I think it's okay that this sort of rare-phenotype algorithm didn't perform as well from a PPV standpoint, because with so few cases it's feasible to manually review what the algorithm flags. That's one of the learning points we've had as we've gone along: recognizing how to optimize PPV for a given algorithm based on its goal. Next slide.

I wanted to illustrate where this can sometimes be challenging. One of the network algorithms from eMERGE 1 that has persisted into eMERGE 2, with the new sites running it as well, is resistant hypertension. Overall the algorithm performed well, but in some cases the necessary data wasn't available at a given site for the algorithm to perform well, and in some cases the sheer difficulty of the algorithm led to implementation issues, so the initial estimates of positive predictive value didn't hold up under closer scrutiny. That's why you see numbers like 95% going to 46%, or 94% to 3%. Of course we fixed all this, but the initial runs of the algorithm did surface those problems. And in one case, the control algorithm couldn't be run at a given site at all, just due to lack of the necessary information in the EHR.
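For reference, the PPV being plotted is just the fraction of algorithm-flagged records confirmed on chart review; a minimal sketch of computing per-site PPVs and their median (the review counts below are invented for illustration):

```python
from statistics import median

def ppv(true_positives, flagged):
    """PPV = chart-review-confirmed cases / all algorithm-flagged cases."""
    return true_positives / flagged

# Hypothetical chart-review results per site: (confirmed, flagged)
site_reviews = [(92, 100), (45, 50), (88, 95)]
site_ppvs = [ppv(tp, n) for tp, n in site_reviews]
print([round(p, 2) for p in site_ppvs])  # [0.92, 0.9, 0.93]
print(round(median(site_ppvs), 2))       # 0.92
```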
So one of the things this highlights is the need, of course, for careful scrutiny of how you implement an algorithm, but also the need to be able to evaluate your algorithm and potentially share it in a more structured way than we've typically done, which is Microsoft Word documents and PDFs, and ways to automatically validate things like the data dictionary, since we've had these same sorts of problems with that data as well. Next slide.

This shows the PheWAS catalog website. We've put all the PheWAS results that Rex and Dan talked about earlier on this website as something you can query; you can download the data, graph it, search it, that sort of thing, and there are demo data sets as well. So it's another way we're trying to share data from eMERGE. Next slide.

This shows the Record Counter. The Record Counter is a tool we created at the Coordinating Center that houses the data from those 53,000 genotyped samples in eMERGE. It allows you to quickly query by things like ICD-9 codes, CPT codes, demographics, and site information to see whether a study is feasible. That goes toward filling the need I mentioned when we showed the original counts: helping you focus on what phenotype to pursue next. And we're doing the same thing with SPHINX as well, so SPHINX will also have medication data in it; in fact, it already does. It covers the data being generated from eMERGE-PGx, those 9,000 genotyped people. Next slide.

So, key questions for eMERGE 3. We talked about what phenotypes to explore, how to make the process faster and better, how we can improve accuracy and reproducibility, and how we can best leverage the unique nature of the EMR. Next. And so one of the thoughts we had in our group was moving beyond just disease-gene associations to more detailed phenotypes.
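The kind of feasibility query a tool like the Record Counter supports can be sketched as a simple filter-and-count over patient records; the schema and codes below are hypothetical, not the actual Record Counter interface:

```python
# Hypothetical feasibility query: count genotyped patients matching
# an ICD-9 code and a demographic filter, broken out by site.
from collections import Counter

records = [
    {"site": "A", "icd9": {"401.9"}, "age": 63, "sex": "F"},
    {"site": "A", "icd9": {"250.00"}, "age": 58, "sex": "M"},
    {"site": "B", "icd9": {"401.9"}, "age": 71, "sex": "M"},
]

def count_by_site(records, icd9_code, min_age=0):
    matches = (r for r in records
               if icd9_code in r["icd9"] and r["age"] >= min_age)
    return Counter(r["site"] for r in matches)

print(count_by_site(records, "401.9", min_age=60))
# Counter({'A': 1, 'B': 1})
```

Running counts like this before committing to an algorithm is exactly the "is this phenotype feasible here?" question described above.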
And we think that might be something we can do specifically because of the longitudinal data we have, the deep data, and a large sample size of 350,000-plus people. One direction would be less common or rare phenotypes; pharmacogenomics, following up on eMERGE-PGx as we start to sequence and accumulate data over time; disease subtypes that may not be available in the large cohorts collected prospectively; and longitudinal phenotypes, such as change in creatinine over time, or development or progression of a disease state. Another idea would be phenotypes for clinical implementation, such as would be deployed in a clinical decision support system. These phenotypes would have unique characteristics: they could be real-time and implementable, you would likely want to optimize positive predictive value and sensitivity differently, and they could help iterate in that learning healthcare system. Rare phenotypes or subtypes require bigger sample sizes, may be harder to implement, and may need more manual validation, so I think there could be a tension between the number of phenotypes we do and the depth at which we do them, and we may want to engage in fewer phenotypes as a result. Next slide.

One example of a rare phenotype would be adverse drug events or rare diseases. This shows a screenshot of a case-control GWAS of flucloxacillin-induced liver injury, where they had a highly significant signal with just 51 cases, showing that rare phenotypes don't always require huge numbers: they may have stronger effect sizes, so you may not need thousands of cases to find signals. The clinical impact could be greater, and given that many of these phenotypes may be lethal, having a prospective collection such as we have with the EMR cohort may be a good way to capture them.
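A longitudinal phenotype like "change in creatinine over time" can be reduced to a per-patient trend over repeated lab values; here is a minimal sketch using an ordinary least-squares slope, with the measurements invented for illustration:

```python
# Estimate a patient's creatinine trend as a least-squares slope
# over time (mg/dL per year), e.g. to flag rapid risers as possible
# progressive kidney disease. Data below are illustrative only.

def slope(times_years, values):
    """Ordinary least-squares slope of values against time."""
    n = len(times_years)
    mt = sum(times_years) / n
    mv = sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times_years, values))
    den = sum((t - mt) ** 2 for t in times_years)
    return num / den

# One patient's creatinine measurements over three years
times = [0.0, 1.0, 2.0, 3.0]
creat = [0.9, 1.1, 1.4, 1.6]
print(round(slope(times, creat), 2))  # 0.24 (mg/dL per year)
```

The point is that the EMR's repeated measurements let you define phenotypes on trajectories, not just on a single diagnosis snapshot.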
Problems could be that GWAS data may not be detailed enough, and we may need new genotyping or sequencing to go after people within the 350,000 who haven't already been GWASed or sequenced. Next slide. Gosh, you're at 10 minutes. Okay, I'm almost done. So, new methods as well. We could expand common infrastructure for phenotyping, using it for clinical decision support (CDS). Machine learning and active learning could be applied more robustly, along with phenomic methods; we've talked about PheWAS, and we could expand those across the network. We're also looking at refining phenotype algorithms to include possible cases, and how we could capture those and include them in our algorithms; Group Health has done some work on that. Next slide. And central resources could be expanded. The only thing I want to highlight here is structured data dictionaries and data validation tools. That's something we would want to coordinate across the network: we wouldn't want a whole bunch of standards, we would want one standard as much as possible. And I think the next is the final slide. So this just summarizes those questions and what we've stated: looking at different ways to improve accuracy and reproducibility, make the process faster, and maybe pursue different kinds of phenotypes. With that, I will end.