So we're going to move into the first session, which is making the case for the genomics informatics research strategy. We have two co-moderators who will help lead this session: Dr. Rex Chisholm and Dr. Lucila Ohno-Machado. Dr. Chisholm is the Adam and Richard T. Lind Professor of Medical Genetics at the Feinberg School of Medicine and also Professor of Cell and Developmental Biology and Surgery at Northwestern University. He is the Founder and Director of the Center for Genetic Medicine, Vice Dean for Scientific Affairs at the Feinberg School, and Associate Vice President for Research at Northwestern University. His research focuses on genomics, bioinformatics, and precision medicine, and he leads Northwestern's biobank effort, NUgene, which enrolls research participants in a study investigating genetic contributions to human disease, therapeutic outcomes, and gene-environment interactions. Dr. Lucila Ohno-Machado is Associate Dean for Informatics and Technology and the Founding Chair of the University of California San Diego Health Department of Biomedical Informatics, where she leads a group of faculty with diverse backgrounds in medicine, nursing, informatics, and computer science. She is also the PI for the California Precision Medicine Consortium for the All of Us Research Program, and her research focuses on privacy-preserving and distributed analytics for healthcare and the biomedical sciences. Dr. Chisholm, Dr. Ohno-Machado, thank you for being our co-chairs for our first session; I'll have you introduce the panel of discussants. Thank you so much for inviting us to help moderate this session. Rex will do that, while I will introduce the speakers. First, the one who needs no introduction is the Co-Chair of this meeting, Dr.
Marc Williams, who is a medical geneticist and Professor and Director Emeritus of Geisinger's Genomic Medicine Institute. He served as a PI for eMERGE projects, and he currently leads the EHR Work Group of ClinGen. He is on the Board of Directors of the American College of Medical Genetics and Genomics and is interested in genetic services, economic evaluation, and implementation. He will review the survey that was conducted prior to this meeting. The other speaker will be Dr. Casey Overby Taylor. She is Assistant Professor of Medicine and Biomedical Engineering at Johns Hopkins School of Medicine, with joint appointments in Informatics, Computer Science, and Public Health. She was previously co-chair of the EHR Integration Working Group of eMERGE and was recently awarded NHGRI's Genomic Innovator Award, recognizing her research in developing and evaluating methods to incorporate genomic results into clinical decision support. She will talk about the technical desiderata for genomic clinical decision support, reflecting on implementation. And lastly, we have Dr. Janina Jeff, a population geneticist, bioinformatician, activist, and educator. She is a Senior Bioinformatics Scientist at Illumina, where she develops pipelines for content annotation, selection, and design of population-specific genome-wide content for Illumina's genotyping arrays. She was named one of the top 100 most influential African Americans by The Root magazine and is the creator and host of In Those Genes, an international award-winning podcast that uses genetics to decode the lost histories of African-descended Americans through the lens of Black culture. She will be talking on inherent racism that induces bias in algorithm development, and whether the implementation of genetic information in clinical informatics decision-making can be any different. So please welcome our distinguished speakers. Great. Thank you, Lucila. Thank you, Rex. I appreciate it. I also wanted to thank Dr. Green for presenting the strategic plan.
I would note that this is the first genomic medicine meeting post-strategic plan, so one of the things that Ken and I will be doing is to try and map some of the takeaways from this meeting to the strategic plan. The other thing I wanted to highlight was to say that we really took very seriously the NHGRI's commitment to diversity and have tried to have the presenters reflect that commitment to diversity. I think we're fortunate in informatics that we do have a very diverse work group and so we've been very intentional about trying to make sure that we have that reflected in the presenters that you'll be seeing today and tomorrow. So with that, let me go ahead and launch into my presentation here. The objectives for my presentation are to present and discuss the survey results. And first of all, thank you for all of you that took the time to complete that survey. We find that to be very helpful in terms of framing the meeting and also helping to give information to presenters that makes their presentations more relevant to the objectives of the meeting itself. We also have the opportunity to compare results to a prior meeting, GM 7, that was focused on genomic clinical decision support. We also had a phenomenal number of written comments that came in as part of the survey. We've not done a formal thematic analysis, but I have extracted some themes that seem to be recurrent and we hope that this will set the stage for the rest of the meeting. So we invited 83 attendees to participate in the survey and we had 33 responses, which gives us a response rate of just under 40%, which is really quite good for a survey of this nature. And importantly, of those that did respond, all of them completed the survey and provided extensive written comments. Now, these are the eight questions that we used to frame this meeting. And I'm not going to go through these questions, but I thought it was really interesting as I looked at the visual to see where the responses clustered. 
And you can see, somewhat unusually for a Likert scale, that we had a lot of responses in neither agree nor disagree, disagree, and strongly disagree, with a smattering up to agree and strongly agree. This is informative in the sense that I think it's an endorsement that we're on the right track for the topic of this meeting: there is a lot of opportunity for research around these types of questions. And I wanted to pull out a couple of items that I thought were particularly useful, at least when I was thinking about it. The first is to look at the mean response across these eight questions. And you can see here that this numerically reflects what we saw on the prior slide, where the responses really range between about two and four, so in this middle range. But question five and question six, I thought, were very interesting. Question five stated that the methods for integrating analytical interpretations are well established. A four here is consistent with disagree, and you can see that the range was only between three and five, so essentially neutral to strongly disagree. Question six stated that the genomic medicine community will benefit from having revised technical desiderata, and here we had a mean on the agree side, with a range between strongly agree and mildly disagree. So again, I think it was very useful to look at those extremes of response. There's also a tremendous amount of variability across the different questions, which is again somewhat unusual for questions of this nature, but very little variability across question five. So this raises an initial point for me of whether question five really frames a research priority. We'll come back to that tomorrow, after we've had a chance to absorb the rest of the presentations.
Now, in October 2014, as I mentioned, we had a genomic medicine meeting, Genomic Medicine 7, which I was pleased to co-chair with Blackford Middleton, who unfortunately was unable to participate in our GM 13 conference; the focus was on genomic clinical decision support, so it was a much more narrowly focused meeting than what we're currently experiencing. Now, we did a survey prior to that meeting as well. And we used as a basis for that 14 desiderata: seven that were published by Dan Masys and colleagues on desiderata to support genomic medicine in the electronic health record, and then a follow-up publication by Brandon Welch and colleagues that looked at technical desiderata around genomic clinical decision support. Now, in the GM 7 meeting, we queried on two different scales: what was the importance of a given element, and what was the gap between the current state and the ideal future state of that particular element? But we also asked the attendees of that meeting to prioritize the elements. Now, for this meeting, GM 13, we essentially asked people to agree or disagree with the desiderata, which is really more similar to the importance of a given element. So while they're not directly comparable, I think we can still take a look at this and draw some conclusions. Now, this was the mean element importance. And we did score this like golf, where the lower the score, the more important the element, and the orange bar reflects the standard deviation. So you can see a lot of variability across these, but this did give us a rank order of the questions. And so what I've done for comparison is to order them from most to least important across the two meetings, and then I pulled out a couple of questions to really focus on. Number eight was scored the highest by attendees of both GM 7 and GM 13, which is that the clinical decision support knowledge must have the potential to incorporate multiple genes and clinical information related to those multiple genes.
And I think that is something we'll need to keep at the center of what we talk about over the course of the next two days. Now, number 10 was also in the top five. It didn't score near the top, but it certainly was represented in the top five for both meetings, which is that clinical decision support knowledge must have the capacity to support multiple electronic health record platforms with various data representations with minimal modification. So those were the two that maintained their priority. Now, as we look at things that have changed, there were several that moved from a relatively low priority to a high priority between GM 7 and GM 13. One is to leverage current and developing CDS and genomic standards. My interpretation of why this has moved up is that at the time of GM 7 there were almost no standards for either clinical decision support or genomics. Now we are actually seeing standards emerging in both areas, so we're able to use these standards as they become more common in clinical practice. The next desideratum that moved from low to high was to maintain the linkage of molecular observations to the laboratory methods used to generate them. And I think this reflects that laboratory techniques have become more standardized, and that we are reflecting more on what a particular method has to say about the molecular observation. Making sure that information is available at the point of care is critically important for interpretation. And you can see that this now ranks as the second-highest priority for attendees of this meeting. And then the third one, and this is really interesting to me, because we spent a lot of time at GM 7 talking about this idea of the separation of the primary molecular observation from the clinical interpretation. And we spent a lot of time trying to define what we actually meant by that.
Well, now this has become almost axiomatic: the clinical interpretation really is a different process from the molecular observation, and it's critically important to reflect both of those areas. Now, we had a couple that moved from a high priority down to a low priority, which again, I think, reflects how the field has changed over the intervening years. One is to support a CDS knowledge base deployed and developed by multiple independent organizations. The fact that this moved down doesn't, I think, reflect that we don't think it's important, but we now understand just how incredibly hard it is to do this. And so this may not be achievable, at least in our current electronic health record environment. And then lastly, the need to support human-viewable formats and machine-readable formats to facilitate the implementation of decision support rules. This moved lower, I think, because we've actually accomplished a fair amount in the intervening years to support this human-readable and machine-readable formatting. Now, I stole this directly from the strategic vision, so Eric knows that at least one person has read it besides his staff. And I think this has special relevance for number seven, which is the desideratum to support both individual clinical care and discovery science. Now, it's interesting that this was ranked as number two in GM 7 and was ranked sort of in the middle for GM 13. But I would hold that this is a really critical piece, because it absolutely is reflected in this diagram, which is the idea that we need to create these virtuous cycles that take basic genomics research, move it into a genomic learning healthcare system, and also send knowledge from there back into basic genomics research. So this again, I think, is an area of good opportunity.
There were a couple of additional themes that emerged from the free text: the importance of assessing stakeholder preference and workflow, the sustainability of resources, the lack of methods for evaluation of innovation implementation, and the impact of consent and the regulatory framework. So my takeaway in terms of the implications for the research agenda is that this is a target-rich environment. We've got a lot to do in genomics and informatics. There are priorities that have persisted over the last five years, but some of these priorities have also changed. And research needs to include attention to stakeholder engagement and workflow evaluation, the development of rigorous evaluation methods, consideration of the policy and regulatory environment, and sustainability. The full survey results, including all of the comments, are included in the meeting materials. As you review those, please contact Ken or myself if you have comments on our interpretation. As I mentioned, each speaker has received narrative comments from the survey that are relevant to their topic and has been asked to reflect on those in their presentations, and to always keep the overarching implications in mind during our discussion. So with that, I will end my formal presentation and turn it back over to the moderators. Thanks, Marc. Great job keeping on time. We can take clarifying questions if anyone has one. I didn't see any in the Q&A, but Marc, maybe you could just comment: it seems to me that there is a little bit of a mismatch between number three, which was maintaining the linkage to the laboratory methods, and number one, the separation of the primary molecular observation from the interpretation. Those seem a little in conflict. Yeah, I think that's an area where we need to explore a little bit what we actually mean by these two things.
I think the way I interpret this is that the molecular observation in some ways represents the raw output of whatever we're doing, whether it's a genome or an exome or a panel or whatever. There's then a molecular interpretation; in other words, if we see a variant in a given gene, is it pathogenic? Is it uncertain? Is it benign? And then that transforms into a clinical observation: it may be a pathogenic variant in a gene, but from a clinical perspective, if that gene is not associated with a disease that fits the clinical presentation, then it may not be of clinical relevance, even though it looks to be a pathogenic variant. In the graphic that we created for the meeting, we have this data-to-knowledge-to-wisdom rubric, which I think encompasses all of those different areas. I think there's been a lot of work through many of the programs that NHGRI funds to begin to sort this out, but there still remains a lot of work to do. That's certainly been a major focus of some of the efforts of the Clinical Genome Resource, for example, to try to answer these questions and provide the information that's needed for implementation. Okay, thanks, Marc. I'm not seeing any other questions, and we'll have plenty of time for discussion. So let me move to our next speaker, Casey Overby Taylor. Casey, the floor is yours. All right. Hi, everybody. My name is Casey Overby Taylor, and I'm going to be talking about the technical desiderata for genomic clinical decision support. This is a very good follow-on to Marc Williams' presentation on the desiderata, because they were ranked in terms of priorities, but I'll be giving some examples from my experiences within one network funded by NHGRI. It also provides some context for what people are thinking about when they answer these surveys, because what's ranked is reflective of the people who were involved in answering those questions.
My involvement in eMERGE has been as the EHR Integration Workgroup co-chair, from 2015 to early last year, with Sandy Aronson, so this is just one example. I was asked specifically to talk about our current state for addressing the technical desiderata for the integration of genomic data in the EHR, considerations for the genomic medicine community in revising the desiderata, and then areas where the research strategy developed by NHGRI could be useful to achieve the goals described by the desiderata. So first, to give a very brief overview of eMERGE: this is an overview of the strategy used by nine clinical sites and two genomic laboratories to return results from genomic sequencing using clinical reports, along with the raw data that could be used for discovery. So this is kind of the beginning of the cycle that Marc brought up, in terms of supporting both clinical use and discovery through reporting and repositories like dbGaP. One of the main goals for this effort in the third phase was to integrate genetic variants into the EHR. And when we compare with the first part of the desiderata, we see that this strategy maintained separation between the primary molecular observations and the clinical interpretations of those data, through the reporting process of keeping the reports and the raw data separate. It also supported the compact representation of clinically actionable subsets in the reports, and it supported both individual clinical care and discovery science, as I mentioned previously. Keep in mind also that feeding back within the clinical system is a piece that wasn't necessarily covered within this infrastructure, so how we define what we mean by these questions will probably come out during this meeting as well.
And digging down a little more into the architecture for how genomic results are returned, I'll just draw your attention to a couple of areas related to the desiderata, such as the second point about supporting lossless data compression. Within the infrastructure, the raw BAM and VCF files were maintained, and there were also the structured genomic results for return, which were structured in an XML format that is both human-readable and machine-readable. And digging into the structured sequencing reports, this shows in more detail which sections are actually included in the report and relevant to the desiderata. So for the third point, to maintain the linkage between observations and lab methods, the report includes information about the lab methods. For the fourth point, around including the actionable subsets, we see, for example, that the patient's disease interpretation was one particular section of the report. And for the sixth point, to anticipate fundamental changes, there was a portion of the report around interpretation revision. So this is a format that enables all of this information to be included within the report itself. Another important aspect of eMERGE is that the network spent a lot of time harmonizing items across sequencing centers, covering both the reports and the processes that were relevant for genomic clinical decision support. When we look at the decision support capabilities, by doing this harmonization there is potential for multiple genes and clinical information to be supported. This is because the scenarios that were focused on within eMERGE included reporting variants from 67 genes and 14 SNVs. In order to keep the decision support knowledge separate from variant classification, they maintained that knowledge so that they were able to handle ongoing reclassifications within the network.
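To make the idea of a structured, machine-readable report concrete, here is a minimal sketch of consuming such a report programmatically. The element names (`labMethods`, `interpretationRevision`, and so on) are invented for illustration and do not reflect the actual eMERGE XML schema; the point is only that the sections tied to the desiderata (lab methods, variant classifications, revisions) can be addressed individually.

```python
import xml.etree.ElementTree as ET

# Hypothetical report: element names are illustrative only,
# not the actual eMERGE XML schema.
REPORT_XML = """
<geneticReport>
  <labMethods>Targeted capture sequencing, mean coverage 100x</labMethods>
  <variant gene="KCNQ1" classification="Pathogenic">
    <associatedPhenotype>Long QT syndrome 1</associatedPhenotype>
  </variant>
  <interpretationRevision date="2019-06-01">
    Reclassified from Uncertain significance to Pathogenic.
  </interpretationRevision>
</geneticReport>
"""

def summarize_report(xml_text):
    """Pull out the sections tied to the desiderata: lab methods
    (linkage of observation to method), variants with their
    classifications and associated phenotypes, and the count of
    interpretation revisions (anticipating knowledge changes)."""
    root = ET.fromstring(xml_text)
    return {
        "lab_methods": root.findtext("labMethods"),
        "variants": [
            {
                "gene": v.get("gene"),
                "classification": v.get("classification"),
                "phenotypes": [p.text for p in v.findall("associatedPhenotype")],
            }
            for v in root.findall("variant")
        ],
        "revisions": len(root.findall("interpretationRevision")),
    }

summary = summarize_report(REPORT_XML)
print(summary["variants"][0]["gene"], summary["variants"][0]["classification"])
# prints: KCNQ1 Pathogenic
```

Because each section is separately addressable, a downstream system can update the classification or append a revision entry without touching the primary molecular observations, which is the separation the desiderata call for.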
So because of these scenarios being supported in eMERGE, the labs in particular had to come up with the infrastructure to support this and harmonize these items. In addition, there was support for a large number of gene variants while simplifying the decision support knowledge, by having this XML structure that contains what could be considered decision support knowledge, so that variant-associated phenotypes could serve as decision support if you're able to add them to a patient's problem list, for example. And then number 13, which is providing a knowledge base that can be deployed at multiple independent organizations: I think this entire process of coming to consensus around the reporting and processes within the network is something that allows for deployment across the network, because everyone is adhering to a specific standard. So beyond the initial scope of eMERGE 3, some of us went on to a pilot project that first mapped the XML-style report to an HL7 FHIR standard, and this just shows a snapshot of what came out of that mapping process, showing the portions of the report that were mapped to the FHIR standard. An enormous amount of work went into this process. When we look at the remaining items from the desiderata, by leveraging this standard we would be able to address them. The portion that I was involved in, with Luke Rasmussen, was in demonstrating the potential for decision support, and a lot of that focused on specific portions of the report that could be used for decision support, including the medical implications as well as the variant-associated phenotypes, which are not shown in this picture.
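As a rough sketch of what a FHIR-mapped report buys a decision support application, the snippet below extracts the gene and clinical significance from a single variant Observation resource. The resource instance is hand-written for illustration; in practice it would come from an EHR's FHIR API rather than being inlined, and the component LOINC codes shown (48018-6 for gene studied, 53037-8 for clinical significance) follow the HL7 genomics reporting guidance, so treat the exact shape as an assumption.

```python
# Hand-made FHIR variant Observation for illustration (not fetched from a
# real server). Components are keyed by LOINC codes per the HL7 genomics
# reporting guidance: 48018-6 = gene studied, 53037-8 = clinical significance.
OBSERVATION = {
    "resourceType": "Observation",
    "status": "final",
    "component": [
        {
            "code": {"coding": [{"system": "http://loinc.org", "code": "48018-6"}]},
            "valueCodeableConcept": {
                "coding": [{"system": "http://www.genenames.org",
                            "code": "HGNC:6294", "display": "KCNQ1"}]
            },
        },
        {
            "code": {"coding": [{"system": "http://loinc.org", "code": "53037-8"}]},
            "valueCodeableConcept": {"coding": [{"display": "Pathogenic"}]},
        },
    ],
}

def component_display(observation, loinc_code):
    """Return the display text of the component matching a LOINC code,
    or None when the component is absent."""
    for comp in observation.get("component", []):
        codings = comp.get("code", {}).get("coding", [])
        if any(c.get("code") == loinc_code for c in codings):
            value = comp.get("valueCodeableConcept", {}).get("coding", [{}])
            return value[0].get("display")
    return None

gene = component_display(OBSERVATION, "48018-6")
significance = component_display(OBSERVATION, "53037-8")
print(gene, significance)
# prints: KCNQ1 Pathogenic
```

The design point is that once the report is in a standard shape, a CDS rule only needs to know the codes, not each lab's bespoke report layout, which is what makes the knowledge portable across sites.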
So in terms of the desiderata, by using this standard the knowledge would have the capacity to support multiple EHR platforms, because vendors are now enabling web services that can read data in a FHIR format; if the EHR vendors allow access to data in those formats, then we can leverage that across multiple platforms. We are also leveraging current and developing CDS standards. There are, of course, additional phases in moving this clinical genomics FHIR standard through the approval process that has to take place in order for it to be used broadly, but this is a pretty strong step in the right direction. And then finally, to access and transmit only the genomic information necessary for decision support: ideally, we would be able to leverage the APIs, if they're supported by the EHRs, to pull genomic information directly, and in our experience we were able to do that to a limited extent, but again, this is a very good starting point. So now I'll move on to the second topic, considerations for revising the technical desiderata, and there are two experiences that I want to describe from the EHR Integration Workgroup.
One is that we had monthly virtual workgroup meetings where, at some point, we had the sites report on their lessons learned and best practices for implementing genomic decision support. Second, we had one exercise at a meeting in June of 2019 where we brainstormed potential hazards related to the implementation of genomic decision support. We focused particularly on alert-based decision support and two areas: pharmacogenomic alerts on drug orders, and new variant knowledge updates to previous results, because the second bullet was a large area of focus for the labs involved in the network. I just want to highlight that alerts and reminders are only one type of decision support, and in our conversation we might also consider other types of decision support. Another area of focus was around architecture, so we considered both ancillary genomic systems and EHR-centered management. Ancillary genomic systems kept the genomic data in a separate system, similar to a LIMS, and the EHR would only present the interpretations provided by that system. For the EHR-centered management example, this is from an example that Mayo published, where they used genomic indicators that were part of the EHR itself, so that's a case where the vendor has provided functionality to manage genomic knowledge. Focusing on the last row here in this graphic, we saw that there was a mix in eMERGE in terms of which sites were leveraging EHR-only versus ancillary genomic systems; there were four sites that were both EHR and ancillary, and two sites that were neither, among the 12 institutions involved in the network. We also captured the characteristics of implementation across eMERGE in terms of three dimensions. First was timing: whether it was prior to genomic testing, so guidance on when to test, or when to use genomic results as part of a
clinical decision; and then post-test decision support, which was, for example, upon ordering a medication, notifying the clinician that they should use the test results that are available for them. So those are the two contexts: when to test, and after testing. Then for delivery, there's passive, where you have to actively review the decision support, versus active, which we often refer to as interruptive alerts, where you might stop a clinical action from happening given the data that's available on a patient. And then context, where opportunistic is the area that all sites in eMERGE were participating in, because for testing that's used for research purposes, having a secondary finding returned to a patient was the scenario; and then three of the sites that we surveyed were doing both opportunistic and population, where population could mean a healthy individual who's getting genomic sequencing so that the results could help inform ways to prevent future risk of disease. So in our results, we saw that five sites were doing post-test only; that is, all of the sites were doing post-test, but then a subset were doing both pre- and post-test. For delivery, both passive and active decision support were in place at seven sites, while two were only doing active and one was doing passive. And then for context, again, all the sites were doing opportunistic, with three sites doing population. So now that you have the context for the people who were involved and their activities within the EHR Integration Workgroup: initially, we found that many of the sites had unique experiences in terms of implementation best practices and lessons, and so what we ended up doing was creating a venue for the sites to be able to publish their case
studies of how they implemented decision support. Really, last year they started to publish these; they've been rolling out the case reports from the sites that submitted details about their implementation as part of the network, and so these are lessons that are shared and can be drawn from. Second, we did this hazard analysis at a meeting last year, which resulted in us identifying a few themes that could impact the ability to implement genomic decision support: inappropriate alert firing context, technical issues, user experience problems, and knowledge management were the themes that came out of this. We might be able to talk a little more about that. What we found is that more in-depth hazard analysis exercises are needed to really flesh out the severity of some of these issues that came up, whether there may be additional issues, and also which level of risk is acceptable, so that we can really assess those aspects. I see the one-minute warning. Okay, thank you. So just to summarize: we have these case series articles, and there are other articles that report lessons learned, so one suggestion might be to do a content analysis of these articles to figure out what the common challenges across groups really are. And in terms of a hazard analysis, we'll be able to get into many of the implementation challenges at this meeting, but we may also want to expand further in a larger NHGRI workshop. And I'm out of time, so I'm going to stop there. Thank you. Thanks, Casey, great summary. I think we'll save most of the questions for the discussion, but maybe just to highlight one that was asked in the chat: do you have any sense of whether there are going to be differences in the implementation of these desiderata in different environments, for example urban versus rural health settings? Yes. I mean, we found within eMERGE that every site, even though we
had this common infrastructure for reporting, ended up implementing differently. So if we're considering rural versus urban, those kinds of settings, the context may be different. One example: the scenarios where genetics is relevant before testing may differ, and so how you implement will differ. Warfarin testing, for example, is more helpful in some rural settings, to be able to get to the right dose for a patient much quicker than at a site that might be able to monitor a patient more closely and make adjustments quickly. So the timing aspect of when you provide the decision support may be different. Thanks, Casey. I think that actually highlights our transition to the next talk: the importance of paying attention to diversity as we think about this broad environment. So I'll invite Janina to give her presentation. The floor is yours. Okay, hi everyone. I am giving a very, very quick talk on inherent racism and how it induces bias in algorithm development. I'll speak very broadly at first, and then I'll talk a little bit more about genetics, still very broadly. I do want to start off with a big disclosure that this is just a glimpse of information that's digestible given the time constraint; I honestly can and am writing a whole book on things like this, so by no means is this complete. I also want to say that this is not a reflection of my work at Illumina, nor does it reflect the beliefs of Illumina or my work there as a scientist; this is outside the realm of what I do at Illumina, which I do speak about, but not today. So let's talk about how race influences algorithm development. This is an example of a commercial software package, COMPAS, that was used in Florida, and really what it was trying to do is predict the risk of a person being re-arrested. It was used by judges before trial, for people who were being convicted, to determine
their sentences, but also to determine the likelihood of them being a repeat offender. In this case, they looked at two groups; let's just say one is blue and one is purple. These were the training data, and they made assumptions based off of previous data that three people in the blue group would be repeat offenders, or re-arrested, versus six people in the purple group. What's really interesting is that the people who developed this algorithm say that this was considered to be predictive parity, which would mean that even where we are making these predictions, and there's an underlying assumption that more people in one group would be likely to develop a certain outcome, the rates of false positives and false negatives would still be the same. But what's really unique about this particular algorithm is that that is not true. What you actually see is an overrepresentation: in fact, in the blue group, one person out of seven was misidentified as being high risk for being re-arrested, whereas almost half of the people in the purple group were likely to be false positives. So the false positive rates are not equal, and this becomes extremely problematic, mostly because what is not being accounted for here is that, by using these training inputs, we're not accounting for race, in this case being blue or purple, as a confounding variable. We also are not accounting for the fact that the false positive rate is higher in one group, which probably speaks to where this begins: the purple group might have already been unfairly targeted for arrest in the first place. So this is just one example of how accurately predicting data from the past can be harmful in predicting data in the future. This is one of many examples where we see inherently racist algorithms being used in the criminal justice system. To a different extent, we
see this elsewhere. Studies show that beauty contests judged by algorithms disproportionately select winners who are white over those with darker skin. Early last summer, when we all started getting on Zoom, a lot of people complained that with Zoom virtual backgrounds, their faces would disappear if they had darker skin. You also see AI, like Google Maps, with glitches in the way it reads certain words; for example, Malcolm X Boulevard is read as "Malcolm Ten Boulevard." So there is already a lot of bias in many of the systems we have. How does this look for clinical informatics? This might be something you have already seen, but one of the largest examples of commercial risk prediction, used on over 200 million people in the US, targets patients for high-risk care management. The purpose of this tool is to predict which patients will require additional attention for complex health needs before the situation becomes too dire. Those above the 55th percentile of risk are referred to a physician to be screened for the program, and those at the 97th percentile or higher are enrolled automatically. What I'm showing in the left-hand panel here is total medical expenditure, and this is really important: we look at these two groups, Black and white, against the algorithmic risk score, which is based on insurance type, diagnosis and procedure codes, medications, everything, even age and sex, except that, ironically, it excludes race. When we compare the two, we see that total medical expenditure between the groups is not different at a given algorithmic risk score. But what the authors found is that this risk score does not take into account how sick the patients were; in fact, there was a lot of bias, because the score only looked at the amount of medical expenses, which are also disproportionate between these two groups. So how did we get here? How did we get to the point where we have so much inherent racism in all of these algorithms, whether in the medical field or in the beauty industry? One of the ways I believe we got here is that the foundation of everything we do is founded on the principles of racism. For those of you who don't know, I have a podcast called In Those Genes, which uses genetics to decode the lost histories and futures of African descendants. We recently started our second season, which is all centered around what is truly genetic. The purpose of our show is to break down genetic concepts, in a somewhat documentary and educational format, using Black culture and hip hop. In the particular clip I'm playing, I talk to Delande Justinvil, a biocultural anthropologist, and the sociologist Dr.
Saida Grundy. The whole episode is focused on how race came about, a journey of how race started, but this particular clip speaks to how racism initially started with genetics, and in genetics. As defined by Galton, eugenics was "the study of all agencies under human control which can improve or impair the racial quality of future generations," and the goal was to have "superior" humans mate with one another to breed out "inferior" humans. He was a racist who was trying to kill off all non-white people, and there were a bunch of people on board, openly identifying as eugenicists. But then, in the early twentieth century, when real life science, that is, genetics, rolls around, a shift happens: many departments, programs, institutions, and even professorships of genetics today were simply renamed. This institution's department of eugenics became this institution's department of genetics; this professor of eugenics became this professor of genetics. "So all this genetics is trash, we got to cancel the whole show." While I would not argue that genetics itself is racist, these theorists and practitioners maintained former practices and ways of thought. Science is as socially and politically constructed as any other field. What all of these colonial-era, European-empire-era thinkers and scientists show us is that all of these disciplines can be corrupted to make really wrong assessments in the service of European white supremacy. So that's just the clip. When I was first making this episode, I was really caught off guard by how much I learned, having been in this field for over 15 years. But then I started to learn that this is not something of our past; it is still very much with us today, and I would say even more so a part of computational genomics, which is my expertise. One example is Karl Pearson, creator of the chi-square test, the p-value, and principal component analysis, something that I use very often as a population geneticist. He is also the person who justified eugenics in the Annals of Eugenics, a peer-reviewed journal at the time that was used to justify many eugenic theories; the name has now simply changed to the Annals of Human Genetics. You also see other examples, like Galton, the founder of eugenics, who also developed the concept of regression to the mean, originally called "regression to mediocrity." So I can't help but ask: if we have such a dark past, one inherently fueled by racism and eugenics, is there hope for us to change this? And most importantly, is there hope to change it in the context of participant engagement, given the history that still sits with us today? I show this slide often; it is one of my favorite examples of the impact of race on the clinical implementation of genetic information. On the left, I am showing genetic variants associated with hypertrophic cardiomyopathy, from a paper published in 2016. What was special about this paper is that it reported pathogenic variants in five genes that were supposed to account for 70 percent of the trait variability, or 70 percent of the genetic heritability, for this trait. But Black Americans were being misdiagnosed, because the allele frequencies between these two populations were very different; in fact, in African Americans the allele frequencies were quite common, not the typical picture we see for pathogenic variants in ClinVar. A more recent study showed that when we look at ACMG variants in ClinVar, 11.5 percent of these pathogenic sets show inflation, and this inflation gets exacerbated the less information we have to confirm these pathogenic
variants, by studies, by representation, by data, essentially. So I go back to this question of, can we do this? And the answer is maybe. There are a few examples of people who have done this, and as Casey mentioned, a lot of things are happening. One example goes back to the study I showed earlier, the algorithm that was classifying patients based on the amount of money spent on a patient rather than how sick they are. Going back to the same paper: if you look at this algorithmic risk score, which we know is biased, against the number of chronic conditions, we see that the Black patients are sicker, although they are not being put into these programs. That is because the cost behind this risk score actually represents the lack of cost invested in Black patients, despite their being sicker. Once this group accounted for this and corrected it, they showed they could increase the share of Black participants who would have been enrolled in this very beneficial program from 17.7 to 46.5 percent among those above the 97th percentile, meaning those automatically put into the program. Likewise for genetics: going back to the ACMG example, taking into account things we know impact genetic variants and disease, like disease-specific population allele frequencies, will improve the inflation, though not fully fix it. With these disease-specific frequencies, the inflation is now much smaller. However, this is only possible if we have representation in genetic studies, and the lack of it is a manifestation of a different form of racism, in this case the lack of trust between diverse communities and research communities. So it's a two-edged sword: we know we can improve this with more data; however, we haven't gotten to the fundamentals of why we have these issues, why we have this lack of representation, which we know largely impacts our ability to do this well. Another thing this paper found, which is not so surprising, is that one of the burdens was that many of these were rare variants, and as we know, rare variants are population-specific and, most importantly, underrepresented. But I do believe there is hope; I believe we will be able to do this. One thing we have to remember, though, is: are we ready? And when I say ready, it goes beyond how many algorithms we can create to fix it. It really speaks to how we can build a new system to establish trust, fix the technical issues built into these algorithms, which is the work we're doing, and also include the entire ecosystem of people who build the algorithms. When people ask what it will take, this slide doesn't do it justice; it's actually an entire ecosystem. But this paper, published just last week, addresses one of the issues, and Ruha Benjamin talks about this too: the coded inequity being perpetuated comes from those who design the algorithms and tools, and those designers are not thinking carefully about the systemic racism that underlies them. This is a problem that also affects many diverse scientists: in this paper, they show that for R01 applications from 2014 to 2016, the award rate was twofold lower for Black scientists. What's really nice is that the paper also goes through the exercise of showing what it would take to make things equitable; this little red dot represents what it would cost to implement an equity policy relative to the NIH annual budget. Figure B also shows that if we implement
a lot of practices, like making sure we have anti-racist reviewers, we might also be able to increase the retention of Black scientists who are doing this work. As we know, these scientists have extra insight into how we can approach these problems, and being able to develop these algorithms is a critical part of facilitating community engagement and of making sure we have less bias in these systems. That's all I have. Like I said, this is a very short presentation, but I'll end with this last clip from the podcast. "Given all that history, how should we engage with genetics, given what we know about it being fundamentally built from racist ideologies? I think that we first and foremost should not shy away from acknowledging exactly that, constantly reflecting on the ways that it still informs, shapes, and contours how we operate today. It's also about understanding that we're more complex than simple scientific measurements. We should hold genetics in tandem with the other ways of thinking about how we understand ourselves. How do we think of the biological alongside the cultural understandings and conceptions that we've always had? How might you think about a genetic ancestry test alongside your family's oral histories?" I know I'm way over time. Thank you so much. Thank you, that was really terrific, and in many ways really eye-opening. We'll transition to the broad discussion, but I did want to give you a chance, Janina, to comment on a question put in the chat by Geoff Ginsburg, who raised the broader issue of how we think about the construct of race and how that information gets captured in the electronic health record. Yeah, one thing I would like to see us move away from is using race as a term in general. One thing you'll notice throughout the presentation is that I put "race" in quotation marks, because I largely believe that the more we use the word, the more we are continuing the practice. One way we can start to move away from that is by tapping into what we do as scientists: we thrive on numbers and quantities, and if we also engage in that when we recruit people and put these things into our medical systems and EHRs, that's one way of eliminating that bias. I think genetics is a great opportunity to make sure that what we're exploring, testing, and studying is the genetic architecture, and not racism; those are two separate things, and one of them impacts clinical outcomes. Not to say they're unrelated, but when we communicate this and start to talk about it, we have to make sure those things are kept separate and that participants understand that as well. Thanks, and at this point I'll turn back over to Lucila to moderate the general discussion. Thank you to all the panelists. It's been very enlightening to see the chat going in two directions: one technical, how do we get this done and implemented, and the other, from the last presentation, how the injustice of some practices perpetuates and how we are at risk of amplifying it. Both threads lead to discomfort, discomfort of different types. I would ask the panelists to discuss exactly that: what kind of genetic information do you see as most helpful to include immediately in the electronic health record, and what do you see as the advantages, disadvantages, and potential perils of having that information be misused? Let's start with Mark, then Casey, then Janina. Great, thank you. We focused our efforts at my institution on high-impact genetic variation, looking at a relatively small set of genes where we understand the gene-disease relationship well, where we have a reasonable understanding of the mechanism of variation that can lead to disease, and also some
information about the prevalence of disease that is likely to result from that variation. In particular, we focused on the CDC Tier 1 conditions: BRCA1 and BRCA2, Lynch syndrome, and familial hypercholesterolemia. That seems very straightforward, but I think the slide that Dr. Jeff presented is the one that's really critically important: all variants are not understood equally. In our population, which is predominantly Northern European, we have very good data that help us interpret variation pretty effectively; if our population were much more diverse, we would have a dramatic increase in variants that we currently do not understand, or that might be misannotated as pathogenic when they are not, which could lead to an inordinate amount of harm. So the whole point is that even when you focus on something we think we understand very well, the reality is that in the context of genetic variation across populations there's a lot we don't understand, and a lot that needs to be done to really move that forward. Thank you, Mark. Casey, your impressions?
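As an aside, the frequency-based reassessment that both Dr. Jeff and Dr. Williams describe can be sketched in a few lines of Python. The variant IDs and allele frequencies below are invented for illustration; the 5% cutoff follows the ACMG/AMP BA1 stand-alone benign criterion, while the lower cutoff is a placeholder for a disease-specific BS1-style threshold, not a published number:

```python
# Sketch: flag "pathogenic" variant calls that are implausibly common in
# at least one population -- common alleles are unlikely to cause rare,
# highly penetrant disease (cf. ACMG/AMP BA1/BS1 frequency criteria).
# All variant IDs and frequencies here are hypothetical.

BA1_THRESHOLD = 0.05    # stand-alone benign evidence (ACMG/AMP BA1)
BS1_THRESHOLD = 0.005   # placeholder disease-specific cutoff (BS1-like)

variants = {
    # variant_id: {population: allele frequency}
    "var_A": {"EUR": 0.0001, "AFR": 0.0002},
    "var_B": {"EUR": 0.0001, "AFR": 0.02},   # rare in EUR, common in AFR
    "var_C": {"EUR": 0.08,   "AFR": 0.07},
}

def reassess(freqs):
    """Reassess a 'pathogenic' call using the maximum population frequency."""
    max_af = max(freqs.values())
    if max_af >= BA1_THRESHOLD:
        return "likely benign (common in at least one population)"
    if max_af >= BS1_THRESHOLD:
        return "reassess (more common than disease prevalence allows)"
    return "frequency consistent with pathogenic"

for vid, freqs in variants.items():
    print(vid, "->", reassess(freqs))
```

The point of the sketch is that if only European-ancestry frequencies were consulted, var_B would look rare and keep its "pathogenic" label; it is the second population's frequency that flags it, mirroring the hypertrophic cardiomyopathy misclassifications discussed above.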
So I guess most recently at Hopkins I've been interacting with the clinical geneticists, and while it may be different at different sites, because quite a few patients come here for genetic diagnosis, one area I've been looking into more is reanalysis of exome sequencing over time. For the genes we know a lot about, that may be a place, at least in this setting, where we can explore implementation more fully, because it's happening regularly. I think it would be helpful for those doing implementation projects to know what types of genetic assessments are being done across institutions, to help focus on what's appropriate for different sites. A point was also brought up about making sure we're not just focusing on highly resourced academic institutions, so understanding what would be useful for sites that may not have the same profile as, say, an eMERGE site would be valuable. And Janina? Hi, could you repeat the question? I missed it while reading the comments. The question was: we have the technical ability and the opportunity to include genetic information in electronic health records; do you think that carries the potential danger of biasing, even more, what clinical medicine is doing? Okay. I think we see that already, but I do think that, again, this is avoidable. It does require rigorous work: not just creating algorithms but testing those algorithms for bias in all the many ways you can find it. In the slide I showed with the blue and purple people, one of the reasons they were colored blue and purple is that the developers actually didn't use race at all. So we see, and I think Tara mentioned this, that even when you don't include these variables, you still have data that is shaped by them. This begs the question: if all the prior data we use to develop an algorithm is already biased, then we're starting from a biased point, which would mean that to really eliminate the majority of the bias we would need to start over. Of course, no one wants to start over, or to build these things from the beginning, but that is one solution. I do think there are other solutions out there; I don't think we have figured it out, and it's a very complicated question that bringing genetics into the equation will not solve. We can probably keep it from exacerbating the problem, but again, we have to correct for all of the data we're already using, which is already biased. So it's a complicated answer. If I could just follow up on something very provocative that Dr. Jeff just said, the idea that none of us wants to start over: the focus of this meeting is really about research strategies, so I'm curious whether you have ideas about different ways we could frame research, particularly in genomics and informatics, that might help us at least partially restart and set things up on a more level playing field.
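The point raised just above, that a model trained on biased historical data reproduces the bias even when race is excluded as an input, can be illustrated with a small simulation. Everything here is hypothetical and only loosely inspired by the cost-versus-need example discussed earlier: the group labels, the spending rates, and the enrollment cutoff are all invented for the sketch:

```python
# Sketch: why a "cost" label reproduces bias even with race excluded.
# Both groups are equally sick by construction, but historical spending
# on group B is lower, so ranking patients by cost under-enrolls group B.
import random

random.seed(0)  # deterministic for reproducibility

def make_patient(group):
    sickness = random.randint(0, 10)             # chronic conditions
    spend_rate = 1000 if group == "A" else 600   # less spent on group B
    cost = sickness * spend_rate + random.gauss(0, 500)
    return {"group": group, "sickness": sickness, "cost": cost}

patients = [make_patient("A") for _ in range(500)] + \
           [make_patient("B") for _ in range(500)]

def share_of_B(key_fn, top_n=100):
    """Fraction of group B among the top-ranked patients under key_fn."""
    top = sorted(patients, key=key_fn, reverse=True)[:top_n]
    return sum(p["group"] == "B" for p in top) / top_n

by_cost = share_of_B(lambda p: p["cost"])      # biased proxy label
by_need = share_of_B(lambda p: p["sickness"])  # actual health need

print(f"share of group B enrolled, cost label: {by_cost:.0%}")
print(f"share of group B enrolled, need label: {by_need:.0%}")
```

Ranking by the cost proxy enrolls almost no group B patients even though both groups are equally sick by construction; ranking on need itself removes most of the gap, which is essentially the relabeling correction described earlier in the session.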
Yeah, that's a good question. I really don't have all the answers, or even know where to start, but I do think we've gotten really good at correcting bias in the most urgent, dire situations; if COVID-19 is not a good example of that, I don't know what is. So I know we are capable of doing it, and smart enough to create these methods. I do think it will take a lot more effort than we originally anticipated. We already know how complicated genetics is, how it interacts with all the different omics, and then social factors and all these things; that level of complexity will still need to be in place when we start to think about how we develop these tools. So I think it's possible, and there are ways we can do it, but it requires a lot of testing, a lot of scrutiny, and a lot of data. And when we get to the question of the data, which I think we have talked about too much without actually exploring the why — I say too much because for a lot of my career I've always heard "we need more diversity, we need more diverse samples" — there was a paper that came out around COVID-19 showing that having more samples may not really make a difference if all the data are already biased. So I think that is actually the wrong question. Yes, there is a sample-size issue, but there are also many issues that go beyond just having the N: issues with the technology we develop around studying diverse populations, all the way down to the questions we're asking. So it's not just the N; it's also everything else that affects it. The last thing I'll say, and the way we talk about this in eMERGE, concerns the way we do recruitment and the way we engage with communities. We have to remember that it shouldn't be a transactional relationship, where a participant comes and gives something while also suffering through systemic racism in the healthcare system and all these other issues. In fact, the paper looking at the algorithm with the cost differences in hospitals actually talks about Henrietta Lacks: how this algorithm would have specifically impacted her, and how she would still likely have died. So there are many things we haven't talked about, but we have to shift the conversation to how we can make this impactful for everyone. I'd like to add to that a little bit. I completely agree with what Janina was saying about being able to reduce bias by improving how we recruit and retain study participants. On the other side, in developing algorithms, it will not always be possible to get the biggest and most diverse population, so there are approaches we might draw from, such as transfer learning, or ways to build a model at one site and then test and refine it at other sites so that it works for other subpopulations; we might be able to draw on that in genomics research. Also, new evaluation frameworks: I think this came up already, but how we evaluate models and algorithms, and individual gene variants, should include the impact on different subpopulations, and potentially identify when we're missing certain populations and how to refine for them later. Great. I also see in the chat a lot of back and forth about how we gain trust. How could investments be made to improve that, in your opinion? Janina, why don't you
take the lead on this one? You know, one thing I talked about throughout this talk is that even I, a person who works in this field, didn't really know the history behind our discipline. That highlights the lack of information the public has about our discipline. Speaking particularly to African-descent populations in the US, most engagement with genetics is typically around genetic ancestry, and if we think sociologically about why that interest exists, it too is connected to racism. But we don't have a lot of transparent conversations with participants about exactly what we're studying, why we're studying it, whether there is a benefit, and being honest if maybe there is no benefit yet. A lot of the time, when we think about data, data always benefits the researchers: when we get data, we publish; when we get data, we get grants; we become more successful in our careers. As a minority person, if you give data, do your health outcomes improve over time? What is the benefit for you? You don't really see that benefit; in fact, you give data and you still become a victim of systemic racism in the healthcare system and in the world. So one place to start is thinking about how to be transparent and to communicate, even through some of these uncomfortable conversations around why we need data, and so eliminate the transactional relationship. To do that, we have to really sit and think about what benefits we can give our participants, so that they see the value of giving data the same way a researcher sees the value of receiving it. Those two things should be equitable: the person using the data should benefit equally as much as the participant, particularly when participants are being disproportionately affected by some of the systems and practices in place that may have even founded the discipline at its start. So that's a very simplistic answer to a very complicated question. Yeah, there was a really interesting point there that, I have to admit, has eluded me. We've worked very hard in our system to create a partnership between our participants and the researchers, and tried to establish the type of value proposition you're describing. But when we talk to people about why they choose to participate in research, the dominant theme, and I think this is relatively generalizable, is one of altruism: "I may not benefit from this, but this may benefit others, my children, or my grandchildren." In some ways that's a very privileged answer, because it reflects a background where we probably have actually experienced that type of transfer of knowledge, which has in fact benefited us from prior work, whereas in other populations and cultures that has not been the case. To expect that altruism will somehow rule the day and lead to increased diversity is likely wrong, in which case we really have to re-examine that value proposition to make it equitable, because we're not starting in the same place. And to add to this idea of supporting a value proposition: there is an opportunity to show how individuals who participate in research have contributed to science; there is value and appreciation in returning results, or returning what has been disseminated from the data they've contributed. That also requires rethinking our research infrastructure a little, because there would have to be some kind of link maintained between the participant and the research they're
participating in. Doing that adds another layer of risk for participants to some extent, but it really means talking to people and knowing whether the benefit outweighs the risk in this context, and whether there should be ways to provide regular updates, or to allow study participants to say, for instance, that if there are findings in certain areas they would want to know, because their families are affected. There is of course a large bioethics conversation on this topic, but those bioethics conversations could go hand in hand with technical conversations on how to support these processes. Thank you so much. I would like to pass back to Rex to conclude this session. Thanks. I would be remiss if I didn't point out, based on this conversation, that one of the grand challenges NHGRI put forward was the concept of removing race from genomics and genomic medicine, and I think that's going to be a very interesting, and important, challenge for us all to work on. We've heard a few themes to think about, and I'll build on Geoff Ginsburg's great comment in the chat: in the light of removing race from genomics and genomic medicine, we need to pay attention to the heterogeneity of the health systems engaged in our research, not just major urban medical centers but also smaller health systems and rural systems; we need their participation as well. We need to pay attention to the heterogeneity of the populations we engage in all of our research. And I think Janina made the great point that we need to be constantly testing for bias and racism in our data sets, to make sure we're using the best possible data, and even using it to fix and remedy the injustices of the past. So, a lot of really great discussion in this session, and I think we can turn it back over to Ken and Mark as we go into a break. Thank you. Yes, thank you so much to the speakers and our co-moderators; I thought this was a wonderful session to start off with, and thank you again, Dr. Green, for the update on the strategic vision. We now have a 10-minute break; we'll give people time to get up, move around, and think about this session, and we'll reconvene at 1:55. Thank you, everyone. Mark, is there anything else you want to add, or shall we close out this session? No, we'll see you in 10 minutes. Okay, thank you.