So welcome, everyone. For those of you who are regular attendees of our seminars, this is actually a new seminar, which we are kicking off: our monthly seminars that are open to the entire intramural program. And we have our inaugural lecture for that, which is Heidi Rehm. Heidi is not only a friend and a great colleague, she's a great critic, and I have to say I always enjoy working with people who not infrequently tell you you're just wrong about something, and you enjoy the interaction. It's a great thing; everyone should have colleagues like this. For those of you who don't know her, Heidi is co-director of the Program in Medical and Population Genetics and a member of the Broad Institute. She has a great title, chief genomics officer, in the Department of Medicine at Mass General Hospital, and she is a professor of pathology at Harvard Medical School and a faculty member of the Center for Genomic Medicine at Mass General. So Heidi, please take it away. Oh, I forgot the logistics, sorry. For those of you who are online, chat has been disabled, I'm supposed to say, and if you have questions, please input them in written form in the Q&A box, and I'll be watching those. And Heidi, do you prefer questions ongoing or at the end?
Okay, so if you have a question at any juncture, please put it through and we will consider it, and Q&A at the end; if you're in person, Q&A at the microphone at the end. So welcome, Heidi. It's great to be here and share with you a little bit. I'm going to stand off to the side; when you're short, you like to stand not behind a podium. So I'm going to share a little bit about what I work on, and it spans a few different areas. There we go. I'm going to first talk a little bit about our rare disease research and the programs we run, and that will lead into thinking generally, as a field, about solving some of the challenges in rare disease diagnosis and discovery. Along the way I'll talk about a lot of the resources that we build, because my goal is not just to do the work myself but to build tools and resources that the entire community can use. Then at the very end I'll put on my chief genomics officer hat and talk a little bit about my role in the hospital setting and some of the work we're doing to actually get this work to patients. With that, I'll start out with the rare disease side. I think it's worth thinking about where we are today in rare disease discovery. There have been over 4,800 genes implicated in at least one disease, but depending on whose cohort you're looking at, 50 to 75 percent of cases remain unsolved after we perform exome or genome sequencing, and there's some evidence for at least 5,000, perhaps as many as 10,000, more genes yet to be discovered for monogenic disease.
So we still have a lot of work ahead of us. Daniel MacArthur and I started our Center for Mendelian Genomics a number of years ago. It's now led by Anne O'Donnell-Luria, Mike Talkowski, and myself, and we have a large team that works on everything from recruitment to methods development to analysis, and development of the tools that we use to do all our work. We appreciate the funding, including from NHGRI and other NIH sources. This team is really how we get our work done in rare disease analysis, but equally important is the fact that we partner with lots of colleagues. These are many of the physicians and researchers that we partner with, each of whom is recruiting their own cases in their respective disease areas. We also partner with patient organizations, shown on the right, and have our own direct recruitment into what we call the Rare Genomes Project. That has led to over 20,000 samples from over 8,000 families, from over 50 collaborators in 57 countries. It brings us a lot of rich data to work with, and those partnerships have been incredibly valuable. Now, one thing we did discover: because recruitment for our Rare Genomes Project is online, anyone can be virtually consented and sign up, but we definitely realized that we were not pulling from as many underrepresented populations. So over the last year we've been trying hard to come up with strategies for better representation and to understand the barriers to enrollment. We've done a number of things. Instead of just asking our clinicians to tell the patient about the study, we actually have them fill out a form and send it to us, and then we do outreach, because sometimes these families are just too busy to put contacting some study at the top of their list. And that has improved our enrollment of underrepresented individuals.
We've also hired lots of multilingual staff. We now have staff that work nights and weekends, so that we can adhere to the timelines when these families are available. We've now contracted with Quest for mobile blood draws, so they will go to the individual's house on weekends or evenings to draw blood. These are all many different barriers, and addressing them is starting to improve our enrollment of underrepresented individuals in rare disease diagnosis. Another way that we really democratize the work we do, and crowdsource it in a lot of ways, is the seqr genomic analysis platform that we have built. We've made it open source. This was originally started by Danny MacArthur. We now have it installed on the AnVIL platform, so the main instance that we use is the same one that anyone can come onto the AnVIL platform and use. It allows all sorts of different types of analyses; we bring in RNA-seq data, for example. There are now over 2,000 people from over a hundred countries that have performed analysis with this platform, and we constantly add new features and data types, so everything new that we bring in, everybody else can get access to as well. I mentioned that's on AnVIL. Not only is the platform on AnVIL, but all of our data is also shared on AnVIL, both through the DUOS system as well as now as part of the GREGoR Consortium dataset. Just as an example of the benefits of data sharing: this was one of our cases that we hadn't yet solved, but Aaron Quinlan, who is a bioinformatics and computational scientist, was testing out his new structural variant caller on our data and solved one of our cases. So, just an example of the importance of putting all your tools and data out there, because collectively we can make more discoveries together. I definitely encourage the use of platforms like AnVIL, which is NHGRI's cloud-based platform, for sharing your tools and pipelines as well as your data. So what
progress have we made in our own studies? This is our solve rate for both exome and genome data, and a member of my team has parsed it into solved, candidate gene (meaning at the time we discovered it, it was still in that candidate state, needing more evidence), or unsolved. This has now led to a lot of these candidates becoming fully fledged novel gene discoveries that have been published, working with many of our collaborators. So I'm going to parse this table into about 35 percent positive, 45 percent negative, and 20 percent inconclusive, and now the question is: okay, what do we do next? How do we follow up these cases? Obviously the solves are done. But for the unsolved, some of these might be variants that are missing from the actual sequencing data. So one obvious step when we start out with exome is to move to genome; that's one way we can find additional variants that might have been missed in the first round, when they're actual physical variants. We did some analysis of what the added value of genome is, and this chart shows the additional discoveries that were made by our genome analysis after exome. But we should clarify: in a lot of cases people say, "Oh, I added 30 percent yield." Well, in a lot of those cases it's just that those variants would have been found by exome, but you did your exome a long time ago and you hadn't looked at it since. So we differentiated the things that, still to this day, a current exome with current analysis would not have detected, and then parsed those into the ones that require genome sequencing. Those are mostly due to structural variants, coding variants where there's consistently poor coverage in exome, the deeper intronic variants, and the triplet repeat expansions. Those are the things that genome actually adds, and when you get rid of things that should have been caught by a current exome, it
really adds a yield, in our hands, of about 8.3 percent. So that's real additional technical yield. The other thing is, like many groups, we are using different types of long-read sequencing to clarify regions that are difficult to get from short reads. We're in the process of analyzing a lot of long-read data from both ONT and PacBio, and I don't yet have that data summarized, but this is just one example, shown there, of a region that was very difficult to sequence, where we were able to detect a 1 kb deletion that was de novo in that region. The other tool that we've been using quite a bit is RNA-seq. In that case, it's an example where the variant is present in your data but you're overlooking it, because, especially if it's deep intronic, it's just really hard to figure out that it might be causal. Although, I will say the SpliceAI tool, which was developed based on machine learning, is actually one of the much better tools in this arena. If you haven't implemented SpliceAI, I do recommend it as a tool that has done a really good job at finding cryptic splice sites and other types of splice variation. We then follow that up by getting tissue samples where we can look at RNA-seq and see if we can show an actual change in splicing; it really helps with non-coding interrogation. So this is a case of a patient with centronuclear myopathy, with hypotonia, weakness, ptosis, and reflux. There was clinical suspicion for myotubular myopathy, but they were negative for MTM1, which is the known gene for that, and negative by exome sequencing. In the end we did trio genome sequencing with RNA-seq on muscle tissue. We analyzed for all the types of variants, but ultimately we were able to show, with the RNA-seq, very low expression of that exact suspected gene, and then we were able to show the alteration in splicing within that gene. It turns out that there was a retrotransposed LINE element
inserted into intron 3, and it was actually also mosaic in the mother. The only reason we were able to really zero in on that is because of the RNA-seq data that could detect it. This was an even more interesting case. This was one of our Rare Genomes Project cases, and what it was showing was a whole lot of genes, some of which were under-expressed, some of which were over-expressed. We looked at all of these genes that were under and over, and we could not find anything obvious. But then we looked at the retained intron signal. What you can see here is that this particular sample was off the charts compared to all of our other RNA-seq samples in terms of retained introns. Interestingly, when you went in, it wasn't just all introns; there were select introns that were retained where nearby introns were not. It turns out that the causal gene is actually a non-coding RNA gene, RNU4ATAC, that is part of the U12-dependent minor spliceosome. About 0.5 percent of the introns in our genes are spliced by this spliceosome, and it was these introns that were being retained. Because of that retention, in some cases a gene was showing over-expression, because there was more RNA material; in other cases, we think the under-expression was rapid degradation of the transcript, leading to less. So you're seeing effects on both sides. That was an interesting case, and overall, if we look across the 170 cases that we've done with RNA-seq, we've added solves in 14 percent of cases by adding RNA-seq data to our analysis. The challenge is that this works great for diseases like muscle diseases, where you can get access to tissue that is relevant to the disease and the gene is going to be expressed. It gets a little more challenging when you have disorders like eye disease and neurodevelopmental disorders, where taking the brain biopsy
and the eye biopsy are a little more challenging. But sometimes those genes can be expressed in blood or in fibroblasts, so it's sometimes worth trying even if you can't get access to a disease-relevant tissue. But then, moving on to the last bucket here. Well, actually, let me go back for a second on this last one. In this last bucket, we might have detected a variant, but it was classified as a variant of uncertain significance. So, thinking about the exome reanalysis paradigm, and how often we find discoveries when the first time we looked at the data we didn't see it: our solve rate in reanalysis is about 17.7 percent. So, amazingly enough, the highest-yield tool that we have is just going back later to the same data. What we're now working on is a more rapid, automated reanalysis pipeline, and we've been working to build that. The pipeline has been developed in collaboration with Daniel MacArthur's team in Australia, our team at the Broad, and a Microsoft team that we've been collaborating with. We're basically building a tool that brings in ClinVar data, GenCC, also OMIM, all the knowledge sources, brings in some of the predictors like the SpliceAI tool and other things, and puts all of this into an automated pipeline that then runs on our data. We validated it initially by showing that we were able to catch a pretty high percentage of the variants that we'd already caught, and we've now been looking at specificity and time efficiency. We have compared it to other tools like Exomiser and LIRICAL, and our recovery rate is higher than some of the existing tools that are out there. We definitely don't want a tool that's going to trigger every case every round, because that just creates work. What we found is that there's about one candidate variant per 200 cases per month coming up, and we've made five new solves
over the past year. So it's actually a reasonable rate of things to review, and then you're just reviewing those cases that get flagged. This is the pipeline that we're going to be putting into place now that we've validated it. This is also something key to think about: there's a ton of data that goes to clinical labs. A clinician orders a test, it gets run at a clinical lab, and some of those labs will offer one free reanalysis after a year, but after that, that's it. So what we're trying to work on right now is bringing all of the data from clinical labs into the AnVIL platform, and then allowing those clinician-researchers to look at their cases in our seqr platform, as well as have them run through this automated interpretation pipeline. This is a project that's partly funded through the AnVIL clinical resource, an add-on to the AnVIL grant, and we're working with GeneDx as the first clinical lab where we'll set up a pipeline. This means the clinician doesn't have to go ask GeneDx for the data themselves, get the data on a hard drive, and go load it someplace, which is what's being done now. We're going to set up a pipeline where the clinician just tells the team what the sample is, and the team will manage the GeneDx-to-AnVIL pipeline, and then also the pipeline for joint calling and loading that data into seqr, as well as running this automated interpretation pipeline. When there are new findings, it will actually send an email to the clinicians saying a new finding is there if you want to look at it. So they can look at the data themselves immediately when it gets in there, but this pipeline will also be running. This is infrastructure that's not yet in place, but we're working with GeneDx, and once we get it working with GeneDx we'll add other clinical labs. We really hope that this is a way to reuse this clinical
data and bring it into research in the future, at more scale, I should say. Okay, then going back down to the bottom section. Many of us find those dreaded VUSs, and they may be in novel genes or they may be in known genes. So how do we really robustly tackle this challenge? I think it's worth stepping back and thinking about the overall challenge we have in rare disease. Most rare diseases are actually incredibly rare, and this figure, which I've captured from the Orphanet group, on the side here, is basically showing the following. You may hear people say that one in ten individuals has a rare disease; collectively, rare diseases are quite common, and that is true. But the challenging thing is that 80 percent of those people with rare disease have it due to the most common 150 rare diseases, and in fact most rare diseases, the 3,000 here, represent less than 1 percent of people with rare disease. These are much rarer than one in a million, so there is little likelihood that any given physician or clinical lab or researcher will ever see more than one case. So there is no ability to build evidence to implicate these very rare diseases and the genes that cause them unless you collaborate and share data. That is why, now about eight years ago, we built this platform called the Matchmaker Exchange. The idea was: somebody has a candidate gene, and you just want to find that other case somewhere on the planet with the same candidate gene. But finding that person was challenging, so we put it into a platform called the Matchmaker Exchange.
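To make the matching mechanics concrete, here is a hedged sketch of the kind of JSON body a node submits for a match under the Matchmaker Exchange API. The patient ID, contact details, gene, and HPO terms below are illustrative placeholders (MTM1 with hypotonia HP:0001252 and ptosis HP:0000508), and real nodes additionally require a registered authentication token and the MME-specific content type, which are omitted here; this is not the speaker's own implementation.

```python
import json

def build_match_request(patient_id, gene_symbol, hpo_terms):
    """Assemble a minimal Matchmaker Exchange /match request body:
    a candidate gene plus the patient's observed HPO phenotype terms."""
    return {
        "patient": {
            "id": patient_id,
            # Contact info is required so matched groups can connect;
            # these values are placeholders.
            "contact": {"name": "Submitting investigator",
                        "href": "mailto:investigator@example.org"},
            "features": [{"id": term, "observed": "yes"}
                         for term in hpo_terms],
            "genomicFeatures": [{"gene": {"id": gene_symbol}}],
        }
    }

request_body = build_match_request(
    "demo-patient-1", "MTM1", ["HP:0001252", "HP:0000508"])
print(json.dumps(request_body, indent=2))
```

A node receiving this body would score it against its own patients and return any matches, at which point the two groups can share detailed phenotypes directly.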
We now have nine nodes connected from around the world. Nobody's data moves; it all sits in their own databases, but we connect all these databases through a common API, so that when we put a variant or gene in our seqr database, it queries all of these other databases, and if there's a match, it returns that, it connects the groups together, and then they can share detailed phenotypes, and if it's a discovery you can co-publish it. From our own experience, we've submitted over 1,500 unique genes into Matchmaker Exchange. We've received over half a million queries from other databases, and that has triggered 13,000 matches to candidates in seqr. Overall, this has led to over 300 novel gene discoveries; almost every discovery that we and our collaborators make has benefited from matches in Matchmaker Exchange. I would argue it's one of the most useful tools in rare disease research in terms of its data sharing element. So that is the current platform, but here's the challenge: for many of our cases, we don't have a candidate, and there's nothing to put into Matchmaker Exchange, right? So what do we do with those? Or it's a VUS in a known gene, so there's no candidate; it's just one of the many dreaded VUSs we find in every gene we look at. So we're working on the next step of this game. Right now, what we like to call Matchmaker Exchange is two-sided matching: both groups have a candidate. Now what we want to do is one-sided matching: one group has a candidate, and they just want to query all the raw data around the world. Seems easy? No, it's not.
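A one-sided query of this kind amounts to asking every connected database "has anyone seen this exact allele, and in whom?" The sketch below uses field names in the style of GA4GH Beacon query parameters, but the coordinates are dummy values and the response handling is mocked rather than taken from any real node, so treat it as an illustration of the idea, not a working client.

```python
def build_variant_query(chrom, pos, ref, alt, assembly="GRCh38"):
    """One-sided query: we hold a variant of interest and ask
    connected databases whether the same allele has been observed.
    Field names follow Beacon-style conventions."""
    return {
        "assemblyId": assembly,
        "referenceName": chrom,
        "start": pos,
        "referenceBases": ref,
        "alternateBases": alt,
    }

def summarize_hits(hits):
    """Split returned individuals into affected vs unaffected;
    hits in unaffected parents quickly argue against causality."""
    affected = sum(1 for h in hits if h.get("affected"))
    return {"total": len(hits), "affected": affected,
            "unaffected": len(hits) - affected}

# Mock response: half the carriers are unaffected parents and the
# rest have non-overlapping phenotypes, which would largely rule
# the queried variant out as causal.
mock_hits = ([{"affected": False, "relation": "parent"}] * 4
             + [{"affected": True, "phenotype": "non-overlapping"}] * 4)
query = build_variant_query("1", 123456, "C", "T")
print(query)
print(summarize_hits(mock_hits))
```

The point of federating this is that one query fans out to every node, rather than the submitter logging into each database in turn.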
Eventually we'll want to do zero-sided matching: nobody has any hypothesis, we just bring all the data together and we'll solve everything. That's eventually where we're headed, but sharing data globally across countries is no small feat. So we are trying to ease into this with what we call one-sided matching. There are groups that have been opening their databases to allow query of the raw data, and the first one that I'm aware of is VariantMatcher. It's from the same group at Hopkins that built the GeneMatcher platform that's part of our Matchmaker Exchange network, led by Nara Sobreira. Here, I can just sign up, as I have, for a username to their database and log in. I entered my variant with the phenotypes in my case, and immediately got back a list of eight individuals who have that variant, along with their phenotypes, which I've highlighted in red. You'll quickly see that half of the individuals are unaffected parents, and the other half have four different phenotypes that don't overlap with mine, so I largely, quickly ruled out this variant as causal. Ruling out a variant as causal is equally as useful as building the evidence for it; this was just an example of ruling one out. So we're now working to build a federated variant-level matching platform around the world. We're working with the Global Alliance for Genomics and Health on this, same as we did for the Matchmaker Exchange. We have an API, the Beacon v2 API, that will be used to connect all these databases, and these are many of the groups that we are currently working with to hook up. The first node-to-node connection has gone live, between VariantMatcher and Franklin, which is a commercial platform used by many clinical labs. I'll just show you: now, if you are in VariantMatcher using that platform, by default, down here,
you will also be querying the Franklin database, that commercial platform. Similarly, if we go over to Franklin, when you're in that database that lots of clinical labs use, you will automatically see whether the variant is in VariantMatcher. In this case it wasn't; in this other case the variant was found 726 times, with 144 homozygotes, and when it's found you can click on the link and it drops you right into VariantMatcher, where you can see the phenotypes. This makes it far easier than if I had to go around and, one by one, log into a thousand different databases to query for my variant. So this is a way that we're really starting to develop this platform. This is the first node-to-node connection, but we're working on connecting our seqr platform to this federated platform, and we'll be bringing lots of groups into the whole system. This is just one step in that global data sharing approach that will get us to really understanding the variants and solving these cases. Another approach in the data sharing realm is getting access to large-scale genomic data very easily, without having to log into any database; you can just see it publicly. We just launched, this fall, the next version of gnomAD, where we're now up to 807,000 exomes and genomes. You can see from the past sizes of the database that this was a pretty significant jump in added data. Interestingly, a lot of the data we added was from the UK Biobank; almost half a million samples came from the UK Biobank, which, unfortunately from a diversity standpoint, is a lot of European data (the big blue bar there). However, despite the fact that we added a lot of European data, we did add 169,000 non-European samples, including an increase in Middle Eastern data. So across pretty much all of the underrepresented populations
we were able to add a lot more diversity. Now, interestingly, what gnomAD is best at is ruling out variants, because they're at too high an allele frequency to be implicated in rare disease. That is mostly what all of us use it for. We also use it for gene constraint, to find genes that are likely to be implicated in monogenic disease, but mostly it's to exclude variants. So a key question is: how many variants did we convert to being too common to cause rare disease when we added this data? On the left side here, that little animation (oops, try it again) was going from v2 to v4; you saw all those European samples. If you were looking at the right side, this is the number of variants with an allele frequency over 0.1 percent, which largely rules them out for monogenic disease. If you looked carefully, what you saw is that despite adding another half million European samples, the size of this bar barely moved. However, it moved hugely in the underrepresented populations, and that tells you that we've mostly saturated our understanding of common variation in Europeans, but we have not even closely tapped our understanding of common variation in other populations, in terms of access to that data. So it is immensely critical that we focus on the collection of diverse datasets for monogenic disease analysis. Our next launches are v5 and v6; I'm not sure which order they'll be in, but v5 will focus on adding the next call set from the All of Us program, 450,000 samples, over half from underrepresented populations, and for v6 we're working on a federated gnomAD. To date, we've only been able to build gnomAD by bringing all the data physically together and processing it, so you're seeing it as a joint-called dataset. We are now working with collaborators in all of these countries here, where there are groups that are willing to process their data with the approaches we use in gnomAD and generate a final aggregate allele frequency dataset that they can then send outside
their country, whereas they can't send us the raw data. We could then aggregate the aggregate datasets and launch what is essentially a federated representation of gnomAD, though it's all still in one database at the aggregate level. It's going to be a bit of a heavy lift in terms of the computational approaches we need each of these groups to run, but that is our commitment to working on ways to bring much more diverse datasets into gnomAD. So, down in this bottom bucket, there are different ways we're dealing with these VUSs: Matchmaker Exchange, the variant-level matching, gnomAD. Of course, we also partner with groups that can run functional studies on our variants to help interpret them. But the other challenge is, with the constant evolution of evidence for the role of genes and variants, how do we keep track and ensure the resources that we're all using are accurate and up to date for widespread use? So we've been working on generating knowledge bases; some of that comes through ClinGen. The newest knowledge base we've launched is called GenCC, for the Gene Curation Coalition. This started a few years ago, when we recognized that there were a number of public resources, like ClinGen, OMIM, the Genomics England PanelApp, the Australian PanelApp, and Orphanet, that all had gene-level resources, but they were all using different systems to define the validity of gene-disease relationships. So we're trying to get everybody to harmonize these resources. We then discovered that there are actually a lot of private groups, particularly clinical labs, that were curating gene-disease relationships with ClinGen standards, but none of that data was public. So we worked to get all of this gene-disease relationship data together. I like to call it the ClinVar for genes, because anybody can submit their own opinion of a gene-disease relationship, just like you submit your own classification of
pathogenicity for a variant. So essentially it's a ClinVar for genes. Once all that data is harmonized and publicly available and we can compare it, we then collaborate to resolve discrepancies in which genes are actually validly implicated in disease. This is just a screenshot from the database. We have 12 different groups submitting, with over 17,000 gene-disease relationship claims. You can type in your favorite gene, and you'll see ClinGen says it's strong for this disease or disputed for that disease, Genomics England says this, and so on, and then you can dive deeper and get more information. There's a quick overall look at what the claims are on any given gene on that sheet. We've now been taking this pipeline and feeding it into our own genomic analysis pipelines. OMIM is often behind on novel gene discoveries, so there are groups, like the Australians, that are curating the literature and putting things in, and we're now getting the GREGoR Consortium to put our novel gene discoveries that aren't even published yet into this, as a way to get these gene-disease relationships out earlier in the process, and also to clarify whether they're valid by doing the discrepancy resolution. We've done 33 discrepancy resolutions so far across the groups. We mostly define whether a relationship is at least moderate and above versus limited and below, which is the boundary for what you should put in a clinical test versus not, and we try to vote and get them either above or below by consensus. Occasionally we hit borderline cases and just can't agree where a gene should sit. Yes, a question: [the question, partly inaudible, asks about AI and machine learning for gene lists and diagnoses in rare disease, in either genome or EHR analyses, with the sub-comment: I imagine international data silos might be limiting here also.] Yes, there are a few questions in there. In terms of AI and machine learning, there are different ways it's being used.
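As one concrete example of ML-derived annotation in this kind of analysis, SpliceAI (recommended earlier in the talk) reports, per variant, four delta scores: acceptor gain, acceptor loss, donor gain, and donor loss. The sketch below filters variants on the maximum delta score; the pipe-delimited annotation layout and the 0.5 cutoff are common community conventions rather than anything specified in this talk, and the variants are mocked.

```python
def max_delta_score(spliceai_annotation):
    """Parse a SpliceAI-style annotation
    'ALT|GENE|DS_AG|DS_AL|DS_DG|DS_DL|...' and return the largest
    of the four delta scores (positions 2 through 5)."""
    fields = spliceai_annotation.split("|")
    return max(float(x) for x in fields[2:6])

def flag_cryptic_splice(variants, cutoff=0.5):
    """Keep variants whose max delta score meets the cutoff;
    0.5 is a commonly used heuristic threshold, an assumption here."""
    return [v for v in variants if max_delta_score(v["spliceai"]) >= cutoff]

# Mocked variants: var2 carries a strong predicted acceptor gain.
mock_variants = [
    {"id": "var1", "spliceai": "T|GENE1|0.02|0.01|0.00|0.03|12|-4|7|-22"},
    {"id": "var2", "spliceai": "A|GENE2|0.91|0.04|0.00|0.12|-2|18|3|5"},
]
print([v["id"] for v in flag_cryptic_splice(mock_variants)])
```

Flagged variants would then be followed up with RNA-seq on a relevant tissue, as in the cases described above, to confirm an actual splicing change.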
I mentioned SpliceAI is actually a really good tool that was developed by machine learning, basically by looking at the entire human genome and asking: what are the sequences that have led to exonic splice junctions in real life? The ML that was used is why that algorithm works so well. So there are very precise and good examples of the use of machine learning in the context of genomic analysis. I think another area: we have a collaboration with Microsoft right now working on aggregating evidence from the global internet, the literature, et cetera, to make it quicker for the analyst to review all of the evidence about a gene or a variant. It just takes time to go out and search PubMed and Google Scholar and all the different resources and databases, so some of it is: can we bring that information much more quickly to the analyst, who then makes the decision? I think one of the challenges today is that most of the most valuable evidence we use in classifying variants is not accessible to computational infrastructure; it's things like pedigrees, or de novo occurrence, or functional data that's not well structured. So as a community, if we're going to make use of AI and machine learning, we really need to get our data structured and accessible to these tools; then I think they will work better. I will say that the consumption of electronic health record data and phenotypes, I think, is going to be a useful area for AI and machine learning.
It's already proven useful. I was just talking to Chris Lunt the other day (and Geoff Ginsburg, who asked this question, is CSO of the All of Us program). Chris has been looking at how well, instead of relying on ICD-9 codes and other structured fields in the EHR, you can just take the entire EHR content for an individual and use that in your phenotypic extraction, and it's improving things considerably. That said, you still only have what goes into the EHR, which is a limited set of the things that your patient has, and the EHR is not the greatest for rare disease, where we really want to use terms like the Human Phenotype Ontology, which is still not well represented in the EHR. So I think there are lots of exciting areas for AI, and I think it's going to rapidly take off, but exactly how well it's going to do, I think, remains to be seen. Lots of people, including ourselves, are starting to work with these tools. Did I hit enough of that? Great. So, ClinVar is also a really critical resource, and of course we get more and more submitters around the world submitting to ClinVar; it really helps crowdsource the problem. But, you know, the highest category in ClinVar is still variants of uncertain significance, right?
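For anyone who wants to see this for themselves: ClinVar's VCF release carries each record's aggregate classification in the CLNSIG INFO field, so tallying the categories is a few lines of parsing. The sketch below runs over a few mocked INFO strings rather than the real multi-gigabyte file; on the actual release, Uncertain_significance is, as noted, the single largest category.

```python
from collections import Counter

def clnsig(info_field):
    """Pull the CLNSIG value out of a semicolon-delimited
    VCF INFO field, or return None if it's absent."""
    for entry in info_field.split(";"):
        if entry.startswith("CLNSIG="):
            return entry.split("=", 1)[1]
    return None

# Mocked stand-ins for real ClinVar INFO columns.
mock_info_fields = [
    "ALLELEID=101;CLNSIG=Uncertain_significance",
    "ALLELEID=102;CLNSIG=Pathogenic",
    "ALLELEID=103;CLNSIG=Uncertain_significance",
    "ALLELEID=104;CLNSIG=Benign",
]
counts = Counter(clnsig(f) for f in mock_info_fields)
print(counts.most_common())
```

On the real file you would stream the gzipped VCF line by line and apply the same function to column eight of each data row.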
And as the panels expand, the rate of VUSs on these clinical tests going back to patients has just continued to grow. Then there's reclassification: our MGH cancer center received variant knowledge updates on 17% of the variants in our patients, and the clinicians are getting annoyed, in some ways, about the constant updates and reclassifications. We also heard concern from several insurance companies; one of the reasons they're not reimbursing exome and genome is the fear of all the VUSs returned to patients from these exomes and genomes.

I didn't think it was accurate that the genomes and exomes were generating all the VUSs, so we collected data from 19 labs across the US and Canada — 1.5 million tests' worth of data — which gives you a sense of what's being run today. About 97% of tests run in this period were panel tests and about 3% were exome or genome. But what we found is that the rate of VUSs was statistically significantly higher in panels compared to exomes and genomes, which seems a little counterintuitive, and it's very correlated: as the size of the panel goes up, the VUS rate gets higher — no surprise — but then it goes down for exomes and genomes. Why is that? Well, it's actually quite obvious if you understand how clinical labs work. What do they do? They get no phenotype for the panel tests; they're asked to interpret all the variants, and they report out everything that's VUS and above, with no correlation to the phenotype of the patient. And as the panels get bigger, the correlation between phenotype and gene is actually pretty low. With an exome or genome, you cannot write a report — you cannot interpret it — without phenotype. So then what do you do?
You look at all the variants, see how well they correlate with the phenotype, and you only put the most suspicious ones onto your report. That is why the rate of VUSs is lower: we're actually using our brains to interpret, rather than just a pipeline process. But either way you look at it, a third of all genetic tests end in an inconclusive result due to VUSs. So that's a problem no matter whether you're running panels or exomes and genomes.

I will say that some of the labs that were part of this study — the lab I used to run at Mass General Brigham, and Quest — were using subclasses of VUS, and what you see here is that only the VUS-low variants get reported on the panel tests; nobody ever puts those on the exome and genome, right? So this is just an indication of why it's better to really think about the level of evidence and what you put on reports. In the new framework that Les and Steven and our group are leading, we will very explicitly define three buckets of VUS: VUS-low, VUS-mid, and VUS-high. We've actually been looking across the three labs — and we just discovered LabCorp has been doing this for 15 years too, so we're bringing their data in — and we asked: how often does a VUS-low get reclassified to pathogenic or likely pathogenic? The answer is that it never has, across these three labs. There was one variant in this figure, which we took out because it turned out to be a risk allele, not actually pathogenic. So essentially we can use this data to justify that variants in the VUS-low category probably shouldn't be put on the front of a report and could maybe just be relegated to a supplement. So I formed a work group through the American College of Medical Genetics, and we're now developing guidelines for labs on when to report VUSs — and, most importantly, when not to report them — and how to use these new subclasses that will be coming out with the new guidelines.

With that — okay, so I did want to talk a little bit about
the Global Alliance. I mentioned this earlier because we're working with the Global Alliance on developing the standards for this variant-level matching platform. I recently became chair of the Global Alliance, and I spend a lot of my time with this organization, because in my mind it's critical for the next era of genomics, where we really need to share data and work collaboratively across the entire globe. But that's not going to work well if we don't all use the same standards for how we structure our data, share our data, et cetera. So we have a whole organization of partners, driver projects, assigned experts, and contributors; we have work streams developing standards in these different areas, and a whole ecosystem for defining the needs of our community, creating those standards, and implementing them. If you haven't gotten involved, I encourage you to do so.

One of the areas where I spend a bit of my time is our National Initiatives Forum. A lot of these countries are implementing genomic medicine at scale, and these are all the members of the national initiatives, which include the All of Us research program. We've been developing a toolkit for countries that addresses many different aspects of genomics: government strategy, engagement of patient populations, clinical tools, knowledge curation, technical and data tools, consent, and result return. The stars mark standards that have been developed by the Global Alliance to help in these different areas. We convene both virtually, on conference calls, and physically, in two in-person meetings a year, to share our experiences and best practices and really try to harmonize the approaches we take in this space.

Similarly, we work across the world in ClinGen, and we now have over 2,800 volunteers across 66 countries participating in over a hundred expert panels that are classifying genes and
variants. It's a wonderful way to expert-curate this knowledge in the different disease areas, bringing the experts into these expert panels. It's also a way we get access to evidence, because the people with the best evidence join these panels, and then we get their evidence even when it isn't publicly accessible anywhere.

As I mentioned, All of Us is part of the national initiatives, but I also think it's an incredibly important resource for generating knowledge, because we can get access to patient phenotypes at a large scale. We are also working to return results as a way of returning value to participants, and so we've been returning both health-related genetic results and pharmacogenomic results to individuals. This is a little out of date, but we've returned over a hundred thousand results from the hereditary disease risk panels, and the Color team that supports genetic counseling has done over 2,000 genetic counseling appointments to return mostly positive results to individuals. So, a lot of work there.

The last bit I'll talk about, as I mentioned, is our efforts within the hospital. I have a small team at Mass General that's really working to integrate genomics into medical practice. This is the team shown here, and we basically work to engage stakeholders, identify pain points, and deliver new solutions. I don't have time to talk about all of it, but one of the things I did was convene a genomic medicine implementation team. Luckily, I did that across both hospitals, because just a few weeks ago it was announced that Mass General and the Brigham will be fully merging in the next couple of years. So, lucky for me, there's only one chief genomics officer, and I work across both hospitals, so I think I'll keep this job. Nonetheless, we identify clinician leads — and, where there are genetic counselors, those leads — across each clinical division of the hospital, and we then
gather on a regular basis to understand where the pain points are across all these clinics, and then try to deliver solutions. We've done a lot of different things: standardizing consent forms, consolidating external genetic test services, integrating results into the EHR. We've developed a genetic counseling network with deployable part-time genetic counseling services, so we share counselors across different clinics that don't need a full-time effort. We now have a referral website with all the codes for referring to the different genetics clinics across specialties. We've developed education programs, and I'll spend the last little bit on some of our preventive genomics efforts.

We recognized there was a major gap in our care provision in genetics in terms of at-risk patients. The Cancer Center had a massive population of patients who met NCCN guidelines for cancer risk testing, and they couldn't handle the appointments; they could only manage those with cancer, or sometimes the family members through cascade testing — they couldn't handle the volume. So we launched the preventive genomics clinic to handle that volume of cancer susceptibility testing, as well as preconception carrier screening, because the OB clinics couldn't handle the volume of carrier screening requests. That was the main incentive, and in fact those were some of the largest-volume tests. But we had two clinicians seeing patients.
There were only so many they could see, right? So, despite this being quite successful — we were getting the referrals and the focus that we wanted, and most of the patients weren't paying anything; they were covered by insurance for these indications — we actually discontinued it, because we couldn't scale it to meet the patient demand.

So we've switched, and have now launched preventive genetic counseling services in conjunction with primary care. The idea here is that our GCs work with primary care. The PCP indicates they have a patient, the genetic counseling assistant collects the pre-visit information, we have a telehealth visit with the genetic counselor, the order is placed — but under the primary care physician's name — and then we help with result return. If results are positive, the PCP does the referral to the specialty clinic. This has allowed us to scale much more quickly. We're still hitting the target populations we're trying to hit — cancer risk, preconception screening — and we deal with, despite the fact that none of us want to, the DTC results that physicians don't know what to do with, and the occasional other thing. This just started in the last five months or so, and it's really taking off.
We're starting to get around to all the PCPs, and they are really liking it. We hope that over time, as they get used to this, they may not need us to be there at all. We've also launched an e-consult service, where any physician can put in a basic question and our team returns guidance through the e-consult, and that's been effective in terms of outcomes. These are the different types of recommendations we made through the e-consult service; most were from PCPs, which was our target population to a large extent. Of the 88 actionable recommendations, 78% were followed through by a provider, only 17% were not followed, and 5% we couldn't track because they left the system. We published these results describing our experience supporting e-consults. So that service has continued, and now our scalable GC service is in place.

With that, I'm going to stop so we have more time for questions. But I do want to thank the many people who work on the different projects I've talked about today, as well as the funding support we received to be able to do all this fun work. And I'm happy to take any more questions.

Tour de force — not surprising. Maybe I missed it, but I didn't hear you say much about the growing availability of more and more diverse reference sequences, and eventually a pan-genome. I would think that for some of the topics you covered, including the rare disease work — is it possible cases are unsolved because you're missing variants, because the best reference either isn't available or you haven't started using it yet? What are your thoughts on that?
Yeah, it's a good question. We do have work underway, both within the Global Alliance working with the Human Pangenome Consortium, and with EBI and NCBI on an annotation of the human pangenome. And we have a project within the GREGoR consortium where we're hoping to assess the added yield from using different pangenome sequences, as you're alluding to. It's a good question what the outcome will actually be. I think both long-read analysis and different genomes matter, because the biggest challenge is those regions that have significant chunks of difference between haplotypes — where an entire region is either missing from the current genome build, or there are duplications and other things. Right now, the pangenome isn't going to help much if we don't have access to long-read technologies that can actually get at those most challenging regions. So we're starting with long reads, which will allow us to look at those regions. But we also need to define the allele frequencies of variants in those regions, which have no data in gnomAD, because gnomAD is built from short-read data, right? So Mike Talkowski actually wrote a grant to develop a gnomAD for long-read data. That will also help with these regions that are going to come out of the pangenome project and be haplotype-specific in different populations. But we're kind of waiting for those resources, because we're finding variants in these regions and having no idea what their frequency is — there's no reference database. We're all used to "oh, we just look it up in gnomAD." Right — it's not in gnomAD.
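The lookup gap just described is worth making concrete: for regions only resolvable by long reads, "absent from the frequency database" is not the same as "rare." A minimal sketch, with entirely hypothetical variants, regions, and frequencies:

```python
# Hypothetical sketch: a frequency lookup must distinguish "no call at
# all" (region not covered by the short-read resource) from a true
# frequency of zero, otherwise every variant in an uncovered region
# looks spuriously rare. All names and numbers below are invented.

COVERED_AF = {  # made-up allele frequencies from a short-read resource
    "chr7:117559590 G>A": 0.00002,
}
LONG_READ_ONLY_REGIONS = {"chr1:segdup_region"}  # hypothetical

def allele_frequency(variant: str, region: str):
    if region in LONG_READ_ONLY_REGIONS:
        return None  # no reference data: frequency unknown, not zero
    return COVERED_AF.get(variant, 0.0)

def frequency_evidence(af):
    """Only apply rarity evidence when a frequency was actually observed."""
    if af is None:
        return "no data: do not apply rarity evidence"
    return "rare: rarity evidence applies" if af < 0.001 else "common"

print(frequency_evidence(allele_frequency("chr1:12345 T>C", "chr1:segdup_region")))
print(frequency_evidence(allele_frequency("chr7:117559590 G>A", "chr7:CFTR_region")))
```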
So these can be common variation. In order to really make best use of the pangenome, we're going to need the resources — the allele frequencies, the population data — for all of those regions. There's actually a consortium that PacBio has started to share PacBio data; they're working with a company called DNAstack to get all the groups to put their PacBio data in the same place, so that we can start to build reference datasets out of these data as well. So it's coming, but it's slowed by the lack of that sort of population reference data.

Since you cooked me dinner last night, they're going to expect you to cook for them — we talked about this last night. That's because the Italian wine was so good. But you made a comment about direct-to-consumer. I can't help but mention that this week a new company came out of stealth mode called Nucleus Genomics — are you familiar with it? They're offering $399 whole-genome sequences, direct to consumer, and I'm just curious, in general, what your view is on this growing availability of direct-to-consumer genomic data and how it's going to interface with the kinds of things you're trying to do.

Yeah, I have lots of opinions on direct-to-consumer, and they range. There are some areas where I feel it's good, because there are patients out there who are at risk and are not getting access to basic data in our healthcare system. Even with companies like 23andMe, which has had BRCA1 — at least the three Ashkenazi Jewish variants — in its testing for a long time, there have been lots of patients who got their risk results when they would have fallen through our healthcare system, right?
So I think there are some areas where I strongly believe in and support the use of direct-to-consumer. But then you go to your Nebula Genomics report — and these have been showing up in our physicians' offices with, like, 35 pages of every risk finding you can imagine, most of which is not valid, and the patients and the physicians are asking, "What do I do with this?" And I'm like, "Throw it in the trash," right? So there's the hype and the engagement — taking advantage of people's interest in genetics to make some money, putting whatever random new thing just got published, that somebody made a GWAS claim on, into a report. That, I don't find very useful. So I think we should focus on the things where there's actually something to do with these results, where we can direct care based on them because they're valid. And in addition to true direct-to-consumer, where there's no physician intermediary, there are a lot of companies that provide the physician — it's essentially DTC, but at least they're targeted tests that are clinically useful, and they're helping get around our really poor healthcare system and its access issues. So I think there is a role for this in our ecosystem; we just need to focus it in the right places, if that makes sense.

That was a great talk. I'm going to ask, I guess, a bad question, but I'm just curious about your prediction. All these tools you're building, the data you're gathering, the things you're doing to make these analyses easier and better and more accurate — ten years from now, or x number of years from now, what do you think becomes of molecular geneticists, laboratory-trained geneticists, pathologists? Will the fanciest tests you can imagine — exome, genome, with all these other omes — become more like a CBC, where you don't need much human intervention? Or does the runway keep moving?
I'm just curious about what the human role is going to be, if any.

Yeah. I think there are several different arenas that are going to evolve, but the one thing that is going to be a long time coming is really appreciating the complexity of phenotypes in a patient, and how well they match what's known about a given gene and the variants that cause disease in that gene — being able to judge, "this may be a pathogenic variant." This happens all the time: somebody finds a pathogenic variant and says, "I solved the case!" And I ask, does that gene actually have any role in that patient's disease? And maybe not. So that does require a fair bit of judgment right now. Where that part will get easier over time is if we can capture the phenotypes of every individual with rare disease and build a really valid database. Then machine learning and semantic similarity — some of the HPO-style algorithms that match things — can actually give you a semantic similarity score between your patient's phenotypes and the collective knowledge associated with a given gene. So over time we might have a scoring system that replaces our need to judge: does this patient actually have Marfan syndrome or not, and does this FBN1 variant actually make sense? Today we're using a lot of clinical judgment in that process, and some lab geneticists have some of that knowledge; some of them actually don't have good clinical training, and the physician really needs to play that subsequent clinical role — as Les and I like to talk about, the different roles in this process. So I think that part is going to evolve, but for a while it will still require some role for the geneticist in making those judgments.

There's also the fact that the evidence supporting pathogenicity, as much as we are trying to make it as objective as possible, is not always objective. There's a lot of subjectivity in deciding whether a functional assay is actually valid, in really documenting the disruption or not — is there even a good question or system being tested? So there's judgment in looking at the evidence at hand and saying: is that really good evidence, or not good enough to say the variant is even pathogenic? So I think those are the areas where we will still have jobs in our line of work.

A follow-up on that: how automated is AIP? So, the pipeline is fully automated, but interpreting the results that come out of the pipeline is definitely not, and it has opportunity for improvement. Right now it isn't using phenotype matching, and that's a part we'll be building into it. For sure, right now stuff comes out and it's like, "oh, that's a new pathogenic variant" — with nothing to do with my patient's phenotype. So building in that phenotype matching will be helpful. I think we'll refine it over time in terms of sensitivity and specificity, but it will always require some review at the end, and we're going to try to get the number of things you have to review down as low as possible. You can also set it at different levels: only send an email to a physician if the score is above a high bar, versus a different level for our analysis team that does this all day. So you can titrate that a little bit.

So — this is wonderful. Thank you so much.
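Before the next question: the phenotype-matching idea described above can be sketched very simply. Real tools use information-content-based semantic similarity over the full HPO graph; this toy version (with an invented ontology fragment and hypothetical gene annotations) just expands each term set with its ancestors and takes a Jaccard overlap.

```python
# Toy sketch of patient-vs-gene phenotype matching. The term names,
# ontology edges, and gene annotations are all made up; production
# systems use the real HPO and information-content-weighted measures.

PARENTS = {  # minimal invented fragment of an ontology
    "HP:aortic_dilatation": {"HP:aortic_abnormality"},
    "HP:aortic_dissection": {"HP:aortic_abnormality"},
    "HP:aortic_abnormality": {"HP:cardiovascular_abnormality"},
    "HP:ectopia_lentis": {"HP:eye_abnormality"},
}

def closure(terms):
    """Expand a set of terms with all of their ancestors."""
    out, stack = set(terms), list(terms)
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in out:
                out.add(parent)
                stack.append(parent)
    return out

def similarity(patient_terms, gene_terms):
    """Jaccard overlap of the ancestor-closed term sets."""
    a, b = closure(patient_terms), closure(gene_terms)
    return len(a & b) / len(a | b)

patient_terms = {"HP:aortic_dilatation", "HP:ectopia_lentis"}
gene_terms = {"HP:aortic_dissection", "HP:ectopia_lentis"}  # hypothetical
print(round(similarity(patient_terms, gene_terms), 2))  # 0.67
```

Even though the patient and gene share only one exact term, the ancestor closure credits the related aortic findings, which is the intuition behind these scores.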
You didn't say much about pharmacogenetics, and I'm just curious how you're going about implementing that where you are.

Yeah, it's been interesting. Long ago, we ran the CROWN trial for warfarin, and we had a test set up with a turnaround time of five hours, so the moment a patient presented we could get those results back immediately to influence the dosing of the drug. The physicians in our hospital were the ones leading that trial; we did all that work. And then the trial ended, and nobody ordered the test after that. I'm like, really? It comes down to years of discussion and difference of opinion about the utility of warfarin pharmacogenomic testing: is it clinically useful or not? Is it easier just to use empiric dosing? So I think there are certain pharmacogenomic areas that are clearly useful, but a lot of them are a little unclear. And when I first took on this chief genomics officer role, I interviewed all the clinicians in the different clinics, asking: what's working, what's not working, how are you using genetics?
And I actually asked about pharmacogenomics, including in places where they were ordering at least one marker. I said, well, if for the same price you could order a whole panel, which would then have markers that might be useful later, why not just do that? And the answer was: "I do not want the liability of all those other markers, from a test I ordered, that then get buried in the medical record, not accessible to any sort of decision support. It's on my back to figure out what all those other markers mean, whether the patient is on any other drugs, and to manage that data for the lifetime of that patient." So really, at the end of the day, you have to have an EHR system with clinical decision support, and we actually don't have that at Mass General Brigham today, unlike some of the hospitals that are leaders in those areas. So I think eventually, absolutely, this will be important, but you have to have a healthcare system that can support it, or it's not going to be utilized effectively.

So you mean putting it into the medical record in the background, and only when the drug gets prescribed —

Exactly. It has to be point-in-time guidance. And the Epic group is working with our Genomic Knowledge Standards work stream in the Global Alliance, and they're building support to bring VCFs into the electronic health record, because we can't rely on everybody building their own pharmacogenomics tool. So I think once it really gets instantiated as a common feature in EHRs, and we can bring data into those pipelines, then it will take off much more regularly.
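The point-in-time model just described — genotype stored quietly, alert fired only when a relevant drug is ordered — can be sketched as follows. This is a hypothetical illustration: the gene-drug pairs echo well-known examples (CYP2C19/clopidogrel, CYP2D6/codeine), but the rule table and advice strings are stand-ins, not actual CPIC guideline text.

```python
# Hypothetical point-of-prescribing pharmacogenomic decision support.
# The stored genotype never interrupts the clinician on its own; an
# alert surfaces only when an ordered drug matches a stored result.
# Rule contents below are illustrative, not real guideline language.

PGX_RULES = {
    # (gene, metabolizer phenotype) -> {drug: advice}
    ("CYP2C19", "poor metabolizer"): {
        "clopidogrel": "Consider alternative antiplatelet therapy.",
    },
    ("CYP2D6", "ultrarapid metabolizer"): {
        "codeine": "Avoid codeine; risk of toxicity.",
    },
}

def on_prescribe(patient_pgx: dict, drug: str) -> list:
    """Fire alerts only for stored PGx results relevant to the ordered drug."""
    alerts = []
    for (gene, phenotype), drugs in PGX_RULES.items():
        if patient_pgx.get(gene) == phenotype and drug in drugs:
            alerts.append(f"{gene} {phenotype}: {drugs[drug]}")
    return alerts

patient = {"CYP2C19": "poor metabolizer", "CYP2D6": "normal metabolizer"}
print(on_prescribe(patient, "clopidogrel"))  # alert fires for this patient
print(on_prescribe(patient, "codeine"))      # no alert: phenotype not at risk
```

The design point is that the lookup is keyed by the prescribing event, so the "buried markers" liability the clinicians described never lands on the ordering physician.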