It is a pleasure to be here. Did you notice they just found flowing water on Mars? Did anyone notice that they just announced that they apparently found flowing water on Mars? Mars. Sounds funny with my Scottish accent: Mars. That's the red planet. It's not supposed to have any water on it. Anyway, it's great to be here. Off a red-eye, I might be a little spicy.

When we planned the meeting, we thought that one opportunity might be for me to focus a little bit on some of the technological aspects of opportunities, potentially, for CSER 2.0. Here are my disclosures.

So this is not news to anyone in the room, but I think it bears repeating: genomic medicine is here. It's actually here today. We're practicing it already. You're familiar with many of these stories that we've seen; there are stories like this happening all the time. It's actually here in mainstream medicine today, both for rare disease and for non-invasive prenatal testing. So we've arrived. So that's good. The exploration, at least to the limit, can stop, and the next phase of actually moving this to the mainstream can go. But five years ago we asked when we would get to the point of having a genome in every medical record, and we're certainly not there yet. So I think some exploration into that world is what is appropriate for CSER 2.0.

This man said this phrase. Actually, Satya Nadella, the current CEO of Microsoft, quoted it: Bill Gates said most people overestimate what they can do in one year but underestimate what they can do in 10. I think this mirrors the comment that Eric Green was making this morning about the massive progress from the start of the Human Genome Project. I think we couldn't have imagined being where we are today. The question is how we get from here to there, or, probably more likely, from here to there. And in order to do that, we have to expand clinical utility.
So, just mirroring a little of what Robert just said: we have to choose use cases going forward. We have to design and implement, thinking about what I said this morning. We have to learn from what we do and have a virtuous cycle. We have to build the evidence base for effectiveness first and for cost-effectiveness second, because you can't assess cost-effectiveness without effectiveness, and include payers early. You may have seen some of these ICD-10 codes. I learned last week there are nine for being attacked by a turkey, including being pecked by a turkey, being struck by a turkey, and other interaction with a turkey. Actually true.

So what are the challenges in front of us if we want to get from here to there? Well, they're these. A genome is actually pretty complex. You can't call variants if you can't see them. And the technical performance of our algorithms is actually kind of upside down: it was optimized for cohort variant discovery in the days of GWAS, not for the n = 1 that we face daily with patients in our clinics.

This is the sort of slide that keeps me awake at night, or rather waking in a cold sweat. Repeats make up 50%, say 56%, of the genome. If you're sequencing with short reads, you do not know where the short reads came from. That matters particularly for diseases that are caused by repeats, but also just in general for knowing where short reads come from. There's also a lot of paralogous sequence: segmental duplications, gene families, pseudogenes (there are probably 8,000 of those), all associated with varying constraint, so they will vary differently. This new data from Evan Eichler on the hydatidiform mole, looked at with PacBio long-read sequencing, gives us an insight, through long reads, into structural variation that we just haven't been able to see yet. Moving forward, we have to start to incorporate that when we think about the technical accuracy of the genome sequencing we're doing for clinical
medicine.

So coverage is one challenge here. This is from the ExAC server, which we've mentioned many times today. This is KCNH2, one of the long QT genes; long QT came up just earlier in Bob's talk. You'd think the coverage should be pretty even. Of course, it's not; again, this is not news to anyone. But look at the exon towards the end. You'd think that might be an important exon to cover properly, especially since, if you look in ClinVar, there are 19 pathogenic or likely pathogenic variants in that exon. But you can't call those if you can't see them.

We think whole genome is better, but in fact it's probably not a lot better at 35x. This is the percentage of the gene not covered for the 56 ACMG genes. Remember that in the days of Sanger sequencing you would not release a clinical report if one base pair of the coding sequence was not called. You just wouldn't release it. And although we have made enormous strides and many discoveries that wouldn't otherwise be possible, we have reached a point where we accept 10x coverage of 90% of the gene and call it a day. I don't think we should, and we shouldn't for CSER 2.0.

We're good at calling certain classes of variation. For single nucleotide variants, the concordance between two platforms, Complete Genomics and Illumina, is shown here on the left in blue: 99% concordance is pretty good. But the concordance for small indels, arguably a more important class of variation because it's more disruptive to the open reading frame, is only 55%. We should be able to do better than that. In fact, I think you could say we are better at calling single nucleotide variants, which are overall less likely to cause disease, than variants that are much more disruptive, such as structural variants and indels. I think that's only because we haven't paid attention to it. I think there is nothing specific about calling insertions and deletions that is particularly any harder than single nucleotide variants,
although we could certainly have a long discussion about that.

So what is the answer? One of the questions I was asked to address is: does it start with an E or a G? Is the future an exome world or a genome world? Spoiler alert: probably somewhere in between.

Get coverage. This is a picture from my native city of Glasgow, where it rains 300 days out of 365 a year. We were so concerned about the coverage issue that we actually started a company. I'm showing you a graph from this company, but many other sequencing sites have now started to fill in the holes in exome sequencing. And this is good, because if you know where the hole is, you can go and cover it more. The first exon in particular is a little harder, because you need to do smarter things with the chemistry to really cover first exons properly. But if we know where the problem is, we can shine a light on it, be transparent about it, and fix it. If you can do it for one gene, you can do it for all the genes; that's shown here. And then here it is for the 56 ACMG genes.

So this is a graph a bit like the one I showed two or three slides ago. This is a few different exome providers, with the genome at the bottom, so this goes to the question of whether we should order an exome or a genome. Now we're showing base pairs along the x-axis: this is the number of base pairs that are not callable at 20x with Q30 bases. You can see the exomes actually do pretty well for these 56 genes, the new augmented exomes that you can either buy off the shelf or order from a provider, but the 35x whole genome doesn't do quite as well, even though we're moving, theoretically, to a genome world. And that's for tenfold more data, though not necessarily ten times the price, depending on where you go. You'd think that the genomes would do better for inclusion of UTRs; that's one of the extra things you get from the whole genome. A little better, but even the newer chemistry
shown here, the HiSeq X chemistry on the left, is actually worse than the prior HiSeq chemistry. So I think having these kinds of metrics helps, and in the Undiagnosed Diseases Network, which I mentioned, of course, this morning and will shout out again at the end, we've been developing these metrics further, so that when new sequencing chemistry arrives and the sequencing providers offer new approaches, we can pull the same metric out and apply it to see what changes. Currently, in our report at the Stanford clinical genome service, we actually show the number of base pairs that are not callable per gene.

So knowing the enemy is important, and so is bringing the groups together around the communities. The National Institute of Standards and Technology has the Genome in a Bottle Consortium, as you know. They actually have an outpost at Stanford, so we see them on West Coast time, and not uncommonly. You've heard that some of the dollars from the Precision Medicine Initiative are going towards the FDA for precisionFDA, in a new arrangement where there's a collaboration with DNAnexus and, most recently, Stanford, with the first meeting just last week.

One of the things that the NIST consortium has done is talk about areas of the genome that can be called with high confidence and those that can be called with lesser confidence. It might be important to know that when we're calling variants. Shown here is the number of bases from the ACMG genes that are in the high-confidence regions. Again, you would hope that they would all be up at 100 percent. In reality, it's much lower than that. That's just to point out that there are areas of concern that we can approach: if we can see them and shine a light on them, we can potentially find a solution. Most recently, just two weeks ago, a new trio from the PGP has been put out by the NIST consortium, with long-read sequencing of the trio from multiple different platforms.
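The per-gene "base pairs not callable" metric described above can be sketched in a few lines. This is only an illustration, not the Stanford report's actual implementation: the function names, the toy gene names, and the input layout (a mapping from position to the depth of Q30 bases, plus non-overlapping half-open exon intervals on a single chromosome) are all assumptions made for the example. The 20x threshold is the one quoted in the talk.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumed layout)


def uncallable_bases(q30_depth: Dict[int, int],
                     exons: List[Interval],
                     min_depth: int = 20) -> int:
    """Count coding positions whose depth of Q30 bases is below min_depth.

    Positions absent from q30_depth are treated as depth 0.
    Exons are assumed not to overlap each other.
    """
    return sum(
        1
        for start, end in exons
        for pos in range(start, end)
        if q30_depth.get(pos, 0) < min_depth
    )


def per_gene_report(q30_depth: Dict[int, int],
                    genes: Dict[str, List[Interval]],
                    min_depth: int = 20) -> Dict[str, Tuple[int, int, float]]:
    """Return {gene: (uncallable bases, total coding bases, percent uncallable)}."""
    report = {}
    for gene, exons in genes.items():
        total = sum(end - start for start, end in exons)
        missing = uncallable_bases(q30_depth, exons, min_depth)
        report[gene] = (missing, total, 100.0 * missing / total)
    return report


# Toy data (hypothetical genes and depths, for illustration only).
genes = {"GENE_A": [(100, 110), (200, 205)], "GENE_B": [(300, 310)]}
depth = {pos: 35 for pos in range(100, 110)}         # well covered exon
depth.update({pos: 8 for pos in range(200, 205)})    # poorly covered exon
depth.update({pos: 25 for pos in range(300, 308)})   # positions 308-309 have no reads

for gene, (missing, total, pct) in per_gene_report(depth, genes).items():
    print(f"{gene}: {missing}/{total} bases not callable ({pct:.1f}%)")
# GENE_A: 5/15 bases not callable (33.3%)
# GENE_B: 2/10 bases not callable (20.0%)
```

Because the metric depends only on depth and a gene model, the same function can be rerun unchanged when a provider switches chemistry, which is the point made above about pulling the same metric out for each new platform.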
And from that trio we can start to do an analysis; this one was done just in the last 24 hours.

OK, a couple more points to finish. The strength of family analysis: again, in the Undiagnosed Diseases Network, we are doing 2.2 extra family members in addition to the proband, and the reason is shown here. This is the number of genes and the number of variants for a compound heterozygosity model in a family-based analysis of a new developmental disease. You can see that if you only have the singleton, you're talking about 400 genes and 1,400 variants. That's a lot of time for manual curation. But once you can include inheritance-state analysis of the trio or the quad, you can get that down to something that's actually manageable.

In terms of designing better algorithms, I mentioned that insertions and deletions were particularly problematic. Small ones are not too bad. This is sensitivity on the y-axis and indel length on the x-axis for HaplotypeCaller and UnifiedGenotyper from GATK. As you can see, the sensitivity drops off markedly after two to three base pairs. There are groups now, such as this group in Oxford, starting to think about how to improve that. Again, I think if we just shine a light on it, we can start to develop better tools.

So, in closing, I just want to say that I think it's possible that we can get to where we were before with Sanger sequencing, where we have 100% coverage of the genes that we're interested in. But then what is the best test?
Is it an exome, where we know that those coding bases are well covered, or a genome, which we can use better when we can do structural variant analysis and other things? Really, going forward, the best test might be one that includes some whole-genome coverage but is actually augmented around the genes, so that we know we can call every coding base pair. To be cost-effective, it probably will have to include some augmentation of the genes, because currently it looks like we might have to get to a 50x or 75x whole genome in order to get the coverage that we would really want. Also, not really mentioned here, but this would allow higher sensitivity for calling mosaics.

So finally, just this slide about what I think about the current and future landscape of genomic medicine. We talked about what's already here, which is the box on the left: circulating cell-free DNA, of course. On the right, cancer, I think, is coming. Pharmacogenomics has been out there, and I think many of us are disappointed that it's not more mainstream than it is now, despite great work from many people, including in this room. But it's coming, and I think there'll be a tipping point for that and it'll suddenly just turn. Infectious disease and complex disease are also coming.

Final slide: a shout-out for the Undiagnosed Diseases Network. I've done it already, mentioning it twice, but the gateway opened last week, and so we're happy to be out and ready to do business. All right. Thank you.
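The family-based point made earlier in the talk, that a trio shrinks the compound-heterozygous candidate list compared with a singleton, can be illustrated with a small sketch. It is not the Undiagnosed Diseases Network pipeline: the `Variant` record, the function names, and the toy genes are hypothetical, genotypes are coded as alternate-allele counts (0, 1, 2), and de novo variants and affected parents are deliberately ignored to keep the logic short.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    gene: str
    variant_id: str
    proband: int  # alternate-allele count: 0, 1 or 2
    mother: int
    father: int


def singleton_candidates(variants: Iterable[Variant]) -> Set[str]:
    """Genes with at least two heterozygous variants in the proband.

    Without parents the phase is unknown, so every such gene must be reviewed.
    """
    het_counts = defaultdict(int)
    for v in variants:
        if v.proband == 1:
            het_counts[v.gene] += 1
    return {gene for gene, n in het_counts.items() if n >= 2}


def trio_candidates(variants: Iterable[Variant]) -> Set[str]:
    """Genes where the proband carries one het variant inherited only from the
    mother and another inherited only from the father (variants in trans)."""
    maternal, paternal = set(), set()
    for v in variants:
        if v.proband != 1:
            continue
        if v.mother == 1 and v.father == 0:
            maternal.add(v.gene)
        elif v.father == 1 and v.mother == 0:
            paternal.add(v.gene)
    return maternal & paternal


# Toy data (hypothetical genes and variants, for illustration only).
variants = [
    Variant("GENE_A", "a1", 1, 1, 0),  # from mother
    Variant("GENE_A", "a2", 1, 0, 1),  # from father: in trans with a1
    Variant("GENE_B", "b1", 1, 1, 0),  # from mother
    Variant("GENE_B", "b2", 1, 1, 0),  # also from mother: same haplotype
    Variant("GENE_C", "c1", 1, 0, 1),  # a single het variant
]

print(sorted(singleton_candidates(variants)))  # ['GENE_A', 'GENE_B']
print(sorted(trio_candidates(variants)))       # ['GENE_A']
```

With the proband alone, both GENE_A and GENE_B need manual curation; adding the parents shows that both GENE_B variants came from the mother, so that gene drops out. A quad would add further constraints from the sibling's inheritance state, which this sketch does not model.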