So, we're working with the 100,000 Genomes Project, which is running in the UK. [Partly unintelligible.] The project is run by Genomics England, a company that works with the UK's National Health Service, the NHS. [Partly unintelligible.] In round numbers, the project is sequencing 100,000 genomes from around 70,000 people with a rare disease or cancer. For the rare disease families we try to get trios whenever possible. The key thing here is that this is not a genomic research programme; it's all about trying to transform our national healthcare system, so it's a very different beast from a research-type programme. More than half of the NHS trusts in the UK are involved in this now, and, as you'll see at the end, it has been rolled out completely to the NHS.

The mission is obviously to bring benefit to patients, which goes without saying, but also to enable biomedical research (you'll see how we're setting up the research environment to try to speed up translation) and to stimulate the genomics industry, particularly the UK genomics industry. You can imagine why the government was interested in that aim. And obviously to do all this in an ethical and transparent manner.

Just a snapshot of where we were exactly a month ago: we've actually recruited more than 100,000 patients now, and we've sequenced over 80,000 of them; I think we're up to 85,000 now, actually. The majority
are rare disease genomes, belonging to the patient or their family members, and 17,000 of them are cancer. Because we needed to do cancer using fresh frozen tissue, it has actually been surprisingly hard to recruit cancer patients and to train the pathologists not to put the biopsies straight into formaldehyde and paraffin.

So far we've reported on about 28,000 of the genomes coming from the rare disease side, and the diagnostic rate is around 20% overall. For some categories, like intellectual disability, it's around 40%. There are some quite obscure categories we also recruit that are likely non-genetic, and our diagnostic rates for those are one or two percent. This is very much a pilot: we've got things like familial colon cancer, where a lot of the patients we're recruiting are likely not to be single-gene cases.

This slide, which you won't be able to read from the back, just shows that we're covering all the major rare disease areas and all the major cancer types.

As well as sequencing 100,000 people, when we enrol participants into the programme we collect a lot of extensive data, which I'll get into in a bit. We also bring all the electronic health records we have available in the NHS into our research environment: so far we've brought in things like the mortality data, the pathology data, the biobank data and the Hospital Episode Statistics, and we're starting to bring in some of the general practitioner record data as well. This is all made available to researchers in the environment.

Which brings me to the next point: the way we set up our research environment and allow researchers to access this in an organised and coordinated way. It's through these Genomics England Clinical Interpretation Partnerships, or GeCIPs for short. I guess this is our equivalent of the Kids First Data Resource Portal. So we set up this research
environment and we provide access to the researchers through this mechanism. This all goes back to Chris's first talk: we're trying to speed up the translation of academic findings into clinical benefit for the patient. The idea is to have an environment that brings together all the data, the clinicians who work on these patients, the academic researchers and industry, to speed up this whole translation process.

Talking about industry: I've talked about the research environment, where we have the academic stakeholders, the clinicians and our internal Genomics England stakeholders. Here we have the Discovery Forum, where various companies have signed up to become members, to speed up the whole process of eventually arriving at new diagnostics and therapeutics. We've seen how that has decreased over the years, and the idea of setting up this type of mechanism is to try to reverse some of that trend.

I was just going to give you one success story to demonstrate how we work and the type of work we do. To date the programme has very much revolved around clinically diagnosing patients with variants in known disease genes, and this is a good example. This little girl here, Jessica, presented with epilepsy and developmental delay, and she'd had all the standard genetic tests. We ran the bioinformatics pipeline, doing the typical thing: starting with the 6.4 million variants in the whole genome sequence and narrowing it down to the rare ones, then the ones that affect protein, and then looking at which variants were different from her parents', so likely to be de novo variants, we hope.

The other thing we do is curate gene panels for each of our disease categories. This is an expert crowdsourcing app called PanelApp, where anyone can suggest genes that should be added to a panel for a
particular disease and put in their evidence, and then there's a final round of curation to decide whether the gene should be part of that virtual gene panel, which we then apply to the whole genome. Using this approach we narrowed it down to one gene, SLC2A1. It's a de novo mutation in this gene that caused her GLUT1 deficiency syndrome. The great news is that putting her on a ketogenic, low-carb diet has managed to reduce a lot of her epilepsy. Also, because it's a de novo variant, her parents can fairly safely go on to have further children, so that made a huge difference to them.

Now I'm going to start talking about some of the clinical data and the phenotypes, which is where my work comes in more. You can imagine this process, starting from a patient: there are all these different steps that we've had to build from scratch in the NHS to deliver a genomic medicine service. They didn't really exist, or we had to rebuild them, to achieve this healthcare transformation: from consenting the patient, through sample collection, sequencing and interpretation, and finally to treatment. All these later steps really rely on collecting good clinical data, and particularly clinical phenotypes encoded using a structured terminology such as the Human Phenotype Ontology.

We took this pretty seriously right from the beginning. For each patient we recruit, say they have Alport syndrome, there's a particular questionnaire. Each question is a Human Phenotype Ontology term, and we ask the clinicians to say whether the patient definitely has that term or definitely doesn't have it. Of course they can also add additional phenotypes, but it's a very structured way of collecting this phenotype data.

I'm changing tack a bit in the middle here, but this feeds into a lot of my academic research interests, particularly with the
Monarch Initiative, where we've been making a lot of use of phenotype data in particular. The idea behind all of this is to go beyond just comparing the patient's exome and genome variants to public genomic data such as gnomAD. You have to start bringing in the phenotype data and comparing it to model organism phenotypes, or phenotypes in OMIM, to see if we can further narrow down the list of potential variants causing this patient's condition, and so improve diagnosis and treatment.

Another key aspect for us is bringing in other species' data. For everyone in this room I probably don't need to labour this point, but obviously there are a lot of genes where we don't know the phenotypic consequences in human when they're mutated, but we do know from looking at model organism databases, and we can make use of this data. What we've done over the last 10 years or so, as part of the Monarch Initiative, is develop methods to computationally compare a patient with a set of phenotypes to, say, all known diseases in OMIM or Orphanet. So for this particular patient we can say which disease in OMIM it most looks like, we can do that computationally, and we can put a score on how similar it looks. And we can do it cross-species: we can ask what's the most similar mouse model to that patient, and which gene is involved. You can imagine we can start to use this for gene prioritisation.

This is just a couple of slides to say we've proven this approach works. Last year in Nature Genetics we published this IMPC paper that compares all the diseases in OMIM and Orphanet to all our IMPC phenotypes, and we managed to show that we can use these computational methods to find new animal models for human disease genes. It's across all the major body systems, from bone and hearing to the eye. Out of all the Mendelian disease genes that we already had data for in IMPC,
we managed to show that for about 40% of them we could recapitulate some of the phenotypes, and the key point was that 72% of these models were novel: they hadn't previously been published in the literature, and there had never been a mouse model for that disease before.

As an example of this, take Bardet-Biedl syndrome and this BBS5 gene. It's one of the 19 genes, part of that big protein complex, all of which are involved in various types of Bardet-Biedl syndrome. We found the IMPC mice perfectly recapitulate the obesity, which you can clearly see here, and the retinal dystrophy, and we find additional glucose homeostasis phenotypes that would be interesting to go back and look for in the human Bardet-Biedl patients.

The reason I went off track and talked about the academic interests there, with these phenotype comparison methods, is that we bring this all together in the Exomiser software that we use at Genomics England. The idea of Exomiser is to do the normal thing of starting with a whole exome or whole genome, filtering down the variants and prioritising them, so you hopefully end up with a single candidate that segregates with the disease in the family, is in the coding region, is rare when you look in gnomAD, and is predicted to be pathogenic. But most importantly, when you look at the gene and compare the patient's phenotypes to what we know from human disease or from model organisms, we can see some phenotype evidence for that gene being responsible for the patient's condition. Exomiser automates that process, which we heard about in the talk before; it tries to make things easier for people, so that hopefully at the top of your list of candidates you've got that variant in that gene where there's already some phenotype evidence that has come from the literature and been curated into these various databases. Oh, and I should make the key point that one of the databases
that's going into this is the IMPC genotype-to-phenotype associations.

We've rolled this out in Genomics England, so now we use Exomiser as a parallel pipeline to the virtual-panel-based approach, and it's quite complementary to that approach. We can do 300 cases per day per computational node we set up, and we can spin up as many of those as we need. It has recently been [unclear] accredited. Looking at the known diagnoses that have come through the programme so far, we find that Exomiser identifies the diagnosed variant as the top hit in 71% of those cases and in the top five in 92%. With the next release we're getting up to 97% by introducing new features like being able to deal with incomplete penetrance. I can skip this one in the interest of time.

So now I'm going to try to introduce how we can use IMPC data for the 100,000 Genomes Project; this has been a bit of a challenge for this talk. Like I said, to date we've been very focused on those 20% of cases where we can make the easy diagnosis in a known disease gene (I say easy, it's not that easy): solve those cases that we should be solving. But we've still got these 80% of cases where we're giving a negative report at the moment. This is where those GeCIP communities are meant to come in, research these cases and find some new variants that explain the patient's condition. Some of these are going to be variants in known disease genes, but there are going to be a lot of really interesting variants, like a de novo variant in a gene that's never been associated with human disease, and this is clearly where the IMPC data can play a role.

What I can do at this stage is start to look at some of the positive diagnoses that we've had to date, look at the IMPC data, and ask: would the IMPC mice have helped us find this diagnosis if we didn't already know this was a human
disease gene? This is all proof of principle. I've also started delving into some of the negative cases, where what I'm finding by running Exomiser is that we can find rare, predicted pathogenic variants in genes that have not been associated with disease before, and where the phenotype evidence is coming from the IMPC. I can show you an example of one of those, but first, just to show you the type of approach we can use with the IMPC data.

It'll be a bit hard to see from the back, but this is our interpretation portal at Genomics England, and I've hopefully blanked out all the patient-identifiable stuff. This is a retinal dystrophy patient. We didn't recruit either of the parents, presumably because it was fairly late onset. This patient was diagnosed with a missense variant in this C1QTNF5 gene, which is a known disease gene involved in retinal dystrophy, so it was already on our virtual panel and it was fairly easy for us to diagnose. We also identified it as the top hit by Exomiser. When we delve into the Exomiser results, it was the top hit because it matches that retinal degeneration disease in OMIM. But if that wasn't there, if we didn't know it was a human disease gene, it would still have been the top hit because of this mouse match to the IMPC data. We're matching the patient's phenotype of retinal degeneration against our IMPC database, where we can see abnormalities in the retina. That's what's shown by these bar charts: these are the mutants, for female and male. There are all sorts of detailed PDFs you can download; I don't know what they mean, but there are various eye assays that confirm this evidence. So I can say this is a proof of principle that the IMPC data can help us diagnose patients in the future.

Another example is a patient with osteogenesis imperfecta. Here we had the two sibs and the parents, and we identified this de novo variant, so it was only
seen in the affected female. It was a missense, splice region variant in COL1A2, and the HPO term that's relevant was increased susceptibility to fractures. Again this is an easy diagnosis to make: it's a known disease gene, and it was the top-ranked hit by Exomiser. But the IMPC actually has some quite nice evidence for this. The patient has increased susceptibility to fracture, but in the knockout mouse in the IMPC we see increased bone mineral content and increased bone mineral density, which is intriguing. From the knockout we're getting an increase, and from that missense variant we're getting increased susceptibility to fractures, so maybe it's a gain-of-function variant, and it gives us some insight into the mechanism.

And then finally a slightly more interesting story to end with. We've got this patient with cataracts, and we didn't manage to make a diagnosis from all the analysis to date. But what we find when I run Exomiser, at the top here, is a missense variant that's seen in the affected proband and also in her mother, who is affected with cataracts, and it's in this CDKN2A gene. When I started delving into this, the gene is actually on our gene panel for cataracts, but you'll see it's marked as red, with low evidence: the experts decided there wasn't enough evidence to really have this gene on the diagnostic panel. When I look in PubMed I can't really find any disease evidence, and the only reason it was on the panel is that it was already part of an existing UK Genetic Testing Network panel. So someone knew a reason why this gene was interesting for cataracts, but somehow that evidence has been lost. When we look at the Exomiser results, though, the reason it's the top hit is the IMPC mouse that has cataracts. So it immediately shows up as a great candidate and something to follow up with the clinicians involved in this case. And as I
start to delve into the data more, I think we're going to find more and more examples like this.

So that's one aspect where the IMPC data can help: as part of Exomiser or other software packages, helping us diagnose patients based on our genotype-to-phenotype associations. There's obviously also the functional validation aspect that IMPC can offer. Like I say, I keep banging on about this: for 80% of cases we don't know what the diagnosis is, and this is the interesting challenge. We're just putting together a paper on our rare disease pilot, which is based on the first 2,000 families. This is early analysis, but what we find there is 77 families without a diagnosis where I can see a strong Exomiser candidate in a gene that's not previously been associated with human disease, so it'd be great to make an IMPC mouse for those. We've got about 20 families with a de novo variant in a conserved coding region in a non-disease gene; again, these would be interesting families. And we've got 64 families where we've got trio analysis and a de novo variant in one of these conserved non-coding elements, so this would be a really nice thing to start modelling using IMPC production and phenotyping. We've also just started to do cohort-type analysis: burden-type testing, comparing all the patients for a particular disease against the controls, looking at the Exomiser results and asking whether variants in a particular novel disease gene are enriched. We got 39 candidates from there. So you can see we're starting to build up quite extensive lists of genes that it'd be interesting to coordinate on. I can see Sara looking excited.

This is all well and good, and we've heard about the IMPC: if it's a null allele that's going to model this patient's condition nicely and there's not an IMPC mouse already, we can try to prioritise it in our production pipeline. But what we've been doing for the
interesting point mutations is working with the MRC programme Genome Editing Mice for Medicine. There have been four calls so far, so it's been running for a couple of years. It's about four and a half million given to the Mary Lyon Centre at Harwell to produce these mice with point mutations and various other more complex mutations. There have been 289 nominations; this comes from the whole UK genomics sector, but a lot of these will be Genomics England cases, and that involved 31 different institutions. So far 70 lines have been awarded, including 42 point mutations, and you can talk to Sara Wells, in the audience, about how far we've got with the production and how you can get more involved in this. I think the first mice are starting to be produced, phenotyped and sent out, so we should have some good news stories from that soon.

Then just to finish off: you heard a bit yesterday about this essential gene study we've been doing with the IMPC. The key finding is that we can find enrichment of disease genes in this developmental essential fraction. These are genes that are lethal in the IMPC viability pipeline but are not shown to be lethal when we look at human cell essentiality screens. The genes in this bin are enriched for disease genes, and we've also shown they're more likely to be associated with early-onset diseases and various other factors. So what we want to do is start looking at the 100,000 Genomes cases: can we find a de novo variant in one of these genes, one of the novel ones that's not associated with disease yet? We'll use that to prioritise.

And I just want to finish off by saying that the 100,000 Genomes Project is meant to come to an end at the end of this year, but we're now moving on to this fully commissioned phase in the NHS. So now, when any rare disease patient in the UK comes for genetic testing, it will all be centralised, and they'll mainly go through whole genome sequencing,
which will flow into our centralised database. This has actually just been announced officially, at the Conservative Party conference I believe, so it's going to be good news for the UK.

There are too many people to acknowledge: the whole of the Monarch Initiative, the whole of the IMPC and the whole of the 100,000 Genomes Project. With that I'll finish and take any questions.
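
The filter-then-rank idea described in the talk (keep the rare, protein-affecting, de novo variants, optionally restrict to a virtual gene panel, then rank the surviving genes by how well their known phenotypes match the patient's) can be sketched roughly as below. This is only an illustration under stated assumptions, not Exomiser's or Genomics England's actual implementation: the names `Variant`, `filter_candidates`, `phenotype_score` and `rank_genes`, the 0.001 frequency cut-off, and the toy genes and phenotype labels are all hypothetical, and a plain Jaccard overlap stands in for the ontology-aware similarity scoring the talk refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    population_freq: float  # allele frequency in a reference set (hypothetical values)
    affects_protein: bool
    in_parents: bool        # True if either parent carries the variant


def filter_candidates(variants, panel_genes=None, max_freq=0.001):
    """Keep rare, protein-affecting, de novo variants.

    If panel_genes is given, also require the gene to be on that virtual panel.
    The max_freq default is an arbitrary illustrative threshold.
    """
    return [
        v for v in variants
        if v.population_freq <= max_freq
        and v.affects_protein
        and not v.in_parents
        and (panel_genes is None or v.gene in panel_genes)
    ]


def phenotype_score(patient_terms, model_terms):
    """Jaccard overlap of two phenotype term sets, from 0.0 to 1.0.

    A simple stand-in for ontology-based semantic similarity.
    """
    patient, model = set(patient_terms), set(model_terms)
    if not patient or not model:
        return 0.0
    return len(patient & model) / len(patient | model)


def rank_genes(variants, patient_terms, gene_phenotypes):
    """Rank the genes of the given variants by phenotype score, best first."""
    scores = {}
    for v in variants:
        score = phenotype_score(patient_terms, gene_phenotypes.get(v.gene, ()))
        scores[v.gene] = max(score, scores.get(v.gene, 0.0))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: placeholder genes and phenotype labels, not real annotations.
variants = [
    Variant("GENE_A", 0.0, True, False),     # rare, coding, de novo: kept
    Variant("GENE_B", 0.2, True, False),     # too common: dropped
    Variant("GENE_C", 0.0, True, True),      # inherited: dropped
    Variant("GENE_D", 0.0001, True, False),  # rare, coding, de novo: kept
    Variant("GENE_E", 0.0, False, False),    # no protein effect: dropped
]
patient_terms = {"seizures", "developmental delay"}
gene_phenotypes = {
    "GENE_A": {"seizures", "developmental delay", "ataxia"},  # e.g. from a disease or mouse model
    "GENE_D": {"cataract"},
}

candidates = filter_candidates(variants)
ranked = rank_genes(candidates, patient_terms, gene_phenotypes)
```

In this sketch the phenotype annotations for a gene could come from a human disease entry or from a mouse model, which is the point made in the talk: a gene with no known human disease association can still rise to the top of the list if a model organism phenotype matches the patient.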