NHS England, which is actually a formal organisation now, with 1.4 million employees and £110 billion per year of spending, was reorganised substantially in April last year. If you want to see anything about the structure, there's a single government website now, which has been a quite successful project, and you can actually find stuff on it if you go there. And yes, I thought that would draw some laughs. So you can explore the full horrors of all the different entities we have that regulate and manage the system. Okay, so that's that. Now, the fundamentals behind all this, I think, have for a long time been that we've had this genomics world down here in research, and then there's been the health world. And in the UK at least, the health world has been going electronic for a long period of time. There has been quite a lot of money spent on electronic health record systems, not so much in hospitals, but certainly in the primary care sector. It's fairly universal, actually, a few systems across all GPs, but that's not been available for research. So you'd like to get this genomic stuff into there. There's been a little bit happening at the single-gene level, through the 23 centres across the UK which provide clinical genetics expertise and do single-gene tests. But there hasn't been this flow backwards of electronic health records, and they haven't been accessible for research. So there's been a long process to try and fix this, trying to bring together funding agencies.
So there's the MRC and the Wellcome Trust, the academic funding agencies, which sit in one ministry. And then there's the health service, which has its own research funder, the NIHR. There was a process in 2006 to bring them together, at least linked through this Office for Strategic Coordination of Health Research. That's led particularly to the e-health work to try and enable electronic health records for research. So that's trying to deal with the research accessibility of EHRs. Then the other side, of course, is genomics for health. We've had the House of Lords report, which was referred to earlier, and then the Human Genomics Strategy Group, which was a Department of Health process to try and work out how you might implement this in the system; that started in 2010. In parallel with that, the government was getting concerned that the UK wasn't such a good place to do pharmaceutical research. It was actually quite triggered by Pfizer closing down its research establishment. And so they got together to produce this document, a life sciences strategy for the whole of the UK, with quite a lot of different things in it in terms of funding, support for innovation, et cetera. Some of those things were funding this bit here, ensuring that there's actually a database for anonymised access to primary care records. That's one component, but then you need to build capacity around using it. So that's something else that's been funded by a mixture of these different agencies: the Farr Institute, which is four centres, has just been set up, launched last May, and that will build capacity across the UK in using those records. Then later on, we had the Human Genomics Strategy Group produce its report on how to implement genomic medicine in the UK, and in the update of the life sciences strategy at the end of 2012, the announcement of the 100,000 Genomes Project.
So this was a recognition that, left to itself, the health system wouldn't implement genomic medicine, because it wouldn't see it as cost effective. It needed some priming to get started, and that's what this commitment is: to prime it. The mechanism for doing that is the creation of this entity, Genomics England, a company wholly owned by the Department of Health, set up as a procurement entity to manage the system. So now you've got this side here. Of course, these are going to feed into association with this for clinical health; essentially, you're closing the loop in terms of being able to feed back. It's understood that obviously it's got to be about delivering this, but also understanding and producing the evidence in parallel. So where are we with this? The mission is 100,000 whole genome sequences, not exomes, to improve the health and wealth of the UK, and to build a legacy of infrastructure able to handle this class of data within the health system. And we have £100 million over the next five years. It's quite a large scale: the Sanger Institute has done the UK10K project on 10,000 genomes, but those aren't all whole genomes, some are exomes, and the genomes are low coverage, whereas this is all going to be at what we think of as clinical level, 30x or more. There's actually a limited amount of that around globally right now. It's mainly in the cancer projects, the International Cancer Genome Consortium and TCGA, and there's also the WGS500 project, which has been done at Oxford recently on individual patients to see if you can interpret it. So it's a big step up from that.
There's been a lot of discussion about whether we should be doing a million exomes or 100,000 genomes, and basically we've decided that it's time to go to whole genome, because there are all these things you don't necessarily get if you go targeted or go exome that you'd want to know later. And it is meant to be delivery, but obviously it's a research data set as well. So the targets are rare diseases. We already have the single-gene tests in these established centres, but there's a limited number of things that get diagnosed through those. If we start doing whole genome sequencing, then we think there's quite a lot of evidence, as mentioned previously, that you'll end up diagnosing quite a large fraction of individuals. And there are already research consortia, like DDD and the rare diseases consortium in Europe, which have already done this, so we'll work with those groups, leveraging their expertise. The other side, of course, is cancer, and this will be a selection of cancers. There are already some documents on the Genomics England website with initial suggestions of prioritisation, but it's really going to be driven by evidence, incidence and survival: go after the ones with low survival rates and substantial incidence. And then pathogens: there's also a commitment to do genome sequencing of pathogens, but this is going to be coordinated by another entity created in this reorganisation, Public Health England, which takes over from the Health Protection Agency in the UK. Obviously it's a slightly different thing, sequencing pathogens at our scale, as opposed to sequencing whole human genomes. So, we're going to build a data infrastructure for handling this: sequencing centres, collection of samples, and incorporation of data coming out of the primary care records and the hospital systems.
And that will be fed back to the clinicians. But we'll also build out of that an integrated data set, and that will be accessible to researchers. Now, about the way it will be accessible: there's a very clear statement that this data is not going to get distributed. We've had open data release for the genome, the things that Francis talked about, and we've had managed data access, which is of course quite hard. But this is health service data with clinical records. People can perhaps release their own data if they personally want to, but if they're a patient within the system, the data is going to be kept within the health service. In order to do research, you'll have to bring your algorithm into the system. With the kind of infrastructures that are now possible, virtual machines running inside a local private compute cloud, we think that's possible. So the Global Alliance model of everything going to the cloud, I think, is going to have to be modified; it's going to be federation across different countries' data, because certainly right now clouds are not the most popular idea in terms of privacy and protection of data. So that's the model: you bring your algorithm, you run it, and you can only take away the summaries. You can't take away the raw data. In terms of how this is being implemented, there are a number of phases. Phase one is running right now, which is bake-offs. The other thing to say about Genomics England as a whole: it's a thin procurement operation. It's about 10 people, and we're going to buy services. We've sent some samples to some sequencing providers. This is not a question of us building sequencing factories; we're going to contract providers to build those and deliver the services. In the initial phase, of course, samples have gone out to wherever they do their sequencing.
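As an illustration only, the "bring your algorithm, take away only summaries" model described above might be sketched like this in Python. The enclave class, the records and the crude egress check are all invented for the example; a real system would enforce egress policy far more rigorously.

```python
# Hypothetical sketch of an "algorithm-to-data" access model: researcher code
# runs inside the secure environment, and only aggregate summaries leave it.

class SecureEnclave:
    """Holds sensitive records; releases results only via approved summaries."""

    def __init__(self, records):
        self._records = records  # raw genotype/phenotype rows, never exported

    def run(self, analysis):
        """Run a researcher-supplied function and release only its summary."""
        result = analysis(self._records)
        if not isinstance(result, (int, float)):  # crude egress check
            raise PermissionError("only scalar summaries may leave the enclave")
        return result

# Invented toy data: the researcher's algorithm crosses the boundary,
# the raw rows never do.
records = [{"variant": "BRCA2:c.5946delT", "case": True},
           {"variant": "BRCA2:c.5946delT", "case": False},
           {"variant": "TP53:c.743G>A",   "case": True}]
enclave = SecureEnclave(records)

case_fraction = enclave.run(
    lambda rows: sum(r["case"] for r in rows) / len(rows))
print(case_fraction)  # prints 0.666..., since 2 of the 3 records are cases
```

Attempting `enclave.run(lambda rows: rows)` to exfiltrate the raw records would raise `PermissionError`, which is the essence of the summaries-only policy.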
In phase two, which is a pilot that will run in 2014, we'll collect about 10% of the amount of data, mainly under research-style consent arrangements. It's the main study, beginning in 2015, where this will be completely integrated with the health service; we'll expect sequencing then to be done in the UK, and the samples will not go out of the UK and the data will not go out of the UK. Obviously, there's also an education component to this as well. So, the bake-offs. In terms of building this whole thing, you've got to go from sample to clinical action, and we're seeing this as a set of different procurements. We'll procure the sequencing, and, at least in our initial bake-off, we've required the sequencing providers to produce BAMs and VCFs, and we're going to put those in our database. But we're separating that from the annotation. We're expecting those files to be the substrate for annotation, and we're going to run a competition for that in our next phase. We're expecting this to be entirely automated, and to feed something useful for the clinicians inside the health service to use. Now, it's a tall order to be able to do all this, but what we're hoping to build is essentially a market: a market for sequencing and a market for analysis, with different groups competing against each other to improve it. It's not even clear, in fact, what's available right now, or what level of utility the outputs have. And it may even be that there are three steps: various people have said, well, we can do better at refining the interpretation of the raw sequence; we're not so good at this part, but we could set up something in here. We'll assess whether that adds sufficient value as well. So sequencing assessment is in progress, obviously on quality, accuracy and coverage.
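The two-stage split described here, where sequencing providers deliver BAMs and VCFs and annotation runs as a separate, automated pass over those files, might look schematically like this. This is a toy sketch: the VCF line, the coordinates and the lookup table are all invented, and a real annotation service would draw on curated variant databases rather than a hard-coded dictionary.

```python
# Toy sketch of the two-stage split: stage 1 delivers VCF records,
# stage 2 is an independent, automated annotation pass over them.

def parse_vcf_line(line):
    """Parse the fixed leading columns of a single VCF data line."""
    chrom, pos, _id, ref, alt, *_ = line.rstrip("\n").split("\t")
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}

# Invented lookup table standing in for a real annotation source.
KNOWN_VARIANTS = {
    ("13", 32340301, "CT", "C"): "BRCA2 frameshift, likely pathogenic",
}

def annotate(variant):
    """Attach an interpretation if the variant is in the lookup table."""
    key = (variant["chrom"], variant["pos"], variant["ref"], variant["alt"])
    variant["annotation"] = KNOWN_VARIANTS.get(key, "no known interpretation")
    return variant

# Stage 1 output (one VCF data line) feeding stage 2 (annotation).
record = parse_vcf_line("13\t32340301\trs0\tCT\tC\t50\tPASS\t.\n")
annotated = annotate(record)
```

The point of the separation is that the interface between the stages is just the files: any annotation provider that consumes VCFs can compete, which is what makes the "market for analysis" idea workable.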
Annotation assessment is much harder, because there isn't really a gold standard of what the right answer is, and there aren't really good data standards either. Obviously this will have to be integrated with clinical data capture as well. There are assessment exercises out there, the CLARITY challenge and the CAGI challenge, but they haven't been quite the same as a real patient with just the data that the doctor has. In this assessment we're going to provide exactly what comes out of the health system, and that will go to these annotation providers. We'll assess how good they are at identifying pathogenic variants, and how good their clinical reports are in terms of interpretation. A key point is how fast you can turn this round, particularly for cancer, and whether you can operate at scale. There's a big difference between a research group that has a wonderful published algorithm and being able to handle 100 whole genomes a day, cope with that throughput, and deliver in a timely fashion. So this initial work going on right now will select suppliers for the pilot in 2014; the next phase will evaluate that and use it as a basis for the main programme. But when you're running in the main programme as an annotator, you'll be expected to accept that, again, the data isn't going to go across the world to your data centre: you'll have to run your annotation software inside our data centre as virtual machines, and you'll be subject to the kinds of compliance requirements for software that have been alluded to in FDA terms. So it's a kind of apps-style vision for how we're going to handle the analysis; we're expecting a competitive apps market to develop.
So that's the summary: 100,000 whole genomes by the end of 2017. There's some scepticism, but in fact there's now quite a lot of engineering around this, and engagement going on in the UK, and certainly the academic sector is being aligned with this in terms of infrastructure, because obviously the key thing here is to collect the data, store it and send through the clinical outputs, but a big benefit is other people using it for research, particularly the academic community, so we're making sure that's possible. And I'll just acknowledge Genomics England, where I partly work one day a week, and all the various other parts of the NHS and the Department of Health, and of course a lot of this is built on all the work in the past from the Sanger Institute, the Wellcome Trust and the NIH. Thanks very much.

Tim, I'm just wondering if you can give us your thoughts on how one, in quotes, validates findings that are rare. I understand contact tracing in the infectious disease realm, and host-tumour sequencing. But except for some rare disorders for which I guess the evidence is very tight, what does that really mean?

You mean the annotation assessment? So we don't really know. We have an advisory board for that sector. The initial phase of that is really going to be about whose software actually runs and produces an output that's intelligible, and assessing the range of what the different software providers are able to produce. Some of them will produce integration with literature; how useful is that to the clinician? It will still rely on clinical expertise. But in order to make any real assessment we'll have to do hundreds. So the idea is to filter them down in the phase one bake-off, then assess them in a better way throughout 2014 and procure multiple providers during 2014.

Jeff Ginsburg. I have a number of questions.
I'll just keep it to one or two, to let the folks behind me go. You started out saying this was a prime-the-pump activity, a demonstration of whether the UK should sequence 50 million genomes as opposed to 100,000. When are we going to know the answer to whether this is the right strategy?

2017. That was an easy answer. End of 2017.

You didn't really talk about how you're collecting the cost data around this project that's going to enable you to make that argument.

So you mean the economic analysis? No, I didn't. We're doing this very rapidly, and really we're trying to work out how to put the engineering together, particularly the connectivity of sample flow; right now that's the biggest priority to get sorted out. We have set up things to feed the pilot, and phase two is actually running right now for rare diseases. But it's got to be fed to a much greater extent in phase three, 30,000 a year. That's quite a lot of connectivity into the health service, so there's quite a lot of engineering, and we're worrying about that right now. There's only ten of us.

Okay. My question is about the computing infrastructure supporting that. How large is that, for example, storage and computing nodes? You mentioned that virtual machines and some data centre will be used for that.

So, none of this exists yet. We're anticipating a BAM file of about 150 gigabytes per sample. Now, whether we need to store that for our clinical processing pipeline, maybe it's enough just to put it on tape, because if the pipeline is good enough, maybe VCFs are good enough. But the researchers certainly don't want the BAMs thrown away, not right now. So we're actually looking for funds to build a data centre large enough to hold all of those for research purposes.
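The back-of-envelope arithmetic behind those figures is straightforward, taking the quoted assumptions of roughly 150 GB per BAM and 100,000 samples at face value; the 3x multiplier for tumour/normal pairs, replicas and intermediates is an illustrative assumption, not a stated programme figure.

```python
# Back-of-envelope storage estimate from the figures quoted in the talk.

GB_PER_BAM = 150        # anticipated per-sample BAM size, as stated
SAMPLES = 100_000       # programme target

raw_pb = GB_PER_BAM * SAMPLES / 1_000_000   # GB -> PB, decimal units
print(raw_pb)  # prints 15.0: one BAM per participant is ~15 PB

# Cancer cases need tumour + normal pairs, and there are replicas and
# intermediate files; an assumed ~3x multiplier lands in the tens of
# petabytes, the scale quoted in the discussion.
print(raw_pb * 3)  # prints 45.0
```

That is why the answer distinguishes the clinical pipeline (where VCFs on disk plus BAMs on tape might suffice) from research use, which wants the full BAMs online.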
And this is very like the million genomes proposal by Haussler in terms of how you might go about it. You know, these indexed file formats where you can just... You don't build a database; it's too big. We're talking about 50 petabytes of data, perhaps, if you store at this level. But there are all these compressed formats. Basically, we don't know how to compress genomic data yet; we don't know what to throw away, and we don't think that in this period of three years the community is going to work out what to throw away. So we're going to have to store it for those three years at least, and that's the kind of level, at least from the research angle. Then in terms of compute, there are all kinds of estimates, maybe 20,000 to 30,000 cores, to allow research activities over that. But the sequencing providers we procure from, we're expecting them to do the analysis and provide us BAM files and VCFs. The annotation will have to run on our system, but the load there is much lower than the initial processing.

Okay, thank you. Yeah, Irwin Bodd and Jane Brief. I'm interested in hearing about the considerations and the approach to diversity in the populations in your 100,000 sample, which I assume is considerable.

Yes, but this is not a research study. The idea here is that this is driven by clinical need within the health system. So in phase three, those 30,000 a year will be tied to who's in the health system where we think there's most benefit to their treatment, and so this won't be taken into account that much. Obviously, which centres in England have good collection of secondary records within the hospital system will be a determining factor; there's no point in sequencing if you haven't got that data. But it's not going to be stratified in a particular way from a research perspective. That's fine.