So, thank you very much for inviting me to give a talk here today. It's been a really interesting meeting, and I must admit I was really struggling as to whether or not my talk was going to be on message. I think it's approximately so, anyway. And Tim, I'd really like you to get up and do a little dance about two minutes before I finish, okay? So I'm a public health doctor and epidemiologist, and I probably know less about genetics than most of you in the room. So today I wanted to talk to you a little bit about... sorry, I can't get this working to go forward. Okay, got it.
So three things struck me about yesterday. The first was Professor Kari saying we need to do the studies to provide the evidence base for clinical utility. Then, and I'm paraphrasing here, so excuse me if I'm putting words in your mouth, I think it was Dr. Erinsson who said that if you're going to put things in bin 2, you actually need to state clearly what you need to know to get them out of bin 2. And the third thing, which I think Elaine said, was that her priority was to have better data to establish whether variants are likely to be causal. So that's a huge span of activities that I think needs to be considered.
Now the PHG Foundation in the UK (I think Tim and Paul were involved in this) recently produced this really comprehensive report on the future with whole genome sequencing and what it means to the NHS in the UK. And one of the statements that they've made is that the NHS presents a wonderful opportunity to implement whole genome sequencing in a way that's evidence-based, systematic and efficient. So what I was going to try to talk about today was how NHS data can really be used, what the practicalities are, and some examples of how such large-scale data might be usable for answering some questions to help close the translation loop.
And I wanted to use MODY, maturity onset diabetes of the young, which you'll all know is monogenic diabetes, as a sort of working example of that. So I'll just briefly mention something about electronic healthcare data available for research in Scotland, mention our bioresources and how we link them to data, and then talk a bit about MODY. It's very unresponsive.
So the key points are that in Scotland we've had a unique healthcare identifier for quite some time now, available on all healthcare records. So it follows the patient around with every healthcare encounter that they have, okay, and through to death records. And this unique identifier permits linkage between many different available data sets. So for example, we can link together all hospital admission records going back to 1981. We can link to maternal and child healthcare records. We can link to psychiatric records. And in some circumstances we can link to primary care data, though the expert on primary care data is here in the room, so we can pick his brain afterwards: John Parkinson, who manages GPRD. Importantly, through a Wellcome Trust funded initiative, the Scottish Health Informatics Programme, a bunch of us led by Professor Andrew Morris, who works in Dundee where I'm also based, have been working through a wide range of issues, including, importantly, some of the governance issues and data safe haven issues for how you actually collate and use those kinds of data.
To mention one or two bioresources that we have, because REC said at the beginning that we really need to ask ourselves whether we have enough established bioresources and available data sets with good depth of clinical annotation. So here's one example. Generation Scotland is a triumvirate, really, of three studies. Probably the most interesting one is this one here, the Scottish Family Health Study, which is now a completed collection. It has altogether about 24,000 participants. They're actually sampled from the general population.
They have extensive and deep phenotyping done. It's a pedigree-based structure, so it comprises about 7,000 families. And importantly, we can link all of the data to all of these other routine healthcare data sets that I mentioned. So it's an enormously valuable resource for research, and it's open for people to apply to use it. Sorry, I'll just go back. Can I go back? Right click, that's what I'm doing. But it's not working. Okay, okay. So as I say, there's a depth of phenotypes on here; I won't go through them all. There are study-specific instruments that are used in Generation Scotland.
Now I'm going to turn to diabetes data. In Scotland, we have about 250,000 people with diabetes. About 25,000 of them have type 1. We have an electronic healthcare record that is used throughout the country. It's used by most hospitals as their primary EHR. But even where it's not, there is a feed into SCI-DC, as we call it, from the EHR that that hospital is using. It also receives a nightly feed of key items from every primary care physician in the country, bar three, covering quite a lot of data, including issued prescriptions. So that's pretty much what I spend quite a lot of my time on: studies that are built around this data set. And we can link the data in this data set to, as I say, all of these other record systems and database systems here.
And importantly, we have built various bioresources. So the Wellcome Trust funded case-control bioresource, for example, linked to these data in Tayside, has been pivotal in a lot of the replication studies for the major finds of genes for type 2 diabetes in the last few years. The type 1 bioresource is something I'm building at the minute. We started collecting in February, and we've biobanked almost 3,000 patients so far. And our focus with this (ultimately, we want to get to 10,000) is really very much around diabetic complications. Sorry, I'm just not getting on with this at all.
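To make the linkage idea concrete, here is a minimal sketch in Python of joining separate data sets on a shared patient identifier. It is an illustration only, not the actual Scottish linkage infrastructure: the data set names, field names, identifiers and values are all invented, and real linkage would happen on de-identified data inside a safe haven.

```python
from collections import defaultdict

# Hypothetical, pseudonymised records. Every field name and value is invented.
diabetes_register = [
    {"patient_id": "P001", "diabetes_type": "type 1", "diagnosed": 1998},
    {"patient_id": "P002", "diabetes_type": "type 2", "diagnosed": 2005},
]
hospital_admissions = [
    {"patient_id": "P001", "year": 2003, "reason": "ketoacidosis"},
    {"patient_id": "P001", "year": 2010, "reason": "retinopathy"},
    {"patient_id": "P003", "year": 2008, "reason": "fracture"},
]
prescriptions = [
    {"patient_id": "P002", "year": 2006, "drug": "metformin"},
]


def index_by_patient(records):
    """Group a list of records by patient identifier."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["patient_id"]].append(record)
    return grouped


def link(register, **datasets):
    """Attach every matching record from each data set to each register entry."""
    indexes = {name: index_by_patient(rows) for name, rows in datasets.items()}
    linked = []
    for person in register:
        row = dict(person)
        for name, index in indexes.items():
            # .get avoids creating empty entries for patients with no records
            row[name] = index.get(person["patient_id"], [])
        linked.append(row)
    return linked


linked = link(
    diabetes_register,
    admissions=hospital_admissions,
    prescriptions=prescriptions,
)
```

Each register entry ends up carrying its own admissions and prescriptions, and records belonging to people who are not on the register (P003 here) are simply left out.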
So this is what happens when Tim tells you to hurry up. OK, so we have these bioresources, but you could just as easily substitute the brave new world of: what if you have next-generation sequence data on all these patients, and even, what if patients get it themselves and want to upload it and append it to their clinical record? Because we do have a patient-facing aspect of the data set. So we might want to think about that. So that's a bit about resources. Oh, God, two minutes.
OK, so MODY. Most of you know about MODY. MODY is monogenic diabetes, but around 80% of it is clinically misdiagnosed as either type 1 or type 2. We've known about MODY, and we've known about the genes that underpin it, for many, many, many years. This work by the group of Andrew Hattersley, who's the world's leading expert on MODY, has shown that we currently diagnose less than 20% of all MODY. I don't know what the figures are in the US or whether anybody's looked. I know that EGAPP has it on its list, but you haven't done your review of it yet. So why? It's a perfect example of an actionable but unactioned variant. And maybe, before we worry about all the stuff we're going to learn tomorrow, we need to take stock of how lousy we are at implementing what we know today.
And so there's a whole bunch of issues here in relation to this, which I won't go through. But the question is, how do we do studies that actually solve some of those issues? Now, the first thing is there hasn't been a cost-benefit analysis, and the different data elements that would be needed for that still need to be generated. So at the minute, some of the studies we're doing are using our bioresource, linked to our national data set, and, in conjunction with Andrew Hattersley, we're doing a study called the UNITED study, where we're evaluating certain algorithms for prioritizing who should get sequenced. So that's an important thing. And that's part of the actionability space, as distinct from the clinical utility space.
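The arithmetic behind "prioritizing who should get sequenced" can be sketched as follows. This is a toy calculation in Python, not an algorithm from the study mentioned above, and every number in it (cohort size, prevalence, costs, and the sensitivity and specificity of the pre-screen) is a made-up placeholder chosen only to show the shape of the trade-off.

```python
def cost_per_case(n_patients, prevalence, seq_cost,
                  screen_cost=0.0, sensitivity=1.0, specificity=0.0):
    """Expected cost per MODY case found when sequencing follows an optional pre-screen.

    With the defaults (sensitivity 1, specificity 0, no screening cost),
    everyone is sequenced and no pre-screen is applied.
    Returns (cost per case found, expected number of cases found).
    """
    cases = n_patients * prevalence
    non_cases = n_patients - cases
    true_pos = cases * sensitivity               # true cases sent for sequencing
    false_pos = non_cases * (1.0 - specificity)  # non-cases sent for sequencing
    total_cost = n_patients * screen_cost + (true_pos + false_pos) * seq_cost
    return total_cost / true_pos, true_pos


# Hypothetical inputs: none of these figures come from the talk.
N, PREVALENCE, SEQ_COST = 10_000, 0.02, 500.0

sequence_everyone = cost_per_case(N, PREVALENCE, SEQ_COST)
stratify_first = cost_per_case(
    N, PREVALENCE, SEQ_COST,
    screen_cost=10.0, sensitivity=0.9, specificity=0.8,
)
```

Under these invented inputs, stratifying first finds slightly fewer cases but at a much lower cost per case found, which is exactly the trade-off that the real sensitivity, specificity and cost figures would have to settle.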
So one of the real issues here is to bring down cost to improve that cost-benefit ratio. The question is, should you stratify first, using biomarkers, family history and clinical features from the record? If so, how? Which ones are best? Which ones yield more? Which ones are most cost-effective? And interestingly, this is one area where you might want to think about other biomarkers in your biobank being really useful for telling you something about who needs to be sequenced. So a fascinating aside here is that a GWAS of the plasma glycome recently revealed HNF1-alpha, one of the main genes for MODY, to be a master regulator of fucosylation, opening a whole field of using N-glycan branching assays as a diagnostic test. So that's one example.
Another example, where I mentioned randomized trials, is exactly in this field of clinical decision support. So one of the things we're trying to design at the minute is not exactly randomized, but what we can do is implement, in different parts of the country, a clinical decision support tool to prompt the potential screening for MODY. And we can then compare how well that's achieving an increased yield of cases in comparison to the status quo. So that's something we're trying to design at the minute. I've mentioned the studies looking at the different algorithms for how you might approach something.
And then I think an interesting thing, to go back to Elaine's question about how to infer causality: we might want to consider what the future will look like if you've got lots and lots and lots of sequence data that you just happen to have on people, and you have lots of phenotypic information, and how we best exploit that. And at the minute, I think this is an idea that I cooked up with my husband when I was talking to him about coming here this week, because he's been doing some work on basically detecting, using GWAS data, regions of increased IBD sharing.
So if you have lots of GWAS data in your population, and you've already got some MODY cases that have been sequenced, you could look at the IBD sharing and actually say, hang on, some of these patients who appear to be type 1 actually have excess sharing with some of my known MODY cases. Is this a route to detection? So you could do a whole sweep of your population to evaluate people. Anyway, just a thought. I think it's worth thinking through.
Summary and conclusions: we need to harness the power of EHRs linked to bioresources to complete the translation loop. We can do the clinical validity and utility studies, but that needs money. And we can also think about the more actionability questions, including actually doing randomized comparisons of approaches to actionability. But it needs demonstration projects and systematic effort, and, I would say, some careful consideration about the feedback of GWAS data at the minute in these situations. And finally, any effects of reporting back need to be formally evaluated, so as to feed back into clinical utility. I'll stop there. Sorry about the slides.
I'm interested in the consent that you operate on. Is all the data anonymized? Or do you have access to identifiable data? And what consent were these collected under?
OK, so we have two different systems that operate. We have one system if you just want to do what I call dry data analysis, where you're linking records together and they're completely anonymized, de-identified, et cetera. For that, we have a system whereby we have a privacy guardian as well as ethics committees. So we have to get privacy guardian approval for everything. But also, increasingly now, what we've set out is a blueprint for a maximally secure way of utilizing those data. Even when they're de-identified, our new system is going to require passports for validated researchers. It's going to require the data to reside within data safe havens, and so forth. So that's that level.
Then in terms of bioresources, they are all, by definition, individually consented. So the patient comes in, and we consent them for collection into the bioresource. But we also consent them for retrospective and prospective linkage to their clinical record. And then a third form of consenting that we do with patients is ad hoc: as patients come into the clinic, we also consent them into a research register, which allows us to provide them directly with information about studies we're trying to recruit for, without necessarily having to go through their primary care physician again.
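The IBD-sharing sweep floated near the end of the talk could be sketched roughly like this in Python. It is only an illustration of the idea as described, under strong assumptions: the pairwise sharing values are invented, they are assumed to have been computed already by some IBD-detection method run on GWAS data, and the 3 cM threshold is an arbitrary placeholder rather than a validated cut-off.

```python
# Hypothetical input: for each (apparent type 1 patient, known MODY case) pair,
# the length in centimorgans of the IBD segment they share across a MODY gene
# region (0.0 if none). All identifiers and values are invented.
ibd_at_locus = {
    ("T1_017", "MODY_A"): 0.0,
    ("T1_017", "MODY_B"): 0.0,
    ("T1_042", "MODY_A"): 6.5,
    ("T1_042", "MODY_B"): 0.0,
    ("T1_108", "MODY_B"): 1.2,
}


def flag_candidates(sharing, min_cm=3.0):
    """Return patients who share an IBD segment of at least min_cm with a known case.

    The result maps each flagged patient to the list of (case, length) pairs
    that exceeded the threshold, so a reviewer can see why they were flagged.
    """
    flagged = {}
    for (patient, case), length_cm in sharing.items():
        if length_cm >= min_cm:
            flagged.setdefault(patient, []).append((case, length_cm))
    return flagged


candidates = flag_candidates(ibd_at_locus)
```

Anyone flagged this way would only be a candidate for confirmatory sequencing; excess sharing at the locus does not by itself show that the patient carries the same variant.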