I can try to shorten it somewhat, but perhaps most worryingly for me, at the start of this session you asked how many people in the room are researchers, and I think I was one of the two people that put our hands up as a researcher. So I'm speaking to an audience of people that I really don't know; I don't know where you're coming from, I don't know your background, and equally you may just not have any interest in anything I'm going to say. So, you know, I'm quite anxious here. To top that off, I noticed on the agenda the talk is actually not at all what I'm going to talk about. It says here I'm going to talk about big data and knowledge for engineering and health. I know nothing about what that title would imply; the middle word 'for' is actually not supposed to be there. It is big data and knowledge engineering, and it's the knowledge engineering concept I'm mostly going to talk about today, and how that's relevant to healthcare. So, having said all that, let me see what I can do. I think I should also say one more thing: I was almost arm-twisted into giving this talk, because I said we don't actually do that much with big data. So most of my talk will not be about big data as such, and to try to cover my back on that I'm going to say a few comments at the start about big data, then spend a lot of time talking about the stuff I know about, and then come back to big data at the end. Again, you may not be interested because of that, but let's see what I can do. So let's start with my first few comments on big data.
Someone mentioned maybe 2012 is the year of big data hype, and that resonated with me, because I think everyone is talking about it, but do we really think about what the issues with big data are? Is there a big data problem, or are there different problems, or in many cases are there actually no problems at all? Thinking about this, I considered what the problem, or problems, could be broken down into, and one obvious question is whether there's a massive change going on. Is there a change, or is there actually a stable rate of data generation, of data availability? We've just had a fantastic talk from the Sanger: the scale of data there is going up, so there is a big change that obviously needs to be dealt with. But if you think of a big Tesco's or something like that, they've got pretty much the same number of customers coming in this year as last year, as they will have in ten years' time, and the amount of things those customers buy will be pretty much the same. So the data are not really changing dramatically there. Moving on to the other points, then: is the issue, in some cases, that the data are becoming more complex? Perhaps they are, but perhaps not. Certainly with sequencing data they are becoming larger and more complex. But with commercial data, in many cases, customers coming into your shop, you just need to track how many cans of beans they buy; that doesn't become more complicated five years from now. Another issue might be the relative degree of change or stability in the requirement to use the data. Probably the reason a supermarket is panicking and spending a lot of money on organising these data now is just because their competitors are. So they feel compelled to start using data that's been around for a long time, because now they feel they have to make use of it. But once they've got the systems up and running to process those data, perhaps they're stable; they're okay.
And that brings us around to a second, very much related point, which is the tooling to process the data. Probably the reason the Tescos of this world didn't worry about these issues before is that they didn't have the tools to do it. But now the ICT, the computing, is making it possible, and they're all just rushing to catch up. The tooling is becoming available, and I think the problem will solve itself in those kinds of contexts pretty quickly. Something else that came up in the last talk is this question of, well, is some of the data actually useless? Are we getting too uptight about all of this? I think that might be the case in many situations. One of these files from one of these sequencing runs at the Sanger is terabytes of information. Once it's been processed, you identify just the list of variations, the sequence base positions that vary from the reference, for the whole human genome. You might be down to just a few thousand positions, literally a text file with a few thousand characters in it. So we can probably throw away the rest of the data. Now, the argument for not doing that is that we might need to reprocess it in better ways, but once we get the processing right and the quality up to a sufficient level, we will be happy to just process it, throw everything away and store only that very small condensed file, which actually has the knowledge in it. So we have to draw a distinction between the data and the knowledge. And that's really the thrust of my talk: concentrating on the knowledge dimension rather than just the data dimension. Because in different scenarios the data dimension, yes, we may need to deal with it; it may be changing or it may not. But what we really need to focus on, in my view, is the knowledge dimension. So those are my first comments on big data. I'm now going to talk about the knowledge dimension and come back at the end to some more comments on big data.
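The data-versus-knowledge condensation described here can be sketched in a few lines: terabytes of raw reads reduce, after processing, to a short list of the positions where an individual differs from the reference. This is a toy illustration with made-up sequences, not a real variant-calling pipeline:

```python
# Toy sketch of "data vs knowledge": keep only the positions where a
# sample differs from the reference, and discard everything else.
# (Hypothetical sequences; real variant calling is far more involved.)

def condense_to_variants(reference: str, sample: str):
    """Return only the (position, ref_base, sample_base) differences."""
    return [(i, r, s)
            for i, (r, s) in enumerate(zip(reference, sample))
            if r != s]

reference = "ACGTACGTACGT"
sample    = "ACGTACCTACGA"

variants = condense_to_variants(reference, sample)
print(variants)  # [(6, 'G', 'C'), (11, 'T', 'A')] -- the condensed "knowledge"
```

The whole raw "data" side can then be thrown away once the processing is trusted; only the tiny variant list needs long-term storage, which is exactly the point being made.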
So, this concept that I mentioned at the beginning, knowledge engineering: what do I mean? Well, the term knowledge engineering was coined a few decades ago for an engineering discipline, and I'm going to emphasise engineering, that involves integrating knowledge into computer systems in order to solve complex problems normally requiring a high level of human expertise. I think that's a pretty good definition of what we try to do in healthcare. So what I'm really going to talk about, when I say more about knowledge engineering, is how, if we come to understand what knowledge engineering is and develop everything we need to do it well, that may be the key to really delivering optimal personalised medicine in the future. Today, medicine is not optimised and it is not personalised. This is the way medicine really works today: your average doctor is expected to process and keep up to date with all sorts of information, and all of those areas of information are changing rapidly. In reality, we know the doctor can't do that. That's why you go to one doctor and get one diagnosis, and another doctor, another diagnosis. It's basically a mess: far from optimal, far from consistent. So that's the problem we're all trying to solve. People say, oh, we'll sequence everybody's genomes and then we'll have a thing called personalised medicine and it will be perfect. But how do we get from here to there? That's what I'm really talking about today, and where big data might fit into that. The ideas I'm going to present to you aren't just mine. They've actually come out of a number of international workshops, conferences, meetings and discussion forums that we've been co-organising for a few years now. They're listed here, but it's culminated in the formation of a network called I4 Health. That I4 Health network is on the web, and you can go and look up more about it.
The I4 at the beginning of the name stands for integration and interpretation of information for individualised healthcare. So it's this integration and interpretation, the extraction of the knowledge from the data, coming through again. This graphic is meant to illustrate the ideal scenario, which in fact most patients you speak to think is what we're doing today; they would be horrified if they knew the reality. But let's work it through. We're separating, on this graphic, research on the left and medicine on the right, and on the left there you see all the different types of data we generate in our research activities. What we are supposed to be doing is taking those data, extracting knowledge from them and putting that somewhere, and from that knowledge concluding things like which bits of it are clinically useful: which bits indicate biomarkers, tests, indicators that the doctor can make sense of and use in diagnostics and prognostics. Those things are then integrated further with the information coming from the hospital: all the different tests, all the different modalities of hospital data, and EHRs, electronic healthcare records. They're coming slowly, but they are coming: a digital record of everything that can be known about a patient and their medical history. So all of that biomarker information, the diagnostic tests, the clinically useful bits of knowledge, are pulled together for clinical decision support software tools and diagnostics. That's how medicine is supposed to work, and then there's a beautiful positive feedback cycle here, where the hospital data and the outcomes of those patient experiences can be fed back into research.
So if there was some current understanding that certain drugs cause a certain side effect in a certain population or age range, but the evidence when they were used in medicine showed that wasn't the case, perhaps a certain nationality or racial group didn't have such a side effect, that knowledge can be fed back in, as a kind of virtuous circle. So that's the system we'd like to have in place, but we don't have it. Why not? Well, just as an illustration: there are hundreds of thousands of published biomarkers out there, and fewer than 100 in routine use. Most of the knowledge that could be clinically useful has not been filtered out, and is not available or known to most doctors. That's just the reality. So what is really the reason for this disastrous situation we have today? It's because, in our view, or in the view of this I4 Health community, research and healthcare are in different worlds, different universes, sitting very far apart on a spectrum. We need data to flow from one to the other, obviously, knowledge and data, that's what this is meant to imply, from research to healthcare and back the other way. And of course that involves biobanks, registries, all of these kinds of things. We need to bring together the kind of people and activities over here, that's the bioinformaticians and the academics, with the medical informaticians and the companies creating software in hospitals. They live in different worlds; they're different people, with different backgrounds, using different standards; really, they're two different universes. So there's a massive gap between these two. We cannot reconcile them into one by just putting more money into research and more money into healthcare, or by calling our research translational. We really need to address the fact that there's a genuine gap between these two worlds. This just illustrates that.
It's the total count of all publications in those four areas labelled in the corners there, some of them being medical informatics, some being research-type informatics. And you can see the number of publications that actually cross those areas is essentially zero in the middle: none of them. No papers in the last decade really touch on all four areas, and very few touch on any aspect of research and healthcare together. So we have a gap between research and healthcare, and we would like to transfer utility across the two, so we need to make that possible. In our little graphic, our little car, our little doctor, delivers healthcare across that gap. So we're building a bridge here, aren't we? When you build a bridge, you don't ask the scientist, the metallurgist, to go and build it. You don't ask the lorry driver or the car driver. You ask an engineer to build that bridge. And we really do not have these engineers, these knowledge engineers, that we need. So the world is defaulting to the next best thing, which is just trying to mine all the information that's out there via Google. And I guarantee you, 99% of doctors, when you leave your appointment with them, if they're not quite sure of what they've said, will get on Google and check whether they've told you the right thing. Speak to any doctor over a pint and ask them. That is the world today, and of course that's far from optimal. We need to build a tailored, purpose-built system for doing essentially that. So we've put that into a picture here, just to try to get our heads around all the issues: research at one end, healthcare at the other, and then this new concept, a new discipline called knowledge engineering, in the middle, forming a chain between the two that links them together.
You can think about what information would flow, what the utility would be, what the challenges are, and these are the kinds of questions that the I4 Health community has been working through. If you want to know more about that, there is actually a publication just out recently; look up my name and knowledge engineering, it was in the journal Human Mutation, and I'm sure you'll easily find it. So I'm going to go through a few slides now, just to give you a flavour of some of the aspects of knowledge engineering for health, so you can think about it some more and reach your own conclusions. This slide is showing how things work today. We have lots of data; each of those coloured shapes is just a piece of information. Most of it is sitting in the research world, obviously, because that's where we generate it. There's also a lot of data in the healthcare world, the clinical world, but not much of that is made available for researchers to tap into. We are trying to pull some of it across and put it all in one place, in the mind of a researcher essentially, and they're meant to go away and make sense of it all. Making sense of it all, knowledge generation, that's what we do. And then we publish a paper and move on to something else. We haven't really helped with overall healthcare delivery. We're suggesting that what you need to do instead, or in addition, is to identify from that sea of information the useful pieces that you do fully understand and can use, and do what you need to do to get them into the healthcare situation. So a rectangle: we know exactly what that is, it's a simple shape, very straightforward, fully understood; we can make things out of that. Or circles, or whatever; you get the concept here, although that's probably not a very useful car. There's also standards; we need to really work on standards.
We can't have different silos of data using all different standards; nothing will be interoperable. That needs to be true across healthcare and research, whereas at the moment those two worlds are not working on standards together as much as they should, and standards in many, many different areas, as you see here. Then there's the question of who owns these data, especially the information that we want to flow from the patients. We all know what happened a few months ago when Cameron said all this healthcare data on you all, we're going to hand it over to the drug companies to help with research, part of that cycle: there was a big backlash against that. There are difficult, complicated issues to be worked out on the ethics of all of this. I mentioned electronic healthcare records earlier; they're not really happening in the UK. We all know about the big UK Connecting for Health programme, many, many billions, the main goal of which was to put a common EHR platform across the country, and they've given up on that. But without EHRs this system will never come to life properly; it's not just going to come to life if we build it. So we have to think about clever ways to work around the data-sharing problem, and here are some ideas. Let's develop the right incentive and reward systems. Let's think of a middle category of risk, not highly risky or totally safe, but mildly risky, because we can probably come up with very efficient technical ways to share those data that are sufficiently safe. Compulsion, sanctions, the idea of researchers having IDs: I encourage you to look at the ORCID system, O-R-C-I-D, no H in the middle there. It's a system now coming online for every researcher to have an unambiguous, permanent ID by which they can interact on the internet. Then you unambiguously know who they are, what they're doing with the data, who's requesting the data, and so on.
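One small concrete detail of those ORCID iDs: the final character is a check digit (computed with the ISO 7064 MOD 11-2 scheme ORCID documents), so software tracking who is requesting data can at least verify an iD is well formed before trusting it. A minimal sketch of that check:

```python
# Validate the check digit of an ORCID iD.
# ORCID's documented scheme is ISO 7064 MOD 11-2 over the first 15 digits;
# a result of 10 is written as the character "X".

def orcid_checksum_ok(orcid: str) -> bool:
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    total = 0
    for ch in digits[:-1]:
        if not ch.isdigit():
            return False
        total = (total + int(ch)) * 2
    result = (12 - (total % 11)) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1].upper() == expected

# 0000-0002-1825-0097 is the example iD ORCID itself publishes.
print(orcid_checksum_ok("0000-0002-1825-0097"))  # True
```

Of course a valid checksum only says the iD is structurally plausible; resolving who it belongs to still means looking it up in the ORCID registry.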
Open data discovery: if we can't necessarily get to the data openly, maybe we can at least openly discover where it exists, and then have follow-up routines for getting our hands on the data in appropriate ways. Then the idea of taking the questions to the data, instead of the data to the questions, becomes very relevant when the data are actually too big to move anyway. But that also overcomes many of these privacy concerns, because you're not sharing the individual information on the patients; you're sharing the knowledge that exists across the data on many patients by taking the questions to the data. There's a lot of work going on on that front in Europe at the moment. Some of these systems are mining patient data, 10 million patient records, in real time to look at adverse drug reactions without any compromise to patient privacy. So there are things we can develop. These are the kinds of concepts that all sit within the knowledge engineering idea. Whenever anyone talks about healthcare and personalised medicine, they talk about models, computer models, and some idea of a perfect computer representation of every individual. Well, you can look at the paper I mentioned earlier for more details on this, but essentially there's a whole lot of issues involved here. You've got to think: what is the end result here? Is it an n of 1? One person, specific to you, each individual? If so, where's the evidence base? Or is it actually about groups of individuals, subgroups of individuals, where you'll have enough similar individuals around the world to create an evidence base to justify your predictions? So you can think about all of that. But lastly, coming back round now to big data, which I'm supposed to be talking about, a few thoughts on that. Well, the compulsory slide of lots of servers in a room. Here we go. That's mine.
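The "questions to the data" idea a moment ago can be sketched very simply: each hospital runs the query against its own records and releases only an aggregate, so patient-level data never leaves the site. A toy illustration, with hypothetical records and no real federated platform behind it:

```python
# "Taking the questions to the data": each site answers the query locally
# and shares only an aggregate count, never individual patient records.
# (Hypothetical toy records; real systems add governance and security.)

def local_count(records, drug, reaction):
    """Runs inside the hospital; the raw records never leave the site."""
    return sum(1 for r in records
               if drug in r["drugs"] and reaction in r["reactions"])

site_a = [{"drugs": {"statin"}, "reactions": {"myopathy"}},
          {"drugs": {"statin"}, "reactions": set()}]
site_b = [{"drugs": {"statin"}, "reactions": {"myopathy"}},
          {"drugs": {"aspirin"}, "reactions": {"nausea"}}]

# The coordinator only ever sees the aggregates each site chooses to release.
aggregates = [local_count(site, "statin", "myopathy")
              for site in (site_a, site_b)]
print(sum(aggregates))  # 2 affected patients across the federation
```

The privacy property falls out of the architecture: the question moves, the answer is an aggregate, and the patient records stay put.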
The question's been brought up already today, I think Rob first of all said it: it's all becoming centralised, all of this information, so you can process it. Well, that's happening, yes, but only in certain contexts where it needs to be done that way. With healthcare, let's think about that: can we really take all of these private patient data and move them into one institute, one big data centre? We think not. We think a much more federated approach is necessary; we don't think you'll have one big data warehouse to solve this. Finally, what kind of architecture could sit behind this vision that I'm trying to put to you today? Well, we thought that through, we and others, and I won't bore you with the details, other than to say there are lots of different information types at the top, lots of different sources. At the moment it all goes on the web, and it's a mess, and Google allows you to search across all of that in one go. But we think the trick is to actually have a structure below that which organises and distils and filters and refines the relevant clinically useful information. Then you'll have integrators below that, and eventually humans as well as computers using, adding to and refining that information, because humans are very good at this. We're not talking about a computer system that replaces the doctor and the doctor-patient experience; these all have to be part of the system, and those humans, as well as the automated outputs, need to feed back, as you see on the left here, to different levels in the system to optimise it. As I said at the start, a self-optimising positive feedback system. Just on the subject of humans, I read the other day, just as a side comment here, that the human brain has a storage capacity of 2.5 petabytes. So my brain is better than the whole of the Sanger Institute; I'm quite pleased to know that.
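The layered structure just described, many sources at the top, then filtering and distillation, then integration, can be sketched as a simple pipeline. Everything here is hypothetical (the stage names, the evidence scores, the threshold) and is only meant to show the shape of the architecture, not a real implementation:

```python
# Sketch of the layered architecture: items from many sources pass through
# a filter/distil stage and then an integrator stage, leaving only the
# clinically useful knowledge. (Hypothetical stages and scores.)

def filter_clinically_relevant(items, min_evidence):
    """Filter layer: keep only well-evidenced candidate biomarkers."""
    return [i for i in items if i["evidence"] >= min_evidence]

def integrate(items):
    """Integrator layer: deduplicate reports of the same biomarker."""
    return {i["name"]: i for i in items}

sources = [
    {"name": "BM1", "evidence": 0.9},   # reported by two sources
    {"name": "BM2", "evidence": 0.2},   # weakly evidenced, filtered out
    {"name": "BM1", "evidence": 0.9},
]

knowledge = integrate(filter_clinically_relevant(sources, min_evidence=0.5))
print(sorted(knowledge))  # ['BM1'] -- only the well-evidenced biomarker survives
```

The feedback loops in the talk would correspond to humans or downstream outcomes adjusting parameters like `min_evidence` over time; that part is left out of this sketch.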
So if I boil this down to what we're really saying here: lots of different information, and we need to concentrate on how we process it. Let's not worry about how an individual organisation manages it internally; they'll get their heads around that, and you've heard from the Sanger how they're making great progress on that. What's going to come out of that is just those limited pieces of knowledge, not data, and it's those we've got to work on: how we identify those pieces of information, how we extract them, and then how we use them downstream for optimised healthcare. So, bouncing back to my first picture, let's think that through. Where are the big data going to sit? Well, in the research institutes and in the hospitals. They will get their heads around that; they're taking care of it, and I'm not particularly worried about it. I think the real challenge actually comes down to how we identify the knowledge in these data and how we then process and use it, both from research data into clinical practice and from healthcare data back into research. So those are the real challenges as we see them; these two lines just say the same thing. The large data can sit largely where they are; it's really the knowledge extraction and distillation filters that need to be created. That really is the nub of the matter, as we see it, for knowledge engineering. So, just a few final policy and strategy thoughts on this. To kickstart this field of knowledge engineering, we need money, resources to go into research, development and application projects based on these knowledge engineering concepts. We need to create the expertise. Both previous speakers have emphasised that the IT side will take care of itself one way or another, but we just don't have the people around with the right expertise to do this job. We need to create that expertise. We need to cross-train people who have a natural talent for engineering.
As I said, it's an engineering challenge, but we have to cross-train them in computer science, bioscience and healthcare at the same time. We have to ensure interoperability of all of this at the end; I mentioned standards in the talk, but this also means we have to think about organising with a middle-out rather than a top-down or bottom-up approach. Top down, some government decides we're going to build Connecting for Health: we know that doesn't work. A bottom-up, anarchic approach is obviously never going to work either. We need a middle-out approach, where we organise the communities into identifying needs, evolving needs, and building solutions across consortia, across countries, that address those needs. We need to ensure innovation, of course, but we also need sustainability of these systems. Innovation tends to come from the academic world and sustainability from the commercial world, so we've got to get those two different worlds working together as well. Finally, we want to bring the system to life. It's back again to emphasising this filtration, distillation and provision from the source data. With luck, if we get it right, then that top picture can turn into this bottom picture, emphasising this middle component, this infrastructure, this big-data, small-data knowledge management system, tailored and built for future healthcare. And really that's all about knowledge engineering. It's not about research; it's not about healthcare. In closing, I'd like to acknowledge some of the key people: my own team, obviously, and some of the particularly engaged and brilliant scientists that we're working with across Europe here. Lastly, just a quick mention of this new institute we're building up in Leicester.
It's going to be called the Data to Knowledge for Practice Centre, and it's going to try to pull together all of these threads that I mentioned today, to concentrate on how these filters work, how we identify the useful information and how we can use it in clinical practice. This facility goes right from research informatics through healthcare informatics, biobanking and clinical practice, focused on cardiovascular disease. The building is underway now, in fact it's very much advanced; we'll be moving in in the next few months, and hopefully in the next few years we'll see if these ideas can be brought to fruition. Thank you very much. Any questions for Anthony? Nathan Smith from the National Institute for Medical Research. It's all very nice, very social, very old-school natural philosopher: you're at a university, doing that research for the greater good of humankind. But how much of the money for your new research institute is coming from, say, Big Pharma or the commercial companies out there? This kind of system, I could see: the sharing of data, data coming from research, their funding going straight out into the clinical world and being fed back in. How much have you got behind you for doing this? I think it's the right way to do it, but everything now just seems to be run by big business and not by pure research. Well, I tried to make the case that more money needs to go into the knowledge engineering area, but I would say the European Commission, for example, has been very excited by these ideas, and there's a whole new programme called Horizon 2020; I think you'll see these kinds of ideas coming through there. There are six projects at the moment funded under a programme called FET in the European Commission, Future and Emerging Technologies. These are six pilot projects in different areas, but one of them is in personalised medicine, or these kinds of domains. That's called ITFoM, the IT Future of Medicine.
Of those six, which are now coming to an end, all six will write a proposal for a scale-up, the big-scale effort, and that would be a one billion euro project, with the idea that maybe one third of that comes from academia, a third from industry and a third from other sources, who's to know, different countries contributing; but you're talking about a very large-scale organisation of these kinds of activities. The project is called ITFoM, so at the moment it has a two-out-of-six, or one-in-three, chance of getting funded. I think this is just a flavour. I should perhaps also mention rare disease. A lot of these ideas can first be put into practice with rare diseases, Mendelian diseases, where the genetic change is clear and has a big effect, causing the disease; it's not a small little modifier. So there you have big effects that are clinically actionable, and you could build an initial system, pilot these ideas, around rare disease. And again, there's a major programme, IRDiRC; that's a major consortium pushing for this, with very serious money going into rare disease research with these kinds of ideas in mind as well. So I think over the next few years, say five to ten years, a lot of these things will get a lot of support, both from industry and from academia, and I hope some of it becomes a reality. Any other final questions? OK, anything from Twitter? OK, can we just say thank you to Anthony?