That's great. Good afternoon. If folks can take their seats, we will get started. Well, thanks everybody for coming to a continuation of this six-part series. Just as a reminder, NHGRI, and in particular our History of Genomics program, is sponsoring a lecture series commemorating the 25th anniversary of the launch of the Human Genome Project. The schedule for that is shown here, and we are here. And after today, we'll be at the halfway point. And so far, it's been just wonderful with our first two sessions, and today will be no different, and we look forward to the final three that will be in March, April, and May. But today, the focus is on our guest all the way from the UK, Ewan Birney. Let me start with some biographical details of our good friend and colleague. Ewan did his undergraduate training at Oxford University, mostly focusing on biochemistry. But his transition to bioinformatics was swift once he got to graduate school, where he did graduate studies at Cambridge University, earning a PhD while working under the mentorship of Dr. Richard Durbin at the Wellcome Trust Sanger Institute. Then his transition from being a graduate student to essentially being in the limelight of the Human Genome Project in the field of genomics was a very rapid one, and this will actually be, I anticipate, part of what Ewan talks about in his seminar. But needless to say, Ewan, and the reason we asked him to come speak today, truly represents one of those superstar researchers whose career was launched by the Human Genome Project.
Ewan's many accomplishments include being one of the founders of the Ensembl genome browser and other databases, playing a key role in many large-scale genomics projects, such as the Human Genome Project, but also NHGRI's ENCODE Project and a number of other important projects he's been involved with, but also playing a critical role in annotation of genome sequences that have been generated for organisms like human and mouse and chicken and several others. Meanwhile, he has a vibrant research program of his own that focuses on genomic algorithms and inter-individual differences in human and other species. But the other distinguishing thing to talk about with Ewan is his leadership. He is a natural leader. He has assumed various leadership roles in his still very early career because of his youth, but what's particularly remarkable is already his most recent appointment to the position of associate director of EMBL's European Bioinformatics Institute, where he shares strategic oversight of EBI services with his colleague, Rolf Apweiler. And EMBL-EBI hosts some of the world's most important collections of biological data, as I'm sure you know. The other reason we invited Ewan to come here today and participate in this is that he's just such a great friend and a colleague of NHGRI and has been for so many years, and actually also to NIH. If I started thinking back on the number of times we've asked Ewan to come join us at a workshop, to come lead a working group, to come serve on a study section, to come advise us about this or that, in addition to being a grantee many times, I would run out of fingers actually pretty quickly. He's just always there to help us out, including just jumping on phone calls with us when asked. So he really is part of our extended family and really one of the people that does a lot of important service work for us. So that's why he's here today and that's his biographical information.
But I also wanted to share with you just some fun aspects of Ewan. And remember, this whole series is about being part of the historical program of genomics at NHGRI. So we have historians, we keep track of things at the Institute now and we're also really good at doing research about it. And so Ewan's going to be here and specifically tell us some things that are connecting him to the Human Genome Project and so forth. But there is some history here that's worth sharing and also even some recent history. So first of all, I dug out from our archives some old photos in particular and then found some on the web I wanted to share with you. It really gives you a little bit of insight into Ewan. This picture actually I had in my archives. This is right when he was about to get famous, because this is just coming out of the Human Genome Project and the group that was instrumentally important in helping to stitch together the human genome sequence, which I'm sure he's going to talk about. But Ewan even back then was sought after for stories and he certainly was one to get a lot of headlines written about him. Here was one talking about the genomics big talker. Here's one talking about bring me your genomes. And so the press covers him. As I've said, he's come and done many things with us and sometimes when we've hosted him here for some major workshops and data jamborees and so forth, it's been fun to go watch Ewan blow off some steam. So a number of years ago this was part of one of our workshops. This is Ewan at Dave and Buster's. He probably doesn't even remember this. You do remember this. That's good because I wasn't sure how many beers you had that night. But needless to say, he got into wanting to test every possible game that was at Dave and Buster's. And even wearing his famous geek T-shirt, which he was. He's very quick to tell everybody that he's just one of a number of geeks. So that's fun-loving Ewan.
And the other thing I really like about Ewan, he reminds me of me, is that he just can't talk without his hands. And you and I share that in common. And boy, you go on the web and look for Ewan pictures. And boy, does it show that he is just like me. He cannot talk without using his hands. And it just plays out over and over again. So I really identify with him a lot. The other thing though, I mean, Ewan is a very fun, jovial person with a great sense of humor. I was shocked that there actually are imitating-a-thug photos of Ewan out there. I had no idea he ever looked that serious or mean, but apparently cameras don't lie. So for some reason he can sort of be a little bit intimidating, if you will. But I was even more shocked because I've seen Ewan dozens, if not hundreds of times. I have never seen him making a fashion statement, but apparently somebody got him to make a fashion statement. I barely recognized him with a tie. But most seriously, when I was doing my research, to me this was the prototypic picture, because I think of Ewan as a prototypic thought leader and a programmatic leader, and here he is with, I know, two of his very good colleagues that he works closely with, Janet Thornton and Rolf Apweiler. And I just thought this sort of typified what I think of Ewan as a leader and as someone who's a great colleague. So with that in mind, I will turn this over. This is the title we thought he was gonna use. But the power of PowerPoint is that you can quickly change titles on the fly. And from what I understand, I think Ewan actually did that in his talk. So I will turn this over to Ewan now, thank you.
Great. I'm gonna be very self-conscious now about my hands. It's gonna be tough. I've also got this kind of laser beam. So we're bringing NHGRI into the 21st century, and just a moment ago we decided on the hashtag for this talk series as well. So we now have a hashtag. So do tweet if you want to as well.
Now, Eric gave me a brief. He said, Ewan, don't come with your normal talk, was the brief. I kind of want history and personal stuff and all of that. And that made me really think. So the slides actually aren't quite as polished perhaps as I would like them to be, but they are very much for this audience. So we'll go through them and you'll see there's a theme. So I also decided to do my life a little bit in pictures. So this is me not quite at the start of 1996, but all the way to me now in that picture with the tie. One picture with the tie. And there's sort of three layers, I think, to my scientific career. And interestingly enough, it doesn't start with genomes. It starts with data resources. It starts with delivering things to people through bioinformatics services. And that's probably the strongest theme to my life. Ending up now, I'm director of EMBL-EBI, not associate director of EMBL-EBI, leading to looking after EBI strategically. And I'll come back to this later on in the talk. But the other part of my life is writing methods and algorithms to analyze these data sets. And that's an interesting world. I'm not actually a computer scientist or statistician. Some people think I am. I'm a biochemist who taught myself how to program and then taught myself enough statistics to fake it in front of mainly biologists, but the occasional statistician. And these are some of the methods that I've produced, usually with just one or two other people. But then as Eric mentioned, I have been right in the engine room of a number of these projects. And I was very, very young when I was involved in the Human Genome Project. It absolutely accelerated my career. I basically missed a postdoc or a long postdoc. I went straight from being a student to a PI. But all of these projects, and there are many more I could put up here, have had that same feature of doing huge amounts of analysis.
Really, again, for me, actually not for the Nature papers, but to go back to this, to delivering data sets that other people use. So the rationale for me getting involved in the genome project is not really that my name is on papers. It's so that the stuff that I have made gets used by researchers around the world. And I did a little bit of kind of, I mean, this was a bit shocking to me. Who is this amazing person? So how did I have the time to do this? I've published over 210 papers. I have an h-index of 92; 15 papers have got over 1,000 citations. I've got 30 pieces of software, which I can find on my hard drives, that I have released to the world. A whole bunch of those are still in use today, chugging away in the middle of things. I've founded and made happen a couple of widely used data resources and I've been elected to these august bodies. In particular, as a Brit, I'm incredibly honoured still to be a fellow of the Royal Society. So I wanted to think about that. At this I am amazed. I stand here amazed. I don't believe I could have really done this in this time. And so how on earth did this happen? So the first thing to realise is that I was lucky. I'm going to come back to my luck. There's all sorts of aspects of luck that one has in life. And some of these are luck that is sort of a roll of the genetic dice or a roll of where you were born in society, of all sorts of different things. I also did sort of time things almost perfectly. I taught myself to programme before it was really cool to be a biologist and programming. And then I went and did stuff at the Sanger Institute with the Human Genome Project. I was in the right place at the right time. Now to my credit, you can't really take credit for luck. I think the only thing you can take credit for is seizing your luck and using it and using those opportunities.
But actually the other thing about this is that I am not really the person responsible for all of those 210 papers or those citations and everything else. I've always done this in a context of people around me. And it's very weird scientifically, I think, to think about individuals. I don't think science works as individuals, despite the fact that there's a lot of focus on individuals in all sorts of different areas. It really is teams of people in different configurations. And the other part of my luck very early on was the series of mentors I've had. This is sort of early. I'm picking out Adrian Krainer at Cold Spring Harbor. This was before I went to university. I had the chance to work at Cold Spring Harbor for a year, what Brits would call a gap year. And Adrian Krainer wrote two papers with me, one when I was a high school student. Now thinking about that now, that's quite a risk if you think about it, to be a supervisor and say, someone with sort of no qualifications, I'm gonna co-author a paper with this person. Iain Campbell at Oxford, he's an NMR spectroscopist, and he let me be totally independent in my fourth year. Again, he could have turned around and said, no, no, no, no, no, please do this project, do this. I said, I kind of want to fool around on computers. Will you let me do that? He said, that's fine. Richard has been a really long-time collaborator. He's my PhD supervisor, and then my collaborator in Ensembl, and we still work together, we have coffee together, we have dinner about once every couple of months. He's now a very, very close friend of mine, but also someone who gave me a lot of freedom during my PhD. And then when I went to EMBL-EBI, Graham Cameron and Fotis Kafatos took a big risk on me to hire me as a PI.
Actually, that was a month before I had sat my viva for a PhD, and I remember my original contract came with this letter from Fotis Kafatos, who's the director general, which basically said, I am now breaking a rule to recommend that you should be hired at EMBL, and just to make it clear, if you don't submit your PhD in the next month, then you're immediately fired from this job. So I had a lot of people who took a risk about supporting my future, and I've had other people who, as I've kind of grown up, have been really key people, and perhaps foremost amongst them is Janet Thornton, as Eric mentioned. Now, I don't know how many of you know Janet. She comes from the structural biology community, and in bioinformatics, there are two kind of wellsprings of science that lead to bioinformatics, and one is structural biology, and the other is genomics. So she comes kind of from the other side of bioinformatics. Janet is charming, lovely, looks like she wouldn't hurt a fly, very nice, but actually has this kind of inner backbone of steel. She knows exactly where she wants to go, and she can either be nice to you, or she can be nasty to you, as long as we go in the right direction. And in many ways, she has taught me a lot about relaxing: as you get older and you have more influence, in many ways you have to give more space to the people around you for them to grow as well. And so that business of stepping back as you have more influence is a really interesting process, and Janet absolutely showed me the way there. And two other senior people have had a very big influence on me.
So one is Francis Collins, and Francis met me when I was a young kid, and I was all crazy through the genome projects, and there was a point in particular around the ENCODE project where Francis again, I think, took a risk in saying that a computational biologist should try and cat-herd the analysis of all these transcription, chromatin, and all these other things that I didn't really know that much about. And he backed me then and through some rocky times as bits of this developed. And then the other one is Iain Mattaj. He's director general of EMBL, and is still a great mentor to me now, and again had to support me when I was being promoted very young, basically. But it's not just the people above me. You know, all of those 210 papers were done with people. And I'm just very, very lucky about the peers I've had. And this is a list of some of them, and it's definitely nowhere near the full set. And these people come and go, I find. You sort of have a relationship sometimes with someone for four or five years, but sometimes they come back as well. And so a great example there would be Jason, who I worked with when he was very young, when he was an undergraduate doing BioPerl, and now he's into fungi, and I dabble a little bit in fungi with him. They go up to very big projects, like the fact that the EBI directorship is shared between me and Rolf, and we are joint directors, and have worked together incredibly closely over the last seven years. My wife describes him as my work husband, which probably sums up the whole relationship in one. But there's a big list of people, including, you know, more recently, two really gifted clinician scientists who I now work very, very closely with, Nazneen Rahman and Stuart Cook, and I'll be talking about one of the projects with Stuart in a moment.
And it's not only your peers. I've just been gifted with this huge set of wonderful students, post-docs, and other people I've worked with, and I really can't list them all. It's quite interesting, because sometimes there are very, very distinct people who you've done one particular thing with, and you know that if you weren't with that person, this thing would not have happened. And so a great example there is Daniel Zerbino with Velvet, for example. But there's also sort of teams of people that make things happen. It's going to be a theme for me again. There's a big list here at the start of Ensembl of lots of different people, and we all had to pull incredibly hard to make things work. So let's have a look at this. This is an impressive CV, but it's really not me. It is this big collective of people which I've been allowed to be part of. And I know I get, in many ways, a disproportionate, I think, recognition for this. That's partly because I like talking. It's partly because I'm happy to engage with people, so I end up at the front rather than the back, and there's a gift and a curse in that kind of process. But there's something slightly crazy about focusing on the individual. I've been lucky enough to work with this big group of people in very complex ways to deliver an awful lot of things. So that was my first thought, that science is a sort of human tribal experience. The other thing about science is that it's open, and some people don't get this, and it confuses me all the time. I'm part of this wonderful society, the Royal Society. Now, very unfortunately, the French beat the Royal Society by three months on this, so this has always hurt the Brit in me.
But in the Renaissance, there was a really important moment, both in France and in England, and it was a shift of the transfer of knowledge between master and apprentice, between alchemists, from being secret, being something that was closed between small groups of people, to something that was open and published, and anybody could access, see, and assess. And the motto of the Royal Society is Nullius in verba, which is basically on the word of no one, and it means without anybody's authority. I don't take other people's authority to decide what is true or not, and that comes down to show me your data, show me your argument, show me your data. I will make the decision about whether I believe you. And that process of moving away from authority and more towards objective truth got kind of codified in these two things, which are the first scientific publications. The first one is from France. Now thankfully, there was a French Revolution, so it's not been continuously published since 1665. So the British can claim, the Royal Society can claim, the oldest continuously published scientific journal, which is the Philosophical Transactions. And I've always been open. I've been brought up open. I can't do it any other way, but I get confused when people don't do it this way. This is sort of this upbringing from molecular biology: deposit your data in ENA or GenBank, deposit your structures in PDB. This is the rule of the game that I was brought up as a scientist to behave with. There's actually another theme to this. In the 1990s, if you were a computer programmer, the launch of the internet gave rise to the open source movement. Now, most of the internet is based on that. Most of the things we do, your Apple operating system, the core of all the bits and bobs that keep your email ticking over, is all open source. We all use it all the time. So it's another theme of being open to this.
And then genomics went a step further, what you might call aggressive openness, sometimes for conceptual reasons, but sometimes also for a very pragmatic reason, which is if you were going to concentrate an awful lot of money in a small number of labs, then you really wanted to see that being used by a large number of people. And that logic stands the test of time today. And I get confused about people who do science in a closed way. And there are some other science disciplines away from molecular biology which do that. I had an incredibly interesting discussion with an oceanographer. And the oceanographer said, it is impossible to share data globally. I said, what do you mean it's impossible? And the oceanographer said, all the cultural norms, all the credit, all the ability to do this, how can you ever construct a system where you can share data globally? And I said, well, in molecular biology, we have been doing it since 1972. So we're a case in point on the opposite side of this. So it is sort of a choice by a group of scientists about what the rules of the game of openness are. But I believe quite strongly that it's kind of crazy to be in a closed world. And it's particularly crazy in the 21st century. And the reason why is that data generation has become so cheap that really holding on to things, holding on to data, in the belief that that's giving you an edge is a misguided view. Your science is going to be driven much more by these three things at the bottom here. Asking good questions, designing good experiments or good analysis approaches, and then executing good analysis. It's not going to be about can you generate the data, and truly speaking it's not going to be can you get access to the resource. So I think the successful labs in the future more and more are going to be the ones that understand not what data do I have but what analysis can I do. And that's going to be the driving thing in the future.
This comes on to the next point, which is that science is very much a team sport. We're all geeks. You're either dry geeks like me or you're wet geeks, lab people. We all like fiddling around, fooling about, doing things, understanding things by experiment and manipulation. But these days you need all of these different pieces to come together to do a successful piece of work in modern big data biology, if you'd like to put it that way. But not even big data; with pretty mediocre sized data you still need all of these things. Every lab very often needs all of these components. It goes from very kind of wet geeky stuff, animal husbandry, and I have a tremendous amount of respect for the people who know how to keep different species alive in the lab in all sorts of weird and wonderful ways, all the way through to pretty hardcore bits of maths and statistical theory. There are things in the middle which are really important but nobody wants to do and can easily get forgotten about. So data cleaning would be the classic one. Someone has to do it. It's like the Augean stables of every analysis. You have to sort of sift through your data, decide which things you're going to discard, did they get their replicate structure right, which seems to thrive and stuff like that. All the dirty stuff that you don't even show in your supplementary material because it's just too embarrassing to even get to. But such is life. And some people have a view that this should be done as a sort of contractual process, that obviously one lab can't do this, so the right way to do it is to have almost like a shell company of a PI and one or two people who kind of contract out their husbandry to one place, their data generation to another place, their statistical analysis and databasing to another one, and it all comes together in that central shell company.
And this really just doesn't work, and the reason why it doesn't work is because you're innovating and doing clever things all the way through, and even if you were trying to take a company-like view of this, a company would have a whole bunch of risk-sharing joint agreements about this and a lot of porous walls between all of these components. So instead what I think you need is what I would call a sort of team science thing. This is trying to show areas of expertise of different people in a project, and the triangle is the center of this person and what they are good at doing. And the two little arrows I've put here are where they can ask sensible questions. So this is actually a real example that I thought through. Hannah can ask sensible questions pretty much all the way to the edge of clinical samples. Antonio can ask some pretty good things about practical statistics and understand clinical stuff. Paolo is rooted in statistical theory and can just about ask sensible questions about pipelines and data cleaning, but then is at a complete loss if you put him in front of a clinician about where the data comes from. And the PIs here, in this case there were two of us, myself and Stuart, and we have to stretch and be able to ask questions. We've got to be able to go from the left-hand side question-wise to the right-hand side, and I shouldn't have put us as diamonds because of course we were absolutely hopeless. If we were put here everything would go horribly wrong, but we pretend that we know, centered at this point here in these things. There should be some other shape for me and Stuart. So this is what you need to do. You will not get one diamond that will stretch the whole distance of this screen. It won't happen. You have to have components and a team of people here to do this, and often a team of PIs to cover the whole thing. So I really want to give you this as an example now. So this is an example piece of high-dimensional analysis going back to this.
The two absolutely key people are Hannah and Antonio, with Paolo and Katie. Katie was helping on the clinical end. Paolo's helping at the statistical end. And it's a collaboration with Stuart Cook, who's a cardiologist at the Royal Brompton, Declan O'Regan, and Oliver Stegle, who is the statistician here. Statistics group. And this is a classic genotype to phenotype paradigm, and we've got all too good at going from genotype to phenotype. Open a copy of Nature Genetics and you'll see many, many papers about disease-level GWAS. But of course it's not the case that this variant here somehow directly makes you ill. It's not like the SNP somehow makes your heart go wrong directly. It must go through some components. And these components one can measure. So there are molecular components, and we've become really quite good at measuring these in the context of variation. So for example when you measure RNA in the context of human variation or species variation it's eQTLs. But you can also do that with ChIP-seq. Some people are doing it with Hi-C. That's interesting. But we wanted to move to a different level. It's not that these molecules themselves make people ill. They themselves must act through some other things. And so we wanted to go to organ-level phenotypes and to think about taking a phenotypic view of human organs and physiology. So basically what we're doing here is an imaging genetics project. I will not bore you with the slightly sorry story of imaging genetics. It has finally got good with the ENIGMA consortium and some of the neurogenetics. They are the best people at this. But there are some awful, awful, awful bits of paper which any geneticist will wince at, where horrible candidate gene studies were done in all sorts of different ways. And what no one has done is really had an unsupervised way of looking at these images. They've always... The successful ones have always taken an approach of, I know this region of the whatever and I know how to measure it.
I know the hypothalamus. And so I'm going to now measure it. Actually it's often sort of by hand. Someone clicks on voxels around there. So we are going to do this on the heart. 1,500 healthy volunteers. We get a high-dimensional cardiac phenotype. We get genotypes. These days the imputation is so good that you really... I mean it's quite amazing. You really have to justify sequencing in a completely new way with imputation for research. Imputation is shockingly good now at this. And so for the people who are thinking about setting up a sequencing project, before you do that, think about spending $50 on your genotyping array and hiring a really good bioinformatician to do the imputation. So this is the phenotype measure we're using, and this is the old school way of measuring hearts with MRI scans, and my collaborators, in particular Declan O'Regan, are very proud of this new school way here. So the old school way would have multiple linear planes going through the heart, very often done on more than one breath hold, giving a reconstruction of the left ventricle here with this sort of resolution. The more recent one actually shifts the magnetic fields in a kind of twisting movement on a single breath hold, and you can see here a far better resolution of the left ventricle. And just for you to know, the green area here is the myocardium, so the muscle wall, and the red area here is the blood filling up the left ventricle, and unsurprisingly, cardiologists are obsessed with the left ventricle of the heart. That's what's doing the pumping. So we're going to take all of these different cardiac images, and of course, if we want to do this with the phenotyping measure, what you can't do is just assume that voxel 1,1,1 is the same on all people. So you can't just take images and sort of pretend that they're the same. You have to map them to some reference.
And here, Antonio de Marvao was key in adapting what the neurologists use for image analysis of brains for the left ventricle, where you make an atlas of hearts. This here, the green here, is the left ventricle muscle. The yellow is actually the right ventricle, which, until I got used to looking at this, always looked kind of weirdly big, but it's kind of big and floppy whereas the left ventricle is small and muscular and pumps through. So you make 30 of these pictures by hand, by manual labelling, and then you can automate the process of mapping other hearts into this idealised space. Now, it doesn't work all the time, and so Antonio gets a glass of wine, looks at all the results, decides which ones aren't working, manually labels some more, runs it again, gets another glass of wine. It takes about two, three weeks of quite concerted effort to get to a point where you've got a full set of things. So now we can get all the people on to the same coordinate system, where we have 27,000 points around the left ventricle, and at those 27,000 points we can measure the thickness of this green line. For example, that's just one of many phenotypes. Now before we do anything else, this is a kind of interesting business of experimental design. I sat down with Declan and Stuart and I said, well, what we've got to do is we've got to do the same person multiple times but on very much different sessions, because if we can't see individual differences there's absolutely no way we're going to see genetic differences. And Declan said, are you sure? Do you know how scary, how annoying it is to go into an MRI machine eight times? And I said, but I want this for my statistics. And so a very brave person called Declan O'Regan went into the MRI machine eight times over three months so that we could get this kind of variance, and he's promised me that he never wants to go into an MRI machine ever again after all of that.
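As a rough illustration of the repeat-scan design described above, the sketch below computes, at each point on the ventricle, the variance across sessions divided by the mean. It is only a sketch under stated assumptions: the thickness values are simulated, the noise level is made up, and the function and variable names (`variance_over_mean`, `scans`) are hypothetical rather than anything from the actual study pipeline. Only the eight sessions and the 27,000 points come from the talk.

```python
import numpy as np


def variance_over_mean(thickness):
    """Per-point variance across sessions divided by the per-point mean.

    thickness: array of shape (n_sessions, n_points), one row per scan
    of the same person, already mapped to a common coordinate system.
    """
    thickness = np.asarray(thickness, dtype=float)
    mean = thickness.mean(axis=0)
    var = thickness.var(axis=0, ddof=1)  # sample variance across sessions
    return var / mean


# Simulated stand-in data: 8 sessions x 27,000 points (numbers from the talk).
# The thickness range and noise level below are illustrative assumptions.
rng = np.random.default_rng(0)
n_sessions, n_points = 8, 27_000
true_thickness = rng.uniform(5.0, 12.0, size=n_points)
session_noise = rng.normal(0.0, 0.2, size=(n_sessions, n_points))
scans = true_thickness + session_noise

ratio = variance_over_mean(scans)
print(ratio.shape, float(np.median(ratio)))
```

A low ratio almost everywhere would suggest that one person's measurements are stable between sessions, which is the precondition for hoping to see differences between people.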
And what we did get, though, this is the variance over the mean across the heart here, and the most important thing is most of this is yellow. There's a little bit of variance at the top here where maybe the model doesn't quite work, and there's a tiny bit of variance at the bottom, again where it doesn't work so well, but in general we get a very low amount of variance for an individual between different sessions. And then we could also take these 27,000 dimensions and do the world's dumbest piece of statistics from 1970, which is do principal components, and if you take the first principal component of this measurement and you look across all these people, it correlates with weight, and this makes cardiologists happy. It's one of the most well understood things: bigger people have more blood, more blood means the left ventricle is thicker. It's an incredibly well established thing, so this got a big tick. So we're now confident that this measurement is a good measurement, but we have a problem. If we go even to this smaller coordinate space of 27,000 coordinates, and this is roughly speaking the number of SNPs that we would test, we would be doing this number of statistical tests. That means that either we have to have a cohort of 10,000 or 20,000 plus, which we don't have, or we have to make epic assumptions about the effect size of genetic variants for this, so epic that one would expect that one didn't need a statistician to see that there were different people wandering around with different thickness of hearts. So we were sort of stumped. Well, we knew that this wasn't going to work out, so what did we need to do? We needed to create a smaller dimensional space that we could be confident was capturing what was going on in the heart, and here Hannah tried all sorts of different things, but we actually settled on something from Oliver Stegle's group, who thankfully works at EBI, called PEER, which is a latent factor modelling system with a very strong
Bayesian flavour. Now, if that sounds pretty cool and sexy, that's fine, you too can say these words if you want to and draw this diagram. I don't actually know how the Bayesian magic works in there, I just know that it works. But let me just tell you how PEER is set up. You have a phenotype here, this is 1500 people by 27,000 dimensions, and you say that this is broken up into known factors, like weight or sex and age, hidden factors, and then residuals. Now, PEER has been around for a long time and has been used a lot in eQTL studies, and in eQTL studies you hope that the hidden factors are missing batch problems or weird things going on in your lab, and the residuals, you hope, are your cleaned up signal. But we actually wanted to use it in the opposite way, where we wanted the signal, the genetic signal, to go into these hidden factors and not into the residuals. This is it pictorially. So one nice thing about PEER, well, one thing when we ran this gave us a lot of reassurance. PEER doesn't know anything about the three dimensional shape of this heart, and what I'm plotting here are the first four PEER factors. The red and blue are meant to be the opposite sides of a sort of variational mode. In this picture here, some people have thicker hearts on this side of the ventricle and thinner on this side, and other people are the other way around, they're thicker here and thinner here. So these are sort of variational modes of the heart that we're seeing across these 1500 people. PEER knows nothing of the three dimensional space. The red and the blue could be adjacent to each other in terms of the mathematics. When we plot them, they're not, they're forming these nice shapes on the heart, and so that was reassuring. So now we can feel like we've got a smaller dimensional space, and so we then put this into what's now the kind of modern way of doing GWAS, with this mixed model, the better model of noise and a kinship matrix, and when Hannah first did this plot I was
like, we are in business, Hannah. So this is a Manhattan plot. If you pick up your copy of Nature Genetics you will see this on many, many pages. The x-axis is the position of the SNP and the y-axis is minus log 10 of the p-value, and sort of statistical lore, mythology and Peter Donnelly have divined that 5 times 10 to the minus 8 is a good genome-wide significance level that captures the multiple testing that's going on. Do not really ask anybody to justify that number, but we're all super comfortable with 5 times 10 to the minus 8. But we, of course, did a hundred factors and tested a hundred different dimensions, so we needed to penalize ourselves by another factor of a hundred, and so it was these points here coming up above 5 times 10 to the minus 10 which made me just very, very happy. My belief is this is the first time that an unsupervised approach on imaging genetics has worked. I don't believe anybody else has done this, and that is all credit to Hannah and Antonio. So let me just pick up one of those SNPs, this one SNP here. Again, for the statistical geneticists in the room, you'll be seeing this very nice QQ plot. You should always plot your QQ plots. The QQ plot puts the expected distribution of p-values on one axis and the observed on the other. You want them to be fitting the X equals Y line at the start, that means the test is well behaved, then you want a nice healthy kick at the end, that means there's interesting stuff, and so this is a good QQ plot. There are other QQ plots which are bad QQ plots, and they go straight up at the start, and we don't show them in general. So this is the modal sort of shape for this SNP here, so it's thicker or thinner here. And I first asked Hannah to do a very simple thing, which was actually to forget about the complicated statistics and remember this is a SNP with only two states, so there are only three values, homozygous, heterozygous, homozygous, and we have one measurement, wall thickness, so we can fit an incredibly
simple linear model: an intercept, a beta value, a slope on the SNP, that's the SNP X, and then an error. And this is just plotting now the r squared of this linear model across all 27,000 dimensions, signed, multiplied by the sign of the beta, so we can get thick and thin if the SNP is going in two directions. And this is with no correction, no age correction, no sex correction, no height correction, no weight correction, nothing, absolutely raw data. So at this point I was very, very happy there wasn't some sort of weird statistical mirage that led us into this. And then, because of the way PEER works, we could break that signal down into the original single factor that led us to the SNP, this shape is basically identical to this shape, all the other 99 factors summed together, and then the residuals. And the really nice thing is there's no signal in the residuals. In other words, PEER has put all of that genetic signal into these hidden factors. But notice that this shape, which is the raw correlation, obviously has a big overlap with the data here, but it doesn't have the same shape at the top. And indeed, when you look at this, it's almost as if there's something else which is counterbalancing this factor: that red bit is kind of removing that blue bit, giving you the sum over here. That says that this dimensionality reduction is not quite perfect, it's not bang on the biology, and Hannah is now doing maths which is beyond my ability to understand to work out how to improve that dimensionality projection, shifting around the dimensionality to get this to be more sensitive in discovering bits of biology. Now, actually, this isn't my favourite locus, I got slightly the wrong locus, I'll move on to the right locus. But we do have nice SNP hits, they lie under genes, they're quite nice. And we had that moment, we were talking with Stuart, one foot of whom is in Singapore and another foot is in London, which makes talking to Stuart interesting, so we were on the phone discussing
this, and I was saying, the real thing we should be doing is, can we persuade someone to do a mouse knockout? Oh, how exhausting this is going to be. Which locus should we do? What about this, what about that? And then we were reading around the areas, and thankfully, oh, I don't have the actual..., someone in Wisconsin had done the experiment we wanted to do in 2011. And so this is a picture from her paper of a Jarid2 knockout in mouse. Jarid2 is a histone modification enzyme, and indeed in a conditional knockout specific to the heart they see a very specific heart phenotype. And the interesting thing about that heart phenotype in this mouse knockout is we could then look at a similar phenotype in humans, and there is this non-compacted layer of the heart. I've learned an awful lot of heart morphology over the last two years. So this here is the right ventricle, this is the left ventricle, which is thicker, and at the surface of the heart you can't expose the blood directly to the muscle wall, there'd be a lot of hydrostatic, a lot of shearing force going through onto the muscle. The way the biology handles that is there's a kind of softer, spongier layer of tissue that holds some blood liquid close to the wall and produces a much smoother surface for the blood to be pressed against. That spongier area is called the non-compacted layer, and it's picked out here by these blue arrows. These are the homozygous minors for this case, and perhaps you can convince yourself that these are thicker than these, which are matched majors, matched for age and sex to these minors. You can do that by eye, but we can also do that with statistics after doing the measurement, and we're working up other phenotype measures for this, but this is already very clear cut that this will work out. So here we've gone from MRI scans through to GWAS. We were very lucky that one of our GWAS hits already had a mouse model with a knockout and a phenotype. We looked at the phenotype that they had studied, brought that phenotype back
to our human data set, and then showed that that phenotype, a different phenotype from what we measured in the GWAS, was also significant. And personally I think this is ticked now, this has a big fat tick that says that this is correct. But this was team science. I couldn't have done this on my own, Stuart couldn't have done this on his own, Hannah couldn't have done this just in my lab, Antonio couldn't have done it just in Stuart's lab. An awful lot of people. So this is a picture of Hannah, I don't have one of Antonio. It required all sorts of webs of people around to do this. It was also embedded inside of the..., believe me, cardiologists do not like healthy people going into MRI scanners. MRI scanners are for diseased people, so you have to have quite a long, extensive conversation to even persuade them that 1500 healthy people going into an MRI scanner is a good idea, and so that is Declan and Stuart. And we actually used one MRI scanner for about a year and a half, that was the amount of scanning time needed to generate this data set. So that's about science at a small scale requiring teams, but at an even bigger scale we need more than teams, we need more infrastructure. I think life science is the last science to go through this. Pretty much all the other sciences, high-energy physics, astronomy, oceanography, climate, all these other things, have realised that one has to build on top of an infrastructure, and in life science we're taking these very slow steps towards this. Now, I use infrastructure all the time. Coming here yesterday I used this infrastructure, I've been using that infrastructure on the internet, I used Amtrak, it's a great piece of infrastructure, to get from New York to DC this morning, and I'm using infrastructures like electricity here today. I did not say to Chris, could you just double check on the Washington DC power generators for today that there's going to be electricity on the NIH campus, because, you know, if Washington DC isn't going to have
electricity my talk's not going to work, I'll stay in New York and have a much nicer time. I made an assumption that there's going to be electricity. Chris did, you all did here, nobody worried about whether there's electricity or not. These are infrastructures, and infrastructures you only really notice when they fail. So this is Heathrow in the snow. It is a poorly designed infrastructure for the snow. If anybody knows Heathrow, a very small dusting of snow creates this ridiculous two week scheduling disaster. I'll contrast that to Helsinki airport, which will keep open all the way through the Finnish winter. All sorts of things can go wrong, but we only really notice them when they go wrong. And in biology we need infrastructures, and this is one infrastructure, the storage of DNA sequence over time. It's not very exciting. I don't think anybody says to themselves, my gosh, I'm going to get a Nobel Prize for storing an awful lot of DNA sequence. That is not the motivation for this team. They are not going to get papers in Nature, they are not going to get plaudits. Whoever does this and delivers this does it knowing that they enable a huge amount of science beyond this. And in Europe we have started to build a data infrastructure for life science data through this system called ELIXIR, and in many ways this is building out from the EBI. We at the EBI realize that we will not scale over the next decade. In fact, that's not a technical problem, it's a social and scientific problem: we will not be able to get all the scientists we need sited at the EBI to deliver the infrastructure for the future. And I know NIH is going through the same thing with Big Data to Knowledge and then in the future, and I think it's just really important that as life scientists we understand the importance of these infrastructures. It's kind of interesting running infrastructures, because you tickle a different part of government, and we had to write a report on the value and
impact of the EBI, and do it with a very hard-nosed piece of economics. Very simple: is this infrastructure value for money? It's a really straightforward question in many ways. Now, sometimes you can assess this very easily for some things, but for many things it's actually quite interesting how you assess infrastructure. For many bits of infrastructure the benefits are very distributed, so the costs you can count but the benefits go out in a distributed way. That includes all sorts of things like transport links and everything else, spillover benefits and stuff like that. And so at the EBI we commissioned this report, which was run independently by this economist, the Beagrie group, and actually we had to persuade them, so these are conservative numbers. They didn't believe the first set of numbers, and we were like, no, no, no, there really are that many life science researchers around the world, and that's the assumption that we just save one hour every week, and they're like, well, but that number is big, and I'm like, well, that's the way it works out, isn't it? So under our conservative strategy we have an impact of 1 to 5 billion pounds per year. That's 20-fold higher than our operating costs. Now, the UK government feels an infrastructure is worth the money when it's 1.5x, all right, just to give a sense of that. 20-fold, this is like a, you know, you've won me at this argument stage, a remarkable way of achieving efficiency around the life science community. Now, I have to admit this is a worldwide number. We have to produce a UK-centric number, because we get quite a lot of our money from the UK, but even when we do that we're good value for money just for the UK, and we're doing this across the planet. Now I want to return, this is my final set of slides, I want to return to that first thing about luck, and some people might have noticed those two things that I put in at the top. I am a rather boring Caucasian. I have been 23andMe'd and I
was hoping for some exciting piece of genomic ancestry, but I'm pretty much bang slap in the middle of France and England in terms of my ancestry. I have a touch, a very small touch, of South Asian ancestry. And obviously I think some of you have genotyped me visually, and you are correct, I am XY. And there is something wrong still, I think. As a society, both here and in the UK, we are still feeling our way forwards about not having this as luck, just the same thing as hair colour, the same thing as randomness, not being an attribute that I should list here under the things that I'm lucky about, because I'm an XY genotype and most of my ancestry is European. And I think it is really unfortunate, the amount of talent we are wasting. We are wasting talent on the XX genotype. We are level pegging on XX genotypes up to postdocs, and then something very radical happens. Just when you look at the bulk statistics, something goes wrong about that. This is UK data about different ethnic groups. It's always interesting, actually, when you are a geneticist you start distrusting people's self-reported ethnicity as a way of describing genomic ancestry, but the ethnicity that people feel they have is obviously something that is just personal to them. And the really interesting thing, a phenomenon that happens in the UK, and I think it might well happen over here as well, is that there are actually quite a lot of people from non-white ethnic groups going into science subjects, but they are not becoming scientists, they are becoming medics in general, sometimes lawyers as well. And I don't think that is a bad thing at all. I just think that there is a lot of talent, not that medicine doesn't need talent, it needs a lot of talent, I am sure, but there is talent that we are missing attracting into science there as well, and there is a particular problem at the leadership level. Now, this I have described for me as luck, but I am now in the position
where I am, and I have got to be part of the solution and help set up the solution for the future. And actually the easiest thing is diagnosing this, the easiest thing is talking about it. The harder thing, the far harder thing, is setting up structures that will actually change the way we support women and support people with different routes into science, different ethnic diversity, to really fulfil the same kind of potential that I have clearly had the opportunity to have. I don't think it is easy, I think it will take time. It requires, I think, a very data driven approach. You have to be careful about the data and think about this, but it is something that I hope lots of people can work together on, in particular the people in higher parts of organisations, to change how we do this. So with that I would like to end, and I would like to thank, like I said, I am just one person in this big tribal process of making science, this is a big part of the tribe that I am part of, EMBL-EBI.
There are 600 people there, and thank you very much for listening.
Thank you, Ewan. And people, if they would come to the microphones, we certainly have time for questions. I will start. So, Ewan, I really appreciated and liked, not the triangles, the diamonds with the extensions, and maybe this almost relates to your last topic, but what do you tell young trainees who at one point may resonate with being one of those diamonds and sort of seeing their place in that progression, but also immediately recognizing that to be successful scientifically they need to be part of that team, to reach out to others, and yet the traditional way by which we assess people's success goes against the team science approach? So what's the advice you give them? You talked to some of our trainees today, I'm sure this might have come up. How do we recognize the team science reality with the incentives to get recognized, be promoted, get grants, get papers?
So I think we've got to get..., I mean, I think it is an interesting problem. I hate to say it, you know, bioinformaticians are just in demand, so what I actually say to my trainees is, don't worry, you've got a job when you finish no matter what, as long as you're not stupid. So for bioinformaticians and computational biologists, who are so much in demand, that problem of shaping your career is much more, I think, one of personal choice. And I spend a lot of time encouraging students. I think in a PhD in particular, that is the time where you can play around with where your centre is and explore the different places, and I think that's very, very good for people to do. But I do think we have a problem in assessment. I notice it, there's a sort of very interesting dynamic very often on joint papers: there's an agreement that the experimentalist goes first and the computational author goes second as joint first
author. I'm totally comfortable with that, because the computational biologist is definitely going to get a job, I'm really sure of it. The experimentalist is in this slightly more aggressive, you know, there's not enough experimental slots really. And I don't think we assess things right. The thing that I think has been a good shift is this idea of doing research on research, meta-research, and actually thinking about these problems in a data driven way. I think we do far too much gut feel of, oh well, this is how I think now and when I'm doing this. We need actual experiments to drive these things. It's just the same thing that happened in sport. People did a lot of this gut feel when managing. Professional sports are not managed by gut feel anymore, they're managed with a very strong stats kind of basis around teams. So it's not like we can't do this, we've just got to put it in the middle.
So let me just press you on the one issue though. It's one thing to get a job, which you bioinformaticians always can, but in particular you showed some data to talk about gender fall-off with respect to leadership. Getting a job is one thing, but progressing, being promoted to leadership roles, that's compromised if people aren't being recognized. But how do you reconcile being recognized while being in a team environment?
Yeah, so I think that is true, but again, I'm not sure that the drop-off in XX genotypes, in women, is all about the fact that women like to be team players and men kind of get out there. But what I am very keen to do is try to create an environment, both in my own research team and more broadly in EBI, to let people have space and encourage them to step into that space, so that they can get comfortable kind of acting up before they have to act up in front of lots of other people. And I guess I think that's normal mentorship at some level, but really trying to
concentrate on doing that for everybody in the lab and everybody in the institute. But I would love more data on this, I just don't think we have enough data on it.
Hi, thank you very much for a very interesting talk. My question relates to the aspect that you touched on about needing infrastructure for biomedical science as it compares to other fields, for example high energy physics or oceanography. So the question that I have is, compared to those fields, where they have the concept of levels of data, from the time, let's say, for example, satellite data that they receive goes through multiple levels of processing before that data is converted to, or is as good as, knowledge, which scales very small, do you see that kind of approach working also for biomedical science, and if so could you give some insight into it?
I think that's what I mean. So we spend a lot of time chatting to our colleagues in CERN. We're kind of the poor sister of CERN, in European treaty terms. CERN is a big daddy science European organisation, and we are set up in a similar way but with the molecular biology focus. But we're quite close with the guys who do stuff in CERN, and in fact there are more parallels than you think between the way we currently process data, in particular things like cancer genomics data, and the way CERN does it, and a very similar mindset when you get down to the guts of it. I think there is something slightly different about biology, that it's more heterogeneous than you think it is, and when you start talking to physicists it's a bit more interesting. So I think we've got lots more to learn than to make big plays of the difference. One difference that really is there, though, is that in high energy physics you have a very small number of data generation sites, but in life science you have thousands and thousands of data generation sites, and that is very clearly very different, and we have to have a slightly different infrastructure in
bringing data sets together. But I wouldn't be too negative about that, we actually as a community have a good track record of this. The thing that I want to make sure we internalise is that we have to make the justification for these infrastructures, and we have to understand how those infrastructures are run and delivered for us, otherwise our science... and for me it's not an option, these infrastructures must exist.
One last question over here.
Hi Ewan, thanks for giving what was a very personal talk with a human element to it, because that's not what we're encouraged to do a lot of the time as scientists, so it's very refreshing. But piling on your...
That worries me.
Well, you shouldn't be. But you started with the theme that science is human, and what flows from that in a way is that scientists are humans, and Medawar and even John Cleese have sort of developed the idea that actually being basically human has a lot of scientific characteristic to it, and I think this connects to your other two themes about team science and about openness. And I don't know, this is ending up phrasing the question rather provocatively, but it's the split in a way between the sciences and humanities. The education I had to go through, which is probably not that dissimilar from yours, focused on biology, physics and chemistry by the age of 16. What about the humanities? What should we be teaching scientists to be humans, to put it provocatively?
So I think we should accept that scientists are humans, we are humans. I wonder if we shouldn't do a quick GWAS, taking all the scientists together, comparing them to controls, and see if there are any traits that come through. I think that GWAS could be quite interesting. The one thing I would say, this makes me..., I'm an old man, I have grey hair, and I have come to, sorry, but I have come to really like encouraging my students, Leland is here, and he had this recommendation a couple of months ago, to do a history of science part in the introduction to their thesis, and the reason why
I think a history of science approach at the start of the thesis is good is it makes you see science where you kind of know the answer, but you have to really start thinking about why did these people do these experiments, why did they think this way, why was this a big issue in whenever it was that it came about. And actually, for me, I think the history of science is under-taught by scientists. We should embrace our own history, and I think, because, as you say, science is very human, when you take a historical approach to science you understand actually a little better the processes around you now, because they were not so dissimilar 50 years ago, 80 years ago, what have you. I won't go into this, but I recommend reading my blog about X-rays in medicine if you want to see a new technology coming and changing medicine, maybe a bit like genomics. That has happened before, and that was X-rays coming into medicine, and I think we can learn a lot from the history of medicine about how X-rays were internalized into medicine, and that was from the 1890s, for example. And I would recommend it as well.
Well, we're going to stop there. First of all, what a terrific way to end, because what you just said in your last one minute is exactly the reason why we're having this series here. I completely think it is very important to think about it. It's only a history from 25 years ago in our field, but I think, exactly in the same way, thinking about other disciplines that have been around much longer, the history is completely lost on people because we don't appreciate it, we actually don't even catalog it very well. I hope everybody will join me in thanking Ewan.