Next up on the agenda is a presentation on comparative genomics from Elinor Karlsson, and I've asked Jen Troyer, Program Director at NHGRI, Division of Genome Sciences, to do her introduction. Thank you, Jennifer.
Great. Can you hear me, Rudy?
Yes, I can.
Hello. It's my great pleasure to introduce Dr. Elinor Karlsson. She is an associate professor in bioinformatics and integrative biology at the University of Massachusetts Medical School as well as being the director of vertebrate genomics at the Broad Institute of MIT and Harvard. Elinor started out at Rice University and received dual bachelor's degrees, one in biochemistry and cell biology and the other in fine arts, which will become relevant. She then went on to get her PhD in bioinformatics from Boston University. And then, after doing a postdoctoral fellowship at Harvard University, she began her research group at the University of Massachusetts Medical School. I first got to know Elinor as the PI on an NHGRI-funded R01 on the 200 Mammals genome project, which I believe she is going to tell you something about, and which is part of our comparative genomics portfolio. As such, we nominated her to be on the organizing committee of a trans-agency workshop on comparative genomics last year that she will also be talking about. And since then Elinor and a group of other investigators funded by NIH, NSF and USDA have been meeting on a regular basis to talk about and put forward an idea for synergistically moving the field of comparative genomics forward. So I really look forward to hearing what she has to say on this. And the last thing I want to mention, which I'm not sure will appear in her slides, unlike everything else I've mentioned, is that she's also gone back to her fine arts roots and spent the last week working on creating stained glass. So we may not see some now, but I hope to see it someday. Without further ado, I'm happy to turn it over to you, Elinor.
Thank you very much.
I'm just going to share my screen. Yeah, the stained glass is not featuring in my talk, unfortunately, except for the fact that I have a Band-Aid on my finger because it turns out glass is sharp. Okay, let's see here. So can everybody see my slides okay?
Yes, we can.
Thank you very much. Okay, so as Jen introduced so nicely, I'm going to talk about this meeting we had on Perspectives in Comparative Genomics and Evolution. It turned out that part of this meeting was figuring out how we were actually going to define comparative genomics, and as often happens, every time we tried to come up with a definition it just got bigger, because people would say, well, what about this? So we ended up with it being the study of the entire tree of life from microbes to humans, things both still around and those that are extinct, their interactions, and the evolutionary processes that created the diversity. But a simpler way of thinking about it, on a DNA sequence level, is that you sequence a whole bunch of things and you're looking for things that either haven't changed, suggesting that they're functionally important, or things that are changing in interesting ways, such as being changed in humans but not in other mammals, and things like that. This is familiar to anybody who's ever worked in genomics: we get into a dimensional problem with comparative genomics because you end up having this by this by this. So we've got the genotypes, we've got the phenotypes, we've got individuals, we've got species. As I was putting these slides together I thought about tackling the whole single-cell thing and decided that was just too much, so I would leave it off of here. But there's today an estimated somewhere between 10 and 14 million species probably on the planet, and so it is just a really big scope of a problem.
So the Perspectives in Comparative Genomics meeting, which was part of NIH's strategic plan, was actually a fantastic meeting because it was jointly sponsored by NIH, NSF and USDA. It brought together people working on humans, non-traditional model systems, agricultural species, veterinary models, traditional models like humans and mice, and even natural models, or just things that are still out there in the environment that we're in. And the challenge was to figure out what the key challenges, but also the opportunities, are in comparative genomics and propose ways to move the field forward, and also to accelerate the scientific breakthroughs that are actually focused on improving human health and how comparative genomics can be used for that. So just a quick side note on my perspective on this: there are two main threads in my research. The first is I have this huge project on dogs, which is usually what people think about when they hear about my research, because everybody likes dogs. We're using dogs as a natural model system, basically, for human diseases and human behavior. And I won't say a lot more about that except to say that we have an open data genomics project on dogs called Darwin's Ark, and if you want to sequence your dog we will do it for you, I'm just saying. You know, go check out our website. But the project that Jen mentioned, the 200 Mammals project, is the one that's probably most focused on comparative genomics, and the challenge of this project was to resolve conservation so we'd be able to identify individual positions in the human genome.
And so we made new genomes for 130 species that hadn't been sequenced before and aligned them together with anything else we could find that was publicly available, so we had an alignment of 240 eutherian mammals. That was a process that really highlighted one of the challenges of genomics: that alignment was a beast. It took about a year, something like 2.1 million core hours on the Amazon cloud, to actually make that alignment. But now that it's there it's an incredible reference, because it's not referenced on the human genome, meaning that you can look at it from the perspective of any species in the alignment. And the idea was to basically find positions in the genome that are highly conserved, and also accelerated, things that are changing faster in humans than you would expect. I'm also part of a number of different genome sequencing consortia that have come up over the last five or 10 years with the goal of providing primarily reference genomes, so single reference genomes for more species. The umbrella organization is known as Genome 10K. There's our 200 Mammals project, which we're trying to rename to Zoonomia so that we're not quite so stuck with the 200 genomes problem, since we're now at 240. There's the Earth BioGenome Project, the Vertebrate Genomes Project, Bird 10K, Bat1K, DNA Zoo. And this is a really highly collaborative community where people are building these genomes and putting them out for people to work with. And this was actually one of these things that ended up highlighting one of the powers of comparative genomics. Back in March, when we were all initially stuck at home working virtually, Harris Lewin from UC Davis approached me and mentioned that he had this idea about whether we could actually use comparative genomics to identify species that were likely to be at risk of being infected by SARS-CoV-2, the virus that causes COVID-19.
And this was actually spurred by a lot of questions from the community of people that take care of these animals and were really concerned about whether their animals were at risk of getting infected by humans, especially for the ones that are endangered or even close to extinction at this point in time. And so, sitting at home, a bunch of us got together from all over the world and basically mined all of the public databases for genomes that were out there for other species, made an alignment for the ACE2 receptor gene of, I think, over 400 species, and came up with a risk prediction for them. Obviously, a whole lot more follow-up work needs to be done, but in a crisis where people were looking for answers on a really short time scale it was really nice to have these genomic resources available to use, and to be able to address a question that hadn't actually ever been in anybody's mind when they made these genomes to begin with. And it's just kind of a shout-out to everything that NHGRI has done so far as far as making these resources available and supporting them. So comparative genomics, what is it useful for? Well, I've got my Swiss Army knife on here because it's basically useful for many different things. There are three different categories that we've broken it down into: applications in human medicine, discoveries of genomic innovations, and agriculture and environment. So in human medicine, the 200 Mammals project is a great example of this. One of the big challenges in human medicine is identifying positions in the human genome that are likely to have functional consequences if they're changed.
You know, the human genome is three billion bases long. A tiny percent of it actually codes for genes, but a lot more than that is actually conserved, and if something's conserved that suggests that it's actually doing something. The 200 Mammals project was a follow-up on the 29 Mammals project, which had 29 species in it. With 29 species, you have a branch length of 4.9, which means on average you expect 4.9 changes at any position just by random chance across those 29 mammals when you line them all up. By scaling up to 200 mammals, we actually get the branch length up to 16.6 changes expected by random chance, which radically drops the number of positions in the genome that you would expect, based on a very simple kind of Poisson calculation, to be completely conserved across those species just by random chance. It goes from 22 million, or 23 million, down to 191. The other consequence of this is that rather than identifying regions of conservation, we identify single bases of conservation. I will point out, since I work at the Broad and people are talking all the time about sequencing humans, that I ran the same calculation for the human gnomAD project, and if you sequence enough humans you might get to similar power, but there's not nearly as much ability to resolve it based on human data. So you are getting an awful lot of information out of just 240 genomes with the 200 Mammals project, just by expanding that phylogenetic diversity.
And this is what it actually looks like when you look at the data. On the top here I plotted the phyloP scores for the promoter of LDLR in the genome, so these are single-base scores for conservation across that region.
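As a rough check on the Poisson calculation mentioned above, the numbers can be reproduced with a short sketch. It assumes that substitutions at each position follow a Poisson distribution whose mean is the total branch length, so the chance of a position staying unchanged everywhere is exp(-branch length). The function name and the genome size of 3.1 billion bases are assumptions made for illustration; the talk itself only says "three billion".

```python
import math


def expected_unchanged_by_chance(genome_size: float, branch_length: float) -> float:
    """Expected number of positions with zero substitutions under a Poisson model.

    branch_length is the expected number of substitutions per position
    summed over the whole tree (4.9 and 16.6 in the talk).
    """
    # For a Poisson variable with mean m, P(0 events) = exp(-m).
    p_unchanged = math.exp(-branch_length)
    return genome_size * p_unchanged


# Assumption: approximate human genome length in bases (not stated exactly in the talk).
GENOME_SIZE = 3.1e9

for label, branch_length in [("29 mammals", 4.9), ("240 mammals", 16.6)]:
    n = expected_unchanged_by_chance(GENOME_SIZE, branch_length)
    print(f"{label}: about {n:,.0f} positions unchanged by chance")
```

Under these assumptions the first figure comes out at roughly 23 million and the second at about 191, in line with the numbers quoted in the talk. Using exactly 3.0 billion bases instead gives roughly 22 million and about 185, so the result is sensitive to the genome size chosen.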
And if you actually compare that to data looking at the functional consequences of altering those positions, you can see this very high correlation between things that are highly conserved and things that, when you change them, actually have functional consequences, in this case altering the expression of the gene. So that's kind of the clearest application to human medicine, but there are obviously an awful lot of other applications, from trying to figure out what the best model organism is for a particular trait to studying, for things like SARS-CoV-2, the source or reservoir for zoonotic diseases. One of the things we saw in our comparative genomics of the ACE2 receptor is that you see selection in the bat lineage that you don't see in the other lineages, and there's some evidence that bats have evolved ways to tolerate viral infections that humans can't tolerate, and it will be interesting to know how they're doing that. But the other aspect of comparative genomics that I find really interesting is this: when we study human disease we often go about it by trying to figure out what has changed in the people that are getting sick, what is putting them at an increased risk. In actuality we're not in the business of trying to make people sick; what we're actually trying to do is make people well. And so the idea of looking in other species to identify species that do things better than humans do, as a way to identify new therapeutic opportunities, is really intriguing. One of the nicest examples of this is hibernation. This is something that's being worked on by a startup called Fauna Bio, who were at the meeting that we had in the fall. And hibernation is amazing. Basically you have all these species who, first of all, become obese and insulin resistant prior to hibernating because they eat constantly, and then they hibernate for six to nine months, during which they're almost completely sedentary. They don't have any food intake.
Their synapses lose all sorts of connectivity, and they experience these repeated oxygen deprivation and then oxygen reperfusion events. And then they move out of hibernation and they're totally fine again. They don't seem to have any inflammation or consequences in terms of organ damage that you would see if humans went through a similar process. And we don't know how species are actually managing to achieve this, or what changes they have in their genome that allow them to tolerate these experiences that would be very damaging to non-hibernating species. But if we could identify how they're doing it, there might be ways to translate that into new therapeutics in humans. So this was quite a fun slide to put together, but it really is that question: when you're studying humans you learn an awful lot about what the human genome is doing but not a lot about other species, and there is this question out there about what other species are doing that humans aren't doing, and what we could learn from that. And then the last part of it is the agriculture and environment side of things. There are endless examples of this; it was quite hard to figure out how to put a slide together about it. But basically there are an awful lot of projects going on involving comparative genomics in other species that don't really yet intersect with projects that are focused on human health, but have the potential to teach us an awful lot. So in the agricultural species, for genetic selection for food production, they're creating these huge data sets of chicken data and pig data and cattle data in order to do genetic selection. There's an awful lot of value in that information that's not necessarily being leveraged right now, especially in the context of human genomics.
There are things in wheat where you're trying to protect human food supplies by looking in wild populations for adaptations to withstand diseases and things like that, which you don't see in the domesticated species, and figuring out whether you can actually engineer them in. There are a lot of projects using genomics to try and retain diversity in critically endangered populations. We have this problem right now with species going extinct. And the thing with species preservation is that once you lose genetic diversity, you can't ever get it back. So there's this very urgent need to figure out how we can make sure that we retain as much genetic diversity as we can until we figure out how to better protect these populations. And unless you know what diversity is out there, it's very hard to preserve it. And finally there's the cutting edge of species preservation, where they're trying to restore species from frozen cell cultures. The most notable example of this is probably the northern white rhino, where there are only two surviving females but there are 12 fibroblast cell lines banked at the San Diego Frozen Zoo, and they're trying to actually use those to resurrect that species. So that's kind of the exciting side of what comparative genomics can do. But despite all of these possibilities, right now it actually remains a relatively small field, at least as far as genomes that have been deposited in GenBank. These are the counts of species, sorry, genomes that we have in GenBank; for the most part it's just one genome per species. And it's right about 54 out of 376 species, and while some of these numbers are kind of large, they're pretty much completely dwarfed by the amount of human sequencing we're doing right now. And so I think there's a lot of value there that hasn't necessarily been leveraged yet.
In this meeting we talked about all of this endlessly and had lots and lots of discussions, and basically I think we came up with three major categories of challenges. The first one that came up was technical. Comparative genomics is hugely cross-disciplinary, of course. And right now, while there are a lot of projects going on all over the place, there's not a lot of integration of those projects, and so there's data out there that might be useful and there isn't necessarily any way to access it. So there was a lot of talk about the lack of universal standards for how you store this data, the problem of having it be all over the place, and then the computational challenges of actually analyzing all of that data. In a way this challenge is more straightforward, in the sense that a lot of aspects of it are already being tackled, I think, in human genomics, through projects, a lot of them supported by NHGRI, where you're trying to figure out how to get everything up in scale and really start tracking and storing this data in a way that's really useful to scientists. So there's this idea of building a library for life, integrating genomic and phenotype resources. Phenotypes, once again, are hugely challenging, and mostly they're not tracked, so even if you can get genomic data for an individual from a species, you're not going to have any phenotypes attached to that. If you can integrate those resources, provide some universal standards for them, and expand the computational infrastructure for working with them, that would be a huge push forward. So, you know, just developing software for comparing multi-dimensional data.
I've learned through all of my dog research all the challenges of taking software developed for the human genome and actually applying it to other species, starting in my very first project in dog genetics, when we figured out that the software called PLINK just assumed that everything had 23 chromosomes. Actually, we've got a species that has 39, so we needed to change that. And so there are all these little fiddly things, but a lot of the software that's out there could be expanded to work with other species in addition to humans, and add in this multi-dimensional component. And then also just supporting all of the browsers and data sharing requirements for comparative genomics. There are a lot of different standards and different expectations out there in different fields, and being able to take a much more rigorous approach to data sharing would make a huge difference, I think. The second type of challenges we confronted were these interpersonal challenges. Comparative genomics, if it's actually going to study all the species, needs to be global; it needs to be worldwide. And right now it's not really. The biggest thing that came up over and over again from a purely practical side, and I'm sure this comes up in many, many conversations, is just the shortage of skilled computational scientists. The lack of people that have the scientific training and the skill in software development or algorithms or computer science to actually analyze these data sets that are out there tends to be a restriction on a lot of projects moving forward right now. And then there are the data management and ownership and sharing challenges that I mentioned already. And then the fact that even though there are millions and millions of species on this planet, we tend to have focused on things that for us are easy to access.
I can't remember exactly what he said, but Harris Lewin had some kind of shorthand for it, which was basically, you know, ourselves, things that we live with, things that we eat and things that kill us, or something like that. So we tend to focus on things that will live in zoos, or agricultural species, and that misses most of the species out there. I was quite surprised to learn, when talking to Oliver Ryder from the San Diego Zoo, that there's a whole bunch of species that just don't tolerate captivity. You can't keep them in captivity. There's a bunch of bat species out there that are reservoirs for diseases that we can't study as model organisms in laboratories because they just can't thrive, and so figuring out how to actually access harder-to-access species is really important. And so I think the ways forward that were identified were, you know, just training opportunities in the computational sciences, which came up over and over again, but also trans-agency funding opportunities that would encourage groups from completely different communities to actually work together. And that was a real benefit of having the USDA and NSF perspective out there: it highlighted that there's a lot of research going on, and there are a lot of overlaps that could happen that aren't happening yet. And then finally, supporting projects that have a multifaceted design. You know, if you're applying to NIH you're looking at human health interests, and if you're applying to NSF there's a different set of interests. So explicitly supporting projects that are designed both to advance human health and to support biodiversity conservation would allow a lot of new exciting projects to happen, I think. And then the last is just the scientific. There's a huge amount of excitement around this, obviously.
How do we actually discover the rules of biology that go across the tree of life? There are a lot of examples out there of how major discoveries in basic biology and medicine come from other species: things from CRISPR to transposons to the evolution of regulatory elements, and many other things. But right now much of genomics is focused on humans, and research and data from other species are difficult to integrate with those projects. And I think a few things that came up were, first of all, just normalizing the use of any organism as a model organism. People often run into difficulties with proposing model organisms outside of the traditional ones. And developing new experimental methods for working with non-traditional models. You know, there are a lot of species for which nobody's ever grown a cell line, and so doing cell line work right now is not possible, but it could be, and those methods could get developed. And then building true cross-disciplinary collaborations, things that really combine the interests of multiple different fields, I think is going to be really important. But I think at the end of the day it's an incredibly exciting field to be involved in right now. Like many things in genomics, just the ability to generate vast amounts of data is opening all sorts of new opportunities, and it's just a matter of figuring out how to really leverage them. And it's an opportunity to discover what makes us different and what we share with other species, including my cat, who has decided that this is the moment to use the cat wheel behind me; to accelerate the discovery of new therapeutics; and then to address the environmental impact of humans on this planet and preserve biodiversity.
And I think sometimes we like to think of humans and then the rest of the planet, and it's really important to realize that we do actually live on this planet and that protecting our environment is going to be part of protecting human health. So thank you to everybody that was involved in the meeting and everybody that is involved in the organizing committee with me, and I'll take questions.
Okay, Elinor, thank you very much. So I see Steve Rich, his hand is up.
Yeah, thanks for the presentation, that was very informative. You know, being old as I am, I remember in graduate school talking to one of my agronomy genetics professors about seed banks in the area of maize and wheat and things like that, and even before that, reading all the early investigators who would go to different parts of the world to collect Drosophila and do studies, which, you know, is a good thing to do if you can get to Hawaii or places like that.
Yeah.
You know, one of the things that it brings to mind is, as you mentioned, gathering the information across various species of what's happened in the past is a fairly large task. And obviously there are organizations that have been doing some of this, either in terms of resources like seed banks and so forth, or just generating data. How do you see that going forward, realistically, in a systematic way, so that you don't have to keep doing it over and over again?
So it's one of those projects, those tasks, that alternately excites me and terrifies me as to how you would actually do this, because the complexity is so large. But we had some discussions at this meeting about how existing resources, things like the BioSample repository, could actually be leveraged or expanded to integrate other species. There's sometimes a perception in genomics that the really hard part of genomics is the sequencing and the data analysis and the computational stuff.
And in comparative genomics, and probably in everything else, the really hard part is getting the samples and the phenotypes and tracking them and organizing them and knowing what is what. Figuring out whether there are ways to take advantage of what's been learned in human medicine to actually do that better in other species, I think, could be a really interesting discussion. You know, I thought about it and I was like, well, if we could set up a way to register samples that actually kept track of information for the researchers themselves in a useful way, that would encourage people to use it. And I think there's a lot of potential there, because if we actually knew what was out there we could design projects that took advantage of it. And right now it's hard to even know. I mean, the one thing I didn't mention with the 200 Mammals project is we did the genomes of 130 species. We didn't do 20 species because we could never get a sample for them, and our DNA requirements were minimal. We were doing short-read sequencing, we didn't need anything super fancy, and we still couldn't get a sample. So I don't have an answer for your question, but I think it is one of the biggest challenges, and I think we need to figure out a way to move it forward.
Thank you. Sharon.
Thanks. That was a great talk. I just wanted to highlight one of the challenges and to push you a little further on ideas about it. I was on an advisory panel at an institute that has a large plant genomics program, and what really struck me is, I know absolutely nothing about plant genomics. And as well, I really didn't even know who the leading names are; the people they were talking about, obviously well respected, I of course had never heard of.
And so, you know, given what your committee came up with, the need to have these cross-fertilization projects, I would encourage the idea of grants where you have to work with someone you haven't worked with before. I found in my own experience our Texas Medical Center does that sometimes to get institutions to work together. You know, when there is a specific program and it says that to do this you really need to take people from, I don't know, three different organisms or three different levels of comparative genomics. But I just think it's a real challenge. And I was just wondering, to push a little more, was there more discussion about how to overcome that?
Yeah, I was laughing when you made that comment because I too constantly have to be reminded that plants exist. They are a completely different ball game when it comes to genomics, and there are all of these challenges of working with them that we don't confront, especially if you're working on mammals, which honestly are pretty straightforward and easy compared to most species out there because they're so similar to humans. And I agree with you that figuring out a way to get those collaborations to happen is really critical, because it's even just as basic as a language problem: we use different words for things, and so we don't necessarily even understand the scope of the problem. But then also figuring out what the opportunities are as well, and I think that's where those cross-disciplinary collaborations can really help identify what the big questions are that could actually be answered using another species.
How you're on mute.
Thank you, Elinor. You really make a compelling argument that more is better, but I'm wondering, is there a way of managing the risk of diminishing returns? Is there a way to be strategic in the next organism or set of organisms that are sequenced? And might that begin with phenotypes of interest? Instead of sequencing everything that does and does not hibernate, rather try to learn what's in common among species that are immune from disuse muscle atrophy, something that might have immediate medical significance. So who's coordinating those kinds of efforts?
I have an answer for who's coordinating those kinds of efforts, and I think your point is a really valid one. To make clear, for the 200 Mammals project we had a very clear criterion for selecting species, which was that to resolve conservation to a single base we needed to maximize our branch length as much as possible. So we were going for diversity across the mammalian tree, and I think you're right, having that criterion for how we were choosing our species was really important in terms of directing the project and making sure that the data set that we got out of it was as powerful as possible for the question that we were asking. The challenge, as with everything in science, is that sometimes if you narrow down your question too early you miss opportunities to learn new things, just because you didn't think to look there, because you didn't know that when you were starting out. And so I think there has to be some combination of focusing on particular problems, but then also exploring the parts of the problem that we haven't even looked at yet. There's a super weird species that hibernates in Madagascar called the tenrec. And it seems to have nothing to do with external temperature.
Nobody's quite sure exactly why they go into torpor and when they come out of torpor, but it's hibernation, and it seems to overlap in some ways with what goes on in the thirteen-lined ground squirrels and in some ways is completely different. So it's figuring out both how we focus the problem on things that we already understand but at the same time keep it expanded enough, because we really have so little understanding of what hibernation even is right now, biologically.
Thank you.
Sarah, if you said this I missed it and I apologize, but in a comparative genomic study like this, what kind of sequence product do you need? Low-pass sequence, 10x, is that all short reads, does everything have to be assembled?
It's a good question. One of the things that kept on coming up at the meeting, that I didn't really get into at all here, was this whole question of what a reference genome is and what genome technology you should be using for various things, and all that kind of thing. Personally, I'm a really big believer in figuring out what kind of data you need to answer the question that you're asking, but then doing it in a way that it can actually be used in the future as well, as technology gets better, because if there's one thing we know about genomic technology, it's that it will get better and faster and less expensive as time goes on. So for the 200 Mammals project we were really focused on this question of single-base conservation. And based on the fact that it started several years ago, we were using short-read Illumina sequencing, because it didn't require really fancy samples, which we couldn't get for most of our species, and it provided the data we needed. At the same time, there are restrictions as to what we can actually look at in terms of larger variation in the genome.
And so we're now working with others to use things like Hi-C to actually upgrade the genomes, so you can use all of that Illumina data that we produced but then add on the Hi-C chromosome structure information and actually scaffold those short-read genomes up to much higher scaffolds. And really that would be, you know, targeting the data towards the problem that you're interested in, to an extent, but then also getting genomic data out of it that's useful for other things as well. You know, it's that kind of line that you're trying to walk.
Okay, thank you. Other questions for Elinor? Well, thank you very much for your presentation. In addition to being simply interesting, it'll also be useful and informative as we plan our comparative genomics research studies for the future. Thank you very much. Bye bye.