 Yeah, so I got asked about 20 minutes ago to introduce Beth, which is a rather daunting task. But Beth is co-director of the Paleogenomics Lab at the University of California, Santa Cruz, and also a newly minted HHMI investigator, which is fantastic. I think I first met Beth a few years ago when I was relatively new into the whole faculty thing. And she gave me great comfort, because I think she had more strange, weird projects going on in her group than I was contemplating in mine. So I was like, OK, good. Somebody does this. Also, she'll apparently get DNA out of anything, which is really cool. But she also has a really nice way of taking all of the DNA she's taking out of random things all over the place and turning them into some really interesting biological and evolutionary stories. So without further ado, Beth. All right. Well, I am excited to be here and honored to have been asked to give a plenary talk this morning. I kind of wish it had been yesterday morning, because then I could have stayed out later last night. You know, it is what it is, right? So thank you, Eleanor, for that introduction. And I'm under strict instructions to make sure that I encourage people with my talk to think big, come up with nice, creative, new, dare I say, because I'm funded by NSF, transformative ideas about how to move this field forward. It was hard to think to really put together what I was going to talk about this morning, especially following yesterday. What I'm going to try to do is pull some ideas together from yesterday. Maybe I'll repeat a few things, but try to put it in the context of the biology and think about what types of questions that we're asking and really where this field as a whole is going. Let me see if I can get this to work now. All right. So I mean, you have to start at the beginning, right, when we think about comparative genomics and ask really, what is it? 
What is the main idea or was the driving idea behind this field, which really is only a couple of decades old? And the idea really from the beginning was that we would compare genomic features of different organisms. And from the very beginning, we started here. This is figure 40-something from the human genome paper that shows the conserved regions between mouse and humans. And we've come a long way since then. We have a lot more data. Several years later, we had a 28-way alignment. And we found that the ultra-conserved elements, which were discovered by Gill Bejerano and David Haussler's group some years before, were conserved among the amniotes. And as these data sets increased, we realized that we could use comparative genomics to ask specific questions about the way genomes and organisms evolve. So what are the conserved elements in a genome? Why are particular genes conserved? And what does this mean about what we can discover about what's necessary to support life? And then also, obviously, as we talked about yesterday, how can we link genomes to phenomes to learn about the translation between what the DNA sequence says and what the organism actually looks like, how it acts and how it behaves? And over the years, the number of genomes has increased. And we've gotten larger data sets, more sophisticated tools for assembling and aligning genomes. And we've been able to ask questions about the biology of particular organisms. And this continues to advance. And here we are at the forefront of comparative genomics, as the top of that page always says. And some things are abundantly clear from listening to the talks yesterday. Things are still very hard today. We haven't actually solved a lot of the problems that we set out to solve in the beginning. But it's not going to be like that for long. All of the things that seemed impossible 10 years ago were suddenly possible five years ago. 
Things that seem impossible today, when we talk about really fast genome assemblies or large alignments, are not going to be impossible forever. And so what we're doing here is trying to think of what comes next. We know what's hard now, but what is that next phase? Where are we going as a group, as a field? We know that now in 2019, we are dramatically increasing the number of taxa with some of these giant projects that are aiming to sequence everything that lives everywhere. We are getting better at making genomes. This is a Benedict Paten slide that he always shows where he says that he can get a $10,000 genome in seven days using some of the long read technologies. Benedict's group is really working on Oxford Nanopore-based technologies. And we can also do very large alignments. This is the latest version of the Broad's 200 Mammals Project, it's actually 224 alignments using Cactus. But this also brings up a bit of a problem, something that we haven't really addressed from the infrastructure perspective. And I asked Joel yesterday, and Joel and Benedict have been a little bit coy about how hard this was to actually do. But yesterday, they finally admitted to me that it took 1.9 million CPU hours and about $50,000 to align these genomes. Now they both say that they could do it better the second time around, probably get it down to about $35,000 and maybe take a little bit less time to do it. But this is for 242. So when we start thinking about trying to put together everything that's everywhere and aligning them, we really have a big data storage, accessibility, and compute time challenge ahead of us that we didn't really address yesterday. But really, where I would like to focus the discussion this morning, and hopefully the discussion later on this afternoon, is: as all of this stuff continues to improve and we continue to get more and better and faster and more accurate and more comparable and compatible genomes, how will our research questions change? 
How will the questions that we can address and the ways that we go about addressing these questions change given that we're going to have all of these new data? And thinking from the perspective of a biologist, how will that then help us to decide what types of new data we actually want? Now I'm basically an ecologist, an evolutionary ecologist, or a paleoecologist, or whatever. Lots of different things I can be called, sometimes not nice things, by Eleanor sometimes. So I tend to think of things at different levels, really at the level of species. So how can we increase the number of species that we've sequenced so that we have a better sampling of the stuff that's out there? The level of populations, how will our questions change as we start to get more than one individual from each population? And then also communities. Many of the questions I'm interested in really aren't about a particular species, but about how species fit into ecosystems, and how ecosystems change when species either come into that ecosystem or disappear from that ecosystem. And in order to ask these questions, we really have to have a much broader sampling of diversity and understand the interactions and links between the things that live there. And of course, as we increase all of these, we're going to need new tools, lots of different software approaches to asking and answering these questions, and then obviously some big compute time. So at the first and simplest level, it's clear that we want more species. And we don't just want more of the same species. We want large and repetitive genomes, things that are hard to sequence, hard to assemble, hard to align. And we also want extinct genomes, I have to say this, because I work in ancient DNA. The beginning of ancient DNA was in 1984, when some folks at Allan Wilson's lab at Berkeley managed to recover a tiny little fragment of mitochondrial DNA from a museum-preserved specimen of this guy. 
This is a quagga, and they discovered, it'll be shocking to you, that it's a zebra, right? Well, and while that discovery wasn't particularly revolutionary, the idea that one could actually recover DNA from something that wasn't alive really was revolutionary and set off a whole bunch of crazy. Initially, there was DNA from dinosaurs and insects preserved in amber and all this kind of nonsense that turned out not to be true. But over the years, we have learned a lot. We know that the DNA that's preserved in ancient things is in rubbish condition. I like to compare it to confetti, whereas modern DNA is a really nice long roll of confetti paper here before the day of the parade. This is just a plot showing the fragment length distribution of five different Last Glacial Maximum, that's about 20,000 years ago, Arctic horse bones. We got a little bit of bone powder from a horse, crumbled it up and extracted DNA. And you can see that the average fragment length varies quite a lot, but it's pretty short. It's pretty rubbish. You can't really imagine putting together a de novo assembled genome from any of these things. And yet we can get tons of information out of these because over the years we've learned a lot. We've learned that certain conditions matter: cold conditions help, you keep stuff out of the sunlight because UV decays DNA, hydrolysis by water decays DNA, and fungi and microbes that are in the sediment will catabolize the DNA and chew it up. And we've learned to look in places where DNA is going to be better preserved. And we also have developed all of these sophisticated approaches in the lab that we use not to contaminate our samples. These are great, actually. It really stops anyone else from coming into your lab when they look through the window and they see all of the students in the lab wearing these safety kits. They're like, wow, I'm not gonna go in there. I might die. If they did go in there, my students would kill them. 
So it's probably true because they would contaminate the lab with their own DNA and therefore make our research impossible. But over the years, access to ancient DNA has provided lots of interesting things. The oldest genome that's ever been recovered was about a 1x genome from a horse that was discovered in the permafrost in Canada's Yukon. It's about 700,000 years old based on proximity to a volcanic ash layer that's been dated to that age. We have very old viral genomes. A few years ago, we recovered 700-year-old viral DNA, an entire genome. We were able to put it back into a plant and reconstitute the virus, which is kind of scary, but also pretty cool. And of course, all of the really interesting things that we've learned about human evolution by sequencing DNA from things like Neanderthals and this tiny little pinky bone that turned out to be from a different type of hominin. So one of the questions that my group has asked from a species perspective with ancient DNA is what is interesting about this guy? This is a passenger pigeon, which is an extinct bird that went extinct in 1914, but 100 years before it went extinct, they flew around North America in flocks of billions of individuals, and they literally went from billions of individuals to extinct over the course of somewhere between 40 and 60 years. It's our fault that this happened. We caught them in droves and murdered them and ate them, and they couldn't live in small populations. We were wondering whether we could learn something by comparing its genome to its closest living relative, the band-tailed pigeon, about why this bird went extinct, why it didn't somehow survive in some pockets of isolated forests somewhere. And so we sequenced the genome from this guy. We got about a 50x genome from an extinct passenger pigeon from the toe pad. And we found indeed that the genome was very weird. 
So this is a pattern of heterozygosity across the passenger pigeon genome in red and the band-tailed pigeon genome in blue. And we see that the passenger pigeon has very high heterozygosity at the edges of the chromosomes. So these are mapped to the chromosomes of the band-tailed pigeon. And the band-tailed pigeon has pretty similar heterozygosity across here. And this actually matches the recombination landscape in birds, where they have high rates of recombination at the ends of the chromosomes and low rates of recombination in the middle of the chromosomes. And so we wondered whether, where we have high rates of recombination, we should have a very large effective population size and therefore very strong natural selection. And in fact, we did find that. Both the rate of adaptive evolution and the rate of purifying selection were much stronger in passenger pigeons than in band-tailed pigeons, and were much stronger at the ends of the chromosomes than at the center of the chromosomes, where lack of recombination meant that there were constant purifying sweeps that were actually reducing, suppressing the local diversity, which probably maps to effective population size. And indeed, if we look in these regions for genes that are under selection, we found several genes that seem to be associated with life in large populations. So things like stress response and disease resistance genes were under selection in passenger pigeons. So this tells us that with this comparative genomics approach with the passenger pigeon, we can not only learn something about the actual genome, we're seeing that recombination, this landscape, is actually constraining how mutations and diversity accumulate across the genome. We can think about how population size, we have a very large population size, is going to impact evolution, and we started to learn something about the biology of these organisms. 
Of course, to think about biology and natural selection, we had to have more than one individual from these species, which really brings me to my second and most important point here, which is that when we think about comparative genomics, we're really moving into a timeframe where we want to know about populations rather than just about individuals. And that means that we have to think about sequencing things at a much wider scale, maybe re-sequencing, maybe de novo sequencing. It depends on the question that you're trying to answer which of these approaches is actually most useful for you. And an example of this that we're all familiar with is obviously ourselves, human evolution. When we got the DNA sequence from the Neanderthal, which has a population divergence time from anatomically modern humans of around 300,000 to 400,000 years ago, we learned initially that there had been admixture, there had been gene flow from Neanderthals into humans. But it wasn't until we had population scale data to compare these two that we saw the number of different times that there's been gene flow between these different populations and how the amount of Neanderthal genome and the distribution of Neanderthal DNA that's in different human populations varies geographically. And it was only with the population scale data that we could really start to reconstruct the evolutionary history of these interactions and learn about how humans moved across the globe, how Neanderthals and our other archaic cousins moved across the globe, and when and where they interacted. And with population specific data, we started to be able to answer questions about how that introgression, that interaction with Neanderthals, impacted human evolution. We saw that there are certain genes that came from Denisovans into high altitude Tibetans that allow them to survive better at high altitude. 
And we see that there are some genes that have gotten into populations at high frequency that might have negative consequences, like this particular variant in this gene that is a risk factor for diabetes in Mexico. And of course, as we continue to sequence more and more and more human genomes and align them and assemble them and make them available for comparison, we can start to map across the human genome where all of the Neanderthal alleles are. And we all know that we have somewhere between maybe one and 4% Neanderthal, but it's not the same one to 4% Neanderthal. And if we sum across all of the living anatomically modern humans, we can piece together something like 65% of the Neanderthal genome. The estimates were about 35 to 70%. And one of my students has been working on this and he comes up with something close to 65%. And the reason that this is interesting is not only because we can learn then about the impact of Neanderthal genes moving into humans, but we can also learn about the parts of the genome where no living human has Neanderthal DNA. In other words, what happened here, where are mutations that are shared by all humans, that are fixed and distributed among all human populations, but not in our archaic cousins. And if we can learn this, then that will start to tell us about what it is potentially that makes us unique. What is it that makes humans different from other lineages? But how are we gonna do this with the tools that we have right now? Really, are there the appropriate software, population genomic, population genetics tools to be able to ask and answer this question? And there are some, and these are under development. The most powerful of these approaches are ancestral recombination graph approaches. There are several types of ancestral recombination graph approaches that have been developed so far. An ancestral recombination graph is basically a sequence of trees across the genome. 
So if this is a chromosome, each of these segments of the chromosome has a different sampling of evolutionary history. So it has a different genealogy, a different tree in which all people are related to each other in a different way. And in an ancestral recombination graph, you take an alignment of these chromosomes and you estimate a tree between each one of these recombination breakpoints where mom and dad's chromosomes came together and there was a recombination event. So recombination graphs are informed by two types of data. There, individuals can fall into the same clade if they share a derived allele that sits in a particular region of the chromosome or if they share an ancestral recombination event. So in my lab, one of my PhD students, actually he just finished and he's looking for a postdoc right now. He's very awesome, Nathan Schaefer. I recommend if you're looking for a postdoc, you ask for him. He's developed an ancestral recombination graph approach that he's called SARGE. He's not so good at coming up with software names, but he's a very good computational biologist. It takes as input panels of hundreds or thousands of phased haplotypes, and it uses parsimony to figure out how the clades move whenever there's a recombination breakpoint, and then it outputs this sequence of trees that contains a TMRCA estimate for each node. So basically this is a way of very quickly telling us how everyone is related to everyone else everywhere in the genome. So it's a very powerful approach for thinking about genealogical inference. So I'm just gonna show you a little bit of data from the stuff that he's done. This is taking the Simons Genome Diversity Project data and three different archaic hominins, two Neanderthals and one Denisovan. There's actually one recently released new high coverage Neanderthal. These are all of the high coverage individuals, over 30x, I think, that have been published so far by Svante Pääbo's group in Leipzig. 
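To make the "sequence of trees" idea concrete, here is a minimal sketch of how such an output could be held in memory and queried. It is not SARGE's actual data format or API; the `LocalTree` class, the `tree_at` and `tmrca` helpers, the haplotype names, and the TMRCA values are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class LocalTree:
    """One tree of the sequence, valid between two recombination breakpoints."""
    start: int    # first base covered by this tree (inclusive)
    end: int      # one past the last base covered (exclusive)
    clades: dict  # frozenset of haplotype names -> TMRCA estimate (arbitrary units)


def tree_at(trees, position):
    """Return the local tree covering a position; trees must be sorted by start."""
    starts = [tree.start for tree in trees]
    index = bisect_right(starts, position) - 1
    if index < 0 or position >= trees[index].end:
        raise KeyError(f"no tree covers position {position}")
    return trees[index]


def tmrca(tree, hap_a, hap_b):
    """TMRCA of two haplotypes: the age of the youngest clade containing both."""
    ages = [age for clade, age in tree.clades.items()
            if hap_a in clade and hap_b in clade]
    if not ages:
        raise KeyError("no clade contains both haplotypes")
    return min(ages)


# Hypothetical toy data: two adjacent segments with different genealogies.
everyone = frozenset({"human_1", "human_2", "neanderthal"})
toy_arg = [
    LocalTree(0, 1000, {frozenset({"human_1", "human_2"}): 50.0, everyone: 400.0}),
    LocalTree(1000, 2500, {frozenset({"human_2", "neanderthal"}): 40.0, everyone: 450.0}),
]

# The same pair of haplotypes is related differently in each segment.
left = tmrca(tree_at(toy_arg, 500), "human_1", "human_2")
right = tmrca(tree_at(toy_arg, 1500), "human_1", "human_2")
```

In the toy data the two humans are each other's closest relatives in the first segment, while in the second segment one of them is closer to the Neanderthal, which is the kind of position-by-position change in relationships the talk describes.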
And this is analysis done by Nathan and Ed. So if you run the ARG through all of these data, the Simons panel plus these three archaic hominins, there are basically three possible configurations for the tree that you can see. The first is where the archaic hominins, the Neanderthals and the Denisovan, fall outside of the distribution of modern humans. This is basically the human specific genome. This is easy to identify. This is where we are all the same as each other and the Neanderthals are different. And then there's two other ways we can have this. We can have the archaic hominins falling within the variation of humans, either with an old TMRCA and short haplotypes, and this is probably due to incomplete lineage sorting, or with a more recent TMRCA and very long haplotypes. So recombination hasn't had as long to break down those haplotype lengths. And this is what's probably due to introgression. So incomplete lineage sorting has older haplotypes and introgression has younger haplotypes. And, I'm not gonna show the data, but simulations show that we can distinguish between those two. So if you look across the human genome and you ask what proportion of the genome has those three different configurations, we see that about 60% of the genome looks like admixture, about 31% of the genome looks like incomplete lineage sorting, and that about 10% of the genome has this pattern where it looks like humans are different. And then if you dig into this further and you ask, okay, not only do I wanna know where all humans cluster together, but I wanna know where humans cluster together and we have a human specific mutation. I wanna know something about how humans have evolved that makes us different. That turns out to be about 1.1% of the human genome. And although it's 1.1% of the genome, it contains 2.2% of exons and 1.5% of regulatory elements. 
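The three configurations can be read as a simple decision rule, sketched below. This is only an illustration of the logic described in the talk, not the analysis the group actually ran: the function names, the TMRCA cutoff, the haplotype length cutoff, and the toy segments are all assumptions made up for the example. The last helper just restates the reported percentages as fold enrichment.

```python
def classify_segment(archaic_inside_human_variation, tmrca_years, haplotype_length_bp,
                     tmrca_cutoff_years=500_000, length_cutoff_bp=50_000):
    """Assign one genomic segment to one of the three tree configurations.

    Both cutoffs are hypothetical placeholders, not values from the talk.
    """
    if not archaic_inside_human_variation:
        # All humans cluster together and the archaic hominins fall outside.
        return "human-specific"
    if tmrca_years < tmrca_cutoff_years and haplotype_length_bp >= length_cutoff_bp:
        # Recent shared ancestry on a long haplotype: little time for
        # recombination to break it down.
        return "introgression"
    # Old shared ancestry on a short haplotype.
    return "incomplete lineage sorting"


def genome_fractions(segments):
    """Fraction of total sequence length assigned to each configuration."""
    totals = {}
    for inside, tmrca_years, length_bp in segments:
        label = classify_segment(inside, tmrca_years, length_bp)
        totals[label] = totals.get(label, 0) + length_bp
    genome_length = sum(totals.values())
    return {label: bp / genome_length for label, bp in totals.items()}


def fold_enrichment(percent_of_feature, percent_of_genome):
    """How over-represented a feature is in a region relative to the region's size."""
    return percent_of_feature / percent_of_genome


# Hypothetical toy segments: (archaic inside human variation?, TMRCA in years, length in bp)
toy_segments = [
    (False, 900_000, 20_000),
    (True, 100_000, 80_000),
    (True, 800_000, 5_000),
    (True, 60_000, 120_000),
]
fractions = genome_fractions(toy_segments)

# Using the numbers quoted in the talk: 1.1% of the genome holds 2.2% of exons
# and 1.5% of regulatory elements.
exon_enrichment = fold_enrichment(2.2, 1.1)
regulatory_enrichment = fold_enrichment(1.5, 1.1)
```

With the quoted figures, exons come out at roughly twice the density expected from the size of the region, and regulatory elements at roughly 1.4 times.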
So this is suggesting that here we have, from this ancestral recombination graph and a ton of genomes that we can compare using new comparative genomics approaches, identified regions of the genome that we can use now to ask more interesting questions. Here we have a ton of hypotheses that we can now go and test. We can identify genes that might be functionally important to humans and we can design experiments to actually link the genotype to the phenotype and see if we can dig further into this and understand what's going on. Now of course, because I'm an ecologist and I work on, as Eleanor says, lots of different things, we don't only care about people in my lab and in this room because we don't only have NIH here in this room, right? And also, I really don't care about model organisms. I'm just kidding. So actually, I'm not that kidding. I really don't care about model organisms. So we have been looking at comparative genomics projects of this guy, which is sometimes called a puma or a cougar or a panther or a catamount or a mountain lion or, one of my favorites, a mountain screamer, that's actually a name that is used for this cat. I've never heard them scream. I'm kind of grateful for that because I think that would be, yeah, I don't want to hear them scream. I think that would be nearing the end of my own life. And so for these guys, we've generated high coverage, re-sequenced genomes. We generated one de novo assembly using a combination of short and long reads and Chicago from Dovetail and Hi-C. And this was a project run by a graduate student, Nedda Saremi, and postdoc Megan Supple in my group, and we've sampled mountain lion individuals from Brazil and from several different places in the contiguous US, including from Florida, from prior to the admixture with Texas panthers that happened. So these are the Florida panthers that were on their way to extinction before they were admixed with the Texas panthers. You see here the distribution. 
This shows what their historic range was: they used to be distributed all across the continental US, but they've been reduced by habitat fragmentation and human movements to just these isolated populations in Florida and the Northeast. So with these genomes, we can use PSMC and reconstruct their demographic history, if you believe that that's what this is doing, and you can see that there is a separation here between the individuals from Brazil and the individuals from North America. This is approximately when we think the individuals were reintroduced from South America, Central America into North America. So there is a divergence between them and we see that there's been a recent decline in the effective population size of all of these individuals since then. So we can ask this question. And what we see, sorry, this is showing the Brazilian individuals versus North America. If we build whole genome phylogenies, we can see that within North America, there's been a lot of population structure and isolation: the individuals from each population cluster together. But what we can do with complete genomes and actual very nice assembled genomes is we can ask questions about their specific evolutionary history. So this is actually a map of two different individuals from each of these four different populations showing heterozygosity estimates across one of the longest chromosome fragments every 500 kb. And what we see here in the colored blocks are where they're homozygous. So these are places where mom and dad had the same copy, the same haplotype, the same allele. So in Brazil, what we see is there are some runs of homozygosity. Mountain lions are a species that tend to inbreed for short periods of time, with a male being dominant, and he'll breed with his offspring for maybe one or two additional generations until another juvenile male comes in from outside and usurps him and fixes that problem. 
But if we look at the others from North America, we see many, many tracts of homozygosity across the genome. And as I said, this is that Florida panther prior to admixture. And you can see that a bunch of his genome has been reduced to homozygosity. And in particular, there are regions where both of the individuals are homozygous. And this is how genetic diversity is lost in populations. There's not a gradual decline in heterozygosity as tends to be reported in conservation work, but instead it's like the lights are on or the lights are off. Diversity is there or it's not there. And so with this type of data, comparative genomics data at the population level, we can start to ask questions about how inbred populations are, where they might have deleterious alleles that have gone to fixation in the population, but also to identify individuals from nearby populations or other populations that you might use in translocation experiments to reintroduce diversity that's been lost from that population into the new population. So this is another potentially powerful thing that comparative genomics approaches at the population level can do. And we all know that a few weeks ago, there was the report that was released that said there were a million species facing extinction. And I think one thing that this report didn't actually talk about but could have is the challenge that we have in species that aren't necessarily facing extinction, but where genetic diversity is declining overall. And if we think about the challenges that all of these other species are facing, maybe they're not becoming extinct, but they're very fragmented because of human land use changes, and all that genetic diversity, the lights are going out. And these are other species that we need to care about. And this is another way that comparative genomics can be useful as we move forward as a community. 
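The "lights on or lights off" picture of windowed heterozygosity can be sketched in a few lines. This is an illustrative sketch only, not the method used in the puma study: the function names, the heterozygosity threshold, the minimum run length, and the toy values are assumptions; only the 500 kb window size comes from the talk.

```python
def runs_of_homozygosity(window_het, window_size_bp=500_000,
                         het_threshold=1e-4, min_windows=2):
    """Return (start_bp, end_bp) for stretches of consecutive low-heterozygosity windows.

    het_threshold and min_windows are hypothetical placeholders.
    """
    runs = []
    run_start = None
    # A sentinel value above any threshold closes a run that reaches the last window.
    for index, het in enumerate(list(window_het) + [float("inf")]):
        if het < het_threshold:
            if run_start is None:
                run_start = index
        elif run_start is not None:
            if index - run_start >= min_windows:
                runs.append((run_start * window_size_bp, index * window_size_bp))
            run_start = None
    return runs


def shared_runs(runs_a, runs_b):
    """Intersect two sorted run lists: regions where both individuals are homozygous."""
    shared = []
    i = j = 0
    while i < len(runs_a) and j < len(runs_b):
        start = max(runs_a[i][0], runs_b[j][0])
        end = min(runs_a[i][1], runs_b[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever run ends first.
        if runs_a[i][1] < runs_b[j][1]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical per-window heterozygosity for two individuals from one population.
individual_a = [0.002, 0.00001, 0.00002, 0.00001, 0.003, 0.00001]
individual_b = [0.00001, 0.00001, 0.00003, 0.002, 0.00002, 0.00001]

runs_a = runs_of_homozygosity(individual_a)
runs_b = runs_of_homozygosity(individual_b)
both_dark = shared_runs(runs_a, runs_b)
```

Regions returned by `shared_runs` correspond to the places where diversity is missing in both sampled individuals, which is where an individual from another population would be needed to bring it back.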
And finally, I wanted to talk a little bit about how we can use comparative genomics approaches to think about interactions within entire communities. So this is a picture of one of my field sites in the Yukon in Canada. And this is a wall of permafrost that's been exposed by placer gold mining. So this is a gold mining region right outside of Dawson City in the far north where the miners are spraying down the dirt as it melts to try to get to the gold bearing gravels underneath. And as they do it, they expose these walls of frozen mud. And in these walls of frozen mud, there are thousands, tens of thousands of bones from things like mammoths and giant bears and extinct lions and stuff like that. And we walk around and collect them. But sometimes we get very lucky and we find a section like this where there looks like there's actually some stratigraphy. This is hard in permafrost because there's a lot of cryoturbation as freeze thaw happens. And you tend not to see these things. But here what we see are active layers. And we've been able to radiocarbon date what's going on here. And this particular layer here, we know occurred about the same time that humans first appeared in the region. And right here, we see all these sticks that are sticking out right here. This is the period of time where the temperature warmed approximately eight degrees over the course of approximately 20 years. And we had an enormous influx of different types of species that were coming in. This is the transition from the Ice Age into the Holocene. And we can see that happening right here. And so what we've started to do is go to places like this. And we're not gonna have bones or plants or community level interaction in enough of a scale, enough of them to really know that we're reconstructing it accurately. But we can stick a sterilized metal tube. We actually coat it with alligator DNA to make sure that we're not contaminating ourselves. 
So unless alligators were present in the high Arctic during the Pleistocene, we're gonna be good, I don't think they were. And we take a sediment plug and then we use environmental DNA approaches, meta-barcoding approaches and enrichment approaches to try to reconstruct what the community looked like. So in doing this, what we're trying to learn is who was with whom and when they were together. And our goal there is to reconstruct what makes communities resilient and what makes communities more stable. And we can do lots of different things. We can look at the isotopes to measure what the climate was doing. We can look at the genetics of plants and animals and microbes to see how microbial community composition was changing, how microbial metabolism was changing, and ask all sorts of interesting questions using environmental DNA. The idea here is that when an organism dies or is on top of the surface, it's actually leaving traces, genetic traces of itself in the path as it was there. We can go up to these places, and this is one of Duane Froese's PhD students who was working out in the field with us before, and he is removing one of these plugs here. He's wearing gloves to make sure that he's not touching or contaminating this, but it's a plug that's in a metal tube here. And then he'll spray the inside of that tube with the alligator DNA. We chop it in half and we take a bit of sediment out of the center of the tube. And if we find alligator DNA, we think that it's melted and potentially contaminated. So we've gotten contamination from the outside into the inside. So we use the alligator DNA as a tracer to make sure we're not contaminating the sediment from things on the outside. So we take these back to the lab and we can extract DNA. We also go to places like this. This is a lake that is in the Pribilof Islands, which are in the Bering Sea off Alaska. 
And in the winter, when the whole thing is frozen, we go and we can shove one of these big long coring apparatuses into the mud underneath, and then we get these long cores and we go down the core here and take sediment plugs out, and then we can reconstruct all the diversity changes in community composition as we go back in time. This particular core went back 17,000 years and we were using it to try to figure out when mammoths disappeared and why mammoths disappeared on St. Paul. So just to show you kind of the things that we do up here, this is from the Klondike. This is the first place where you saw the video of the person taking the mud out of the wall. We've gathered a whole bunch of bones from that area and we've used that to get genetic data from horses and from bison, and then this is a population genetic reconstruction of changes in effective population size. So here, bison are increasing in population size and then decreasing again, and horses are increasing. There's a bit of a blip decreasing again, and both of these go locally extinct in the area by around 20,000, or somewhere between 15,000 and 18,000, years ago. And in the same place, we can reconstruct the climate history using isotopes. So this is nitrogen isotopes from the bones that we're actually getting the genetic information from, and this is oxygen isotopes from the sediment, from the water that's in the frozen dirt itself. And from this, we can see that around 35,000 years ago, right here when the animals start to decline, it became suddenly very cold and very dry in the region. We also collect around this time Arctic ground squirrel nests, in which we have a whole bunch of different plant macrofossils that we can use for DNA, and Arctic ground squirrels require these well-drained substrates and a pretty deep active layer. So we know that when they're there, it was probably this dry steppe landscape. 
And with this information, we can say that when it got colder and drier, this actually drove these changes between bison and horses. Bison require really rich grasslands, so they didn't do very well when it got colder and drier and all the rich grasses got replaced by the scrubby stuff. And horses did well for a short time, lacking the competition from bison, but then they also started to do very poorly because there just wasn't enough to eat here. And we also see this rapid recovery around the time that we see the transition into the Holocene. So what can we learn from environmental DNA about this change in community composition? How can we ask questions about the community using environmental DNA? This is something that my grad student Sabrina Shirazi is working on. She's only finished some of these right now. She's filling in the holes here. But what these bars are is a reconstruction of the types of plants that were present in this same place. And we see this yellow line here, where you have the Arctic ground squirrels, where it's cold, et cetera. We also get a dramatic turnover in the plant community. Different types of plants are there. And then it changes very quickly afterwards. So this is really work in progress and preliminary data, and we're learning how to do this, but it does show the power of being able to combine these different approaches, including an environmental DNA approach, to really reconstruct entire communities. And this is very early days for environmental DNA. There are lots of things we don't know. We don't have enough reference genomes for a lot of the species that we're interested in. There's a lot of stuff that comes back as unknown or other, and we don't know what's happening. We don't really know how to make sense of the genomic data that we get out of them. We get a big, giant mixture of a whole bunch of different species.
And we have to ask questions about how we can actually piece together those genomes appropriately so that we can disentangle who was there at that time. We don't know how much DNA moves across space and environments or how it decays. We don't know what the DNA footprint of a particular species is. If I am a tree, how far in every direction does my DNA go? And what does any DNA that I get out of that plug of soil right there tell me about where the tree it came from was? Also, we can ask questions from this. I mean, it's even more complicated when we have these mixtures. How do we link genotype and phenotype? What do we know about microbial evolution? There are lots of things living in the soil itself, and they're passing genes back and forth not only among each other at one point in time. We know now that there are microbes that stay alive for thousands of years, maybe tens of thousands of years, sort of passing genetic information across very long spans of time, really screwing with the models of molecular evolution that we would like to be able to use to figure out how old things are. And we also need better methods, better biological methods, better experimental laboratory methods, to think about this. Of course, environmental DNA is not just useful for ecologists thinking about ecosystem dynamics. Obviously there's interest in environmental DNA for human health. It's the same idea if we look at metagenomes from skin or from intestinal contents and we think about how the community of microorganisms that live inside our bodies is impacting human health. We are really at the very early stages of having any idea what we're doing here, or how long-lived any of the interactions that are observed are, or how we can actually make sense of the community of organisms that are there. And if we think about microbial dynamics and microbial evolution, it's really not just this idea of human health, but it has implications for agriculture.
Can we use these models of microbial community turnover, or even plant community turnover, to make more informed decisions about agriculture and pesticide use and how we're going to drive the changes that we wanna see in industrialized agriculture as we move forward? It also has implications for thinking about forensics. Obviously there are complex mixtures when you go to a crime scene. So how can we develop the tools that we need to disentangle these complex mixtures of human DNA and microbial DNA, to better be able to use this information to solve crimes? And that means both to catch the bad guy and also to let the guy who's not the bad guy be proven not to be the bad guy. And of course there are also implications for conservation. And here I like to use the picture of Oliver Ryder, who's in the audience, and the San Diego Frozen Zoo, which has been doing so much to collect and preserve the organisms that we hope to be able to use as we move forward here. And they're not only thinking about how to collect and archive and store and share these data, but also how we can actually move these samples and information across international borders, and how we can really reward people from different places for the work that they've done, the data that they've collected, and even maybe intellectual property that they own from these samples. And these are all big outstanding questions that we as a community need to address as we move forward.
So as we look to the future of comparative genomics and we think about where we're going and what we want to do, I think what we've learned today, and what we can think about if we think from this big, broad perspective, is that we need to be able to connect lots of different people with different expertise, people who are good at the assembly stuff and people who are great at putting together these alignments. Joel Armstrong, who built the Cactus tree for the 250 mammals, finished his PhD last week and he's leaving, and there's no one else who can run Cactus, and all of us are panicking. We all want to be able to add taxa to this alignment, but it's hard. This is a very complicated, costly challenge, and it's something that we need to think about in this infrastructure. How can we collect consistent data? We talked a lot about making genomes publicly available, but then the data quality varies and the assembly quality varies and the metrics vary. So how can we make sure that the data that we're collecting and making available are actually comparable with each other? We talked about the batch effects that Melissa brought up yesterday. And finally, we really do need to think about storage and compute. We didn't really talk about that before, but as we get a lot of big data sets and bigger genomes, this is something that's gonna be increasingly important. So as I finish, I just want to leave us here, as we move forward and move into our discussion groups, and encourage us to not just think about the data and some of these other challenges, but really motivate our discussions by thinking about the biology. How will all of these new tools and resources change the types of questions that we can ask? How will it change our research agenda? And this is the next hard thing that isn't solved but will be, and with that, I just want to say thank you. Thank my lab group, and you for paying attention to me. I finished on time.
There's actually time for questions, so I'm sure. Steve. Very nice, Beth. When I looked at that pattern of the puma, what I was struck by was that in the Florida panther, the two individuals had a different distribution of stretches of homozygosity and heterozygosity. If you look at the Florida animals, like if you look at the left side of that chromosome, there's 20% of the region that is heterozygous in one animal but not in the other. So if we looked at 10 animals in Florida, which I always thought were pretty much homozygous at all the same regions, you might be able to reconstruct most of the diversity that was in the ancestral one. Have you thought about that or attempted it? Because it's a product of how long ago the bottleneck was, and the Florida panther, of course, had several: the founder event of North America plus the recent depredation. Yeah, we would like to do this, and we have requested additional samples from Steve O'Brien to be able to reconstruct these. How about I give them to you? I think you're right, it's interesting. You actually gave us three samples, and I didn't show the third one here because the third sample that we have was one that had been admixed with an individual who was released from Central America into this population, unbeknownst originally. Everglades, yeah, Everglades samples. The Everglades samples, right. And it does look different. It actually has a level of heterozygosity in parts of the genome that is similar to the Brazil individuals, but it also has very long tracts of homozygosity. And what's interesting to me about that sample is that we know approximately how many generations ago the inbreeding happened, because we can estimate that from the homozygous tract lengths, and we see that even though it was an outbreeding event, it didn't take very much time for it to lose diversity across much of its genome.
And so what it says to me is that if we're going to use translocations and reintroductions in really small, isolated populations as a method of keeping their diversity up, we can't just do it once. We have to keep doing it. We have to keep putting in that diversity, otherwise it'll just be lost as they continue to inbreed. But I would like to look at more of both those slightly outbred and the inbred Florida panthers, because I think this is a really fascinating population to study these dynamics. So tell Warren. Thank you. Yes. Coming back to the panther, which is an interesting pattern, I was wondering whether you know that the homozygosity actually involves the same haplotype. In other words, this could be two sub-breeds of panthers that are homozygous in their own way. Right. We did ask that question. That's not represented here, but it is possible to know that from the data because we have the genomes aligned to each other. I don't know the answer here, but certainly in a lot of places it is the same haplotype, but that is important to know. The population size. Steve, what was the effective population size of this population? Do you know how big this population was before the reintroduction? Before the reintroduction? Yes, about 30 individuals, of which half were geriatric, like the people in Florida. What's also nice about this, from this perspective, is that we also have the pedigree. So we know how they're all related to each other. Exactly. It's a really powerful dataset for studying the dynamics of... We also know exactly when the introgression from the Costa Rican puma in the Everglades took place, because we have a record of the introduction dates in the 60s. Right. It's cool. It's all in Steve's book. It's in my book. A question to link yesterday's keynote from Harris with yours. Just open it up. So if we were to think about sequencing the genomes of 1.5 million species, of course we are all aware there's diversity within a species.
So maybe that 1.5 million might be 100 million. Could we just chat a little bit about that, between you and Harris and among us here? The scale of what would be necessary, not just one individual representing one species. Yeah. Well, I think that one of the fundamental issues to be resolved is how to go about doing this. I mean, you have to start somewhere. Eric and Harris and others from Genome 10K, including myself, we've gone about this as we would really like to try to get as broad a taxonomic diversity as possible to start with. I think the answer, though, will always be that the species that get sequenced, whether it's a single individual or a population, are gonna be those that get funding. And so we can make all the recommendations and comments and decisions we want to, but until somebody stumps up the cash to do it, we can't really come up with any firm plans. Obviously we're not gonna sequence 100 million individuals next week. We don't have any of the capacity to do that. We think, Harris. We do have the California project now, and that project will include a lot of population sequencing. Yes. For at least maybe 100 individuals from each species. Yes, so Brad Shaffer has led a program in California. He's got a special line item in the state budget to advance conservation genomics in California, and the plan of this group, which will convene in a few weeks, is to have one high-coverage, high-quality genome. One reference, we'll still need the reference. Yes, and then up to 100 resequenced genomes from others. Well, it depends. Brad wants to do all salamanders, so, you know. Yeah. So this is important, and I'll just add a comment on the EBP budget. We did have population sequencing for all threatened and endangered species included in the budget. That's cool. It's in there. All these clapping. But not 100 million species. Yes. Harris.
Hey, I just wanted to suggest a particular scope that we've been thinking about for that, which is that for many endangered species, and the most critically endangered species, there are, you know, fewer than 500, fewer than 1,000 individuals. It's actually very, very reasonable, and we've started some projects along these lines, to go and do essentially species sequencing for an endangered species and just say, look, we're going to have records of every individual. Obviously, for species that are extremely common, that's totally impractical. But for species that are endangered, the notion of doing species-seq is extremely practical right now, if you can use appropriate types of materials as your input. Yeah, and I think that final point is important. I mean, when we think about going to an endangered species and grabbing every one of them and taking a sample for genome sequencing, we also have to think about the potential consequences of taking that sample and disturbing the individual. Yeah, to be clear, you absolutely can't do that. And so, yeah, it's critical to not do that design. All right. Thank you. One more, you want to say one more? Okay, sir. Hi, Beth, thank you. So I want to get your insight about when you have genomes from, well, extinct species, and you want to have their genome sequence, but the closest present-day reference is very, very distant. So what are your current insights about how to actually reconstruct extinct genomes? Yeah, your point is good, Maria's point is good, that it's very hard if your extinct species has only a very distantly related living counterpart. How are you actually gonna do it? And the way that we've been doing it until now is just to kind of do the best we can. We do see that it matters. We have a dodo genome sequencing project, and when we first had the dodo DNA, the closest relative that we had was the rock pigeon.
And we estimated that the sample had about 20% endogenous DNA, that is, 20% of the DNA mapped to the pigeon reference genome and the rest of it couldn't map. You assume that it's something else. But then we sequenced and assembled a Nicobar pigeon genome, which is closer to the dodo by about 50 million years. And we found that the sample actually has 42% endogenous DNA, so it's clear that having a closer reference genome is critical. I mean, the best that we can do, what we do now even for the things that are relatively close, is we'll map the ancient DNA to the closest reference genome, and then you can run something like Pilon and try to iteratively move away from what your reference genome was. But then you have to come up with clever ways of breaking that genome to see when there have been rearrangements. Ollie? Just back to the runs of homozygosity issue. I think this can be, because it gives a bigger signal, more sensitive than kind of looking at loss of heterozygosity. I think it may have great value for looking at the historical demography of populations, timing bottleneck events, and also, for a huge number of taxa whose current status we don't know, inferring what their genetic load is. And the thought that you can do a kind of extinction risk analysis by sequencing a few genomes could be a remarkable sea change in predicting extinction risk. So I think that's a perfect point to move us on to the discussion section. So another round of applause for you. That was wonderful, thank you so much. And so we'll take a few minutes to reassort. The population genomics is in here. Reproductive and developmental genomics is in the room that sort of backs onto this. So you just go around in either direction and get there. In Glen Echo, bring your name tags and we'll start the discussion sections. Thank you.
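The tract-length reasoning from the panther discussion, where long homozygous tracts point to recent inbreeding, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's pipeline: the function names `find_roh` and `generations_since_ancestor`, the per-window heterozygosity flags, and the default recombination rate of 1 cM/Mb are all hypothetical choices for the example. It uses the common approximation that a tract inherited identical by descent from an ancestor g generations back has an expected length of 1/(2g) Morgans.

```python
from typing import List, Tuple


def find_roh(het_calls: List[bool], min_len: int) -> List[Tuple[int, int]]:
    """Find runs of homozygosity in a sequence of genomic windows.

    het_calls[i] is True if window i contains heterozygous sites.
    Returns half-open (start, end) index ranges of consecutive
    homozygous windows that are at least min_len windows long.
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, is_het in enumerate(het_calls):
        if not is_het:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(het_calls) - start >= min_len:
        runs.append((start, len(het_calls)))
    return runs


def generations_since_ancestor(roh_len_bp: float, cm_per_mb: float = 1.0) -> float:
    """Rough generations back to the shared ancestor for one tract.

    Assumes expected tract length = 1 / (2 * g) Morgans and a uniform
    recombination rate (cm_per_mb is an assumed, illustrative value).
    """
    length_morgans = roh_len_bp / 1e6 * cm_per_mb / 100.0
    return 1.0 / (2.0 * length_morgans)


if __name__ == "__main__":
    # Toy data: True marks a window with heterozygous sites.
    windows = [True, False, False, False, True, False, True, False, False]
    print(find_roh(windows, min_len=2))
    print(generations_since_ancestor(10_000_000))
```

Under these assumptions a 10 Mb tract corresponds to a shared ancestor roughly five generations back, while a 1 Mb tract corresponds to roughly fifty, which is why very long tracts in an admixed individual indicate that diversity was lost again within a few generations. Real analyses work from called genotypes with a recombination map and account for sequencing error, so this sketch only conveys the shape of the argument.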