We'll go ahead and get started on time here, and I'll give a few-minute introduction while the stragglers walk in. I'm really, really excited to welcome Dr. Scott Edwards to NIH and NHGRI today. He's joining us from Harvard University, where he's a chaired professor of Organismic and Evolutionary Biology and, according to his intro slide, also the department chair, so he's juggling many hats. He's also the curator of ornithology at the Museum of Comparative Zoology, which is a Harvard institution. He originally received his BA in biology from Harvard some time ago, so he ended up back at his original stomping grounds. In between came a PhD at Berkeley, a Sloan Fellowship at the University of Florida, and a couple of visiting fellowships around the world before he eventually wound back up at Harvard for the past 20 years or so. He's not just an excellent researcher but also an excellent educator: Diana Proctor just mentioned to me that his pop-gen lectures on YouTube are some of the best lectures she has seen on the subject. So if you'd like a good lecture taught by an expert, check out Dr. Edwards' videos on YouTube. His CV is lengthy, I will say. He's a fellow or member of just about every society you've probably ever heard of, including the American Academy of Arts and Sciences, AAAS, the National Academy of Sciences, and so on. He's a member of many boards of directors, including the really wonderful Cornell Lab of Ornithology, whose research station I've had a chance to visit, a really cool place; the Massachusetts Audubon Society; and the Adventure Cycling Association, just to throw an oddball in the mix. If you had the chance to visit his homepage in anticipation of the lecture, you would have seen on the splash page a picture of him completing a solo bike journey across the United States in 2020.
Perhaps he was running away from COVID that summer, and I remember that summer well, but he was also raising awareness for the Black Lives Matter movement and Black Birders Week, covering 3,800 miles in 76 days, with a fair bit of media coverage along the way. So suffice it to say, Dr. Edwards has a bit of an adventurous spirit. Looking at his CV, he's also traveled all around the world over the years pursuing birds, doing fieldwork, and collecting samples: really clear evidence of a life well lived as a researcher and educator at Harvard. Looking through it inspired me to get out from behind my desk a little more often, because it seems that every few years or so he has a fun adventure. We should all strive to do that. So I'm thrilled that today he has stopped by our neck of the woods to give us a talk on something I care deeply about: discovering the functional connections between genotype and phenotype using comparative genomics approaches. So with that, let's all welcome Dr. Edwards to NHGRI.

Thanks very much, Adam. And thanks to everyone who organized my visit today, Monica, and all the great scientists I've met this morning. It's a real honor to speak to you here today in this beautiful auditorium. I thought I would tell you about some of the work we've been doing over the past five to ten years, trying to bring together comparative genomics and phenotypic evolution. A lot of the second half of the talk covers work funded through an R01 from NHGRI, so I'm very, very excited about that. Just to introduce them, this is my lab group. I came into biology through ornithology. I've been a bird watcher since I was probably 10 years old, growing up in Riverdale in the Bronx in New York. Anyway, it's great to be able to educate students who often come from organismal backgrounds, but sometimes from computer science or other backgrounds. And we study a bunch of different fields.
Today, we'll focus mostly on the last two squares up there: comparative genomics and phylogenetic methods. I'm very fortunate to work in a world-class museum. Museums are great centers for education and training, and it's a very interesting time now because we're seeing museums navigate how they can be most relevant to 21st-century biology. A lot of that dovetails with genomics, so it's been very exciting to try to modernize, in my case, the ornithology collection, though there are lots of other collections. If you're ever around Cambridge, I'd be happy to show you our collection. So maybe five or six years ago, a colleague and I were at a meeting of the Society for the Study of Evolution. We had just gone to a symposium on the uses of phylogenies: why do we need phylogenies? Most of the symposium focused on the ecological spinoffs that phylogenies bring, such as learning about biogeography, ecology, and adaptive radiation. And we were struck, at least within this community, by how little intersection there was between phylogenies and comparative genomics, or genome evolution. We ended up writing a short perspective piece to try to make our community aware of some of the potential avenues. Even at that time, there were some really compelling examples of using phylogenies to understand the genomic basis of phenotypic evolution. I'll talk about the first two examples on this list, but there have been many others. I think it's a very exciting time right now because we're seeing lots of different approaches brought to bear on this question. The reason it's important is that we can't do traditional genetics on most of biodiversity. We can't do traditional crosses in the lab.
And we can't use the traditional design of allowing the phenotype to segregate along with genetic markers, which is what you see in the first column of this slide. What we can do with the other 99% of biodiversity, however, is look for clades in which a phenotype pops in and out, or turns on and off. We can then associate changes in the genome with changes in the phenotype on the tree, and I'll show you some ways in which we're trying to build models to do just that. Then, just as in standard genetics, we can use functional tests to ask whether the candidate genes or regulatory elements we've identified actually play the hypothesized role underlying the phenotype. So it's an exciting venture, because we're bringing in lots of important clades and important phenotypes that would otherwise not be amenable to traditional genetic analysis. People have tackled this in lots of different ways. One group doing a lot of this sort of work is Gill Bejerano's group at Stanford. They've had a couple of examples where they look for a correspondence between a particular phenotype, which you see here as the green check marks (trait present versus trait absent), and the presence or deletion of either genes or non-coding elements. The red areas here are deleted regulatory elements or genes. That's proven very powerful, yielding some surprisingly specific associations between the loss of a gene or regulatory element and the loss or gain of the ability to perform a particular biochemical reaction or to have a particular phenotypic trait. Another major mode of connecting genotype and phenotype has been to look for changes in the rate of evolution across a tree. You might have a particular set of target lineages you're interested in. In this case, and you'll see more of this, these are birds, some of which are flightless.
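The presence/absence strategy described above can be caricatured as scoring how well an element's presence tracks a binary trait across species. The sketch below is a naive illustration, not the Bejerano group's actual method: the species names and presence calls are hypothetical placeholders, and the score ignores phylogenetic non-independence, which real analyses must handle.

```python
"""Naive presence/absence concordance between a binary trait and genomic elements.

Species names and presence calls are hypothetical placeholders. This score
ignores phylogenetic non-independence, which real analyses must account for.
"""
from typing import Dict, List, Tuple

Calls = Dict[str, bool]  # species name -> True if the trait/element is present


def concordance(trait: Calls, element: Calls) -> float:
    """Fraction of shared species in which element presence matches trait presence."""
    shared = sorted(set(trait) & set(element))
    if not shared:
        raise ValueError("trait and element share no species")
    matches = sum(trait[sp] == element[sp] for sp in shared)
    return matches / len(shared)


def rank_elements(trait: Calls, elements: Dict[str, Calls]) -> List[Tuple[str, float]]:
    """Score every element and sort by concordance (highest first, ties by name)."""
    scores = [(name, concordance(trait, calls)) for name, calls in elements.items()]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


trait = {"sp_A": True, "sp_B": True, "sp_C": False, "sp_D": False}
elements = {
    # hypothetical element deleted exactly where the trait is absent
    "elem_1": {"sp_A": True, "sp_B": True, "sp_C": False, "sp_D": False},
    # hypothetical element whose presence is unrelated to the trait
    "elem_2": {"sp_A": True, "sp_B": False, "sp_C": True, "sp_D": False},
}
print(rank_elements(trait, elements))
```

A perfectly tracking element scores 1.0; in practice one would also ask whether the matches come from independent losses on different branches rather than from a single shared event.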
We can look for accelerations of either genes or regulatory elements specifically on the lineages with the target phenotype. This is another way of connecting genotype and phenotype, with its own advantages and disadvantages. There's a myriad of ways in which people have tried to connect the two, and our models tend to fall into this latter category. A lot of the way we've approached this question has been driven by one of my graduate mentors, Allan Wilson. He was a famous molecular evolutionist at Berkeley, where I got my PhD. He and Mary-Claire King wrote a very important paper in 1975 in which they cataloged the extent of protein divergence between humans and chimpanzees. They used very primitive databases, cobbling numbers together from very diverse sources, and yet they came up with a number, about 1% or 2% protein divergence, that still holds and has stood the test of time throughout our era of genome sequencing. Allan had this amazing ability to turn lead into gold. He would take a result that a graduate student might think is the death knell for his or her dissertation and twist it around. In this case, one might imagine that finding a lot of differences would have been the exciting result. But it was the lack of differences at the protein level that inspired Allan and Mary-Claire King to say, well, it must be regulatory evolution. In the examples I'll give, you'll see a few of each type: cases where proteins seem to drive the phenotype and cases where non-coding regulatory regions do. The first example involves a question that a graduate student in my lab, Maude Baldwin, brought to my attention. She was fascinated by the way organisms sense their environment, particularly through taste.
We humans and most other vertebrates have a very simple multigene family of taste receptors, which are expressed on the tongue and the inside of the mouth. There's an umami receptor and a sweet taste receptor, both of which are heterodimers. There have been a number of natural experiments in which lineages have essentially lost the ability to taste a certain repertoire through gene deletion or pseudogenization. For example, pandas, whose diet consists almost exclusively of bamboo, have lost their umami receptor. Umami is essentially savory, the taste of amino acids. Pandas aren't meat eaters, and in keeping with the adage "use it or lose it," they've essentially lost the gene for the umami receptor. Similarly, cats, having evolved from a highly predatory, meat-eating lineage, have lost their sweet taste receptor; despite their penchant for milk, they don't have one. These natural experiments are a very nice context in which to connect genotype and phenotype. Now, birds, which have been our interest for a long time, also evolved from a highly carnivorous lineage, the theropod dinosaurs. It's not a stretch to claim T. rex as a distant relative of modern birds, and of course Archaeopteryx and other extinct lineages all evolved from a highly carnivorous ancestor. That raises the question: how do birds such as hummingbirds and other nectarivorous species sense sweet compounds and carbohydrates? We know from their genomes that they have lost the sweet taste receptor; they don't have it. So how are they able to sense carbohydrates the way we mammals do with our sweet taste receptor? Maude embarked on these questions. It was one of those dissertation proposals that I was highly skeptical of, and I did everything I could to dissuade her from going in this direction. But she persisted.
Through a lot of trial and error, as many dissertations go, she was able to find a lab in Japan that was one of the few on the planet doing functional testing of taste receptors. It happened to be a sort of satellite lab for the Kikkoman soy sauce company, and they knew everything about identifying the compounds that bind taste receptors. She expressed the umami receptor, her first-choice candidate gene for the ability to taste sweet, in mammalian cells, using versions from hummingbirds, from their sister group, the swifts, which we know cannot taste sweet, and from an outgroup, the chicken, which also cannot distinguish plain water from sweet water. She was able to show that through a series of amino acid substitutions, indicated by those red stars, hummingbirds have essentially converted their umami receptor into a sweet taste receptor. It was, I think, a dramatic example, in a sense rejecting Allan Wilson's hypothesis, showing that changes in the sequence of a protein could dramatically change the functionality of an organism, and it was likely a major driver of the radiation of hummingbirds, which now number over 300 species. Following up on that work, she was interested in looking at additional instances of the evolution of nectarivory and sweet taste perception in birds. So she went to the songbirds, which make up about half of all birds and are a really impressive radiation of species. On this slide, you simply see all of the documented evidence for sweet taste perception, through either nectarivory or frugivory, all gleaned from the literature: quite a large number of species in many lineages, from sunbirds to honeyeaters in Australia, with many repeated evolutions of the ability to taste sweet.
She wanted to ask whether they were using the same mechanism as the hummingbirds. Again in collaboration with folks in Japan, she was able to create these heterologous dimers. On the top you see the hummingbird umami receptor in orange and the honeyeater umami receptor in red; again, hummingbirds are in the New World, and honeyeaters are in Australia and Southeast Asia. She was able to show that the native dimers, which you see here, oops, sorry, both orange or both red, give a robust response when interrogated with a carbohydrate substrate, whereas the heterologous dimer, consisting of one orange and one red receptor subunit, could not. This suggested that there might be some co-evolution between the two parts of the heterodimer. Then, screening a more phylogenetically close subset, what we've got here is the honeyeater dimer on the top, in gray and red. On this axis you've got a series of different songbirds and other members of the perching birds, which is the larger clade. The songbirds are these four down here; the perching birds are the larger clade that also includes the ones at the top. What she showed was that with these heterologous dimers, pairing a honeyeater subunit with another species' subunit, there's not much of a response until you get to the base of the songbirds. This brown treecreeper, an Australian species, has a modest response with one of the two heterodimers, and all of the other songbirds show quite a robust response. This suggested that this second origin of the ability to taste sweet and detect carbohydrates occurred very deep in the songbird lineage and was not, in fact, present in those top two species, which belong to the perching birds but to a subclade known as the suboscines, found mostly in South America.
This shows an evolution independent from the hummingbirds, and a really interesting example of functional co-evolution between the two parts of the taste receptor. Now, for a lot of our recent work, we've wanted to expand into the non-coding part of the genome. We've just seen two examples where protein sequence evolution drives a new adaptation, but we know that genomes are mostly made up of non-coding DNA. So we looked to some classical theory in molecular evolution, specifically the neutral theory of molecular evolution developed by Motoo Kimura and his students, which suggests that functionally less important genes or molecules will tend to evolve faster, that is, have a higher substitution rate. Extending that to the genome scale, we can imagine that functionally less important parts of genomes will evolve faster. This is the paradigm within which we've begun developing a series of models that will hopefully be useful to the community for linking rates of genome evolution with changes in phenotypic traits. We've called this group of models PhyloAcc, which stands for phylogenetic accelerations. In these models, we posit a set of target lineages whose phenotypes we're interested in, and we've focused most of our attention on detecting and measuring the extent to which a given gene or non-coding region accelerates specifically on those target lineages. Here you see a binary trait, plus or minus, and we might have a series of target lineages; the goal is to find regions of the genome that accelerate specifically on those lineages. That's for a binary trait. We've also developed some models that extend the simple PhyloAcc model a bit, and I'll go into these in just a second.
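The neutral-theory intuition above, that less constrained regions accumulate more substitutions, can be made concrete with the simplest substitution model, Jukes-Cantor (JC69). This is a swapped-in textbook model, not necessarily the one PhyloAcc uses, and the branch length and the conserved/neutral/accelerated rate multipliers are hypothetical values chosen only for illustration.

```python
"""Expected sequence divergence under Jukes-Cantor (JC69) for different rate multipliers.

JC69 is used only as the simplest substitution model; the branch length and the
rate multipliers below are hypothetical values chosen for illustration.
"""
import math


def jc69_expected_p(distance: float) -> float:
    """Expected fraction of differing sites after `distance` substitutions per site."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 0.75 * (1.0 - math.exp(-4.0 * distance / 3.0))


def jc69_distance(p: float) -> float:
    """Invert JC69: substitutions per site implied by an observed fraction p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


NEUTRAL_BRANCH_LENGTH = 0.3  # hypothetical neutral branch length (subs/site)
RATE_MULTIPLIERS = {"conserved": 0.2, "neutral": 1.0, "accelerated": 2.0}  # hypothetical

expected = {
    state: jc69_expected_p(multiplier * NEUTRAL_BRANCH_LENGTH)
    for state, multiplier in RATE_MULTIPLIERS.items()
}
for state, p in expected.items():
    print(f"{state:<12} expected fraction of differing sites: {p:.3f}")
```

Note the saturation built into the model: doubling the rate on a branch less than doubles the expected number of observed differences, which is one reason rate-based methods estimate substitutions with a model rather than counting raw mismatches.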
In one of them, we relax the assumption that the gene tree, the tree of any given gene, is exactly the same as the species tree, since we know there can be a lot of stochastic variation as genes travel through a phylogeny. And finally, we've developed a method called PhyloAcc-C for continuous traits. Here we're trying to associate rates of evolution with variation in a continuous trait, like longevity or body size. So I'll walk you through some of the basics of the PhyloAcc model, then we'll talk about the other models, and then we'll apply them to some comparative genomics data sets. The basic model is one in which we identify a set of target lineages based on some criterion. We're not actually modeling the phenotype; we're simply classifying lineages as target or non-target. In our case, these turned out to be birds that can fly and birds that cannot. We then imagine that each branch on the tree has a so-called conservation state. What that means is that for any gene or region of the genome, on some branches it will evolve very slowly, that is, be very conserved, while on other branches it will have accelerated. So on a given branch, a gene can be in one of three states: the background state, which is essentially neutral; a conserved state; or an accelerated state. Once we've assigned those states, we can go on to estimate the actual rates conditional on each state. And finally, we can apply two Bayes factor tests. The goal is to distinguish acceleration across the entire clade, including both target and non-target species, from acceleration only in the target lineages. That turns out to be a difficult problem. It's hard to identify genes that accelerate only in the target lineages, especially when the accelerations are convergent, that is, when they've occurred in multiple unrelated lineages.
Telling the difference between that and acceleration across the entire clade is very challenging. That's what the two Bayes factors do: they contrast models in which the element accelerates nowhere on the tree, only on the specified target lineages, or anywhere on the tree. Those three scenarios are models M0, M1, and M2, respectively. So that's the basic PhyloAcc model. However, that model assumes that every gene in the genome, or every gene you want to analyze, has the same tree topology as the species tree, and now that we've been able to look at many genes across phylogenies, we know that isn't true. Much of the reason genes depart from the species tree is basic population genetics: if two speciation events occur in rapid succession, with only a very short time between them, it's entirely expected that some genes will not have reached fixation before the next speciation event. The challenge arises when you have a gene tree such as this one, where the bold lines, sorry, it's this one, differ from the species tree. Here species 1 and 2 are most closely related, whereas at the gene level, genes 2 and 3 have a more recent common ancestor. In this other tree, the gene tree and species tree have the same topology, albeit different branch lengths. We know that discordance like this can produce a false signal of acceleration in certain lineages, so it's important to try to distinguish these scenarios. We call this other model PhyloAcc-GT, for gene tree, and it's now published in Molecular Biology and Evolution. You can see that under three scenarios, a single acceleration, two accelerations, or three independent accelerations, PhyloAcc-GT does a somewhat better job than when we assume that all genes follow the species tree exactly.
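Why short intervals between speciation events produce discordant gene trees can be quantified with the standard three-taxon result from the multispecies coalescent. This is a textbook formula, not PhyloAcc-GT's machinery, and the effective population size and generation counts below are hypothetical.

```python
"""Three-taxon gene tree probabilities under the multispecies coalescent.

For species tree ((1,2),3) with an internal branch of T coalescent units, the
gene tree matches the species tree with probability 1 - (2/3)exp(-T); each of the
two discordant topologies has probability (1/3)exp(-T).
"""
import math
from typing import Dict


def to_coalescent_units(generations: float, effective_size: float) -> float:
    """Convert an internal branch from generations to coalescent units (diploid: t / 2N)."""
    if effective_size <= 0:
        raise ValueError("effective population size must be positive")
    return generations / (2.0 * effective_size)


def three_taxon_gene_tree_probs(internal_branch: float) -> Dict[str, float]:
    """Probabilities of the three rooted gene tree topologies for taxa 1, 2, 3."""
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    discordant_each = math.exp(-internal_branch) / 3.0
    return {
        "((1,2),3)": 1.0 - 2.0 * discordant_each,  # matches the species tree
        "((2,3),1)": discordant_each,  # e.g. genes 2 and 3 coalesce first
        "((1,3),2)": discordant_each,
    }


for generations in (10_000, 50_000, 200_000):
    T = to_coalescent_units(generations, effective_size=100_000)  # hypothetical N_e
    p_match = three_taxon_gene_tree_probs(T)["((1,2),3)"]
    print(f"{generations:>7} generations (T = {T:.2f}): P(match) = {p_match:.3f}")
```

As the internal branch shrinks toward zero, all three topologies approach probability 1/3, which is exactly the regime in which assuming the species tree for every locus can misplace substitutions and mimic lineage-specific acceleration.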
This is simply a measure of performance: how often the method got the right answer in simulations. The downside is that PhyloAcc-GT is computationally quite a bit more expensive, but it's nice to have both options. There may be situations where we have no good reason to think gene trees vary much across the phylogeny, in which case we could legitimately use the standard PhyloAcc model. The last model I'll mention was developed by postdoc Patrick Gemmell, who is interested in connecting regions of the genome and their rates of evolution with variation in a continuous trait, such as longevity. Here's a data set from Nathan Price and others on mammalian longevity. This is essentially the regression of longevity on body size, and you can see that some lineages, such as humans and bats, live longer than expected given their body size. The trick is to find regions of the genome associated with this continuously varying trait. So here's the same setup: we have a tree, a trait, and an alignment. These red lineages are the lineages on which a particular non-coding region on human chromosome 5 shows accelerations. The question is whether these accelerations are associated with, in this case, increases in lifespan. For the shrews up here, it actually looks like the opposite: you can see the long branch of the shrew and its lower-than-average lifespan. By contrast, and I don't know if you can read the font, we've got a killer whale down here, which is also accelerated but is associated with a longer lifespan. So we're trying to parse out all these signals. On this panel, down here we have the particular branches of the tree that we're looking at for this element.
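"Longer than expected given body size" is a residual from a regression of log lifespan on log body mass. The sketch below uses plain least squares on synthetic numbers with hypothetical species labels; a real comparative analysis would use a phylogenetic regression (for example PGLS) rather than OLS, and this is not how PhyloAcc-C itself models the trait.

```python
"""Residual longevity from an ordinary least-squares fit of log lifespan on log body mass.

The species labels and numbers are synthetic; a real comparative analysis would
use a phylogenetic regression (for example PGLS) rather than plain OLS.
"""
import math
from typing import Dict, List, Tuple


def ols(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def longevity_residuals(mass_g: Dict[str, float], lifespan_yr: Dict[str, float]) -> Dict[str, float]:
    """Positive residual = longer-lived than expected for body size (log10 scale)."""
    names = sorted(mass_g)
    xs = [math.log10(mass_g[name]) for name in names]
    ys = [math.log10(lifespan_yr[name]) for name in names]
    slope, intercept = ols(xs, ys)
    return {name: y - (intercept + slope * x) for name, x, y in zip(names, xs, ys)}


# Synthetic data: lifespan scales as mass**0.2, except species_3 lives 10**0.3 (~2x) longer.
mass = {f"species_{k + 1}": 10.0 ** k for k in range(5)}
lifespan = {f"species_{k + 1}": 10.0 ** (1 + 0.2 * k) for k in range(5)}
lifespan["species_3"] *= 10.0 ** 0.3

residuals = longevity_residuals(mass, lifespan)
for name, r in sorted(residuals.items(), key=lambda item: -item[1]):
    print(f"{name}: {r:+.3f}")
```

Working on a log-log scale turns the allometric power law into a straight line, so the residual reads directly as "how many log units longer-lived than the body-size expectation."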
The y-axis is the trait change, and the color of each point, red or blue, indicates how accelerated the element is on that branch. We're seeing some cases, such as this killer whale, that experience a lot of acceleration and have quite a long lifespan given their body size. That's the sort of signal we want to pull out. This work is currently in review, and we're excited about its applicability to a wide range of traits. Patrick has also made the relationship between the trait and the genome segment a bit more flexible. In particular, we now have ways to allow the rate of evolution of a gene to be positively associated with the continuous trait, negatively associated, or nonlinearly related to the evolution of the trait. This one is a sort of null hypothesis, in which the phenotype is undergoing essentially neutral evolution. These gray triangles indicate the variance accumulating in the phenotypic state along each lineage, while the underlying genes vary in rate. Our model uses Brownian motion, which means that variance in the trait accumulates linearly with time. The null hypothesis here is that the trait changes at a constant rate while the underlying genetic elements change at variable rates; this is an example of no association between the rates of evolution of the gene and the constant rate of phenotypic change. However, we can imagine a situation where we see both faster evolution of a given gene or non-coding element and faster change at the phenotypic level. The model estimates three parameters, A, H, and B, which describe the form of the relationship between the rates of the genomic element and the phenotypic rates.
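The statement that Brownian motion accumulates trait variance linearly with time can be checked by simulation. This is a minimal sketch of the constant-rate null only: the rate `SIGMA2`, the time grid, and the replicate count are hypothetical settings, and it does not reproduce PhyloAcc-C or the A, H, B rate relationship.

```python
"""Simulate a constant-rate Brownian motion trait and check that its variance grows linearly.

sigma2 (the Brownian rate), the time grid, and the replicate count are
hypothetical settings; this illustrates the null model only, not PhyloAcc-C.
"""
import math
import random
import statistics
from typing import Dict, Iterable


def variance_through_time(
    sigma2: float,
    total_time: float,
    n_steps: int,
    n_reps: int,
    checkpoints: Iterable[int],
    seed: int = 1,
) -> Dict[float, float]:
    """Return {time: variance across replicates} at the given step indices."""
    rng = random.Random(seed)
    dt = total_time / n_steps
    step_sd = math.sqrt(sigma2 * dt)  # each increment ~ Normal(0, sigma2 * dt)
    samples = {step: [] for step in checkpoints}
    for _ in range(n_reps):
        x = 0.0
        for step in range(1, n_steps + 1):
            x += rng.gauss(0.0, step_sd)
            if step in samples:
                samples[step].append(x)
    # The true mean is 0 (no drift), so compute the variance about 0.
    return {step * dt: statistics.pvariance(values, mu=0.0) for step, values in samples.items()}


SIGMA2 = 2.0
result = variance_through_time(
    SIGMA2, total_time=1.0, n_steps=100, n_reps=4000, checkpoints=[25, 50, 100]
)
for t, var in result.items():
    print(f"t = {t:.2f}: simulated variance {var:.3f}, expected sigma2 * t = {SIGMA2 * t:.3f}")
```

Under this null, a lineage with a faster-evolving element should show no larger trait change than expected from sigma2 times its branch length; an association model replaces the single sigma2 with a branch-specific rate tied to the element's rate.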
This is still very much in development, but we'd love for folks to try it out and see whether it yields insight into their phenotypes. So let's look at some examples. This one is by another postdoc in the lab, Subir Shakya. Like me, he's an ornithologist, so he's looked at birds for the phenotypic data. There are amazing data sets now for a variety of phenotypic traits for basically all 10,000 bird species. The challenge we're finding is actually intersecting the genomic and phenotypic data, because we need both to run these models. In this case, we looked at tarsus length. The tarsus is very easy to measure, both in the field and on a museum specimen, and we basically have tarsus length for all 10,000 species of birds, so it seemed like a fun trait to look at. Subir ran a few models to understand where the big shifts in tarsus length had taken place on the avian tree. We were able to put together about 5,000 species and ask, just at the phenotypic level, where we see shifts in tarsus length on the tree. Here are four clades, the penguins, kingfishers, swallows, and a lesser-known group from Asia called the bulbuls, where we see rapid shifts to a short tarsus. The tarsus is one of the terminal hind limb bones. So here's a quiz for all of you: what bird group has a very short tarsus and should be on this slide, but isn't? Can anyone think of a bird that basically has no legs? You've got them here in the summertime, at least one species. Hummingbirds, again, hummingbirds. Hummingbirds are highly aerial; they can perch, but they don't do a lot of perching. The same goes for swallows: highly aerial, with very short legs. The reason we missed hummingbirds is that we had to divide the data set into smaller chunks to test these patterns. We used a Brownian motion model developed by Luke Harmon.
As a result, we picked out these four clades. There are also some very interesting examples of the evolution of a long tarsus, but in our analysis the number of transitions to a long tarsus was smaller than the number of transitions to a short tarsus, so we thought focusing on the short tarsus would give us more power. We then turned to the genomic data, and specifically to a class of loci known as conserved non-exonic elements. These are widespread in the human genome and other genomes, they often act as enhancers, and there are many thousands of them across the genome. Here's an example of a conserved element that is not in an exon or coding region but is nonetheless highly conserved. Principles of molecular evolution tell us that this region is doing something important, even though it doesn't code for a protein, and these elements have proven very useful to look at across the genome. Subir assembled a fairly large collection of these non-coding regions from previous data sets plus some new data of our own, over 900,000 elements in all. They range in size from a few tens of base pairs to a few thousand base pairs. We've also been able to intersect them with various data sets, including epigenetic data from ATAC-seq, which I'll explain a bit more later. You can see that some of them fall in exons, since these are just conserved regions, but most are non-coding, either intronic or intergenic, so they are likely putative regulators. When we run PhyloAcc on a subset of these data, targeting the short-tarsus lineages, we come up with several subsets of interest. On the x-axis you see the score of each element on the test of no acceleration versus acceleration in the specific target lineages. That's the first Bayes factor test.
On the y-axis is the second test, which compares acceleration restricted to the target lineages with an unrestricted model in which elements can accelerate on any lineage. We're interested specifically in the subsets along this red axis. In this case, we found about 8,000 elements that were accelerated in all four short-tarsus lineages, about half of which fell in ATAC-seq peaks from chicken hind limb data. We also found 347 elements for which the target lineage model, restricting acceleration to those four lineages, was actually better than the unrestricted model. That means these elements might be specifically involved with the short tarsus in those four target lineages. So this is one way we can narrow a fairly daunting number of elements down to a set we can interrogate. We didn't do any functional work in this study, but we identified a suite of candidates that we can now examine experimentally. This slide shows some of the sharing of accelerations across the different lineages. Many of the accelerations are lineage-specific, for example an element accelerating only in kingfishers or only in swallows, but we do see a small number of accelerations shared across lineages. Intriguingly, we find that many of these accelerated elements occur near genes involved in known limb development pathways; limbs are fairly well understood in terms of their regulatory network. Here you see, for any given gene, the lineages, B, K, P, or S, in which an element in the vicinity of that gene is accelerated. I think the take-home message is that we're not necessarily seeing convergence at the level of an individual element next to an important developmental gene; rather, we're seeing the same networks being affected.
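The two-axis triage and the sharing tabulation described above can be sketched as follows. The element records, log marginal likelihoods, and cutoffs are all hypothetical; in practice the likelihoods would come from a PhyloAcc-style fit. Here BF1 compares the target-lineage model (M1) with the null (M0), and BF2 compares M1 with the unrestricted model (M2), both on the log scale.

```python
"""Triage elements by the two Bayes factors and tabulate which lineages share accelerations.

Element records, thresholds, and lineage labels below are hypothetical; in
practice the log marginal likelihoods would come from a PhyloAcc-style fit.
BF1 compares the target-lineage model (M1) with the null (M0); BF2 compares
M1 with the unrestricted model (M2).
"""
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Element:
    name: str
    log_ml_m0: float  # log marginal likelihood, no acceleration
    log_ml_m1: float  # log marginal likelihood, acceleration on target lineages only
    log_ml_m2: float  # log marginal likelihood, acceleration allowed anywhere
    accelerated_in: FrozenSet[str]  # target lineages inferred as accelerated

    @property
    def log_bf1(self) -> float:
        return self.log_ml_m1 - self.log_ml_m0

    @property
    def log_bf2(self) -> float:
        return self.log_ml_m1 - self.log_ml_m2


def candidates(elements: List[Element], min_log_bf1: float, min_log_bf2: float) -> List[Element]:
    """Keep elements that beat both the null (BF1) and the unrestricted model (BF2)."""
    return [e for e in elements if e.log_bf1 >= min_log_bf1 and e.log_bf2 >= min_log_bf2]


def sharing_patterns(elements: List[Element]) -> Counter:
    """Count elements by the exact set of target lineages in which they accelerate."""
    return Counter(e.accelerated_in for e in elements)


B, K, P, S = "bulbuls", "kingfishers", "penguins", "swallows"
elements = [
    Element("ce1", -120.0, -108.0, -112.0, frozenset({K})),
    Element("ce2", -200.0, -185.0, -191.0, frozenset({S})),
    Element("ce3", -150.0, -130.0, -133.0, frozenset({B, K, P, S})),
    Element("ce4", -90.0, -89.0, -94.0, frozenset({P})),  # weak evidence against the null
    Element("ce5", -170.0, -160.0, -158.0, frozenset({K, S})),  # unrestricted model fits better
]

kept = candidates(elements, min_log_bf1=5.0, min_log_bf2=1.0)  # hypothetical cutoffs
print([e.name for e in kept])
for lineages, count in sharing_patterns(kept).items():
    print(sorted(lineages), count)
```

Requiring both Bayes factors to clear a cutoff is what separates "accelerated specifically where the trait changed" from "accelerated somewhere, possibly everywhere."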
Maybe different genes within the same network have nearby regulators that are accelerating. I think this is an interesting model to think about, and we need to do more work to determine whether it's the same networks of genes being hit by acceleration or exactly the same genes. That's still an outstanding question. So this was exciting: it seemed we were identifying non-coding regions that could legitimately be said to have a role in limb development, as known from other studies. As I mentioned, one of the big challenges now is to link genomic databases and phenotypic databases, because we're not seeing much overlap between them. One of the things we've been doing in the museum is a lot of surface scanning of bird skeletal elements, because although we have external measurements like tarsus length for all birds, we don't have detailed information on the anatomy or skeletons of different species. So this has been a lot of fun. OK, I'm going to end with a final example, again looking at rates of evolution, this time for a binary trait, namely the loss of flight in birds. If I ever met a genetically intractable group of birds, this is it: you can't easily breed ostriches and emus in the lab. These birds, the so-called palaeognaths, are a really iconic group. They are found on many continents, mostly in the Southern Hemisphere. The ostrich from Africa, the emus and cassowaries from Australia, the rheas from South America, and the kiwis from New Zealand are all flightless. The one group of palaeognaths that can actually fly is represented by this species here. This is a tinamou, which belongs to a group of about 50 Neotropical species that can fly. They're not long-distance migrants or anything, and they can't soar, but they can definitely get off the ground to escape predators.
For a long time, folks thought that the tinamous were the outgroup to all of the flightless species. This was very convenient because it suggested a single loss of flight, with subsequent spreading across the southern continents by continental drift. It's a very convenient pattern, which turns out not to be the case, but which is actually a boon for comparative genomics. Just to remind you, loss of flight entails a lot of phenotypic changes. You often lose the keel, the big flange of bone on the sternum of flighted species, which is completely lost in these flightless birds. That's why they're sometimes called ratites: their sternum is like a raft. We see big changes in body size. And we see variable loss of forelimb elements. You can still see the wings on an ostrich, although they're much too reduced for it to fly. Some flightless birds, such as the moas, an extinct group from New Zealand, have completely lost their forelimb elements. No humerus, radius, or ulna has ever been found for a moa species. So we see very intriguing differential loss of forelimb elements. Now, as I mentioned, the tinamous were originally thought to fall at the base of this tree, which was very convenient in suggesting a single loss of flight in the flightless lineages. We now know, not so much from our work but from work done back in 2008 that relaxed the assumption forcing the tinamous to be the outgroup, that this isn't true. The tinamous in fact fall right in the middle of this flightless radiation. It's one of these results that, as an ornithologist, I was completely skeptical of, so I downloaded the data and re-analyzed it. But I've come around. And in fact our genome-wide data, based on whole genomes from short-read data but nonetheless a large number of markers, strongly place the tinamous right in the middle of that flightless radiation.
You can see here in blue the different putative losses of flight. We don't know exactly how many there were. The biogeography of this group, indicated by these continents here, is very complex and likely involves a lot of extinction, so it's going to be difficult to reconstruct. That's why the number of losses of flight is uncertain. But what we are certain about is that flight was likely lost more than once. Another scenario that would allow the tinamous to fly while all the other species are flightless is a single loss of flight at the base followed by a regain in the tinamous. But we don't think that's the case, on developmental grounds: regaining flight is much more difficult than losing it. The number of origins of flight across the animal world is very small, because it requires a huge number of changes. By contrast, losing flight is relatively common, even within birds. OK, so we'll look again at these non-coding regions. We found about 280,000 of them, and we applied PhyloAcc to these data. Just a couple of neat examples. We actually inherited a draft genome of an extinct moa put together by an ornithologist named Alan Baker at the Royal Ontario Museum. Sadly, he passed away, but we were able to use the genome that he sequenced in these studies. Here's an example of a non-coding element showing acceleration specifically in the moa and the rheas, after 100 million years of being conserved across all other birds. It's plausible that this element could have played a role in loss of flight after being conserved for so long. Here are some additional examples. Again, we're able to place each branch in a rate category, indicated by its color, and then estimate the rates of the element in those categories. Almost 1% of the elements that we investigated were accelerated in at least one flightless lineage.
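As a rough sense of scale, almost 1% of about 280,000 elements is just under 2,800 elements. The sketch below shows one hypothetical way to turn per-branch rate-category assignments into that count. It assumes each element comes with, for each branch, a posterior probability that the branch is in an "accelerated" rate category; the branch names, the tip-only `FLIGHTLESS` set, the probabilities, and the 0.9 cutoff are all illustrative assumptions, not PhyloAcc output.

```python
# Hypothetical set of flightless tip branches; a real tree also has internal branches.
FLIGHTLESS = {"ostrich", "emu", "cassowary", "rhea", "kiwi", "moa"}


def accelerated_in_flightless(posteriors, threshold=0.9):
    """posteriors: {element: {branch: P(branch is in the accelerated rate category)}}.
    Return elements with at least one flightless branch at or above the threshold."""
    hits = []
    for element, by_branch in posteriors.items():
        if any(p >= threshold for branch, p in by_branch.items() if branch in FLIGHTLESS):
            hits.append(element)
    return hits


def fraction_accelerated(posteriors, threshold=0.9):
    """Fraction of all elements accelerated in at least one flightless lineage."""
    if not posteriors:
        return 0.0
    return len(accelerated_in_flightless(posteriors, threshold)) / len(posteriors)


# Hypothetical posteriors for three elements.
posteriors = {
    "ce1": {"moa": 0.97, "rhea": 0.95, "chicken": 0.01},  # like the moa/rhea element in the talk
    "ce2": {"tinamou": 0.92, "chicken": 0.02},            # accelerated only on a volant branch
    "ce3": {"kiwi": 0.40, "emu": 0.10},                   # no confident acceleration
}
print(accelerated_in_flightless(posteriors))  # ['ce1']
```

Applied to all ~280,000 elements, `fraction_accelerated` would be the quantity quoted as "almost 1%".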
That gives us some sample size to work with. Now, we asked what genes are in the vicinity of these so-called ratite-accelerated elements. On the x-axis here is the number of ratite-accelerated elements for a given gene, and on the y-axis the total number of elements near that gene. The genes in red here are those with a disproportionately large number of putative regulators that are accelerating in flightless lineages. What was exciting was that we saw a lot of overlap with previous work. For example, genes like TBX5, some Hox genes, and DAC1 had all already been studied by developmental biologists in connection with limb development. The other take-home message is that when we look at the proteins that are accelerating specifically on flightless lineages, we find a modest number, maybe 200 or 250, but the functional coherence of that group of proteins is not high. We don't get any strong gene ontology enrichments. Whereas for the non-coding elements, when we look at GO terms for nearby genes, we see very strong signals for skeletal development, cell proliferation, things like that. So we feel that in this case the non-coding elements, the regulatory landscape, may be giving a stronger signal underpinning loss of flight than the proteins themselves. So this is perhaps a point in favor of Allan Wilson and Mary-Claire King's hypothesis. OK, just to end, we can show that some of the genes in the vicinity of these ratite-accelerated elements are indeed expressed in the developing limb buds. These are chicken embryos where we've looked at expression of particular genes with accelerated elements nearby. We can also use approaches like ATAC-seq. I think many of you here are probably familiar with ATAC-seq.
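The idea of a gene having a "disproportionately large number" of ratite-accelerated elements (RAREs) nearby can be made concrete with a simple one-sided test. This is a swapped-in illustration, not necessarily the test used in the study: it assumes a binomial null in which every element near a gene is a RARE independently with the genome-wide probability. Gene names and counts are hypothetical.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rare_excess(genes, total_rares, total_elements):
    """genes: {gene: (rares_nearby, elements_nearby)}.

    Null model (an assumption): each element near a gene is a RARE independently,
    with the genome-wide probability total_rares / total_elements.
    Returns {gene: one-sided p-value for seeing at least that many RAREs nearby}."""
    p = total_rares / total_elements
    return {g: binom_upper_tail(k, n, p) for g, (k, n) in genes.items()}


# Hypothetical counts; roughly 1% of ~280,000 elements accelerated genome-wide.
genes = {"gene_A": (5, 20), "gene_B": (0, 20)}
pvals = rare_excess(genes, total_rares=2_800, total_elements=280_000)
print(pvals["gene_A"] < 1e-4)  # True: 5 of 20 nearby elements is far above 1%
```

A real analysis would also have to correct for testing thousands of genes and for how elements are assigned to genes.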
ATAC-seq is a method whereby we can ask what parts of the genome are open, unwound from nucleosomes and available for transcription and binding by transcription factors. We performed ATAC-seq on both the forelimb and hindlimb of chicken, as well as on a variety of other parts of the skeleton. This shows the enrichment of the non-coding accelerated elements in open chromatin of these different structures related to loss of flight. It was very exciting to see that during development these accelerated regions are overrepresented in areas of open chromatin in the forelimb and hindlimb, as well as in the keel and other parts of the skeletal apparatus. So we can imagine intersecting the rate-acceleration data, the ATAC-seq data, and other epigenetic data, again to narrow down a very daunting set of candidates. We would not claim that these 42 non-coding elements underlie loss of flight, but they certainly provide a very tractable set that we can then look at functionally. And that's what we did in the early experiments. We very laboriously took individual putative enhancers and put them into a construct where we can ask whether they drive gene expression. In this case, we took the chicken version of an enhancer that's accelerated in flightless birds and showed that we can inject it, which is what the red fluorescence means, and that it drives gene expression, which is what the green fluorescence means. We took the tinamou version, the volant paleognath, and showed that it too can drive gene expression. By contrast, the rhea version, which has been accelerated, was injected but does not show evidence of driving gene expression. So this acceleration has facilitated, or been associated with, a change in function of this element, either a loss of function or a shift to a new function. We're also doing a lot of comparative transcriptomics and comparative ATAC-seq.
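The enrichment of accelerated elements in open chromatin boils down to comparing how often accelerated elements overlap ATAC-seq peaks versus how often background elements do. The sketch below assumes half-open `[start, end)` coordinates on a single chromosome and uses made-up intervals; genome-scale work would normally use a dedicated tool such as bedtools, and a proper statistical test rather than a bare ratio.

```python
from bisect import bisect_right


def merge(intervals):
    """Merge half-open [start, end) intervals into sorted, disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlaps(element, merged, starts):
    """True if element [start, end) overlaps any merged peak."""
    start, end = element
    i = bisect_right(starts, start) - 1  # last peak starting at or before the element
    # Merged peaks are sorted and disjoint, so if any peak overlaps the element,
    # then peak i or peak i + 1 does.
    for j in (i, i + 1):
        if 0 <= j < len(merged):
            peak_start, peak_end = merged[j]
            if peak_start < end and start < peak_end:
                return True
    return False


def fraction_in_peaks(elements, peaks):
    """Fraction of elements overlapping at least one peak."""
    if not elements:
        return 0.0
    merged = merge(peaks)
    starts = [s for s, _ in merged]
    return sum(overlaps(e, merged, starts) for e in elements) / len(elements)


def enrichment(accelerated, background, peaks):
    """Ratio of the accelerated elements' overlap rate to the background rate."""
    return fraction_in_peaks(accelerated, peaks) / fraction_in_peaks(background, peaks)


# Hypothetical coordinates on one chromosome.
peaks = [(100, 200), (150, 300), (1000, 1100)]
accelerated = [(120, 140), (290, 310), (500, 600), (1050, 1060)]
background = [(120, 140), (400, 450), (500, 600), (700, 800)]
print(enrichment(accelerated, background, peaks))  # 3.0
```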
The comparative transcriptomic and ATAC-seq data are something we're currently analyzing, but I want to end by mentioning some really neat experiments we're conducting with Emma Farley, a developmental biologist at UC San Diego. She has developed a high-throughput enhancer screening assay. Instead of laboriously taking one enhancer at a time and asking whether it can drive gene expression, we can inject hundreds at a time. We attach a barcode to each enhancer; that's what BC is here. We can then harvest the mRNA from the developing limb bud or other structure, sequence it, and find out which of the enhancers present in the original cocktail drove gene expression. This has allowed us to scale up dramatically: in our first experiment, we looked at about 1,000 different putative enhancers. The data look like this. Each dot here is a different enhancer. In blue is the amount of input DNA for each enhancer across these replicate structures, and in red is the amount of mRNA driven by each enhancer. You can see that there's differential activity of enhancers in these different structures, and that it's fairly consistent across replicates. This has allowed us to identify enhancers that appear to be associated only with forelimb development, such as these here, where we have a measure of enhancer activity from the RNA readout. These are enhancers active primarily in the forelimb, with a higher mRNA readout than in the hindlimb. Here are bifunctional enhancers, and here, in orange, are enhancers more active in the hindlimb. So through comparative genomics, we've been able to go from very unwieldy non-model species to putative tissue-specific enhancers, which we can then interrogate further. Hopefully these examples have shown the power of comparative genomics for unlocking secrets about how phenotypic diversity arises. These are just some of my conclusions.
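A barcoded screen like the one described yields, for each enhancer, input DNA counts and mRNA counts per tissue and replicate. The sketch below is a hypothetical way to turn those into forelimb, hindlimb, and bifunctional labels; it is not the Farley lab's pipeline. It assumes counts are already summed over each enhancer's barcodes, and the per-million normalization, pseudocount, activity cutoff, and specificity margin are all illustrative choices.

```python
from math import log2
from statistics import mean


def activity(rna_counts, dna_counts, pseudo=1.0):
    """log2 of library-size-normalized (per-million) RNA over DNA for each enhancer.
    Counts are assumed to be already summed over each enhancer's barcodes."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("empty library")
    out = {}
    for enh, dna in dna_counts.items():
        rna_cpm = rna_counts.get(enh, 0) / rna_total * 1e6
        dna_cpm = dna / dna_total * 1e6
        out[enh] = log2((rna_cpm + pseudo) / (dna_cpm + pseudo))
    return out


def mean_activity(replicates):
    """replicates: list of (rna_counts, dna_counts) pairs from one tissue."""
    per_rep = [activity(rna, dna) for rna, dna in replicates]
    return {enh: mean(rep[enh] for rep in per_rep) for enh in per_rep[0]}


def classify(fore, hind, active=0.5, delta=1.0):
    """Label each enhancer as forelimb-biased, hindlimb-biased, bifunctional, or unclear."""
    labels = {}
    for enh in fore:
        f, h = fore[enh], hind[enh]
        if f >= active and h >= active and abs(f - h) < delta:
            labels[enh] = "bifunctional"
        elif f >= active and f - h >= delta:
            labels[enh] = "forelimb"
        elif h >= active and h - f >= delta:
            labels[enh] = "hindlimb"
        else:
            labels[enh] = "unclear"
    return labels


def scaled(counts, factor):
    """Simulate a more deeply sequenced replicate (hypothetical)."""
    return {k: factor * v for k, v in counts.items()}


# Hypothetical counts for four enhancers.
dna = {"e1": 100, "e2": 100, "e3": 100, "e4": 100}
fore_rna = {"e1": 400, "e2": 25, "e3": 400, "e4": 175}
hind_rna = {"e1": 25, "e2": 400, "e3": 400, "e4": 175}
fore = mean_activity([(fore_rna, dna), (scaled(fore_rna, 2), scaled(dna, 2))])
hind = mean_activity([(hind_rna, dna), (scaled(hind_rna, 2), scaled(dna, 2))])
print(classify(fore, hind))
# {'e1': 'forelimb', 'e2': 'hindlimb', 'e3': 'bifunctional', 'e4': 'unclear'}
```

Normalizing to per-million counts before taking the RNA/DNA ratio keeps the two replicates comparable even though one was sequenced twice as deeply.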
And ATAC-seq was very useful in narrowing down what we should look at. I just want to thank my collaborators, and NHGRI for their funding. I'm happy to take any questions. Thank you. So thank you, Scott. This is from Larry Brody. He says, if I understand correctly, you've identified changes in a large number of unlinked elements that contribute to the fixation of a new trait. If all of these individual elements are acting in concert to produce the trait, how do we get to the end point via segregation? Step-wise, bunches at a time? Do we know the step size? And if known, how many generations would it take to move from the ancestral phenotype to the new one and fix it? Thanks for that question. That's a great question. We don't know whether all the putative enhancers are involved with the trait. There's going to be a certain amount of background noise and false positives. Again, I think the ATAC-seq data are extremely important, because as I showed you, loss of flight is a multivariate trait. Lots of things are changing. With just the genomic data, we don't know whether a given enhancer influences the keel, the hindlimb, the forelimb, or what. So we really need that epigenetic data to narrow down our search. As to the time it would take to go from an ancestral to a derived state, that's a tough one. I think there are probably some creative ways to look at how fast the elements are changing. But the bottom line is that it changed somewhere along a branch, and we often don't know where along that branch. But good question, yeah. Great talk, and so many very different, exciting phenotypic changes that you're looking at. I was really struck by the repeated loss of flight in the ratites. And you mentioned the skeletal changes that must also have occurred, I guess now, multiple times.
When you look at the skeletal changes, now knowing that they occurred multiple times, does it look like they occurred in different ways, or does it still look like the same skeletal change happening? Yeah, thank you, that's a great question. One thing we did was look at the rate of development of tinamou forelimbs and hindlimbs, because that might help us distinguish whether tinamous had regained flight or retained it from the ancestor. We found that tinamou forelimbs and hindlimbs develop at basically the same rate as those of chickens. So that was one confirmation. But your question is very, very good. In many ways I would say that genomics and phylogenetics are outpacing morphology in this way. We have lots of really novel relationships in birds and other groups where we're wondering, OK, great, parrots are the sister group to the songbirds, but is there morphological evidence to support that, looking retroactively? In the case of the loss of flight, as I mentioned, lineages have lost the forelimb elements to different degrees, and that's some indication that perhaps there's been imperfect convergence, if you will. But I think there's a lot more to do at the developmental level to show whether they're slowing down cell division at different times, in different places, or by different mechanisms. So I think it's still very open, yeah. Thanks. That was a lovely talk. I found it very interesting that you see those regulatory elements over and over again near the same transcription factors that regulate those processes. But looking at the underlying sequences within those elements, are you also seeing conservation of the binding sites they contain, which could more broadly drive those changes? Thank you for that question, it's a great question. And I simply haven't been able to cajole my postdocs into looking at that level yet.
It's a very good question. We do have some examples where sequence acceleration is associated with loss of binding sites, particularly for one transcription factor, ETS, which is important in lots of cellular processes. And we have a couple of examples where very few sequence changes seem to be enough to abolish a few binding sites. We're going to experiment now to see whether that might actually be driving some of the changes that we're seeing. No, it's a great question, and you'd think we would have done this years ago, but it's on the to-do list for sure. Awesome. So my question is related to the long bones in the legs. Have you looked at the opposite, where a different set of bones changes, like wingspan? You could have long leg, long wing; short wing, long leg; short leg, long wing; et cetera. Do you see the same modules kicking in, just regulated differentially in the arm versus the leg, or would you see completely different sets of enhancers being changed? Yeah, thank you for that question. It's a really good one. I think one of the thrills of collaborating is that you work with people who know a lot more about development than I do. I briefly put up that slide of comparative transcriptomics. One thing we're seeing there is that the forelimb and hindlimb transcriptomes of flightless birds tend to be more similar to each other than to, say, the forelimb and hindlimb transcriptomes of chicken. It's a really interesting conundrum about homology and all this stuff. So I think we are seeing convergence at the level of the transcriptome; I think that's fair to say. Whether or not that's underpinned by convergent accelerations in the non-coding elements is still something we need to look at. But the signal from the expressed genes is sort of as if the forelimb and hindlimb are co-evolving. They're sort of clustering together.
They tend to cluster by species, by lineage, rather than, say, all the hindlimb transcriptomes clustering together and all the forelimb transcriptomes clustering together. Maybe that's what's been seen in other serial structures, but it wasn't what I was expecting. So yeah. We'll squeeze one more in from Elaine Ostrander, who thinks a lot about dog phenotypes, so she might have this in mind. She asks: the examples you presented all assume that the trait has some commonality in mechanism. How often do you expect that to be true? For other phenotypes, there might be multiple possible mechanisms that would permit a particular phenotype. Yeah, thanks, Elaine. That's a great question. Absolutely. I think this whole comparative framework at some level depends on there being common mechanisms. But we can now quantify the extent to which changes in the non-coding genome are lineage-specific versus common. That's sort of an empirical question. And as we saw in the tarsus example, there's quite a lot of lineage-specific stuff. I think the real challenge is figuring out how much of that is related to the phenotype versus other kinds of noise. And, as I was saying, there might also be commonality at the level of the network rather than the individual non-coding region. I think that's a big task for the future: how do we collect and analyze that sort of data? And last, from Shorjo Sen, who wants to know about the bike that you rode cross-country. Is there anything special about it, like gearing or tires, or is it an old trusty bike? Oh, man. I think I'm more known for that crazy bike trip than for any science I've done. I would highly recommend it. I came back basically saying to my graduate students, look, it only took two and a half months. Just do it. I'd love to chat with folks about it. It was a life-changing experience, and a good way to connect with communities that an urbanite like me might not connect with very often.
And it gave me a lot of hope, honestly. I think if we can get past our political divisions, we'll find that we have a lot more in common than what divides us.