I'm an investigator and an assistant professor at Cornell University, interested in genomic technology development and its application to understanding evolution. You got a speaker talking. Thank you to the gentleman up there. And I'm very excited to be here to moderate this session with my co-chair, who will introduce himself.
Yeah, so I am Claudio Mello. I am at the Oregon Health & Science University. I study, actually, many parallels with Erich Jarvis. I also study birds, including finches, parrots, hummingbirds, a lot of comparative work, interested in basic mechanisms of vocal learning, vocal communication.
So it's a pleasure to introduce our first speaker, Emma Farley, from the University of California, San Diego. So I first became familiar with Emma's work when she was a postdoc with Mike Levine and published a beautiful paper pointing out that a lot of transcription factor binding sites in actual genomes were sub-functionalized and were really hard to define using sort of off-the-shelf position weight matrices. And she's continued her interest in the genotype-to-phenotype relationship. She uses a wide variety of models. So Ciona, and she was telling me she's added zebrafish and mouse to the fold now as well. So I won't take up any more of her time. So let's thank her for being here today.
Thank you very much for the introduction and for inviting me to give my perspective on genomics and comparative evolution. Can everybody hear me? OK, excellent. So I'm really interested in how the precise patterns of gene expression required for successful development are encoded in our genomes. So how do we have this beautiful neural expression pattern seen here? And we've heard that it's enhancers that control when and where genes are expressed. And we've also heard earlier today about using epigenomic approaches to annotate regions of the genome to find enhancers.
But that doesn't tell us whether or not the sequence is actually functional, let alone what sequence within that region actually drives expression. And this is really important because small changes in enhancer sequence can have really dramatic effects. And so here you can see the developing limb bud. And so Sonic hedgehog is expressed in this purple domain. And this ensures that we get five digits on our hands and our feet. And a single base pair change, an A to C out of 850 base pairs, leads to an expansion of this pattern and polydactyly. And sequence changes within enhancers can also be really beneficial. And it would help us to understand them all, all of the ones hidden in our genomes. So for example, here is a red blood cell and there's an enhancer that turns on a membrane protein in our red blood cells. And this membrane protein is how the malarial pathogen gets into our system. And so we are susceptible to malaria. And a base pair change within this enhancer makes the enhancer no longer functional. You don't express the membrane protein. And people with this variant are resistant to malaria. And so we know that 90% of variation associated with disease and phenotypic variation is in the non-coding genome, mainly within enhancers. But we have no way of knowing which variants are simply inert variation between individuals and which ones actually have a functional consequence. And we have all these genomes sequenced, which is really exciting. But I feel like there's a frustration because we can't get all of the information out of them to help us understand evolutionary adaptation and disease. And so I wanted to take a different approach, a synthetic biology approach, to understand how sequence encodes function and see if we could get to some rules that could help us understand this in a general way. And so here you can see we've zoomed in on an enhancer and we know enhancers contain binding sites.
So in blue, there's a binding site and then a different type of site in orange. And in this enhancer, the ETS transcription factor binds to the blue sites. And GATA, which is expressed in the ectoderm, the skin and the nervous system, binds to the orange sites. And this leads to the turning on of gene expression in the overlapping pattern shown in green. And this turns on the gene Otx, which is important for neural development. Here's a later embryo just because it's easier to see the head and the tail. But clusters of ETS and GATA binding sites are found throughout the genome. Some of them don't turn on at all. Some of them activate expression in the heart. Some of them during blood development. Some of them in the gut. So why is it that this sequence is turning on specifically in the nervous system? What changes are gonna lead to a functional consequence, a change in gene expression and a phenotype? And so to look at this, what I wanted to do was keep the core of the binding sites constant, so GGAA for ETS and GATA for GATA, and randomize the rest of the sequence. Now we hit a problem here. This is a very small enhancer. It's 69 base pairs. If we were to make every possible sequence variant, it would be 10 to the 30 variants. And so when I saw this number, it made me think we need to test millions, if not billions, of variants for function. And once you find the functional features of the enhancer, perhaps there are only five binding sites and that's all you require. How many ways can you arrange these binding sites and get the same expression? So if we change the spacing, the order, the orientation. And again, this is a combinatorial problem and it quickly goes exponential and we're talking about billions of variants. So obviously what I wanted to do was make as many variants as possible and test them for function. And because enhancers drive tissue-specific gene expression, it's not enough to test these in a single cell line.
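To make the scale of that combinatorial problem concrete, here is a rough back-of-the-envelope sketch. The 69 base pair length and the four-letter cores (GGAA, GATA) come from the talk; the assumption that five cores are held fixed is borrowed from the "perhaps there are only five binding sites" remark and is only illustrative, as are the function and constant names.

```python
# Rough count of enhancer sequence variants.
# ASSUMPTION: five 4-bp cores are held constant; the talk does not state
# this number at this point, so treat it as illustrative only.
ENHANCER_LENGTH = 69   # base pairs, from the talk
CORE_LENGTH = 4        # GGAA or GATA
NUM_FIXED_CORES = 5    # hypothetical


def count_variants(length: int, fixed_positions: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences when `fixed_positions` bases are held constant."""
    free_positions = length - fixed_positions
    if free_positions < 0:
        raise ValueError("more fixed positions than bases in the sequence")
    return alphabet_size ** free_positions


fully_random = count_variants(ENHANCER_LENGTH, 0)
cores_fixed = count_variants(ENHANCER_LENGTH, CORE_LENGTH * NUM_FIXED_CORES)

print(f"all 69 positions randomized: {fully_random:.2e}")
print(f"cores held constant:         {cores_fixed:.2e}")
```

Under these assumptions the count with the cores fixed lands a little below 10 to the 30, which is at least consistent with the order of magnitude quoted in the talk; either way, exhaustive testing is out of reach and only a sample of the space can be assayed.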
We need to test them in every single cell of a developing embryo. And so for this, we need a special model organism. And I think that Ciona is ideally placed to do this. So although the adult doesn't look anything like us, here's an adult Ciona. It's a member of the urochordates, the sister group to the vertebrates. And it has a notochord, cells shown in red, a dorsal nerve cord, and even a heart, the cells shown in green. And the transcriptional program specifying these tissues and the transcription factor specificity is conserved between the urochordates and the vertebrates. But beyond this, they're an ideal model organism for this particular question because of the power of electroporation. So we can electroporate a reporter construct with the enhancer, a promoter, and GFP into hundreds of thousands of synchronously developing embryos, and then we can read out expression. So in an hour-long experiment, we can electroporate millions of embryos. This is less than 1% of an electroporation. And so my approach has been to make enhancer variants to look at sequence constraints and enhancer variants to look at grammatical constraints to see how changes in these affect gene expression in whole developing embryos. And so I'm going to talk you through the first experiment I did on this because it really led me to a focus on enhancer grammar. So Charles alluded to this. So in the first experiment, I was able to make 2.5 million different enhancer variants with randomized sequence outside the core. Each enhancer is attached to a promoter, GFP, and a barcode. So if enhancer A is active, we will see barcode A as mRNA. So we can electroporate these into fertilized eggs, wait till the normal time the enhancer is active, and then extract all of the mRNA and specifically sequence the barcodes. And because we know which enhancer goes with which barcode, we can identify the functional enhancers.
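The barcode readout just described can be sketched in a few lines. This is a minimal illustration, not the lab's actual pipeline: the function name, the `min_reads` cutoff and the toy barcodes are all hypothetical, and a real analysis would typically also normalize mRNA barcode counts against barcode counts from the input DNA library.

```python
from collections import Counter


def call_active_enhancers(mrna_barcode_reads, barcode_to_enhancer, min_reads=2):
    """Tally sequenced mRNA barcodes per enhancer and report those seen often enough.

    mrna_barcode_reads: iterable of barcode strings read from the mRNA.
    barcode_to_enhancer: dict mapping each barcode to its enhancer variant ID.
    min_reads: hypothetical cutoff for calling an enhancer active.
    """
    counts = Counter()
    for barcode in mrna_barcode_reads:
        enhancer = barcode_to_enhancer.get(barcode)
        if enhancer is not None:  # ignore barcodes not in the library
            counts[enhancer] += 1
    return {enh: n for enh, n in counts.items() if n >= min_reads}


# Toy example with made-up barcodes and enhancer IDs.
library = {"AAAC": "enhancer_A", "CCGT": "enhancer_B", "TTGA": "enhancer_C"}
reads = ["AAAC", "AAAC", "CCGT", "AAAC", "GGGG"]
print(call_active_enhancers(reads, library))
```

In this toy run only `enhancer_A` passes the cutoff; `enhancer_B` is seen once, `enhancer_C` never appears as mRNA, and the unknown barcode is discarded.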
And so in the first screen, we found 20,000 enhancers that were active at or above the levels of the endogenous enhancer. I should say now we are making tens and hundreds of millions of variants. But we wanted to know what it was that was important for the expression of those 20,000. And when we did the analysis, we came up with this really simple answer, which for me was kind of confusing. So anybody who maybe studies transcription factor binding might recognize these motifs. So I told you I kept the core of the binding site constant, this GGAA and GATA. And it seems that particular dinucleotides flanking the core are important for expression. But if you look at these, these are PWMs for ETS and GATA. And so the most represented base at each position gives you the sequence that binds the transcription factor with highest affinity. But if all you need is high affinity binding sites to mediate expression, we should be able to just put our favorite high affinity binding sites together and design tissue-specific enhancers. And that doesn't work, as I'm sure many of you know. And so I wanted to look at the biology. So we went back to the wild type enhancer and we scored the affinity of the binding sites within this enhancer. So highest affinity being one. And we see that we have two relatively high affinity binding sites, but then we have these low affinity binding sites. So we wanted to test the hypothesis that all you need is flanking sequence that makes the sites high affinity to mediate expression. So we took inert enhancer variants from our library that drove no expression. And we asked, could we make this into a functional enhancer simply by manipulating the motif, the flanking? And so if we make these high affinity binding sites by making just a few nucleotide changes, we do turn an inert piece of DNA into something that drives transcription, but we've lost tissue specificity.
Yes, we have neural expression, but we also have posterior brain, which we shouldn't have, notochord and endoderm. And so we wondered, maybe you need these low affinity, suboptimal affinity binding sites in order to mediate restricted expression. And so to test this, we took the inert enhancer and, simply by changing nucleotides to mirror the wild type affinity, we saw that this recapitulated the endogenous levels and location of expression. And we did this on many inert backgrounds which were chosen at random. And I should say that many people have seen this in the past, but as a genomics community, we seem to have got into a habit of matching things to PWMs and away from the original studies of the SV40 enhancer and even the lac operon. But having shown in this high-throughput way the sequence required for expression, we were able to look further and see how the organization of the binding sites impacts function. And so here you can see the wild type affinity binding sites and the spacing, so 10, 15, and 13. And we wanted to ask, is the spacing between the binding sites important for expression? And so if we insert three base pairs here, regardless of the three base pairs, this leads to a three-fold increase in the levels of expression, suggesting that spacing affects levels of expression. And if we delete three base pairs here so that we have 10 and 10, we see a dramatic reduction in the levels of expression. And so this is interesting because the wild type enhancer, along with having these suboptimal affinity, low affinity binding sites, also has a spacing which does not give the highest levels of transcription. So it's suboptimal for transcriptional output. So enhancers integrate information about the binding affinity and organization. What happens if we completely optimize the features of this enhancer? So we have the high affinity binding sites and we make this spacing 13. And we were surprised to see that we've completely lost tissue specificity.
We have this really strong expression in the notochord, the posterior brain and the endoderm. And so these are regions of FGF and ETS signaling and we've lost combinatorial control. And so when I saw this, well, first I was really scared because I would love one day to be able to look at a genome and find all the enhancers involved, say, in notochord development and try and pinpoint, like all of us, mutations that would lead to phenotypic variation. And this scared me because if you think about it, there's only one way to be the highest affinity binding site. There's only one way to be organized in the way that gives strongest expression. But there are many, many, many ways to be a low affinity binding site and you're combining this across many binding sites, so it makes the enhancer code very degenerate. And so I think that this idea of suboptimization has really made it hard to find the features within enhancers that drive expression. And it's for this reason that I became very interested in enhancer grammar because I wanted to see, if this is such a complex problem, is there a way for us to make it less complex? Are there any interdependencies that we could use to make the problem less complicated? And there was one thing I noticed that maybe some of you might have noticed. So the higher affinity binding sites had the worst spacing and the poor affinity binding sites had the spacing that gave higher levels of expression. So good spacing, poor affinity; poor spacing, good affinity. So I wondered if there was a balance or an interplay between affinity and organization, and if we could get that dependency, we would be constraining or reducing the degrees of freedom on this problem. And so I went about investigating this. I randomly selected one of the enhancers from the synthetic library that I showed you. And this one drove very strong notochord expression.
And I demonstrated that this notochord expression was driven by this ZIC site shown in red and a low affinity ETS and a high affinity ETS. And so having found how this sequence encoded notochord expression, I wanted to see how changes in the spacing affect expression. And again, we see that spacing affects levels of expression. But if there's an interplay between affinity and organization, here we have poor affinity and what seems to be a good spacing. And here we've now got poor spacing, poor affinity. If there's an interplay between affinity and organization, we should be able to recover this expression by improving the affinity of this site, such that now we have poor spacing but good affinity where here we had good spacing and poor affinity. And indeed, this does recover the levels of notochord expression. And so it seems that there is this interplay and I wanted to see if other aspects of binding site organization impact function. And so here is the wild type Otx-a enhancer on which we've added the ZIC and the extra ETS. So we have the endogenous neural and then we have this notochord expression. These two sites are pointing towards each other. Is that important? We don't know. We change the direction and we lose expression, suggesting that orientation of sites is important. And so I define enhancer grammar specifically as the interplay between affinity and the syntax, meaning order, orientation and spacing. And I wanted to know, could we use these rules to find notochord-specific enhancers in the genome without looking at anything other than sequence? So I haven't shown you all of the high-throughput studies we did, but we found that we needed two ETS and a ZIC. We found a range of affinities. We knew that 11 base pair spacing was really optimal and that directionality was important. And so we wanted to search the genome and use these parameters to find enhancers.
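As a very reduced illustration of what searching a sequence with spacing and orientation rules could look like, the sketch below scans for two copies of a core motif that point towards each other with a fixed gap. Everything here is an assumption made for the example: the use of the ETS core GGAA for both sites, the reading of "spacing" as the number of bases between the two cores, and the function names. The actual search described in the talk also involves a ZIC site and affinity ranges, which are not modelled here.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_core_positions(seq: str, core: str) -> list:
    """Start positions of every (possibly overlapping) occurrence of `core`."""
    positions = []
    start = seq.find(core)
    while start != -1:
        positions.append(start)
        start = seq.find(core, start + 1)
    return positions


def find_convergent_pairs(seq: str, core: str = "GGAA", spacing: int = 11) -> list:
    """Find a forward-strand core followed, `spacing` bases later, by a
    reverse-strand core, i.e. two sites pointing towards each other.

    ASSUMPTION: spacing counts the bases between the end of the first core
    and the start of the second.
    """
    seq = seq.upper()
    forward_hits = find_core_positions(seq, core)
    reverse_hits = set(find_core_positions(seq, reverse_complement(core)))
    pairs = []
    for start in forward_hits:
        partner = start + len(core) + spacing
        if partner in reverse_hits:
            pairs.append((start, partner))
    return pairs


# Toy sequence: GGAA, an 11-base spacer, then TTCC (GGAA on the other strand).
toy = "AC" + "GGAA" + "ACGTACGTACG" + "TTCC" + "GT"
print(find_convergent_pairs(toy))
```

A fuller version would score each candidate site for affinity rather than matching only the core, since the talk stresses that low affinity sites with good organization are exactly the ones a strict motif match would miss.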
And we were particularly interested in ones you couldn't find in the genome from previous approaches. So how far could organization compensate for affinity? If we had really good organization, how low affinity is a functional site? And I'm just gonna show you the most extreme example. So we found this region upstream of the Brachyury promoter. Brachyury is a key gene in notochord development. And this has the 11 base pair spacing. They're both pointing towards each other, but the affinity is 14% and 25% relative to the consensus. So really low affinity binding sites. I never in my wildest dreams thought that this would work and I was shocked to see that this drives really strong notochord expression. And indeed we've found 130 regions that conform to this grammar within the Ciona genome. And I also mentioned that ZIC and ETS are expressed in vertebrate notochords, including mouse and human. And we've used the grammar to search in there and have found hundreds that conform to this as well. So we're now testing these to see how good our rules are and how we can improve them. But beyond that, this demonstrates that we can use regulatory principles as an orthogonal method to comparative genomics to identify enhancers in the genome from sequence alone. And that potentially there's an entire class of enhancers that have poor affinity but good organization in our genomes that we haven't really studied. And so in my lab we're studying enhancer grammar and I'm wondering how many types of enhancer grammar are there and what classes exist. Are there ones for different types of biological processes? Is it transcription factor dependent? And I want to tell you about some unpublished work on this enhancer shown here. So here we have two U-sites and R and an X. These bind Suppressor of Hairless, HES, and we don't know what binds to this one. And this is a really surprising enhancer. So here is the sequence conservation of this enhancer.
It's about an 800 to 1,000 base pair piece of DNA. And you can see that the only things that are conserved are the binding sites. And the sequence between the binding sites is not conserved at all. And you may be thinking, great, but how many organisms are you looking across? And so this sequence conservation is across all of these organisms. So it goes across the bilaterians, from the rough periwinkle and the invasive snail in the gastropods to human and zebrafish in the vertebrates. And so we were really shocked to find this, and we wouldn't have found this by looking at sequence alignment; rather, we found it by looking at rules. And I should say as well, the binding site affinity for these is really highly conserved between the organisms. So it's around 0.9 and this is 0.8 to 0.9 and this one is slightly lower. And the spacings are really highly conserved. So this is always 16. This is a range here and this is between 30 and 50. So it's a really unusual and I think a novel type of conserved enhancer. It's not a whole block. And so we wanted to know, are these actually functional enhancers? So we took the Ciona enhancer and we tested it in Ciona. And you see the neural expression. So this is the motor ganglion and the nervous system and the posterior brain, the anterior brain and the muscle. And then we took the most extreme, which is the invasive apple snail. So this is a protostome, a gastropod. And we tested this enhancer in Ciona. And we were shocked to see that this also drives expression in the posterior brain, the anterior brain, the dorsal nerve cord and the muscle. Are these sites what is important, though? Maybe it's something else about the sequence. So when we delete these binding sites, we see a dramatic reduction in the expression and we see the same with the apple snail. And I haven't got time to show you. So I showed you Ciona and the invasive snail.
But actually we've tested the enhancers from all of these species in Ciona and they all drive expression in the nervous system, which is where we expect expression because it drives a neural gene. And we're currently now testing these enhancers in sea urchin and zebrafish and doing genome editing experiments in mouse and Ciona to really understand this in more detail. But I think this opens a question of why does this enhancer need such a rigid grammar while other enhancers are encoded in a much more flexible manner? And what I'm trying to get across to you in this presentation is really this idea that you can use regulatory principles, synthetic biology in model organisms to gain a lot of insight into gene regulation, hopefully. And so what I mean by regulatory principles was this idea of suboptimization within developmental enhancers, especially those regulated by pleiotropic factors. But this means that enhancers are incredibly degenerate, hard to understand. However, all is not lost because we were able to find grammatical constraints and this interplay between affinity and organization to predict tissue-specific enhancers from sequence alone, and this was really a very flexible grammar. And now what we're looking at is this very rigid grammar that we are shocked to see is conserved across bilateria, well, it's conserved in both protostomes and deuterostomes, and we're investigating the biological basis and need for such strong selective pressure. And so I think that these principles can help us identify variants associated with disease and phenotypes. So here's the limb enhancer and, much like the other enhancers I showed you, the binding affinities for the transcription factor ETS here are very low and these are the validated important functional sites. And I told you an A to C mutation led to polydactyly and an expansion here, but we didn't know why.
And I think if enhancers use low affinity binding sites, then mutations or variants that increase the affinity could be gain-of-function dominant mutations, and these could be underlying a lot of regulatory changes in phenotype. And indeed this change, this A to C, leads to a tripling of the affinity of this binding site, and we're currently studying this in mouse to see if it's the affinity change rather than the sequence change per se that's driving the phenotype. We should find out next month, so that's exciting. And so my vision for like the next 10 years, I guess I don't know how many of you know Strunk and White. I found it very helpful. If you don't know it, I would recommend it. It's a book about English grammar and how to write well. And I would love for us to have an atlas of grammatical rules that help us understand how the genome and how enhancers encode tissue-specific gene expression. And I think this could help us identify tissue-specific enhancers in the genome, like the notochord, but I also think mistakes or errors in the grammar that we find within the genome could help us get away from what we have now, which is this pool of variants associated with a phenotype or disease, and try and pinpoint the causal variant that's driving a phenotype. And so in order to get there, I think we need functional approaches. And when I say functional, I mean how sequence impacts expression and phenotype, using massively parallel reporter assays, genome editing assays. And I really think that we should take advantage of synthetic biology and the massive diversity it can give us to combine with comparative genomics, and that this requires improvement in oligo synthesis so that we can synthesize 1 kb regions and test potentially all the ATAC-seq regions predicted to be enhancers and see which ones are and which aren't. And then phenotyping at scale, obviously we have single cell technologies which help us look at RNA changes.
But I think we need spatial omics on a really grand scale, integrating sequencing, automated microscopy, proteomics, and then a wealth of computational tools to take advantage of this. And so with that, I would like to thank, I was a postdoc in Mike Levine's lab, so I'd like to thank him and all the people involved in the first part of the Otx project. And this is my lab and I'd like to thank NHGRI and the director's office for funding, and thank you very much for your attention.