All right, let's go ahead and get started. Welcome to the annual Steenbock Lecture in biochemistry. The Steenbock Lectureship is the most prestigious lectureship in the biochemistry department. It is funded by the Steenbock Endowment in honor of Harry Steenbock. For those of you who may not know, Harry Steenbock was a professor in biochemistry. He's best known for his work with vitamin D. He found that UV irradiation increased the amount of vitamin D, and it really helped cure rickets. He was offered the choice of patenting the invention, but instead he gave the patent back to the university, which led to the founding of WARF, the Wisconsin Alumni Research Foundation, nearly 90 years back. WARF was one of the first tech transfer institutions in the country. The endowed lectureship has been around for many years now, and we've had an amazing number of scientists, leaders in the field, who've given the Steenbock Lecture over the years, including several Nobel laureates. So pretty much every Steenbock lecturer has been one of the pioneers in the field. And today we have the distinct honor of having one such scientist, Joe Thornton, from the University of Chicago. Joe is an expert and a leader in the field of protein evolution. And Joe's had what some might think of as an unconventional entry into science. Joe has an undergraduate degree in literature from Yale, and that partly explains why his papers are really, really well-written. He worked for Greenpeace as an activist for many years, and that's when he got interested in how xenobiotic chemicals affect human health. He's also the author of a book called Pandora's Poison. I just quickly checked the rating before I came down here, and it actually is highly rated on Amazon. After that, he did his PhD at Columbia studying endocrine receptors, and he started out on the faculty at the University of Oregon at Eugene.
And in 2012, he moved to the Department of Human Genetics and the Department of Ecology and Evolution at the University of Chicago. Along the way, he's really led the field in protein evolution. His research really elegantly combines evolutionary biology and phylogenetics with structural biology and biochemistry. And he's again led the way in really figuring out what the molecular mechanism is for the evolution of protein specificity. I believe his presentation today is gonna be about protein-ligand specificity, and tomorrow's talk will be on protein-DNA specificity. I just wanna remind folks that there is a reception right after the presentation, and I ask you to stick around for the reception. And with that, I'll have Joe take the floor. Thank you.
Thank you. Thank you all for the invitation. I'm really honored by the invitation. My students and postdocs find it very amusing that you invited me to give this distinguished lecture in biochemistry because I don't think I'd even qualify as a bastard biochemist. But hopefully I'll be able to say some things that are interesting for you. I value the opportunity to give this series of two lectures. I'm gonna let it go a little bit loosely. The idea is today I'm gonna give an introduction to our approach using ancestral protein reconstruction and talk about the evolution of ligand specificity in these allosterically regulated transcription factors. And then tomorrow I'll talk about the evolution of DNA binding specificity in another domain of the same family of molecules. And hopefully this will give you a sense of some of the approaches we use in the field of evolutionary biochemistry. One question I have in planning for tomorrow is how much redundancy is necessary in the introduction. So will you raise your hand if you're going to be here tomorrow but are not here today? I don't know how to plan that.
The premise of our work and of evolutionary biochemistry in general is that a merging of the disciplines of molecular evolution and evolutionary biology on one hand, biochemistry and molecular biology on the other hand can lead to insights which illuminate fundamental questions in both fields. So if you are a biochemist, evolutionary analysis has some potential and I think largely untapped potential. There are two reasons for this. One is that evolution is the ultimate explanation for every biological system, every metabolic pathway, every protein, every cellular signaling system. We often talk about design principles in biochemistry. There are no design principles per se because nothing in biology was ever designed. Everything in biology is the result of an evolutionary process and so if there are principles they are an emergent property from the history that ensued. So if we want to uncover what those principles are we have to understand the evolutionary history of biological systems and the ways that those historical evolutionary processes may yield generalizations that are apparent in biological and biochemical phenomena. Second is that evolutionary analysis is an extraordinarily efficient means of uncovering sequence structure function relationships. So obviously this is arguably the core form of knowledge in protein biochemistry and it is a very hard nut to crack because sequence space is so unimaginably vast that even using the most advanced techniques of deep mutational scanning and library based analysis and so on we can only tap a tiny portion of that sequence space experimentally. However evolution has been a massively parallel experiment conducted over billions of years in the optimization and diversification of protein sequence structure function relationships and the data from that experiment persists in the present day sequences, structures and functions of proteins. 
And if we can analyze that database and reconstruct the evolutionary process, we can then gain new insight into that map that relates changes in sequence to changes in function. So here's a particularly important distinction that I'm gonna rely on over the next couple of days: the difference between horizontal comparative analysis among proteins, which people often think of as a form of evolutionary analysis, and I suppose it is in a crude way, and vertical analysis, which is an explicit reconstruction of evolutionary processes through history. So suppose we're interested in what confers the difference between members of a protein family with functions that are either purple or blue. If we're doing horizontal comparative analysis, we would try to identify the set of sequence differences that are necessary and sufficient to confer the functional difference, and we might test this by swapping residues back and forth between these proteins. The problem is this virtually never works. Very rarely will it actually switch the function. Often it will fail to identify the necessary residues, and most of the time it will simply result in broken proteins that don't do anything. And the reason for this is apparent when we compare horizontal to vertical analysis. Vertical analysis means tracing the history of the family on the phylogenetic tree that gave rise to them, and it has two major advantages. The first is that the difference between present-day proteins reflects all of the differences since their last common ancestor along both lineages, when in fact there is a discrete interval of phylogenetic time during which the functional differences evolved. And so vertical analysis becomes a much more efficient way of identifying the causal residues. If we can find the branch on which purple changed to blue or blue changed to purple, that will allow us to identify a much smaller set of candidate residues that can then be tested. The second issue is epistasis.
There are often changes along those branches which make the substitutions that were causal during evolutionary history intolerable in present-day backgrounds. So a state which evolved along this branch and did confer this difference between purple and blue may not be tolerated elsewhere: the ancestral purple state may not be acceptable in this background, or the derived blue state may not be tolerated in this background. And if that's true, then you will get broken proteins when you do the swaps. Now you'll notice that this approach means that we have to have some way of reconstructing this history, and I'm gonna talk to you about that in a minute. Before I do that I wanna take a brief detour and talk about why evolutionary biologists should care about biochemistry and molecular biology, the converse of what I was talking to you about before. And this is sort of where my original heart is, although now I'm equally passionate about the fundamental questions in how proteins work as well. Evolutionary biologists are interested in the nature of evolutionary processes, and there are these questions that we have been fighting about for literally a hundred years now. Questions like, is evolution driven primarily by large or small steps? What's the distribution of mutational effect sizes? A few mutations of large effect, or an infinitesimal series of small-effect mutations for gradual evolutionary change? Is the evolution of new functions driven primarily by selection optimizing those functions, or is there a significant role for chance and genetic drift in the emergence of new protein properties and new phenotypes? What is the role of epistasis in evolutionary processes? By epistasis I mean that the effect of a mutation at one site or locus depends on the state at other sites or loci.
Epistasis may make evolution contingent on low probability events or if epistasis is unimportant, evolution might proceed deterministically in the sense that selection can always drive a protein to its optimal form because no matter what your starting point is a mutation that improves the function can always be found by mutation and driven to fixation by selection. So that's a fundamental question about the relationship between genetic architecture and the nature of evolutionary processes. To what extent does the architecture of a biological system or in this case the physical architecture of a protein constrain or create opportunities for evolutionary processes? Do we have to be thinking about the way a biological system or a protein is built? The physical properties that allow it to carry out its function in order to understand its evolution? What are the mechanisms for new functions? And how do complex systems evolve? Now all of these questions have been debated at literally ad nauseam in evolutionary genetics and have remained unresolved into the last few years because they require explicit examination of the relationship between protein sequence and the structure and function of the proteins that they code for. And how that sequence structure function relationship affects the evolutionary process and how it changes through time. So the idea of the functional synthesis that merges these two fields is that we can get at that map and how it has guided evolutionary processes through a combination of the techniques of the two disciplines. So the approach begins with an explicit analysis of history. We do it in phylogenetic terms if one were examining more recent evolutionary diversifications you could use population genetics but this allows us to retrace the history of sequence evolution and propose candidate historical substitutions that are responsible for key changes in function and structure. 
We then use experimental biology, molecular biology and biochemistry to explicitly test hypotheses about the direction of evolution, changes in structure and function, and the effects of key mutations on those molecular phenotypes. And we can study the mechanism by which ancient genetic changes produced changes in function by using studies of protein structure and biophysics. So the last thing I wanna mention is this question about the evolution of complexity. This is an age-old question for evolutionary biologists, and many biologists of other stripes are interested in it as well. The classic explanation for where complex systems come from is that they're the result of optimization and elaboration under the influence of natural selection. And this has been demonstrated for many systems, such as the evolution of animal eyes from more primitive light-sensing organs. But this explanation might seem to break down when we start thinking about complex molecular systems in which the function of any one part depends on the presence of the other parts of the system. So this, for example, shows you the ligand-binding domain of a steroid hormone receptor. What's the selection pressure that drives the evolution of a new hormone if there's not already a receptor to transduce its signal? And conversely, how do we explain the evolution of a new receptor if there's not already a hormone for it to receive? So this is the puzzle that led to the sort of empty-minded critique called irreducible complexity. But it actually is a legitimate question that we can get at scientifically if we can reconstruct the evolutionary history of molecular systems. Darwin was actually well aware of this problem. He wrote in The Origin of Species, if it could be demonstrated that any complex organ existed which couldn't possibly have been formed by numerous successive slight modifications, my theory would absolutely break down.
And he continued, in order to discover the early transitional grades through which the organ has passed, we should have to look to very ancient ancestral forms long since become extinct. So this raises the puzzle, how can we trace out the evolution of a complex system if we don't have access to ancestral forms? The approach we use in my laboratory attempts to meet this challenge using modern technologies. It's a technique called ancestral protein reconstruction, and it allows us to experimentally analyze ancestral forms of molecular systems. So here's the idea. We use computational methods to infer ancestral sequences and mutational trajectories, and then we synthesize genes coding for those proteins and do experiments on them. We begin from an alignment of present-day protein sequences. We can use phylogenetic techniques to infer the phylogenetic tree that relates those to each other, the set of speciation and duplication events that yielded present-day lineages. If we are interested in some ancestral node, which we will call X on the tree, we wanna know its sequence. The algorithm proceeds on a site-by-site basis, and using a probabilistic model of evolution we can statistically calculate the probability that the present-day set of states would have evolved, that's the data, given some ancestral state X, given this phylogenetic tree, and given the model of evolution. The model of evolution simply describes the relative rates of going from any amino acid I to any other amino acid J, and these models typically also incorporate variable selection pressure among sites to reflect different rates of evolution and different constraints among sites. So we can calculate the probability of the data given the ancestral state, and we can do that for any possible ancestral state, and this lets us calculate the posterior probability, given these conditions, of any possible ancestral state.
So that lets us quantify our confidence in the ancestral residue at that site. The maximum likelihood inference of the ancestral state is the one with the highest posterior probability. And this we can think of as the inference of the ancestral state that maximizes the probability that the reality we live in today would have evolved. So it maximizes the probability of all of the extant observations. The maximum likelihood ancestral protein is the string of maximum likelihood ancestral states. And it is unusual for a protein to be reconstructed with no ambiguity whatsoever. Here, for example, you can see a site where there is a second state that has non-trivial posterior probability. So we will be interested not just in the best point estimate but in a cloud of plausible reconstructions. Given those inferences of possible ancestral sequences, we can synthesize coding DNAs that will produce them, and we can express them in whatever system is appropriate and characterize them the same way we would a present-day member of the family. If we're interested in the causal mutations that led to functional diversity, we can repeat this process moving up the tree and identify those branches on which the function changed, and the candidate mutations are the ones that occurred on that branch. And we can reintroduce the derived states into the ancestral background to explicitly test the hypothesis that, say, this L to G transition was sufficient for changing function from green to purple. We can study biophysical mechanisms by doing structural biology and other forms of biophysical and biochemical analysis on the ancestral proteins. And one other mode of analysis that we can add to this now is using library-based approaches such as deep mutational scanning to study not only the histories that did happen but alternative histories that could have happened in the ancestral background. And we'll talk about that approach a little bit today and to a greater extent tomorrow.
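The site-by-site calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the lab: it assumes a star-shaped tree (every present-day sequence hangs directly off the ancestral node), a uniform prior over ancestral states, and an equal-rates substitution model in which all amino acid replacements are equally likely. Real analyses use the full inferred phylogeny, an empirical amino acid substitution matrix, and among-site rate variation. The function names and the toy data are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def transition_prob(a, b, t, k=len(AMINO_ACIDS)):
    """P(state a -> state b) along a branch of length t.

    Assumption: equal-rates model over k states (a stand-in for an
    empirical amino acid substitution matrix).
    """
    decay = math.exp(-k / (k - 1) * t)
    if a == b:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k


def ancestral_posteriors(leaf_states, branch_lengths):
    """Posterior probability of each ancestral state at one alignment site.

    Assumptions: star tree, uniform prior over the 20 amino acids.
    """
    likelihoods = {}
    for anc in AMINO_ACIDS:
        # Probability of the observed present-day states given ancestor = anc.
        like = 1.0
        for state, t in zip(leaf_states, branch_lengths):
            like *= transition_prob(anc, state, t)
        likelihoods[anc] = like
    total = sum(likelihoods.values())
    return {aa: like / total for aa, like in likelihoods.items()}


def ml_state(posteriors):
    """Maximum likelihood ancestral state: the one with the highest posterior."""
    return max(posteriors, key=posteriors.get)


# Toy site: three descendants carry L and one carries G.
posteriors = ancestral_posteriors("LLLG", [0.1, 0.1, 0.1, 0.1])
print(ml_state(posteriors))  # L
```

In this toy site, G is the runner-up state; its posterior is what would make the site "ambiguous" if it were large enough.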
Okay, so the model system that we've concentrated on in my lab is the steroid hormone receptors. This is a very useful model for studying the evolution of protein function. They have diverse and crucial biological functions. They're the major mediators in vertebrates of sexual dimorphism, reproductive behavior, and a lot of aspects of homeostasis and metabolism as well. They are ligand-activated DNA-binding transcription factors. So they are defined by their interactions with other molecules, and they're therefore exemplars of this question of the evolution of complexity, because they are molecular mediators. So here's how they work. In the absence of hormone, the receptors are in the cytosol in an inactive conformation. The steroid hormones are metabolites of cholesterol, so they are conserved among species. They don't vary genetically because they're not directly coded for. They're the products of an enzyme-mediated pathway. And they are lipid soluble, so they diffuse through cell membranes. They bind with extraordinary specificity and affinity to their favorite receptors, and that causes a conformational change in the receptor that allows the receptor to dimerize, typically to enter the nucleus, and to bind to a specific binding site called a response element. In the active conformation, they bind coactivator proteins that potentiate transcription of nearby target genes. So you can see the sense in which they are molecular mediators. They are a phylogenetic subfamily within a larger family of nuclear receptors, so we can do ancestral reconstruction on them, and there is rich and strong phylogenetic signal in the domains that we care about. There are convenient functional assays because of their biomedical importance. For example, estrogen receptors are the major mediators of breast cancer risk, androgen receptors of prostate cancer risk, and so on. So we can apply all of these techniques to evolutionary questions and reconstructed ancestral proteins.
And then they have this really nice modular structure, which makes them highly tractable for our purposes. The DNA binding function, recognizing the response elements, is conferred by a separable zinc finger region, and the ligand binding function and the interaction with coactivators is conferred by a separate and well-conserved ligand binding domain. These are largely autonomous of each other, so we can mix and match them, forming chimeric proteins, and that allows us to study the evolution of the DNA binding domain without the inference being contaminated by sequence evolution in the ligand binding domain and vice versa. We can study the evolution of hormone specificity without worrying about changes in the DBD along the same branch because we can experimentally hold them constant. Okay. So you and I have six steroid hormone receptors in our genome. They come in pairs. We have two estrogen receptors. We have an androgen receptor, a progesterone receptor, and glucocorticoid and mineralocorticoid receptors. The question that I'm going to talk to you about primarily today is how the specificity of the glucocorticoid and the mineralocorticoid receptor evolved from their common ancestral gene, which is called the ancestral corticosteroid receptor. I will, if there's time, take a brief detour to talk about the deepest ancestor of the family, which we call ancestral SR1, and how it evolved specificity for the two major classes of ligands, estrogens versus the non-estrogen ligands. And then tomorrow I'm going to concentrate on the evolution of the DNA binding domain from the ancestral SR1 to yield two different forms of target gene recognition. I should mention that some of the techniques which we've used in the DBD are quite different from the ones we've used on the LBD, and they've allowed us to gain different kinds of insights into the evolution of these functions because the DBDs are really well-behaved.
And so I'm going to talk about the most recent work tomorrow. Okay, so the story today is primarily about GR and MR. So here are the players. GR is the major regulator of the long-term stress response. MR is the major regulator of osmolarity, kidney and colon function, and gill function, if you happen to be a vertebrate with gills. GR's major ligand is cortisol. MR has two different ligands depending on what kind of vertebrate you are. If you are a human or other tetrapod, your major MR ligand is aldosterone. If you are a fish or a shark, your major ligand for the MR is a structurally related hormone, deoxycorticosterone (DOC). MR also responds to cortisol, though higher concentrations are required to activate it. These, as you can see, are quite similar to each other. The two classes differ primarily by the presence of a 17-alpha hydroxyl group on the glucocorticoids, which is absent from the mineralocorticoids. The other difference is that aldosterone has this additional oxygen at the 18 position, which is lacking from DOC. So the question is, how did these different modes of specificity, which are very physiologically important, evolve between these two paralogous proteins? To answer that question, we could do horizontal analysis, but that turns out to have been done already, and it failed. And the reason it failed is because these are extraordinarily different. They're 43% different at the amino acid level. Structurally, we have really good candidates that seem to be involved in interacting with this 17-hydroxyl group, but when we swap them back and forth, we get broken proteins. And there's decades of work along those lines. So we're going to use an evolutionary approach and reconstruct the common ancestor and the trajectory from that common ancestor to present-day functional diversity. So we begin with the phylogeny. The alignment has 252 members of the family. The phylogeny is generally well-supported.
The model is also very well-supported statistically. Oh, and I should mention on this phylogeny, here's the ancestral corticosteroid receptor. This thing is about 450 million years old. It is more ancient than the last common ancestor of you and a shark, because we all have GRs and MRs, but it is more recent than the last common ancestor of you and a jawless lamprey because they only have one unduplicated member of this family. So it's a statistically well-reconstructed protein. This is a histogram showing the distribution of posterior probabilities over sites. You can see that the vast majority of sites are unambiguously reconstructed. There are a handful of sites that are ambiguous in the sense that they admit of multiple plausible reconstructions, and we're going to return to that later. What does the maximum likelihood sequence look like? Well, if we look across the entire length of the domain, we don't see much. This is the sequence identity between the maximum likelihood ancestor and the MRs and the GRs, a little more similar to the MRs. When we look just at the residues that line the ligand pocket, however, which are reconstructed with very high mean posterior probability, we can see they are identical, except for one site, to the MRs, and considerably more different from the GRs. So this would lead us to the hypothesis that perhaps the ancestor was MR-like in its functions. But that's a hypothesis we would like to test and then explore the mechanisms for. So we do that with a functional assay. This is a very traditional, simple, this is fading, isn't it? Can you see it? I can barely see it. Does anybody walk around with one of these in their pocket? Is this even what it's doing? No, this isn't even worth it. Can you see it? No, I can't do that. But thank you for the suggestion.
Okay, so it's a simple reporter assay in mammalian cells in which we create a fusion protein of the ancestral ligand binding domain with the Gal4 DBD; the reporter is a luciferase gene that's driven by the Gal4 response element. We've done this in many different cell types. The results don't vary among cell types. So when we look at the extant proteins, we can see the GRs are highly sensitive to cortisol, shown in purple, whereas the MRs and the GRs of cartilaginous fish are sensitive to all three hormones, mineralocorticoids and cortisol alike. So we want to know about the ancestral protein right here. Here's the dose-response curve for it. You can see it is an extraordinarily sensitive mineralocorticoid receptor and also has some residual sensitivity to cortisol. This thing is as effective a mineralocorticoid receptor as yours is. I'm going to show these data from now on using the EC50, which is the concentration of hormone at which half-maximal activation is obtained. So I'll show it this way, where the height of the bar is proportional to the sensitivity. It's actually depicting the negative log of the EC50. So big bar, more sensitive. So the ancestral protein looks just like a present-day MR. When we look at the ancestral GR1, this is after the duplication. This is the GR in the common ancestor of you and a stingray. It continues to be MR-like, which makes sense in terms of the MR-like behavior of the GR in sharks. But when we get to the GR in the common ancestor of you and a salmon, that thing is now fully GR-like. It is absolutely cortisol-specific and has lost the ancestral response to mineralocorticoids, indicating that the shift in function happened on that branch, and it's through the loss of the ancestral sensitivity to aldosterone and DOC. So that tells us that GR specificity evolved from an MR-like ancestor, and this is robust to phylogenetic uncertainty. How do we know this?
We took all of the ambiguously reconstructed sites, introduced the plausible alternative residue, repeated the assay, and always got the same result where the ancestral protein was MR-like in its function. So we conclude from this that the functional trajectory that we infer from the maximum likelihood ancestor is robust to statistical uncertainty about the ancestral sequence. I want to take a little detour here and talk about the robustness of ancestral reconstruction in general because people are often interested in this. We have a paper by Geeta Eick, a former postdoc in my lab, that's going to come out in MBE in about a month, looking at the robustness of ancestral sequence reconstruction in different systems. I'm not going to talk to you very much about the details of the systems, but at the top, these are proteins that are involved in a transition from an ancestral enzyme to a derived protein-interaction domain. In the middle, the evolution of the steroid receptor DNA binding domain and a new response element specificity, and at the bottom, the evolution of ligand specificity in the steroid hormone receptors. What we did was we started using an approach where we take all of those plausible alternative amino acids and dump them into the maximum likelihood ancestor all at the same time. We call this the alt-all ancestor, and it represents the far edge of the cloud of plausible sequences. These proteins have very different levels of uncertainty, but in every case, the alt-all ancestor has the same function as the ML ancestor. Is there an issue with that? Yeah, it doesn't work. I think it's probably batteries. Thank you. Okay, so in the first case, there are 20 differences out of 187 amino acids. The alt-all ancestor is 13.2 log units less likely than the maximum likelihood ancestor, and it has the same qualitative functions. It has the enzyme activity, and it has no protein interaction activity.
There is some quantitative difference, as you can see in the maximal turnover rate, but the qualitative function is identical. In the DbD, it's not quite as uncertain. There are 12 ambiguous sites, but you can see that SR1, the first and third sets of columns, both the ML and the alt-all are ERE-specific, not SRE-specific, whereas SR2, the derived protein, the ML and the alt-all have nearly identical functions. And then finally, this is the ancestral SR1, and its derived protein, ancestral SR2, the filled and open circles depict the ML and the alt-all's ligand specificity. And this thing has 65 amino acid differences, SR1 does, 65 differences between the ML and the alt-all. So there's this vast stretch of sequence space of evolutionarily plausible ancestral reconstructions that maintain the same function. Now it's certainly not true that all of the sequence space has it. It's specifically the sites that are plausible ancestral reconstructions because of... and specifically the states that have been sampled during evolution. And the reason that there is this high level of robustness is because the sites and states that are strongly conserved because of functional constraint are unambiguously reconstructed. And the ambiguity comes among states that are replaceable because there hasn't been enough functional constraint to keep them in a single state, but all of the states that are allowed confer the same function. I think that's as interesting from the perspective of, thank you, from the perspective of the distribution of sequence, the distribution of function across sequence space as it is as a sort of reassuring note about the reliability of this technique. Okay. So we're going back to GRs and MRs. 
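As a rough illustration of the robustness test described in this detour, here is a minimal Python sketch of how an ML sequence and an alt-all sequence could be assembled from per-site posterior probabilities. The posterior cutoff for calling an alternative state "plausible" (0.2 here) is an assumption for the sketch, since the talk does not state one, and the function names and the three-site toy data are hypothetical.

```python
import math


def ml_and_alt_all(site_posteriors, alt_threshold=0.2):
    """Build the ML ancestor and the alt-all ancestor.

    site_posteriors: one dict per site mapping amino acid -> posterior.
    alt_threshold: assumed cutoff above which the second-best state
    counts as a plausible alternative.
    Returns (ml_sequence, alt_all_sequence, number_of_ambiguous_sites).
    """
    ml, alt, n_ambiguous = [], [], 0
    for post in site_posteriors:
        ranked = sorted(post.items(), key=lambda kv: kv[1], reverse=True)
        best = ranked[0][0]
        ml.append(best)
        if len(ranked) > 1 and ranked[1][1] >= alt_threshold:
            # Ambiguous site: alt-all takes the second-best state.
            alt.append(ranked[1][0])
            n_ambiguous += 1
        else:
            alt.append(best)
    return "".join(ml), "".join(alt), n_ambiguous


def log_prob(sequence, site_posteriors):
    """Sum of log posteriors of a sequence, treating sites as independent."""
    return sum(math.log(post[aa]) for aa, post in zip(sequence, site_posteriors))


# Hypothetical three-site example: one confident site, two ambiguous ones.
sites = [
    {"L": 0.99, "M": 0.01},
    {"S": 0.60, "P": 0.40},
    {"Q": 0.75, "L": 0.25},
]
ml_seq, alt_seq, n_amb = ml_and_alt_all(sites)
print(ml_seq, alt_seq, n_amb)  # LSQ LPL 2
print(log_prob(alt_seq, sites) - log_prob(ml_seq, sites))  # negative: alt-all is less likely
```

The difference in log probability between the two sequences is the kind of quantity behind a statement like "so many log units less likely than the maximum likelihood ancestor", although the exact definition used in the published analysis may differ.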
If GR has a derived function and the MR function of responding to mineralocorticoids is ancestral, that tells us something interesting about the evolution of the hormone-receptor pair when we look at the metabolic pathway that produces aldosterone in mammals and other tetrapods. Aldosterone is at the end of a pathway where deoxycorticosterone is an intermediate. And it evolved in tetrapods because of the advent of this activity in a duplicate gene from the protein that catalyzes this reaction. So aldosterone is unique to tetrapods, and that event evolutionarily comes after the receptor's ancestral sensitivity to the hormone. So this suggests that the complex system, the hormone-receptor pair, evolved by a process of molecular exploitation: the ancestral protein before the duplication was sensitive to the hormone before the hormone actually existed. You have a duplication. You later have the loss of the ancestral aldosterone sensitivity in the GR. And it's only later in the tetrapod lineage that aldosterone appears due to the modification of this duplicate enzyme. This indicates that the system, in fact, did evolve in a stepwise fashion, that the receptor and its sensitivity to the ligand preceded the ligand. And the new hormone, once it evolved, recruited an ancient receptor into a new signaling complex, exploiting its ancient promiscuous activity. Now why would this sort of molecular exploitation occur? The ancestral CR's sensitivity to aldosterone before the hormone existed is presumably a byproduct of its affinity for its native hormone, DOC, because DOC is as ancient as the ancestral receptor. So the hypothesis is that aldosterone activation is a structural carryover of sensitivity to DOC because of their similarity. Now that's a structural hypothesis that we tested using structural biology. So working with Eric Ortlund, who's a crystallographer at Emory University, we determined the crystal structure of the ancestral CR.
This is actually the first crystal structure of an ancestral protein that anyone was able to produce. Eric was able to do this. It's at very nice resolution. This is in the active conformation. The active conformation is defined by this position of the so-called activation function helix that we'll talk about later. The co-activator binds right here. The hormone is nestled deep in the pocket. And when we look at the pocket, we see why this molecular exploitation was possible. So Eric solved the structure in complex with all three ligands, in fact. And this shows you a superposition of the side chains that line the pocket as well as the three ligands. And you can see that the receptor doesn't have to adjust at all to accommodate the three different hormones, including aldosterone. And the hormone is coordinated in almost exactly the same fashion. Here is the 17-hydroxyl, which causes a very mild steric clash. And here is the additional 18-hydroxyl, which is unique to aldosterone. And there is plenty of room in the pocket for that. So there's sort of extra room at the inn for the guest who hasn't arrived yet. And so the receptor promiscuity comes from this mild sloppiness in the pocket. And it's that same promiscuity that, of course, is exploited on purpose during drug development and accidentally by endocrine-disrupting pollutants. But it provided the raw material for the evolution of the aldosterone-MR pair. But what I really want to do is talk about mechanism. So if GR specificity evolved from an MR-like ancestor, how did that happen? What were the genetic and structural causes of that change? So here's our branch on which it happened. There were 37 changes on that branch, which is a lot. So we used phylogenetic criteria to narrow it down to a more manageable group. There were five which are diagnostic in the sense that they're conserved in one state in all the GRs, and in a different state in all of those with MR-like function. 
So we tested the hypothesis that each of these was a contributor to the shift in function by introducing them into the GR1 background. So I'm going to depict the results on this plot of receptor sensitivity space, aldosterone sensitivity along this axis, sensitive, insensitive, cortisol sensitivity along this axis. And we're looking for a transition from the ancestral protein, which is sensitive to both, to the derived protein, which is sensitive only to cortisol. So we introduce each single substitution and we do not get into the right zone at all. Most of them, in fact, result in a protein that functions very, very poorly. Here are the doubles. Most of these, again, are very poorly functioning, except there's this one pair that gets us almost all the way there. This confers a three-orders-of-magnitude reduction in sensitivity to the mineralocorticoid and yields cortisol sensitivity almost identical to the derived ancestor and to the human GR. So this pair poses a puzzle because a protein itself is a complex system. How do we get to these complexes of amino acids that seem to function together? The assumption is we get there through a stepwise process of evolution, and there are two ways that this pair could have evolved. One is for S106P to have evolved first. But this is one of the nonfunctional intermediates. This thing does not respond at all to cortisol and responds only to concentrations of aldosterone that are orders of magnitude supraphysiological. So this is a likely nonfunctional intermediate and is unlikely to have occurred under the influence of purifying selection. What if we started with L111Q? That thing is neutral. That has virtually no effect on the function of the protein. But once it is in place, then we can add S106P, which in this context yields a shift in function to a cortisol-preferring protein that's still active in the presence of cortisol. 
So this implies a certain contingency in protein sequence space. If we're imagining a network of connected genotypes that can evolve through single-site mutations, we go on a step-by-step basis, and there are some backgrounds in which we can't get from one function to another. But there are so-called permissive mutations which create a background in which the function-switching mutations can be tolerated. And it appears that L111Q is permissive for S106P, and together they confer the switch in function. So how and why do we see this epistatic interaction? Well, we worked with Eric again to look at the crystal structures of the intermediate ancestors, GR1 with the mineralocorticoid and GR2 with the synthetic glucocorticoid. So this is before and after the transition in function. So let's look at these two substitutions. Here's the GR1 background with the GR1 side chains. The hydrophobic leucine 111 is in the general vicinity, but this is still about four angstroms away from the ligand. Serine 106 forms this hydrogen bond that stabilizes the position of this helix that runs along the side of the ligand pocket. This is the ligand. This is the 17-hydroxyl of cortisol that distinguishes glucocorticoids from mineralocorticoids. Here are the side chains in the derived state, a polar glutamine that is still up here and has nothing to interact with, and a proline, which is going to change the conformation of this whole backbone. It not only destabilizes by removing that hydrogen bond, but also introduces a kink, as prolines are wont to do. So now here is the conformation of GR2. So in the derived state, this entire helix has been pulled down by about five angstroms, moving this second site to a position where it can form a ligand-specific hydrogen bond with cortisol only. So we call this conformational epistasis because one substitution modifies the effect of the other substitution by changing its location in three-dimensional space. 
So there's this compensation by this hydrogen bond for the destabilization of the helix caused by the conformational change at this site. I'm going to skip this interlude. I might do it tomorrow at five o'clock. All right. So that shows you how those two substitutions conferred a shift in preference from mineralocorticoids to glucocorticoids. But you may notice that this did not complete the transition. So when we add these two, we've now got a protein that likes cortisol better, but it doesn't look like a present-day GR or even the ancestral GR2. So what optimized the rest of that phenotype? So our hypothesis was that the other three diagnostic changes are responsible for the full cortisol specificity. So that's a very testable hypothesis. We're going to introduce those other three into the background that exists with the first two. So here's some nomenclature. Group X is the two we've already looked at. Group Y is the other three diagnostic ones. So the test is to add those together, giving ancestral GR1 plus XY. We expect this to be cortisol-specific, we test that hypothesis, and we are wrong. This gives us a nonfunctional receptor that responds to none of the hormones. A likely impassable evolutionary pathway. Now this is strange because the group Y substitutions are functionally intolerable, but they occurred and they are conserved in every GR that's ever been sequenced since. Which means there must have been additional epistatic substitutions, permissive substitutions that permitted group Y to be introduced. We're going to call those group Z. So we want to find what they are. So we're going to go back to the sequence, but now instead of looking for residues that are conserved in one state in the purple and a different state in the orange, we're going to look for ones that are conserved in the GRs, where they're required, but not necessarily conserved in the MRs. And there were two that had that pattern and those are our candidate permissives. 
And notably, shown here in green, they are very close to the Y mutations on the structure. So here's the experiment. I've shown you that adding Y gives us a nonfunctional receptor. Let's add Z and see if we get a cortisol-specific receptor. And indeed we do. These substitutions are sufficient to rescue XY, and all together these seven are sufficient to recapitulate the full evolution of the GR-like function. But we've got a problem again, which is we can't go this way. So how do we get to XYZ? Well, it turns out if we add Z to X first, Z is permissive. It does not change the function. It does not change the specificity. It in fact gives us a receptor that maintains the ancestral function, which now represents a substrate in which group Y can be introduced, shifting the function again. So these are, again, permissive substitutions. So what we did was in fact test all of the possible combinations of X, Y, and Z. And we can chart possible pathways through sequence space. And there are some passable and some impassable pathways. So any pathway that involves introducing Y before we introduce both X and Z is impassable. These yield totally nonfunctional intermediates. But if we introduce X first, we shift the preference, Z is neutral, and then we can add Y. Or alternatively, if we start with Z, it's neutral. We can add X, shift the preference, and then add Y, giving us a completely cortisol-specific receptor. So it's in this sense that some of these events, particularly the evolution of Z, are chance events that could not have been driven by selection for cortisol specificity. These are more likely to have evolved due to chance drift alone. And in that sense, evolution becomes contingent on low-probability chance events. But there's a question that's raised here, which is whether that contingency is strong or weak. So let's say these function-switching mutations depend on the prior acquisition of permissive mutations. 
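The passable and impassable orderings described here can be sketched in a few lines of Python. This is a toy encoding rather than the lab's analysis: the function names `phenotype` and `passable_orders` and the phenotype labels are hypothetical, and the rules simply restate what the talk reports (group Y is tolerated only once both X and Z are present, X shifts the preference toward cortisol, and Z on its own is neutral).

```python
from itertools import permutations

def phenotype(groups):
    """Toy phenotype of ancestral GR1 carrying the given substitution groups.

    The rules restate the outcomes described in the talk; the labels are
    illustrative, not measured values.
    """
    g = set(groups)
    if "Y" in g and not {"X", "Z"} <= g:
        return "nonfunctional"        # Y without both X and Z is dead
    if g == {"X", "Y", "Z"}:
        return "cortisol-specific"    # full GR-like function
    if "X" in g:
        return "cortisol-preferring"  # X shifts preference; Z does not change it
    return "ancestral"                # nothing, or Z alone (neutral)

def passable_orders(groups=("X", "Y", "Z")):
    """Return the orders of acquisition in which every intermediate is functional."""
    passable = []
    for order in permutations(groups):
        acquired = set()
        ok = True
        for group in order:
            acquired.add(group)
            if phenotype(acquired) == "nonfunctional":
                ok = False
                break
        if ok:
            passable.append("".join(order))
    return passable

if __name__ == "__main__":
    for order in passable_orders():
        print(" -> ".join(order))
```

Under these rules only two of the six possible orders survive, X then Z then Y and Z then X then Y, which matches the two routes described in the talk.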
If there's only one set of possible permissive mutations, then if we could replay the tape of life, as Stephen Jay Gould said, turn history back, let it go again, it's very unlikely that these would occur by chance, and this whole pathway to purple would not have been available. However, if there are many possible permissive mutations, then some of those alternative histories are likely to allow by chance the acquisition of the permissive substitutions, and then the function-switching pathway would have been available. So the question is, how many permissive substitutions are there? Well, this is a question about the nature of sequence space around the ancestral receptor. So Mike Harms in the lab approached that question experimentally. He wanted to identify how many alternative permissive mutations for the XY group there would have been in the ancestral background. So here's the approach that he used. He created a screen for alternative permissive mutations in the XY background. A permissive mutation has to fulfill two criteria, and here's how he tested them. He creates a mutant library in the XY background, which is nonfunctional. Permissive mutations have to rescue the XY phenotype and give you a cortisol-specific receptor, which he tests in a yeast two-hybrid assay, which I'll tell you about in a second. Then a permissive mutation also has to be permitted in the ancestral background, which he tested using the luciferase reporter assay that I showed you before. So the first part of the procedure is to identify rescuing mutations and then look at those in this assay to determine if they are truly permissive. Okay, so here's that yeast two-hybrid screen. This is sort of a classic thing in steroid receptor biology. You've got the library of the ligand binding domain in fusion with the DBD, driving expression of the HIS3 gene. 
When we throw hormones at this thing, when this is in the active conformation, it's going to recruit this construct, which consists of the Gal4 activation domain and the co-activator peptide, which binds to the ligand binding domain in the active conformation, get this whole complex together, and drive expression of HIS3, which allows survival and growth in histidine-deficient medium. Okay, so the first thing to show you is that the XY genotype does not... This is in yeast, by the way. The XY genotype does not grow in either medium or cortisol. The XYZ genotype does grow uniquely in cortisol. So this can distinguish nonfunctional from functional in the terms that we're interested in. Now, this is a low-throughput assay because of some peculiarities of ligand binding domains, which we have overcome with the DNA binding domain. So we have massively high-throughput assays for the DBD. This is sort of medium throughput. Through a ton of work, Mike screened over 12,000 yeast clones containing over 3,600 unique full-length GR protein sequences. These cover the majority of single-replacement neighbors, a large number of double-mutant combinations, and 825 higher-order combinations. So it's not an exhaustive test of sequence space, but it's an ambitious sample. And here's what he found. First, this is what the historical substitutions look like. This axis is the improvement in cortisol sensitivity over the non-functional ancestral GR1. And these are the individual and paired permissive substitutions that happened during history. In the screen, Mike recovered one of the historical replacements, and he recovered a reversal of one of the group Y's. He found a number of new rescuing mutations, however. He found three which were indistinguishable in their effect from the historical permissive mutations. And he also found four more substitutions which were as good as the individual permissive mutations. 
And since the individual permissive mutations came in a pair, he wanted to see, if we combined these, whether any of those would be authentically as permissive as the historical ones. So here are the doubles, and one more was as good as the historical pair. So we've got four genotypes of rescuing mutations. So that says that permissive or rescuing substitutions are fairly rare, 0.1% of genotypes examined, and that includes the historical ones. So are they permissive? So he looked in the reporter assay, and none of them were permissive. So here are the historical permissives, active. Remember, they are neutral in the ancestral background. So this thing activates with both DOC and cortisol. When we put in that pair, we get a dead receptor. And when we put in the three other ones, we get a receptor that has a high level of constitutive activity and responds to relatively low levels of other hormones like progesterone. So this has lost its specific allosteric regulation by corticosteroids. So these are promiscuous or constitutive. So these alternative rescuing mutations all disrupt the function of the protein in one way or another. So when we look at the full set of potential mutations that Mike looked at, there were no novel permissive mutations in the library. That gives an upper bound of 0.03% of all substitutions that are permissive, and those are the historical ones. So that says that permissive mutations are very rare and GR evolution was contingent on rare, low-probability chance events, not common events which are of moderate probability in the aggregate. And in fact, because two were required, these are rare, low-probability compound events. So evolution is then highly contingent on chance events in this case. So the last thing I want to tell you about was work Mike did to figure out the biophysical causes of this contingency. Why were permissive mutations so hard to come by? 
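The two percentages quoted here can be checked with simple arithmetic. The sketch below is one possible reading, not the published calculation. It assumes the denominator is the roughly 3,600 unique sequences mentioned earlier, that the 0.1% figure counts the four rescuing genotypes, and that the 0.03% bound counts the historical permissive pair as a single genotype. All names in the code are hypothetical.

```python
# Assumed denominator: "over 3,600 unique full-length GR protein sequences".
UNIQUE_SEQUENCES = 3600

# Assumed numerators, read off the talk:
RESCUING_GENOTYPES = 4    # "four genotypes of rescuing mutations"
PERMISSIVE_GENOTYPES = 1  # only the historical pair was truly permissive

def percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total

rescuing_pct = percent(RESCUING_GENOTYPES, UNIQUE_SEQUENCES)
permissive_pct = percent(PERMISSIVE_GENOTYPES, UNIQUE_SEQUENCES)

if __name__ == "__main__":
    print(f"rescuing:   {rescuing_pct:.2f}% of genotypes screened")
    print(f"permissive: {permissive_pct:.3f}% of genotypes screened")
```

On these assumptions the rescuing fraction comes to about 0.11% and the permissive fraction to about 0.028%, consistent with the rounded 0.1% and 0.03% given in the talk.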
Why are there so few rescuers and why are they incompatible with the ancestral protein's function? There have to be biophysical requirements that cause permissive mutations to be rare. So the first thing we had to think about was what are the models for permissive epistasis? The dominant model in the field is that epistasis in protein evolution is driven primarily by effects on stability. And the idea is that there's some threshold stability that's required for function, and function-switching mutations would typically reduce stability, leading to an unfolded protein. The permissive mutations would increase stability without changing function and would create a background in which the function-switching mutations could be tolerated. That's the model and we tested that. We tested it first by examining the effect on thermal denaturation. And Mike found that the XY substitutions are indeed destabilizing and the permissive substitutions are a little bit stabilizing. So that's consistent. However, there were a couple of observations which were inconsistent with this. First is that they seem to compensate locally. Remember, I showed you that the permissive substitutions are right next to the function-switching mutations. And it should be possible to globally stabilize a protein with distant mutations. And in fact, they seem to locally compensate. So for example, in GR1, one of the function-switching mutations abolishes this hydrogen bond, which is gone in GR2. And this hydrogen bond is involved in stabilizing the loop to the activation function helix in the active conformation. But the permissive substitution compensates with a new hydrogen bond to the very same loop. The other one improves packing interactions right here between the mobile helix and the nearby part of the protein. So these seem to be local compensation rather than global compensation events. 
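The threshold-stability model described above can be written down as a toy calculation. Everything numeric here is hypothetical and in arbitrary units, chosen only to illustrate the logic of the model; none of it is measured data from the talk, and the names `THRESHOLD`, `ANCESTRAL_MARGIN`, `EFFECTS`, and `folds` are illustrative.

```python
# Toy version of the global threshold-stability model of permissive epistasis.
# All values are hypothetical, arbitrary units; they are NOT measurements.
THRESHOLD = 0.0          # minimum stability margin needed to fold and function
ANCESTRAL_MARGIN = 1.0   # assumed stability margin of the ancestral protein

# Assumed additive effects of each class of mutation on the margin.
EFFECTS = {
    "switch": -2.0,      # function-switching mutations destabilize
    "permissive": +1.5,  # permissive mutations stabilize, function unchanged
}

def folds(mutations):
    """Return True if the protein stays at or above the stability threshold."""
    margin = ANCESTRAL_MARGIN + sum(EFFECTS[m] for m in mutations)
    return margin >= THRESHOLD

if __name__ == "__main__":
    for genotype in ([], ["switch"], ["permissive"], ["permissive", "switch"]):
        label = " + ".join(genotype) or "ancestral"
        print(f"{label:22s} folds: {folds(genotype)}")
```

In this purely additive picture, any sufficiently stabilizing mutation anywhere in the protein would be permissive. That is exactly the prediction the talk goes on to test with distant stabilizing substitutions, and it fails for this receptor.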
The rescuing mutations are also local, located on either the mobile helix or on the activation function helix. And finally, Mike tested this hypothesis that global stability is not sufficient by introducing a stabilizing pair of substitutions that are distant. These increase the Tm, as shown in yellow, but they do not improve the function. They are not permissive. So increasing stability, the global stability of folding, is not sufficient to permit the function-switching mutations. So requirement number one: permissive mutations have to stabilize the same local elements that were destabilized by the function-switching mutations. Number two, why were the rescuing mutations incompatible with the ancestral protein? Why did they result in a completely nonfunctional protein? Well, when we looked at that pair, we can see that in the derived conformation in ancestral GR2, where they were in fact rescuing, one of them is on the mobile helix, the other one is on the helix that's right across from it, and they have this very nice packing interaction. But they're not permissive because they can't be introduced into the ancestral genotype and conformation, because when you put this helix back where it was in the ancestral state, they clash. So that's requirement number two. Permissive mutations have to be compatible with both the ancestral and the derived conformations. And if a conformational change is involved in the new function, that becomes difficult to do. And finally, what about these rescuing mutations that disrupted the allosteric regulation, causing a constitutive or a promiscuous receptor? Well, when Mike looked at these, they are all on the activation function helix and they all improve packing of the activation function helix against the body of the protein in the active conformation even when ligand is absent. And he tested this hypothesis using molecular dynamics simulations. This is shown for one of those substitutions. 
Here's this substitution showing the activation function helix packing against the body of the rest of the protein with an increase in the average number of contacts in the mutant state compared to the ancestral state. And this is why you get a protein that doesn't depend on hormone anymore in order to activate it. So this leads to our last requirement. The permissive mutation has to be compatible with the energetics of allostery. It has to need the binding of the hormone in order to occupy the active conformation. So permissive mutations have to fulfill three constraints. They have to stabilize the right part of the protein that was destabilized by the function switchers. They can't stabilize the wrong part of the protein. That is, they have to maintain the opportunity for allostery. If they stabilize it too much, you lose regulation. And finally, they have to be structurally compatible with both conformations. And there are few mutations out of the whole universe of possible neighbors of the ancestral protein that can fulfill all these requirements. So as a result, evolution is dependent on these rare events that find permissive states that can allow conformation-changing, function-switching mutations to be amassed. It's the architecture of the protein, the way the allosteric ligand regulation is conferred by the physical architecture of the protein, that limits the evolutionary trajectories available to the protein. Okay, I'm going to skip this little thing, and I'm going to finish here. What did we learn? By reconstructing the history of the protein, we learned something about the trajectory of the evolution of a complex system. How new receptors evolved by molecular exploitation, repurposing old parts for new functions. This is facilitated by the imprecision of the fit between ligand and receptor. Chance mutations played a key role in opening and closing paths for the evolution of new structures and new functions. 
This analysis showed you why the horizontal work to try to turn a present-day GR into an MR failed: because permissive mutations are not present in the MR. So you can't put the MR states into the GR. And conversely, there were restrictive mutations as well that keep you from doing the opposite, but I didn't tell you that story. But we did learn that the evolution of the GR's present form was contingent on rare, compound, non-deterministic events. And if we could rewind the tape of life, it's unlikely the permissive mutations would have evolved, and the protein would have had to take a different course, either to evolve a different function or to evolve cortisol specificity, if it did, through a different mode of protein architecture. So in this sense, proteins are fundamentally historical objects. If we want to understand why they occupy the parts of sequence space and why they have the biophysical architecture that they do, we've got to look to their histories. It also indicates that the contingency of the evolutionary process arises from the physical properties of the GR. So when Stephen Jay Gould wrote this beautiful book about the contingency of evolutionary processes at a macroscopic scale, he made a similar argument about rewinding the tape of life and getting different outcomes. But his idea was that there were these events like asteroids hitting the Earth and mass extinction and climate change that drove that contingency. And this work shows that at the most fundamental level of biological organization, contingency arises from the physical architecture of molecular systems, that it was the nature of GR's allosteric regulation and the nature of the conformational change that conferred its new function that made evolution contingent on low-probability events. Finally, I hope I've shown you some ways in which studying historical and molecular mechanisms together is mutually illuminating for the questions that motivate us in both evolution and biochemistry. 
So finally, let me thank the people who did the work. Mike Harms did that beautiful work on the library screens for alternative permissive mutations. Jamie Bridgham and Geeta Eick did massive amounts of work on all those reporter assays. Victor Hanson-Smith did most of the computational work reconstructing the ancestral proteins. This is a long-term project and everybody in the lab, whether they did the experiments or not, made irreplaceable contributions to sort of understanding the potential and implications of these experiments. And then, of course, collaboration with Eric was essential to all the structural biology here. So I'd like to thank you. I'll take any questions if there's time. And then we can also talk more at the reception and tomorrow when I talk about the DBD. Thank you.
Comment at the end, but do you imagine that in addition to possibly evolving GR specificity, there might have been other similar hormones that might have been accessible, had those permissive mutations not been there?
Yeah, I think there are innumerable pathways that evolution could have explored if it didn't take this one. And that could have involved acquiring specificity for cortisol through some different architecture. But as you mentioned, there are other steroids present in vertebrates and particularly in the tetrapods. If you think at a physiological scale, the importance of having evolved a specific GR was to allow separation of the regulation of osmolarity from the response to stress. And in many vertebrates, particularly fish species and jawless vertebrates as well, those are regulated concomitantly: one ligand, one receptor, turn on both processes. This allows more specific regulation at an endocrinological level. So if there was a selective advantage to this greater specificity of regulation, you can certainly imagine other steroids present in the animal being co-opted for that purpose. 
And my orientation on these things is that there are a huge number of possibilities, possible pathways evolution could have gone down if this hadn't happened.
Permissive mutations being so rare, as in your case: is this, you think, a general phenomenon, or would you have other proteins that are more robust, where more permissive mutations can actually drive them into different trajectories?
I think that they are at best partially general. I wouldn't say they're absolutely non-general, but we know of examples in which permissive mutations are less specific. So the stability-mediated permissive epistasis model that I alluded to, that has clearly been involved in many cases of evolution. Jesse Bloom's work on the evolution of proteins in the flu virus shows trajectories like that. And if that's all it takes to permit the evolution of some new function, there are going to be lots of ways of doing that. I think that the degree of specificity and rareness of permissive mutations depends on the architecture of the protein's function and the change in that architecture that's involved in the acquisition of a new function. So when there's allostery, I think it makes it harder. And when there's a conformational change, I think that makes it much harder. When you put those things together, you get a very stringent circumstance, a set of constraints on permissive substitutions. I think every protein will be different in this respect, and it will be really interesting to see how those constraints apply across different protein families and also how they may change during the course of evolution along a phylogeny within a single family as well.
So let's thank Joe once again.