So let's go ahead and get started. It's my great pleasure today to introduce Anna Marie Pyle, who's giving this year's David Green Lecture in Enzymology. I just thought I would give a little background on David Green. This particular lecture is supported by the David E. Green Memorial Lecture Fund, and it's one of three named lectures we have here in the biochemistry department. David Green, like Anna, began his independent career at Columbia University. He then saw the light in 1948 and moved here to Madison to help found the Enzyme Institute, the Institute for Enzyme Research. And while here, he was extremely prolific. The number I found on the internet, which has to be true since it's on the internet, says he was associated with 700 articles published over four decades here in Madison. He really made fundamental contributions to our understanding of oxidative phosphorylation, beta oxidation, and many other enzymatic reactions that occur in the cell, particularly in the mitochondria. He also established the Enzyme Institute as really a foremost center for mechanistic enzymology and helped recruit many people in that area, including Lardy, Cleland, Khorana, and many others.
He also had a nice quote I found that I think is particularly appropriate to Anna's research and maybe to the field of RNA splicing in general. At age 27, he published an essay titled "Reconstruction of the Chemical Events in Living Cells." In that essay, he wrote that the mastery of a particular machine requires not only a knowledge of the component parts, but also the practical ability to take the machine to pieces and reconstruct the original, which I found to be a really nice description of what it means to understand something in mechanistic and molecular detail. And so today, I can't think of a more fitting lecture than Anna Marie Pyle's work on RNA splicing.
She's very well known for studies on the mechanism of group II intron splicing: the molecular details of catalysis, how these ribozymes fold, and the structures of these ribozymes. In addition, she's made really nice contributions to our understanding of RNA helicases and how these function inside the cell in a variety of ways. She's currently at Yale, and she's also an HHMI investigator, which she's been since 1997. She holds the William Edward Gilbert chair of molecular, cellular, and developmental biology, and she's also a professor of chemistry at Yale. So with that, I'd like to welcome her and thank her for agreeing to come give this lecture.
Thank you, Erin. Well, it's an honor to be asked to give the Green Lecture. It's also an honor to be asked by Erin, who has done such amazing work in such a short time. And it's also super fun to come and see all of my many friends here. You know, as part of the RNA Society community, I spend a lot of time here organizing meetings and coming to meetings, and this is sort of a second home. So anyway, it's nice to be here.
In part because this is the Green Lecture, I'm going to be telling you about splicing today, and I'm going to try to put as much emphasis as possible on enzymology and how these molecules function as splicing enzymes. For the undergrads that are present, I'm also going to give a bit of an introduction to splicing and the mechanism of ribozymes. So for those of you who think about splicing all the time, please forgive me. We'll ultimately get to something that hopefully you do find interesting, too. OK, so right here is my favorite splicing machine, and this has been the focus of my work: the group II self-splicing intron family. We'll be learning a lot more about that. But you know, it looks pretty complicated, and it is. So we're going to start with something a lot simpler.
One thing that needs to be said, to really understand why we're talking about any of this today, is that RNA is not functional until it's spliced. The central dogma needs some major revising. DNA doesn't just go to RNA, which goes to protein. The RNA has to be cut and patched in various ways and modified in various ways before it's functional. So here is what splicing looks like for an RNA with one intron. An exon is defined as a piece of RNA that encodes something functional, like a coding sequence. And often, two exons are divided by a region of RNA that doesn't actually encode something important, or that may have secondary functions, and it's called the intron. Through the splicing reaction, this piece of RNA is removed, usually in a lariat form, which I'll explain in a minute, and the exons are spliced together in a simple way. OK, so that sounds very easy, although it's important to see that this is a two-step reaction, and we'll walk through that in a minute. But life is not that simple. Most genes have lots of introns, so it's going to look something like this, where you have various different exons that carry different modules of, let's say, a coding sequence. And the boundaries between these exons have to be spliced together to create this sort of multicolored product that is now ready for function.
So why is splicing important? It's really important to you, because most of your genes are in this category. You have, on average, eight or nine introns in each of your genes. So splicing is a huge part of your gene expression. And it gets worse. This is what it looks like when every one of these exons is included, and you get a single large product that encodes, let's say, one possible protein product. However, life is not that simple. In us and in other complex organisms, exons are spliced in various combinations. So maybe the red guy will splice to the yellow guy and leave out the orange one, et cetera, here.
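
As an editorial aside, the exon-skipping idea just described can be sketched in a few lines of Python. This is only a toy model and not something from the lecture: it assumes that exon order is always preserved and that the first and last exons are always kept, and the function names `splice` and `exon_skipping_products` are hypothetical. Real alternative splicing is regulated, so only some of these combinations would actually be made in a given tissue.

```python
from itertools import combinations


def splice(exons, keep):
    """Join the exons at the given indices in their original order; all others are skipped."""
    return "-".join(exons[i] for i in sorted(set(keep)))


def exon_skipping_products(exons):
    """List every product that keeps the first and last exon plus any subset of internal exons.

    Toy assumption: any internal exon may be skipped independently of the others.
    """
    last = len(exons) - 1
    internal = range(1, last)
    products = []
    for size in range(len(internal) + 1):
        for subset in combinations(internal, size):
            products.append(splice(exons, (0, *subset, last)))
    return products


if __name__ == "__main__":
    # Four color-coded exons, as on the slide described in the talk.
    for product in exon_skipping_products(["red", "orange", "yellow", "green"]):
        print(product)
```

Under these assumptions a gene with n exons can give up to 2^(n-2) different products, which is the sense in which a single gene can stand for many proteins.
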
That gives you one of many combinatorial possibilities in gene expression. Our tissues all undergo what's called alternative splicing, with different combinations of exons. We express different kinds of spliced products at different developmental stages. When we're early embryos, we make different combinations than when we're adults. And what this has done is that splicing has enabled eukaryotes to break the one-gene, one-protein barrier. This has facilitated massive organismal complexity, so we can do a lot more with a lot less information through this process. So that is why splicing is important. That's the first message: splicing matters, especially in us, but not exclusively in us. If somebody tells you that there are no introns in prokaryotes and in bacteria, they are not right. There are lots of group I and group II introns in bacteria.
The second message I'd like to give you is that RNA splicing machines are predominantly ribozymes. In other words, they're machines that are made of RNA. They're enzymes, but their active sites are not substantially composed of protein. So again, this kind of thing is catalyzed by RNA. So what are ribozymes? Ribozymes are RNA molecules that catalyze chemical reactions. They are true enzymes, although many of them undergo autocatalytic reactions with one turnover. Ribozymes come in many different flavors, and I like to group them in the following way. There are big, giant ribozymes that mostly act to cleave and ligate RNA and DNA, and there are basically just a few of them. There's the self-splicing group I intron. There's a ribozyme that clips the termini off of your tRNA molecules when they're initially made. There's the group II self-splicing intron that we'll talk about.
And now we know that the spliceosome, which as we'll see is the big machine that processes all of your splice sites with multiple turnover, also falls into this category, which shares a common enzymatic mechanism. We're going to talk about the catalytic mechanism of this class in great detail. Now I'm going to just run through the other classes that people like to think about. We're not going to talk about them further in this discussion.
There's another really big group of ribozymes, and they're all kind of tiny. Some of them are as small as 19 nucleotides and as large as 80 or so. These are often structured RNAs that are present in viruses, viroids, and other contexts. They are self-cleaving motifs that are conserved and fall into these families. They undergo a reaction that's actually pretty simple, in which a general base deprotonates the 2′-hydroxyl at the scissile linkage, the bond that needs to be cleaved, and that gives you a cyclic phosphate at the end and release of a 5′-hydroxyl. OK, that's these guys. So let's just say these guys have an easy task. They do a pretty easy reaction because, as you know, this is sort of the natural degradation reaction of RNA. As we'll see, the big ribozymes have a much more complicated task.
The other big member of the ribozyme family is the ribosome, in which RNA catalyzes peptidyl transfer. The heart of the ribosome is all RNA, and RNA is doing almost every aspect of the peptidyl transfer chemical reaction. And then something you may or may not know about is that there are DNA molecules that can catalyze reactions, especially RNA and DNA cleavage and ligation. These have been developed mostly in a test tube, in vitro, through selection methods. But actually, DNA can be a very good catalyst if you select for one that is targeted to a certain type of chemical bond. So those are sort of lurking in the background, and they're sort of artificial, but they're interesting.
OK, so having defined ribozymes as a whole, let's go back to the group that we're going to focus on today, these big ribozymes. The large phosphodiesterase ribozymes catalyze this particular reaction, in which an exogenous nucleophile does in-line attack on the phosphorus, giving you a trigonal bipyramidal intermediate. By exogenous I mean it's not this guy; it's a nucleophile that comes in as a separate molecule or from a separate region of the molecule. So you get this kind of architecture that will require subsequent protonation of the leaving group, and you'll get a 5′-phosphate linkage and a 3′-hydroxyl. Now, everybody in this family of big ribozymes catalyzes this kind of reaction. And when I say nucleophile, it's always some kind of alcohol: either a 3′-hydroxyl, a 2′-hydroxyl, or water.
OK, so you'll see here I've drawn two big red circles, and if you can see the print, it says magnesium. These reactions are almost always metal-catalyzed. This was discovered not through the structural biology done in my lab, although it was confirmed by that. It was initially demonstrated through chemical biology methods that were pioneered by a guy named Joe Piccirilli, who's at the University of Chicago. He basically proved the involvement of these metals in stabilization of the leaving group and of the nucleophile by doing this really creative experiment. It's just like a genetic experiment, but done chemically, and it's called a metal ion specificity switch. What he did, before we had any of the structures I'm about to show you, is that he substituted these potential ligands here, like these oxygens, with a soft ligand like sulfur. And he saw that when he did that, the reaction no longer proceeded in magnesium. Magnesium doesn't like to interact with sulfur.
But if he dropped a reasonable fraction of cadmium or zinc into the mix, he could rescue the reactivity up to wild-type levels. In that way, he established interactions between metals and the leaving group, the nucleophile, and, though I'm not showing you here, also some of the non-bridging phosphoryl oxygens. So this was a very important experiment, and it established that enzymes of this family proceed through a two-metal-ion mechanism. He did this with group II introns, which we'll talk about in detail, and with the spliceosome, which we'll also discuss. And this was done earlier by Scott Strobel and others, and by Joe, with group I introns, and as we'll see, also with RNase P.
Now, what I'm going to do right now is jump way ahead to very recent structural biology, which says that these predictions were correct. Here I'm showing you a zoom-in of the active site for a bunch of our favorite large ribozymes: group II introns, group I introns, RNase P. What's hard to see here is the RNA that presents the scissile linkage. But you'll see that in each of these cases, there are two metal ions that are four angstroms apart and in the ideal position to provide a two-metal-ion mechanism, much as you see in polymerases, Klenow fragment, and other protein enzymes. And in fact, for a parallel, I'm showing you here RNase H, which also presents two metal ions in the appropriate position to cleave the scissile bond. The reason I show you this is that these are RNA enzymes, and this is a protein enzyme, but they're all doing the same thing. They're all basically utilizing a scaffold with a different kind of chemical composition to accomplish the same task of RNA cleavage. OK, so this is the reaction that they all share. And now we're going to funnel down our attention a little more, and we're going to focus on group II introns and the spliceosome.
And the reason I'm going to do this is that these two types of large ribozyme are the ones that do pre-mRNA splicing, the splicing of messages that go on to make proteins. Group I introns are important. They're important in a variety of bacteria and certain primitive eukaryotes, and they're often important in ribosomal RNA splicing. But these guys are what we need to do pre-mRNA splicing, so we're going to focus on those.
OK, so a little bit more about these. Group II introns are autocatalytic in the sense that they form a structure that folds and presents a single special backbone linkage, and a single 2′-hydroxyl group that serves as the nucleophile in the first step of splicing. That releases the 3′-hydroxyl group of the 5′ exon. In the second step, that hydroxyl attacks over here at the 3′ splice site, ligating the exons and releasing the lariat intron. Now, we know that very similar products are produced when we undergo nuclear splicing with a big machine called the spliceosome, and this is what processes our RNAs in the eukaryotic nucleus. The spliceosome is a little more complicated, as you can see. It is a group of conserved RNAs, and not shown here is a host of auxiliary proteins that are also important in this process. Group II self-splicing is a unimolecular reaction, a one-shot reaction, whereas spliceosomal splicing occurs with multiple turnovers. The spliceosome actually acts as a true enzyme. It operates on a set of splice sites, and then it breaks up, reassembles, and acts on another set, OK?
So for a long time, because they had similar splice sites and similar products, it was thought that they might share an evolutionary heritage. But we all questioned whether or not that was true. It also implied that the heart of the spliceosome might be a ribozyme, much like the heart of the group II intron. This was all speculation, and I actually didn't really engage in it very much. I just thought group II introns were really interesting.
And I knew that they would have very cool structures, because they have lots of long-range interactions that I could see by cross-linking, and lots of other things that suggested that their shapes were really elaborate and different than anything we'd ever seen before. So I decided to focus on those and to be agnostic about the connection to the spliceosome. And anyway, I spent about 20 years studying these as enzymes, trying to figure out their structures, and more recently getting very high-resolution structures to figure out how they work.
So what's the story with group II introns? Once again, like any form of pre-mRNA splicing, they undergo the same two-step reaction I showed you before, where a specialized 2′-hydroxyl group at one point in the intron attacks. You get step one, and then release of the lariat is step two. Now, you'll notice there's a little more detail in this diagram. I'm showing you that the splicing reaction not only goes forward, it can go very well in reverse. This reverse reaction is an important thing to keep in mind, because after you excise a free group II intron, it's still a very reactive enzyme. It can attack pieces of RNA or DNA and insert itself back into them through the reverse reaction. And this becomes important when you see that group II introns have another life. They're not just self-splicing RNAs, they're retroelements, and they played a big role in the dispersal of introns throughout all of terrestrial genomes.
OK, so what do group II introns look like? Their secondary structures show them divided into about six domains, and this was largely worked out over many years. They do fall into three major families, which we'll see in a while, but those all show the following things. The first domain to be transcribed after the 5′ exon is domain one, which is the largest domain. Domain two plays a small role in the architecture. Domain three plays a moderate role.
Domain four does nothing for active site architecture, but it does encode an open reading frame. So stuck onto the periphery of this ribozyme, you can have a huge gene encoded here that gets translated, and the translated protein can come back and influence the splicing reaction. It also becomes the site of protein binding. Now, the most important part of a group II intron is domain five. It's also the only part that is highly conserved in sequence. So the thing that really confounded the people who initially discovered group II intron splicing, and these were very innovative people, was that it was impossible to learn about the architecture using just plain old genetics or by staring at sequence. There just wasn't enough conservation or phylogenetic covariation to understand them. Almost all the conservation was here and, as we'll see, in a linker right here. But it was very clear that after domain five, there was a domain six that contained a bulged adenosine that would attack the 5′ splice site, and you'd get the first step of splicing. This would then attack in the second step of splicing, and you'd have a lariat intron, much like you get from spliceosomal processing. OK, and these little sequences I've highlighted here are regions of the 5′ exon that are recognized through base pairing with regions of domain one. So these guys base pair, these guys base pair, and you can change them to anything you like and preserve the structure. OK, so that's how group II introns are organized overall.
Now, one more slide about what they do. I told you that they have a second life, and they do. You don't really want a free group II intron lurking around, because they do the following thing. After they're excised, and if their open reading frame is translated, the resulting protein comes back and forms a ribonucleoprotein particle called a group II intron holoenzyme.
So you have this RNA-protein complex between the group II intron and its encoded protein, which is called a maturase. This complex then does an amazing reaction. The intron can recognize DNA sequences that look a lot like its own recognition sites, and it can reverse splice into the sense strand of a homologous piece of DNA. Then the protein partner, which has a DNA endonuclease domain, can clip the antisense strand. And then, if that weren't enough amazing stuff for this protein to do, the biggest domain of this protein is a reverse transcriptase domain, which then comes back, uses this as a primer, and makes a complete DNA copy of the inserted RNA intron. So this is one of the ways that group II introns came to invade any genome that they entered and to proliferate within those genomes. There are certain organisms, Euglena being one of them, in which 30% of the genome is now group II intron. And that has had a big impact on how those organisms diversify and propagate. So anyway, that is one of the other kinds of things group II introns do.
OK, so we studied both the splicing and reverse splicing reactions of group II introns for many years, working out chemistry and enzymatic function. And we reached this point where we really couldn't get any further until we had high-resolution structural information on this system. The field was really at a standstill. If we wanted to really understand the chemical mechanism, how the RNA actually organized itself to be a ribozyme, to form an enzyme active site, we needed high-resolution structural information. So we needed to go from this, which is the secondary structure of the intron we were working on, which is from yeast mitochondria. We often call these secondary structures roadkill maps, because it basically looks like your RNA after you've run over it with a truck. It's completely flat, and so it definitely lacks functional information if you care about enzyme active sites.
If you add magnesium to the roadkill map, basically, it'll fold up into its native tertiary structure. And this is the tertiary structure model of this particular intron. It's very highly organized, and we spent the next 10 years or so studying how to investigate the function of tertiary architecture. But first, we needed to get high-resolution structural information. When I embarked on this, people said you'll never get a crystal structure, you should just give up now. So I actually, in parallel, did a lot of cross-linking, a lot of chemical probing, lots of other things, which was good, because that data enabled me to later validate the crystal structures I got. But nonetheless, we ignored all of that. We spent a lot of time looking for group II introns that would fold at very low magnesium concentrations and that would be very, very stable. We finally identified one in the genome of a eubacterium, Oceanobacillus iheyensis. After we transcribed this, it folded into a typical group II intron structure, and then we made many, many constructs.
I should say the first construct that we made of the O.i. intron, when we put it in crystallization drops, actually crystallized pretty readily. When we looked at the diffraction, it was at 20 angstroms. But we were very happy. We were at the synchrotron, and the staff member at the synchrotron said, this is the ugliest diffraction pattern I've ever seen. And we said, this is the most beautiful diffraction pattern I've ever seen. So we were OK, because your first iteration is just a start. When you want to solve an RNA crystal structure, you then actually engineer the architecture to get better packing and to improve resolution. And making lots of RNAs for structural biology is a lot easier than making lots of proteins. We don't have to overexpress the RNA itself. All we have to do is make a plasmid and use overexpressed T7 RNA polymerase to make as much of the RNA as we want.
And it never has to go in and be expressed in bacteria or yeast or anything. So we made 127 different constructs of this guy, and we varied the lengths of various stems, loop sequences, and other things, and just kept crystallizing them and looking for which ones diffracted to the highest resolution. Finally, number 87 diffracted to 3.1 angstroms resolution, and we've since done better. I'm going to just show you what you see when you actually finally manage to phase your data. This is what you see before you draw the model, and I've actually drawn in the model of the backbone to help your eye follow it. When you're doing RNA crystallography, your electron density map immediately jumps out at you as having a discrete RNA structure. So yeah, you can see these nice helices, and you can begin modeling pretty easily into this kind of data.
So we've solved many of these structures. Our best crystals diffracted to 2.7 angstroms resolution. This was phased with heavy-atom derivatives several times in several ways. We used iridium hexammine. We also used ytterbium initially. Ytterbium binds to the catalytic metal ion site, which is nice. Later on, as I'll tell you about, there are very key monovalent ion binding sites in this RNA, and they bind potassium selectively, and we can look at those by substituting in thallium or rubidium. We have very good R-free values. This led to a lot of new methods, because RNA crystallography is kind of still early. And fortunately, our structures agree with all the biochemical constraints that were kind of hard to come by when we were doing a lot of brute-force cross-linking on the architecture. So we're pretty confident that it was a meaningful set of structures. Here's what it looks like. This is Oceanobacillus iheyensis domains one through five, which for this intron is what we've always been able to get. Domain six is dynamic, and we never observed it.
Now, the color coding here. I told you that domain one, the 5′-most domain, folds first, and it's the largest domain. You can see here that it almost forms what looks like a scaffold into which all the other domains dock, and we'll see that that actually turns out to be true. The other domains fit into specific positions, the most important being the catalytic domain five, which contains the active site, as we'll see. And you can see, as this thing turns around, that the active site cleft has the 5′ splice site tucked into it, with its scissile phosphate. As we'll see later, there are metals that surround it and participate in the cleavage reaction. But we'll get to that in a minute.
To me, more than just the enzymology that we learned from the system, we began to learn a lot more about tertiary structural organization. What are the nuts and bolts by which RNAs hold themselves together through interactions that are not base pairing? This shows you just a few of the many examples that we've learned just from that one structure. We learned that big junctions in RNA can be stabilized by all kinds of unusual interactions. This is a really cool interaction in which a loop region of one RNA contains sort of a slot for a base that's provided by this whole other region of RNA, which comes in and intercalates that base into the structure. This is actually a conserved motif, and you can see that it kind of holds the thing together the way people used to make fine furniture, where you'd actually have a tongue-and-groove structure to it. Also, you see here stacks of three-stranded and four-stranded structures. What I'm not showing you here, because we don't really have time, is that there are lots of interactions that are mediated by the 2′-hydroxyl groups along the backbone. This is probably my favorite part. And Jim Keck was organizing his lecture around this today. This is reminiscent of what Linus Pauling thought about what happened in DNA.
So there are parts of this structure where loop regions interact with each other through a three-stranded motif that's pretty interesting. There are two helices that meld together through their internal loops in the following way. There's a duplex that comes up here, and it flips out one of its bases, and that base interacts with the third strand over here. The next base flips in and forms a canonical base pair. The next guy up interacts with the partner over here, giving a zigzag appearance and having this strand actually participate in a three-stranded interaction. So Linus wasn't totally wrong. You can have RNA that's inside out if you do it in small doses like this. Anyway, there were lots of really cool structural features, and I think that was what I was most excited about for a long time.
Now we can skip over about 10 years' worth of mechanistic work on the RNA folding pathway to just show you a movie that describes how the intron folds. We actually have done a lot of biophysical and kinetic work on this process. We also solved the structures of individual domains in isolation and watched how they changed when the intron assembled. So this is the crystal structure of just this domain in isolation, and it is scrunched together. To bind the rest of the intron, it has to open. So here's the closed conformation in isolation. When it opens, it is buttressed and trapped in the open state by domains two, three, and four, and that opens the space between these two, which then enables domain five to fit into the cleft and present itself as the active site. One of the things about this that's really interesting, and that we've found in common among many RNAs, is that it suggests a lot of complex RNAs have a first-come, first-fold strategy.
So the first thing that gets transcribed actually folds first and forms a scaffold that promotes the faithful assembly of downstream units. That matters because RNA can misfold and make a mess really easily, since its secondary structural units are kind of metastable. You can form many different possible stable RNA secondary structures. So hardwired into the transcriptional program of an RNA is often information about this folding pathway, and this really helps prevent misfolding. Group II introns rarely misfold.
OK, so we had learned a lot from this group II intron, which is from the most primitive form of group II intron, called class IIC, and it's the most ancient form. Since that time, new structures of even bigger ones have been reported. A more modern, larger, more developed class is class IIB, and a member of this class was recently solved by my former postdoc, Navtej Toor, in his own lab. You can see it's much larger, but the core architecture is identical to what we see in the IIC. And then, and this is a thing that will come up a lot in the talk, departing from crystallography, more recently Hong-Wei Wang's lab in Beijing solved the structure of a IIA intron in complex with its maturase protein, and that shows you the third class, the other very advanced class of group II intron. So now we know what all three of them look like. Only in this case do we know how the structure changes through the different stages of splicing, and I'll talk to you about that a little more. But basically, the field has moved along really quickly, so that we can sort of visualize the entire family of splicing machines and how it has diversified in time.
OK, so let's talk about determining the mechanism of chemical catalysis by this intron. Joe Piccirilli, as I said, did experiments about 10 years ago that suggested that this splices through a two-metal-ion mechanism.
And my very first structures on this actually showed that there were two metals that were four angstroms apart. But we still needed to take a very close look at this through our crystallographic investigations. So how is the active site actually organized? Let's take a step back and think about that. You can tell just from phylogenetic conservation that the most conserved part of the intron is domain five, but the small junction between domains two and three is actually equally conserved, and this puzzled me for a long time. What is going on with those nucleotides, that they would be just as conserved as the most important part of the intron? The answer came with the fact that they actually form one structural unit. These junction regions form a major-groove triple helix with the base of domain five, and then a loop nearby forms the top of that triple helix. Until this time, people didn't really think that many molecules could make triple helices in the major groove, because the major groove is not that accessible to a third strand. But it can occur, and it can do it over a short span.
Anyway, this platform then enables another part of domain five adjacent to it to adopt a backbone configuration in which the sugar-phosphate backbone curls very tightly upon itself. This is the tightest kink that's been seen in any RNA structure. And when you cram the phosphates together this closely by pulling the bases away, you create this intense electrostatic pocket with very strong electronegative density. It's very focused, and it binds two metal ions, and they're exactly four angstroms apart, OK? We observed these in the native map just from the electron density, and then we confirmed them through anomalous scattering of ytterbium and other metals that mimic magnesium at this position. So these suggest that the intron performs catalysis, as Joe suggested, through that type of mechanism, but we needed to be able to see it with an RNA substrate in the active site.
Here you have a five-prime splice site that crystallized in and attached to the intron, and you can see this is the scissile linkage. It's perched right above the two metals, in exactly the orientation that it should be for this kind of mechanism. So this was a while ago, and there were a couple of things that bothered me about this data, and that was the fact that in the neighborhood, like right around here and here, there was additional electron density, and we didn't have the resolution and we didn't really have the data to explain what it was. So we kind of went back to work. We built new constructs that would crystallize and diffract to higher resolution, and we got down to 2.7, 2.8 angstrom resolution, and we began to see density that was consistent, based on bond distances, with monovalent potassium. Potassium has been seen in RNA structures frequently. It's usually completely dehydrated. In other words, RNA atoms often interact directly with potassium, and these bond distances were appropriate for that. This was an interesting structure that we got, because for two-metal-ion mechanism enzymes, it's rare to get a structure that's empty, in which the metals are still there even when the substrate is not. So this was kind of unusual, and we were excited about that.
In crystallography, in contrast with cryo-EM, you can actually nail down the positions of metal ions by exchanging these weak scatterers with something that's a big heavy metal ion that will scatter strongly and, for anomalous scattering, will give you a strong signal. There is an ion that's exactly the same size as potassium, and that's thallium. It's also super toxic, so you've got to be careful. So we thought, well, does this thing splice in thallium? Another good one is rubidium. So I figured when I put together this talk, there was too much structure. There were not enough actual reactions, so I want to show you a splicing gel.
I'm going to show you first here the splicing of an O.i. intron in the presence of its typical ions, which are potassium and magnesium. This is the precursor. It hasn't spliced yet. This is basically the intron product and the ligated exon product. Now, group II introns, like many enzymes, do not function in sodium. Sodium poisons a lot of different enzymes, not just RNA enzymes. And that's because we're potassium- and magnesium-based. In our cells, the highest concentrations of ions really involve these two ions. So then I said, well, what happens when we replace this potassium with thallium? So we saw that it actually loves thallium. Not many things actually function better in thallium. This guy really liked thallium. He liked rubidium. And then we solved a whole bunch of new structures. We solved 14, up to about 20 additional structures of the same construct in these different ions and different ion combinations. So I'll just cut to this one, the thallium and magnesium case. When you then look at the difference map for anomalous scattering in that case, you see massive anomalous scattering in the thallium case, particularly at this site. And it's also pretty strong at this site. You also see it in rubidium, and to some extent in cesium as well. So what this did is it enabled us to unambiguously assign those positions of electron density right in the active site as being monovalent cations.
That was amazing, because it meant that this wasn't just a simple two-metal-ion mechanism. It meant that group II introns catalyze splicing using a heteronuclear metal ion cluster, much like more advanced protein enzymes. And I'll show you what I mean by that here. A metal ion cluster is a group of metals that share ligands and help to coordinate the binding site of the enzyme that they're in. And K1 is part of the cluster. It shares ligands with M1 and M2.
And in fact, what we see in structures that I don't have time to show you is that when you blow out this site, M1 and M2 can no longer bind. They're gone. You have empty introns. And so what K1 does is it helps to organize the ligands such that they will effectively bind M2. And M1 is additionally stabilized by these. And in one of our structures, we actually can see water undergoing in-line attack on the 5-prime splice site. And when you cleave that, in another structure, you can see that the cleaved product, which is a 5-prime phosphate, is then organized by K2. So in the first stage of the first step of splicing, you get in-line attack and the involvement of K1. And then K2 plays a role in managing the products after that first step and organizing them to get ready for the second step. So this was exciting, because it meant that all of these metals in this vicinity have a role to play in chemistry.
So up to this point, what did we learn? It's taught us that monovalents can play a major role in RNA chemistry. Potassium is really, really important in RNA structure and catalysis. K1 and K2 each play a distinct role. Group IIs are not simple two-metal-ion enzymes. They contain a heteronuclear cluster. And what I'm not showing you is that in protein enzymes that have two-metal-ion mechanisms, you'll often see lysine side chains in the same positions as my potassiums within the active site. So charges are placed at those same positions that do the same thing.
Okay, so what were the implications of any of this for evolution? Basically, our crystal structure told us that we had a set of tertiary interactions that made a triple helix between parts of the group II intron and domain five. You see that in multiple group II introns, and then we did a phylogenetic analysis of the U6 spliceosomal RNA. And we predicted and saw a similar covariation between part of the catalytic triad, which is a conserved part of U6, and the base of the stem that it makes with U2.
And then there's a similar bulge in U6 that binds metal ions. And part of that bulge we predicted to make the center of a triple interaction. So we made a prediction that the U6 spliceosomal RNA is going to have a substructure that's a lot like domain 5. And using this basic idea as sort of a roadmap, Joe Piccirilli's and Jon Staley's labs made metal ion specificity switches and mutational changes in the triple helices and basically showed that the exact same arrangement of metals and triple-helical bases is present in the spliceosome, okay? So through these two papers, they basically were able to demonstrate that the nuts and bolts of this active site are carried over into this much more advanced one. And there were already hints of this in early structural work by Sam and by Dave Brow and others who were looking at the U6 structure, not just in isolation but in complex with a variety of proteins, which suggested that it's going to have a very domain 5-like architecture. And this suggested, again, that there was a parity between those two systems. And then more recently, as we'll see, the spliceosomal architecture has been elucidated by cryo-electron microscopy. And in those structures you see the triple helix we elucidated, and you see the locations of the metals. So this suggested that in terms of evolutionary relatedness, group II introns are related to spliceosomes through their RNA components. The heart of the active site is very similar in both of them.
But what about the fact that group II introns use a maturase protein and spliceosomes are full of proteins? Is there a relationship between these two? So group II introns have protein partners. And as I said, they're encoded by domain four, and they come back and bind to their parent intron. And we already knew that maturases were sort of like a sophisticated form of a reverse transcriptase enzyme. There's a portion of the maturase that is basically a reverse transcriptase domain with fingers and palm.
And then an X domain that functions basically as a thumb. And what I'm showing you here is just the HIV RT domain. And so based on analysis of sequence, it was pretty clear that maturases had evolved from some kind of RT. This part is important for binding, this part for splicing. So you knew the thumb played a role in facilitating splicing and retrotransposition, but you didn't know how or why. The big question was, are there parts of the spliceosome that look like these maturase proteins? And so in this paper, these investigators actually were able to build an alignment of maturase proteins that line up nicely with a protein called Prp8, which is always right at the center of the action when spliceosomes undergo chemistry and catalysis. And you can see that there's really, really good conservation between these two, suggesting that part of Prp8 is a sort of reverse transcriptase-like domain.
Okay, so up to this point, nobody had been able to solve the structure of a group II intron maturase, or any of the related reverse transcriptases. Because they're also related to the reverse transcriptases that are important in L1 retrotransposons, so LINE elements and other things, which make up 10% of our genome. So this is an important family of reverse transcriptases, and they were uncharacterized. So because we failed to do the classical ones, we actually used bioinformatics with a bunch of different criteria, such as these, to identify ultra-stable maturase proteins from the database. And we came across two that are actually found in human gut bacteria, not in any kind of thermophile or anything else. And these guys crystallized to beautiful resolution. And I actually have to show it to you, because as you may have noticed, we've mostly been doing RNA crystallography, and in resolution we're really happy when it's in the high twos. So I haven't seen beautiful data much in my life.
So when these guys crystallized to 1.2 angstroms resolution, we were really thrilled to actually have nice data to work with. So I have to actually show you that data. Anyway, so when you solve the structure, sure enough, these guys are beautiful reverse transcriptase-like domains. They have a fingers-and-palm domain. And we also were able to thread on the L1 element and other non-LTR retrotransposon reverse transcriptases. So these represent a first-in-class type of structure. They also crystallize as dimers, which in our hands we find to be functional. So this dimer interface, we believe, plays a role in splicing and retrotransposition. We can talk about that later. These are fully active reverse transcriptases. The active site is quite canonical compared with other reverse transcriptases. And I'm just showing you this to be braggy. Like, here is an aromatic side chain with a hole in it. So that's cool, isn't it? But anyway, here's the catalytic metal ion in the RT. This is a very processive, very active reverse transcriptase. And it's even active, if not so processive, without its thumb. Here's the full length (my gel is a little curved). The construct that lacks the thumb is still an active reverse transcriptase, showing that the thumb is a processivity factor.
OK, so why was this result important? This result was important because when we then looked at structural homology, using a DALI search between our RT and anything else in the database that was an RT, the closest relative was the RT-like domain from Prp8 in the spliceosome, which had been solved earlier. In fact, other RTs, from HIV and other things, were very divergent. In fact, the only things that come close to these two guys are viral RNA-dependent RNA polymerases. So clearly, group II intron maturases and the sort of protein heart of the spliceosome are basically related. And those guys are not related to conventional reverse transcriptases, such as those from retroviruses.
They're related to RNA polymerases, which is kind of cool. OK, so I'm going to try to zoom through the rest of this so you guys don't fade away. Basically, this is just showing you the relatedness amongst the maturase, the RdRPs, and Prp8 by a DALI search. And so not only are the RNA components of group II introns related to spliceosomes, but the protein components are related as well. And how they fit together to catalyze splicing in a cell, that's the next question. And we now know that, actually. And we know that from the revolution in cryo-electron microscopy, because in that case, we can visualize an entire spliceosome. And this is the structure that was reported in Nature from Kiyoshi Nagai's lab (mostly because he made a beautiful PyMOL session, which you'll see in the next slide). And this is the group IIA intron, solved by cryo-EM in Hong-Wei Wang's lab. And you can see that the RNA active site here is in red. And it interacts very similarly with the RT domain in both. And it's hard to see this, because everything is so big here. But what I can tell you is that the most important part of the protein's interaction in both cases is that this thumb domain is interacting with the position where the 5-prime exon is recognized. So the 5-prime splice site helices are stabilized by the thumb domain in the group II intron RTs. You can see that in here, and I'm kind of pulling it out here. So here's domain 5, here's the 5-prime exon, and the elements that recognize that. So that is primarily how the maturase stimulates splicing and retrotransposition. Here again, I'm showing you similarities among the different RT domains and RTs that have been crystallized or solved by cryo-EM in all these systems. And basically what I want to show you here is that even in the spliceosome, this is the thumb domain from that RT-like protein. And you can see that it's interacting with the 5-prime splice site helices that are made with U5.
So this is a similar type of interaction to the one we see between the group II 5-prime splice site and its recognition site in the group II intron. So in both cases, the nuts and bolts of the proteins and the RNAs even interact with each other using similar strategies. And this is just more about how the active sites are similar. Comparing group II domain 5 and its 5-prime splice site with the spliceosome, they're very, very similar. They both have the triple helix. They both have the divalent metal ions, though we should talk about that. That's hard to nail down by cryo-EM yet. And again, here is the 5-prime exon. So it's the same location.
So what this boils down to is that all these different pre-mRNA splicing machines are basically undergoing catalysis using a related set of machinery. So these are related ribozymes, as it were. And all these splicing machines are metalloenzymes, and they use a very similar strategy, with the two divalent metal ions that are important for the cleavage and ligation reactions. This all means that RNA splicing is very ancient, and it originates from invasion by a transposon. Remember, group II introns are mobile genetic elements that came from a sort of parasitic element that initially inhabited bacteria and then was brought into eukaryotes during the early stages of eukaryotic evolution. So RNA splicing evolved from a unimolecular process, as in the group II introns, to something that proceeds with multiple turnover. And the spliceosome is nice. It's able to actually act on multiple splice sites. And so it's acting like a true enzyme, which gives it the capability to break that one-gene, one-protein barrier. And RNA splicing reactions are stimulated by an ancient family of proteins. And these RT-like proteins didn't originate from other RTs. They originated from viral RNA polymerases.
And so that means that in both cases, the eukaryotic RNA processing machine came about through the taming of parasitic genetic elements, in one case a transposon and in another case an RNA virus. And so from that, we got our RNA processing machine. It's a pretty good deal. And so I'll finish with that.
And I will thank the grad student who did almost all the new work that I talked about today, Chen Zhao, who's now a student in my lab. Marco Marcia, whose picture appeared earlier and who's a group leader now at the EMBL, did the work on the metal ion interactions and the anomalous scattering. We're also indebted to Navtej Toor, who was the pioneer in group II intron crystallography, starting in my lab, over at the ROHA, to Srinivas Somarowthu, who's now a PI at Drexel, and Joe Lieberman. We're really indebted to the teams at NE-CAT at the APS, HHMI, and NIH. And then if you want to see more about what we do, just go to my website, which we try to keep current. And then I also just want to thank the whole lab. This is on our recent kayaking trip. Go right off of the 1.9% of our own water lab. And thanks to you for your attention.
Do the maturases ever decrease the requirement for the accessory ions, sort of helping the RNA fold? Do you think there's always a need for those accessory potassiums that are so important? You did activity with the maturase.
We're not really seeing that much change in the ionic requirements in vitro. So that's probably alleviated by other proteins in the cell. But what we have seen that's pretty interesting, and which I didn't really go into at all, is that in the absence of maturase, group II introns in a test tube tend to splice using a hydrolysis reaction. They use water in the first step of splicing instead of the branching reaction. When you throw in the maturase, it's entirely through branching and lariat formation. And it's like 100 times faster. So, yeah, we haven't published that yet, but it's really exciting.
I mean, it's a vast stimulation. But still, the chemistry is done by the RNA. Really, I think the maturase is contributing just architectural organization, bringing the atoms together so they work better.
How did you pick out stable maturases that you thought would be more likely to crystallize in the absence of structure?
Yeah, you know, there are just lots of programs that you can use to grind your sequences through, to look for short regions in which the protein is likely to just be natively unfolded. The isoelectric point makes certain contributions. There are just many criteria that are indicative of a more stable protein that you can apply iteratively to sequences. And, just like the Final Four, Chen basically just narrowed it down. She had a few winners, like three winners. These were the ones that we went after first. But what's exciting to us, and which I didn't talk about, is that this is not a thermophilic protein, but it is really stable. And it's a fantastic protein. In fact, we're going to actually distribute it so you can do end-to-end RNA sequencing. Group II intron RTs are nice because they're used to plowing through structured RNA. So this guy that I described today will copy the entire hepatitis C virus genome, which is 10,000 bases, very quickly, end to end, with very few errors. So they're useful enzymes. And so we're actually developing them for study, especially of alternative splicing.
The potassium ions, for example, reminded me of Soo-Chen Cheng's work, where she was able to use different salt conditions to push the reaction forward or in reverse, that kind of thing. So I was wondering two things. One, do the different salt conditions change the forward versus reverse reaction? And two, do you think there may be potassium sites in the spliceosome, or could those be occupied by protein?
Yeah. Okay, so this is where I should interject.
Well, I'm gonna come back to the point about Soo-Chen Cheng and the reversibility, but I should interject something about this spliceosome structure. You can't do anomalous scattering or related experiments yet with cryo-electron microscopy. So it's very hard to unambiguously establish the sites of metals in a cryo-EM structure. So if you see a cryo-EM structure with all those metals, be very, very afraid. And, actually, we'd like to change that. There are probably some ways to do that. So we don't really know that well. The other important thing about cryo-EM structures is that these are done on native RNAs that are isolated from an organism. The RNAs have not been transcribed in a test tube and refolded like the ones we've done with crystallography. Native RNAs have lots of modifications on them. So people are modeling metals where there may be an RNA modification. So we still really have trouble interpreting the existing maps of the spliceosome to know where the metals are. But based on not just RNase H and BamHI but lots of other very well-studied protein enzymes, I think there'll probably be amino acid side chains, positively charged side chains, that are playing a role like in the K1 case.
And you're right. I mean, Soo-Chen Cheng's results were fantastic, because they basically suggested to us that the spliceosome has some mechanism for control by these auxiliary metal sites. I haven't seen... So it's hard to drive splicing in reverse unless you have a large amount of product to drive it in that direction. Also, remember, as soon as you splice, let's imagine you reverse splice into a product, you're not going to splice out, because the polymerase is gonna trap it by copying it. So there are, in the group II system, lots of systems for trapping you in the forward direction or the reverse direction, and the same is true in the spliceosome. Because, as you know, the spliceosome's reactions can go in reverse too.
And yet reversibility, we think, may be a mechanism for proofreading in both systems, up to the point where we trap it. So it may be useful.