Okay, what do we have? Four classes left. All right, so a little bit of a job opportunity up here. If you're interested, let me know. Had to take advantage. All right, so we covered replication last time. This time we're going to be covering transcription, and you'll see that there are similarities and differences. Of course, we're going to be making RNA molecules from a DNA template. Many of you have had various classes that talk about transcription. Hopefully there's something new in today's lecture to keep you engaged. Okay, so we're going to first look at the process of transcription, and then we're going to look in a little more detail at the various ways RNA molecules are processed. Many of you have thought about splicing before. We know eukaryotic RNAs are 5 prime capped and 3 prime polyadenylated. And we know that many non-coding RNAs are also processed, so one long initial transcript is cut into smaller pieces. Okay, so we have a variety of different RNA molecules. You have so-called coding RNAs or protein-coding RNAs, the mRNAs, but then there's a variety of non-protein-coding RNAs. Many of you are familiar with ribosomes. Actually, the catalytic function of ribosomes is carried out by RNA molecules. Transfer RNAs are also important in translation. Small nuclear RNAs are important in splicing, and we'll see that in some detail today. We're not going to cover in today's lecture the variety of naturally occurring inhibitory RNA molecules, the small interfering RNAs and microRNAs, but we'll cover those briefly, I believe, next Tuesday. Okay, so we've covered replication.
You know, in replication, you're actually copying both strands of DNA. In transcription, you have to select one of the two strands and transcribe only that strand. Okay, and so it's important: the process of initiation is how the cell decides which strand to transcribe. And so we have a template strand and a non-template strand in transcription. It's going to take some getting used to. So we talked about DNA polymerases when we talked about replication last time, and we also mentioned that DNA polymerases always require primers. Remember, we had this primase enzyme in replication that laid down an RNA primer that initiated DNA polymerization, whereas RNA polymerases, which we're going to be talking about today, do not need a primer. They can initiate synthesis of RNA molecules so-called de novo, just starting from ribonucleotides. Okay, and so both of them require a DNA template. We're only copying one of the strands in transcription. We have a 5 prime to 3 prime polymerase activity, so the directionality is the same. We're forming the same bonds; the actual chemistry catalyzed by these two different enzymes is the same. We're making a 5 prime to 3 prime phosphodiester bond. We need nucleoside triphosphates. Here we have ribonucleoside triphosphates instead of deoxyribonucleoside triphosphates. Now, DNA polymerases, as we mentioned, had that second active site, so if the wrong base was incorporated, the DNA polymerase would pause and shift to the other active site, where the 3 prime to 5 prime exonuclease activity would cut out that base. In RNA polymerases, we don't have this proofreading capability. Can you understand, or can you guess why that might be? Why we wouldn't need an exonuclease activity? I mean, don't we want to make the right protein? Does anybody have any ideas about that? Yes. That's true.
There's this wobble phenomenon that we'll talk about, but also, RNA molecules are turned over, so their lifetime is very short: in prokaryotic cells, minutes to hours, and in eukaryotic cells, hours to days. So if we make a mistake, we're only going to make a mistake in the synthesis of that one RNA molecule, and that won't be propagated to our progeny. The consequences of mistakes are less severe because of RNA turnover and because we're really using this to translate, not to propagate genetic information. So we just don't need it. There will be some RNAs that have mistakes, but when you express a particular gene, you make maybe hundreds to thousands of RNA molecules, and each of those is translated in parallel. So if one of those thousand has a mistake, the vast majority of the protein produced won't have that mistake, because the majority of the RNA molecules will be of the right sequence. Okay, so that's a good point. So the synthesis that occurs is, again, 5-prime to 3-prime phosphodiester bond formation, and it's important to realize that the first nucleoside triphosphate that's incorporated maintains its triphosphate. Okay, so we're going to take this position here, the 3-prime position: the 3-prime hydroxyl is going to react with this phosphate. Remember, we're going to lose pyrophosphate, pushing this reaction forward, and we're going to extend the chain in the 5-prime to 3-prime direction. And so this is de novo, and that word means that you don't need a primer. So the chemistry is exactly the same here. Base pairing directs the incorporation of monomers into the polymer of RNA: first we have the positioning of the ribonucleoside triphosphate, and then the nucleophilic attack of the 3-prime hydroxyl on the 5-prime phosphate, kicking out pyrophosphate and making the new phosphodiester bond.
So this is exactly the same, except instead of T's, we'll be incorporating U's in the growing RNA polymer. So we have two strands. In DNA replication, we're copying both strands. Remember, we have that leading and lagging strand synthesis. Here, we have a so-called non-template or coding strand, and then we have the DNA template strand. So the sequence of the RNA polymer is directed by the template strand. This is anti-parallel base pairing, and we're synthesizing in the 5-prime to 3-prime direction. The non-template coding strand, we're not base pairing with that, but if you look, we have the exact same bases in there, except for the substitution of U in our RNA molecule for the T's that are in the DNA molecule. So it's important to recognize and learn how to refer to each of these two strands. Okay, so this is the RNA polymerase. Here we have RNA polymerase bound to our duplex DNA, and we need to unwind it. Initially, the DNA is all wound up into a helix, and if we want to transcribe this, we need to unwind the helix. And in prokaryotic RNA polymerases, there is not an associated ATP-hydrolyzing helicase. So it's just the action of adding bases here that pushes this forward and unwinds the DNA. And so we're going to need topoisomerases both in front of and behind this transcription bubble. Okay, and so as we push this forward, we're going to tighten this up, making positive supercoils, and topoisomerase can loosen that up. And then as the strands come back together behind us, we're going to be a bit underwound. We're going to have negative supercoils. And so we want to have just the right amount of supercoiling after we transcribe our gene. Okay, and so here is the RNA molecule. There are about eight hybrid base pairs, RNA-DNA base pairs, within this transcription bubble.
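To make the template and coding strand relationship described above concrete, here is a minimal Python sketch. The function names and the example sequence are made up for illustration and are not from the lecture. It only shows that base pairing against the template strand yields an RNA that reads like the coding strand with U in place of T.

```python
# Pairing between a DNA template base and the RNA base it directs.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Pairing between the two DNA strands.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA, written 5'->3', made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


def coding_strand(template_3to5: str) -> str:
    """Return the non-template (coding) strand, written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in template_3to5)


# Hypothetical template strand, written 3'->5' so it reads alongside the RNA.
template = "TACGGATTC"
rna = transcribe(template)
coding = coding_strand(template)

print(rna)     # AUGCCUAAG
print(coding)  # ATGCCTAAG
```

Writing the template 3'->5' keeps the anti-parallel pairing implicit, so the RNA and the coding strand can be compared position by position.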
And so we have two channels. Incoming ribonucleoside triphosphates come in here. They're added to the three prime end of this RNA polymer. And then the five prime end of the RNA molecule is extruded, sort of like a tube of toothpaste, and it's squeezed out of the backside of this polymerase. Okay, and this is extremely fast. During the elongation phase of RNA polymerization, about 100 nucleotides are added per second. So it's about 100 hertz, it's very, very fast. And so as we talked about with replication, there's a variety of stages. We have initiation, elongation, and termination. And we're going to be comparing and contrasting prokaryotic and eukaryotic transcription. We're going to start with prokaryotic transcription. The initiation of transcription in prokaryotes occurs at the promoter. And here are the sequences of the promoter. The beginning of the de novo synthesis of the RNA molecule occurs at this plus one position, the start site. But upstream of that, we have two different regions, or actually three. And these regions actually interact with the transcriptional machinery. So we have this sequence, T-A-T-A-A-T. Not coincidentally, the bases that are being chosen are going to help us pull these two strands apart, because we need to be able to open up a transcription bubble. And so here we also have another sequence. And then up here we have a so-called UP element. So these are called the minus 10 region, the minus 35 region, and the UP element. And this is a consensus sequence. This is actually for E. coli. And different sigma subunits of RNA polymerase bind to different consensus sequences. So we're going to learn about sigma subunits. The sigma subunit is the actual polypeptide that binds to the promoter. And so sigma 70 binds to this particular sequence. The closer a given gene's promoter is to the consensus sequence, the tighter the binding of the sigma factor.
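The idea that a promoter closer to the consensus binds sigma 70 more tightly can be sketched as a simple match count. This is only an illustration. The minus 10 hexamer TATAAT is the one read out above; the minus 35 hexamer TTGACA is the usual textbook sigma 70 consensus and is an assumption here, since it is not spelled out in the lecture. The example promoter and the scoring function are hypothetical, and real binding affinity depends on much more than a match count.

```python
CONSENSUS_MINUS_10 = "TATAAT"  # read out in the lecture
CONSENSUS_MINUS_35 = "TTGACA"  # assumption: textbook sigma 70 consensus


def matches(site: str, consensus: str) -> int:
    """Count positions where a promoter hexamer agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def promoter_score(minus_35: str, minus_10: str) -> int:
    """Total matches to both hexamers; 12 means a perfect consensus promoter."""
    return matches(minus_35, CONSENSUS_MINUS_35) + matches(minus_10, CONSENSUS_MINUS_10)


# A perfect consensus promoter and a hypothetical weaker one.
print(promoter_score("TTGACA", "TATAAT"))  # 12
print(promoter_score("TTTACA", "TATGTT"))  # 9
```

A higher score here stands in for "closer to consensus"; it is a rough proxy, not a measured binding constant.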
So in prokaryotic transcription, it's all about making very tight binding of the RNA polymerase to the actual DNA. And so here's a variety of different genes. We're going to be looking at these in detail next Tuesday. And you can see in general, they're very close in sequence to the consensus sequence, so they make relatively tight binding to the sigma subunit. Does this make sense so far? You might have had much of this. I apologize. All right. So this figure is a bit confusing. This is the initiation and the elongation steps of RNA polymerization in prokaryotes. And the critical point here is that all of the factors come together in the RNA polymerase before the RNA polymerase lands on the promoter. So the sigma 70 subunit and the catalytic core, the alpha 2, beta, beta prime, omega subunits, bind together. And then, and only then, can this RNA polymerase holoenzyme bind to those particular consensus sequences in the promoter. The binding to the promoter positions the RNA polymerase for initiation at the correct location, right? And so here, it's a stepwise process. First, the proteins associate. Then they bind to the DNA in a so-called closed complex. This is before we've melted the two strands, or dissociated the base pairing between the two strands of DNA. The orientation of this binding is determined by the promoter. So we can actually transcribe to the right or to the left, depending on which strand we're selecting, and the way you select the proper strand is by looking for those consensus sequences. Only in one direction would you have the consensus sequence. So initially the holoenzyme binds to the promoter in a closed complex. We then melt the two strands apart, separating these into a bubble, forming the open complex. And then, and only then, after forming the open complex, can we initiate transcription, okay?
And so once transcription has started, we leave the promoter and we jettison the sigma 70 subunit. Okay, so now we just have the catalytic core, and this jettison action is pushed along by competitive binding of a protein called NusA. So NusA binds where sigma 70 used to bind, kicking out sigma 70. And the whole name of the game here is don't stutter. If you slow down during transcription, the RNA polymerase will just fall off. So the NusA factor helps the RNA polymerase to be more processive, okay? And so we then extend the RNA molecule until we come to the end, and then we release NusA and our nascent RNA, and we reuse the catalytic core. Okay, and so this is a cyclic process. Any questions on that so far? Yes? Ah, it's the termination. So there are two different ways to terminate, but when those terminations occur, that causes NusA to fall off and causes the RNA polymerase to fall off the DNA. So we'll get to that in a moment. That's a great question. Another question real quick. Can you explain again what the consensus sequence is? Ah, this is the critical point. I'm glad you asked. So this is the consensus sequence. These are the three segments of sequence. The consensus sequence, you can think of that as the prototypical sequence that would make the tightest binding to sigma 70. So this is not necessarily a particular gene. It could be a particular gene, but here's a list of actual real-world genes, and each of these is pretty darn close to that consensus sequence, but it's not necessarily exactly the same sequence. If it were, it would make the tightest binding possible. And so that's what's helping us to initiate the transcription. Any other questions? Alex? Good. Okay, so moving along. So we're elongating, so we're adding ribonucleoside triphosphates and extending this chain until we get to the end. But how do you know where the end is? There's got to be some signal from the DNA that says, stop, we don't need any more.
I mean, otherwise it would just keep going forever. You'd make massive RNA molecules. And so there are two different mechanisms for termination of transcription. These are exclusively prokaryotic termination of transcription. You could add that to your slide. So there's so-called rho-independent and rho-dependent. Rho is a helicase protein, and we're going to first cover the termination mechanism that's independent of this helicase. Okay, and so in rho-independent termination, you're going along fine until you get to a particular sequence in the DNA, this adenosine-rich region. And when you get to that region, that causes two things. So there are two different forces that are causing the end of transcription here. One, the weak base pairing. Remember, with AU you only have two hydrogen bonds, compared to three for GC. Number two, a secondary structure forms in the nascent RNA molecule. So you have a stem-loop structure. And remember I mentioned there's about eight nucleotides of DNA-RNA hybrid duplex within the transcription bubble. What this secondary structure does is help to pry those base pairs apart. So you're exchanging RNA-RNA base pairs for the RNA-DNA base pairs in this hybrid duplex that's normally found in the transcription bubble. For every RNA base pair that you form in the stem loop, you have one less DNA-RNA base pair, until you get to a critical amount of base pairing where it's just too unstable to stay together, and everything dissociates at that point. So this is referred to as rho-independent. Again, two requirements: weak base pairing, AU base pairing, and a stem-loop structure that unzips the RNA-DNA hybrid duplex. Okay, does that make sense? So that's rho-independent. And it's just the sequence of the actual DNA that is causing that. We don't have a protein coming in there. Now rho is a helicase. And so another way to terminate transcription is by particular sequences, the so-called rut sequences.
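The two requirements for rho-independent termination just listed, a stem loop in the nascent RNA followed by a run of weak AU pairing, can be turned into a toy sequence scan. This is a rough sketch under stated assumptions and not a real terminator predictor: the function name, the stem length, the loop sizes, the length of the U run, and the example sequences are all hypothetical choices made for illustration.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that would base pair with rna in a hairpin stem."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))


def find_intrinsic_terminator(rna, stem=5, min_loop=3, max_loop=8, u_run=4):
    """Return the start index of a stem, loop, stem, U-run pattern, or None.

    Requires a perfectly paired stem of `stem` bases, a loop of
    min_loop..max_loop bases, and `u_run` uridines right after the stem.
    """
    for start in range(len(rna) - stem + 1):
        left = rna[start:start + stem]
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem + loop
            right = rna[right_start:right_start + stem]
            tail = rna[right_start + stem:right_start + stem + u_run]
            if right == reverse_complement(left) and tail == "U" * u_run:
                return start
    return None


# Hypothetical transcript 3' ends: hairpin plus U run, and hairpin alone.
with_terminator = "CCGCCGCUUCGGCGGCUUUUUU"
hairpin_only = "CCGCCGCUUCGGCGGCAAAA"

print(find_intrinsic_terminator(with_terminator) is not None)  # True
print(find_intrinsic_terminator(hairpin_only) is not None)     # False
```

The second example contains the same hairpin but no U run, so it is not flagged, which mirrors the point that both features are needed.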
So the rut sequence initiates the binding of the helicase to the RNA molecule. Our transcription is going in this direction. We're synthesizing in the five prime to three prime direction. And this helicase just snakes its way up the RNA molecule until it gets to the RNA-DNA duplex, and then it unwinds it in an ATP-dependent process. Okay, so this is an active unwinding. So there isn't necessarily a requirement for AU base pairing there. And there isn't a stem loop. Okay, so there are two different ways, and this is prokaryotes only. So E. coli and other similar organisms. Any questions? Okay. So prokaryotic genes are organized differently than eukaryotic genes. Many of you know that a single promoter often drives multiple structural genes. And so here you have one promoter, and each of these genes encodes a different polypeptide. Okay, so you have gene A, B, and C. Oftentimes in prokaryotes, the genes for the enzymes that catalyze a metabolic pathway are all expressed as a unit. Because generally, you know, if you want a pathway to be active, you're going to need each of the enzymes there. So this is a so-called polycistronic message. So we initiate transcription. We have a promoter. We have regulatory sequences, and in general, in prokaryotic transcription, you have repression as the major mechanism of regulation. It's the complementarity of the sigma subunit binding to the consensus sequence that provides tight binding. And then you regulate generally by turning off, although there can also be activator sequences. And generally, these regulatory sequences are either within the promoter or upstream of the promoter. So this is a polycistronic message. If you're going to have more than one functional gene within your message, you're going to need a way to internally initiate translation, right? So ribosomes have to land right in front of each of these genes and translate just a single polypeptide.
We don't want to make just one long polypeptide. We'll look at how that works on Thursday. So eukaryotic genes are not arranged in these operons where you have multiple functional genes. They're monocistronic. So you have one promoter and then you have one functional gene. And the regulation of eukaryotic genes is much, much more complex than that of prokaryotic genes. In general, in eukaryotic genes it's the activation of transcription that matters. The transcriptional machinery binds pretty weakly to the promoter, and there are activating sequences. They can be way upstream. They can be downstream of the gene. They can be within introns of the gene. So it's very, very complex for eukaryotic transcription. So I think that's it for prokaryotes. I have a clicker question. Are you ready? Everybody submit your vote. So there are really two possibilities here that are plausible, right? All right, so let's look at this. We initiate transcription, and then the promoter is cleared. Three, four. Okay, so it is B. Some of you said A, where you've swapped these two steps. We shall move on now. All right. So now we're going to switch gears and think about eukaryotic transcription. And of course, things are much more complicated. We have not just one RNA polymerase, but more than one. RNA polymerase II is the polymerase that's going to make our messenger RNAs, which will eventually be translated into proteins. But the non-coding RNAs are made by polymerase I and polymerase III. Ribosomal RNAs are mostly made by polymerase I, and here's one ribosomal RNA that is made by polymerase III. Transfer RNAs are generally made by polymerase III. And then we have these small nuclear RNAs.
These are important in splicing, and they don't code for proteins, but they are made by RNA polymerase II. So as you might imagine, things start simple with the prokaryotes and get a lot more complicated. As I mentioned, in general, there's weak binding of the transcriptional machinery to the eukaryotic promoter. Just as in the prokaryotic promoter, we need to initiate de novo RNA synthesis at a very particular spot, at this initiator position. But then we have a variety of regulatory sequences in the promoter, such as the so-called TATA box, which you can pronounce the corny way or the less corny way. And that binds to a TATA-box-binding protein, TBP. I'm corny, so I get to say it that way. But in general, there's very low affinity. And so activation of transcription is going to be important here. We can have regulatory sequences upstream, five prime to the gene. We can have them within the gene, in introns. And we can have regulatory sequences downstream, or three prime to the gene. And so here's the process. This figure, they've updated it this year, and I think they didn't necessarily make it simpler. There's one important point here, which is that in eukaryotic transcription, you assemble the RNA polymerase complex piece by piece after binding to the DNA. Remember, in prokaryotic transcription, you assemble everything first. You take the sigma subunit and the catalytic subunits, bind them together, and all at once they bind to the DNA. Here, it's a very stepwise process. The TATA-binding protein binds first, and then, and only then, can each of these additional transcription factors bind. So the nomenclature here is TFII, transcription factor II, meaning that it's associated with RNA polymerase II, followed by a variety of letters. The letters reflect the order in which these were biochemically characterized, so the lettering itself is just something to memorize. So we have TFIIB, which comes in after TATA-binding protein binds to the TATA box.
And that TFIIB helps to recruit RNA polymerase II. This is drawn as just one sort of squiggly sausage-looking thing, but there are actually, I think, about 12 polypeptides within that RNA polymerase itself. There's a table coming up in the next slide. So that comes on. And then, and only then, do you recruit TFIIE. So TFIIE is the link between the catalytic part of the RNA polymerase, Pol II, and the helicase that's important in unwinding. Conveniently, TFIIH is the helicase. H for helicase, it just worked out that way. OK, so now as we assemble these step by step, one piece on the other, we eventually form this pre-initiation complex. But then at this point, it's very similar. Initially, the DNA is not melted. We have just full duplex DNA. It's called the closed complex. We then open or unwind the DNA, and that occurs with the help of the helicase. And then we cast off some of the transcriptional machinery. Remember, we cast off sigma 70 in prokaryotic transcription. Here we're casting off a variety of different proteins and leaving ourselves primarily with just the RNA polymerase itself. So we open the duplex. We start the synthesis. We initiate transcription. The RNA molecule starts to extrude through the exit hole. But now we have a gauntlet that's set up. There's this C-terminal domain of one of the polypeptides within the RNA polymerase that has a bunch of serines, threonines, and tyrosines. And those become hyperphosphorylated once we've cleared the promoter. OK, and so TFIIH, I believe, can phosphorylate these, and one of the elongation factors also phosphorylates this tail. And the phosphorylation of that tail in the RNA polymerase itself helps to recruit a gauntlet of post-transcriptional modification machinery. When we're doing transcription in eukaryotes, we're in the nucleus. So we need a five prime cap. That machinery is recruited there.
We also need splicing machinery. That's recruited there. And then the polyadenylation enzymes are also recruited there. And as the RNA is extruded out of the polymerase, it passes immediately by this enzymatic activity. It's an assembly line waiting for the nascent RNA. So RNA is extruded. We elongate. There's a variety of factors involved in elongation. And then at the end, we terminate. And the process of termination is actually the addition of the poly-A tail. The enzymes that add the poly-A tail are also associated with an endonuclease that cuts the RNA. And that's how we're going to terminate. We'll see that in a moment. So we have phosphorylation of the C-terminal domain of RNA polymerase II. During elongation, we have phosphodiester bond formation. And then we clear the promoter, leaving behind a bunch of these initial proteins that we assembled. This is initiation of eukaryotic transcription. Any questions on that? So it's a little confusing the way they've drawn this. It sort of looks ambiguous. So TBP binds DNA, then this binds, and then this binds. We're not preassembling. And there was a question in the back. Maybe I forgot that we had a question. Going back to rho-dependent termination, how much ATP is used? That is a fantastic question. I do not know the answer. So it's an interesting question: for each base pair that's melted, how much ATP needs to be hydrolyzed? You could conceive of an in vitro experiment to get at that, but it's not known. It's going to be lots of ATP, because the helicase is actually whizzing through the sequence, unwinding things, especially at the DNA-RNA duplex. A little bit of a side tangent, but a wonderful question. So we're going to move on. So we've initiated, and look at all this insane amount of complexity. Pol II alone has 12 different polypeptides, 12 different functional genes whose products assemble to make an RNA polymerase II.
And these are all the transcription factors. Remember, stepwise assembly at the promoter, starting with TATA-binding protein, TBP. TFIIH has the helicase activity, and it can also phosphorylate that C-terminal domain. And one of these elongation factors that is eventually recruited also phosphorylates the C-terminal domain. Remember, that phosphorylation initiates the binding of the gauntlet of activities that are important in post-transcriptional processing of the RNA molecule. So this is a handy-dandy slide for when you're studying, to remember all this stuff. All right. So we've looked at RNA polymerases today and DNA polymerases last time. But viruses can do some funky things. Some viruses have an RNA genome, and some viruses can make copies of that RNA genome using replicases. Other viruses can convert their RNA into DNA using reverse transcriptases. And where did we see a reverse transcriptase before? Someone help me out. Telomerase. Thank you for reminding me. I'm getting old. So there we had the RNA molecule. Telomerase is a ribonucleoprotein, or RNP, so an RNA associated with protein, and that RNA within the telomerase helped to template the addition of those repeating units at the ends of the linear chromosome. But viruses can do this as well. And so here is a retrovirus. For example, HIV, the causative agent of AIDS, is a retrovirus. We have an RNA genome in here, and this RNA genome, along with a variety of different proteins, is encapsulated in this viral capsid. The virus then fuses with the cellular membrane, releasing both the RNA from the virus and the enzymes that are important. And one of those enzymes that is released is reverse transcriptase.
So the reverse transcriptase takes our single-stranded RNA molecule and converts it into a duplex complementary DNA molecule. And that DNA is then integrated into the host genome using integrase enzymes, which are also packaged in this virus. Once we've integrated this viral genome into the host genome, we can then transcribe it, translate it, and make new proteins, all the proteins that are necessary for the capsid, as well as our integrase and our reverse transcriptase. And there's also a protease, which is important in clipping up the proteins that are made and used in the synthesis of the capsid. The mechanism of action of this reverse transcriptase involves something like eight steps. And if I showed it to you, I'd have to remember it, and you'd have to remember it. So I'm going to give you the highlight reel of what's going on here. But one thing is obvious. We don't want reverse transcription until we're in the host cell. So there has to be some signal from the host cell to initiate this reverse transcription. Okay, so in this virus particle, we have an RNA molecule. And once we're in the cell, we need a signal from the host cell to start our reverse transcription. As it turns out, there's a sequence within this viral RNA genome that is complementary to a cellular host transfer RNA. And so the binding of that transfer RNA to the viral genome primes DNA synthesis. Remember, DNA synthesis is never de novo. It always starts with a primer. So in this case, once we're in the host cell, we're going to have host tRNAs, and those host tRNAs can base pair to this viral RNA, and that initiates the reverse transcription. So you first make an RNA-DNA hybrid duplex, where you have one strand of DNA and one strand of RNA. We then take this double-stranded hybrid molecule and chew away the RNA, leaving a single-stranded DNA molecule.
And one other cool thing that this virus does is there's a region of self-complementarity at the three-prime end of the DNA. In general, there isn't much secondary structure in DNA. We think of that mostly in the context of RNA. But in this case, we need something to prime second-strand synthesis. And so there's a little bit of self-complementarity here, and that primes the polymerization of the second strand. So we end up with our double-stranded DNA. But there's a lot of hopping, so the tRNA binds at one side, extends, and then hops to the other side and does all kinds of fancy stuff that we're not going to have to remember. So cellular tRNAs initiate first-strand synthesis, and self-complementarity primes second-strand synthesis. Those are the things that are important to remember. Any questions? So we haven't really covered replicases at all, but we have now covered reverse transcriptases. Okay, so up to this point, we've synthesized an RNA, and we've terminated prokaryotic transcription. I haven't yet shown you how we terminate eukaryotic transcription, but that comes under the heading of post-transcriptional processing of the RNA molecule. The eukaryotic RNA is capped. We put a safety hat on the 5-prime end of the RNA molecule, and the 3-prime end is polyadenylated. There are a lot of introns. There are actually introns in both prokaryotic and eukaryotic genomes. In general, the introns in prokaryotic genomes are only within the non-coding RNAs, like the ribosomal RNAs, for example. But in eukaryotic RNAs, there are all kinds of introns within our protein-coding genes, and those need to be spliced out. Okay, and so we're going to look at splicing. There are lots of different mechanisms for that, and then we're going to look at how ribosomal RNAs and transfer RNAs are processed. What defines a prokaryote is the absence of a nucleus. So in the prokaryote, you have the possibility of co-transcriptional translation.
In other words, as you extrude your nascent RNA molecule out of the RNA polymerase, there's nothing to stop the ribosomes from coming in, binding that nascent RNA, and beginning to synthesize a new polypeptide, because transcription and translation occur in the same compartment in a prokaryote. A eukaryote has a barrier between these two processes. Transcription occurs in the nucleus, whereas translation occurs in the cytosol. So in eukaryotic transcription, we need to protect the RNA molecule. We need to provide signals on the RNA molecule itself for when it's time to export the RNA from the nucleus into the cytosol. So the 5' cap and the poly-A tail help to provide signals to pore structures within the nuclear envelope that allow the RNA to be snaked out. And so in eukaryotes, transcription and translation cannot occur at the same time. But in prokaryotes, where you can have co-transcriptional translation, you can actually use this co-transcriptional translation as just a beautiful regulatory mechanism. We'll see that in what's called the trp operon, coming up next Tuesday. Okay, so we're going to really focus mostly on eukaryotic post-transcriptional modification of RNA molecules. In eukaryotes, again, you're adding a 5' cap as soon as you've extruded the 5' end. Remember, you're synthesizing 5' to 3', so the 5' end is done first, put a fork in it, right? So you stick it out there. That 5' end passes by the gauntlet of activities that add the 5' cap. And then you have introns as well in the genes. And these introns need to be excised actively by a spliceosome, a collection of RNA molecules and proteins that cuts out these introns. At the end of eukaryotic transcription, you're going to add a poly-A tail. So eukaryotes do the 5' cap and the 3' poly-A tail; prokaryotes don't. Both prokaryotes and eukaryotes can do splicing. In general, eukaryotic splicing is important in protein-coding genes, whereas prokaryotic splicing is mostly in non-coding RNAs.
So let's get to this. This is a little bit more intense than what we've seen so far. Okay, so here's the biochemistry of adding the 5' cap. So sitting on that C-terminal domain of the RNA polymerase itself are a variety of enzymes, and these enzymes are adding a cap. What is a cap? Well, so remember, at the 5' end, the first nucleotide that's added still has its triphosphate. Remember, we thought about that earlier in the lecture. But this triphosphate can be modified, and another guanosine can be added upside down to this. So here we have a very unnatural linkage. This is not a 5' to 3' linkage, it's a wholly unnatural 5' to 5' linkage. So here's the 5', remember, it's the methylene group on the ribose. Here's the 5'. Between these, we have phosphoester bonds and anhydride bonds within our triphosphate. In addition to putting a guanosine nucleotide on upside down on this cap, we're also going to methylate. The goal here is to make this thing look different in the RNA molecule, because an RNA molecule is very susceptible to enzymatic activities, exonucleases and endonucleases. But an exonuclease sees this thing and says, what the heck is that? We actually need a special set of enzymes to remove this. And so this is protecting the RNA, allowing it to persist for longer in the presence of all these exonucleases and endonucleases. And so we have here also methylation at the C3 position. Remember, this is RNA, so it's not deoxy at the 3' position. We have a hydroxyl, and that can be methylated there. And so the way this happens, you have the triphosphate on the 5' base. You can first remove one of those phosphates, and then you can take a guanosine triphosphate and attach that. So attach a GMP to this. And so this phosphate comes from the guanosine. These two phosphates come from the N1 base. And so once we've attached that, we can then use methyltransferases and our old friend AdoMet to methylate the N7 position on the guanosine, as well as the 2' position.
I think before I said 3'. This is obviously the 2' hydroxyl group, and that gets methylated by these 2'-O-methyltransferases. So the whole goal here: make it funky looking, enhance stability. This only occurs in eukaryotes, not prokaryotes. It's important in the processing of the RNA. So the actual nuclear pore complex is going to be recognizing this. And when we translate eukaryotic mRNAs, the RNA itself is going to be forming a circle. So we're going to actually take the end of the RNA, the 3' poly-A end of the RNA, and wrap that around in a circle, sort of similar in structure to a plasmid, though it's just a single strand. And that 3' end of the RNA can be bound through a set of proteins to the 5' cap. And so we need these unique features to be able to assemble that competent structure for translation. So that's the 5' cap. The addition of the poly-A tail can also be considered eukaryotic termination of transcription. So we have a particular sequence, the AAUAAA sequence, on the RNA molecule. That is recognized by this polyadenylation factor. It binds to that. An endonuclease activity cleaves shortly after that polyadenylation sequence. And then a second activity within this factor adds a bunch of adenosine residues, from 200 to 500 adenosines. And so this helps also obviously to protect the 3' end of the RNA molecule because now exonucleases have to chomp all the way through hundreds of residues there. Okay? Make sense so far? Cool. Questions? Okay. So now we've put on the 5' cap. We've put on the poly-A tail. Now we need to think about splicing. And so this gets a lot more confusing. And so we have here exons and introns. And those introns need to be removed. There are different ways we can do this. One of the ways that we can remove the introns forms a lasso. So this is the Southwestern method. We make a lasso of the intron and we cast it out and grab some cows or something. So here we have the exons. The exons are ligated together.
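As a rough illustration of the cleavage and polyadenylation step described above, here is a minimal Python sketch. It only models the logic: find the AAUAAA signal, cut a short distance downstream, and append a run of adenosines. The function name, the `cleavage_offset` default of 20 nucleotides, and the example sequence are assumptions made for illustration; the lecture says only that cleavage happens "shortly after" the signal and that 200 to 500 adenosines are added.

```python
POLY_A_SIGNAL = "AAUAAA"  # polyadenylation signal named in the lecture


def cleave_and_polyadenylate(transcript, cleavage_offset=20, tail_length=200):
    """Return the transcript cut downstream of the first AAUAAA, plus a poly-A tail.

    cleavage_offset is a hypothetical distance (in nucleotides) between the end
    of the signal and the cut; tail_length is the number of adenosines added.
    Returns None if the transcript carries no polyadenylation signal.
    """
    signal_start = transcript.find(POLY_A_SIGNAL)
    if signal_start == -1:
        return None
    signal_end = signal_start + len(POLY_A_SIGNAL)
    # Endonuclease step: cut shortly after the signal (never past the 3' end).
    cleavage_site = min(signal_end + cleavage_offset, len(transcript))
    # Second activity: add the tail, which is not encoded in the template.
    return transcript[:cleavage_site] + "A" * tail_length


# Made-up transcript: 9 nt upstream, the signal, then 36 nt downstream.
example = "GGCAUGCCU" + "AAUAAA" + "CUGAUCGGAUUCGAUCCGUAGCUAGGCUAGCUUAGG"
mature = cleave_and_polyadenylate(example, cleavage_offset=10, tail_length=200)
print(mature[:25], "+", mature.count("A", 25), "adenosines")
```

Everything downstream of the cut is simply discarded here; in the cell that fragment is degraded.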
And you can imagine that what we're doing here is we're doing two things. We're cutting out a piece of useless RNA. But we're also very precisely ligating these two pieces together. We cannot make a single mistake. We cannot insert a nucleotide here and we cannot take away a nucleotide. This has to be precise because if we insert or remove a nucleotide, that would end up changing the reading frame of our protein. We'd totally screw up the synthesis of our protein. And so we need to do this with absolutely exquisite specificity. In general, each exon corresponds to a protein domain that is going to be made. So there's a general relationship. And the reason is the way nature figured this out: if I make a domain, that's usually associated with a certain function of the protein. And so I can help other proteins to evolve by sampling and transferring domains between proteins. So if they're all sort of together as a unit, that aids in the transfer, and aids in evolving things more rapidly. It's not always the case that a particular exon corresponds to a domain, but it's a very common occurrence. Okay, so how are we going to do this? So I put on the sideboard this overview slide. There are actually four mechanisms. So we're going to cover these in order. So group 1, group 2, and nuclear mRNA splicing. And so group 1 and group 2 splicing are catalyzed by RNA molecules themselves. RNA-catalyzed catalysis. These are actually ribozymes. Okay, and so this is the overview. And we'll come back to this throughout this section of the lecture. So let's start with group 1 introns. I did shuffle things around in the PowerPoints last night. So here's the actual chemistry. We have transesterification reactions, two of them. So here you have a splice site. And so you have the five prime splice site. You have a phosphodiester bond between these two bases, a U and an A. And a solvent-provided guanosine comes in and makes a new phosphodiester bond.
So here you have a phosphodiester bond between U and A. Here you've jettisoned the U, and you now have a GA phosphodiester bond. And the intron itself helps to guide in this solvent-provided guanosine. So the guanosine is floating around. The intron itself binds the guanosine and guides it in with exquisite specificity to exactly this position, this so-called splice site. So we've broken one bond, but now, and we're obviously not going to let things go, we need to ligate together our two pieces. So the guanosine, remember we're using the three prime hydroxyl here? So not the two prime hydroxyl. So if it's solvent provided, there's a free three prime hydroxyl. If it's coming from within an RNA molecule, that three prime hydroxyl would be tied up in a phosphodiester bond. But in this mechanism, you have the three prime hydroxyl coming in and breaking this phosphodiester bond. We now have a three prime hydroxyl group. This is the five prime splice site, this is the three prime splice site. So we now have a free hydroxyl group, and a nucleophilic attack of the three prime hydroxyl at the five prime splice site upon the three prime splice site releases our intron. And so this hydroxyl group is going to attack the actual phosphate in the phosphodiester bond. So the phosphate is preserved. The guanosine is not providing a phosphate into this ligation. The phosphate was always there within the RNA molecule itself. And so this type of splicing is very primitive, and can be found in bacteria, eukaryotes, and higher plants. And this is self-splicing: this actual RNA molecule itself, the intron, is catalyzing all of this chemistry. There are no proteins. So this is the actual structure. And as you could imagine, if the RNA molecule is not just carrying information but catalyzing a chemical reaction, it's going to be highly complex.
So here you have all kinds of base pairing, and the P7 section is in here somewhere, and that's what actually binds to the guanosine, delivering it to the correct site. So that's group 1 introns. It's a true ribozyme, but it's not processive. So it catalyzes a reaction, but once that intron is cut out, it's not going to float around and cut out other introns. It's a used-once type of situation. So you couldn't really call it an enzyme. Enzymes are processive, right? But it does catalyze the chemical reaction. There are examples of RNA molecules that are processive. RNase P is able to cleave tRNA molecules, breaking phosphodiester bonds, and obviously the ribosome itself, the thing that makes polypeptides, that is a processive RNA enzyme. In general, these ribozymes are very familiar with phosphodiester bonds, so they're important in either cleavage of phosphodiester bonds or some of these transesterification reactions. Any questions so far? Isn't it unbearably boring? Should I do something crazy to wake you up? Ah! That wasn't very convincing, huh? You don't want me to get too crazy. So, group 2 splicing. Now, group 2 splicing is a little different. Everything's in the intron. There's no solvent guanosine coming in here, and actually, an adenosine from within the intron is going to be doing the nucleophilic attack in this first transesterification reaction. Okay, so the adenosine, when you think about this, this adenosine is attached through a phosphodiester linkage, you know, before and after that adenosine. So there's no 3-prime hydroxyl available for this first nucleophilic attack. The only thing we have is what makes this RNA molecule unique, that 2-prime hydroxyl. So this 2-prime hydroxyl nucleophilically attacks the 5-prime splice site, making a new phosphodiester bond between the 2-prime hydroxyl and the phosphate at the 5-prime splice site.
We'll look at it in a little bit more detail in a moment, but this bond makes a lariat. So at this adenosine, we have 3-prime to 5-prime, 3-prime to 5-prime, and 2-prime to 5-prime. Okay? And so we're making this loop structure, and then, in the process of making our so-called lariat, we've exposed a 3-prime hydroxyl group at our 5-prime splice site, and so this is going to now nucleophilically attack the phosphodiester bond at the 3-prime splice site, making a new phosphodiester bond, precisely ligating these two pieces together, and releasing our lariat. And so if we're going to make a bond to the phosphate, obviously the end of this molecule, the 3-prime end of the lariat, is going to have just a hydroxyl without the phosphate. Okay? So again, the phosphates that are used at this ligation are coming from within the gene itself, within the mRNA molecule itself. Okay? Let's look at this in a little bit more detail. So we have here within this intron an adenosine residue. This adenosine residue has a 2-prime hydroxyl. Nucleophilic attack of the 2-prime hydroxyl on the 5-prime splice site makes this lariat. And so you have a 2-prime to 5-prime phosphodiester bond. That's the bonding of this A to this G. Here you have 3-prime to 5-prime, 3-prime to 5-prime. Okay? So this is a branch point. You have three bonds to that adenosine, rather than the typical two phosphodiester bonds. Okay? So we've made our lariat. We've now exposed the 3-prime hydroxyl at the 5-prime splice site. Nucleophilic attack on the phosphate within the phosphodiester bond here at the so-called 3-prime splice site ligates our two pieces together, releasing our lariat with a free 3-prime hydroxyl. Okay? So this is a little tricky. So this is also found in a similar set of organisms: bacteria, fungi, plants, and protists. Okay? So protozoa and other such organisms. Okay? And so this is self-splicing in vitro.
So in the total absence of proteins, under the right in vitro conditions, this thing will cut itself out. But in the cell, in vivo, it's believed to occur with the help of proteins. But the actual catalysis itself, these transesterification reactions, are believed to be carried out by the ribozyme portion of this. Okay? Okay. So group 2 splicing is extremely similar to nuclear mRNA splicing. So this is important in eukaryotes. And so we have a set of non-coding RNAs, these small nuclear RNAs, that aid in this splicing. Okay? So this is not a ribozyme necessarily. We have more. These particular snRNAs make so-called snRNPs, "snurps." So the blue snRNPs base pair to both the 5-prime splice site and also eventually to the 3-prime splice site, but also to this strategic adenosine. So within the intron, remember in the group 2 splicing we had the bulging out, the pinching out of this adenosine. And so the U2 snRNA helps to increase the nucleophilicity of this adenosine by making this odd base pairing. Okay? So this is pseudouridine. It's an isomer of uridine. And so this base pairing squeezes out the adenosine. So what's the goal here? Expose the 2-prime hydroxyl on that adenosine. By squeezing it out, you now have a better nucleophile. And so that 2-prime hydroxyl is now going to attack the phosphate at the 5-prime splice site. Okay? And so these snRNPs are helping in this process. So not only do you have snRNPs sort of bulging out the adenosine, but you also have snRNPs recognizing the sequences at both the 5-prime splice site and the 3-prime splice site. And they do this by highly selective base pairing. We don't want to insert a single nucleotide or shorten by a single nucleotide. So base pairing provides the precision of these snRNPs. And one thing that you obviously see: the colored segments are the regions that base pair with our snRNPs. And do you see how the majority of the sequence that base pairs is within the introns?
Why did nature evolve it that way? Why is the majority of the base pairing of the snRNPs, the happy little blue snRNPs, occurring in the introns? So what are we doing here? We're making protein-coding RNAs. Any ideas? Yes, exactly. So we don't want to impose any unnatural restrictions on the types of amino acids that occur at the junctions between domains in a protein. So I mentioned before, each of the exons in general is associated with a separate domain. So between domains in a protein, you have a linker. And if you had a large amount of sequence in the exon being recognized at the splice sites, there would be limitations on the types of codons that are available at these junctions. So the idea here is to ligate these two exons together. So there are some restrictions, but they're not major. So we know it's a triplet code. We'll see that on Thursday. And so potentially, these might not even make changes in the amino acids incorporated, because there's a possibility of wobble going on here. And so that's why nature evolved it this way. So the snRNPs bind. They are recognizing both sides, precisely ligating these together. The U2 snRNP is pinching out our adenosine. The 2-prime hydroxyl is now ready to nucleophilically attack the 5-prime splice site. And the way this happens is pretty complicated. I've actually left out tens of additional proteins. So for each of these complexes, we have the U1, U2, U4, U5, and U6 snRNPs. Each of those potentially could be more than one protein and an RNA molecule. And so the way this works is there's a lot of ATP hydrolysis going on here. So it's very important that we check and double check that base pairing. If we're doing that base pairing at the 5-prime and the 3-prime splice site incorrectly, then we're going to ligate together the wrong nucleotides. And so here, once you bind, the binding only persists if you have good base pairing. And there's a little clock. As soon as you bind, it starts the clock.
And if you bind tightly enough, if it's the right base pairing, you then hydrolyze ATP. That hydrolysis of ATP locks it in. So by adding ATP hydrolysis as a necessary step here, you're enabling the checking of the base pairing and the precise ligation. Yes? It could be. We don't really know for nuclear mRNA splicing. It's very, very similar to group 2 splicing, which is a ribozyme. It's not a processive ribozyme, like RNase P, which we'll see in a moment. But it's entirely possible that in nuclear mRNA splicing also, the catalysis, the actual transesterification itself, is catalyzed by RNA molecules. So, yes, it could be. Ribozyme doesn't mean processive. Any other questions? Okay, so we have first the binding of U1 and U2. ATP hydrolysis makes sure that we've got the right base pairing here. Then we assemble additional snRNPs that bring these things together, helping in the transesterification reaction. We have the pinching out of that adenosine, and that is juxtaposed to the 5-prime splice site. We make the new 2-prime to 5-prime phosphodiester bond, forming the lariat, revealing the 3-prime hydroxyl at the 5-prime splice site. That 3-prime hydroxyl is then manipulated into position by a change in conformation and association between all these snRNPs so that you have the precise ligation of the 5-prime and the 3-prime splice site. Okay, so this is a little bit complicated. I think there's a movie. Pre-mRNAs contain short conserved sequences required for splicing. The most conserved intron sequences are the 5-prime GU, 3-prime AG, and the branch point A. Central to the splicing reactions are five small nuclear RNAs, snRNAs, with proteins in small nuclear ribonucleoprotein particles, snRNPs. Additional proteins and ATP are also required for splicing but are not shown here. The snRNAs base pair with pre-mRNA sequences and with each other to direct the splicing cycle.
First, the 5-prime end of U1 snRNA base pairs with the 5-prime splice site, and a U2 snRNA sequence base pairs with the branch point region. Extensive base pairing between snRNAs in the U4 and U6 snRNPs forms a complex that associates with U5 snRNP. The U4/U6/U5 complex then associates with the pre-mRNA. Rearrangement of RNA-RNA base pairing occurs so that U6 dissociates from U4 and base pairs with U2. U1 dissociates from the 5-prime splice site and U5 base pairs with exon sequences. The rearranged spliceosome catalyzes two transesterification reactions resulting in intron removal and exon ligation. The ligated exons are released from the spliceosome. The snRNPs dissociate from the excised lariat intron and are recycled for another round of splicing. The lariat intron is rapidly degraded. All right, so all that monkey business with things moving left and right, I'm not going to ask you to draw which one is going where. I might say, okay, well, what's important here? U1 and U2. So U1 binds the 5-prime splice site. U2 is pinching out the adenosine. So I definitely might ask you about that. And these other snRNPs are helping to bring the things that need to be ligated closer together. So this is a summary of everything. What's different, right? So in group 1, remember, we had the solvent-provided 3-prime hydroxyl for that first transesterification. Whereas in group 2 and nuclear, we have the adenosine, the 2-prime hydroxyl of the adenosine. We only have the lariat when we're using the intron itself for the first transesterification reaction. And these two don't necessarily need proteins. This one doesn't need proteins in vivo. This one doesn't need proteins in vitro, but it could possibly need them in vivo. And this one definitely needs proteins. These snRNPs are RNA molecules associated with proteins. Okay, and so we have the coordinated splicing that occurs as the transcript passes by our gauntlet, the Pol II CTD tail.
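To make the precision requirement concrete, here is a small Python sketch of intron removal, assuming the intron boundaries are already known as index pairs. It checks the conserved 5-prime GU and 3-prime AG mentioned in the movie, ligates the flanking exons, and shows what a one-nucleotide slip does to the codons. The function names, the `check_boundaries` switch, and the toy sequences are all made up for illustration; this is bookkeeping on strings, not a model of the spliceosome.

```python
def splice(pre_mrna, introns, check_boundaries=True):
    """Remove introns, given as (start, end) half-open index pairs, and ligate exons."""
    pieces = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Conserved intron ends: 5' GU and 3' AG.
        if check_boundaries and not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks the GU...AG boundaries")
        pieces.append(pre_mrna[position:start])  # exon upstream of this intron
        position = end
    pieces.append(pre_mrna[position:])  # final exon
    return "".join(pieces)


def codons(rna):
    """Split an RNA string into complete triplets, reading from the first base."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


exon1 = "AUGGCU"
intron = "GUAAGUCUAACUUUCAG"  # toy intron: 5' GU, an internal A, 3' AG
exon2 = "GAUUGGUAA"
pre_mrna = exon1 + intron + exon2

intron_start = len(exon1)
intron_end = intron_start + len(intron)

correct = splice(pre_mrna, [(intron_start, intron_end)])
# Deliberate one-nucleotide slip at the 3' splice site (boundary check disabled).
shifted = splice(pre_mrna, [(intron_start, intron_end - 1)], check_boundaries=False)

print(codons(correct))
print(codons(shifted))
```

With the correct boundaries the second exon stays in frame; leaving a single intron nucleotide behind changes every codon downstream of the junction, which is the frameshift the lecture warns about.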
And so we have the cap-synthesizing protein, which puts the cap on, and then a cap-binding complex, CBC, binds to the cap and keeps that cap stuck onto the RNA polymerase. So you're going to extrude past, but the 5-prime end is going to remain at the polymerase. And now as introns come by, the spliceosome also associates with the C-terminal domain of the polymerase. And as we extrude introns, we're going to first bind that 5-prime splice site, then, remember the adenosine, we're going to bulge it out, and then we're going to bring these two sections together. Okay, and so this is co-transcriptional processing of our RNA molecule. It's parallel processing. So this can be very complex. So here's an example, ovalbumin mRNA. So the little colored sections, these blue sections, are the actual protein-coding segments, or the exons. The introns are these large sections, and those are all spliced out. So if you think about it, the majority of the RNA is cut away and degraded. What a waste, right? So there must be some evolutionary driver to have this pretty wasteful process. And the reason that this diversity exists is because we need to do alternative splicing. We'll see that in a moment. So we splice out all our introns, and now we have our mature mRNA. Some nomenclature: you have the pre-mRNA, or the primary transcript, with all its introns, and then the mature mRNA is what we call the molecule once we've spliced out all the introns. Okay, and so we can have alternative polyadenylation signals. So remember we have this signal, AAUAAA, and that's a signal for the endonuclease to cleave, and by some not fully understood mechanisms, sometimes as we extrude our RNA molecule, the cellular machinery skips a polyadenylation site and goes to a second one. So we have alternative polyadenylation or termination, giving us two different protein products, or alternative splicing. So here we have an intron.
We can excise either just this intron, or the intron and an exon, and so that can give us two final products, schematically represented in hashes and things like this. Okay, and so from our single primary mRNA transcript, we can get multiple protein products. So in the human genome, there are on the order of tens of thousands of unique genes, but there are potentially hundreds of thousands or even millions of unique proteins that can be synthesized through this diversity, by alternative splicing and alternative polyadenylation. Non-coding RNAs also need to be processed. This is much simpler. So this is actually not transesterification. Here we're just cleaving bonds. So we have a modification of these RNAs. So ribosomal RNAs catalyze chemical reactions. The nucleotides alone are not sufficient for all the chemical diversity necessary to catalyze those reactions. So we have a variety of modifications of those nucleotides. So for example, we can get a methylation of the nucleotides, or you can get this isomerization of the uridines to pseudouridine, shown with the psi symbol. So after those modifications, we can then endonucleolytically cleave out these introns, and then we can use exonucleases to sort of trim off the remaining bits. We're left with the mature ribosomal RNA. Similar things can be done in vertebrate ribosomal RNA. What we just saw was prokaryote; this is vertebrate. We can modify the RNA and methylate and so forth. You can cleave endonucleolytically, and then trim the extra segments with exonucleases. And so that's the same type of thing. This vertebrate ribosomal RNA processing is guided by snoRNAs and occurs in a part of the nucleus called the nucleolus. And so in the nucleolus, we have all of this processing occur. And the snoRNAs help to provide the sequence specificity for these cleavages. tRNAs can be processed. Here's RNase P. It cuts off this 5-prime end of the tRNA, but tRNAs can also have introns.
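As a back-of-the-envelope illustration of how alternative splicing multiplies products, the Python sketch below enumerates the mature mRNAs you get if every internal exon can either be kept or skipped along with its flanking introns. That "every internal exon is optional" rule is a simplifying assumption for illustration only; real genes allow far fewer combinations, and the exon sequences and function name are invented.

```python
from itertools import product


def cassette_isoforms(exons):
    """List the mature mRNAs obtained when each internal exon is optional.

    The first and last exons are always kept, and exon order is preserved.
    Needs at least two exons.
    """
    first, *internal, last = exons
    isoforms = []
    for keep in product([True, False], repeat=len(internal)):
        chosen = [exon for exon, kept in zip(internal, keep) if kept]
        isoforms.append("".join([first, *chosen, last]))
    return isoforms


# Toy exons (introns already removed); each is a multiple of three nucleotides,
# so skipping one does not disturb the reading frame.
exons = ["AUGGCU", "GAUUGG", "CCAUAA"]
isoforms = cassette_isoforms(exons)
for isoform in isoforms:
    print(isoform)
```

Under this simplified rule the number of isoforms doubles with each optional exon, which gives a feel for how tens of thousands of genes could yield many more distinct proteins.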
So here we have an intron. And to get our mature tRNA, we need to cleave out this intron. Importantly, you should see that the anticodon here is in a base pairing relationship with the intron itself. So to activate the tRNA, we need to remove that intron, which will then place that anticodon at the bulge at the bottom over here. And so that's going to base pair with the mRNA and help to code for synthesis. This chemistry is highly complex. So I'll bring you through it. So you have a cyclic 2-prime to 3-prime phosphate. So we have an endonuclease that cuts out the intron, and the end product of that is all screwed up, right? So you've got a 2-prime to 3-prime cyclic structure here, and you have this unnatural 5-prime hydroxyl. We're used to having a phosphate at the 5-prime position. So we have a kinase, and the phosphate that's ligated in here comes from ATP. It does not come from the RNA molecule. We're not ligating two pieces of RNA with the phosphate provided by the RNA. That phosphate is coming from the kinase. We can then activate this phosphate, make this O-minus a better leaving group, by adding an AMP. So the adenylated version then can react. We have a nucleophilic attack of the 3-prime hydroxyl. So this cyclic structure opens up. We have a 3-prime hydroxyl nucleophilic attack on this phosphate, which makes our new bond, a 3-prime to 5-prime bond. We're left with this unnatural 2-prime phosphate, and we cleave that off. And so this is more complicated. The things that you should remember are that the phosphate comes from ATP, and that this cyclic structure is formed. The endonuclease, for whatever reason, leaves things in such a state that we need to modify them to get back to where we need to be. We need to put a phosphate here, activate the phosphate, and ligate the two pieces together. So we've looked now at four splicing mechanisms. This is for your studying, and you can use this as a guide for studying. So that's it. I'll see you next time. Thank you.