So it's my pleasure to get to introduce Patrick Cramer today for the Hilldale Lecture in Biological Sciences. We're really lucky to have Patrick join us, and I hope you're all looking forward to his talk. I'll try to keep a very brief introduction. I promised Patrick I wouldn't take up too much time. I'll just say Patrick got his undergraduate and part of his graduate training in Germany, at the University of Stuttgart and the University of Heidelberg, but actually did significant parts of his research both at the Laboratory of Molecular Biology in Cambridge and at the EMBL laboratory in Grenoble, working with Christoph Müller, and then went to do a postdoc with Roger Kornberg around 1999. It was a very fortuitous time to join Roger's lab, right when things were coming together, and Patrick really made, I think, many of the key contributions that were put together into the first high-resolution crystal structure of eukaryotic RNA polymerase, and that paper, published in 2001 in Science, is probably the anchor to what really led to Roger Kornberg getting the Nobel Prize in Chemistry. So Patrick then moved from the postdoc to the University of Munich and quickly became full professor there and director of their Gene Center, and was there quite successfully for a decade, and then moved in 2014 to the Max Planck Institute in Göttingen, the Institute for Biophysical Chemistry there, where he was named director and has been leading a very productive research group since. I think I won't say a lot about his work. You'll get a chance to hear about that. I'll just say that while a lot of the early work Patrick has done that led to mechanistic insights was done on yeast polymerase, and he'll tell you about that today, just within the last year or two he's published really important work on mammalian transcription complexes in Nature that I think are landmark accomplishments.
So I've always really appreciated Patrick's work, and I think he and I have some common sort of mechanistic perspectives on thinking about transcription, so we've always enjoyed interacting over the years, but I think you'll appreciate that, you know, Patrick's perspective on understanding transcription comes both from, you know, really cutting-edge structural biology. You'll hear about that today, but he's also really pioneered a number of new directions looking at transcription in vivo using things like ChIP-seq approaches, and putting those two things together is really, I think, what is leading now to some much clearer understandings of how transcription works. Patrick also is, I think, exemplary in his ability to present what is a really complicated subject clearly, and so I think you'll appreciate that today in his lecture, and with that I'm going to just remind you of two things. First, there is a reception right after the talk today, so please, you know, we'll have time for some questions after the talk, but there'll be a chance to interact with Patrick at the reception, so join us outside in the atrium for that, and then the final reminder, when we do get to questions, we are recording the lecture, so please, you know, hold your hand up, I'll run the microphone around, and we'll try to record the questions along with Patrick's answers. Okay, and so with that I'm going to turn the stage over to Patrick and let him tell you about his work.
Yeah, many thanks indeed, Bob, for the kind introduction. Also for the invitation to come here to Madison, actually, for the first time. I have enjoyed my day so far, had very good discussions with colleagues here, and yeah, I can only basically repeat what you said.
It was fantastic over the last two decades to interact with you and your group and other colleagues to investigate the mechanisms of transcription, and of course that will be the topic of today's talk, but I want to just answer the question why you should be interested in transcription by showing you this image here of a developing mammalian embryo, which shows the expression of transcription factors that are responsible for the development of certain tissues, and this always reminds us of the central importance of gene transcription and its regulation, not only for the development of organisms, but also for the maintenance of tissues and also for the responses to the environment of the adult organisms that result. So actually in a collaboration with Bob Landick's lab, we only once collaborated, and the first thing we did was to solve a very important problem, namely for the first time to visualize the entire central dogma of molecular biology in a single macromolecular complex, and this is what we called at the time the expressome. It consists of RNA polymerase, and this is a bacterial particle, which uses DNA as a template to make messenger RNA. That messenger RNA is fed into a ribosome, and the ribosome, of course, will decode the message and then synthesize a polypeptide chain which folds up into a functional protein. Now in eukaryotic cells, things are more complicated because you have three different RNA polymerases, Pol I, II and III, and even a mitochondrial polymerase that takes care of the transcription of the small mitochondrial genome, and regulation is more intricate because you need to develop tissues, you know, you need to have differentiation and so forth, and we live in exciting times because the atomic structures of all these RNA polymerases are now available.
So we are now on safe ground if we would like to study the mechanisms of transcription and their regulation, because we can start from atomic structures of the enzymes that are at the heart of gene transcription. Today I will concentrate on RNA polymerase II, although we have worked on various transcription systems, simply because, as Bob mentioned, it was always very close to my heart. Actually, already in my PhD work I worked on NF-κB transcription factors, which activate RNA polymerase II transcription. So I want to show you the latest results from the lab that we obtained over the last two years or so, three years, which actually build on all the results that were obtained over the last two decades, and describe to you our current understanding of how RNA polymerase II is transcribing genes to make messenger RNA, and also the first details of a mechanistic understanding of how the process is regulated, both molecularly and in cells. And as Bob mentioned, the approach of our laboratory is to combine integrated structural biology with functional genomics. So we can work in vitro and in vivo, and the glue between these is computational biology. So we develop a question using biochemistry and structural biology, and we can test the proposal in living cells using functional genomics. Or we find a correlation in functional genomics data and we want to find out whether it is, you know, not just a correlation but causative, and then we can go in vitro and test the proposal using purified components. When I say integrated structural biology, I mean the combination of cryo-electron microscopy, X-ray crystallography, and cross-linking and mass spectrometry, because if you have the three, you're basically unlimited in investigating such large assemblies, as I will show you. And, you know, it's fantastic to have all of that within the group so people can move smoothly from one method to the other and solve their biological problem. So here's the plan for today.
I would like to tell you what we've learned about the key steps at the beginning of RNA polymerase II transcription, when protein-coding genes are transcribed, and the steps that are listed here were identified by the community as being key steps for the regulation of transcription. It makes sense to regulate at the beginning, but it's actually not a single step that is regulated. It's multiple steps, and John Lis has summarized this beautifully ten years ago in a nice review. So I will start with the opening of chromatin. Obviously you need to first get access to the promoter DNA, and then the TATA box binding protein or other factors can bind in this nucleosome-depleted region. Then the pre-initiation complex, or PIC, can assemble in this nucleosome-depleted region, and once all the proteins are there, the DNA will open. So you begin to see this region where the two DNA strands are separated. Now this complex, the so-called open promoter complex, is then able to initiate the synthesis of an RNA chain. That leads to a transcription elongation complex, which now makes RNA, but in metazoan cells this elongation complex will very frequently, at many genes, pause just downstream of the transcription start site in the promoter-proximal region. And this happens after transcribing only about 50 base pairs of the DNA. And this polymerase that is paused in the promoter-proximal region is again a target for regulation, because you need to exchange factors that stabilize the pause against factors that accelerate the polymerase and that make it a very processive and very fast transcription elongation complex. So you need to release and activate the polymerase, and this activated elongation complex can then move through nucleosomes very rapidly, actually at a speed of several thousand nucleotides per minute. And this is the last, you know, regulatory step, and that is the passage through nucleosomes, which can also be regulated.
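To put the numbers just mentioned side by side, here is a small back-of-the-envelope sketch. The gene length and the exact elongation rate are assumptions chosen for illustration (the talk only says "several thousand nucleotides per minute" and "about 50 base pairs"); the function name is hypothetical.

```python
def elongation_minutes(gene_length_nt, rate_nt_per_min):
    """Time for one polymerase to traverse a gene at a constant rate."""
    if rate_nt_per_min <= 0:
        raise ValueError("rate must be positive")
    return gene_length_nt / rate_nt_per_min

# Hypothetical numbers for illustration only.
GENE_LENGTH_NT = 100_000   # assumed gene length, not from the talk
RATE_NT_PER_MIN = 2_000    # assumed value within "several thousand per minute"
PAUSE_SITE_NT = 50         # approximate pause position mentioned in the talk

total = elongation_minutes(GENE_LENGTH_NT, RATE_NT_PER_MIN)
pause_fraction = PAUSE_SITE_NT / GENE_LENGTH_NT
print(f"{total:.0f} min to transcribe; pause site at {pause_fraction:.2%} of the gene")
```

Under these assumed values the pause occurs within the first fraction of a percent of the gene, which is one way to see why it is described as a promoter-proximal event.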
So we will start here and go through, and I will actually spend most of the time on the very first step, because we had several recent results that are not published, which I would like to share with you, that concern this very first step of generating a nucleosome-depleted region. So how do you do that? Very often, actually probably most of the time, pioneer factors will be involved in the generation of these nucleosome-depleted regions. And why is this so? Because these factors are able to invade chromatin. They are able to bind to nucleosomes. Normal transcription factors, or let's say most of the 1600 transcription factors that are encoded in the human genome, will only bind to naked DNA. They will not bind to nucleosomes. The presence of a histone octamer will actually block access of the transcription factor in most cases. But then there are these specialized transcription factors, pioneer factors, which can invade chromatin by binding to nucleosomes. Then they will recruit additional factors that help them. Famous pioneer factors are the Yamanaka factors like SOX2, OCT4 and so forth. You all know those factors because they are used to produce induced pluripotent stem cells. So you can actually take skin cells and express this set of pioneer factors and you will convert the skin cells into induced pluripotent stem cells that can then be differentiated into other cell types. This tells you about the incredible power not only of pioneer factors but of the system of transcription regulation. You can really reprogram the genome and induce a new cell identity. So you may wonder why there are no structures of pioneer factors bound to nucleosomes, given their, you know, broad importance in biology. And I think the reason is mainly that pioneer factors are made to destabilize nucleosomes. So for structural biologists, it's a nightmare to work on something whose actual function is to destabilize the complex.
So what we did is we teamed up with Jussi Taipale's lab, and they are experts in in vitro SELEX. And what they did is that they used about 200 recombinant human transcription factors that you see here, and they screened for DNA sequences that would bind these factors in the presence of histone octamers. So those sequences are selected to form nucleosomes but at the same time bind transcription factors. And you see that most of these transcription factors actually don't bind. You can only find binding motifs at the very end of the DNA, at the end that is breathing. And then there are a few factors, you see them here, the pioneer factors, which will actually bind to the nucleosome surface around the dyad. And the most important family here is the SOX family. So we used SOX11 and SOX2, and we could actually solve the first structures of pioneer factors in complex with the nucleosome using these selected sequences. Here you see SOX2 binds to superhelical location 2. So that's two turns away from the dyad of the nucleosome. And in the next movie I will actually summarize what we learned from that structure. So when the pioneer factor binds, it will distort the DNA locally, you will see that distortion. But most importantly, it will facilitate displacement of the neighboring DNA gyre, because there's a clash with the neighboring DNA that will be resolved by detachment of the DNA. And actually the SOX factor can do this not only at superhelical location plus 2, but also at superhelical location minus 2 on the other side of the nucleosome, so that both ends can be not only displaced but also stabilized. And that means that you have several turns of DNA that are now more readily available for binding other factors. This is our model. So what we suggest based on these results is that the pioneer factor can use binding energy to destabilize parts of the nucleosome and thereby increase access for other factors.
But of course we see here only the DNA-binding domain, and these pioneer factors have long intrinsically disordered regions which can recruit other machineries that help in depleting nucleosomes from the DNA. And those factors are chromatin remodeling complexes, in particular of the SWI/SNF family. And this family is special in the sense that these remodelers are very large multi-protein complexes. You know, you have other families like the CHD family which are single polypeptides, but the SWI/SNF remodelers are multi-protein complexes. And they often function in the formation of these nucleosome-depleted regions. And these complexes are also conserved from yeast to human. So we also now report, or we will soon report, the first structure of a SWI/SNF-type chromatin remodeler, namely the complex that is called RSC from yeast, which is a highly abundant and essential chromatin remodeling complex. And you see its structure here. I should mention that, you know, as often happens now in structural biology, these targets are worked on in a very competitive manner because they have important biological roles. So you know, you work on it for years and others work on it for years. And then there's a month or two where all these results come out. And in this case the lab of Chen in collaboration with Cairns reported a structure, Eva Nogales' lab from Berkeley, and also Yuan He. They all reported structures of SWI/SNF chromatin remodelers. And what is good, and it should be like that in structural biology, is that the structures are very similar. That's good news. That tells you that the resolution that is achieved nowadays is such that you get reliable structures. But there are also different aspects that the different groups concentrate on. So that makes the world, you know, rich and interesting. So what we saw is that there are six modules that form the RSC complex, and they can move with respect to each other to some extent.
And this is why the resolution is low when you look at the density overall. But even here you can already see the nucleosome in yellow and a long stretch of DNA that exits from the nucleosome core particle. Now we can achieve high resolution if we focus on the individual modules of the RSC complex and align them properly; then we get higher-resolution densities. And we can use those to build atomic models, and we actually could build atomic models for five out of the six modules of the RSC remodeler. So you see them now under these transparent low-resolution surfaces. You see all the helices and the structural features for these modules. And I want to quickly show you how the RSC remodeler interacts with the nucleosome. So what we see is the translocase, the ATPase domain, which is the motor of the chromatin remodeler that translocates the DNA over the surface of the histone octamer. This motor is located at the edge of the nucleosome, and this has been observed not only by us but also by other groups in other chromatin remodelers. It's a general feature that this translocase docks to the edge. But what is new here and interesting is that on both surfaces of the histone octamer we see contacts with parts of the remodeler. We call them sandwiching contacts. This has only been observed in the INO80 remodeler, a very specialized remodeler. And these sandwiching contacts are most likely important for the function of the remodeler, namely to determine whether the remodeler will slide histone octamers along DNA, so basically slide nucleosomes to the side, or whether the remodeler will evict the entire histone octamer so that the nucleosome is lost. The remodeler can actually do both, and in the future we have to figure out how you can switch between these activities and how these interactions with the histone octamer contribute to these two activities.
Another thing that is interesting is to ask the question how, topologically, the remodeler could contribute to the formation of nucleosome-depleted regions. And here I refer to published work from the lab of Philipp Korber in Munich. They have this fantastic system where they can use yeast nuclear extracts to assemble nucleosomal arrays in vitro. And these arrays will then resemble the situation in the living cell. So this is a pattern they can establish, with the nucleosome-depleted region here in blue. And then this is about the region of the transcription start site, so the genes would extend to the right. And in yellow you see the peaks for nucleosomes along the transcribed region. They have a defined distance. Now on these genes you can find in the promoter region, actually within the nucleosome-depleted region, poly-T, poly-A stretches, sequences that are thought to prevent the assembly of nucleosomes. So intrinsically these promoters are poor substrates for the assembly of nucleosomes. So there is an intrinsic small nucleosome-depleted region that you can obtain simply because those sequences don't like to bind histone octamers; they like to exclude nucleosomes. But to get to the fully extended nucleosome-depleted region you actually need to add the RSC remodeler. The activity of RSC is important not to establish this array of nucleosomes that are spaced, you don't see that, but to extend the nucleosome-depleted region to its native width. And when you actually model that situation based on our structure, you can imagine a few interesting things. So first of all there is this DNA-interacting module in green, which can probably invade the small nucleosome-depleted region. And it's actually known from the work of others that it can bind certain DNA motifs. And these are also AT-rich stretches, and it probably invades there. And then the motor domain can bind to the plus-one nucleosome on one side and to the minus-one nucleosome on the other side.
And since the directionality of the translocation is then the opposite, you can actually move the nucleosomes out from this initial small nucleosome-depleted region. It can generate a larger nucleosome-depleted region this way. So pioneer factors can recruit chromatin remodelers. The remodelers use the chemical energy stored in ATP and convert it into mechanical energy. They can slide the nucleosomes outwards. Sometimes they may even evict nucleosomes from the center of the nucleosome-depleted region. And then co-activators are also required to initiate transcription. And why is this? Because those co-activators will modify chromatin. They will set certain chromatin marks and remove others. But the co-activators can also deposit the TATA box binding protein. And you may all remember that the TATA box is an important sequence motif in eukaryotic promoters, which allows the TATA box binding protein to bind there and to bend the DNA by 90 degrees. And this is thought to initiate the assembly of the pre-initiation complex. So now I want to show you some data on co-activator structure and the recruitment of the TATA box binding protein. Namely, we have investigated the SAGA complex. And that was actually published yesterday. You can now read about it. And the SAGA complex is also conserved from yeast to human. It is also a multi-subunit complex that we have now looked at. And it consists of these four modules. A histone acetyltransferase module, which is flexible. We don't see it. It's very flexible. Then a deubiquitination module that was studied in great detail by Cynthia Wolberger. And we actually have used their data to help our structure determination. And then a core module, which we have now determined de novo, which is this large mass in the center. And the Tra1 module, which was solved before by Alan Cheung, a former postdoc in the lab who now has a lab in London. And this is a module that can bind to transcriptional activators.
But the core module is interesting because this is the module that will bind to the TATA box binding protein and deliver it to the nucleosome-depleted region, to the promoter DNA. So how can we envision this delivery of TBP? This is something yet to be explored. But the one aspect that is very interesting according to our findings is when we compare the core of the SAGA complex to a part of the TFIID complex, which is also able to deliver TBP. These two structures are very different, but they will bind the TATA box binding protein in the same region. So the lab of Eva Nogales has actually positioned TBP here in a core structure. And we have found through cross-linking analysis a binding site for TBP at the exact same location of this octamer-like fold, which is a fully asymmetric, histone fold-based octamer-like fold, which is similar to a histone octamer, but still quite specific. So topologically, despite all these differences in protein composition and sequence, these two major co-activators, TFIID and SAGA, will probably use a similar mechanism to deliver the TATA box binding protein. I should say that when we published yesterday, another paper came out back to back with ours from the lab of Patrick Schultz in Strasbourg, and they have actually managed to position TBP here. The structure of the core is identical, and they see the TBP exactly at that site. And again, you know, it was worth having two structures, because we have additional information that they don't have on the deubiquitination module and how it binds to nucleosomes. They have information on the TBP interaction. So it's very important, I think, to then compare results from different groups and extract even more information from these comparisons. So now we have some insights into how chromatin is opened, how a nucleosome-depleted region is formed, and how the TATA box binding protein is delivered.
And now the stage is set to try and understand pre-initiation complex formation and how the pre-initiation complex can open DNA and start RNA synthesis. So I will talk about the PIC and about DNA opening, and most of this is published, so I will only very rapidly summarize the key data. One important take-home message is that the pre-initiation complex for Pol II is a very large assembly, maybe up to 70 proteins when all the proteins are present. And we have worked, you know, ever since I left Roger Kornberg's lab in 2001, when we started in Munich, on the pre-initiation complex, making, you know, all the initiation factors and trying to assemble larger and larger complexes. And the trick was, actually there's a new trick for each factor, how to make it. So for example, TFIIH is a 10-subunit factor that can only be made in insect cells, not in bacteria, because the ATPases are somehow toxic. In contrast, the core mediator has 15 subunits, and we made it by co-expression of these 15 polypeptides in bacteria. But that was only possible because we first made dimers, heterotrimers, you know, heptamers and so on. We learned how to express these things and how the subunits would bind each other. And eventually we also had to learn how to assemble the complex so that there's no aggregation, that the complexes are stable, that they can be purified by size exclusion. And eventually we came up with a complex that now contains 46 proteins, about 2 megadaltons in size. And that complex contains all the yeast general transcription factors that are essential for cell growth. So there are these additional ones, I said up to 70 or so, but they are non-essential for initiation. Or in other words, the structure that I show you now should explain the basic mechanism of transcription initiation. And we think that's actually the case, although these other factors are important and we will try to also localize them later.
So here's the structure, just very briefly. In silver you see the RNA polymerase II enzyme with its 12 subunits. And then the smaller colored factors here are general transcription factors, most notably, in red, the TATA box binding protein. You can see the 90-degree bend that was already shown, you know, by work of Burley and Roeder in the middle of the 90s, and also Richmond's lab and others, Paul Sigler. So here's the 90-degree bend. And then we have the mediator complex on that side, the two modules, the middle and the head module, in blue and cyan, respectively. And then this large assembly here is the 10-subunit TFIIH complex in pink. And this is something I want to concentrate on now, because the only aspect I want to discuss about initiation today is the DNA opening. So what did we learn about DNA opening? One interesting observation was that when the closed promoter DNA binds within the pre-initiation complex, it's not straight DNA at all. There are several points where the DNA is bent, distorted, partially unwound, and the helical axis is shifted. And this is seen here, so here's the 90-degree bend on the TATA box. And then around minus 10, which is 10 base pairs upstream of the transcription start site, we see this displacement of the helical axis by five angstroms, a strong distortion. Then here around the transcription start site, about plus one, we see another bend and some distortion. And then yet another one here at this position, about plus 20 or so downstream from the start site. And this is the translocase, the ATPase subunit within the 10-subunit TFIIH factor. And that is an essential protein which again uses energy from ATP to facilitate the opening of the DNA during transcription initiation. So what I will show you now is an animation of how we think this may happen, where the ATPase is translocating along the DNA. And actually the directionality of translocation of this enzyme would be towards the right.
So it would actually translocate downstream. But since it is anchored in a large assembly, it cannot move. And therefore what happens is that instead of the protein, what is translocating is the DNA. And the DNA will translocate from the right to the left. And since the DNA is anchored here, you cannot move it away. So there will be a lot of strain and torsion introduced when the translocase tries to move. And the DNA will actually be propelled into the active center of the polymerase. And this model was proposed by others based on biochemical data. You know, Steve Hahn was important, but also an early paper from the labs of Ebright and Reinberg that had mapped TFIIH on the downstream side by, you know, cross-linking. And then basically the DNA template strand emerges, because now you generate single strands. And the template strand will have to insert into the active center, which is all the way down here. So that is a 40-angstrom movement, and that is now visualized here. So here's the active site of the polymerase. And here's the translocase, which will now be colored. It has two lobes, and those two lobes are actually RecA-like domains, and they move with respect to each other. So what we think is that when ATP is bound, there's this power stroke. The DNA is rotated and translocated forward by one base pair. And then when ATP is hydrolyzed and ADP is released, the enzyme is reset. So you can start another round of ATP hydrolysis. And now the DNA is attracted by this highly positively charged active center cleft. The DNA is highly negatively charged, so it will be positioned here. And you could see the DNA template strand being just opposite the active site of the polymerase, where it can now trigger or initiate the synthesis of the RNA chain. So this is, of course, still a model, a model based on structural knowledge. And the structures that we trapped showed these two conformational states of the translocase.
But we actually saw that in a chromatin remodeler, and we used homology considerations to make this animation. Now an interesting question arose, and that was, do you actually need this ATPase at all yeast promoters to start transcription? Or is there perhaps a subset that is so easy to melt that the DNA strands are separated without the use of ATP? And this experiment answers the question, so the experiment that we did is quite simple. In yeast you can use the anchor-away technology, and that allows you to very rapidly deplete a protein from the nucleus. So we tagged this translocase, it's called Ssl2, or XPB in human. And through a chemical biology approach you can then remove the protein from the nucleus within half an hour, one hour. And you can then measure how RNA synthesis changes over the entire yeast genome. And how do we do that? We use metabolic RNA labeling, 4-thiouracil that we add to the cell culture. And this will be very rapidly incorporated into newly synthesized RNA by RNA polymerase II. And it's enough to do a 5-minute labeling pulse, and then you can purify this newly synthesized RNA. And when you sequence it, the depth of your reads for specific genes will be directly informative of the rate of RNA synthesis from those promoters. So we can measure RNA synthesis genome-wide before and after depletion of the ATPase. And this is what you see here. Each dot is synthesis from one promoter in yeast. 80% of the genes are actually down-regulated to different extents, so they really depend on the translocase. But about 20% of the genes, which you see here in gray, are, at least within the error of the experiment, unchanged or changed very little. And those are either independent of the ATPase or they are much less dependent on it. So what I think we found here is that there's an additional layer of transcriptional regulation in eukaryotes that concerns the step of DNA opening. And that needs to be explored, you know, which genes are regulated that way.
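The comparison described above (labeled-RNA read depth per promoter before and after depletion) can be sketched as a fold-change calculation. This is a minimal illustration, not the lab's analysis pipeline: the read counts are made up, the counts are assumed to be already normalized between samples, and the function names, pseudocount and cutoff are hypothetical.

```python
import math

def log2_fold_change(before, after, pseudocount=1.0):
    """log2 ratio of labeled-RNA read counts after vs. before depletion."""
    return math.log2((after + pseudocount) / (before + pseudocount))

def classify_promoters(counts, threshold=-0.5):
    """Label each promoter 'dependent' (synthesis drops) or 'unchanged'.

    counts maps a promoter name to (reads_before, reads_after).
    threshold is a hypothetical log2 cutoff, not a value from the talk.
    """
    calls = {}
    for name, (before, after) in counts.items():
        lfc = log2_fold_change(before, after)
        calls[name] = "dependent" if lfc <= threshold else "unchanged"
    return calls

# Made-up, already normalized read counts for five promoters.
example = {
    "geneA": (1000, 200),
    "geneB": (500, 120),
    "geneC": (800, 790),
    "geneD": (300, 60),
    "geneE": (400, 150),
}
calls = classify_promoters(example)
fraction_dependent = sum(v == "dependent" for v in calls.values()) / len(calls)
print(calls)
print(f"fraction dependent on the translocase: {fraction_dependent:.0%}")
```

In this toy set four of five promoters fall below the cutoff, mirroring the roughly 80/20 split described in the talk; a real analysis would also need replicate-based error estimates to decide what counts as "unchanged".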
And I think it's an intrinsic feature of the promoter sequence that you can evolve to make a gene dependent on TFIIH for DNA opening and transcription. So people said, wow, that's surprising, because we knew for decades that this is an essential factor for initiation. But it's actually not surprising in the light of evolution. And as was said, you know, decades ago, everything makes sense in the light of evolution, or has to make sense if evolution is the right theory. And when you look at the other transcription systems in eukaryotes, Pol I, Pol III, they don't need ATP to open the DNA. They can just use binding energy to open DNA. Same with the bacterial enzyme, also the archaeal enzyme. They open the promoter without such translocases. This is a glimpse of the future. It's not published and it will still take a while for us to finish it off. But I think it serves to illustrate, you know, what will be possible. So in cryo-EM, the particles are in solution and not in a crystal lattice. They're in solution and then you rapidly vitrify the sample. The particles can get stuck in various conformations. And so this should be informative about the flexibility, the mobility of different parts of a complex. And even, you know, the populations may inform about an energy landscape. So what are the states that are preferred in solution? And Dimitry Tegunov, who is really a wizard in programming but also in EM methodology and theory, wrote a little script which actually solves a huge problem. And it's based on an autoencoder. So it's a neural network. It's a machine learning algorithm that is able, in an unsupervised fashion, to extract different conformations from the raw data. So this movie is not, you know, science fiction or an animation. It is the raw data, visualized in a way that different conformations, different conformational states, are extracted from the data and then put together on a likely trajectory of how they may be related.
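To make the idea of unsupervised extraction of conformations concrete, here is a deliberately tiny sketch. It is not the script mentioned in the talk and uses no real cryo-EM data: it assumes toy 8-pixel "particles" drawn from two hypothetical states plus noise, and trains a one-dimensional linear autoencoder with tied weights by plain gradient descent. All names and parameter values are illustrative assumptions.

```python
import random

random.seed(0)

N_PARTICLES, N_PIXELS = 200, 8
# Hypothetical unit-length difference between the two conformations.
direction = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]

# Hidden state per particle (-1 or +1); the network never sees these labels.
states = [random.choice([-1.0, 1.0]) for _ in range(N_PARTICLES)]
data = [[s * d + random.gauss(0.0, 0.1) for d in direction] for s in states]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Tied-weight linear autoencoder: encode z = x . w, decode x_hat = z * w.
w = [random.gauss(0.0, 0.1) for _ in range(N_PIXELS)]
learning_rate = 0.05
for _ in range(300):
    grad = [0.0] * N_PIXELS
    for x in data:
        z = dot(x, w)                                  # latent code
        residual = [xi - z * wi for xi, wi in zip(x, w)]
        rw = dot(residual, w)
        # Gradient of the mean squared reconstruction error w.r.t. w.
        for k in range(N_PIXELS):
            grad[k] += -2.0 * (z * residual[k] + rw * x[k]) / N_PARTICLES
    w = [wi - learning_rate * g for wi, g in zip(w, grad)]

# The sign of the latent code should sort particles by hidden state
# (up to an overall sign flip, which an autoencoder cannot resolve).
latent = [dot(x, w) for x in data]
agree = sum((z > 0) == (s > 0) for z, s in zip(latent, states)) / N_PARTICLES
accuracy = max(agree, 1.0 - agree)
print(f"particles sorted into the right state: {accuracy:.0%}")
```

A real application would use a nonlinear network, a higher-dimensional latent space and millions of noisy images; the point here is only that the latent code learned from reconstruction alone separates the states without any labels.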
And what you begin to see is that we have huge heterogeneity in our sample. And the promoter DNA that you see here can be closed and open in our sample. So we begin to visualize DNA opening directly. You can see the duplex now forming. Now it's a duplex, right? And now it's opening. And we also see that these states are related to different conformations of TFIIH. So the little movement of the motor that we trapped was simply something that we could trap before. But now we see the whole landscape of what's possible. And it will be super exciting to do that with tens of millions of particles to get higher resolution, and also to relate different conformations to different states of promoter opening. That, I think, should become possible. Okay, last thing about initiation, because there's currently a lot of excitement about the question of how you organize transcription in the nucleus using membrane-less compartments or condensates, biomolecular condensates. And what we found recently is that this long C-terminal repeat domain may be important for the recruitment of polymerases to promoters. You have to imagine it's a situation where in a human cell you have 20,000 to 25,000 genes and they are competing for polymerases. So if you want to differentiate your cell rapidly within a day or two, right? Or you want to, even within minutes or hours, conduct a stress response and alter gene expression. What you have to do is redistribute Pol II enzymes, plus all their friends, their factors, to other genes. So how do you do that? I told you 70 proteins just for initiation, then well over 100 for elongation, and then also splicing and so forth. So hundreds of proteins have to be redistributed. How do you do that? I think the solution is that intrinsically disordered regions in these proteins can associate in compartments, and that very rapidly allows you to bring them to certain places.
And those places are probably defined by transcription activators, which also have intrinsically disordered regions that can set up such condensates. So what we found is when we just make the CTD, and it was actually drawn in the last slide approximately to scale, it's a very long intrinsically disordered region. Highly conserved, and there's nothing like this in the entire genome, because it's a repetitive sequence of almost identical repeats, 52 repeats. You can actually see droplets in solution. So the CTD in recombinant pure form can self-associate and form these droplets in solution with a little bit of crowding agent, which simulates the environment in the cell. And then we collaborated not only with Markus Zweckstetter, who initiated this project, but also with Xavier Darzacq at Berkeley, who is a master in imaging polymerase in cells. And they found that in the human nucleus there are a number of clusters of polymerases, well over 100. And when you truncate the CTD to half of its length, you will actually see far fewer clusters, and also the intensity, you can show that statistically, will on average go down. So this tells you that the CTD can self-associate in vitro and phase separate. And in vivo the CTD is important for the clustering of polymerases. And from that we came up with the model that polymerases cluster based on self-association of the CTD. And that led us to the question of how a polymerase can actually be released from such a condensate when it's entering the gene for elongation, right? It's easy to say, okay, we bring everything to the promoter, but then you would be stuck there. You also then have to have a mechanism to release this polymerase that initiates into the gene, and then take the next one and release it into the gene. And it was known from the work of others for a long time that the kinase CDK7, which is also part of TFIIH, is important in that process.
And what's interesting: when we use the CDK7 complex and ATP in our very simple in vitro setup, we can dissolve the droplets very rapidly. So when the CTD gets phosphorylated it's not staying in the droplets, it will be drawn out of these condensates. That's the prediction from this data. And from this and many, many other observations in the literature, I proposed, actually on the occasion of the 50th anniversary of the discovery of the three eukaryotic polymerases, which was done by Bob Roeder and Bill Rutter in 1969, I proposed this model in a review where you have a condensate at the promoter, and there are a lot of studies that argue for this. But you may also have a condensate in the gene body, at least at highly transcribed genes, and that condensate would support RNA processing, splicing in particular, and the recruitment of elongation factors. Whereas this condensate would recruit polymerases, co-activators like Mediator, general factors and so forth. And the polymerase would, however, shuttle between those condensates in a phosphorylation-dependent manner. So you recruit it in an unphosphorylated form, but then when it forms the initiation complex you have the CDK7 kinase in TFIIH, and it will phosphorylate the CTD, and that could shuttle the polymerase over into this second condensate, where the CTD would be phosphorylated. And at the same time when we published the review, I think two weeks before, so there's actually now a note added in proof, the lab of Phil Sharp showed that splicing factors can phase separate and that this would include the phospho-CTD. That's totally consistent with the model. So you would have one condensate that is hydrophobic and the other one which is more charge-based, electrostatic, polar, and the polymerase would shuttle. The beauty of that model is we already know that just before termination, when you have 3' processing, polymerase is dephosphorylated. So when it's dephosphorylated it could be recycled into that condensate.
So that immediately explains polymerase recycling on very highly transcribed genes. A lot of speculation, but I wanted to propose a model so people have a framework to test different aspects of the model. Why did we actually think about the gene body condensates? It's because we published a paper three years ago, very briefly, because we didn't understand what we see here. And what we found is that when you use the PAR-CLIP technology, which is direct RNA-protein crosslinking in vivo, you can show that all these transcription elongation factors would bind to nascent RNA, and so they are probably part of such a gene body condensate. Okay, I want to briefly discuss pausing and release and then come to an end, because we don't understand much about nucleosome passage yet. And that is what Bob mentioned. We now had to move from the yeast to the mammalian system. Why? Because this mechanism of promoter-proximal pausing and release is metazoan-specific. So a lot of different labs, including John Lis, Karen Adelman, David Price, Yamaguchi and others, have come up with factors that are involved in pausing and release. And we could recapitulate this in vitro using only recombinant purified human factors and mammalian polymerase. And we could see that with DSIF and the negative elongation factor you stabilize the pause in vitro, and we can release that into an activated elongation complex in the presence of the PAF complex and SPT6, both known to be elongation factors, and in the presence of CDK9, which is a kinase that is part of the positive transcription elongation factor b. So we have the in vitro system. Now we can make these complexes in large quantities. We got the cryo-EM structures, and I will show you, now we're moving from the initiation complex, I will show you the paused elongation complex here. DSIF is in green and NELF in color. And when we zoom into the active site we see immediately why that structure is paused.
And that is because the DNA template base, which should normally, as is shown here in silver, point towards the substrate nucleoside triphosphate, is not available for Watson-Crick base pairing. It's rather involved in a base pair with the RNA nucleotide at the end of the transcript. And so the entire DNA-RNA hybrid, where you have base pairs between DNA and RNA, is actually tilted. It's in a tilted state, and when it's tilted the nucleoside triphosphate substrate cannot bind and Watson-Crick base pair, and that explains why the structure is in a paused state. And what was really exciting is that just before us, I don't recall, maybe two months before, several groups from the bacterial community published a structure of a paused bacterial elongation complex, and it looks virtually identical. So on Friday when I was talking to Seth Darst at Rockefeller, he told me that they superimposed that, and basically, as long as you zoom into the active site like this, you wouldn't see a difference. If I showed you the bacterial structure, it would look like that. So it seems to be a fundamental paused state of the enzyme. But why is it left like this? Because we showed already in 2012 that TFIIS can actually rescue such tilted states by realigning the DNA-RNA hybrid, and in case there would be backtracking it can also cleave this backtracked RNA fragment, so it can reactivate the complex. So why is TFIIS not coming and reactivating this paused complex? Well, that's actually the role of the negative elongation factor. Because negative elongation factor will bind to a region that we call the funnel and the pore. It's actually the so-called secondary channel in the bacterial enzyme, and by binding here it will block access for TFIIS. And that actually beautifully explains, again, data from Bob Landick's lab. I think it was in 2005 already where you showed that NELF actually can interfere with TFIIS function, and we have now the structural explanation here.
So what does the released and activated elongation complex look like? Because when we understand that, we can understand the switch between paused and activated. So here you have it: DSIF in green, SPT6 in blue, and this is now the PAF complex. Again a multi-subunit complex involved in elongation. We look into the active center. So here we use different nucleic acids, right? So we don't visualize directly the straightening of the hybrid. But what we do see is that it's an active conformation, so everything is okay. We see now the DNA base pointing to the open binding site for the nucleoside triphosphate. So now you can base pair; an active conformation, as you would hope for. And now we compare the paused complex to the released and activated complex. And you see that the negative elongation factor in red is binding to a site that overlaps with the binding site for the PAF complex. It's not the same site, but if you would superimpose the structures you would see clashes. So in other words, you can bind either NELF or PAF to the polymerase, but you cannot bind both at the same time. At least if they bind the way we see. So that explains that you either have the stable paused complex or you have the activated complex. But now how do you switch to the activated complex? What is the switch from the paused to the activated transcription complex? Well, that is, as I said before, P-TEFb and the CDK9 kinase. And we could now see how the phosphorylations would switch from the paused to the released complex. We mapped a total of 49 phosphorylations on all of these factors, also on polymerase. And most of them cannot be located in the structure because they are in linkers and disordered loops. But those two are very exciting and very prominent. They are in this linker that leads to the C-terminal domain. Remember the long intrinsically disordered tail-like region. And those two phosphorylations are needed to bind a domain of SPT6, the tandem SH2 domain, and increase the affinity for SPT6.
And those two phosphorylations will recruit SPT6. It was actually shown by Chris Hill's lab before, using crystallography of just this region, that a phosphopeptide can bind to this domain. And this was extremely good for us to have, because on these outer regions the resolutions that we get are not three angstroms like we get in the center, but more like four or five angstroms. So it was very good to have the atomic details from them to make sure that this is the structure that we see. Now I want to come to one important point, and if you want to remember something from the talk, the bottom line, what I show you now is maybe the most important thing. And that is the question, a very old question: how can you regulate genes at the level of elongation at all? Because you should regulate genes at the level of initiation. If you want to increase or decrease the number of mRNA molecules, you have to increase or decrease the frequency of initiation. When over time you initiate more frequently, you have more polymerases traveling, you have more mRNA. And at the elongation step it's not clear why that should change the output of RNA. At least not in a system where it takes 40 minutes on average to transcribe a gene, which is the case for the human system, right? The gene of average length takes 40 minutes to be transcribed. So how do you do that? What we did to answer that question is we developed a multi-omics approach. So we combined two functional genomics methods in order to test this idea that CDK9 is important to decrease the pause duration and to release the paused polymerase into the elongating form. And we found something very exciting that, you know, this is serendipity, we didn't expect. So we were trying to measure how the pause duration changes when we inhibit the CDK9 kinase in P-TEFb.
And you can measure the pause duration by multi-omics, combining occupancy profiling, the so-called NET-seq technology that we used, with the RNA labeling approach, the so-called TT-seq protocol that we published like four years ago. When you combine the two, you have occupancy and you have function genome-wide, and you can now use kinetic modeling to extract the parameters for pause duration, but also for the initiation frequency, so how often polymerase starts. And this is shown here; that basically summarizes all this work. Here you have the pause duration on the y-axis, and you see when you inhibit CDK9, then genes tend to have polymerases that pause for longer. This was what we wanted to show, we were happy, but it was also expected. And this is something for the students: if you find something expected, nice, but if you find something unexpected, even more interesting, if it's reproducible. So it was expected to extend the pause because we knew CDK9 is needed to release the pause, so pause duration went up. But look what happens here: the initiation frequency is going down. So when the pause gets longer, the initiation frequency is going down. And this is the essence of the model. So we knew that in order to release the pause, you have to have P-TEFb and convert it to a fast elongation complex. But what is new is that the paused complex is somehow impairing new initiation. And this is now so beautiful because it explains how you can switch genes on and off at the stage of pausing, because it has a feedback on initiation. So if you make the pause longer, the initiation frequency can go down. If you make the pause shorter and you activate it by bringing P-TEFb, you can have higher rates of initiation and you switch the gene on. So that was beautiful. And we actually tested that hypothesis. We took human cells, we used the heat shock response that has been used by John Lis and many others.
And we actually showed directly, you can see that even when you look at all the genes that are up-regulated, not only that the initiation frequency goes up. That's trivial. That has to be the case. But also that the pause duration is now decreasing. Why is it decreasing? Because some genes are already transcribed at what we call the pause-initiation limit. And when the gene is at that pause-initiation limit, you cannot increase the initiation frequency further. You can only increase it further when you shorten the duration of pausing. You have to make the pause shorter and then you can increase the initiation frequency. And this is why the experimental data actually follows this theoretical limit, actually a limit that was predicted simply based on, you know, kinetic modeling, no data at all, by the lab of Jesper Svejstrup. Beautiful paper from 2013 that is largely ignored. Very few citations, but a very important conceptual paper. And all of this is dependent on CDK9. So now when we inhibit CDK9, you see you hardly have any activation. This is one dot that's just above. So you need CDK9 to decrease the pause duration and allow for high initiation frequencies. And that's the simple take-home. When the polymerase pauses here, it will impair new initiation. So initiation frequency is going down. When you recruit P-TEFb, you can convert that rapidly into an active complex. You free that site and you can have a higher rate of initiation. And now I want to conclude with a model for all of this, so that you have something to, you know, think about. And that involves even one more step. And now it's again speculative but very exciting. And that is now the plus one nucleosome. Remember I showed you in the beginning that after the nucleosome-depleted region, there's this positioned nucleosome. So what is the role of that nucleosome?
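The kinetic argument so far can be made concrete with a small sketch. It is not the kinetic model used in the work described; it assumes steady state, so that Little's law (occupancy equals arrival rate times residence time) relates pause-site occupancy, initiation frequency and pause duration, and it assumes a paused polymerase blocks new initiation completely, so that at most one initiation fits into one pause duration. The function names and the numbers are illustrative only.

```python
def pause_duration_min(pause_occupancy, initiation_per_min):
    """Estimate pause duration (minutes) from steady-state occupancy.

    Little's law: occupancy = initiation frequency * pause duration.
    pause_occupancy is assumed to be calibrated as polymerases per gene copy.
    """
    if initiation_per_min <= 0:
        raise ValueError("initiation frequency must be positive")
    return pause_occupancy / initiation_per_min


def pause_initiation_limit(pause_minutes):
    """Upper bound on initiation frequency (per minute) for a given pause.

    Simplifying assumption: only one polymerase can sit at the pause site,
    and it blocks initiation until it is released.
    """
    if pause_minutes <= 0:
        raise ValueError("pause duration must be positive")
    return 1.0 / pause_minutes


if __name__ == "__main__":
    # Shortening the pause raises the ceiling on initiation frequency.
    for minutes in (10.0, 2.0, 0.5):
        limit = pause_initiation_limit(minutes)
        print(f"pause {minutes} min: at most {limit} initiations per min")
```

Under these assumptions a gene sitting at the limit can only be up-regulated by shortening the pause, which is the behavior described for the heat shock genes.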
When we now take our data, you know, CDK9 inhibition, the human genes, the pause durations, and we separate the data set into two groups, and we say these are genes that strongly respond to CDK9 inhibition and those respond weakly, then the strong responders are the ones where you have pause limitation, right, where you need CDK9. They have a better positioned plus one nucleosome. So this is MNase data. You have a higher peak for the plus one nucleosome. So we solved the structure of polymerase sitting here and a nucleosome just downstream, sitting in the right, you know, distance from the polymerase. And it looks like that. You see the nucleosome here and the polymerase. So polymerase is just upstream of the plus one nucleosome. So maybe it's actually contributing to the pausing or the stabilization of the pausing. And now because we have in functional genomics all these beautiful average distances from the entire genome, we can make a model of the entire situation, and that summarizes everything I said so far. At the beginning of a Pol II gene, we have the pre-initiation complex at the transcription start site. We have the paused elongation complex at 50 base pairs downstream and then the nucleosome just downstream. And now we put this structure that I just showed you together with the pre-initiation complex using that distance here. And so now what you see is a model. It's not a structure, but it's a very good model because you use the in vivo distance. And you see this. So paused complex in front of the nucleosome and here the pre-initiation complex, just beginning to clash. So the in vivo distance between the start site and the pause site allows you to, in principle, have both complexes. But it begins to clash here. So if the paused enzyme is a little bit closer, there's already no way to initiate. But even if it sits here, it's very hard to imagine that you initiate, because remember, that's the translocase in TFIIH.
The DNA needs to be brought into the active site. So one to two turns of DNA need to be dragged in, and that would, of course, increase the clash here. It would really cause problems at that stage of opening. That's speculation, but it's a very intriguing model, that we actually don't think about these processes as independent processes. But chromatin, pausing and initiation are happening in a distance where you actually have contacts. And so they can influence each other. And now look at the kinetics. Kinetics are extremely important, because if you only look at the image, it doesn't tell you anything. It just tells you it fits there, right? But we know, not only from our data but others have published this, that the paused complex lives in the range of minutes. It can be two minutes, ten minutes, but it's minutes. And the pre-initiation complex, as shown by others using high-resolution light microscopy and live cell imaging, lives for seconds. So that is beautiful, because you can use the pause to regulate initiation events, because in principle every few seconds you can form a pre-initiation complex and begin to transcribe. But the pause lives for minutes, so it can repress that initiation frequency. But if you deliver a lot of P-TEFb and you make the pause duration short, this can go down to half a minute or so, and then you can have a lot of initiation events, so that you can up-regulate the genes. This is what I wanted to tell you today. Concerning nucleosome passage, I will skip it, but basically the answer is that we don't know exactly how it works. We understand the remodelers and so forth, but the reason is you have to somehow explain, when the polymerase goes through the nucleosome, that you grab the histones and you transfer them to the wake of the polymerase, because if you don't do that, just one polymerase would wipe out the entire information on the chromatin modifications, and certainly that's not happening and you don't want that.
And so this is something for the next decade: we want to have a movie of the polymerases going through chromatin and shuffling the nucleosomes to the back. And we hope to serve the community, and this is a summary now of what I told you, by filling in these gaps of our understanding, by providing molecular mechanistic information but also, in the future, more and more kinetic information. I don't need to explain to some of you here, who have worked in the bacterial transcription field, how important kinetics is, and thermodynamics, but in the end it will be the same thing. It's an enzyme that is allosterically regulated and all of this. But the beauty is that those biophysical mechanisms underlie the development of an entire human being, and differentiation and immune response and all of that, and this makes it so exciting: you can go from biophysics to genetics to developmental biology. So I want to thank all the people who have been involved in this work, and of course, above all, those heroes who did the work that I presented today. There were many, actually dozens of PhD students and postdocs who did the groundwork, who made factors, who made smaller structures that we incorporated. But those are the people involved in the work that I showed today, or heading the work that I showed today. Svetlana did the first pioneering structure on the nucleosome. Felix did the RSC complex on the nucleosome. Haibo solved the SAGA co-activator. Sandra did the structure of the entire pre-initiation complex with core Mediator and TFIIH. Christian looked at the distortions in the DNA and the opening. Marc is the one who did the phase separation study that revealed this role of the CTD in polymerase clustering. Carrie and Seychelle, they both were postdocs from the US. Carrie actually stayed in Europe and she went to Vienna, where she's a professor. And Seychelle has just started her own lab as an assistant professor at MIT, so she went back to the US.
Carrie has done the first structure of a mammalian Pol II enzyme in our lab, very high resolution, also with the DSIF elongation factor. And then Seychelle has done the wonderful work, both biochemistry and structural, on pausing and release. She made these complexes and the structures, and actually Lukas helped her with the paused elongation complex. Lukas is also responsible for the structure with the nucleosome in front of the polymerase. And then Saskia and Björn: she's the experimentalist, he's the mathematician. They developed the multi-omics approach that allowed us to extract kinetic rates from functional genomics data. And Dimitry is Dimitry. So he's the wonderful programmer and EM expert who not only did the autoencoder work, but actually also programmed a package called Warp that I heard is going to be installed here too. And this actually closes the gap between the electron microscope camera, the detector, and RELION, which is the commonly used processing software. So he made a program called Warp that will close the entire gap, so it will automatically pre-process the data from the camera so that when you come in the next morning, you can feed it into RELION. And it actually shows you every minute, as the images come in, what the quality of your data is, which people find extremely useful. Henning is our most important collaborator because he's doing the mass spectrometry. All of the stuff that I showed you involves mass spectrometry, and it's wonderful to work with him. Finally, thanks so much for staying with me for an hour. It's really great to be here, great to visit you, Bob, and thanks for being such a great host. And yeah, if you still have a few minutes I can try and answer your questions. Otherwise, thank you very much for your attention.
So we'll take a few minutes here for questions, and as you're formulating them, let me remind you to raise your hand. I'll shuttle the microphone around.
Join us afterwards for the reception if you don't get your question answered now. And also, I just want to remind you there's a talk tomorrow from Patrick in biochemistry, 3:30, 1211 Biochemical Sciences, where, if you're really interested in knowing some more of the details of how all this really elegant work is done, he'll be describing that. So with that, are there questions?
Thank you. My question is about the correlation between elongation and epigenetic modifications. Would that be the key for the last question you asked?
Certainly the modifications contribute to it, and I'm not talking just conceptually. First of all, they may be used to recruit the right remodelers, CHD family remodelers. Second, the modifications in the wake of polymerase, like K36 trimethylation, recruit other factors which may stabilize nucleosomes, because otherwise you run the risk of having extended nucleosome-depleted regions, and then you get aberrant initiation, which you don't want. You have to protect your transcribed region. So they will have many roles, and most of it is probably recruitment and fine-tuning of activities. And polymerase itself, as you know, is putting marks in, because it's recruiting Set1 and Set2. Set2 actually binds directly to the C-terminal repeat domain when it's phosphorylated. So basically the phosphorylations in the CTD recruit the methyltransferase, which then methylates the histone tails, which then recruits, for example, PWWP proteins. So very exciting, the interplay. I think it's a feedback loop. Chromatin and transcription factors work together to maintain an active gene until the activation signal ceases, because the transcription factor is degraded or phosphorylated or so, and then the whole system breaks down and you activate another gene.
Hi Patrick. Yeast, I believe, has a plus one nucleosome, but it doesn't have a pause at that position. What do you think?
So I was oversimplifying. It's a very important point, thanks Dave for pointing this out, because the arrays were in yeast, and there the plus one nucleosome is actually normally overlapping the start site a little bit, and as you know from the work of Wolfram Hörz and others, you have to first shift it to make that promoter accessible, and it's different in human, as I showed it here. So yeah, it's true. And, you know, what is cause and consequence? You could also argue that because polymerase is such a strong machine, it goes to the pause site and, you know, the plus one nucleosome ends up where it is. That's also possible. Polymerase can develop 20 piconewtons of force, which, I was told by physicists, makes it a very strong motor, so that's also possible.
Beautiful talk and many beautiful figures. I'm actually interested in the parts that don't have beautiful figures, such as the C-terminal domain giving rise to the phase separation. In the pre-initiation complex you had 46 proteins that are well defined in their structure, which was different from the 70 that you mentioned as being the total. Would you speculate that the 36 remaining ones also contribute to phase separation?
No, it's actually known that the remaining ones do adopt structure, but it's true that there's a higher proportion of intrinsically disordered regions in them. Maybe that's part of the reason why we still have problems with expression and so forth. For example, the Mediator tail module has huge intrinsically disordered regions, and they have been shown by Kornberg and others to interact with transcription activation domains. So the answer is yes, there are a lot of intrinsically disordered regions, but there are also structured domains, so most of these proteins have both. But I didn't have time to show you the plot: if you take the entire proteome and you ask the question, now give me all transcription factors and compare that to all the other proteins, what is the amount of intrinsically disordered regions? It's twice as much as for all the other proteins. So it seems especially important in the nucleus, which is super crowded, with a lot of competition for factors; you need to bring a lot of factors there quickly. In the T-cell response we could see with our labeling, five minutes after you add the LPS or, whatever, you know, ionomycin, after five minutes we can see enhancers transcribing which were totally off. So after five minutes everything is there to start transcription, and I think it's because of these intrinsically disordered regions.
You could do one more question if there is one.
People are thirsty and I went over time.
That's okay. And so if there's not, please join us outside for the reception.