Okay, so thank you everyone for coming to today's Everson lecture, and it's my pleasure to welcome Dipa Sashital from Iowa State University, who's gonna give that Everson lecture. I don't know if it's Everson or Everson. Does anybody know? I guess we say Everson in Wisconsin. I'm first gonna say just a few things about Gladys Everson. She was born in Wisconsin and she completed her undergrad degree here in 1931, and then a master's at the University of Iowa. She then returned to Madison to work on her PhD with Harry Steenbock, which she completed in 1942. Her research interests focused on the relationship between dietary deficiencies and birth defects, with an emphasis on the role of magnesium and copper in fetal development. So after her PhD here with Steenbock, she moved to become a faculty member at the University of California at Davis from 1953 to 1969, where she served as chair of her department and raised concerns about the general need for more exercise and better nutrition to improve health. I think all those things are still true. So when she passed away in 1969, she bequeathed her estate to the Department of Biochemistry, and that has been used to support this Gladys Everson lectureship in biochemistry. This lectureship is for former associates of our department, including former graduate students, postdoctoral fellows, and faculty. So now I can introduce Dipa. Dipa got her undergraduate degree at the University of Michigan. She came here in 2001, 19 years ago already. She was one of my first graduate students. She worked on the structures of U6 and U2 spliceosomal RNAs by NMR with me and Dave. And from that she moved on. She graduated in 2006. She became a Damon Runyon postdoctoral fellow at UC Berkeley with Jennifer Doudna, and this is where she started working on CRISPR.
She was there until 2011, and then she did a second postdoc, or sort of advanced training period, at Scripps Research Institute in Jamie Williamson's lab, where she worked on electron microscopy of ribosome assembly. And so I think from those training experiences you might see a little bit of both, the EM and definitely the CRISPR, today. She's really done fantastic work in her own lab, which she started at Iowa State University in 2014. So she's been there six years. She's an associate professor with tenure and she's won some awards already. She told me not to read any of those. So I'm just going to let Dipa tell you her story.
All right. Is this on? Thank you very much, Sam, for that very nice introduction. It's really great to hear about the namesake of this lectureship. I didn't really know much about Professor Everson, but I'm really honored to be selected for this lectureship and to give this lecture today. It's unbelievable to be back in my alma mater department. I really love coming back to Madison; obviously, who doesn't love to come back to Madison? But also I have such fond memories of this department, so it's really great to come back and visit with a lot of familiar faces as well as to meet a lot of new people. And I was actually pretty lucky to get to train in Sam's lab and be one of his first graduate students, as he mentioned. And we actually had a little bit of a reunion at the Terrace in August of last year when I was visiting for the Phages conference. And so this is me and Sam as well as two of the other founding members of the Butcher Lab, Dave Staple and Nick Reiter. So we were all three graduate students who joined the lab together in January of 2002. And just to give you an idea of what we looked like back then, this is about four months after we joined the lab, standing in front of the 750 megahertz NMR machine.
And so Dave and Nick were also exceptionally great colleagues, as was everyone in the Butcher Lab and really all of my classmates as a grad student. So for those of you who are grad students now, I guess my advice to you is to keep in touch with each other. You're gonna make some great friends whose careers you can keep up with for the rest of your career. So I'm really excited today to talk about some of the work from our lab, which studies CRISPR-Cas systems. And of course I'm sure most of you are very familiar with CRISPR because of its use in biotechnology. And we do have a strong interest in the biotechnological aspects of CRISPR. But my main interest in CRISPR is really in its native function in bacteria and archaea as a defense mechanism against various types of mobile genetic elements, including viruses. And so I want to start today by setting that stage and telling you a little bit about the interactions between phages and bacteria, just to give you a sense of why it's so important for bacteria to be able to defend themselves against phage. And so this is just a micrograph showing a bacterial cell on the bottom being infected or attacked by these viral particles. So each of these little circles is a different viral particle, and what they're doing is injecting their DNA genomes into the bacterial cell. And of course, like any virus, what's going to happen then is that they're going to hijack the machinery in the bacterial cell, use that machinery to replicate themselves and create new viral particles, and eventually this bacterial cell will lyse, burst open, and release all of those virions. And so that's of course quite catastrophic for that individual bacterial cell, but it's also quite problematic for the population as a whole. All of the bacteria that are in the population that this cell exists in are also now subject to viral infection.
And so viruses, sorry, bacteria, just like any other living organism, have to have a way to defend themselves against these types of opportunistic pathogens. And so these types of interactions are really quite important in a number of diverse ecosystems all over Earth. It's estimated that there are somewhere on the order of 10 to the 30th bacterial cells on Earth, and as a result people have estimated that there are somewhere on the order of 10 to the 31st bacteriophage particles. There's usually a 10 to 1 ratio of phage to bacteria. So that's not an easy number, I think, to visualize in your head; it's basically a 1 with 31 zeros after it. But just to give you a sense of the importance of these types of interactions, I would encourage you to think about things like the human microbiome or the ocean microbiome, where the populations of bacteria that are present in both of these very complex ecosystems are really subject to predation by bacteriophages. And so it's been shown that in the human microbiome, bacterial composition is actually dictated in some part by the phages that are also present in the microbiome. So it's interesting to think about the fact that there are phages in our guts right now as we stand here. In the ocean microbiome, well, all of these stats about phages are always quite fun and interesting, but one of my favorites is that about 30% of the biomass in the ocean is turned over on a daily basis, and that's mainly due to killing off of bacteria by bacteriophages. So I hope that sets the stage for how important it is for bacteria to be able to fight off viral infection. And of course bacteria have evolved many different ways to defend themselves. But I would argue that out of all of the ways that we understand so far, CRISPR-Cas immunity is actually one of the most sophisticated mechanisms.
And that's really because this is what we consider to be an adaptive immune system. So by adaptive I mean that it has some of the hallmarks of our own adaptive immune systems, in the sense that it's an immune system that provides a memory of an infection event that immunizes the cell against future reinfections by the same virus in this case. And also that this is a specific mechanism against a specific virus. And so the way that this works is through a process called adaptation, in which a protein called a CRISPR-associated or Cas protein takes up a short piece of DNA from some type of foreign DNA genome, in this case from a bacteriophage. And then it inserts it into this signature sequence in the host genome called a CRISPR array. And so you can see that that short piece of DNA is being inserted into this CRISPR array between these black diamonds. And those black diamonds are repeating DNA elements. And so this DNA uptake event is basically the immunization event that causes this bacterial cell to now be immunized against this particular phage. And so how does that actually work? What's the immune response? Well, the first step that has to occur is that this CRISPR is transcribed and processed to form CRISPR RNAs. So the actual functional units of the CRISPR are these short RNAs, each of which bears one of those short sequences that was originally derived from some foreign genetic element. And these can now associate with other Cas proteins to form effector complexes. And within this effector complex, this CRISPR RNA is basically going to act as a guide to direct the effector complex to find that matching piece of nucleic acid from which the sequence was originally derived.
Once that target binding occurs, basically through complementary base pairing between the RNA and the complementary nucleic acid, you can have some type of cleavage event and eventually degradation of that nucleic acid, which effectively neutralizes the infection. Since the genome has to be intact and fully inserted into the cell in order for replication to begin, this will of course prevent that infection from proceeding.
So in my lab, we're really interested in all three of these different stages of the CRISPR-Cas immune system. Today, I'm going to talk about two different stories. In the first half, I'm going to talk about adaptation. And in the second half, I'm going to talk about interference. And so in the first half, I want to talk to you about one particular Cas protein called Cas4. This protein has actually been one of the more mysterious Cas proteins in terms of our understanding of what it actually does in CRISPR-Cas immunity. And in the last couple of years, my lab has done a fair amount of biochemical and structural work to define the role of Cas4 in high-fidelity programming of the CRISPR array, so basically in the uptake of that molecular memory into the CRISPR array. And then in the second half of the talk, I'm going to switch to talking about interference. And I'm actually going to talk about one of the proteins that's often used for biotechnology, which is called Cas12a. It's also known as Cpf1. And we've been really interested in the specificity of this protein. But in studying the specificity of this protein, we've actually found that it has a number of unexpected activities. And so I'm going to tell you about that in the second half. But before I get started on telling you about Cas4, I want to talk about one aspect of this mechanism that I didn't really focus in on when I introduced it to you. This is actually my favorite part of this whole process. I think it's a really fascinating part.
But I've only depicted it here as this one little arrow. What's happening in this arrow is a really complicated process by which this effector complex is searching the whole cell for this one needle in the haystack, this one sequence that's complementary to the CRISPR RNA. And that's a really complicated process just from a molecular recognition point of view, because this sequence that needs to base pair with the RNA, if it's a DNA sequence, is most likely in a double-stranded DNA duplex. And if you also think about the fact that there's a whole heck of a lot of DNA in a bacterial cell, this really raises the question of how this process can occur in a timely manner. So if you think about a typical DNA binding protein, most likely it's going to recognize, let's say, a sequence through the major groove. In that case, you might have some kind of scanning mechanism. You might have just collisional searching. But what's going to eventually happen is that it's going to find that sequence just by interacting with the DNA itself. In this case, what has to happen is that these Cas effectors have to go around unwinding DNA in order to find the piece of DNA that actually matches the RNA. So you might imagine that that's an inefficient process and also potentially an energetically unfavorable process. And so what these Cas effectors actually do is simplify their search: instead of searching for that complementary sequence, they search for these signature sequences that are located next to the complementary sequences. These are called PAM sites, and they're highlighted in yellow. So these are generally short sequences, somewhere between 2 and 5 nucleotides in length. And if the Cas effector first searches for these PAM sites, it really simplifies the search by limiting the number of locations at which these Cas effectors actually have to unwind the DNA.
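To make the PAM-first idea concrete, here is a minimal toy sketch in Python, not a model of any real Cas effector. It treats the DNA as a single string, counts each comparison against the guide as one "unwinding attempt", and assumes the PAM sits immediately before the matching sequence. The PAM `AAG`, the toy sequences, and the function names are all made up for illustration.

```python
def pam_sites(dna: str, pam: str) -> list[int]:
    """Start index of every occurrence of the PAM (overlaps allowed)."""
    return [i for i in range(len(dna) - len(pam) + 1) if dna.startswith(pam, i)]


def pam_first_search(dna: str, pam: str, target: str) -> tuple[list[int], int]:
    """Only test for the target right next to a PAM; count the attempts."""
    hits, attempts = [], 0
    for i in pam_sites(dna, pam):
        start = i + len(pam)
        attempts += 1  # one simulated unwinding attempt per PAM
        if dna.startswith(target, start):
            hits.append(start)
    return hits, attempts


def exhaustive_search(dna: str, target: str) -> tuple[list[int], int]:
    """Test for the target at every position; count the attempts."""
    hits, attempts = [], 0
    for i in range(len(dna) - len(target) + 1):
        attempts += 1  # one simulated unwinding attempt per position
        if dna.startswith(target, i):
            hits.append(i)
    return hits, attempts


# Hypothetical toy sequences: two PAMs, only the first has the target next to it.
PAM = "AAG"
TARGET = "GATTACAGATTACA"
DNA = "CTCTCTCT" + PAM + TARGET + "CTCTCT" + PAM + "CCCCCC"

print("PAM-first: ", pam_first_search(DNA, PAM, TARGET))
print("exhaustive:", exhaustive_search(DNA, TARGET))
```

Both searches find the same site, but the PAM-first version only "unwinds" once per PAM rather than once per position, which is the point of the argument above. A real search also has to deal with both strands, mismatches, and 3D diffusion, none of which this sketch attempts.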
And so this is one of the things that I've been very interested in throughout the time that I've been working on CRISPR. I'm not actually going to talk about any of the work that we have done on this, but just to give you an example of how this actually works, one of my first students, Chaoyou Xue, used single-molecule FRET to look at how Cas effectors can search DNA in an efficient manner. So in this experiment, basically we've just immobilized DNA with a Cy3 label on it, flowed in a Cas effector with a Cy5 label on it, and then looked at how that Cas effector interacts with the DNA. And we used three different targets in this experiment that contained different numbers of PAM sequences, and these don't have any complementarity to the CRISPR RNA. So we shouldn't expect to see a full binding event. And what you can see from the single-molecule traces is that when we don't have any PAMs present, we see these really, really short, quick interactions between the Cas effector and the DNA, just these single-frame interactions, and they actually occur on the DNA in a completely random manner. So they're not binding in any particular location. But as we increase the number of PAMs, we now start to see these longer dwell events, and these are located at particular locations. And so what we're actually seeing here is basically these Cas effectors interacting with the PAM and dwelling for a little bit longer than they would normally dwell when they're searching the DNA in this random manner. And what we think is happening is that they're actually starting to unwind the DNA and determining whether or not there's complementarity with the CRISPR RNA. And so our work, as well as lots of other people's, has come up with this model for how this works. Basically the Cas effector interacts with the PAM. It begins to unwind the DNA. And if there's complementarity with the CRISPR RNA, this RNA-DNA duplex can basically zip up.
The formation of the RNA-DNA base pairs can offset the thermodynamic penalty of unwinding the DNA. If there isn't a match, the Cas effector can dissociate and continue on with its search. Okay, so let's go back to our cartoon now. What I didn't show you earlier in this cartoon is that this target has to have a PAM next to it in order for target binding to occur. And if you remember that this phage is the same as this phage, this piece of DNA also has to have that PAM next to it. What that means is that during adaptation, the proteins that are involved in adaptation have to take up a piece of DNA from a location that contains a PAM. And so one of the questions that we've been really interested in is how is that defined? So how do CRISPR-Cas systems make spacer sequences, that's the location in the CRISPR array, that are actually functional and going to result in DNA binding by the Cas effector? And there are a lot of considerations when we think about this. Not only does that piece of DNA have to be taken up from a location with a PAM, but the PAM actually has to be removed from this substrate prior to integration. And I'll tell you why that is: once the spacer is integrated into the CRISPR array, if there's a PAM present in it, you can basically have a DNA binding event at the CRISPR by the Cas effector. So if the PAM is still present on that piece of DNA, the Cas effector would be able to bind to it, and that would mean that the Cas effector can bind to the host DNA and you might have autoimmunity. And so what happens instead is that the PAM is removed from this substrate prior to integration. But now this raises another question, which is how does this protein know which orientation in which to integrate this piece of DNA?
So if we think about the fact that this is going to eventually be a template for a CRISPR RNA, it has to be integrated in the correct orientation in order to synthesize the correct strand of the RNA. If it's integrated in the opposite orientation, we're basically going to have an oppositely oriented CRISPR RNA that will not target the correct strand of the DNA. So these are the three questions that I think have been open in the field and quite confusing, and I think biochemically and biophysically interesting. So one of the questions that we sought to address is what machinery is required for these steps to occur. And if we take a close look at the proteins that are present in CRISPR-Cas systems, we can divide them between the three different stages of CRISPR-Cas immunity. And we can see that there's actually a lot of similarity between CRISPR-Cas systems in terms of what machinery is present for adaptation. So it's thought that adaptation occurs through a kind of universal mechanism for all CRISPR-Cas systems, in which these two proteins, Cas1 and Cas2, are required for integrating those pieces of DNA into the CRISPR array. However, what you'll notice here is that there's also this additional protein, Cas4, and this protein is pretty widespread in CRISPR-Cas systems. It's found in somewhere on the order of half of CRISPR-Cas systems, but its function has really been mysterious. So what do we know about Cas4? Well, I want to point out that it's named Cas4 because it was one of the first four Cas genes that were identified way back in 2002. So Cas1, 2, and 3 are really well understood at this point, as are Cas5 through Cas13 or whatever. But Cas4 still remains kind of mysterious. What was known right away is that it has a RecB-like nuclease motif. It's most likely some kind of nuclease.
It was eventually implicated in adaptation in various CRISPR-Cas systems, and biochemically it was shown to have different types of nuclease activities, including 5' to 3' and 3' to 5' exonuclease activity, endonuclease activity, and even DNA unwinding activity. So all of this evidence suggests that it might have some role in the biogenesis of the substrate that Cas1 and Cas2 integrate into the CRISPR array. And so when I started my lab, one of the things that we were interested in doing was to look at some of these CRISPR-Cas systems that were not as well studied as the more common ones from E. coli and the Cas9-containing ones. And so Hayun Lee was one of the first graduate students who joined my lab, and she began to study a system that contained a Cas4 protein and got really interested in understanding whether or not Cas4 interacts with the rest of the adaptation machinery, and also in answering this question of whether or not Cas4 is involved in prespacer processing. And Hayun graduated in December of 2018, and since then a new graduate student, Yukti Dhingra, has been working on this project. So I'm going to answer the questions in this order. First I want to talk about whether or not Cas4 is involved in prespacer processing. And so this took a lot of work, but I'm just going to summarize it in one slide here. Basically what we wanted to ask is: if we feed our adaptation proteins a substrate that's longer than what's typically integrated into a CRISPR array, does it get cut down to the correct size by the proteins that we purify from this system? And so what Hayun did was she purified the three adaptation proteins, Cas1, Cas2, and Cas4. She purified them separately and incubated each one of these with a substrate that looks like this. And none of them seems to do anything to that substrate.
So even though all three of these proteins have been shown to be nucleases, under the conditions that we're testing here, which is pretty low concentration, we actually don't see any nuclease activity against this DNA. But now when we start to add them in combination, we see something pretty different. When we just add Cas1 and Cas2 together, we still don't see any activity against this substrate. But when we add in Cas4, now we can see a nice processed prespacer. And so is Cas4 required for processing? I would say yes, it appears that Cas4 is necessary for processing to occur. So the next question is, can this processed product actually be integrated into the CRISPR array? And so we can simulate a CRISPR by just providing the components of the CRISPR that are necessary for integration to occur. And this is an unlabeled piece of DNA, so what's going to happen if integration occurs is that we're going to end up with a labeled DNA that's longer than the original substrate. And indeed, when we add in the unlabeled mini CRISPR, we now see that this processed product gets integrated into that CRISPR. And you can't really see it very well on this gel, so I'm going to show you a different gel. Once in a while what we would see is that if we don't have Cas4 present, we actually see the unprocessed prespacer being integrated into the mini CRISPR. So what does that mean? If you think about what I talked about earlier, we have to have the PAM cut off for two reasons. One is to make sure that we're defining the correct guide RNA sequence, but the other is to make sure that the PAM is actually not part of the CRISPR array. And so what's happening when Cas4 is not present is that we're having this low-fidelity integration, where the entire substrate is being integrated into the CRISPR, and that's most likely going to result in a non-functional CRISPR RNA.
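The reason the PAM has to come off before integration can be illustrated with a small Python toy model. This is only a sketch under loud assumptions: DNA is a single string, "targeting" just means the PAM immediately followed by the spacer sequence appears in it, and the PAM `AAG`, the repeat sequence, and every function name are hypothetical. It ignores strands, orientation, and where exactly the real enzyme cuts.

```python
PAM = "AAG"          # hypothetical PAM
REPEAT = "CCCCGGGG"  # hypothetical CRISPR repeat


def is_targeted(dna: str, spacer: str, pam: str = PAM) -> bool:
    """Toy rule: a site is bound if the PAM sits directly before the spacer match."""
    return (pam + spacer) in dna


def trim_pam(prespacer: str, pam: str = PAM) -> str:
    """Toy 'processing': drop the PAM and the flanking sequence on its far side."""
    i = prespacer.find(pam)
    if i < 0:
        raise ValueError("prespacer has no PAM")
    return prespacer[i + len(pam):]


protospacer = "GATTACAGATTACA"
phage = "TTTT" + PAM + protospacer + "TTTT"

unprocessed = "TT" + PAM + protospacer  # captured fragment, PAM still attached
processed = trim_pam(unprocessed)       # what should be integrated

array_ok = REPEAT + processed + REPEAT     # high-fidelity integration
array_bad = REPEAT + unprocessed + REPEAT  # low-fidelity integration

print("phage targeted:            ", is_targeted(phage, processed))
print("processed array targeted:  ", is_targeted(array_ok, processed))
print("unprocessed array targeted:", is_targeted(array_bad, processed))
```

In this toy, the phage and the array built from the unprocessed fragment both satisfy the binding rule, while the array built from the trimmed fragment does not, which mirrors the autoimmunity concern. It also assumes the guide still carries the protospacer sequence in the unprocessed case, which sidesteps the separate problem that such a CRISPR RNA is most likely non-functional.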
So of course the other question that we were really interested in is whether or not Cas4 is required for selecting the PAM and also for processing the PAM. And so what we were able to do was to test a number of substrates where we had a PAM located right at the center of the substrate, and then we basically varied the sequence on either side. And then Hayun ran these beautiful sequencing gels and was able to define exactly at what location Cas4 was processing these DNAs. And in each of these gels I have the PAM sequence circled in black or in red, and you can see that in each case, regardless of what the sequence is surrounding the PAM, we only see one processing event, and that occurs precisely upstream of the PAM sequence. And that's exactly where we would expect processing to occur in order to make a functional CRISPR RNA. And so what I've shown you is that in the absence of Cas4, we think that processing doesn't occur, and as a result in some cases Cas1 can actually integrate an unprocessed prespacer, resulting in a non-functional CRISPR RNA. However, when Cas4 is present, it can select a sequence with the PAM and it can remove that extra sequence precisely upstream of the PAM, resulting in a functional prespacer. Okay, so to answer our first question, is Cas4 involved in prespacer processing? Yes. Cas4 processes prespacers precisely upstream of the PAM, and it does so in a Cas1-Cas2-dependent manner. So somehow in the presence of Cas1-Cas2, Cas4 becomes activated as a nuclease, while in the absence of Cas1-Cas2 it doesn't appear to have any nucleolytic activity. So this gave us a clue that maybe Cas4 is directly interacting with Cas1-Cas2, and somehow through that interaction it's being activated as a nuclease. And so of course the next question we wanted to ask was, does Cas4 directly interact with the adaptation machinery?
And actually one of the first experiments that Hayun did was to just co-express all three of the proteins in E. coli using a His tag on the first protein, Cas4, and perform a pull-down experiment. When she did so, she actually pulled down both the His-tagged Cas4 as well as Cas1. So right away we had some good evidence that Cas4 and Cas1 interact with each other. Unfortunately, in this experiment Cas2 was not pulled down with the other two proteins, and this was actually a big surprise to us, because in other systems Cas1 and Cas2 interact pretty solidly; they're kind of rock solid with each other. But for some reason in our system Cas1 and Cas2 seem pretty recalcitrant to interact with one another. So we struggled for a long time to make this ternary complex between the three different proteins. Another problem of course, and one of the reasons why this protein hasn't been biochemically characterized all this time, is that Cas4 is not a very good protein to work with. It's pretty finicky, so often when we would incubate these proteins together they would just precipitate out of solution. So it took a long time, but eventually Hayun was able to work out some conditions under which we actually see this complex forming. And so just to give you a sense of the fact that Cas1 and Cas2 do not interact on their own, this is just individually purified Cas1 and Cas2 run out on a size exclusion column. You can see where they elute, and if you just incubate the two of them together, they still both elute at those same two elution volumes. And what she of course eventually tried, and we had tried this many times before, was to incubate them with DNA, and finally it worked for whatever reason. When we finally got those conditions that did work, what we can see now is that we get this earlier-eluting peak, which indicates that the two proteins and the DNA may be interacting with one another. So what happens when we add in Cas4?
So now what we see is an even earlier-eluting peak, and if we take just the peak fractions and run those out on gels, we can see that the Cas1-Cas2 peak has both of the proteins and the Cas4-1-2 peak has all three of the proteins, and both of the peaks also have DNA. So it looks like we have this nice complex of protein and DNA. So what do they actually look like? You guys are getting all this wonderful cryo-EM instrumentation here. We're still working on some negative stain stuff, but we're getting to the point of being able to do cryo-EM. Just to give you a sense of what these look like by negative stain, these are just 2D class averages of the two different complexes. This Cas1-Cas2 complex has pretty much the same architecture as what we expect for typical Cas1-Cas2s, and nicely, when we look at the ternary complex, we see this additional density that's located on either side of this Cas1-Cas2 complex. And so we thought that that might be the Cas4, and we went ahead and solved three-dimensional low-resolution negative stain reconstructions of these. You can see that the Cas1-Cas2 complex again looks very similar to previously solved crystal structures of this complex, and when we look at the Cas4-1-2 complex, what's interesting is you'll notice that the two Cas1s are actually dimers. However, Cas4 is only interacting with one of the subunits of the dimer on either side, so while we might have expected a 4:4:2 stoichiometry for this complex, we actually see a 2:4:2 stoichiometry. And the locations at which these two Cas4s are binding are actually the two locations of the integrase active sites of Cas1. So Cas1 binds the DNA along this surface, and the two ends of the DNA are located in active sites on either side of the complex. And so it appears that the Cas4 active site is in close proximity to the Cas1 active site, and we can see that a little bit better in this close-up image.
I want to note here that this is low-resolution data, so we can't unambiguously model the Cas4 structure into it. But regardless of how we model Cas4 into this density, it will end up with the active site of Cas4 in close proximity to this Cas1 active site. And so this gives us an idea of how Cas4 may sequester a substrate away from the Cas1 active site until processing occurs, and then allow for the substrate to be handed off to the Cas1 active site. And so one other thing that I wanted to note is that we have these really beautiful crystal structures of the Cas1-Cas2 complex bound to various substrates and products along the integration pathway. This is actually the last step of integration, after both integration events have occurred. When we fit this structural model into our Cas4-1-2 complex, what we can see is that the black DNA on the bottom, which is actually the repeat part of the CRISPR array, occupies the same position where Cas4 is bound in our Cas4-1-2 complex, so the two are mutually exclusive. And you can see this a little bit better in this close-up. So basically what we think might have to happen is that Cas4 would have to dissociate from the complex in order for the integration event to occur on the Cas4 end of the complex. And what that means is that Cas4 may define one end of the prespacer by being bound, and eventually dissociating from the complex in order for the second integration event to occur. And so along those lines, one of the amazing things about EM of course is that it can tackle sample heterogeneity, and we actually had a fair amount of heterogeneity in our sample. So the structure I showed you earlier with the two molecules of Cas4 in it was actually only about 50% of the particles that we had in that data set. The other 50% of the particles only had one copy of Cas4, so these form these asymmetrical complexes. And in fact, ever since that first sample that we looked at, the vast majority of the samples that we've
looked at have mainly had asymmetrical particles or particles that don't actually have Cas4. So it appears that potentially there's some type of mechanism to limit the association of Cas4 with the Cas1-Cas2 complex, and that having asymmetry in the complex could help to define the PAM end of the substrate. So if you remember back to the beginning, we have to remove the PAM, but somehow the complex has to remember which end of the substrate had the PAM. And so one way that that could be defined is through a protein component, the protein component that is actually involved in removing the PAM. And so of course going forward we are really excited about doing cryo-EM on these complexes, and while we're not going to be as state-of-the-art as you guys are, we are actually getting some cryo-EM instrumentation at Iowa State now. We're getting a Thermo Fisher Glacios that's going to be equipped with a K3 direct electron detector, and so I'm really looking forward to that. And with that cryo-EM, what we really hope to be able to study is, first of all, how is Cas4 activated within the adaptation complex? I mentioned that we don't see any nuclease activity of the protein on its own, but somehow in interacting with Cas1-Cas2 it becomes activated as a nuclease, so we hope we'll be able to start to answer that question structurally. Second, how does Cas4 prevent premature integration? Is it somehow physically blocking the active site of Cas1, and does it have to dissociate in order for integration to actually occur? And then finally, how does Cas4 actually hand off the substrate to the Cas1 active site following cleavage? And so we have a number of different constructs that we're hoping to look at in the near future by cryo-EM.
So with that I'd like to move on to talk a little bit more about interference, and in particular I want to focus in on Cas12a. You have most likely heard of Cas9. Cas12a is a very similar protein to Cas9 in that
it's a single polypeptide that uses a guide RNA and can be programmed pretty easily to target basically any DNA sequence for double-stranded DNA cleavage. This is of course in the interference stage of CRISPR-Cas immunity, where it's very important to create a double-strand break in the DNA that can lead to degradation of that phage genome. I just want to point out a few differences between Cas9 and Cas12a. First of all, Cas9 requires two different RNAs, a CRISPR RNA as well as a trans-activating CRISPR RNA, and it creates a kind of blunt double-strand break, whereas Cas12a uses only a single RNA, a CRISPR RNA, and it creates these offset breaks in the DNA. Otherwise they're fairly similar to one another, and they're considered to be kind of orthogonal tools for genome editing. They're also of course both Cas effectors that carry out interference in CRISPR immunity. So just to give you a sense of how these are actually used for genome editing: what's really wonderful about these tools is that they're extremely modular, in the sense that in every genome editing experiment you're using the same protein, and the only thing that you're changing is the guide RNA to target your DNA of interest. Essentially what you would do is select your target, the region within your gene of interest where you want to create a mutation. You introduce your Cas9 or Cas12a protein as well as your guide RNA, which will create a double-strand break at that location, and then that double-strand break has to undergo DNA repair in order for the cell to survive. During that DNA repair, some type of change is introduced. So this is of course a really great tool, and it's one that works exceptionally well. However, I do want to talk a little bit about one of the downsides of this approach, which is that the specificity of this tool is really defined by about a 20-nucleotide RNA sequence. If you think about genomes that are huge, like a
human genome, which is several billion base pairs long, the likelihood that you're going to find a unique 20-nucleotide sequence that doesn't have any highly homologous site somewhere else in the genome is pretty low. That really raises the question of whether or not these Cas effectors can bind to and cleave those homologous sites. If they can, then wherever you do have cleavage you have the potential for some kind of error-prone repair mechanism to occur and create off-target edits. This is one of the things that we've been quite interested in in my lab. But we don't only want to think about this from a biotechnological point of view, because I think it's really important for us to remember that Cas proteins did not evolve to be used for genome editing; they evolved to be used for bacterial immunity. So if we think about how they may have evolved their specificity, we can think of a kind of Goldilocks scenario. If it's really highly specific, that might be great for genome editing, but it would potentially be quite detrimental for a bacterial immune system, because bacteriophages can evolve quite rapidly, and if they evolve to have a mutation somewhere in the target sequence they could very easily escape from this immune system. So high specificity is probably pretty detrimental for bacterial immunity. On the other hand, low specificity, which you might think would be great for preventing phage escape, might also be toxic to the cell because it could allow for targeting of the host genome. So we kind of expect that most CRISPR effectors are going to have somewhere in the medium range, a kind of relaxed specificity that would mostly target the correct sequence but would allow for some level of tolerance of mutations within the target. Medium specificity, we expect, allows broad targeting while also limiting toxicity. And so of course this
is something that's been studied with great interest, because Cas9 has such great potential. What's been shown is that Streptococcus pyogenes Cas9, which is one of the most popular tools being used for genome editing, does actually have a pretty high rate of off-target cleavage in genome editing studies. This is a high-throughput genome editing off-target study that was done by Keith Joung's lab. When they looked at all of these different guide RNAs, they found a number of off-target sites for each one of them. At the top you have the guide RNA sequence, and below are all of the off-target sites that it hit. One year later, Keith Joung's lab did another study, this time looking at Cas12a, which was a protein that had just been discovered in 2015. What they found, I think quite surprisingly, is that the protein is actually quite a lot more specific than Cas9. For all of these guide RNAs that they looked at, they barely saw any off-target sites for any of these guides. This, I think, is great for genome editing; it suggests that Cas12a is potentially a much better tool than Cas9. But I would argue that it raises this important question: how does Cas12a provide good immunity if it's so highly specific? It seems like phages would be able to escape from it quite readily. Well, one thing I want to note is that more specific for genome editing doesn't necessarily mean more specific for cleavage. In these studies, what they're generally looking for is either the formation of a double-strand break or an actual edit that occurs at an off-target site. So I would argue that there's the potential to miss a lot of cleavage events that are occurring, including things like nicking, that may not be captured by these techniques. So one of the things that we've been interested in in my lab is directly probing cleavage by Cas12a and determining how specific it is, and then also,
in ongoing work, if Cas12a is so highly specific, how well does it confer immunity? This is work done by Karthik Murugan, a graduate student in the lab. What Karthik does is take a biochemical approach to studying Cas12a's specificity. This is just purified components: purified Cas12a and a guide RNA, along with a plasmid that contains a sequence that perfectly matches the guide RNA. He does a simple cleavage assay where we start off with negatively supercoiled DNA, and if it gets cleaved on both strands, as we expect Cas12a to do, it will get linearized, and that happens very quickly. What we're interested in, of course, is what happens if we introduce mutations in this target sequence. We could do that one sequence at a time, but that would be quite tedious and not very informative. So what we do instead is mutate this target region in a randomized manner: we create a library of plasmids, each of which contains a different sequence in the target region, and then we subject that to cleavage by Cas12a. If we take that pool of plasmids and subject it to cleavage, what we now see is that some of the DNA remains negatively supercoiled, so some of the sequences in our pool cannot be cleaved. Some of it still gets linearized, so some sequences can be fully cleaved. And then, interestingly, some of the DNA gets nicked, and it actually remains nicked throughout the time course that we're performing here. We performed this type of assay for three different Cas12a orthologs. Just to summarize the data, this is quantification of the amount of DNA in the negatively supercoiled, linear, or nicked pool. What you can see for these three orthologs, which are the ones most commonly used for genome editing, is that we actually see pretty different cleavage patterns. For AsCas12a we actually
see the least overall cleavage; for LbCas12a we actually see the most linearization. So right away we can see that there are pretty big differences between the three commonly used tools. Okay, so of course what we're really interested in is which sequences are cleaved. What we can do is go through and cut out each of these bands, PCR amplify the target region, and submit those for high-throughput sequencing. We get a ton of data back. I'm not going to show all of it to you; I'm just going to summarize some of it. What I want to show you is an overall view of the effects of having mismatches in the sequence. This is a distribution plot that shows, as a fraction of the pool, how many sequences had a certain number of mismatches. If we look, to start off with, at the original library and the nicked library, you can see that they actually look pretty similar. But keep in mind that we're amplifying the DNA here. If we consider how much DNA was actually present in each pool, based on what we saw in the gel, we can normalize by the amount of DNA in that fraction, and that's what we're showing here. This is what it looks like to start off with: most of our DNA is negatively supercoiled and some of our DNA is nicked. Over time, what we start to see is that DNA is removed from the negatively supercoiled pool and some of it transitions into the nicked pool, and we can see that continue over time. What we're really seeing is that sequences with between one and four, or even five, mismatches are depleted from the negatively supercoiled pool, meaning that those sequences are actually getting cleaved, and many of those sequences are now showing up in the nicked fraction instead. And so the
data I just showed you is for AsCas12a, which was seemingly the most specific of the three orthologs that we looked at. If we look at FnCas12a and LbCas12a, we can see that the differences are mainly for the sequences with a lot of mismatches. It appears that AsCas12a really can't cleave anything with more than five mismatches, whereas FnCas12a and LbCas12a are fairly promiscuous and seem to be cleaving those sequences. I just want to compare this with SpCas9, which is very commonly used and considered to be a low-fidelity tool. You can see that it also doesn't really cleave anything with more than five mismatches, so it looks a lot more like AsCas12a, although for sequences with fewer mismatches it is more promiscuous. Okay, so we of course want to validate some of this data. What Karthik did was clone some of the individual sequences that showed up in the nicked fraction, and he tested those individually for cleavage by, in this case, LbCas12a. What we can see is that the sequences with two, three, and four mismatches do behave similarly to what we see within the library. But to our surprise, the sequences with five, six, seven, and eight mismatches were not cleaved at all when we tested them individually. So whereas they appear to be cleaved when they're part of a pool, when they're tested individually they're not cleaved at all. We could also see something interesting when we look at the rate of loss of sequences from the negatively supercoiled pool, or the rate of increase of sequences in the nicked pool. Here you can see that, depending on how many mismatches are present, we see different rates of loss of sequences from the negatively supercoiled pool and different rates of increase of sequences in the nicked pool. However, if we look at sequences with five through ten mismatches, we
actually don't see any differences in the rates at which they appear to be cleaved (and in fact we don't see any cleavage of these for AsCas12a, as I mentioned), and again we don't see any differences in the rate at which they're nicked. So what does that mean? To me, that suggests that the cleavage that's occurring here is not sequence-specific. Somehow, when we have a pool of sequences, we're seeing some kind of non-specific cleavage of sequences that have lots of mismatches, but when we test those individually we don't see cleavage of them. So something is happening during library cleavage that isn't occurring during individual cleavage. One thing that I didn't mention earlier is that Cas12a was recently shown to have an alternative activity: when it binds to a target sequence, it gets activated as a single-stranded DNase. This was shown by Jennifer Doudna's lab last year, or two years ago now. In that paper they showed that after Cas12a cleaves its target, it can degrade single-stranded DNA. However, they also showed that there was no activity against double-stranded DNA. But we wondered if what's happening in our pool of sequences is that we're actually activating Cas12a as a trans double-stranded DNA nickase. In our library we have some sequences that can act as activators: perfect targets, or targets with one or maybe even two mismatches. Once Cas12a binds to those, it can start to nick some of the other sequences non-specifically, the ones that have five through ten mismatches. We can test this quite simply by pre-binding Cas12a to a target and activating it for potential nickase activity. We then incubate it either with an empty double-stranded DNA plasmid, which is a DNA that does not contain a target, or with a single-stranded DNA as a control, which should just recapitulate the activity that's been observed earlier. And what we see agrees
quite nicely with what we saw in our library data. For AsCas12a, where we didn't see much nicking occurring at all, we do see the single-stranded DNA degradation, but we don't see any activated nicking of double-stranded DNA. However, for FnCas12a and LbCas12a, where we saw a lot of this non-specific nicking, we see quite a lot of non-specific activated nicking when Cas12a is prebound to a target, and I would argue that this activity is pretty much on par with the activity that's observed against single-stranded DNA. So to summarize what I've shown you: in addition to its known roles in targeted double-stranded DNA cleavage and trans single-stranded DNA cleavage, we now also observe that Cas12a can target and nick sequences with multiple mismatches, up to four mismatches. This might explain why it appears to be more specific in genome editing, because nicking is unlikely to result in an off-target edit. But nicking, I would argue, may actually be detrimental in a phage setting, where it may limit the ability of the phage to replicate its genome. We also uncovered this trans double-stranded DNA nicking activity, which may be an alternative mechanism, in addition to the targeted nicking and targeted double-strand cleavage, to prevent phage replication. And so these are some of the questions that we're asking going forward. How does the mutational tolerance of Cas12a impact its ability to slow phage escape? Is the ability to nick targets that have mutations in them sufficient to provide protection against a phage? Does Cas12a target activation improve immunity?
This is a question that we're really interested in. And of course, if this target activation leads to nicking in a genome editing context, might this result in unpredictable off-target edits? Typically, we can predict where off-targeting might occur based on the similarity between the guide RNA and other sites in the genome. However, if this type of activity is occurring in eukaryotic cells during genome editing, it's quite possible that any DNA that's close by to the target site might get cleaved by Cas12a and might result in unpredictable off-target edits. So with that I'd like to wrap up. I really want to thank members of my lab. I talked about Karthik and Yukti's work, as well as Hayoon, who's not listed on the slide, but there she is. I also really want to thank our current Microscopy Facility and our research IT, who have been really helpful for the EM part of our projects, and also Andrew Severin and Arun Seetharam at the Genome Informatics Facility, as well as our DNA facility, for help with the high-throughput sequencing. I want to thank these people for funding, and thank you for your attention.
Okay, so does that Cas12 nicking activity use the same active site?
Yeah.
So there's some conformational change that is required, I presume, to make it nick?
Yeah, so Jason Strand Yeah, so I'll answer that in two ways. Yes, it's actually thought to be a conformational change in the DNA that presents the two strands to the active site. The active site is kind of immobile and becomes solvent-exposed after the cleavage occurs and the downstream product is released. But the other issue that arises from the nicking activity being linked to the same active site is that it makes it a lot harder for us to study, because we can't simply make an active-site mutation to knock out that activity. So one of the things that we're working on now is trying to find mutations that block the activated nicking activity but don't block the targeted activity. We have some ideas of how DNA might bind following release of that product, and we're making mutations based on that.
I'm intrigued by these PAM sequences. Are they all the same, and how many would there be in a typical bacterium?
Yeah, I think it's a really interesting question, because some Cas effectors are really specific for their PAMs. Cas9 recognizes NGG, so it's really only a two-nucleotide PAM, and you could imagine that there's a one in 16 chance that any location would have that, assuming that they're evenly distributed across the genome.
Why don't they evolve out?
Yeah, so I think that there was some evidence in a paper once that a phage had evolved away from having PAMs, but I think it would be pretty challenging for some of the CRISPR systems. In fact, some of the more prevalent CRISPR-Cas systems, the type I systems, are actually more promiscuous for their PAM sequences. In my lab we've studied type I-E a lot, which is the one that's found in E.
coli, and that actually has a three-nucleotide PAM. There are 64 possible PAM sequences, but the Cas effector can actually recognize somewhere on the order of 20 of those.
Do they have some other function?
So the PAM probably has multiple functions in those systems, because it's not just the Cas effector that binds to the DNA; there's actually a trans-acting nuclease in that system that is responsible for degradation, and that also recognizes the PAM. So there are lots of different functions for it in that system. But I would argue that if you have a three-nucleotide sequence and 20 out of 64 can still be recognized, it would be very difficult for any phage to really evolve away from that. That said, one way in which phages can escape from immunity is by acquiring a mutation in the PAM, so yes, I think that is a possibility in terms of one way to completely escape from CRISPR immunity.
So do the systems that lack a Cas4 have problems with PAM recognition and orientation?
Yeah, so I think it's still unclear how they recognize the PAM. E. coli, again, is one of the ones that doesn't have Cas4. One of the things that's interesting about it is that it actually retains the last nucleotide, or the first nucleotide, of the PAM, depending on which way you're looking at it. So it only cuts off two nucleotides of the PAM and retains one of them. That's one way in which the orientation could be easily defined, right? However, that's not a perfect way, because there are only four possible nucleotides, and if you have a C on one end you're not necessarily going to lack a C on the other end. So even that doesn't make it very clear how that might work. There are nice structures of the E.
coli Cas1-Cas2 that show that it recognizes that nucleotide, but they don't really show that it recognizes the other two nucleotides of the PAM. So how those are defined, I think, is also quite unclear, and it might be based on what the substrates are like. Where does this pool of DNA even come from in the first place? Potentially those substrates are already enriched with PAMs on their ends, or something like that, but that's not really well understood yet.
Yeah, so in your mutagenesis screen you told us how many mismatches there were, but you didn't mention where they were. Does Cas12 have a high-priority seed?
Yeah, I didn't show this because it's a lot, but we can obviously look at it at a sequence level, and yes, there is some evidence for a seed region. The PAM-proximal region, which is the first part that would base pair with the CRISPR RNA, is more important than the rest of the sequence. You can see here that the lighter colors, the blue or white, are sequences that are enriched, and those are generally located within the seed region. I think there was a paper that suggested that Cas12a doesn't have a seed. I think it's just that it's a really short seed. It's possibly shorter than that of Cas9, which I think is somewhere on the order of 10 nucleotides, but it does have about a six-nucleotide seed, and that's pretty consistent with what other people have seen as well.
Did you learn anything about the targeted sequence in these aspects? I mean, is the targeted sequence seemingly random, or can you see any better interaction in, like, your adaptation assay with Cas4 and Cas1 and Cas2 that would make a difference?
Well, in this study we tried three different CRISPR RNAs. The one that I'm showing here is about 50% GC; we had one that was very GC-rich and one that was very AT-rich, and we did see pretty big differences in their
specificities, and I think also just in their general targeting activities. So I do think that the composition of the sequence is important. But this is actually a question that I find very interesting: does the adaptation machinery have some way of detecting a good sequence versus a bad sequence, and what is a good sequence and what is a bad sequence in terms of providing immunity? My guess is no: there's no evidence that the adaptation machinery can recognize the DNA that it's taking up outside of the PAM. So the only specific interaction it should have is with the PAM; otherwise it's just kind of a ruler mechanism for determining the length of the DNA.
Any more questions? Okay, then let's thank Deepa.
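A few of the numbers quoted in the talk and the Q&A can be checked with a back-of-the-envelope calculation: the one in 16 odds for a two-nucleotide PAM like NGG, the roughly 20 of 64 three-nucleotide PAMs recognized in the type I-E system, and the claim that a 20-nucleotide guide is unlikely to be free of near matches in a genome of several billion base pairs. The sketch below is only an illustration, not anything from the speaker's analysis. It assumes a uniformly random genome with independent bases (real genomes are not), a genome size of 3 billion base pairs, and sites counted on both strands; all function names are hypothetical.

```python
from math import comb


def pam_match_probability(recognized: int, pam_length: int) -> float:
    """Chance that a random position carries a recognized PAM.

    Assumes the four bases are equally likely and independent.
    """
    return recognized / 4 ** pam_length


def sequences_within(max_mismatches: int, length: int = 20) -> int:
    """Number of distinct sequences with at most max_mismatches
    substitutions relative to one target of the given length."""
    return sum(comb(length, i) * 3 ** i for i in range(max_mismatches + 1))


def expected_sites(genome_bp: int, max_mismatches: int, length: int = 20) -> float:
    """Expected count of near-matching sites in a random genome,
    counting both strands (hence the factor of 2)."""
    p = sequences_within(max_mismatches, length) / 4 ** length
    return 2 * genome_bp * p


if __name__ == "__main__":
    # NGG: two specified positions, one recognized dinucleotide.
    print("NGG PAM odds:", pam_match_probability(1, 2))
    # Type I-E: about 20 of the 64 possible three-nucleotide PAMs.
    print("Type I-E PAM odds:", pam_match_probability(20, 3))
    # Assumed genome size of 3 billion bp, for illustration only.
    for k in range(5):
        print(k, "mismatches allowed:", expected_sites(3_000_000_000, k))
```

Under these assumptions an exact second copy of a 20-nucleotide target is expected less than once per genome, but the expected number of sites climbs quickly once a few mismatches are tolerated, which is the point being made about off-target sites.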