So I'd like to thank the organizers for giving me the opportunity to come and talk at this meeting. My lab is at the Scripps Research Institute, and my group breaks out into roughly three projects. All of what we're interested in, from one perspective or another, is evolution, and trying to understand or manipulate evolution, very much from a chemist's perspective. I'm trained as a chemist, and my lab largely takes a chemical approach. The three projects, very briefly: one, trying to use concepts of evolution to design or identify and optimize novel scaffolds for antibiotic development; two, a very rigorous chemical, physical approach to understanding the role of adaptive mutations in protein function; and three, what I'm going to talk about today, our efforts to develop an unnatural base pair with which to expand the genetic alphabet and eventually the genetic code. So there it is. That's the genetic alphabet. This is the genetic alphabet of E. coli. And if you can't see it, let me blow it up for you. So I gave this talk in China two years ago, and someone came up to me afterwards and said, you know, you have a lot of nerve showing the Japanese flag in China. But of course, the genetic alphabet is just a long string of Gs, Cs, As, and Ts. To a chemist, DNA is actually a remarkably unremarkable polymer. It's really very simple. What makes DNA utterly unique and remarkable is, of course, selective base pairing. G selectively pairs with C and A pairs with T. No other material has that property. All the diversity of life, all the diversity around us, in fact all the diversity since the last common ancestor of all life on Earth, is encoded in a four-letter, two-base-pair genetic alphabet. And when my group started at Scripps about 15 years ago, we were very interested in asking whether that alphabet, and eventually the genetic code, could be expanded.
So we already heard from my good friend Ichiro about his take on this. We both started at about the same time, and we've sort of walked along together. Becoming good friends with Ichiro has been one of the greatest things that this project has led to. And I really like Ichiro's work. In particular, what he talked about was using his unnatural base pair, Ds-Px, to evolve aptamers. And to get to a question that was asked earlier, those aptamers bound better than did the normal aptamers comprised of only G, C, A, and T. So they did impart novel function that wasn't available in the natural genetic alphabet. These experiments, to my knowledge, are the first example of unnatural biopolymers that were themselves evolved without the intermediacy of a natural biopolymer. So I think that's an absolute landmark study. And for me, it also provided the first real practical demonstration of the use of unnatural base pairs. So that's Ichiro's in vitro work. What I'm going to talk to you about today, and what's been the driving goal in my lab, is to develop in vivo applications of an expanded genetic alphabet. And if one's interested in that question, there's a variety of things you have to be concerned with. Here I'm showing an organism. This happens to be an E. coli, so it's Gram-negative, and there's a periplasm with two membranes there. You have to have a DNA element with an unnatural nucleotide in it. You have to have available within the cell the requisite triphosphates of the unnatural nucleotides. You have to have a DNA polymerase that selectively synthesizes DNA containing that unnatural base pair, of course with high efficiency and high fidelity, during replication. Then you have to have the same story with an RNA polymerase. You have to have an RNA polymerase.
You have to have available within the cell the ribo triphosphates. And you have to then drive transcription with an RNA polymerase. And then, presumably, that message can go out in the cell to the ribosomes, with tRNAs that are transcribed with the corresponding cognate nucleotide, to reconstitute the unnatural base pair during decoding and drive protein synthesis. There are two things I want to say at this point. First, I want to acknowledge the work of Steve Benner, who was really the first to talk about expanding the genetic alphabet in a practical sense and to develop analogs that were designed to be orthogonal. And I also want to mention Peter Schultz, who worked on this part of the problem. What Pete did was develop orthogonal tRNA-synthetase pairs, recoding the amber codon and drawing on orthogonal pairs from M. jannaschii and M. mazei, archaea, that could orthogonally recognize the repurposed amber codon. And I think that's absolutely beautiful work. In my personal opinion, it's a shame, and I don't understand why he hasn't won the Nobel Prize for that work yet, because I think it's some of the earliest and best of what we would call synthetic biology today. But I also want to mention from the outset that we planned on stealing Pete's tRNA-synthetase pairs. Instead of using them with amber codons, which allows you to incorporate one unnatural amino acid, we want to develop an unnatural base pair with which you could write a virtually unlimited number of new codons, hopefully for the incorporation of multiple different unnatural amino acids simultaneously into a protein, maybe in an evolvable context. And I don't have time to go into this too much, but in terms of the sort of thing that one could do with that: protein therapeutics. If you haven't noticed the revolution that protein therapeutics has caused in the therapeutic field, then you haven't been following anything.
Because 70% of the INDs at the FDA, the investigational new drugs at the FDA, two years ago were proteins. There's an absolute change in how people are thinking about developing proteins. Yet proteins have only 20 amino acids. Now, we can argue for the rest of the meeting whether proteins need new amino acids for new functions. I would argue they do. If you look at small molecule therapeutics, things like electrophiles are the most common pharmacophore in a small molecule therapeutic. Yet electrophiles do not exist in proteins at all. The same goes for things like metal binding centers and redox centers. Enzymes draw on that sort of activity by using cofactors wherever they can. But it's an entirely different challenge to evolve proteins that have cofactors as well. So the idea of being able to evolve proteins with things like electrophilic centers for therapeutic applications is the long-term goal that drives the effort in my lab. There are two things I want to introduce about the approach that we took to the challenge. The first was that we wanted to try to draw upon a different force than hydrogen bonding. Ichiro also alluded to this earlier. Everyone's familiar with water and oil and the fact that they don't mix. The hydrophobic effect makes oil want to get out of water for a whole bunch of complicated reasons involving solvation, cavity size, entropy, and also, obviously, packing between the oil-like molecules. I'm going to lump all of that under the rubric of hydrophobicity. And so all of our analogs were designed based on oil wanting to get out of water. The idea was that two hydrophobic, very lipophilic nucleobases would pair with each other during replication and not want to pair with the more water-like natural nucleobases.
And now, if we're taking the H bonds out, in the early days we were very concerned about just the stability of the DNA. So that's why the analogs were all sort of large. We were going to replace the cross-strand hydrogen bonding with in-strand packing interactions. So that's one thing I want to say. They're all large and hydrophobic, and that's why. Now the other thing I want to point out about them is there's a lot of them. We did not want to take what I view as the traditional chemist's approach of very carefully designing an analog and then making it and testing it. I've always been very inspired by the medicinal chemistry approach. About a third of my group works on medicinal chemistry, and so we wanted to emulate that approach. What we wanted to do was not make one but make lots, then develop assays to analyze them, and then use those assays to develop structure-activity relationships that feed back into the design effort, and try to fuel that cycle to optimize analogs. And so there's a lot of different analogs here. One that I'll talk about in a few seconds is this propynyl isocarbostyril guy, PICS. Collectively, this is what we refer to in the lab now as the first generation set of analogs. So in terms of the assays that we used, one was of course just a stability assay, where we synthesized the analogs as phosphoramidites. We incorporate them into DNA and we measure the thermal stability, because the midpoint, where 50% of the duplexes are melted, dissociated into single strands, is in a rather complex way related to the thermodynamic stability of the base pair. So we synthesize different base pairs, we put them in the same sequence context, and we measure the effect on stability that way.
The much more important assays, the ones that really drove the work, were the kinetics. In the early days, it was all steady-state kinetics. So we make a primer and a template, again using phosphoramidite chemistry, and incorporate a specific nucleotide at a specific position in the template. The primer would typically run up right before it, and then we look at the ability of different DNA polymerases to extend that primer by incorporating the unnatural nucleoside triphosphate. We refer to that step as synthesis or incorporation, because we're actually making the base pair, or we're incorporating the triphosphate. The next step is unlike natural synthesis, because you now have a primer that terminates with an unnatural nucleotide. That next step we refer to as extension, and it's the step where you incorporate the next correct nucleotide off the now modified primer terminus. These two steps we would evaluate independently. So like I said, we synthesize a piece of DNA where the primer terminated just before the unnatural nucleotide and then look for its incorporation. And then we would separately synthesize a piece of DNA where the primer terminated at the unnatural nucleotide itself and look at the incorporation of the next nucleotide. And so we would derive second-order rate constants and actually be able to quantify the differences between the different analogs. By far the strongest SAR that came out of this first generation study was that large aromatic surface area very much facilitates the incorporation step, but it makes the extension step very challenging. We were able to optimize incorporation, but we were never able to optimize extension. So in collaboration with Pete Schultz and Dave Wemmer at UC Berkeley, we solved the structure of one of those first generation analogs that I just described.
Now this analog, this PICS-PICS pair, we refer to as a self pair, and I spent a lot of my time in the early days justifying the use of a self pair. If you think that's weird, fine. Today our best pairs are hetero pairs. But just to give you a historical context, a lot of our early efforts were focused on self pairs, and this self pair happens to represent the most canonical of those first generation analogs, which were synthesized very well but then terminated extension. And if you look at the structure shown here, you can sort of see why. They don't pair edge to edge. Now maybe you look at these analogs and you expect that, but again, remember we didn't design these to pair with each other. These just came out of an empirical study of synthesizing lots of analogs and looking at all the possible pairs, and these self pairs are what came out as being the most promising. But they cross-strand intercalate, and in our mind's eye we immediately believed that this explained the SAR that we observed. Imagine this mode of binding also occurring during replication in the polymerase active site. Maybe this is the template and this is the growing primer. When this triphosphate is incorporated, it picks up all this beautiful packing interaction and gets out of water, which makes it happy, and that's what drives that very efficient incorporation step. But the template is more locked down in the polymerase active site, with more interactions with the protein. So for any distortion required to pinch down the duplex and allow that intercalation to happen, the majority of that distortion is going to be borne by the primer terminus. That mispositions the hydroxyl group for the next step, the nucleophilic attack on the next incoming triphosphate, which is why the extension step was slow.
So with that SAR, we returned to our design strategy and asked the question: if the larger hydrophobic analogs were prone to prevent extension, could we design smaller analogs that would not be prone to intercalation, optimize their incorporation step, and then have a pair where we might be able to simultaneously optimize both steps? So we again returned to a very medchem, empirical approach where we synthesize lots of analogs. Of course, I'm not showing the sugar and the phosphate, for those mathematicians in the audience that maybe didn't know that something else was attached there. That was supposed to be a joke. In any event, we were again systematically examining lots of different analogs, systematically putting on different halogens, different fluorine substituents, different methyl substituents, and again driving the program very much based on that empirical SAR. Now the SAR that came out of these second generation analogs is a little more complicated, so let me spend a second to describe it. The single position on the nucleobase that was by far the most important was the position ortho to the glycosidic linkage. So this is the glycosidic linkage, whether it's a C-glycoside or an N-glycoside. This position was by far the most important. For the insertion step, the synthesis step, where you're inserting the triphosphate against its cognate base in the template, it doesn't matter whether you're looking at the nucleobase in the triphosphate or in the template: you want that substituent to be hydrophobic. That makes sense, because we're trying to drive this packing interaction, this hydrophobic interaction, in the first place. Now for the extension step, when you're looking at the nucleobase in the template, you still want that ortho substituent to be hydrophobic.
But the problem is, when it's in what is now the primer terminus, you need it to be hydrophilic. And the reason, and we should have known this from the beginning, is that if you look at any of the natural nucleobases, they all have an H-bond acceptor at that same ortho position. And if you look at structures of primer-templates bound to polymerases, the polymerases always donate a hydrogen bond to help orient that primer terminus. So you need to be able to accept that H bond or you're going to force a desolvation. And so this seemed like a potential physical-chemical contradiction. How could we simultaneously optimize both its hydrophobicity to pack and its ability to accept a hydrogen bond? At the time, I was fortunate because I had a very talented graduate student and postdoc who simultaneously ran two screens independently of each other, two screens of 3,600 candidates each. I'm not going to go into this too much for time, but one was just a gel-based screen where they looked only at the extension step, because that was the rate-limiting step for most of our analogs. The other one was a much more sophisticated plate-based screen where we took two plates that were identical except that one got the unnatural triphosphate in addition to the natural triphosphates, where the other only got the natural triphosphates. We stained with SYBR Green, whose signal depends on the amount of double-stranded DNA present. And so we looked for wells on one plate where there was strong signal but where in the sister well on the other plate there was no signal. That assay bakes extension, incorporation, and fidelity all into one assay. So from those two independent screens of 3,600 candidate unnatural base pairs, we were pretty excited because both of them identified the same single unnatural base pair. And that's shown here.
It's what we call MMO2, which is the second of a methyl methoxy series and was a second generation analog, paired with this 5-methyl sulfur isocarbostyril analog, 5SICS, which was a member of the first generation analog series. And now notice the nature of the ortho substituent, that contradiction that I mentioned earlier. Sulfur at this position is more polarizable and more hydrophobic than oxygen, but it's still able to accept a hydrogen bond. And an O-methyl group is simply a bond rotation away from either accepting an H bond or presenting a hydrophobic methyl group for packing. So at least in our mind's eye, we imagined that that was how the unnatural base pair solved that contradiction. This very quickly reinvigorated our design efforts, and within months we had identified this naphthyl methoxy derivative, NaM, as being a better partner for 5SICS. Now this was the first pair that we could really PCR amplify with high fidelity, and I'll talk about that in a second. Since its discovery, we have found TPT3 as a better partner for NaM. That pair will come back at the end of my talk to be important, but I'm going to spend most of the rest of the talk on this NaM-5SICS pair. So at this point, we could no longer use steady-state kinetics to drive SAR, because the unnatural base pairs are synthesized virtually as fast as an AT base pair. And it's not because they're actually chemically equivalent. It's because they're both rate-limited by product dissociation. It's just a limitation of steady-state kinetics: you only measure your rate-limiting step. All that tells us is that we had now increased the efficiency of the chemistry step to the point where it was no longer rate-limiting for turnover of starting material into product.
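The steady-state limitation described here can be illustrated with a small sketch. It assumes the simplest possible scheme, in which a chemistry step is followed by product dissociation and both are treated as irreversible steps in series, so the observed turnover is k_chem * k_release / (k_chem + k_release). The function name and all rate constants below are hypothetical values chosen only for illustration; none of them are measurements from the talk.

```python
def k_cat(k_chem: float, k_release: float) -> float:
    """Steady-state turnover for two irreversible steps in series:
    E.S -> E.P (chemistry), then E.P -> E + P (product dissociation)."""
    return k_chem * k_release / (k_chem + k_release)


# Hypothetical rate constants in 1/s, for illustration only.
K_RELEASE = 1.0       # slow product dissociation, shared by both pairs
K_CHEM_FAST = 500.0   # stand-in for a fast chemistry step
K_CHEM_SLOWER = 50.0  # stand-in for a chemistry step that is tenfold slower

fast = k_cat(K_CHEM_FAST, K_RELEASE)
slower = k_cat(K_CHEM_SLOWER, K_RELEASE)

# A tenfold difference in chemistry shows up as less than a 2% difference
# in observed turnover, because product dissociation dominates both.
print(f"{fast:.3f} {slower:.3f}")
```

Under these assumed numbers the two turnover rates are nearly indistinguishable, which is the sense in which a steady-state measurement reports only the rate-limiting step.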
So my lab has recently got a rapid stopped-flow injection system, so we're going to be doing pre-steady-state kinetics to get to those numbers more directly. But even so, that will never be fast enough to drive SAR, because it's a rather time-consuming assay. So we developed another assay based on just sequencing. We would take a PCR reaction, take the amplicon, and give it to our sequencing facility. They would put it into a standard Sanger sequencing reaction. Of course, that Sanger sequencing reaction doesn't have our unnatural triphosphates in it. So what you see is an abrupt termination at the position of the unnatural base pair. If you take the ratio of the amplitude of the peaks prior to the unnatural base pair to those after, and you construct calibration curves from known mixtures of DNA synthesized with the unnatural base pair and DNA synthesized with an AT replacing it, you can convert that ratio into a percent of unnatural base pair present. You normalize that by the amplification level during the PCR reaction, and that gives you a fidelity. And that's the fidelity that I'll talk about now. So we incorporated the unnatural base pair into a lot of different sequences, looking at GC and AT sequences to see if there was a sequence bias. Amplification levels were pretty high, and the fidelities per round of replication were pretty high, even in sequences where there were two unnatural base pairs right in a row. Now, in that case the value is approximate, because what we're looking at in the assay is a drop on a drop, so it becomes a little bit difficult to rigorously characterize. I should mention these little sequences are, of course, part of a 180-mer. I'm simply not showing you the rest of the sequence. And what we're doing here is we embedded the unnatural base pair within a sequence where each of the three nucleotides on both sides were randomized.
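The normalization step in the sequencing assay can be sketched as follows. This assumes that "normalizing by the amplification level" means converting total retention into retention per doubling, by taking the n-th root where n is log2 of the fold amplification. That reading is an assumption, not something the talk spells out, and the function name and input numbers are hypothetical.

```python
import math


def fidelity_per_doubling(retention: float, amplification: float) -> float:
    """Retention of the unnatural base pair per doubling.

    retention:     fraction of amplicons that still carry the pair (0 to 1),
                   e.g. read off a calibration curve
    amplification: total fold amplification of the PCR reaction
    """
    doublings = math.log2(amplification)
    return retention ** (1.0 / doublings)


# Hypothetical inputs, for illustration only.
retention = 0.90      # 90% of amplicons still carry the unnatural pair
amplification = 1e3   # 1000-fold amplification, about 10 doublings

f = fidelity_per_doubling(retention, amplification)
print(f"{f:.4f}")
```

With these made-up inputs, a 10% overall loss spread across roughly ten doublings corresponds to a per-doubling retention of about 0.99.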
And what we're trying to do here is ask, are there any sequences that are particularly prone to lose the unnatural base pair? Because then they would have an amplification advantage, and we would see the retention erode. That did not seem to be the case, so we were pretty enthusiastic. But in order to examine that a little more carefully, we incorporated the unnatural base pair into a chemically synthesized piece of DNA. We then amplified that piece of DNA 10^24-fold in total, diluting it out a million-fold three times. During that amplification, of course, some of the unnatural base pairs are lost, so you produce a population where some have been replaced with a natural pair and some are retained. In order to differentiate them, we put them through one more round of PCR where one of our analogs is attached to a biotin tag. That produces two populations: one that has a biotin tag, which corresponds to the population that retained the unnatural base pair, and one that is not tagged, which corresponds to the population that lost the unnatural base pair. We can then take those populations and subject them to Illumina high-throughput sequencing. We actually ran the whole thing in parallel with a natural sequence with an AT present. And then every 10^3, as I mentioned, we took out an aliquot. All of those populations were analyzed with a minimum of 1.6 million reads, so the statistical analysis is pretty reasonable. And so here's the single nucleotide frequency data. What this F_rel - 1 is: F_rel is the frequency in the population that had the unnatural base pair relative to the control population that did not have the unnatural base pair, here the frequency of an A at these positions relative to NaM. So this is F_rel - 1 for A at the minus one position out to minus 10. This is F_rel - 1 for A at position plus one to plus 10, and correspondingly the same for C, G, and T.
This is the population that retained the unnatural base pair as a function of amplification. This is the population that lost the unnatural base pair as a function of amplification. The reason we used F_rel - 1 is this: if F_rel itself is greater than one, that means you have a bias for that nucleotide at that position. If it's less than one, you have a bias against the nucleotide at that position. F_rel - 1 simply makes it visual, in that any value that's positive means you have a bias for and any value that's negative means you have a bias against. So clearly, here's our largest bias. You can see as you amplify out to 10^24-fold, immediately five prime to NaM in the template, you see a preference for C. That's the largest single nucleotide bias that we have. To gauge how important that bias was, a C at that position was present at 18.7% of the initial population. Now, of course, it should have been 12.5%, but these are the vagaries of phosphoramidite coupling efficiencies during chemical synthesis. Nonetheless, from 18.7%, over 10^24-fold amplification it only grew up to 24.7%, a pretty small increase. Now, single nucleotide biases aren't enough to give you the whole story, because sequence correlations can hide biases. So for example, if you take G, C, A, and T and permute them cyclically, so GCAT, CATG, and so on, then at every position of those four sequences, every nucleotide would be present at only 25%. So it would look totally unbiased. But of course, it's highly biased, because there are only four sequences. So correlations hide biases that are not apparent at the single nucleotide level. So we did a correlation analysis. This is the population that retained the unnatural base pair. This is the population that lost the unnatural base pair. This is just a measure of the correlation.
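The GCAT example can be checked directly. The sketch below builds the four cyclic permutations, confirms that every base appears at 25% at every position, and then counts how many of the 16 possible dinucleotides actually occur at two adjacent positions. The helper names are hypothetical and exist only for this illustration.

```python
from collections import Counter

BASES = "ACGT"

# The four cyclic permutations of GCAT from the example.
population = ["GCAT", "CATG", "ATGC", "TGCA"]


def base_frequencies(seqs, position):
    """Frequency of each base at one position across the population."""
    counts = Counter(seq[position] for seq in seqs)
    return {base: counts[base] / len(seqs) for base in BASES}


def observed_pairs(seqs, i, j):
    """Set of (base at i, base at j) combinations that actually occur."""
    return {seq[i] + seq[j] for seq in seqs}


# Single-nucleotide view: every base is at 0.25 at every position.
for pos in range(4):
    print(pos, base_frequencies(population, pos))

# Pairwise view: only 4 of the 16 possible dinucleotides are present,
# so the base at position 0 fully determines the base at position 1.
pairs = observed_pairs(population, 0, 1)
print(sorted(pairs), f"{len(pairs)} of {len(BASES) ** 2} possible")
```

The single-position frequencies look perfectly unbiased, while the pairwise view exposes the restriction, which is why a correlation analysis is needed on top of the single nucleotide data.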
So we're looking now at a plot that maps the sequence against itself. This peak here, for example, measures the correlation between this position and this position. What this tells you is that as you amplified, the population that retained the unnatural base pair had a correlation grow in between the minus two and the minus one positions, and between the plus one and the plus two positions. So it's the flanking dinucleotides. It's the same in the population that lost the pair: there were correlations between the nature of those flanking nucleotides. But those were the only correlations. Now, when we first got this data, we were a little confused, because in this population the correlation seems to grow in, then grow out, and then sort of grow back in again. And I'll give you the answer. These correlation values are so small that what we're looking at is random fluctuation, just noise. But nonetheless, what this data tells us is that if we're looking for additional sequence biases, we only need to consider the flanking dinucleotides. So we did an analysis of the dinucleotide frequency, this time plotted on a unit circle, where F_rel - 1 is shown here. A positive bias for something would correspond to a bubbling out of the data, and a bias against would correspond to a collapse off the unit circle. The largest bias we had is right here. This is in the population that retained the unnatural base pair, and it's a bias for a five prime dinucleotide of CG. Now, maybe the CG bias is a little larger than that for CC, CT, and CA, but CG is probably the largest. This largely just corresponds to the bias for a C at the five prime position. And if you look at the actual numbers again, a CG was present at 2.3% of the population before the amplification, and it was only present at 3.5% after a full 10^24-fold amplification.
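To put the quoted percentages on a per-doubling footing, the sketch below takes the before and after frequencies from the talk (18.7% to 24.7% for the single C, 2.3% to 3.5% for the CG dinucleotide) and spreads the change evenly over the doublings implied by 10^24-fold amplification. Treating the enrichment as uniform per doubling, and ignoring the normalization against the control population, are simplifying assumptions. The function name is hypothetical.

```python
import math


def enrichment_per_doubling(freq_before: float, freq_after: float,
                            amplification: float) -> float:
    """Average multiplicative change in frequency per doubling,
    assuming the enrichment is spread evenly over all doublings."""
    doublings = math.log2(amplification)
    return (freq_after / freq_before) ** (1.0 / doublings)


AMPLIFICATION = 1e24  # total fold amplification quoted in the talk

# Frequencies quoted in the talk.
c_single = enrichment_per_doubling(0.187, 0.247, AMPLIFICATION)
cg_dinuc = enrichment_per_doubling(0.023, 0.035, AMPLIFICATION)

print(f"doublings: {math.log2(AMPLIFICATION):.1f}")
print(f"C: {c_single:.4f}  CG: {cg_dinuc:.4f}")
```

Under these assumptions both factors come out well under a 1% change per doubling, consistent with the description of the biases as small.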
So we actually stated in a paper, and the reviewers allowed us to state, that this was functionally, in vitro, a fully functional third, unnatural base pair, because these sequence biases are actually less than some of those observed amongst natural sequences. So at this point, we believed that we had demonstrated that we had a fully functional unnatural base pair. So maybe now was the chance for us to charge into our in vivo long-term goal of trying to use this as the basis of an organism with an expanded code. But returning to the medicinal chemistry analogy, suppose you went to a medicinal chemist and said, here's a compound that I have, I'd like to start a development program, so pull down $20 million from your pharma company and let's start the program. What the medicinal chemist will ask you is, well, what's the target? And you'll say, well, who cares? I've got this great activity. And the medicinal chemist will never be interested. You'll never get pharma interested with that data. And the reason is that the pharma industry has been fooled too many times by ghosts, things that vanish as you try to track them down. So what they want to know at the beginning of a program is, what's the target, and can you understand the mechanism of action? Is it a reasonable thing? Is it understandable? The reason this was an issue for us is that we got a structure of our new base pair, this 5SICS-NaM pair, in collaboration with Tammy Dwyer at the University of San Diego. Here's an overlay of 10 NMR structures. Here's the average structure. They're intercalated again. Not as much as the first generation analog was, but they're still intercalated a little bit. And this was a real moment for us. We sort of realized our pairs are never going to pair edge to edge. There's nothing to be had from doing that. They're not getting any H bonds. So the duplex is going to do whatever it has to do to unwind and open a little bit.
And allow those nucleobases to pack on each other, which is the only route to stability they have. Now, in principle, that doesn't bother me. But the reason it does bother me is the question of why it works. I just tried to tell you that they're replicated well by different polymerases. But everyone knows that polymerases evolved to recognize a Watson-Crick pair. This looks like a mismatch. This looks like, if you're familiar with it, an A-A zipper motif mispair, where the A's interdigitate amongst each other. And everyone knows that polymerases evolved to select against those. So are we chasing a ghost? To tell you a little more carefully what I mean by that: if you look at a GC or a CG or a TA or an AT, it doesn't matter, they all form the same structure, the very famous Watson-Crick structure. And it doesn't matter whether it's formed in duplex DNA or during replication by pairing a triphosphate against a templating nucleobase. They all look like this. They're all planar, and they all take on this very canonical Watson-Crick structure. Again, ours doesn't look like that. Ours looks like this intercalated structure that polymerases are supposed to select against. So if you think of how a polymerase works, a DNA polymerase looks like a right hand. The template lays down like this. And when a triphosphate binds, and only when the correct triphosphate binds, it induces a large conformational change of the fingers domain down over the palm and thumb domains. That conformational change is supposed to result in a very tight closed complex that rigorously selects for the structure of a Watson-Crick pair. So we had two questions when I approached my friend Andy Marx about collaborating. Andy solves crystal structures, amongst other things. And we decided that we would try to solve structures of polymerases synthesizing our unnatural base pair. And we had a couple of questions.
Number one, was formation of our hydrophobic pair sufficient to drive that same conformational change of the polymerase? And number two, if it was, what is it recognizing? So here are just the key structures. Here's the binary complex of the primer-template bound to the polymerase. The only part of the polymerase I'm showing is the O and the O1 helices. Those are at the base of that fingers domain that I mentioned. So in the binary complex, NaM, our analog in the template, is flipped out of the developing duplex. Now when we add our triphosphate and solve the structure of the ternary complex, what you see is that the unnatural analog in the template flips back into the developing duplex, and you get this large conformational change. To see that conformational change, I'm overlaying the binary and the ternary structures here. There is that conformational change of the fingers domain. Now to convince you that it's exactly the same as the conformational change induced by a natural base pair, this is an overlay of the ternary complex synthesizing a GC pair and the one synthesizing our unnatural base pair. You see that at the secondary structure level, they're absolutely superimposable. And if you actually look at the side chains and even the bound waters and magnesium ions, they're absolutely superimposable. So for the first question, the unnatural base pair triggers that exact same conformational change of the polymerase. Now the second question, what's it recognizing? If your eyes are good, you can already see. And if they're not, if they're like mine, then let me help you. That was a good day in my lab. A natural base pair is replicated with an induced fit mechanism. Its formation drives a large conformational change in the polymerase. Our unnatural base pair is replicated by a different, but only subtly different, mechanism: a mutual induced fit mechanism.
Its formation drives a large conformational change in the polymerase, but that conformational change in the polymerase drives a conformational change in the base pair. If we had tried to develop an unnatural base pair based on most other forces that are much more directional, like hydrogen bonding or ionic forces, I don't think it would have had both the strength and the plasticity to adapt to the polymerase active site. So hopefully I've convinced you that the unnatural base pair is well replicated and that we understand the mechanism; it's not some weird artifact. With that, we were enthusiastic enough to advance to what our long-term goal was: trying to deploy the unnatural base pair in an organism as the basis for an expanded genetic alphabet and eventually an expanded genetic code. So returning to this image of my gram-negative cell, we were immediately confronted with a challenge. And that's this. How do we get our triphosphates into a cell? So if you look in the literature, it'll give you a couple of different suggestions. I don't have time to go into any of that. If anyone wants to challenge me on why, in order to have a semi-synthetic organism, you'd have to have it be able to import and synthesize the unnatural triphosphates or whatever, we can talk about that later. But none of the strategies worked. And so the strategy that finally did work for us was based on noting some published literature. So what that literature was was the following observation. There are a variety of genetic elements that autonomously replicate. So these genetic elements are the genomes of several intracellular bacteria, for example some chlamydial species, as well as the genomes of several organelles like mitochondria and chloroplasts. And the property they have is this. They autonomously replicate, but they don't encode the machinery of triphosphate synthesis. 
Instead, I mean, these genomes are bathing in another organism that has all these nutrients already available. And so it's a well-known thing that what those genomes do is they minimize and scavenge. The genomes that do not encode the machinery of triphosphate synthesis instead encode dedicated nucleoside triphosphate transporters and just steal the triphosphates from their host environment. So we got really excited about that. We thought, well, maybe some of them would be useful for us. And in fact, several were found that actually imported G, C, A, or T, deoxys and ribos. So we got very enthusiastic about that and wrote around the world and requested these genes. And of course, the idea is that we would express them in bacteria and they would facilitate the uptake of our unnatural triphosphates. I gotta be honest, this is not the transporter nor are these the triphosphates. This is just an image I stole off Wiki or the internet someplace. But that's the idea. So to put it a little more specifically. Here's the cytoplasmic membrane. For the outer membrane, we imagined the unnatural triphosphates would get through because they're hydrophilic. They can diffuse through porins. And then once in the periplasm, we imagined that one of these transporters expressed from a plasmid might facilitate their uptake. So we got a variety of these different transporters, as I mentioned. And the second one from a diatom, an algal species, called PtNTT2, worked very well when we assayed it with radio-labeled ATP. So it worked in our E. coli cells. And then the question is, would it function to import the unnatural triphosphates? So here are cytoplasmic preps. So we're only looking at cytoplasmic composition, and this data results from the addition of 32P-labeled dATP. So you can see that this is the amount of triphosphate that gets into the cell. This is the amount of diphosphate present, and the monophosphate. 
And then the nucleoside is dark because it's the alpha phosphate that's 32P labeled. So you just don't see it. So this immediately tells you that the triphosphate's getting in, A is getting in, but it's being decomposed. This is probably just the natural life cycle of triphosphates within a cell. They're being metabolized up. They're being brought back down. But of course what we were really excited about is that it also acted to import the triphosphates of both 5SICS and NaM. Once within the cell you still see decomposition, just like we did with A. There's the triphosphate, there's the diphosphate, there's the monophosphate. Now this is an HPLC assay, so we can actually see the free nucleoside, and you see it does go all the way down to the free nucleoside. And this is for NaM. So a couple of things. Number one, it is being decomposed once it's within the cell, but it's not being decomposed particularly faster than the natural triphosphates. And then number two, these triphosphate levels are, I think it's 30 to 70 micromolar. And those were steady state levels that persisted for hours after addition of triphosphate to the media. That's just the level where import is being balanced by the degradation. And the important point is that those triphosphate concentrations are an order of magnitude above the Km that we measured in vitro for different polymerases. So we imagined that, despite the accumulation of all this other stuff, these triphosphate levels might be sufficient to support our first shot on goal. So we constructed two plasmids. One we call an accessory plasmid, and we're naming it the accessory plasmid because that's the plasmid that we eventually imagined expressing things like orthogonal synthetases and all the accessory machinery. 
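The balance between import and degradation described here can be illustrated with a minimal Python sketch. It assumes a constant import rate and first-order degradation, which is a simplification not stated in the talk, and the two rate constants are hypothetical values chosen only so that the plateau falls inside the quoted 30 to 70 micromolar range.

```python
def steady_state_level(import_rate, degradation_rate):
    """Level at which constant import balances first-order degradation.

    Solves dC/dt = import_rate - degradation_rate * C = 0 for C.
    """
    return import_rate / degradation_rate


def simulate_level(import_rate, degradation_rate, minutes, dt=0.1):
    """Forward-Euler integration of dC/dt, starting from an empty cell (C = 0)."""
    level = 0.0
    for _ in range(int(round(minutes / dt))):
        level += (import_rate - degradation_rate * level) * dt
    return level


# Hypothetical rates (not measured values from the talk).
K_IN = 5.0   # import, micromolar per minute
K_DEG = 0.1  # degradation, per minute

plateau = steady_state_level(K_IN, K_DEG)
after_two_hours = simulate_level(K_IN, K_DEG, minutes=120)
print(plateau, after_two_hours)
```

Under this toy model the intracellular level climbs to the plateau and then stays there for as long as triphosphate remains in the media, which matches the qualitative picture of a level that persists for hours.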
But right now all it has is that transporter that I just described. And then there's the information plasmid, which we call the information plasmid because it has an unnatural base pair. Now what the information plasmid is is nothing more than pUC19. Somebody's a pUC19 fan here. The only difference between pUC19 and pINF is that a single AT was replaced with an unnatural base pair. Otherwise it's identical to pUC19. Now the unnatural base pair, remember the pair that I told you about, this NaM-TPT3? It's actually replicated a little bit better than 5SICS-NaM, but we hadn't validated it anywhere near the level that we'd validated 5SICS-NaM for things like replication biases and structure. So we were definitely gonna take our first shot in vivo with 5SICS-NaM, but the plasmid that we constructed, we constructed synthetically with the TPT3 pair. Now that'll come back to be important in a few minutes, but since pINF has this pair, we envisioned that the first round of replication would just immediately replace TPT3 with 5SICS, if 5SICS and NaM were the only ones that we supplied to the media. So here's the experiment. Transform E. coli with that pACS plasmid and induce production of the transporter. Then add your unnatural triphosphates and then transform with the plasmid containing the unnatural base pair. Give it a little time and then recover the plasmid and determine what the fate of the unnatural base pair was. The important controls are: transform with pUC19 instead of the plasmid containing the unnatural base pair, don't induce the transporter, or don't add the unnatural triphosphates. This is the first data that we actually got when the graduate student ran this experiment. And so what we're showing right here is isolation of pINF after 15 hours, which corresponded to 22 doublings of the E. coli and a 10^7-fold amplification of pINF. 
Just like that trick that I showed you earlier, the graduate student took the plasmid recovered out of the cells at this point and amplified it by PCR with an analog that was tagged with biotin. And that separated into two populations, and we could differentiate them with the streptavidin supershift. And so here's the gel. So here when you have pUC19, so when you have the template that doesn't have the unnatural base pair in it, it doesn't matter whether you add the triphosphates or induce the transporter, you don't see a shift. That's important because it tells you the unnatural base pair is not randomly inserting throughout the genome. Now, the transporter is under IPTG control. When you have the plasmid containing the unnatural base pair but you don't induce the transporter, you see no shift. When you do induce the transporter, but you don't provide the triphosphates, you don't see a shift. Only when you have the unnatural base pair in the template, you provide the triphosphates, and you provide for their uptake do you see a shift. Now the shift, based on calibration curves that were similar to what I showed you earlier, implied a fidelity that was greater than 99.7% per division. So we were pretty excited about that, but of course we wanted to be rigorous and we wanted a separate independent assay to demonstrate retention of the unnatural base pair. So we just used the same trick I showed you earlier, the sequencing trick, where we took the plasmid, PCR amplified it, this time without a biotin tag, and then subjected it to Sanger sequencing, and there you see the truncation, the abrupt termination at the position of the unnatural base pair. So we could actually go into this chromatogram and read what the sequence was. So of course it was exactly where we put it, it wasn't at some new place. Now we thought we'd clearly, unambiguously demonstrated retention within a dividing E. coli cell. 
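To give a feel for what a per-division fidelity means over many doublings, here is a minimal Python sketch. It assumes that overall retention is simply the per-doubling fidelity raised to the number of doublings; that relation is an illustrative assumption, whereas the talk derives its figure from gel-shift calibration curves. The function names are hypothetical.

```python
def retention_after(fidelity_per_doubling, doublings):
    """Fraction of plasmids still carrying the unnatural pair after n doublings,
    assuming an independent, constant loss probability at each doubling."""
    return fidelity_per_doubling ** doublings


def fidelity_per_doubling(retention, doublings):
    """Inverse relation: per-doubling fidelity implied by an observed retention."""
    return retention ** (1.0 / doublings)


# Figures quoted in the talk: 99.7% per division, 22 doublings.
retained = retention_after(0.997, 22)
print(f"retention after 22 doublings: {retained:.3f}")

# Round trip: recover the per-doubling fidelity from the overall retention.
print(f"implied fidelity: {fidelity_per_doubling(retained, 22):.4f}")
```

Under this assumption, a fidelity above 99.7% per division corresponds to well over 90% of plasmids still carrying the unnatural pair after 22 doublings.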
So we wrote a paper up, we submitted it, and two of three reviewers agreed with me. One didn't, and there were two steps the reviewer didn't like. He didn't like this step and he didn't like this step. He didn't like the PCR steps, because he thought we were somehow introducing an artifact. He wanted us to take the plasmid directly out of cells and analyze it. So I got these comments back from the journal, I think it was on the 21st of December, and I freaked out a little bit because to me that screamed mass spec, and we don't do mass spec in my lab. So I shot out this email to this group at NEB, and the reason I shot the email is because this group had been developing an LC-MS/MS method that they had just published a couple of papers on, where they were able to demonstrate the presence of an epigenetic modification in a plasmid to the sensitivity of one methyl group in a plasmid. So I thought, well, if they can do that, they can probably analyze the retention of our unnatural base pair by the same technique. So I contacted them, I think it was on the 23rd, and I think it was the 28th when I got an email back from them, from Ivan Corrêa, and he said, okay, we bought your unnatural triphosphates, they're commercially available, and here's a work plan and we're gonna be done in two weeks. And anyone that's run an academic group and tried to manage collaborations knows how hard this can be. I was like, okay, that's good, maybe. And they weren't able to do it in two weeks. It took them about five weeks. I can't say enough for this collaboration. It was one of the most enjoyable collaborations that I've ever participated in, and I owe them a lot. Here's the data, so here's the LC-MS/MS trace, so here's dC content, G content, A content, and T's down here because it doesn't fly well in their mass spec, which was known. 
And so what you see is all the same control experiments, they're all on top of each other, and out here in the trace is this little peak for 5SICS. Now that peak amplitude corresponds to just what you'd expect for one per plasmid, and importantly, it's the mass of 5SICS, not TPT3. The only way that 5SICS was provided was as a triphosphate to the media. This is unambiguous evidence for replication of the unnatural base pair within the cell. The third reviewer even bought it. So this actually is a picture of the organism. So since the last common ancestor of all life on earth, it's been four letters, two pairs. This organism is stably, healthily, happily growing while maintaining six letters and three pairs. So we are continuing to optimize all aspects of the system. We hadn't synthesized a lot of analogs. So we synthesized a lot more of these; it turns out these are NaM analogs. And then we ran another screen. This time we ran a screen of 7,000 candidates because we had more. We actually identified a family of eight, best represented, for example, by this pair I already showed you, this NaM-TPT3. But all eight of these analogs are replicated in vitro better than 5SICS-NaM, which I hope I just convinced you is good enough to replicate in the cell. So this again is like a med chem principle. If you have one molecule and you can't modify it and you go into a med chem program, you're in trouble: most molecules drop out of development not because of affinity, but because of PK, pharmacokinetics. They have toxicity, solubility problems, off-target activity, whatever. It's hugely advantageous to have a panel, a set of molecules with different physicochemical properties, in order to aid development. So we're excited to have the flexibility of having all of these analogs with slightly different physicochemical properties. Now the unnatural triphosphates don't make the cell sick. But expression of the transporter does. And this is not uncommon for membrane proteins. 
Because you're forcing the membrane to accommodate all of these transporters. And so that's why we put the transporter on a high copy plasmid, because we thought there was selection pressure against retaining it, and if it were a single copy, any mutation that came up that would delete it, suddenly the cells would grow better and they would dominate the population. And right in the middle of an experiment we'd lose the ability to import the unnatural triphosphates. So a graduate student joined my group and asked the question, could we derive a mutant of the transporter that was not toxic? And he actually was able to successfully do that. Because it's no longer toxic, we took it off the plasmid and integrated it into the chromosome. We optimized a whole bunch of different constitutive promoters, and the one that was best was this PlacUV5. It imports beautifully. Here are the growth rates. So this is just an amplitude difference. It just comes from the initial inoculation. The slope is what the growth rate is. So here you see the initial transporter system that we developed, and this plateauing here is the toxicity that I was referring to. This slope now shows that this bacterial cell bearing that single, mutant, chromosomally located transporter is growing at an identical rate, and it imports identically. We've now done this experiment many different times. And what's exciting about this is that it takes one sort of moving part off the table. We don't have to add something to induce the transporter's production. Our cell line now is always competent to take up the unnatural triphosphates. We can just take it out of the freezer and run an experiment. We don't have to transform with that first plasmid. We don't have to worry about timing and inducing the transporter and giving it enough time. So we're excited about that kind of optimization, and this is the kind of optimization that we anticipate applying to all facets of the semi-synthetic organism. 
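The point that the vertical offset only reflects the inoculum while the slope is the growth rate can be shown with a short Python sketch. It assumes simple exponential growth and fits a least-squares line to the natural log of optical density; the data are synthetic and the rate of 1.2 per hour is a hypothetical value, not a measurement from the talk.

```python
import math


def growth_rate(times_h, optical_densities):
    """Least-squares slope of ln(OD) versus time, in units of 1/hour."""
    logs = [math.log(od) for od in optical_densities]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    numerator = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    denominator = sum((t - mean_t) ** 2 for t in times_h)
    return numerator / denominator


# Synthetic exponential-phase curves: same rate, different starting inoculum.
RATE = 1.2  # per hour, hypothetical
times = [0.0, 0.5, 1.0, 1.5, 2.0]
low_inoculum = [0.01 * math.exp(RATE * t) for t in times]
high_inoculum = [0.05 * math.exp(RATE * t) for t in times]

rate_low = growth_rate(times, low_inoculum)
rate_high = growth_rate(times, high_inoculum)
doubling_time_h = math.log(2) / rate_low
print(rate_low, rate_high, doubling_time_h)
```

The two curves are offset on a log plot but give the same fitted slope, which is why parallel lines indicate identical growth rates regardless of how many cells were inoculated.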
Okay, so in the last second, how much time do I have? Two minutes. So all I showed you was retention. The question I always get is, well, what about the next step? What about retrieval? So this time, instead of avoiding a gene, we put the unnatural base pair in the middle of superfolder GFP, behind a canonical T7 promoter and in front of a canonical factor-independent termination sequence. And this time the experiment is to propagate the unnatural base pair within this plasmid, to import all four triphosphates now, the deoxys and the ribos of the unnatural pair, and then induce transcription with IPTG and then collect the RNA and then analyze it. So understand what the unnatural base pair has to do to survive this assay. It has to stably replicate. The transporter has to bring in all four triphosphates. It has to survive transcription into message. We then lyse the cells, and it has to survive notoriously error-prone reverse transcription back into DNA and then survive PCR amplification, and be right there, only where you expect it to be. So we've now transcribed lots of different messages. Surprisingly, or maybe not surprisingly, tRNAs transcribe better. They're probably structured to prevent rho-dependent termination. But now we can stably and efficiently transcribe, and so the next step is to begin to combine those two to try to look at decoding at the ribosome. So with that, it remains only to thank my group. So the students who've worked on the project, Michael Ledbetter and Yorke and Aaron, where's Aaron? Our graduate students, and postdocs Brian and Ailey. And then I mentioned these collaborations, and again, the NEB group for collaborating and helping us out when we needed it the most. And these agencies for funding, and I thank you for your attention.