I actually picked up my copy of the history of the biochemistry department to learn a little bit more about him. There was a chapter written by Hector DeLuca, in which he describes the discovery of the production of vitamin D in food and milk by irradiating these materials with UV. This led to patents, and at the time the university's stance on patents was that they were under the sole ownership of the inventor. That'd be really nice if it were still true. So apparently Professor Steenbock deposited the royalties in an account and kept it accruing until he found a mechanism that could suitably be used to reinvest this money. And this led, as many of us know, to the founding of WARF, in which Professor Steenbock was instrumental. What's interesting is that Professor Steenbock assigned all these patents to WARF. Those patents generated quite a bit of income for the university, 15% of which was returned to Professor Steenbock. After some period of time he refused to accept any additional amounts, and this money was used to create funds that led to the establishment of two really large additions onto the biochemistry building. It endowed several Steenbock professorships and, among other things, started a symposium fund, which we're here today to celebrate.
And I can think of no one whose work better holds true to what Harry Steenbock did than Professor Arnold, who really blends fundamental science and applied science. I'll just read this. Professor Arnold is currently the Dick and Barbara Dickinson Professor of Chemical Engineering and Biochemistry at Caltech. She has a really interesting background that included some time at the Solar Energy Research Institute in Golden, Colorado, which I think many of us are really interested in.
Thinking about biofuels. After postdoctoral work at Berkeley and Caltech, she joined the faculty at Caltech and has many honors, of which I'll just name two. She's a member of the National Academy of Engineering and the Institute of Medicine of the National Academies. She has served on some very interesting boards, including the Santa Fe Institute, which is a really neat place, and several corporations. At Caltech, her research is focused on understanding the principles of biological design, and in that context she has used evolution to create and select new proteins, metabolic pathways, and cells themselves. These materials have applications in biotechnology, medicine, and energy, for biofuels. Today she's going to tell us about her work involving P450, which is fantastically interesting science, and I'd just like to remind you, in case you run out a little early, that after today's talk we'll have a banquet in her honor in the biochemistry atrium, to which everyone's invited. Professor Arnold.
Thank you very much. I'm delighted to be here and to share some stories of laboratory evolution. It's my favorite design algorithm. It's employed me for the last 20 years and probably will keep me busy for the next 20, and it works at all scales. Everything in this room of any real complexity is a product of this wonderfully simple design algorithm that even a chemical engineer can apply. And I get to apply it to some pretty fascinating, complicated biological systems, and I'll be telling you today, and actually tomorrow, about my favorite model system, cytochrome P450. I have to warn you, I've been practicing biochemistry without a complete license. I did a couple of postdocs in biochemistry labs, but I've not been trained entirely in biochemistry. I'm an engineer by training, so my language may come across a little different to you, and I hope you'll bear with me on that. I'm going to tell you about a synthetic organic chemist's dream.
Actually, I give this talk a lot, I talk about this system a lot to organic chemists, because here's an enzyme that can do some chemistry that would make your tongue hang out if you were trying to activate C-H bonds. This guy can take non-activated C-H bonds, mostly in hydrophobic small molecules, and insert an oxygen from dioxygen into that bond, and do so at room temperature, atmospheric pressure, and in aqueous solution. It's really such a great model, of course, and it shows up in all the textbooks as an example of what biology can do, but no one would touch it with a ten-foot pole if they were actually going to do this reaction. So yes, we can admire the ability of this complex system to do it, but we don't actually do it. Of course, the stoichiometry of the reaction requires delivery of reducing equivalents from NADPH, which costs about 1,000 times more per mole than almost anything you want to make with it, so you have to do a little bit of cofactor regeneration and other things to make such a reaction go forward. Not to mention that most of these P450s are membrane-associated and pretty much give up the ghost any time you try to purify them and characterize them, so they're a real biochemist's dream. You can spend a lot of time doing that.
So I'm telling you about the laboratory evolution of these catalysts, with the goal of making catalysts that one could use in ordinary synthesis, but also with the goal of trying to understand where all these remarkable capabilities came from. If you go out into nature and look at the substrate ranges of these enzymes and where they are, one thing that's remarkable is that just about everybody's got one. Just about everybody except for E. coli, which of course makes it fun to work with E. coli: you don't have a lot of background reactions. You've got about 57 of them. Plants can have hundreds of cytochrome P450s.
They're involved in the first line of defense in xenobiotic breakdown. They're involved in various biosynthetic pathways and chemical products. And if you go and look at the range of substrates that they'll accept for this oxidation chemistry, it's enormously diverse. They tend to be very hydrophobic substrates, but you'll see everything from very complex structures down to medium-chain alkanes. So somebody a long time ago figured out how to do these reactions. And we look at this family of enzymes that exist today. This database has more than 7,000 known sequences, and that number grows exponentially. Last year when I gave this talk, it was 4,000 rather than 7,000, and next year it'll be many more.
One thing we can see right away is that somebody figured out how to do this chemistry a long time ago and, through the process of mutation and natural selection, elaborated on it, such that over millions of years the 7,000 sequences that you could scrape off the bottom of your shoe or extract from plants or wherever today have diverged all over the place to catalyze these reactions. It's quite remarkable if you think about it. They have also diverged very much in their sequences, so that you can have things that are 85% different and yet have essentially the same three-dimensional structure and catalyze reactions on a very broad range of things. Of course, this is extremely frustrating. It's remarkable to think that all these different sequences can fold into the same structure, but it's very frustrating to those who would go to these natural solutions to the problem of C-H bond activation and try to discern the sequence-function relationship. One thing we know about these sequences is that there's a very large neutral component to evolution. There's a lot of random change being made that has essentially no effect on function, or very small effects on function.
And embedded within this 85% change in the sequence, these 300-plus amino acid substitutions, might be three or four that are contributing to the functional changes. And it's your problem as biochemists to pull those three or four or 24 out of the 344 and try to understand why one sequence has a particular function that another does not. So these products of evolution are beautiful and inspiring, but extremely frustrating to reverse engineer.
For that reason, I chose one particular family member to play around with in the laboratory, and a lot of people ask me, how do you choose a starting point? Well, I took advice from scientists at Novozymes some years ago. They said, Frances, the last thing we want is an enzyme that you can evolve all over the place but which you can't make in trainload quantities. You have to be able to make enough of this stuff to do something with it, to understand it, so that you can do the crystallography, so you can sell it, you name it. And so the reason that I chose to work with cytochrome P450 from Bacillus megaterium is that you can scrape this organism, it's a soil organism, off the bottom of your shoe, put the gene into E. coli, and make gobs of it to study. So it's well expressed in E. coli. It's a soluble version. It's sitting in the cytoplasm. It's not membrane-associated, unlike all the eukaryotic cytochrome P450s. Yet it's a P450 that shows considerable structural homology to eukaryotic P450s. Even better, it's highly active. On its preferred substrates, it has a rate of up to 10,000 times the rates that the eukaryotic enzymes exhibit on their substrates. So we're starting with something that's good and that you can make in large amounts: up to 12% of the cell protein will be the P450.
And furthermore, unlike the eukaryotic enzymes, which are all multi-component systems, this has all the catalytic machinery it requires to carry out these reactions encoded on a single polypeptide chain. So the reductase domain, which carries the cofactors, is fused to the heme domain, and everything's there together in the single polypeptide chain.
So what does this do? Of course, there's got to be some downside. What it does is hydroxylate fatty acids in subterminal positions. That's of mild interest unless you're making new detergents, in which case it can be of some interest. And it doesn't show particular selectivity in its hydroxylation. It will, for example, insert oxygen at any of the subterminal positions all the way up to about n minus 5, and it usually inserts only one oxygen.
So here's the question. We have existing today this family of thousands of known sequences, and of course there are many, many thousands more that haven't yet been discovered. And I've identified P450 BM3, which hydroxylates fatty acids, yet I know that all these other activities are possible, and in fact there are many more. Could we use laboratory evolution methods, if these are powerful design methods, to create BM3 versions that exhibit any of these known activities? Can we start with a specialized enzyme today and create the functional equivalent of all the other P450s? This is an interesting question because I can do my experiments in such a way that I'm making only adaptive mutations. Therefore I can ask, what is the minimal number of evolutionary steps that it takes to go from one function to another? And what are the kinds of mutations that will confer the new function? Are they occurring in the active site, in the substrate binding pocket? Are they occurring somewhere else in the protein?
I can actually follow the entire fossil record of an evolutionary experiment in the laboratory and see what it takes to achieve these new properties. Very interesting. I can also go beyond what's been discovered in natural systems and ask, are non-natural activities accessible to this protein? And I'll give you an example of this. It's quite interesting to be able to probe the limits of catalysis separate from what biology cares about, because if we're biochemists studying the natural world, we're limited to the products that made it through natural selection. But there's a whole world of other products out there that might be physically possible. They were just thrown away because they're not particularly relevant biologically. I'll come back to this in a minute.
Now, one approach to making such proteins from BM3 would be to sit down and try to design them. Everybody says a test of my knowledge is that if I can build it, I understand it, right? So they tell you, if I've learned enough about protein folding and sequence-function relationships, I should be able to take a structure, and the heme domain structure is known, and identify the amino acid substitutions that will convert it to some other function. That's hard to do. I'm not going to go into any arguments, although it's a wonderful debate that I often have with my colleagues at Caltech. I'll just tell you that in an enzyme like this, in all enzymes, details really matter. If you look at the energetics of catalysis, or of how this thing folds, we're talking about only a few kilocalories per mole, the equivalent of a couple of hydrogen bonds out of the hundreds that might be forming. You have to be able to do your predictions at that level of detail in order to design the catalysts. We simply don't understand those details.
And luckily for us, we have an algorithm that circumvents this profound ignorance and allows us to do design all the same. But before I go on and tell you how I can evolve novel functions, I'd like to also remind you that evolution is not easy either. There are plenty of ways to make mutations to a protein that give you nothing at all. Nothing of interest. A protein that is less functional than what you started with. And just to give you a feeling for what the challenges are, it's useful to think of all the ways that you can string together the amino acids that make up a cytochrome P450 heme domain. It's 450 amino acids long, and with 20 letters in your alphabet you've got a space of possible sequences that's not even remotely physically possible to explore. This is a space so large that Dan Dennett calls it very much more than astronomical. In fact, it's very much more than economical, which is the biggest numbers that humans can conceive of. It's very much bigger than any of those numbers, bigger than the number of particles in the universe by many, many orders of magnitude. So this is a space of possibilities of which nature has explored but the tiniest fraction. And I can tell you I will explore an even tinier fraction of that, an infinitesimal fraction.
Yet I will categorically tell you that most of it is empty. Most of those sequences don't encode anything that even folds into a three-dimensional structure, much less a new cytochrome P450. And therefore it's a pretty hard place to be doing an optimization. Just to give you some of the numbers and arguments: experimental studies have estimated the density of functional proteins in this sequence space. The estimates go from something like one in 10 to the 12, from Jack Szostak's work pulling ATP-binding proteins out of random polypeptides of 90 amino acids, to Doug Axe's estimate, from work on beta-lactamase, of one in 10 to the 77.
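To put a number on the size of that space, here is a minimal sketch of the arithmetic. The length (450) and alphabet size (20) come from the talk; the figure of about 10^80 particles in the observable universe is a commonly quoted rough estimate and is an assumption added here, not something stated by the speaker. Working in logarithms avoids handling the huge integer directly; the result is roughly 10^585 possible sequences.

```python
import math

ALPHABET_SIZE = 20     # the 20 standard amino acids
SEQUENCE_LENGTH = 450  # approximate heme-domain length quoted in the talk
LOG10_PARTICLES = 80   # assumption: rough order-of-magnitude estimate, not from the talk

# log10 of the number of possible sequences, 20**450
log10_sequences = SEQUENCE_LENGTH * math.log10(ALPHABET_SIZE)

# how many orders of magnitude larger than the particle count
excess_orders = log10_sequences - LOG10_PARTICLES

print(f"number of sequences is about 10^{log10_sequences:.0f}")
print(f"larger than 10^{LOG10_PARTICLES} by about {excess_orders:.0f} orders of magnitude")
```

Even the sparsest density estimate mentioned above, one in 10 to the 77, would still leave an enormous absolute number of functional sequences in a space this large; the difficulty is finding them by random sampling.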
So it doesn't really matter, you know: one in 10 to the 12 is pretty small, and one in 10 to the 77 is pretty small, but all of these things obviously did not come about by selection from random libraries. Of course that leaves us with the question of where natural proteins came from. They obviously came from smaller things being built up, but that's another lecture as well. And it should make us wonder: if functional proteins are so rare in this space, how on earth do they evolve? Because one fact of life is that most mutations are indeed deleterious. If you make random amino acid substitutions to proteins and you ask a second question, what fraction of those retain the ability to fold and function? Not acquire a new function, just stay folded, because we think that's a prerequisite for function. Then that fraction falls off exponentially with the number of amino acid substitutions, such that even after a very small number of amino acid substitutions, five out of 450, maybe one in a hundred or even fewer would still fold and function. And a much smaller fraction of those will acquire new functions, so it looks pretty ugly.
A lot of biochemists and chemists and engineers will think, oh gee, proteins are fragile, right? So I can't make amino acid substitutions to them. But obviously that's not true, because we are the products of evolution. We are the products of a process that has gone through this. And really the number to take from this is not that the fraction goes down exponentially, but that there's a really significant number of mutations that are neutral. If this is the fraction that are neutral, and we have an exponential function with m being the number of amino acid changes, we're talking about a fraction on the order of a half.
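The exponential fall-off described here can be sketched in a few lines. This is an illustration only: it assumes each substitution is tolerated independently with the same probability, and it takes that probability to be 0.5, the "on the order of a half" figure from the talk. The function name is hypothetical.

```python
def fraction_still_functional(neutral_fraction: float, substitutions: int) -> float:
    """Expected fraction of variants that still fold and function after
    `substitutions` random amino acid changes, assuming each change is
    tolerated independently with probability `neutral_fraction`."""
    return neutral_fraction ** substitutions

NEUTRAL_FRACTION = 0.5  # "on the order of a half", per the talk

for m in range(0, 8):
    print(m, f"{fraction_still_functional(NEUTRAL_FRACTION, m):.4f}")
```

Under this simple model, five substitutions leave about 3% of variants functional and seven leave under 1%, the same order of magnitude as the rough "one in a hundred" figure quoted above. A single substitution, however, is tolerated about half the time, which is the point the speaker goes on to make.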
And when you think of all the ways that you can make a single amino acid substitution to a protein that's 450 amino acids long, that's 19 times 450, almost 9,000 ways, then a half times 9,000 still gives you many thousands of ways to make a mutation to a protein and still have it fold and function. So it's this very large neutral fraction that allows us to move forward even when we're acquiring new capabilities.
John Maynard Smith in 1970 published a single 1.5-page paper in Nature that I would most like to have written myself. Basically he argued: I believe we are the products of evolution, and evolution has a certain nature such that, for evolution to occur, I can tell you something about the distribution of functional proteins in sequence space. In fact, the distribution has to be such that each functional protein is surrounded by one-mutant neighbors, and at least one of those neighbors has to be functional for evolution to continue its walk through the space. So if we think of sequence space as being all functional proteins and their one-mutant neighbors, ordered in this space with one mutation separating them, then mutation and natural selection diffuse, walking one mutation at a time through this space, finding new functional proteins nearby. After a long period of time, they have diffused across this network, so that the 7,000 cytochrome P450s of 2008 are related to one another, eventually, through all these single-mutation walks. And this tells us that functional proteins today are embedded within networks of functional proteins that are highly dense compared to the rest of sequence space. So it doesn't matter whether there's one in 10 to the 77 functional proteins in a random space. What matters is that the products of evolution are sitting in a very functionally dense part of this space, because that means that I can walk on this network too. All right, so go back here.
Let's see what that looks like, because that tells you what I actually do in the lab. It means that I can use this idea and implement it in the laboratory through a very simple algorithm. If functional proteins are surrounded by other functional proteins through uni-mutational walks, then I should be able to take an enzyme from 2008 and incorporate a small number of random mutations into it, maybe one or two amino acid substitutions per gene. I can make lots of those in a test tube. Put those back into cells. They do the hard work of translating that DNA into protein. And then, from those cells, I should be able to screen for the properties of interest, or select, if I'm smart enough to be able to tie that to survival and reproduction. Even if I do a simple screen, then of course Murphy's law of evolution applies, which is that the majority don't do what you want. But if you ask the right kind of question, where uni-mutational walks give rise to incremental changes in function, then yes, there will be some small subset, on the order of 10 or five. And I can take those and repeat the process in an iterative manner, essentially doing a very simple, stupid optimization, a random uphill walk on this fitness landscape. And at the end of the process, when either the student wants to graduate or you've fulfilled the milestones for your funding or you've actually solved the problem, the gene will encode a protein that has the properties you're interested in. This is a remarkably robust strategy, and I'll give you a couple of examples of the kinds of problems that you can solve with it.
All right. Of course, as opposed to diffusing across the network for millions of years as the cytochrome P450s have, my walks in sequence space, because our patience is limited, are very brief. Maybe we can do five or 10 generations. I'll tell you about experiments of 23 generations, but these are very brief, and they're very directed as well.
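The mutate, screen, keep-the-best, repeat loop described above can be sketched as a toy simulation. Everything here is a hypothetical illustration, not the laboratory protocol: the "screen" is a made-up fitness function (similarity to a hidden target sequence), the sequence length and library size are arbitrary, and each variant carries exactly one random substitution.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_fitness(seq: str, target: str) -> float:
    """Hypothetical stand-in for a screen: fraction of positions matching a hidden target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)

def mutate(seq: str, rng: random.Random) -> str:
    """Return a copy of seq with one random amino acid substitution."""
    pos = rng.randrange(len(seq))
    new = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[pos]])
    return seq[:pos] + new + seq[pos + 1:]

def directed_evolution(parent, target, generations, library_size, rng):
    """Random uphill walk: each generation, screen a library of single mutants
    and adopt the best one only if it beats the current parent."""
    history = [toy_fitness(parent, target)]
    for _ in range(generations):
        library = [mutate(parent, rng) for _ in range(library_size)]
        best = max(library, key=lambda s: toy_fitness(s, target))
        if toy_fitness(best, target) > toy_fitness(parent, target):
            parent = best
        history.append(toy_fitness(parent, target))
    return parent, history

rng = random.Random(0)
target = "".join(rng.choice(AMINO_ACIDS) for _ in range(30))
start = "".join(rng.choice(AMINO_ACIDS) for _ in range(30))
final, history = directed_evolution(start, target, generations=10, library_size=200, rng=rng)
print(history)
```

Because a variant is accepted only when it improves on its parent, the recorded fitness never decreases, which mirrors the "only adaptive mutations" property of the experiments. Real fitness landscapes are far less smooth than this toy one, so the sketch says nothing about how fast a real enzyme improves.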
This is not by any means evolution as we normally think about it. This is selective breeding, selective breeding of molecules in an ivory tower. I decide who lives and who dies, and I decide what properties I want these molecules to acquire. So what possible interesting function would be accessible by such a simple algorithm? Well, here we can use chemical intuition. If we're limited to a single mutation in every generation, and I have to have a measurable increase in activity, then we can imagine the sorts of things that might be accessible to BM3. Starting with the fatty acid hydroxylase activity, perhaps I could turn it into an alkane hydroxylase. Those would be structurally close, so that a single mutation would give a measurable improvement in function. And then if I actually wanted to do something really interesting, such as confer an activity not exhibited by the parental enzyme, then maybe I'd have to divide that problem into a series of steps, each of which could be solved by uni-mutational walks. So perhaps I could say, okay, let's say I wanted to do something crazy and make a P450 that works on methane and inserts oxygen into methane, where I'm starting with the fatty acid. Then I can go through a series of intermediate steps, each of which can be solved by uni-mutational walks.
Of course, you might come along and say, well, that's really stupid, because it's not just the size of the substrate that matters. Now you're actually getting into chemical difficulties, because we're also dealing with significant changes in the C-H bond strength as we go to something like methane. We're going from these internal methylene C-Hs all the way to about 105 kcal per mole for methane. So we're dealing with substrates that maybe P450s couldn't handle.
And in fact, if you go to nature, you find that there is no known naturally occurring cytochrome P450 that will take, or is known to take, the gaseous alkanes propane, ethane, or methane. Some people might interpret that as: you see, I haven't been able to make my P450 work on propane, and I know of no propane oxidase that is a P450; therefore, it cannot be done. That's a common response to such things in the literature. But I think that's an interesting question, because it just hasn't been found with P450s. In fact, that problem seems to be solved by a whole other class of enzymes, the methane monooxygenases. While it might be tempting to say it cannot be done, maybe it's that nature never really had a reason to do it. So nature's solution to oxidizing ethane, propane, or methane is di-iron enzymes that have no evolutionary or mechanistic relationship to the cytochrome P450s. This problem is solved by a very different and very complex family, which I won't go into. But it's another family you don't want to touch with a ten-foot pole: very difficult to work with, and very difficult to engineer.
So this is where I get to what's fun about doing evolution in the laboratory. Yes, if you want, you can limit yourself to the molecules that exist today or perhaps existed in the past, because those are biologically relevant. But there's a whole other space of molecules out here that are physically possible. That's a lot of space. The cure for cancer or the solution to the energy crisis might be out there. And they're just not exhibited in nature because they don't solve a biological problem. But it would solve my methane oxidation problem if I were able to find one. Nature chose a different solution, apparently. But it would be nice to know whether a P450 propane hydroxylase, or ethane or methane hydroxylase, is a physically possible molecule. If it is, then that brings another question.
Why has nature apparently chosen the MMO? Okay, in part I'd suggest we haven't looked very hard with gaseous substrates. A lot of microbiologists don't really like working with gaseous alkanes, and perhaps there hasn't been a big effort in screening P450s for these functions, but they may well exist.
All right, so how do you actually do this? You take old-fashioned analytical chemistry and scale it down to 96-well plates. Matt Peters, a chemist in my group, did a wonderful job with that and came up with high-throughput screens that give you enough information on whether something will take a substrate the size of octane or propane. Of course the C-H bond strength is very different in these surrogates, but at least it gave us a handle on the substrates. By the way, the reason you can't screen on octane or propane directly is that it's really hard to measure octanol and propanol in a complex fermentation broth. You don't want to purify your proteins, and you don't want to do a lot of expensive steps, because that gives you a lot of noise in your screening. You just want to take the lysate, add a substrate, and get a nice color change. So if you take one of these methyl ethers, say dimethyl ether, then formaldehyde is the product of the P450 reaction, and formaldehyde with Purpald reagent will form a nice purple color and light up when that reaction goes forward. Then you can take the positive clones that are good on dimethyl ether and test them on propane by gas chromatography. So you screen maybe 3,000 here and then you can do 10 on gas chromatography, and that's what we did. We used the increase in total turnover number on propane, that is, how much product you actually make, as our sole criterion for accepting a mutation.
To make a long story short, this project started back in the 1990s with funding from British Petroleum. They lost patience one way or the other, but we kept going with it.
So the wild-type enzyme has a little bit of activity on octane, and that's always a first principle of doing improvement by evolution. You really want to have a little bit of activity to start with, so that you're not just praying that the result will happen and you actually have some belief that a mutation will give an improvement. There's no activity on propane. After five rounds of this evolution process we obtained the first decent alkane hydroxylase. It had improved activity on octane, and along with that came some promiscuous activity. Once it had learned how to be active on octane, it also accepted a little bit of propane and made propanol, and of course that was nice because we could then screen on dimethyl ether or propane activity. When you screen for this propane activity you can get large improvements in activity on propane, and along with those large improvements in propane activity also come further improvements in activity on octane. So now we're up to five or six thousand total turnovers, and this is a manifestation of the most important law of directed evolution, which is that you get what you screen for.
Then things stopped. It was really not fun. The students who had the project were tearing their hair out, just hunting for improvements, and this often happens with these experiments. Why would it stop? Well, perhaps because a cytochrome P450 really does not want to do any better at hydroxylating propane, and certainly not ethane. That's possible. Maybe that property is not chemically possible. But the other possibility is that it is chemically possible and just not accessible by this approach. And what turned out to be the issue was a really simple thing.
The reason it wasn't accessible by uni-mutational walks is that you get what you screen for. We had been screening just for improvements in total turnover number: does it oxidize more propane to propanol? But at the same time, as the enzyme accepted these mutations and improved its activity, it had been losing what little thermostability it had to begin with. P450s are notoriously unstable. And this one had gone down to where, if you're growing the cells at 37, you basically have hardly any functional P450 left. So there's just nothing to work with.
This led us to something that we should have understood, and that people had understood a long time ago. I'll tell you in a really simple way. Jesse Bloom did a really nice analysis of this problem a few years ago, where he argued that if you look at all the ways you can make a mutation, even those 9,000 single mutations, and you look at the distribution of the effects of those mutations on the free energy of stabilization of the protein, some of the mutations, a few, are going to make it more stable. Most of the mutations make it less stable, right? Most mutations are deleterious. And so if you're starting with a marginally stable protein, and it has to stay above some critical stability threshold in order to be functional, then most of the mutations that you can make are going to push it over the threshold, and you're not even getting a protein to begin with. So what's the solution? The simple solution is just to move this distribution over. Just stabilize the darn thing, so that this whole block of mutations that was formerly inaccessible, because it would just knock the protein out, becomes accessible. So more stable proteins are more evolvable. They can accept more mutations, and they have more opportunities to acquire new functions. This really works.
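The threshold argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not the published analysis: the effect of each single mutation on stability (ddG, in kcal/mol, positive meaning destabilizing) is drawn from a normal distribution whose mean and spread are hypothetical values chosen for illustration, and a mutant counts as tolerated when its destabilization does not exceed the parent's stability margin above the functional threshold.

```python
import random

def tolerated_fraction(margin: float, ddg_samples: list) -> float:
    """Fraction of mutations whose destabilization (ddG) does not exceed the
    parent's stability margin above the functional threshold."""
    return sum(ddg <= margin for ddg in ddg_samples) / len(ddg_samples)

rng = random.Random(1)

# Hypothetical ddG distribution: mostly destabilizing, a few stabilizing.
MEAN_DDG = 1.0   # assumption, kcal/mol
SD_DDG = 1.5     # assumption, kcal/mol
ddg_samples = [rng.gauss(MEAN_DDG, SD_DDG) for _ in range(9000)]

# Stability margin of the parent above the threshold, in kcal/mol.
margins = [0.0, 1.0, 2.0, 4.0]
fractions = [tolerated_fraction(m, ddg_samples) for m in margins]

for m, f in zip(margins, fractions):
    print(f"margin {m:.1f} kcal/mol -> {f:.2%} of single mutants still fold")
```

Shifting the parent away from the threshold (a larger margin) is equivalent to "moving the distribution over": the same set of mutations is sampled, but more of them land on the functional side, so more candidate beneficial mutations survive to be screened.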
If you go and stabilize the one that was giving us all the difficulties, 35E11, by adding a couple of stabilizing mutations to it, and we had some in our pocket which had been found by directed evolution, then it was very easy to move forward with a few more generations of directed evolution. And now all hell broke loose, because a whole new group of people came in and started trying every method you can think of, identifying positions in the active site and saturating those residues. So there was a lot of non-random stuff going on that I won't go into, so as not to make the story confusing, but they were all basically single amino acid substitutions, and they were able to increase the total turnover up to about 20,000 now.
That led us to think about another aspect of this enzyme, which is that the heme domain is but a small piece of it. The heme domain, which I showed you earlier, is attached to this big reductase domain, which has a lot of work to do because it's got to deliver electrons one by one to the heme through a nice shuttle of cofactors. In fact, this is a really fun machine to study. The FMN actually physically moves to deliver the electrons to the heme. The bottom line, though, as I don't want to bore you with the catalytic cycle, as fascinating as it is, is that when you start adding non-natural substrates to this enzyme, or you make amino acid substitutions, this wonderfully fine-tuned electron delivery mechanism becomes uncoupled, so that the enzyme becomes a very good NADPH oxidase. It can just chew up your expensive NADPH without any product formation. So the two become uncoupled from one another. And of course highly active oxygen species then get made at the heme, and you get heme destruction after a few turnovers. So in fact, if you screen for total turnover number, you're trying to hold on to the coupling. That was one reason we chose total turnover number as our screening criterion.
But if we haven't got this whole thing tuned and we've only been mutating the heme domain, then we're never going to get the increases in total turnover. One thing I will note, though, is that just by mutating the heme domain and screening for total turnover, the coupling on propane improved from essentially very small numbers in the first propane hydroxylase all the way up to 90%. So we didn't screen directly for coupling, but it came along as a result of having a highly efficient catalyst. But really I wanted to go and push it all the way to the limit. Can we make a P450 that's not just a good propane hydroxylase, but that is as good on propane as the wild type enzyme is on its preferred fatty acid substrates? So fully coupled, essentially 100%, 98%, and very high total turnover numbers. So the argument is that in fact we probably should be evolving the rest of the machine. It's a highly modular system, so obviously we were able to go a long way just evolving this one part, but it's also a coupled machine.
So while the evolution in the heme domain was going on, we did parallel evolution on the FMN and FAD domains of the protein, using improved heme domains at the same time, and the result of doing these parallel evolutions and then recombining the optimized domains was a protein called P450 PMO. Rudi Fasan did this work over the last couple of years. P450 PMO is a beautiful, good propane hydroxylase. It has as many turnovers as you can measure in a small assay, and it goes for much longer than 45,000 in a whole cell system. Its initial rate of reaction, 300 per minute, is as good as any alkane hydroxylase out there, and it is 98% coupled, cofactor consumption to product formation. So we've got a further twofold increase from the best heme domain to P450 PMO. And it's interesting to compare the catalytic rates of this enzyme to the first propane hydroxylase.
Remember, wild type has no measurable activity on propane, so we have to look at one of the evolutionary intermediates. kcat over KM is 8,000-fold improved. KM has been reduced 180-fold and in fact is comparable to the KM of wild type BM3 on laurate, one of its preferred substrates, so that's about 300 micromolar. kcat is also comparable. So for a P450, it's a good P450. And just to show those numbers again: BM3 is very good on laurate but not measurable on propane. PMO is very good on propane and in fact has lost its activity on laurate. So now there's been almost a 10 to the 10th fold change in the specificity of this enzyme over this process.
The great thing about laboratory evolution is you've got the whole fossil record. So I've got all these intermediates in the refrigerator, and if I've got somebody who's got the wherewithal, they'll go back and look at what happened during this process. So now I get to share that story with you, and this is some as-yet unpublished work. Rudi Fasan took the wild type enzyme and looked at its activity on a range of alkanes up to C10. And here what I'm doing is normalizing the activity with respect to the alkane on which it has the highest activity. So 100% just means that of all the alkanes, BM3 likes decane the most, and then it has very, very low activity, no measurable activity on propane for example, very low on alkanes, but that's where we started the process. And as you go to the first alkane hydroxylase, 139-3, you can see that it now has a preference for the medium chain alkanes, a little less for decane, and is just barely measurable on C3.
And then as you go further in the process, you can see what's happening. As you accumulate these mutations it's broadening out; this is the one that's very unstable, but it accepts just about everything. And then accumulating more mutations that improve the total turnover number on propane starts to push the specificity over to the small alkanes, so that by 11-3 there's no longer any measurable activity on decane. And then going further on, you can see that it's pushing it down, pushing it down, such that in P450 PMO this enzyme is now essentially re-specialized. It has no measurable activity on laurate, no fatty acid activity, and it even has no measurable activity on the longer alkanes. So now, just by pushing on the activity on propane, it hardly takes the long alkanes at all. So it's gone through this series of promiscuous enzymes, things that have much broader specificity than the wild type, and then gone to propane.
And furthermore, this was all done under positive selection alone. I think this is an important principle. A lot of people think that you need to have negative selection on top of positive selection in order to re-specialize something. And this shows that if you don't wait long enough, that's really true. But if you keep pushing it, at least with the P450 to have activity on propane, that specialization came along as an accompaniment to the process.
So you might enjoy seeing where the mutations lie. There are 23 total mutations in P450 PMO; 20 are in the heme domain. They're distributed throughout the protein. There are a lot in the active site, because we started getting impatient and pushing mutations into the active site. But a lot of these out here came in the early generations. We know that they're functional because when you remove them, you remove the activity towards propane.
But we really have very little idea of what the outside mutations are doing. But I'll show you this cool little movie that Tom made from the crystal structures of BM3 with its substrate bound, which is a fatty acid substrate, and then in the open conformation with no substrate. And as I say, this is just a cartoon made from the crystal structures of these two conformations. But you can see there's a significant conformational change in order to bind the substrate and get rid of water from the active site. A lot of those mutations that I showed you were sitting in this F-G helix region that has to close down over the open thing. So it's very important what the sequence of this region is. Now, this should scare any would-be designers, because this is a very good demonstration that details matter and we don't understand the details. We would have a very difficult time identifying which mutations here would affect this in a way that would allow it to accept propane.
Also, here's some interesting analysis. Comparing to wild type, we do have a crystal structure of one of the intermediates; Tom Poulos's lab did the crystal structure of 139-3. And that has almost half of the mutations in PMO. So we're quite a bit along the evolutionary line. From that crystal structure, we can make a good model of P450 PMO. And I'll just show you the change in the volume of the substrate binding site as we go from wild type towards PMO. This is a model, but we think it's a pretty good one. As it becomes more promiscuous, we see maybe a slight increase in the volume of the site, but then a distinct compartmentalization and compaction of that. In fact, it's quite interesting. There's a glutamic acid that's placed in here along with a phenylalanine that seems to cut off part of the binding pocket. Remember, we're going after propane now.
So it seems to have come up with a nice solution of actually dividing the binding pocket in two, so that it has become significantly smaller.
All right, so what have we done? We've now invaded the functional territory that was previously thought to belong only to members of this MMO-like family of monooxygenases. It's clear that a P450 does this and does it very well. At least the initial rates are fully comparable to natural enzymes working on these small substrates. Furthermore, along with this activity on propane comes a little promiscuous activity on ethane. So we can start this story all over again. Ten years isn't long enough. But for the chemists in the audience here, I've got a catalyst that does the direct air oxidation of ethane to ethanol. 2,700 turnovers. Air, original ground. This is the best ethane hydroxylation catalyst that anyone has ever reported. And I couldn't get it published. Everybody said, everybody knows that enzymes can do that. That's not that interesting. So it's not fair. This is a really good ethane hydroxylase. And it's fully good enough to do directed evolution directly on ethane. So the next story that I'm going to tell you next year, if I come back, would be how you can make this ethane hydroxylase just as good on ethane as the wild type is on, or as its parent is on, propane. And you have to do that without a surrogate, because there is no ethane surrogate. You actually have to screen for ethanol. And that's really hard in a fermentation process, let me tell you. But you can do that directly in these little 96-well reactors, and we've developed some nice colorimetric assays for ethanol. So that really works.
But I will tell you, we're well on the way to methane, because if you look at halomethanes, this enzyme PMO has activity on all three of these halomethanes, including iodomethane, whose C-H bond strength is now 103. No one knew that a P450 could do this.
Not even an MMO has been reported to do this reaction. So that's actually a new specificity, both for MMOs and P450s. And we're really pushing the C-H bond strength significantly up. So my guess is it can do methane perfectly well, but it's just a matter of finding the right binding pocket configuration to hold methane in there. So that's the open question we're working on.
Let me finish by going in totally the opposite direction. Tomorrow I'm gonna tell you a lot about larger drug-like substrates, because this is really interesting. This is, I guess, where the money is in P450 chemistry, because humans have P450s that are involved in drug interactions and drug metabolism, and it would be really nice to have P450s that would mimic human metabolism. Remember, this enzyme, as it went to PMO, went through a series of promiscuous intermediates. Well, it turns out if you take one of those promiscuous intermediates and you start screening it for activity on different compounds, or screening mutants of that enzyme for activity on different compounds, you can find all sorts of interesting activities. And I'm just gonna tell you two and then stop.
In a collaboration with Chi-Huey Wong's group and a second group, we've been looking at the selective demethylation of permethylated sugars, that is, fully protected sugars, to see if we can selectively deprotect at different positions. If you can do that with P450s, this is a demethylation like the dimethyl ether demethylation. It makes formaldehyde, so you can screen it colorimetrically: you can add these permethylated sugars to these P450 reactions, and if you see a nice purple color, it's active on that. You can run it through the GC and see what the regioselectivity is. If you can do that, then you can do further chemistry. You can make deoxy sugars. You could do glycosylation. And this is a single step reaction. Really, really nice. Well, let me tell you, I think it works extremely well.
This is just one example for your enjoyment. This is as-yet unpublished work. You take permethylated benzoyl galactose and run it across just a select panel of P450s. You can find ones that give quantitative conversion at every single one of the positions. And here's just one of the examples. These are selective, perfectly selective, for demethylation at the 4 position of this substrate. And in fact, under the right conditions, you can easily make milligram quantities of this with significantly high yields. So this opens up a whole set of possibilities for new sugar synthesis. Makes it really easy.
Human metabolites are interesting. So we published this a couple of years ago already. We can take mutants of these enzymes that take substrates significantly different from propane and make the authentic human metabolites. So here is 6-hydroxybuspirone. Buspirone is a marketed drug for anxiety and depression. And what's interesting is that the human metabolite is the biologically active one. It's more active than the drug itself. So it would be really nice to be able to have the human metabolite, especially for new drugs, drug leads. So we showed that we could easily find a mutant that made exactly the right stereoselectivity, the right product, the right regioselectivity, and had a decent conversion without any optimization. Eli Lilly was so excited about these results that they came to us with three of their drugs and said, we know that these have something like 50 or 39 human metabolites. We know what they are. Run them across your panels and tell us what you have. We looked at verapamil, astemizole, and one of their unmarketed drug compounds. And within a week, we were able to deliver them more than 30 authentic human metabolites, as well as some inauthentic metabolites, which are perhaps even more interesting. Each one of these costs about $50,000 to synthesize in milligram quantities.
And these enzymes made all of these, including these novel metabolites not made in the microsomal system or with cloned human enzymes. We didn't make a few of them; we missed them because we didn't look very far. And some of these enzymes are very selective, allowing easy scale-up to make individual metabolites. So one of the enzymes selected just this one methyl group out of all the others; it's got five other groups and selected this one. So 94% selectivity, and we could easily make 10 milligrams. That was so exciting that they begged us to start a company. I'm not the least bit interested in starting a company at this point. And so I licensed them to Codexis. This is a great supplier of enzymes for synthesis. And they're now marketing, from Caltech, this MicroCyp platform for these things. So I'm not just mouthing this. These are really useful catalysts that are now available for these kinds of syntheses.
So with that, I'm gonna thank the people who really contributed recently to this work, who did all of the alkane work and really designed P450 PMO, and then a few other people in the lab. And at this point, thank you very much for having me here.
Have you tried evolving the enzyme in the presence of a benign small molecule which might take up, which might split the binding pocket?
A non-small one?
So a small molecule that's benign, that won't get hydroxylated but might take up...
I'm sorry, I've never done so.
Sorry, have you tried evolving your enzyme in the presence of a benign small molecule? One that doesn't get hydroxylated but may take up part of the binding pocket.
Oh, well, I'm evolving this inside of a cell, so there's only, you know, 50,000 small molecules, and they're probably sitting in the active site and doing all sorts of things. Yeah, but I haven't added anything specific with that hypothesis.
But yeah, I wouldn't be at all surprised if there's, you know, something sitting in there.
I wonder if you took a series of three steps first of all, to the T, then bringing in the given stability with some end of the first floor. And then, do you have a lot of potential and a way to do it? If you were thinking of a different strategy, where you sort of tackle the stability first, do you think you would have gotten the same result?
So the general question is, if you do it again, or you take a different walk, will you get the same result? I'm sure that even if I used the same selective pressures I wouldn't get the same result. There are obviously many solutions to this problem. Otherwise I wouldn't have found it, right? If there were only one solution, I'm not going to find it. I'm screening a few thousand clones; that's a very small fraction of all possibilities. The one I'm showing you here is historical, and I'm quite sure that you can find this by a less circuitous route. We're now actually doing computationally designed libraries. We screen them on dimethyl ether and have plenty of hits. They have fewer mutations and they work on dimethyl ether. So, you know, you can skin that evolutionary cat in lots of ways. Yeah, evolution's great, isn't it?
It's interesting that, so what you seem to be doing is clamping down, but what if you look at the size of the pocket that's the biggest?
Yeah, well, it's not trying to become an MMO, right? I'd love to say that mechanistically we're making a converged MMO, but structurally there's no relationship.
So, is the redox potential of the metal center or anything like that changing?
We are doing those measurements now in Harry Gray's lab. I don't have the numbers.
In doing evolution experiments in the 50s, I was always very impressed by something, which is periodic selection. You would have that, yeah. Periodic selection.
So, I can impose the selection pressures that I want, but are you saying to change the selection?
I would say periodic selection through the selection.
So, are you changing the selection pressures as you go?
When doing evolution, you get a mid-population although there is no selection pressure, nothing but with you, and here you've seen the old one.
Well, that's kind of this idea of the promiscuity, right? Within the old one there is a new activity that appears. Remember, this is the small population limit. So we're in the limit where the whole population follows as a single population. So I have to look at the number of functions within a single sequence. And so those are coming up as you move along.
To follow up on an earlier question, when you were asked, if you were to do it all over again, could it be done differently? Just a little bit more on that. If you were to first start with the strategy of stabilizing the protein, would that not give you a greater amount of sample space from which to begin? Is there any reason to think that, given all possibilities, that's the best place from which to start?
I'm thinking. Probably I was okay with wild type for at least the first couple of generations. I probably wasn't missing out on a lot of things at the tail end of the mutation distribution. Probably should have stabilized it a few generations earlier. Yeah, should have done it that way.
Stringency of selection. It seems like that's going to govern how far away you've gone from wild type, and then...
You choose the stringency to be such that you get beneficial mutations. If it's too stringent, you won't get anything. If it's not stringent enough, you'll get too many. And then there's just right. So how do I find what's just right? It's trial and error. You set up the screen, and if you don't get any hits, then you have to reduce the stringency. But the way that you do it is you measure what wild type can do. You measure what the parent can do.
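The procedure being described in this answer (measure the parent under several conditions, then screen where the parent retains roughly a tenth of its signal, so improved variants have room to show up) can be sketched as below. The function name, the temperatures, and the residual-activity numbers are all hypothetical and only illustrate the idea.

```python
def pick_stringency(parent_residual, target=0.10):
    """Return the condition where the parent's residual signal is closest to target.

    parent_residual: dict mapping a screening condition (here, a hypothetical
    incubation temperature in degrees C) to the fraction of the parent's
    signal that survives that condition.
    """
    return min(parent_residual, key=lambda cond: abs(parent_residual[cond] - target))


# Hypothetical parent measurements: fraction of activity left after incubation.
parent_residual = {45: 0.95, 50: 0.60, 55: 0.12, 60: 0.01}

chosen = pick_stringency(parent_residual)
# Headroom: how many fold a variant can improve before hitting the 100% ceiling.
headroom = 1.0 / parent_residual[chosen]
print(f"screen at {chosen} C, parent at {parent_residual[chosen]:.0%}, "
      f"about {headroom:.1f}-fold headroom")
```

In this made-up data set, the mildest condition leaves the parent near the ceiling with almost no room to measure an improvement, and the harshest leaves the parent too close to zero to measure at all, which are the two failure modes the answer warns about.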
And then you just set the stringency just where parent is like 10% of what it normally was. So if you're trying to increase the stability, you take it to a temperature where parent has dropped to 10%, and therefore you have this nice dynamic range of 10 to 100% where mutations can be measured. If you make it not stringent enough, parent will be too close to the ceiling and you don't have enough dynamic range in the measurement. And if you take it too far, you can't measure parent anymore and you don't know whether you're five mutations away or 10. So that's how you do it. You do it by looking at what's in here. It's not black magic, it's actually really white.
So there is no known natural propane monooxygenase. And of course, since I only made 23 mutations, the sequence is going to be most similar to BM3. So it has not converged towards any other sequence; the closest sequence to BM3 is about 70% identical. So it has 120-some amino acid substitutions. So it certainly hasn't gone towards anything that we know of. And in general, they don't converge to other sequences, because there are too many pathways to go from one sequence to the other. The probability that you'll converge on any known thing is essentially zero. And that's another thing, because what it means is, for any given problem, there's lots and lots of solutions. Right, there have to be.
Let's do one more question and then we'll head over through your questions.
So as sort of a follow-up, are the positions that you're seeing the mutations in, are those the ones that tend to vary a lot between different monooxygenases? Or are they more conserved ones, or how does that relate?
So, that's a good question, whether you can get more information from the positions of the mutations. Because now you really are getting the adaptive mutations.
And so people have analyzed these things a lot, because they're so important in drug development. And they've identified these substrate recognition sites. So half our mutations fall in these substrate recognition sites, which account for about half the sequence, and half of them don't. So that doesn't give me any additional information; it tells me that they're hitting randomly, half and half. I mean, it doesn't really give us a huge clue as to where you'd go if you were going to design this thing. However, we have taken what we've learned about what doesn't work, right? So you can throw out all the sites that are always going to be a problem. And we've taken things where we see multiple hits, and we can make designed libraries from those. So if you choose five residues, then the sequence space is really very small. You can make all those substitutions at all five simultaneously and ask, do any of these hydroxylate dimethyl ether? And the answer is yes. So we can actually, based on what we get from random mutagenesis, design libraries, computationally design libraries or not even computationally, and start targeting mutations.
So let's move all of our questions over to the computer. Thanks, guys. Thank you. Thank you.
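The remark in the last answer about choosing five residues for a designed library can be made concrete with a quick count. This is a sketch only: the reduced set of four residues per site is a hypothetical design choice, not something stated in the talk.

```python
from math import prod


def library_size(choices_per_site):
    """Number of distinct variants when each targeted site independently
    takes one of the allowed residues at that site."""
    return prod(choices_per_site)


# Five targeted positions, all 20 amino acids allowed at each.
full_alphabet = library_size([20] * 5)

# Five targeted positions, a hypothetical reduced set of 4 residues at each.
reduced = library_size([4] * 5)

print(f"five sites, 20 residues each: {full_alphabet:,} variants")
print(f"five sites, 4 residues each:  {reduced:,} variants")
```

Either count is tiny next to the space of all possible substitutions across a whole protein, and restricting the residues allowed at each site is one way to bring the library within reach of a plate-based screen of a few thousand clones.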