Let me introduce Chris. Chris did his bachelor's at Michigan in chemical engineering, and from there he moved to Caltech to do his PhD. Initially I was confused because I saw three names as his PhD advisors and thought perhaps he just had bad luck and moved from lab to lab to lab, but he tells me he took three, you know, professors to mentor him simultaneously at Caltech, and he became a biochemist mentored by a physicist there. From there he moved to Berkeley to work with Adam Arkin, after which he started his own position in 2003, so roughly ten years ago. He moved to MIT about two and a half years ago, but while he was at UCSF he did some really beautiful studies that actually define what we think of as synthetic biology today. He's one of the pioneers, the leaders of the field, and for that he won a bunch of awards: the Sloan Fellowship, the Pew Fellowship, the Packard Fellowship, the NSF CAREER award, and it goes on; I'm just highlighting a few before I bore you to death. And the indications were there early on: he had one of the most prestigious postdoctoral fellowships and an NSF predoctoral fellowship. To top it all off, he won MIT's famed Tech Review award in 2006, recognized as one of the few innovative leaders of the TR35, I believe is what it's called. He's developed a variety of tools that are now being used widely across the field, and he is now also the chief editor of a newly launched journal, ACS Synthetic Biology; some of the, you know, beautiful work that's being done is now appearing in that journal, so he's already bringing his vision to fruition and engaging a larger community. It's a pleasure to have you here, Chris. Well, thank you. It's always great to be here. I always really enjoy visiting Wisconsin; it's, you know, a fantastic nexus of chemical engineering and the engineering sciences with microbiology and of course biochemistry and so on, so it's always nice to see what's going on.
So I'm coming from a background of engineering; my background is chemical engineering, and I say that because this is a biochemistry department. So I want to make the point that our primary interest is being able to build systems that don't yet exist, and to push the scale and the sophistication by which we can do genetic engineering. I'm going to start out in this talk just by giving you a little bit of an overview of the types of problems that we hope to be able to address in the near future, and then I'm going to walk through more carefully a story where we've been working to simplify some of the genetics around nitrogen fixation, which is a function that people have been trying to transfer between organisms for some time, and try to leave you with a sense that there is a phase change happening now in the size and scale of the designs that we can do in genetic engineering. So for a long time, we and others in the field have dreamed of being able to program cells to do complex functions.
And so these are some examples of either what has been envisioned of what could be done, or things that happen naturally in biology. For example, in the example on the left, this was a postdoc of mine, Chris Anderson, who then went and joined UC Berkeley, whose whole dream has been to engineer bacteria as therapeutic devices, where they go through different microenvironments in the body, they sense where they are, they perform a therapeutic response in each one of those niches, and all of this would be programmed in the DNA. On the right I'm showing a number of different materials that are naturally produced; cells are the natural architects of the nano world. I'm showing a glass sponge, which is essentially spinning fiber optic cables, as well as diatoms that live in the ocean and build these very intricate silica structures. So everything on this slide involves being able to program a cell to go through a complex series of tasks, and it's a huge difference from what is currently being done in biotechnology. If you look at the forefront of companies like DuPont or Dow AgroSciences, or even startups like Refactored Materials or Amyris, they're really about the production of individual small molecules, and really just having biology produce that. But that's not getting close to what biology has the capability of doing, and so everything that we're trying to do is to bring that scale up, so that we could actually do these types of functions in cells. And very broadly, there are two things that are required in order to do this. The first is that you have to be able to manipulate many genes, possibly hundreds of genes. The second is that you need to be able to control precisely when they're turned on and under what conditions, so the dynamics and the conditions, and for that you need synthetic regulation. And so the way that we envision this all coming together, and moving towards being able to do this, is on the scale of whole-genome design, where you could actually go
into an organism and construct systems that are on the same scale as the natural genome. We very roughly divide this up into different categories, where on the far right we have the actuators, which are the things that the cell is actually doing. These are connected to circuitry that tells the different systems when to turn on, which is then subsequently connected to sensors, which are looking at the environment and dictating when things should be activated. So for example, in this fantastical idea of creating a cell that can spin its own antimicrobial cloth, you might have a set of genes that are there encoding silk proteins; you have another whole set of genes that are making antibiotic nanoparticles; the silk has to be exported at a certain time and then modified; and all of this has to be integrated into a large system. And so my lab is focused on each one of these steps: being able to sense environments, to process that information with circuitry, and then to control different outputs. What I'm going to focus on in this talk is the last of those steps: being able to convert any cellular function into an actuator that can be turned on and off whenever we want, and to transfer those cellular functions between organisms as easily as possible. Now, we've been doing this, and others have been doing it, for some time, and this has really been facilitated by the advances in DNA sequencing and DNA synthesis. Over the last decade, or two or three decades now, of sequencing, sequence databases have been populated with genomic information
that encodes functions, and all of that information is stored but inaccessible unless you're able to go in and re-synthesize the DNA that's encoding those functions. This was facilitated about a decade ago by the rise of DNA synthesis companies that allow you to go and print out a lot of pieces of DNA, or a lot of genes, and screen those for a particular function. And so this has led to an area, pursued by a number of different labs, called part mining. This is where you have an enzyme that you think has an interesting function, but instead of working with that one enzyme, you go into the databases and you print every single enzyme that has that potential function and screen it for activity. And this is one example of that, where we were looking for an enzyme that produced a methyl halide, for sort of random reasons. But instead of working with one gene from one organism, we printed 90 enzymes from 90 different organisms that had any chance of performing this function, and then we identified the top ones, which came from plants and from bacteria that hadn't even been isolated, just part of metagenomic samples, and so on. So this is a very simple single-gene experiment where you're transferring this function from a plant or from a microbe into a new host in order to take on this new function. And the reason that this was possible, of course, is all of the tools that already exist in genetic engineering. We had to use synthesis to access the gene, but then we just pop it into an expression plasmid that has an inducible promoter and a ribosome binding site, so we can turn the gene on in the new host, and so on. So it's a very simple type of operation that allows us to access that function and then turn it on in a new host. Then we started to get interested in functions that can't be encoded by just a single gene, but require many genes to be collectively turned on in order to produce that function, and this is where we started to get interested in gene clusters. And so
what happens in bacteria is that many cellular functions have all of their associated genes encoded in a contiguous region of the genome known as a gene cluster. For example, the system I'm going to talk the most about today encodes nitrogenase activity, and this is a 25,000-base-pair region of the Klebsiella genome that has all of the necessary genes for this function as well as their regulation. And there are gene clusters that encode just about anything that you can imagine a bacterium doing. For example, we also work with a protein secretion device out of Salmonella, which is 35,000 base pairs of DNA, and again that has all of the necessary genes to build this secretion device and export proteins from the cell. There are many, many examples that produce pharmaceutical-like compounds, things involved in the breakdown of biomass, and energy harvesting, including things like light, and so on. And so in this vision of the future, what we would like to be able to do is to look out at the microbial world, take whatever functions we want from different bacteria, and plug them together into a single organism that then has some combination of abilities. So for example, if I want to be able to export protein, I want to be able to pop in that unit of DNA that encodes the protein secretion device; if I want to fix nitrogen, I want to pop in that piece of DNA to fix nitrogen; and so on. Now the challenge is that instead of an individual gene that we could just put in front of an inducible promoter, you now have 16 or 20 or 40 genes that all have to be turned on at just the right levels in order to get activity. And this is just a subset, but with all of the really high-throughput sequencing that's been going on, there are literally tens of thousands of gene clusters that encode just about anything that you can imagine, and this is being found in a very automated way; the bioinformatics already exists to go in and identify these
different types of functions, and it's growing and scaling with the size of the databases. And so when we look, we often just see all these different functions that we want to be able to move from one organism to the next, and there are really three things that you need in order to do this transfer. The first is that you need sequencing and bioinformatics, so that you have the information associated with that function and you have some way of going into the databases and figuring out what subset of DNA is required for that activity. But that sequence information is not enough to obtain that function. You also need DNA synthesis, because you need some way of being able to re-access the information and turn it back into physical DNA. But you can't literally re-synthesize the DNA sequence from the organism from which you obtained it, because all of the regulation is going to be different between that organism and the new host. So the third thing that you need is synthetic biology, and this is what allows you to replace the regulation of that gene cluster so that you can get it to function in the target organism. So we started thinking about these different things, moving from individual genes to sets of genes, and trying to be systematic about it, and one of the first problems that we decided to start trying to address is nitrogen fixation. This is a problem that's been articulated for 30 years or more, where the problem is that
when we look at plants and their ability to obtain nitrogen from the environment, most of the plants that we consume are not able to obtain their own nitrogen. Things like legumes and beans are able to form associations with microorganisms that deliver nitrogen, but cereals like rice and corn are not able to do this. And so the way that we've solved this problem in agriculture is through the chemical production of ammonia through Haber-Bosch, usually through the burning of natural gas. In this way the cereals are consuming the nitrogen out of the soil, which is then being put back in through chemical routes, and this is what has led to the incredible increases in cereal yields over the last decades. And so since the 1970s it's been viewed that, as an alternative to this chemical route of synthesizing ammonia and dumping it onto the soil, what you'd really like to do is to take one of the plants that is able to associate with these organisms in the soil and transfer that capability from a legume to a cereal crop through genetic engineering. This was articulated at the same time that the recombinant production of insulin was articulated, in the '70s at the Asilomar conference. That problem was quickly solved, and it became Genentech, and I think the reason it was solved quickly is that at the end of the day it's a transfer of a single gene into a new organism. Whereas, as you'll see, nitrogen fixation requires many genes for activity; they have to be balanced just right, and the tools haven't been there in order to actually achieve that. So we decided to start going in, trying to figure out what has stopped this ability to transfer this function between organisms, and then how the tools of synthetic biology might be able to help with that step. So this is the gene cluster that is the model system for nitrogenase activity. It's from the bacterium Klebsiella.
It's a 20-gene system that's encoded in 25,000 base pairs of DNA, and it has the core genes for the nitrogenase itself, which is what performs the conversion of atmospheric nitrogen into ammonia. It then has a series of genes that create the FeMo-co cofactor and then load it into the nitrogenase enzyme. It has additional enzymes that are involved in the delivery of electrons to the reaction center, and then it's got a regulatory network that very carefully controls expression. One of the first things that we did is we went into this system and we wanted to look at how sensitive it is to changes in the expression levels of the individual component proteins. So we performed a series of experiments like the ones I'm showing here, where we would go into the native gene cluster and knock out either an individual gene or a small set of genes, which we then would complement back on a plasmid, so that we could sweep those genes through expression levels and look at the impact on the activity of the system. And what we found is that this is a very fragile system, where if you look at it funny, you very quickly lose activity. You can see that here, where you have to balance the expression level of the genes just right; if you're too low or too high, you very quickly lose activity, and each of the subsets of individual genes and combinations of genes has this property. And it's not that they all have to be high or low; each set of genes has its own optimum level that has to be just right in order to get the activity of the system as a whole. Now, what I'm showing with these six graphs is just one slice through this space, where we're changing the expression level of a single gene. As soon as you start to do combinations of genes, it gets much more complicated. For example, if you're looking at the expression levels of two genes in this system, each of which has an optimum, then you create this sombrero-hat-like space, where as you change each of those genes from the
optimum levels, you very quickly lose activity. So now, in a system where you have 16 genes, all of which are very fragile and have to be carefully optimized, you don't have a two-dimensional space; you have a 16-dimensional space. This is a hard-to-imagine volume, wherein somewhere in this space you've got a region that represents all of the combinations of expression levels that are consistent with activity, and if you're outside that combination of expression levels, you have an inactive system. I know it's hard to think about 16-dimensional spaces. When I was a student at the University of Michigan, my advisor was Richard Goldstein, and he used to tell these really bad math jokes. One of his math jokes is: why should you never buy a hyper-dimensional watermelon? As it turns out, as you increase the dimensionality of a space, the ratio of surface area to volume goes to infinity, so as you increase the dimensionality, most of the volume is on the surface. And so the reason you don't buy a hyper-dimensional watermelon is that it's all rind. And that's bad. I warned you. So what that means is that when you're in this space, you're always on the edge of falling off and losing function. And so in a transfer experiment, where you're taking a gene cluster with 16 genes and you're cutting it out and popping it into a new organism, even if everything works just right, and you know all the ribosome binding sites and all the promoters and all the terminators and all the codon usage, and everything is just perfect, it's not going to get the expression levels just right. And so if you wiggle all those expression levels, you end up moving into a new region and losing activity very quickly. Okay, and so then you have to try to regain activity, but because the system is so redundant, so overlapping, and so non-modular, it's impossible to make these substitutions.
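As an aside, the watermelon intuition is easy to check numerically: for a unit hypercube, the fraction of the volume lying within a thin "rind" of the boundary goes to one as the dimension grows. This is a minimal sketch; the 5% rind thickness is an arbitrary illustration, not a number from the talk.

```python
# Fraction of a unit hypercube's volume within a thin "rind" of the
# boundary. The interior (more than eps from every face) is itself a
# cube with side (1 - 2*eps), so its volume is (1 - 2*eps)**d, and
# everything else is rind.

def rind_fraction(d, eps=0.05):
    """Fraction of [0,1]^d that lies within eps of the boundary."""
    return 1.0 - (1.0 - 2.0 * eps) ** d

for d in (1, 2, 16, 100):
    print(d, round(rind_fraction(d), 3))
# At d = 16, roughly 81% of the melon is rind; by d = 100 it is
# essentially all rind, which is why small wiggles in 16 expression
# levels so easily push the system off the active region.
```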
And so you're never able to regain activity. So we've articulated the problem like that, and there are certain roles that synthetic biology can play in helping with this process. For one thing, the way that we think about genetics is very different, and so I'll show you how we can go in and simplify the genetics of a system to make it more conducive, more understandable, for transfer. We also have tools that allow us to very precisely control expression levels, so that after the transfer you can go back in and tune the expression levels to be just right to get activity. Some of the part design that we've done for E. coli and other organisms can be applied to new hosts, and then by the end you'll see how our ability to construct very large and sophisticated pieces of DNA actually helps us with this process tremendously. So first, I just want to go over some of the aspects of a gene cluster that actually stop you from being able to transfer it. This is what I call the nice Nature Reviews Microbiology view of genetics: when you look up a particular gene cluster, it's all very nicely organized, but the actual genetics underlying it can get very, very complex, and it's that complexity that becomes a problem. For example, there's some internal regulation, but then the system is going to be embedded in the native regulation of its wild-type host, and this is often in ways that are poorly characterized. The system also has very non-modular and overlapping part functions. For example, you know, over half the genes in the nitrogenase gene cluster are overlapping, so that the ribosome binding site for the downstream gene is in the middle of the upstream gene, and that's just a common way that systems are encoded. But now, if you want to try to change that ribosome binding site, there's no way you could do it without interfering with the coding sequence of the upstream gene. So that's a very non-modular system
that's interfering with being able to make substitutions in order to achieve an engineering goal. And there are all types of examples of genes encoded within genes; even overlapping operons are a problem, and so on. There's also very complex encoding, where you have a lot of overlapping functions, and you often see this within open reading frames. This is an example from a review paper that was looking at all the different ways that you have promoters in a gene cluster: you've got a lot of examples of promoters within genes pointing upstream, promoters going against the tide of the genes for no apparent reason, promoters pointing at no target, and so on. So you have all of these overlapping functions, and this stops you from making certain types of moves as you're trying to build a genetic system. For example, if you took this gene here, with promoters going the wrong way, and for whatever reason you wanted to move it to the front of the cluster, and you found that that was a non-functional system, it may not be the move itself that's the problem, but the promoters pointing the wrong way that turn on something you didn't expect, and that causes the problem. And then there are all the unknowns of biology: even in the most well-characterized systems, you're constantly peeling the onion, trying to understand more about how the genetics are encoded. As one example, we were trying to move a promoter from one organism to the next, and it killed our target, and we couldn't figure out why, and we gave up. Then, years later, there was a paper showing that there was a small RNA encoded there that interacted with porins, so that when you tried to move that piece of DNA it killed the cell, because it interfered with the porins. And so you've got a lot of examples like that which interfere with your ability to move things around. And so we had to come up with a way that very
systematically fixes the natural system to get rid of all of these issues simultaneously, and to do this we turned to some of the fundamental tenets of synthetic biology. At the core level in the field, we think of genetic parts, and these are units of DNA that have some biochemical function, like a promoter or a ribosome binding site or an open reading frame, and so on. We tend to associate a single genetic part with a single function, and that's already quite different from natural genetic systems, where you have these overlapping parts, where you might have multiple functions within one part. Then, based on those parts, you assemble them in order to create a genetic device, and this has some human-identifiable function; it might be a pathway or a circuit or a sensor or whatever. That then gets assembled into a system, which is a unit of DNA that you then put into a cell to perform that function. So we've got this hierarchy from parts to devices to the system as a whole, which then goes into an organism. Now, we wanted to apply this to the problem of gene clusters by flipping it on its head. So we created this process called refactoring, which tries to go in and systematically reduce the system into its component parts. The term is stolen from the software industry, where if you have a piece of software that's crashing, like Microsoft Word or Windows, one thing that you would do as a software engineer is go in and refactor the code, and that means you start from scratch: you completely rewrite the code to create the same program, but where you've organized the code to be modular and engineerable and stable and so on. The way that we're applying it here is that we go in and completely rewrite the DNA sequence of the gene cluster, so that it's modular and engineerable and so on, but so that it encodes the same function as the wild-type system. And so what we do, and we mostly do this on the computer, is we go in and we have a gene
cluster, and we start out by getting rid of all of the non-coding DNA, so anything that's not an open reading frame gets thrown out. We then take the open reading frames and go through a process that we call codon randomization, where we try to select codons that encode the same amino acid sequence but produce a DNA sequence that's as far away as possible from the wild-type sequence. The idea is that we create open reading frames with so many mutations in them that they eliminate all the internal promoters and small RNAs and operator sites and so on and so forth, just by random chance. So we have genes that are inert for everything other than encoding the amino acid sequence they're supposed to encode. At the same time that we're doing the codon randomization, we're also running all of the algorithms that we have to find these functions, and we eliminate them when we do find them. We then organize the genes into artificial operons; these are organizations that make sense to us, but not necessarily the native organization. We then add synthetic genetic parts that we've characterized in isolation, for example ribosome binding sites and terminators and so on. We then use phage polymerase promoters to control the level of transcription. And the last step is that we create what we call a controller, which is usually a genetically distinct construct that we build that has the synthetic sensors and circuitry to control the timing and the conditions for expression; the output of the circuits then produces the RNA polymerases, which go on to turn on the pathway. So at the end of this process, we're left with a gene cluster that has absolutely no DNA identity with the wild type.
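The codon-randomization step can be sketched as a greedy recoding: keep the amino acid sequence fixed, but at each position pick the synonymous codon that differs most from the wild-type codon. This is a toy illustration with a codon table truncated to four amino acids, not the lab's actual algorithm, which also screens the recoded sequence for internal promoters, RBSs, and other hidden functions.

```python
# Toy codon randomization: re-code a DNA sequence so that it encodes
# the same protein but is maximally different from the wild type.
# SYNONYMS is truncated to a few amino acids for brevity.

SYNONYMS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
}
CODON_TO_AA = {c: aa for aa, cs in SYNONYMS.items() for c in cs}

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def randomize(dna):
    """Per codon, pick the synonym maximally distant from the original."""
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        aa = CODON_TO_AA[codon]
        out.append(max(SYNONYMS[aa], key=lambda c: hamming(c, codon)))
    return "".join(out)

wt = "ATGAAATTATCT"  # encodes M-K-L-S
recoded = randomize(wt)
# The recoded DNA differs from wt wherever a synonym exists, yet still
# translates to the same M-K-L-S protein.
assert [CODON_TO_AA[recoded[i:i + 3]] for i in range(0, 12, 3)] == ["M", "K", "L", "S"]
```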
It's been designed to be modular and organized; we've gotten rid of all of the native regulation and replaced it with this controller that, when we pop it into the cell, encodes when expression happens and the dynamics of expression. And the remarkable thing about this process is that it works, and we published the first version of it last year. So this is the refactored nitrogenase gene cluster, and it's a little hard to look at at first, but as engineers this picture makes a lot more sense to us than the Nature Reviews Microbiology picture, and the reason is that, first of all, it's highly modular. Each box is a unit of DNA; it's a part that has a prescribed function, where we know what that function is doing because we put it there, and we've independently measured it. For example, it's very hard to see, but if we look at each of the ribosome binding sites in the system, you'll see a number underneath them, and that's the strength, or the expression level, that that ribosome binding site is achieving. You see the same thing with promoters: we know the strength of that promoter. Terminators, even: we know the strength of that terminator. So each one of these parts has been put in the system for a purpose. And then we can control the expression of the entire 16-gene system with a separate controller, which encodes this logic function. Here we just have two small molecules as inputs, and we can turn on the entire 16-gene system when those inputs are in the right conditions of that logic, and it stays off otherwise. So we've gone in, we've made this modular system, we can drop in a controller to control the conditions under which it turns on, and so on. But this simplified, refactored function actually comes at a cost of activity.
The process of simplification initially created a system that was much less functional than the wild-type system, and you can see that here, where in the wild type we have a high level of nitrogenase activity, and in the first version of the refactored system we only recover about seven percent of that activity. So this process of simplification came at this cost, and we started to look back. The process of going through this actually took about seven years to create the first refactored system, and so we started to ask: well, why was it so slow? What were the things that allowed us to speed up during that time, so that we might get faster moving forward? I've summarized those seven years in this one step, which is the first subsystem that we tried to work with in this pathway, where we started out by trying to refactor one of the operons in the system, which only encoded five genes. This was back in 2004. And what would happen is that on the computer we'd create a refactored version, send it off for DNA synthesis, and it would cost ten thousand dollars to synthesize this thing (this was a while ago), and we'd just completely lose activity; you'd get no activity. Then, just to give you a sense of the scale of how quickly we were able to fix things: we would have to go in, and we could build about five constructs at a time, or we would cross the wild type and the synthetic, trying to figure out exactly what part of the refactored system was causing the problem. We would then figure out, okay, the first gene that we tried to codon-randomize, nifH, created in this case transposon insertion sites, which destroyed the activity of that gene. So we'd have to go back, re-synthesize the gene, find that it worked but not at an optimal level, and then manually try to tune the ribosome binding site until you recovered activity.
So in this sort of design-build-test cycle, we were doing about five constructs at a time with a very small region of the cluster, trying to identify improvements, and then we would have to build the entire cluster, and the tools during this period were just not there for this, where we would have to assemble about a hundred or so genetic parts each time. So we would do it in a hierarchical assembly. What I'm showing here are all the different portions of the cluster, along with the percent of activity. Gibson assembly came online at the tail end of this, which actually is what allowed us to build these at all, so we could assemble them into these larger pieces of DNA, but we'd lose activity very fast. This half-cluster is at ten percent of activity, which is the product of the sub-clusters, and then here we'd get three percent, which again is the product. And actually, the first system that we assembled took like four months, and we only got 0.3 percent activity. We were actually really excited by it, but you had to hold the paper just right in order to see any activity. So you'd have to go all the way back to the beginning, re-optimize the parts that were the pinch points, go through this process of assembly again, which would take a long time, and we got one that was functional, and that's what we published as the first step. But we said, after seven years, we've got to do something differently, right? We have to approach this problem in a fundamentally different way. So we started thinking about what the ways are that we're going to do this faster, and there are two general ways that you can think about optimizing this process. The first: if you have a design-build-test cycle where you're designing a construct, you build the physical DNA, then test it, and then learn from it, you could imagine going faster by going through each of these steps faster, so you go through more design-build-test cycles per unit time. We decided to take a slightly
different approach, and this is where we partnered with the Broad Institute to take a massively parallel approach, where instead of designing a single construct, we would design hundreds of constructs simultaneously, build each one of those constructs, test each one for activity, and then apply learning algorithms to go back to the design. In this very massive way you can try out a lot of designs simultaneously. So this is a collaboration with the Broad Institute, which was absolutely critical, because what they did is they applied manufacturing principles to the process of DNA sequencing, and this was a major contributor to the advances that led to the human genome effort and all the metagenomics that's going on, and so on. This is their facility. It was designed by Toyota, and it's really a manufacturing pipeline; it's not an academic lab. It's in an old warehouse where they used to store the beer and the popcorn for the Red Sox. They're now able to sequence about 4.3 trillion base pairs per day, which is absolutely incredible. So we decided to get involved with them and retool these processes so that we could do DNA synthesis and assembly at a much higher scale than what we can do now, and this is where we built this thing that we call the MIT-Broad Foundry, and we went in and have been working on each one of the steps. One is rethinking genetic design, so that instead of building single constructs we can build many, and design many in an equal amount of time; then build all of those constructs to specification very rapidly; test a lot of constructs, as well as genome-scale transcriptomics and proteomics and so on; and then learn from the enormous data set that you get. We set this up as a pipeline; their manufacturing people went in and have started working out all of the LIMS (laboratory information management systems) and how you get the computer-aided design to communicate
with the construction and quality checking and so on So I'm going to go through some of the academic questions that we had to address So one of the first ones is that we realized that really for the last 30 years We've been designing genetic constructs in exactly the same way and these are tools like vector NTI I'm sure everybody in the room has used one at some point where we think of a genetic design as a plasmid And so if you have a small plasmid with three genes and a dozen parts This is a visualization that makes sense but as soon as you build a try to design a very large system like this is one of our Nitrogen this is the refactored Nitrogenase cluster in vector NTI. It just becomes a total mess First of all just going in and trying to make sure which gene goes where is it is a huge problem You make mistakes There's there's really no way that you would do this 10,000 times, right? It would it would take up a huge effort or you just have to resort to some random method So what we did is we partnered with Doug Densmore at Boston University and Doug had caught my attention because he developed a Different way of articulating a genetic design. It was fundamentally different than How we had been thinking of it and it's Part of what he calls the Eugene language. So instead of that vector There's a code here and don't worry about what it looks like and what you do is instead of Having a single construct that you're trying to build You identify the parts that are in your system. 
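As a rough illustration, the part-centric data model behind a Eugene-style tool can be sketched in Python. This is a hypothetical sketch, not the actual Eugene implementation: the part names, sequences, and the single rule shown are placeholders, not the real nif parts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Part:
    """A genetic part: a named, typed DNA fragment (placeholder sequences)."""
    name: str
    kind: str   # "promoter", "rbs", "cds", or "terminator"
    seq: str

# A design is an ordered list of parts; a "rule" is a predicate over it.
def every_cds_has_upstream_rbs(design):
    """Rule: each coding sequence must be immediately preceded by an RBS."""
    return all(i > 0 and design[i - 1].kind == "rbs"
               for i, p in enumerate(design) if p.kind == "cds")

P1   = Part("P1", "promoter", "TTGACA")
rbs1 = Part("rbs1", "rbs", "AGGAGG")
gene = Part("geneX", "cds", "ATGAAA")
term = Part("T1", "terminator", "GCGGCC")

good = [P1, rbs1, gene, term]   # satisfies the rule
bad  = [P1, gene, rbs1, term]   # violates it: CDS with no RBS before it
```

A real rule engine compiles many such predicates and hands the machine-readable design to the build pipeline; this only shows the declarative flavor of parts plus rules.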
So underneath this file you would have all of the DNA sequences for the parts in the system. Then, instead of saying that a particular construct is a particular combination of parts, you create rules, and the rules are anything that you as an experimentalist can write down about how you would describe the genetics of your system: this promoter goes with this particular RBS; all the genes have to point the same way; these three genes have to be in an operon; that one has to be in its own cistron; and so forth. Anything you can articulate can be captured in this language, written very simply in Eugene, which then gets converted into machine-readable code. That's critical, because as soon as the machine can read it, the machine can design it. So you can convert this into a genetic design that captures all of the parts and all of the rules between those parts. This is going into Vector NTI too, by the way, so this isn't just a fantasy of our lab; I think this is really the direction genetic design is going.

If you're not used to thinking about parts and rules between parts, we can turn to Shakespeare and think about assembling words into sentences rather than genetic parts into constructs. The simplest way is to take a list of English words and randomly combine them, which creates sentences like "iron clothes put adhering behind study." Then you can apply grammatical rules on top of that and start to generate random sentences that at least adhere to the grammar, like "the wife anchors the monkey." Then you can apply further constraints on top of that system, start generating new sentences, learn from those sentences, and so on, creating new constraints that further condense your space. The way this works for genetic designs is that we start by defining all of the parts in a system. Once we do that, we can write a little program that permutes those parts to come together randomly, and, as you would expect, you end up with nonsensical designs: genes without any transcription or translation control, multiple RBSs in series, promoters pointing against genes, and so on; you just get random nonsense. But then you can articulate expression rules, like "every gene needs an RBS" and "every RBS-gene pair requires a promoter and a terminator," rerun the process, and it condenses that space. Now you only create constructs that are at least consistent with transcription and translation: every gene has an RBS, and there's always a promoter somewhere upstream and a terminator downstream. Then you can add whatever rules you want for your system, for example that three genes are in one operon and one gene is by itself in its own cistron. Even in this simple four-gene system, there's still a huge amount of diversity possible. That was one of the real surprises that came out of this: it's really hard to constrain your system to just one construct, because of all the ways you can permute the system. You can even think of this in terms of building a single plasmid to express one protein. There are all these design choices you make without meaning to, like using this multiple cloning site versus that one, or flipping the whole cassette this way or that way, and if you enumerate all those different ways you actually have a lot of possible diversity, and you're making design decisions that aren't really motivated. So the output of this program is these different constructs, but it doesn't just make pretty pictures.
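The "permute, then condense with rules" loop just described can be sketched in Python. The part names and the two rules here are toy placeholders, not the real nif parts or the actual Eugene rule set; the point is only how rules shrink the design space.

```python
from itertools import permutations

# Toy part set: one promoter, two RBS/gene pairs, one terminator.
parts = ["P", "rbs_A", "geneA", "rbs_B", "geneB", "T"]

def valid(design):
    """Minimal expression rules of the kind described in the talk."""
    # Rule 1: promoter upstream, terminator downstream.
    if design[0] != "P" or design[-1] != "T":
        return False
    # Rule 2: every gene is immediately preceded by some RBS.
    for i, p in enumerate(design):
        if p.startswith("gene") and not design[i - 1].startswith("rbs"):
            return False
    return True

all_orders   = list(permutations(parts))             # 720 raw orderings
valid_orders = [d for d in all_orders if valid(d)]   # rules condense to 4
```

Adding a further rule, say fixing which genes share an operon, filters this space again; that is the condensation loop the talk describes, and it is what makes the design machine-enumerable.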
It feeds into the pipeline for the robotics and assembly process to actually build each one of those constructs. We use a series of hierarchical assembly methods that start with small parts, assemble them into essentially individual genes, and then into final constructs. We started out doing very simple assemblies, where we were very constrained: we would just do things like switching promoters and RBSs, because we had this feeling that the architecture had to stay relatively fixed compared to wild type, and we would assemble the genes accordingly. This is one example; you can go in and screen it and then learn new rules from it. Here's another early library we built, where we added a little more diversity but still kept the overall architecture more or less the same, and reassembled. And remember, this is not a random library: each one of these constructs has the same amount of design information before you run the experiment. But after you run the experiment, you learn something that can be re-articulated as a rule and fed back into the design process. In this case we figured out a rule that allowed us to optimize this sub-cluster, so we went from less than 20 percent activity to recovering full activity for that system.

Then, across different projects in the lab, we had started noticing that the less constrained we felt by the wild-type genetics, the better off we were with our libraries. So we decided to build a library for one of the 16 sub-portions of the nitrogenase cluster that was causing us a lot of trouble. We called it the Stata library, because the Stata Center is a building on our campus by Gehry that just looks messed up, and our library looks messed up. If you look carefully: before, we were imposing all of these constraints, like operon occupancy and so on; in this case we said, if we don't know with 100 percent certainty that something is a rule, we'll leave it out. We'll only put into the system the things we know are rules, and we'll permute around everything else; we'll let everything else float. So we created this library from what is a single operon in the wild-type system, but with up to seven promoters, genes pointing in every direction, operon occupancy and order changing, just going crazy. A bunch of constructs get designed, and then this has to turn into an assembly map that takes small pieces of DNA, combines them into transcription units, and then builds bigger pieces of DNA that get physically combined, all in parallel. A lot of the software is about figuring out the optimal way to get the final set of DNA units with the smallest number of part combinations you have to do.

We then screened this particular library, and we found that the top hit was really surprising, in that it looks absolutely nothing like the wild-type system. The wild type and the first refactored cluster are both single operons with six genes; our best cluster had seven different promoters and genes changing orientations, just an enormous amount of diversity. It's not that that was the only way to solve the problem: we also found one that looked very much like wild type, just with different parts in each location. So we found these very genetically distinct constructs coming out of the hits, and we can be quite quantitative about it. I won't really go into it, but this is a map of each construct, where lines indicate similar constructs, and if we map out the space we see high activity, shown in green, spread throughout the region. So when we went in, we then had an improved cluster, where everything shown in red is an improvement. You can see this divided up, where the nifHDK portion got
up to a hundred percent, the nifBQ, nifF, and nifUSVWZM portions are up to a hundred percent, and the last bit, nifENJ, was still a bit of an issue. Once we built this infrastructure, we could then start building extremely large libraries of clusters, again all by design. This is an example of about 120 sixteen-gene clusters, where we were very systematic about how we varied the components of the cluster. That's about 3.5 megabases of synthesis and assembly, so it's as though we built a bacterial genome, in the scale of that experiment. And we do it at very high efficiency, with only a four percent error rate, meaning that 96 percent of the library is perfect and four percent has mutations. That's allowed us to go in and improve the system further; all the components are very high, and the top cluster all together is at about 60 percent, again showing in red the parts that had to be substituted for activity.

So one of the big questions when we started was whether, when we take this very dramatic step of wiping out all of the native regulation and replacing it with synthetic regulation, there was something about the wild type that we were missing, something we were eliminating that we could never get back. What we're finding is that we can do just as well, or almost as well, with a completely synthetic system, and I think we're very close to fully recovering the activity.

So now we've started some species-transfer experiments, and this is something we're just beginning. One of the simple examples is E. coli; this has been done by other labs with the native system, so it's just a simple example for looking at how systems change and for laying out our strategy. We start out in Klebsiella with a system that is non-modular. We refactor it and lose activity; we optimize it to boost it back up. We can then transfer it into E. coli, and we lose activity back down again. But because we now have a modular system, we can do something we couldn't do with the wild-type cluster, which is to go in and optimize it. So here we built a library to re-optimize activity, taking a numerical-optimization approach where we switched between low and high ribosome binding sites and then systematically learned from that information to drive the next library, and so on. Across all 16 genes we're swapping the ribosome binding sites systematically, and you wouldn't be able to do that if, for example, your genes had their ribosome binding sites buried in the middle of an upstream gene. With this, we're able to recover activity, and we're starting to take a similar approach with more sophisticated organisms that would be the real targets of the transfer.

So I've talked about nitrogen fixation; I'll just conclude with some of the other systems that we've taken through this process. I started out with the goal of integrating across the different synthetic systems we're working with, to build up toward whole-genome design, and you can start to see how this comes together. For example, with metabolic pathways, we've gone in and refactored individual pathways that, in this case, make metabolic products that change the color of the cells. In the controller we have a logic operation, built from synthetic circuits, that switches between two different phage polymerases, which turn on the two different pathways under different conditions specified by the sensors. We have the nitrogenase system that I showed, and we've done the same thing for type III secretion in Salmonella using a different polymerase. So now you can imagine taking these two different cellular functions and combining them into a single unit, so that you can have different output signals from your circuits.
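The systematic low/high RBS sweep across all 16 genes described above can be sketched as a small combinatorial design in Python. This is an illustrative assumption, not the actual library: the gene names and the "vary only the first n genes" sampling scheme are placeholders for whatever fractional design was really used.

```python
from itertools import product

genes = [f"gene{i:02d}" for i in range(16)]

# A full factorial over low/high RBS per gene is 2**16 = 65,536 constructs,
# far too many to build, so in practice you sample a fraction of the space.
full_space = 2 ** len(genes)

def design_library(n_varied):
    """Toy fractional design: vary the first n genes, hold the rest 'low'."""
    combos = product(["low", "high"], repeat=n_varied)
    return [dict(zip(genes, c + ("low",) * (len(genes) - n_varied)))
            for c in combos]

lib = design_library(4)  # 16 constructs instead of 65,536
```

From each round's measured activities you would then learn which genes' RBS levels matter and re-center the next library on those, which is the learn-and-redesign loop described here.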
These turn on whether the cell is producing nitrogenase or is secretion-competent, and you can start to build up that combined function. So, just to conclude: I started out by describing some of the genetic challenges that confront you when you try to transfer a genetic system from one organism to the next. We created a process whereby we go in and systematically strip out all of the native regulation, and by doing that we create a system that's highly modular, but we lose the activity. So we have to go through a process of re-optimization to recover the wild-type activity, which then allows us to be systematic about transferring it to a new host and trying out different combinations of parts in that host to re-optimize the activity there. So with that, I'll conclude and take any questions.

My thinking is very similar to yours, perhaps because we both have backgrounds in chemical engineering, and I think what you presented was very clear, with very nice results. I would only say that it's important also to start with the elements: for instance, there may be better promoters than the ones you used, regulation that doesn't exist in nature, because I think the best way to regulate a promoter is to turn it on and off back and forth.

Yeah, and I didn't have a chance to talk about it, but actually our promoters, as well as our ribosome binding sites, are completely synthetic, and were themselves designed by computer algorithms, so we know their strengths. And because we're using phage polymerases.
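The computer-designed RBS strengths mentioned here come from biophysical free-energy calculations. As a rough, hypothetical sketch, the published RBS calculators relate the computed free energy of ribosome-mRNA binding to expression roughly through a Boltzmann-like factor; the single-term model and the constant below are simplifications, not the actual tool.

```python
import math

BETA = 0.45  # proportionality constant (mol/kcal) used in published RBS models

def relative_translation_rate(dG_total, dG_ref=0.0):
    """Predicted translation rate relative to a reference, from the computed
    free energy (kcal/mol) of ribosome-mRNA binding: rate ~ exp(-beta * dG)."""
    return math.exp(-BETA * (dG_total - dG_ref))

strong = relative_translation_rate(-10.0)  # favorable binding, high expression
weak   = relative_translation_rate(+2.0)   # unfavorable binding, low expression
```

Designing a synthetic RBS then amounts to searching sequence space for one whose computed free energy hits a target rate, which is how part strengths can be set by algorithm rather than cloned from nature.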
It's an ultra-simplified, very small promoter. For us, if you flip a piece of DNA as you described, flipping a promoter into a spot, that would be a very hard on/off switch. In our case, what we like is a system that's completely off until we add the controller carrying the phage polymerase. The phage promoters themselves were engineered by computers, and they have essentially no activity until you put in the controller with the phage polymerases, which then allows the system to be turned on and controlled. So the point I'm making is that everything except the amino acid sequences of the final system was completely by design. Even for a ribosome binding site, not only do we know the strength, but we know it because we have a biophysical model that calculates the free energy of the ribosome binding to the RNA, or of RNA polymerase binding to the promoter, and so on, so we can really decouple, for that part, exactly where it gets its function and its contribution.

Yeah, so one thing that's interesting is that for this and a lot of the projects in the lab, we're moving further and further away from how the natural genetics are encoded. The next generation of refactored systems that we have look nothing like biological organization. For one thing, we've gotten rid of operons altogether, so that every gene is under monocistronic control, with its own promoter and its own terminator. On top of that, we have insulators that are very good at transcriptionally isolating each one of those units and at separating the function of the promoter from the ribosome binding site, so you can control them independently, which right now you can't. So you've got these highly modular subunits going through the system that look absolutely nothing like the native system. What emerges from evolution is this kind of bubblegum-and-sticks, whatever-worked-next way of putting it all together, so you get redundant and overlapping functions and so on, and a big part of being able to engineer these systems is stripping all of that away.

In terms of the interface with directed evolution: I actually came out of a directed-evolution lab, where you apply random mutagenesis to an enzyme or a metabolic pathway or whatever and then screen for function, so we definitely do that type of thing. The difference here is that you can't do it effectively on multi-gene systems. You could take a system, randomly mutagenize it, and select for improved activity, but you couldn't take one of these clusters, move it into a new host, randomly mutagenize it, and hope to find an individual mutant that's going to work. The way our system works is that we put all of the information we have into the program, and it creates the constructs. Each one has the same amount of design information, which is a little bit, but that doesn't mean each one is going to have the same function. We can then screen, and based on that screening information try to extract new rules for how the system should be composed, and put those into the next round of design. One thing I didn't show, for example, is that if you're systematic about breaking up the gene cluster, we find that none of the genes have to be in an operon together except nifE and nifN, and if you really break that pair up you run into trouble. So in the next round we say: we'll make everything monocistronic, because that's easy to control, but we'd better keep those two genes together so that we're more careful about their regulation. You learn these types of rules systematically, and they go back in and help you with the next round of design.

Okay, yeah, so there are two answers to that: one that's highly sensitive, which we don't know how to deal with yet, and one that's less so. The
particular one is in this type of project, where we have a large multi-gene system and the parts we're referring to are ribosome binding site strengths. If what matters is the ratio between proteins, then you'd have to expect those ribosome binding sites to fall out of rank order in the new condition to an extent that severely affects your function, and we're using phage polymerases and so on, so in that sense there's less of an issue. However, the synthetic regulation, which I didn't even go into, things like synthetic sensors and circuitry saying when things should be turned on, is highly sensitive to condition. One example with the nitrogenase system is that it's very strongly ammonia-repressed, and when we eliminate all the native regulation it's no longer sensitive to ammonia. But when we put it with our synthetic controller, plus or minus ammonia causes the expression levels of the regulators from the controller to change, which then changes the overall activity of the pathway. You can go in and genetically fix that by tuning the expression level of the polymerase higher than you had it in the other condition, but what you really want is a system that maintains set-point control over the amount of polymerase independent of whatever condition it's in. So what we find is that the sensitivities you're describing are mostly on the synthetic-regulation side and less on the pathway-architecture side.

So that's actually a big current challenge. One of the libraries I showed very quickly, where I sort of said "something happened here," is an example where we could go in by eye, see what happened, and re-articulate it as a rule. Sometimes we're able to extract things statistically from the data, like the nifE-nifN combination I mentioned, and put that back in as a rule. The real challenge, which we're not close to fixing, is that if you create a three- or four-megabase library where you diversify 16 genes in all different ways and in very complex combinations, then by eye, by expertise, and even by statistical methods you can't figure out what's going on, and you get crazy rules out of machine-learning algorithms. So our focus has actually been not just on creating powerful algorithms that let you work through that data, but on designing the constructs themselves in such a way that they allow you to extract the information when you're done with the experiment. Even old ideas like design of experiments, applied to genetics, become very, very powerful here. So another part of our next-generation designs is designing them to test many questions in parallel, such that the algorithms are able to pull out certain classes of information from the data set.

Well, any allosteric issues would be exactly the same: there's no difference between the refactored and the non-refactored system, because it's the same proteins; everything's the same at that level.

Right, so in the native host we should be able to recover activity when you transfer it. There will certainly be situations where you wouldn't be able to do it, for a whole number of reasons, but there are a lot of examples of projects where just getting any activity gives you a path forward. If you get a small amount of nitrogenase activity in certain target organisms, that gives you the foothold to push it forward; or if you're scanning for natural products, chemicals being produced by native organisms, you only need a small amount for that initial transfer, and then you can run directed evolution, or whatever it is you're going to do, to optimize the system back up. So for us it's really about increasing the probability that you get some function immediately after transfer.

Yeah, so what we found is that we only have to build the genes once, because as long as you get a reasonably expressing gene, you can control the expression levels pretty much wherever you need with the synthetic parts. Actually, I was talking with Bob Landick about this earlier: codon optimization as a means of controlling expression levels is really terrible. Basically, if you have low expression levels, there's something about the gene that's screwing you up, and once you fix that, the gene becomes the substrate for being able to swing through different levels of activity. So for us, once we have 16 genes that we know work well, we don't have to go back and create new versions of them as part of the process.

Yeah, it's enough that you can grow slowly on it, so you could do a selection experiment, for example; in fact we have