Let's see. People are still coming. Well, it's a pleasure to introduce Dr. Jingdong Tian. Jingdong did his undergraduate work at Shandong University in China, and from there he came to do his PhD here in New York. He then moved to a postdoc at Harvard Medical School, joined George Church's lab, and published what in the field of synthetic biology is considered one of the seminal papers, a landmark paper, in which he used on-chip design and synthesis of oligonucleotides to assemble multiple genes at almost genome scale. That propelled the field of synthetic genomics, making it more feasible to think of building genomes by synthesizing short, or slightly longer, oligonucleotides on arrays. Many of you who have done ChIP-chip, or have hybridized your RNA on arrays, are essentially looking at fragments of your genome on a glass slide. The reverse, taking those fragments off the glass and converting them into a complete, functional genome, is not so trivial, and that is what he was able to do. Since then, he has been at Duke since 2005, where he set up his own laboratory and has expanded the applications to many exciting areas, some of which you'll hear about today. And just today I discovered that he also has a lab in China and a couple of companies, so if any of you would like to benefit from his technology, he is ready, enabled, and willing to help you along, if the project is interesting. He has also won a number of awards along the way. As a graduate student, he won Sigma Xi's award for best graduate work, an early indication of excellence to come. As a postdoc, he held one of the most prestigious postdoctoral fellowships, from the Life Sciences Research Foundation (LSRF). And as a junior faculty member, he won the Beckman Award, again a hallmark and a very prestigious award to receive. I'm looking forward to hearing what he's going to tell us today.
And with that, let me give you Professor Tian. Welcome to Madison. Thanks for the nice introduction; that's basically what I'm going to talk about today. And thanks for the invitation; it's a great pleasure to be here. I've been on campus a few times before, working on some collaborations with the Biotechnology Center. At that time Franco Cerrina was still here, and we collaborated on some of the early array work, which actually started our work on building genes from microarray-derived oligonucleotides. So it's a great pleasure to come back, and this is actually the first time I've visited the biochemistry department; it will be nice to make new friends. Today I'm going to talk about one of the technologies we have developed, an integrated, high-throughput, on-chip gene synthesis platform, and then about some applications of this technology to building genes and genomes at large scale. As I mentioned, my group has been focusing on technology development for the field of synthetic biology. As you may know, synthetic biology is a relatively young field with many promising applications in a lot of significant areas, such as bioenergy, chemicals, materials, agriculture, medicine, and the environment. It's interesting to point out that to realize these significant applications across all these different fields, you essentially need to manipulate only one molecule, which is DNA. It is therefore critical to be able to design and synthesize DNA molecules in order to realize all sorts of different functions. That is one of the driving forces behind my work on technology development in this field. As you know, DNA sequence can encode many different functions, from the smallest single gene, with an average length of about 1 kb, all the way up to whole genomes.
So beyond single genes, at the 10^4 level you have large plasmids and genetic circuits, and if you can synthesize DNA that large, you can essentially build networks, pathways, and plasmids to do a lot of functional studies. At the next level, 10^5 bases, you have small viral genomes, and even the minimal genome: you can pick functional genes and assemble some of them into a self-replicating genome, the so-called minimal genome, to prove that you actually understand how life works. If you can build such a genome that can self-replicate and sustain itself, that is a proof, in a sense, that you understand life. And at the next level, 10^6 bases, you have some of the smallest bacterial genomes, and from there you can build other, larger genomes. So in order to study and engineer all these systems, a crucial technology is the ability to design and fabricate these sequences from scratch. People have been doing DNA synthesis from the beginning, right after the discovery of the DNA double helix structure. One of the first people to start DNA synthesis was actually Dr. Khorana, who spent 10 years on this campus and won the Nobel Prize partially for his work on DNA synthesis. He is one of the founders of this whole field. From that time on, people started to build larger and larger constructs, using a slightly different chemistry than what was initially done. And just a year ago, people were finally able to synthesize, not an entirely artificial genome, but a replica of one of the smallest genomes, which is about one million bases. They put that back into a cell from which its own genome had been extracted and proved that this synthetic genome can keep the cell alive and functioning.
So that was one of the first proofs that a synthetic genome can carry the functions necessary to support life. That's a brief history of DNA synthesis. It's interesting to note that over all these years, the chemistry involved in DNA synthesis has stayed basically the same, the phosphoramidite chemistry, and the synthesizers are built on that. But even though people can now synthesize a genome as large as one million bases, it's still very tedious and very expensive, and only maybe one or two labs can afford to do it. That is obviously not sufficient for the whole field of synthetic biology to develop all kinds of different functions. So one of the critical technologies to develop is large-scale, high-throughput, automated gene synthesis, and that's what I have been working on. If you compare the development of DNA synthesis technology with sequencing technology, you can see they are pretty similar, except that DNA synthesis has lagged far behind sequencing. In the 1990s, people were pretty much doing sequencing by hand: you ran the reactions, ran the big sequencing gel, exposed it, and then read the lanes one by one to get the sequence. In the 2000s, people developed automated DNA sequencers, so you could sequence roughly 1,000 bases, or a bit less, in an automated fluorescent reaction that a computer could read, so after sequencing you got a report from the computer. And now we have single-molecule and whole-genome sequencers, where you can sequence a bacterial genome in a few days for a few thousand dollars. So the technology for DNA sequencing has been evolving very fast. Where is DNA synthesis technology right now? I think it's still at probably this earlier stage.
We have automated DNA synthesizers, but they can only synthesize oligonucleotides maybe 100 bases long. From there, if you want to build genes, which are normally around 1,000 bases, it's a manual process: you have to assemble those oligos into genes by hand. In that sense, we are still at roughly that stage, and far from an automated synthesizer you could use to synthesize full genomes or for large-scale production of many, many different genes. But at least this chart tells us where we are for DNA synthesis. So what are the major challenges in gene and genome synthesis? In my opinion, there are at least three. The first is the cost and throughput of oligonucleotide synthesis; it's still very expensive even to make short oligos. The next is eliminating gene synthesis errors. The chemical reactions involved in DNA synthesis have efficiency issues, they are not 100% efficient, so with every base you add, you accumulate errors, and getting rid of those errors is very challenging. Right now people use cloning and sequencing to pick the right sequence, but that is very costly and time-consuming, so that's one bottleneck. And if you're talking about synthesizing a whole genome, which is normally above one million bases, how to assemble and handle those long sequences is another challenge: the longer the sequence gets, the more fragile the molecules become, so handling is very difficult. And how you put that back into a cell, or into a system where you can jump-start the whole function of the genome, is yet another challenge. The synthesis of that first minimal genome proved that people have accumulated some knowledge for handling these issues, but moving beyond that, if you want to synthesize, say, the E. coli genome, it remains a challenge that nobody has yet proven they can solve.
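As a rough illustration of how per-base errors compound (an editor's back-of-envelope sketch, not from the talk): if each base is wrong independently with probability p, the fraction of full-length molecules with zero errors over L bases is (1 − p)^L. In Python, using error rates that come up later in the talk:

```python
# Back-of-envelope model of error accumulation in gene synthesis.
# Assumption: each base is incorrect independently with probability p,
# so the fraction of molecules that are entirely error-free over
# L bases is (1 - p) ** L.

def correct_fraction(errors_per_kb: float, length_bp: int) -> float:
    """Expected fraction of full-length molecules with zero errors."""
    p = errors_per_kb / 1000.0          # per-base error probability
    return (1.0 - p) ** length_bp

# Error rates quoted later in the talk:
print(f"1 kb at 1.9 err/kb : {correct_fraction(1.9, 1000):.0%}")    # ~15%
print(f"1 kb at 0.19 err/kb: {correct_fraction(0.19, 1000):.0%}")   # ~83%
print(f"10 kb at 1/8000    : {correct_fraction(0.125, 10000):.0%}") # ~29%
```

Under this simple model, almost no uncorrected long molecule is perfect, which is why the error-removal step discussed below matters so much.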
For this talk, I'm only going to focus on the first two issues and the progress we've made in those areas. So what's the problem with the conventional gene synthesis technology currently in use? If you look at the conventional process, the first step is oligo synthesis: automated DNA synthesizers make oligos maybe 100 bases long, which takes about half a day to a day. Then you need to purify the oligos and assemble them into genes, which takes another couple of days. And then there is the big issue of error correction, or picking the right sequence, which is the tedious part. You have to clone your gene into a vector, put that into E. coli, grow the clones up, and sequence them. If you're lucky, you can pick the right clone from that one round; if you're not, you have to repeat the cycle, which takes more time. Or you have to do site-directed mutagenesis to correct the errors, which takes even more time. The price for DNA synthesis has dropped significantly, from initially about $28 per base to now less than a dollar, around half a dollar, per base. But if you're talking about large-scale or full-genome synthesis, it's still very costly. So what we are trying to do is simplify and automate this process: for example, integrate the whole process of oligo synthesis and gene assembly into one step, and then increase the accuracy of gene synthesis so as to eliminate the error correction step. This would drive the cost down and also significantly reduce the time for gene synthesis. That's the goal of our research. So let's first talk about how to integrate the whole process.
When we thought about this question, one of the technologies we could borrow was microarray technology, where you can synthesize thousands or more different oligos on a single chip. The issue is that even though you can synthesize many different sequences, the amount of each oligo is still pretty small, not enough to drive the gene assembly reaction. So we had to come up with ways to amplify the sequences, and also figure out how to assemble genes from this big mixture of oligos. At that time, in a collaboration with a group in Texas, we used a microfluidic DNA array to synthesize different oligos in different chambers, controlled by a digital micromirror array, a technology for which I believe Franco Cerrina was one of the pioneers. I also collaborated with him, and we published around the same time on using DNA microarrays to assemble genes. This is the type of microarray we used at the time. The array is pretty small, the size of your fingernail, but it can synthesize thousands of oligos on a single chip. Then, to amplify the amount of each oligo, we developed a scheme with a common primer on both ends, so you can use a PCR-like reaction to amplify the oligos; at the end you remove the primers with a Type IIS restriction enzyme, releasing clean oligos that you can use to build genes. In that way you get inexpensive oligos from the chip, and we proved that we could use this method to synthesize multiple genes at a time. But after that we realized that this technology still has limitations: although you can synthesize many different oligos on one chip, the pool is very complex, and it's not easy to
efficiently utilize every sequence to build as many different genes as you want, because of the complexity of the pool. Since then, people have been developing different ways to get around that, and companies now use this technology with different approaches to efficiently utilize different areas of the chip and get as many genes as possible out of it. One strategy is to use bioinformatics tools to design different PCR primers, so you can selectively amplify different regions of the chip into different pools of oligos and assemble them into different constructs. Other strategies that companies use build tags into the oligos and then assemble them into different sequences. We took a different approach: our strategy was to physically divide the chip into different areas, different wells or chambers if you will, so that in each chamber a smaller group of oligos is synthesized. We can then amplify the oligos and use the freed oligos to assemble genes directly, not in a separate tube but right on the chip. In that case we not only reduced the complexity but also integrated the whole process of oligo synthesis and gene assembly on the same chip, which was the original goal we were trying to achieve. For this design we used a different technology than the original micromirror-array type of chip: an inkjet printing system. Just like the printer you have at home, but instead of printing ink we load the different phosphoramidite monomers into the wells and print them onto the chip to make different oligos.
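The tiling idea behind this kind of assembly, splitting a target gene into overlapping oligos whose neighbors anneal during assembly, can be sketched in a few lines. This is an editor's illustration with made-up parameters (60-mers with 20-base overlaps, alternating strands), not the actual chip layout or oligo design used in the talk:

```python
# Sketch of oligo design for overlap-based gene assembly.
# Assumptions (illustrative only): 60-mer oligos, 20-base overlaps,
# odd-numbered oligos placed on the bottom strand so that adjacent
# oligos can anneal to each other during assembly.

def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (ACGT alphabet)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def design_oligos(target: str, oligo_len: int = 60, overlap: int = 20):
    """Tile `target` with oligos, stepping by oligo_len - overlap.
    Odd-numbered oligos are reverse-complemented (bottom strand)."""
    step = oligo_len - overlap
    oligos = []
    for i, start in enumerate(range(0, len(target) - overlap, step)):
        piece = target[start:start + oligo_len]
        oligos.append(piece if i % 2 == 0 else reverse_complement(piece))
    return oligos

# A toy 120 bp target splits into three overlapping oligos:
oligos = design_oligos("ATGC" * 30)
print(len(oligos), "oligos")
```

In practice the oligos in each well would also carry the common amplification primers described above; those are omitted here for brevity.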
Another advantage of this approach is that the chemistry is the same phosphoramidite chemistry used in standard automated DNA synthesis, so you don't need special chemicals; that simplified the process and reduced its cost, and we showed it works well. Also, in order to achieve this in situ synthesis and assembly, we had to make our own chip. The conventional chip substrate is glass or silicon, but here, to make fabrication easy, we used a plastic chip, a COC (cyclic olefin copolymer) chip, so you can easily mold whatever features you want onto it. There is another issue: common microarray DNA synthesis has a higher error rate than conventional synthesis because of a surface problem. If you spot something on the surface, the next spot you put down may not hit exactly the same area, so the boundary shifts between steps, and that creates a much higher error rate than conventional column-based synthesis. To correct that, we deposited silicon dioxide spots on the plastic surface, so oligos grow only on those spots; that keeps the boundaries clean and reduces the error rate. In this way we can synthesize oligos with an error rate similar to conventional automated DNA synthesizers, about one error in five or six hundred bases, which is pretty good. With these modifications we were able to put all the reactions on the same chip, but we still had to develop a new kind of reaction to streamline oligo synthesis, amplification, and assembly in a single well. This is the strategy we used.
We synthesize the different oligos in the same well, each with a common primer sequence at the end. By annealing a primer to that sequence, we can extend and make the oligo double-stranded. At this site there is also a nicking site, where a nicking enzyme can come in, recognize the site, and make a single-strand cleavage. Once a nick is made, another polymerase can come in and start extending from the nick; this polymerase has DNA strand-displacement activity, so it displaces the strand in front of it and releases it into solution, where it can participate in the gene assembly process. So in this single well you have the mixture of enzymes in an optimized buffer, and you only need to change the temperature: the amplification is isothermal, and the assembly is thermocycling, so just by switching the temperature mode you switch from amplification to assembly. In a single well you can actually assemble genes without changing buffers, so that essentially integrates and automates
the whole process of oligo synthesis and gene assembly. So we pretty much achieved the first goal of integrating, automating, and miniaturizing the process. The second goal is to increase the accuracy of synthesis, to essentially eliminate the error correction step, the cloning and sequencing. To do that, we developed another reaction that takes advantage of an enzyme isolated from the celery plant. That enzyme has the activity of recognizing mismatch structures in DNA: whether a mismatch is caused by a mutation, a deletion, or an insertion, the enzyme can recognize all those structures and cleave them, and at high enzyme concentration it can actually make a double-stranded cleavage after the mismatch site. That's a very useful activity, so we took advantage of it and integrated it into the gene synthesis process: we just added one cycle of this error correction reaction, and that can essentially eliminate most of the errors. Let me walk you through it. This is the normal gene assembly process: you start from oligos and use PCR extension to build full-length genes, and at this step you still have a lot of errors in the full-length products. To remove them, we add the enzyme after a denaturation and re-annealing step, so that correct sequences anneal with incorrect ones and the errors are exposed: insertions, deletions, and mutations all form mismatch structures, which are recognized by this nuclease, whose commercial name is Surveyor nuclease, and it cuts right after the mismatch. That exposes the mismatched sequences, and then a 5' to 3' exonuclease activity in the reaction removes the mismatched bases, essentially removing all the errors. You then take the error-free fragments and re-assemble them into full-length genes. So through this one cycle you pretty much
eliminate most of the errors. We sequenced a lot of these products before and after error correction, and we showed that with one cycle you can reduce the error frequency from 1.9 per kb to 0.19 per kb, a 10-fold reduction. With further optimization, and by running multiple rounds of this reaction, we can reduce the error rate to below 1 in 8,000 bases, which is pretty much good enough for common gene synthesis. To compare the two error rates: at 1.9 errors per kb, if you want to build a sequence longer than 1 kb, you have almost no molecules left in the population that are 100% correct; with an error rate 10 times better, even for a 10 kb sequence you still have on the order of 30% correct. That's the difference this change in error rate can make. If you want a visual demonstration, here is a GFP gene we synthesized with and without error correction: without error correction, only about half of the colonies are fluorescent; with error correction, more than 90% are. So that's the difference error correction makes. We are still trying to improve this process. The enzyme is good, it can recognize all types of mismatches, but it has an efficiency issue across error types: as you can see from this study, it recognizes deletions and insertions better than point mutations. If we can make it more sensitive to mutations, we can further improve the process; that's an ongoing study. Hopefully we can use this type of approach to eliminate most of the errors. Of course there are other approaches: my former advisor George Church, for example, is using high-throughput sequencing to sequence every oligo, pick the correct ones, and then assemble those into genes. That's another approach, if you can afford a genome sequencer in your lab. So one of the goals of
this research is actually to build a machine where you put in your gene design, push a button, and get your gene product, basically like a PCR machine on your benchtop that takes your gene design and synthesizes it for you. So that's pretty much the technology development part of the talk. If I still have time: there are many, many applications you can pursue with this kind of synthesis capability. You can make large libraries of genes, for example, or do whole-genome synthesis. Today I'm just going to talk about one application, on the regulation of protein expression, focusing on the codon optimization part. There are different ways to regulate the expression of a gene. Common ones include the promoter, where different promoter sequences give different strengths, and the ribosome binding site, where changing the binding affinity gives different strengths of initiation. You can also change the codons of the gene: using different codons, you can change the protein expression level just by changing the codons. However, the rules for codon usage are not at all clear. There is software out there, and companies use software to design the codons for you, but half the time it doesn't work. I've tried this: for instance, I had companies optimize sequences for me, they sent back a sequence, and when you test it yourself, half the time it doesn't work. That shows people just don't completely understand the rules involved in codon optimization. There are many factors involved, which I'm not going to go through, but GC content, secondary structure, and the codon adaptation index have all been shown to be key factors. Just how they play together is the big issue: how you design software that incorporates all these factors is a big challenge. People have shown that in a gene or mRNA sequence, if you
just change one codon, that can totally derail the whole translation process, and how can you design software to predict that? It's very challenging. With our technology, we thought we could probably use a high-throughput synthesis approach to tackle this: design and synthesize many different versions and see which ones work best. If you accumulate enough data, you can probably derive some kind of rules from it. To do that, the first step is to synthesize big codon-variant libraries, and then you have to clone them into a vector. There are existing strategies for that, but we developed our own because it's simple. It's basically like a PCR: you have your vector and your library, and the only requirement is an overlapping region between the two; you anneal them, extend them, and transform directly into E. coli, and that's it. It's pretty simple, and it's not sequence-dependent: you don't have to do digestion and ligation, which you can't really use anyway for a complex library that is bound to contain every restriction site somewhere. With one round of PCR you can pretty much get your library cloned. You can also use this to assemble a number of fragments into a plasmid in one step and put it into cells, and it works well up to close to 10 kb for the whole construct. Using this approach, we did a pilot experiment to see whether, just by varying codon usage, we could achieve a spectrum of different expression levels. We used lacZ-alpha as a demonstration, and it worked pretty well: with the wild-type lacZ-alpha construct you get a very uniform distribution of colony colors, while with the codon variants we synthesized, cloned, and plated out, you see great variation in color, meaning different expression levels of this single
protein. So that's the demonstration that this approach works for altering protein expression levels. This figure shows a demonstration where we designed over 1,300 different codon variants and measured their expression based on the lacZ color. You can see we get everything from very strong expression all the way down to very low expression, and everything in between. So by altering codon usage you can obtain essentially any protein expression level. And this is the distribution of all these constructs: interestingly, this is the wild-type expression level, and most sequences cluster around it, with about a third expressing higher and two thirds lower. We're in the process of figuring out what factors are involved, and we're also running a number of different proteins through this approach to see if we can find a common rule, and doing it in different organisms and cell types to see what variations there are and what causes them. Here is a video showing the panel I showed you before, the expression levels and dynamics of these different constructs. So this is a demonstration that this approach can generate all sorts of expression levels for the same protein. To make it practical and work on a real problem, we happened to have a collaboration with Kevin White at the University of Chicago genome center. For the modENCODE project, they needed to make antibodies against all the Drosophila transcription factors, but after trying in E. coli they found that at least half of them did not express. So he asked me if I could help use this technology to express those sequences that normally do not express in E.
coli. So I said I could try, and he gave me 75 of them. We went through the whole process I just described: you design the library, you synthesize the libraries — each band here is a library of all the codon variants for a single gene — and we made all 75 of them, screened the libraries, and picked the best-expressing variants. As you can see, compared to the wild type, which pretty much doesn't express at all, after optimization you get a fat band there: more than 50% of total cell protein is that band. So we demonstrated that we could make all 75 work using this approach. It would be even nicer if, after doing many, many different genes, we could figure out the rules and then design software, so you don't have to go through this process: you could just design the one sequence that gives the highest yield you want. This technology is also good for picking intermediate expression levels, which has uses in synthetic biology, where people build synthetic networks and want to regulate the expression of each component. Normally you have one promoter driving a series of proteins; how can you tune the relative expression of each protein under the same promoter? This is probably one approach you can use, and for that you don't want the highest expression for every one of them, you want intermediate levels, and I think this approach can provide that kind of solution. So with that, that's pretty much all I wanted to talk about today. I'd like to thank the people who did the work and also the funding agencies. After I talk about this work, a lot of people want to try it out and take advantage of some of this technology. Obviously that cannot all be done in an academic lab — I don't have students who can constantly work on other people's projects — so I have set up an operation; if you have this kind of need, I can probably offer some help. I'll stop here.
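The codon-variant library idea from the talk can be sketched in a few lines. This is an editor's toy example (a tiny hand-written synonym table and a four-codon gene), not the speaker's actual design software: each variant keeps the protein sequence while the codons are resampled from their synonymous sets.

```python
# Toy sketch of synonymous codon-variant generation (illustrative only).
# SYNONYMS is a hand-written subset of the standard genetic code covering
# just the codons used in the toy gene below.
import random

SYNONYMS = {
    "ATG": ["ATG"],                          # Met (start)
    "GGT": ["GGT", "GGC", "GGA", "GGG"],     # Gly
    "AAA": ["AAA", "AAG"],                   # Lys
    "TAA": ["TAA"],                          # stop
}

def codon_variant(cds: str, rng: random.Random) -> str:
    """Return a synonymous variant of `cds` (length a multiple of 3)."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(rng.choice(SYNONYMS[c]) for c in codons)

rng = random.Random(0)
wild_type = "ATGGGTAAATAA"                   # Met-Gly-Lys-stop
library = {codon_variant(wild_type, rng) for _ in range(50)}
print(len(library), "distinct variants from 50 draws")
```

A real library would draw from the full 64-codon table (and could bias draws by codon adaptation index or GC content), but the principle is the same: the protein is fixed, the nucleotide sequence varies.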
Q: How many assemblies can you do on a single chip?

A: Currently we have designs with 30 different constructs on a single chip, but you can very easily enlarge that, or make it even denser.

Q: What about the structure of the protein — the final three-dimensional structure, not the RNA secondary structure?

A: That's a very interesting question. I don't think people have been able to address that issue with the current synthesis technology, because it's pretty hard: you need a significant number of variants to do that correlation. But in one of the slides I showed you, there was an accidental discovery where, I think, a single codon change altered the final protein structure. People later figured out it was probably caused by the folding process: one codon change alters the speed at which the ribosome moves along the mRNA, and that changes the protein folding process and makes the protein less stable than the original. So you're right, that's probably one of the first proofs that codon usage can affect protein structure, but people haven't been able to dig into how.

Q: You stated that gene synthesis is now 30 cents to a dollar a base?

A: Yeah, depending on the length; longer ones are more expensive.

Q: Is most of that cost the oligo synthesis?

A: No. Oligo synthesis is about 10 cents per base if you buy oligos, but it's probably less than a third of it — about a third to a fifth of the cost is really the oligos.

Q: Is there much error correction going on when you buy oligos off a machine instead of off a chip?
A: No, there's not much error correction going on. Most synthesis companies just clone and sequence; they don't do much error correction.

Q: What do you see as the next big step, in synthetic biology or just in your field?

A: Well, suppose you can do large-scale, inexpensive, chip-based DNA synthesis and gene assembly — suppose you can synthesize anything you want. I think the next step is, first, to find out the rules for how those sequences control the functions of life. Figure that out, and then the applications: once you learn the rules, you can use them to help you build things, for example design different proteins and enzymes, or design different genomes, and achieve different functions with those designs. I think that could be one of the directions. A lot of people are working on that, and there are probably further significant developments to come; probably people like you can predict them better than I can.

Q: Between creating life and making genes, where's the limit on what size of DNA you can physically assemble, and how are people addressing that?
A: Right. For synthetic DNA, I guess length could be a bigger issue than for in vivo DNA, because in vivo you have protein factors bound to the DNA to protect it, whereas in vitro it's just bare DNA, and the longer it gets, the more fragile it gets. So handling becomes an issue, and assembly becomes an issue. Right now the assembly of the bigger chunks is mostly done in vivo: you put the pieces into yeast and use the yeast recombination system to assemble them. In vitro, there are enzymatic reactions you can use, but there is a length limit — people can do over 100 kb with an in vitro system — so most of the larger assemblies are done in vivo in yeast. How big a construct yeast can handle probably remains to be determined, so I guess there may be some limitations.

Q: (Inaudible question about which sequence factors drive the expression differences.)

A: Yeah, we actually varied a lot of the factors. We tried to build libraries based on variation of a single factor — GC content, for example: for the same gene we can do 20% GC content versus 80% and see how that affects expression levels. And there is a big influence from GC content alone. There is also a limit, though: at very high GC it gets very difficult to assemble, or even to synthesize, the sequence.

Q: (Inaudible question about oligo cost.)

A: Using the chip technology you can get down to lower than 1 cent per base, or even lower.

Host: I think that's all the time we have. Thank you.