It is my great pleasure to introduce Distinguished Scientist Professor Joshua LaBaer. Today's lecture and the next few lectures will be delivered by Professor LaBaer. He is the Executive Director of the Biodesign Institute at Arizona State University and the Director of the Virginia G. Piper Biodesign Center for Personalized Diagnostics. Dr. LaBaer has been one of the foremost investigators in the rapidly evolving field of personalized diagnostics. He has been instrumental in the development of cell-free expression-based protein microarray platforms. One of the main contributions of his group has been the development of nucleic acid programmable protein arrays, or NAPPA technology. Dr. LaBaer is particularly interested in advancing biomarker discovery programs, in particular to find biomarkers for early detection of cancers and autoimmune disorders using protein microarrays. He has built fully sequence-verified clone sets for model organism and pathogen genes, which is a huge contribution to the whole community and a very important reagent resource for researchers who want to perform high-throughput biology. Dr. LaBaer is the principal investigator on a $36 million contract, sponsored by the Biomedical Advanced Research and Development Authority, to develop a blood-based diagnostic that predicts the absorbed radiation dose 1 to 7 days after a radiation event. He is also the past president of US HUPO and one of the conveners of last year's Human Proteome Organization World Congress in Orlando. Dr.
LaBaer is going to talk about biomarker discovery programs; various considerations for the statistical tools required for biomarker evaluation and validation; how to make NAPPA arrays using very simple lab-based resources and then perform autoantibody-based screening for different cancers, especially breast cancer; and how to utilize protein microarray platforms for functional studies, especially PTM-based analysis. In today's lecture, Professor LaBaer will talk to you about the basics of proteomics, its significance for high-throughput gene cloning experiments, and the steps required for gene cloning and for generating clones that can be used for high-throughput experiments later on. With the kind of resources and reagents you can generate using these novel cloning technologies, you can later simply transfer the genes of interest into any vector for your given experiment. I am sure Dr. LaBaer will introduce you not only to the concepts of proteomics but also to the details of how to generate these high-quality reagents, which could be useful for your research. So let's welcome Dr. Josh LaBaer for his lecture.

All right, I think we're ready to get started. I'm going to start a little bit at the beginning. We have several lectures here to cover in terms of the NAPPA technology, and so I thought it would be useful to begin where we began. So this is where biology was 10 or 15 years ago.
What I mean by that is that we were studying proteins a few at a time, maybe three or four or five proteins at a time, and that's how much information we were getting. But what we were really trying to understand was the entire proteome. We would take these proteins and do a certain set of assays on them: we might look at drug selectivity, we might look at what the substrates were, we might do a variety of biochemical assays or cell-based assays. We would test them for a variety of features, and each one would get a different color attached to it. But if you look at only a few things at a time, you can't really get a full picture of what's there. You look at this and you don't know what that color means; you look at that and you don't know what that color means. What you really want to do is everything, because when you do everything you get to see the whole picture, and you really understand what you're looking at and what it means. That's where proteomics comes in. Proteomics is the idea of not studying one or a few proteins at a time but studying all of them, trying to get a comprehensive study of everything.

So there are two general approaches to proteomics. One approach is looking at the abundance of specific proteins: how much protein is present. What you typically do with the abundance approach is compare the proteins in the disease tissue to the proteins in the normal tissue and ask, are there proteins that are changed in the context of disease relative to normal? The hope is that if you do this over and over again, you'll identify which proteins are altered in disease, and that will provide useful information about what's causing illness. Typically this approach requires mass spectrometry or some other technology that can measure the levels of proteins in a sample.
The other approach, and the one that I'll talk about today, is what I call a function-based approach. The goal here is to look at the individual proteins and ask: what do they do, what's their role, how do they behave, who do they interact with, are they altered in disease? Obviously these two approaches are complementary; they support each other.

So what are the ways that we can look at the function of proteins? Here are a few of them. You can look at where proteins localize in cells or in the body, and that may tell you something about the role of that protein. You can look at how that protein is modified: is it phosphorylated, is it acetylated, is it ubiquitylated? Modifications of proteins tell you something about what they do. You can look at the structure of the protein: what is its three-dimensional folding, what shape does it take? That will give you a clue about what its role is. And you can look at which other proteins that protein interacts with. The topic that we're here today to talk about is interactomics: who do proteins interact with, who do they come in contact with? That tells you something about what they do.

So how do you do those various studies? If you want to look at the location of a protein, you might tag that protein with a fluorescent marker like GFP, put it in cells, and ask where it localizes. If you want to look at its modifications, you might purify the protein using an epitope tag, analyze it by mass spectrometry, and ask what modifications you can observe on that tagged protein. If you want to look at the structure, you might purify the protein, crystallize it, and solve its three-dimensional structure using X-ray crystallography. And if you wanted to look at the interactors of that protein, at least using traditional methods, you might tag that protein and then do something like a yeast two-hybrid
assay or some kind of pull-down assay to look at what proteins are attached to the protein that you're looking at. And the goal, of course, is to do this in high throughput: what you want to do is run these studies a thousand proteins at a time.

We looked at these kinds of methods when we began our work a number of years ago, and one of the first things we observed was that there are some things that all of these methods have in common. First of all, you have to be able to make proteins. You have to be able to express them in some circumstance: sometimes it's in cells, sometimes it's in a cell-free extract; sometimes you're making it in vivo in the normal circumstance, in other cases you're using a heterologous system. The other thing they all have in common is that to study proteins in high throughput, you most often need to put a tag on the protein. Do you all know what I mean by a tag? An epitope tag, a chimeric tag. If you try to purify each protein according to its own biochemical nature, it's very cumbersome, and you can't do that times thousands. The goal here is to be able to study proteins hundreds or thousands at a time, and the easiest way to do that is to put a GFP tag on them, a GST tag, a His tag, some kind of tag that gives you a biochemical hook to study all the proteins in the same way.

When we began this work, this was what the field looked like. So what am I looking at? Well, we're looking at a couple of graduate students who are exhausted. Why are they exhausted? They've been looking through those haystacks for the needle they're trying to find, and it takes a long time to sift through the hay to find the needle. So can we find a better way? Is there a faster technology?

If you think about a simple organism like yeast, Saccharomyces cerevisiae, there are around 6,000 unique proteins. So if you were to do high-throughput screening using cDNA libraries or phage display or something like that, you could look at around 30,000 different samples and you would pretty much have sampled everything. That would be a five-fold redundancy: you'd look at everything five times on average to make sure that, with a Poisson distribution, you would get everything. Of course, the simplest method would be to have a cloned gene for every gene in yeast and test each one once and only once; then you would do 6,000 assays, and that would be very easy.

The same thing is true in the case of humans, but it gets more complicated. We now know that there are roughly 20,000, give or take a few, unique protein species in humans. Obviously, once you start taking into account splice variants and post-translational modifications, that number expands dramatically. But let's just say, for the sake of simplicity, that if we took each unique gene and tested it once and only once, there would be 20,000 assays. You can't do that if you don't have cloned copies of those genes. If you have them in libraries, like cDNA libraries or phage display libraries, and you want to test all proteins, then in order to get past all the redundancy you would have to do five million assays, and that's just too many. Ideally, what you want is a cloned collection of all of the genes in the human, each one a perfect copy, so that you could test every gene once and only once, and then you would be doing roughly 20,000 assays.

So 20,000 or 30,000 assays: that's a number that I can imagine doing in a high-throughput biochemical setting. In a supermarket in the United States, if you scan around six items a minute when you're passing them through, you could get that done in two weeks. They sell 30,000 tickets for a lottery in a single day in the state of Massachusetts. So 30,000 is a number that we could imagine doing. That was the goal, and so our
first goal in my laboratory was to build a repository of cloned copies of all human genes. Obviously I'm trying to get you to protein microarrays, but before we can get to protein microarrays, we have to talk about where the genes come from to make those arrays. How are you going to make all those proteins if you don't have the cloned copies of the genes?

So the first thing we wanted was a comprehensive collection: at least one copy of every gene. Of course, in the perfect world we'd have one copy of every splice form of every gene, but at the very beginning, let's at least get one representative of each gene. The second thing we wanted was a flexible format. We recognized that different users might have different applications for these genes. As we talked about earlier, some of them would need to make the proteins in cells and some would make them in vitro; some would work in the natural cell setting and some in a heterologous cell setting. So you had to have a format that was flexible, and to get there we focused on a technology called Gateway recombination. How many of you are familiar with Gateway? Not so many yet, okay.

Well, now imagine doing restriction digests for every gene in the human genome. It gets to be a little complicated, because you'd have to work out which enzymes you could use for this gene and which enzymes you could use for that gene. And for really long genes, restriction enzymes are going to start cutting the genes into pieces, and then you're going to have to reassemble them or clone them in unique ways. It would be very complicated. So a number of years ago, folks at a company that was called Life Technologies developed a technology called Gateway cloning. It's essentially a type of recombinational cloning. The idea is that you have your favorite gene here, and flanking that gene are these site-specific recombination sites, and
we want to be able to move this, your favorite gene, into some plasmid vector that allows me to make that protein. With Gateway, these sites are recognized by an enzyme system from phage lambda, so you can simply mix this plasmid plus that plasmid in the same sample and add the enzyme, and these two fragments effectively swap locations. Because the plasmids have different selectable markers, and this one carries a death cassette, the only viable product is this one; it's the only one that survives. And when that's the only one that survives, you can develop a method for doing this operation in high throughput. You can move thousands of genes, all by automation, and I'll show you that in a moment.

So this is the idea: you build a library of genes in this master vector here, and then you transfer each gene into any of these other vectors to do any kind of study, to make protein in insect cells, human cells, or bacterial cells, just by putting the gene into the specific vector. You can do this in high throughput, and my laboratory does that a lot; we move thousands of genes from one vector to another.

Another thing that you need, if you're going to make these clones properly so that you can do high-throughput protein production, is to make them protein expression ready. What do I mean by that? Well, we have to remove the untranslated sequences from their mRNAs, and we also have to remove the stop codon. Remember, we said we want to be able to put tags on these proteins. If there's a stop codon present, then when you translate the protein, translation will stop at the stop codon and it won't allow you to add the epitope tag downstream of it. So one of the things that we had to do was go through all of the genes in the human and remove the stop codons.

Of course, it doesn't work at all if it's not catalogued and trackable, so you have to build into the whole
system a database, a tracking database and a storage system, so that when you want a gene, you know where to find it. It's the molecular version of building a library: you have to store the books in a place where you can find them, and it's the same way with the genes here.

One of the things that we wanted in our system was to make these clones available to everybody. If you're going to make a library of all of the genes in the human, or any other organism, it should be a resource that we all share, and so we built this in such a way that we could share it with everybody.

And then the last thing, of course: if you've done molecular biology, you know that when you make molecules, sometimes you get a mixture, and a mixture is useless if you're trying to do experiments where you know what you're testing. So one of the things we wanted to make sure we did was to individually isolate each unique clone, so that when we sequenced it and used it, we knew exactly what we were working with. There was no doubt about what it was. And that goes with the last thing I'll mention, which is that we sequence-verified everything we built. That was key, because oftentimes what you get doesn't work.

So here's the goal of what we were trying to build. We called it FLEX to begin with, for full-length expression-ready, and it had a number of attributes. The goal was to get all genes in it. We wanted to make it broadly available, we wanted to use a flexible format, we wanted the clones to be protein expression ready, and we wanted them to be sequence verified. And of course we wanted this to be affordable so that people could use it. This is a cartoon that we drew years and years ago of what this would look like: the idea of a lot of tubes with barcodes on them, each one representing a unique gene and each one addressable. Well, the good news is that that dream is now becoming a reality. This is what it
looks like today. What you're looking at here is a two-million-dollar freezer. It's a very expensive freezer, but it stores tubes in this format here. This is what the tubes look like, and on the bottom of these tubes you have these 2D barcodes, and those 2D barcodes are unique for each gene. So if we were to drop a rack of these tubes, not that we ever drop racks of tubes, we could pick them up and put them in random order into a box, and the barcode reader would read all those barcodes and know exactly where every gene was, because the barcodes are unique for every gene.

And of course all of this is available at this website, DNASU, and I encourage you all to go to that website. All you need is those five letters, and there you will find a list of all the genes that we have in our collection. Right now we have over 330,000 unique plasmids in our collection, so it's a very large collection, and they're all available to all of you. They're available to everybody on the planet. We ship them every day; in fact, I think we have shipped over 350,000 samples worldwide now. They're not all human; some of them are from other organisms. They're not all in Gateway. But these are all plasmids that we've made, or that other people have made and given to us to share, for use in all kinds of experiments.

So what does this allow you to do, if you have all these different clones for all these protein genes? Well, imagine that you wanted to do a study of a set of genes that are unique to a particular tissue. Maybe you're looking at neurological systems because you're studying brain tumors, or you're looking at liver cells and you want to look at genes expressed specifically in hepatic cells. You can go to the library that has the set of master clones, take those master clones, and mix them with this expression vector to make the expression clones, the ones that have the
gene in the unique vector that will make proteins in the setting that you want to study. And let's say you put them into cells and do some kind of functional assay, and ask where these proteins localize or what these proteins interact with. So the idea is to study proteins in high throughput, and the key is to have genes for those proteins in a format that allows you to move them and study them in that setting.

So I'll tell you a little bit about how we make these clones. We still do that; we're still trying to finish the human library. We've now got almost 15,000 unique human genes cloned. That's well on the way: the unique set that we're aiming for is around 18,000, so we're very close to getting the full set.

The process looks a little bit like this. This is an overview; I will admit that it's been altered a little bit in recent years, and I'll tell you where those changes have been made. But basically we start by identifying the genes of interest. We design PCR primers that will capture just the open reading frame for that gene. We then do PCR with those primers in 96-well plates, so high-throughput PCR, to capture inserts that are unique to the gene. We then capture them into the vector using a recombinational cloning system, transform them into bacteria, plate them, pick colonies for culture, and then sequence them to make sure that they're correct.

Now I will mention a couple of things that we do a little bit differently nowadays. One is that sometimes, instead of managing all of these unique clones as separate clones, we will work in batches, pools of clones, do all the processing in the batch, and then individually pick them with a colony selector. So we always colony-select them as unique entities, but sometimes you can do some of the processing in batch mode. The other thing is that nowadays we can sequence them in batches as well, using next-gen sequencing, which wasn't available when
we began this process. So you can actually pool clones, extract their DNA, do the sequencing as a batch, and then use that to interpret the sequence of the clones.

Now there's a problem with that, right? The problem is that when you do next-gen sequencing, you can't tell which clone a particular sequence comes from. Next-gen is just all the sequence that's in the tube, and so you have to be clever about how you set this up. First of all, you have to make sure that when you mix clones together, they are nothing like each other, because if you put together two clones that are similar in sequence and you get a mutation, you won't know which clone it came from. Does that make sense? If you have two genes that are almost identical, and in one of those identical regions you see an alteration, you won't know which gene it came from. So whenever you mix these clones, you have to use informatics approaches up front to make sure that they're not at all alike.

The second thing is that you have to realize that when you sequence them in a batch, you can tell what the overall sequence of the gene was, but you can't confirm that the gene is in its appropriate tube, and we need to know that the correct gene is in the correct tube. So in addition to the next-gen sequencing of the whole batch, we also have to do at least one sequencing read for each gene, uniquely from that tube, so that we can confirm that we have the right gene in the right place. This comes back to that library idea: in the end you're building a library where you can go and get a specific gene from a specific tube any time you want it. So we spend a lot of time thinking about that.

Here's some of the automation that we use. This is a robot. Remember I told you we transformed the bacteria with the DNA; we've picked each of these different wells and plated them on these specialized plates, and these are
plates that we actually invented in our laboratory; you now see them widely used in the field. They're bioassay dishes. They're shaped like this, and they have columns and rows, and each of these little areas here is a different clone, a different gene. I hope you can see the different bacterial colonies collecting there. And of course this is then addressable by robots that can pick individual colonies.

We used to use undergraduates to pick colonies, and they were very well meaning, but believe it or not, human beings make a lot of errors when they have to spend a lot of time using toothpicks to pick colonies and put them in wells, and so our error rate was around 15 percent. Since then we have robots to do this. Robots don't take coffee breaks, robots don't forget where they were, and robots can work for many, many hours without getting tired. So here's the robot, and there's a little pin coming down here, and that's going to pick the colony. Hopefully you can see the little colonies on the agar there. So we do a lot of the colony picking by this method.

All right, so now you've got all these clones. You've made this library of clones, you have them all in these tubes, and you've even done some DNA sequencing. How do you know that they're correct? How are you going to make sure that the gene that you have in that well is correct and all the sequences are right? Or, if they're not right, how can you document that they're wrong? Well, you could hire lots and lots of people to spend lots and lots of time reading and assembling the sequences for all these clones. Or you could get clever and develop a software tool to do that, and that's what we did. We developed software that goes through and evaluates the clone sequence, compares it to the correct sequence, and lets us know where there are differences.

All right, so I will tell you a few features of validating clone sequences.
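As a rough illustration of the comparison step just described, and not the laboratory's actual software, here is a minimal Python sketch. It assumes the clone read and the expected open reading frame are already aligned, in frame, and of equal length, so it only reports base substitutions; a real tool would also align the reads, handle insertions and deletions, and weigh sequence quality. The names `CODON_TABLE` and `find_discrepancies` are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_discrepancies(expected, observed):
    """List base substitutions between two pre-aligned, in-frame ORF sequences.

    Assumption of this sketch: equal lengths that are a multiple of three.
    Insertions and deletions (and therefore frameshifts) are not handled.
    """
    expected, observed = expected.upper(), observed.upper()
    if len(expected) != len(observed) or len(expected) % 3 != 0:
        raise ValueError("sketch handles only equal-length, in-frame sequences")

    report = []
    for index, (exp_base, obs_base) in enumerate(zip(expected, observed)):
        if exp_base == obs_base:
            continue
        start = index - index % 3  # first base of the codon containing this position
        exp_aa = CODON_TABLE.get(expected[start:start + 3], "?")
        obs_aa = CODON_TABLE.get(observed[start:start + 3], "?")
        if "?" in (exp_aa, obs_aa):
            effect = "unknown"   # e.g. an ambiguous base call such as N
        elif exp_aa == obs_aa:
            effect = "silent"    # protein unchanged
        elif obs_aa == "*":
            effect = "nonsense"  # premature stop, truncated protein
        else:
            effect = "missense"  # one amino acid changed
        report.append({
            "position": index + 1,  # 1-based position in the ORF
            "expected": exp_base,
            "observed": obs_base,
            "protein_change": f"{exp_aa}->{obs_aa}",
            "effect": effect,
        })
    return report


if __name__ == "__main__":
    # Made-up 12-base example: one silent change and one premature stop.
    for row in find_discrepancies("ATGGCTTGGAAA", "ATGGCCTGAAAA"):
        print(row)
```

Reporting the effect on the protein alongside each base difference matters because, as discussed later in the lecture, a discrepancy that truncates the protein is far more serious than one that leaves it unchanged.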
First of all, it's much harder than actually making the clones. Making the clones is relatively straightforward. It's a lot of molecular biology steps; you can do it, it's not terrible. But actually making sure that the sequences are correct takes a lot of time.

The first thing, of course, is that you have to pick individual colonies. I mentioned that before: sequencing has no value if you're sequencing a mixture of things, because, as we said earlier, if there's a mixture you'll never know which one is correct and which one is wrong. But of course when you're working with individual clones, you have a lot more work to do, because you have many more of them.

And then you need what's called a LIMS. Are you familiar with the term LIMS? L-I-M-S: laboratory information management system. It's an automated software application that manages all of the steps in your laboratory. It tracks each gene, each clone, from well to well as it moves through all the various robotic steps. Of course, this implies that all of your steps are going to be done in 96-well dishes with barcodes on them, so that you're always tracking, using informatics, where things are located.

So this is the process flow that we used for sequence-validating our clones. It begins by loading up the plate information: that's the information about your plate that has all the clones on it, and what genes are supposed to be in there. We then do end reads. Do you know what an end read is? It's a read of just the very end of the gene. The nice thing about an end read is that the sequencing primer that you use can be in the plasmid vector, so it's the same primer for every gene in your collection, because it doesn't begin in the gene; it begins outside the gene, in the neighboring DNA sequence. And the nice thing about that is it tells you that you have the right gene. We then have to assemble all the different reads, and this
is typically for sequencing where you had to do multiple reads per gene. We then compare the sequences to make sure that they are correct. We look for what are called discrepancies, and I'll come back to what I mean by discrepancies in a moment. We then make sure that they're not just common polymorphisms, and then we rank the isolates. And then we have this decision tool here, which basically asks: if you have a discrepancy, is that discrepancy likely to be a mutation, and if it is a mutation, do I reject this clone or not? Because at the end we have to decide whether we keep the clone or fail it. In addition to all of that, we have to make sure that we've got the complete sequence. So when we assemble the sequences, we compare the sequence of the gene to the expected sequence and ask: do we have it all, have we sequenced everything, or do we need to go back and get more sequence? I won't go into this in too much detail.

So let me tell you what I mean by the discrepancy finder. What are the reasons that a clone sequence doesn't match the correct, or the expected, sequence? It turns out there's more than one reason why that could happen. Obviously one source, the one that we're most worried about, is that the clone underwent mutation: during the process of amplifying the DNA, or capturing it, or making the primers, errors were introduced. And of course if we have too many errors in a clone, it's no longer useful, because now we're not looking at biology, we're looking at mutants.

But a much more common reason why the clone sequence doesn't match is sequencing error. It turns out the actual process of doing the sequencing itself has errors, and so we may get a sequence that's incorrect, but it's not the clone's problem, it's the sequencing problem. Sequencing error can occur as often as one in a hundred bases. So if it's happening once every hundred bases and your clone is a
thousand bases long, there's a good chance you're going to have errors in there. So how do you fix that? You go back and you read it again, and sometimes you have to get multiple reads to make sure that you have the right clone.

Of course, another reason why your clone might not match the sequence that you have in your database is that it could be a natural polymorphism. If we were to sequence the genes of everybody in this room, I guarantee you we would find differences all over the place, and those differences don't reflect that you're mutants; they just reflect the natural variation that occurs within a population. We all have sequence variants in our sequence. In fact, I just had my genome sequenced this fall as part of a project at ASU, and sure enough I found all kinds of sequence variation, and I have no idea what it means.

So this is how we track sequences. This is the forward read and the reverse read of a clone, and this is the assembled sequence, and then we can look at its alignment and at all the discrepancies that we find. If you click on the alignment button, you get something that looks like this, which shows the alignment of the sequence with the expected sequence, and these colors indicate where we see discrepancies. Right here, for example, there are some discrepancies. Now you'll notice that these discrepancies are occurring very close to the end of the gene, and that could be a sign that they're sequencing errors, because usually at the beginning and end of reads you get some mistakes.

And if you click on the discrepancy button, you'll get this report, and it will tell you, every time there's a difference between our sequence and the expected sequence, what that difference is, what kind of difference it is, and what implication it has for the protein. In this case there's a frameshift deletion. That means that we've gone out of sync from the
triplet codons that you expect in DNA. When you go out of sync, you have an increased chance of running into a stop codon and causing an aberrant truncation of the protein, and that's what happened in this case. Obviously, mutations that cause profound changes like that are much more deleterious in our clones than simple substitution mutations.

This isolate ranker is just a tool that basically considers two issues. First, as I indicated a moment ago, what are the consequences of the mutation? If the consequences are going to profoundly affect the protein, that makes an isolate much less likely to be interesting. And then we need to know whether the sequence in that area is good-quality sequence, because if the sequence quality is bad, then I'm much less likely to believe the mutation. If the sequence quality is bad, there's a very good chance that the discrepancy is due to bad sequencing and not an actual mutation. So in the end you'll get a chart that looks like this, and these various color codes indicate to us which clones are better than which other ones, so that we can pick the best clone for a gene.

And then the last tool I'll mention here is the gap mapper. Remember I told you that ideally we have sequence for the entire gene. If we don't have sequence for the entire gene, we need to go back and get an additional read to fill in the gap; otherwise we can't say with certainty that we have a good clone. So this gap mapper takes all the different reads from a particular gene, assembles them by overlapping them, and then, using essentially Bayesian mathematics, looks for areas that are missing. Then we trim back the ends a little bit and suggest another sequence read for that missing area, so that we can get a better clone. This is what it looks like in our software, and so you can see it basically predicts
that there's a gap here that needs to be filled in, and then you can see these colors here are indicating that the quality of the sequence in that area is not great.

This is our decision tool. This is how we decide whether or not to keep a clone. Our goal is always to either eliminate clones or keep them, obviously, and so here we set the criteria that will make a pass or a fail: this is what we allow if the sequence is good, this is if the sequence is not so good, and then we can also choose to ignore polymorphisms. And so, as I say, as we run through our clone list, at any given time we're always trying to move clones either into the reject category or the acceptable category.

All right, so let me stop there and see if there are any questions on the process of making clones for collections. Are there any questions I can answer?

Yeah, the question was, what's the mechanism of sequencing error? That depends a little bit on what platform you're using to do your sequencing. A lot of what we do is using traditional single-clone sequencing, what they call Sanger sequencing, and in that case the causes can vary. Oftentimes Sanger sequencing involves different colors for different bases, and sometimes you get a region where you get a little bit more red than you should, and so you can't really tell: is it an A or is it a T? I'm not sure. Sometimes it's just that you don't get adequate coverage, so you don't read past that base as many times. So there's a lot; the methods, the chemistries themselves, have errors. Lately we're using Illumina, which is next-gen sequencing. It also has an error frequency, but typically with Illumina sequencing you get around that by doing so many reads, you cover it 30 times, that you're less likely to have an error. But the process itself is error prone.

Other questions? Oh, you mean the database that has the gene sequences in it? No, that is a very good point. The gene sequences that are in, you know,
the databases at NCBI in the U.S., in UniProt, all of those, they have errors in them, and so if we disagree with them it's not always clear that it's us that's at fault. In our circumstance, well, let me say there are two ways that we've dealt with that. The first is that oftentimes we start making our genes from existing clones where we actually know their sequence. In that case we know what we're trying to achieve, and we try to match that sequence. In the case you're referring to, where we're trying to match a sequence in a database, we actually did develop a polymorphism tool. I had slides on that and I took them out because it was going to get too long, but basically what the polymorphism tool does is it goes out to all the existing databases where gene sequences have been uploaded for all these human genes, collects all the sequences for those genes, lines them up, looks at the frequency at any given position, and asks: are there existing examples of other clones that have the sequence I have? And if there are examples of that sequence, then I'm more likely to accept the sequence. It's not perfect, but it does help.

Okay, say that again? Well, you know, once we've done that sequence validation, we think most people don't have to do it again. I mean, it's certainly reasonable if you want to be extra careful, if it's a very special clone for a research project of yours, but for high-throughput materials we've done a pretty good job of sequencing these, so I don't think you have to repeat that. And I should point out that one of the qualities of the Gateway process, which is used to transfer the insert from a master clone to an expression clone, is that it's a conservative molecular process. So once you know that this sequence is correct in the master clone, you know that it is correct in the expression clone, so you don't have to re-sequence them both.

Yes, in fact, for all of our clones, if you
go to our website, the DNASU website, we list the actual sequence of that clone. So we've done the sequencing and we've loaded that up into the database.

I think there was one over there. Yeah, so in the clone collection that we distribute, for the most part, not in every case, but for the most part, we try to limit it to no more than one amino acid difference. So if there are two or more amino acid differences, then we don't load it. I will say that at the very minimum we always load the actual sequence, so you can always look at the actual sequence and ask, does this agree enough with what I want to do to use it? But most of the time it's either 100 percent accurate or we allow one amino acid change.

Okay, well, if the genome hasn't been sequenced, it's very hard to make the clones, right? In fact, we learned that the hard way. Years ago we did a clone collection for an organism called Francisella tularensis, which causes an illness called tularemia, and we were working with collaborators who were intimately involved in the genome sequencing of that organism. They said, "We'll get you an early copy of the genome." So they gave us an early copy of the genome, and we used that to design our clone collection, and we built all those clones, and it was a disaster. Our success rate, which is usually in the 90-plus percent range, was like 50 percent. It was horrible. And then about a year later they came out with the official sequence of the organism, and it was very different from the sequence that they gave us originally; there were a lot of changes in the sequence. And so when we rebuilt the collection using the correct sequence, now we had something like 96 percent accuracy. So you really have to have a good-quality genome sequence to do this kind of work.

So today you have learned about the fundamentals of proteomics. I'm sure you are mesmerized by what you can achieve using proteomic technologies. You were also provided a glimpse of different protein-expression-based clone repositories. You also
studied how to do clone production, especially in a high-throughput manner, using robotic plating and high-throughput bacterial plating. Finally, you learned how to validate these clone sequences, which is one of the most important steps in the entire high-throughput gene cloning pipeline. We'll continue more discussions in the next lecture. Thank you.
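
As a small illustration of the gap mapper idea described in the lecture (overlap the reads for a gene, trim back the ends a little, and report any region still lacking sequence), here is a minimal sketch in Python. It is not the lecturer's software: the function name `find_gaps`, the half-open coordinate convention, and the fixed `trim` parameter are all assumptions made for this example, and it uses plain interval merging rather than the Bayesian approach mentioned in the lecture.

```python
from typing import List, Tuple

# Hypothetical representation: each read is a half-open interval [start, end)
# in the coordinates of the expected gene sequence.
Interval = Tuple[int, int]


def find_gaps(gene_length: int, reads: List[Interval], trim: int = 0) -> List[Interval]:
    """Return the regions of the gene not covered by any trimmed read."""
    # Trim a fixed number of bases from both ends of every read (read ends
    # tend to be low quality), clip to the gene, and drop reads that vanish.
    trimmed = []
    for start, end in reads:
        s = max(0, start + trim)
        e = min(gene_length, end - trim)
        if s < e:
            trimmed.append((s, e))

    # Sweep the reads from left to right, recording every uncovered stretch.
    trimmed.sort()
    gaps = []
    covered_to = 0
    for s, e in trimmed:
        if s > covered_to:
            gaps.append((covered_to, s))
        covered_to = max(covered_to, e)
    if covered_to < gene_length:
        gaps.append((covered_to, gene_length))
    return gaps


if __name__ == "__main__":
    reads = [(0, 450), (400, 700), (760, 1000)]
    print(find_gaps(1000, reads))           # untrimmed reads
    print(find_gaps(1000, reads, trim=20))  # ends trimmed by 20 bases
```

Each reported interval is a region where an additional sequencing read would be needed before the clone could be called fully covered. A real tool would trim by per-base quality scores rather than by a fixed number of bases.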