recording. So, welcome everyone to lecture number eight. Lecture eight is about primer design: I will talk a lot about how to design primers and about the different types of primers that you can design. I haven't updated the slides since last year; you can see that the date still says 12-12-2019. That also gives you an idea that we are a little bit behind in the lectures, so we have to figure that out. I also got the response back from the Prüfungsbüro, and our exam date has been approved, so the date that we picked is good and we can use it. There will be another one as well, and I will put the exam dates on Moodle.

Okay, so let's start. I split up the overview for today in two different sections. The first section is the polymerase chain reaction, because that is what you use primers for, at least primarily. So we will talk about what a good primer is, and when a primer is a primer.

Yes, we weren't finished with the previous topic, I know, I know. We will get back to that after we finish this lecture, because I think this lecture should be relatively short, and afterwards we will do the correlated trait locus mapping. I think the go-live notification also says that we will do the primers first and the correlated trait locus mapping after that. I haven't added it to this presentation, so at the end I will have to swap presentations, which should be okay.

Very good. So: polymerase chain reaction, and then the thing that I like the most, the advanced primer section, because that always gives some good exam questions in the end. I really like Gasmers and I also like multiplex PCR, so those are the more or less advanced PCR topics. Then we will talk a little bit about databases, I think. I dropped something from this, let me see; yes, I think I dropped a couple of slides there, but we will go through how to design primers. And I already see that I made a spelling error on this slide.
I'm not going to tell you what it is. All right, so let's start off with the polymerase chain reaction. No, we're not, because we're going to start off with the assignments from last week. Fortunately, someone was smart enough to mail me that the input data was not on Moodle, so I put the input data for the assignments on Moodle about two days ago, after I got that mail. I was a little bit disappointed that no one had started on the assignments earlier. I don't like making assignments obligatory, but be aware that the assignments are part of the lecture material, so there can be questions about the assignments, or about doing things very similar to the assignments. I would advise everyone to definitely do the assignments yourself. Of course, when you're stuck on something, you can always mail me.

All right, so first off: answers to the previous assignments. Let me switch to the assignments. They were about phenotypes and QTL mapping. In the end, what I wanted you to do is program a very basic QTL mapping using a for loop, but there are some steps before we can do that. The first is data QC. Let me switch to my Notepad++ window. That's here; this is actually a list of all of the different emotes, which is not what we want, but I'm going to change my mode just because I can, and then we close this.

So, answers to the assignments. The first assignment was to load in the two files that were on Moodle: the genotypes file and the phenotypes file. I'm going to open up my R window and just copy-paste this to load in the two files. If you open the files in a text editor, you will see that they are tab-separated and that there is a header in each file. A header means that the first line of the file doesn't contain any data, just the column names. Good, so let me switch to R for you guys, and I'm just going to copy-paste it in, and
this shows that this is indeed still the old data directory. All right, so that's bad. If I update that, so it's the D drive, I think this should work. No. Let me see where I actually put the input files, because I moved them around after I uploaded them. They should be in Documents, then bioinformatics, then data. All right, so back in the Notepad++ window I'm just copy-pasting the path from where I am, and instead of backslashes we need to use forward slashes. That's because R treats a backslash inside a string as an escape character, so a Windows path pasted as-is is not read correctly; you could also use a double backslash, like I did before. Then this should load our data. So let's go to the R window, and indeed, now that I have fixed the path, we can see that it's going the right way: we're loading in the genotypes and the phenotypes.

When I load some data into R, I generally tend to look at the data. R has the head function to look at the top of the data, and then you see that the genotypes file looks like this: there are different individuals, and there are different markers. These are markers at a certain position in the genome, but we won't be dealing with where they are located or anything like that. So we have individuals in the rows and markers in the columns. If I look at the phenotypes file, you can see that this is the phenotypes file that I think we already used before; again we have individuals in the rows, and the different phenotype measurements in the columns. That's the structure, and that's important to know, because when we want to match these two files together in the end, we need to know where the genotypes are located and where the phenotypes are located.

All right, good. The first step is to do some data QC, because we don't want to have too much missing data, for example at certain marker
positions. What I'm doing here is defining a new variable called missing data. Since I haven't looked at any of the markers yet, I set it to NULL, so it's just a new variable which contains nothing. Then I go through each of the columns of the genotypes, because we just figured out that the markers are in the columns, so we go from one to the number of columns in the genotypes. I take this column out of the genotypes matrix, ask which values are NA, so missing, and sum those up. is.na gives me a TRUE/FALSE vector, which counts as 0 if the value is not NA and as 1 if it is NA, so the number of missing values is just the sum of that vector. Then I divide that by the number of genotypes that we have and multiply it by 100, so that we have a percentage of missing data for each of the markers. After that I plot this vector. This is one of the things you see a lot in R with a for loop: you define a variable first, which is initially empty, then you concatenate the value for each marker to this variable and store the result back in the same variable.

All right, so let's do the missing data computation and the plot. I'm going to copy this and go to the R window; let me show you guys the R window as well. Here you see the plot that we made. It doesn't look that good, but it's good enough to see by eye what is happening. We see that there are around 120 markers, because we have 120 values. For most of the markers there is no missing data, so 0% missing, and the worst marker in the data set has slightly above 3% missing data. You can also see that the values fall into lines, so you see horizontal lines in the data, and that is
of course because here there are no individuals missing, here there is one individual missing, here two individuals, here three, and this is probably five or six individuals which have not been properly genotyped at these markers. Normally, when you do quality control, you throw out any marker which has more than 5% missing data, because those are generally markers that did not work well enough during the genotyping phase. So that's the first quality control measure that we do: we remove markers with more than 5% missing data. In this case we don't have any markers with more than 5% missing data, so we don't have to remove anything.

All right, and then we do the same thing. Why do I have the same code here? Oh, I pressed the duplicate button, so this is just the same thing again. No, no, no: here we go in the other direction. Instead of going through the columns, we now go through the rows, so now we ask how many data points are missing per individual. We do the exact same thing, but instead of going column by column we go row by row. Let me show you that as well. Of course we make a plot, and we have to have a title which describes what we're looking at. So let me show you guys the R window; we just throw it in, and then it looks like this. Here we see that there are two individuals which have some data quality issues: most of the individuals are fully genotyped, but there are two individuals which have a relatively high amount of missing data. This might be an issue that happened when we sent in the DNA for genotyping; it could be that the DNA was of poor quality. These are individuals that you would generally want to remove from the data, but I don't actually think that the assignment said that you should remove them.
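As a rough illustration of the two missing-data checks just described, here is a minimal sketch. It is written in Python rather than the R used in the lecture, the genotype values are invented toy data (not the Moodle files), and the names `percent_missing` and `THRESHOLD` are made up for this example.

```python
# Toy genotype matrix: rows are individuals, columns are markers.
# 1 and 2 are the two genotype classes; None stands in for a missing
# call (NA in R). The values are invented and are not the lecture data.
genotypes = [
    [1, 2, None, 1],
    [2, 2, 1, 1],
    [1, None, None, 2],
    [2, 1, 2, 2],
]

def percent_missing(values):
    """Percentage of entries in `values` that are missing (None)."""
    return 100.0 * sum(v is None for v in values) / len(values)

# Per marker: walk over the columns of the matrix.
n_markers = len(genotypes[0])
missing_per_marker = [
    percent_missing([row[m] for row in genotypes]) for m in range(n_markers)
]

# Per individual: the same computation, but row by row.
missing_per_individual = [percent_missing(row) for row in genotypes]

THRESHOLD = 5.0  # percent; the cut-off mentioned in the lecture
markers_to_drop = [m for m, p in enumerate(missing_per_marker) if p > THRESHOLD]
individuals_to_drop = [
    i for i, p in enumerate(missing_per_individual) if p > THRESHOLD
]

print("percent missing per marker:    ", missing_per_marker)
print("percent missing per individual:", missing_per_individual)
print("markers above threshold:       ", markers_to_drop)
print("individuals above threshold:   ", individuals_to_drop)
```

With only four individuals, a single missing call already means 25% missing, so the toy data trips the threshold far more easily than the real data set would.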
My code also doesn't remove them. Normally, again, up to 5% missing data for an individual is acceptable, but with anything above 5% you would want to remove the individual from the data set. We're just going to continue, because the assignment didn't ask you to remove them; I think it just asked you to compare.

All right, and then in the next step we wanted to do some basic effect mapping. Let's go to the genotypes and show just a little piece: the first 10 rows and the first 10 columns. Then it looks like this. You see that there's a missing value here, but what you see when you look at the matrix is that there are two genotype classes: you're either a 1 or a 2. The 1 means that the genotype came from the mother, and the 2 means that the genotype came from the father. So individual one has, at the first marker, the marker allele inherited from the mother, and individual three has the marker allele inherited from the father. Because this is a recombinant inbred line population, there is no heterozygous group: these individuals are inbred to homozygosity. So it is relatively easy to do an effect scan, because we only have two groups. Using these two groups, we can calculate the mean of the phenotype for each of the two groups, and if there is a QTL, then there is probably a big difference between the one mean and the other mean.

Back in Notepad++, I can show you how I did that. I define two variables up front: means one, in which I store the mean of the individuals carrying genotype 1, and means two, in which I store the mean of the individuals carrying genotype 2. Then I go through each of the markers, which are in the columns, so I'm saying: for x in 1 to the number of columns of the genotypes. What do I need to do? Well, I need to select the individuals which are
1, so I take the genotypes in column x and ask which ones are 1, and I do the same thing for 2. I store these in new variables, and these variables can be overwritten in every iteration, because at every marker a different set of individuals will carry the 1. Then we have means one and means two, for which we just use the mean function. Because we saw that there was some missing data, we have to add na.rm = TRUE when we calculate the mean; otherwise, if there is a single missing value, the mean will be NA. So we calculate the mean of the individuals which have genotype 1 and the mean of the individuals which have genotype 2, and I'm only doing the first phenotype, so I'm saying comma one, which means: take the first phenotype from the phenotype data set. That one is called X3 hydroxypropyl; that's the phenotype that we're mapping. Of course we could easily go through all of the phenotypes as well by adding another for loop from 1 to the number of columns of the phenotypes, but we're not going to do that; we're just going to do a single scan of a single phenotype. In the end we again do the same thing as before: we calculate the mean, we append it to the means variable that we defined, and we store it back.

If we run this in R, then in the end of course we want to make a plot. We can plot means one minus means two, and I want to plot this as a line, which is what I'm saying here. So let me show you what happens when we do that. Let's go back to the R window. When we do this, it looks kind of
like this. We can see here that at more or less the first marker the difference between mean one and mean two is minus 2,000, which means that mean two was higher than mean one. And we see something interesting: two peaks where the difference is relatively big, plus 5,000, plus 5,500, perhaps even 6,000. So there are two regions in the genome where there is a massive difference between individuals carrying genotype 1 and individuals carrying genotype 2. They have inherited different pieces of DNA, and probably these pieces of DNA are controlling the difference in the phenotype that we're looking at, this X3 hydroxypropyl. So this is just a basic effect scan.

Of course we could have plotted this a little bit differently as well. Instead of plotting means one minus means two, we could plot just means one, so we look only at the mean of the first group, and we could of course also look just at the mean of the second group. Then you see that indeed something weird is going on, because for the individuals here, at the second peak, in the second group, the average is almost zero. Normally the average is around 5,000, but here these individuals only have about a thousand units of this phenotype. We can also plot them both: we plot means one, and then we add means two to it using the points function, and of course we give it a color, for example red. Then you see that every time the one group is low, the other group is high, and we see the same thing here. So we can look at the data in different ways, but just from plotting mean one minus mean two we learn that there are probably two regions in the genome where there is a gene which is controlling our phenotype. So that's good, we learned something, right? We
learned that, when we do a scan like this, there are probably two genes involved in the regulation of this phenotype.

All right, the next step: so far we are only looking at differences in the mean, but are these differences significant? We want to know if there is a significant difference between the one group and the other group. Let me switch back to Notepad++. We can do that in a very similar way, but now, instead of defining the two means, we define a vector which will hold our p-values; initially we haven't calculated any p-values. Then we do the same thing: we go through all of the columns of the genotypes again, we select the individuals which are 1, we select the individuals which are 2, and now we do a t-test between these two groups: the first phenotype of the individuals with genotype 1 versus the first phenotype of the individuals with genotype 2. I directly select the p-value from the t-test result. Because this is QTL mapping, we want to show these values as minus log10 p-values, and now that we have the code I can also show you why we take the minus log10 of the p-value. So let's get the code and show you guys the R window.

When I do this, we see that at some points in the genome there is some evidence, but here we see a massive peak, and here an even bigger peak. This is a minus log10 p-value of 15, which means that the p-value here is 1 times 10 to the minus 15: the chance of seeing a difference like this at random is 1 in 10 to the 15, so there is a high likelihood that there is a real difference between the two genotype groups. Here the evidence is weaker, but the p-value is still about 1 times 10 to the minus 7, which means that there's only a chance of one in a million, or one
in 10 million, at this point probably one in 10 million, of seeing this if there were no true difference in mean. And that's how you do QTL mapping; that's more or less everything that you need to know about QTL mapping. In this case we used a t-test, but in case you have three groups you of course cannot use a standard t-test; then you have to use another statistical test, like a linear model or a non-parametric test. Of course we're skipping over a lot of additional quality control that we could have done and should have done, like checking whether every phenotype is normally distributed, because we can only do a t-test when the phenotypes are normally distributed. But in this case it's just for you guys to practice a little bit: writing for loops, selecting individuals, then selecting the phenotypes for these individuals from the other matrix, and then calculating a mean, or in the other case doing a t-test. I hope that's clear, and I hope that everyone was able to do the assignments and get some results. If you have any questions, then of course let me know, and if you're working on the assignments and you get stuck halfway through, then definitely drop me an email and I can help you.

All right, if that is clear, then we are going to switch back to the PowerPoint. Let me switch to the PowerPoint here as well. So then we're going to talk a little bit about the history of PCR, and I want to keep this short; normally I talk for half an hour about Kary Mullis, of whom you see a photo here, because he's one of my favorite Nobel Prize winners. PCR was invented in 1983, so that's my year of birth, and it was invented by Kary Banks Mullis, like I told you. Kary Banks Mullis is a very interesting person. He's a scientist, and he's considered the godfather of
modern biology because of his invention of this PCR method. PCR allows you to amplify parts of the genome in such quantities that you can do testing: you can do corona testing using PCR, and you can test for certain parts of the genome. So it is one of the most valuable techniques that we currently have in molecular biology, and every molecular biology lab in the world uses it. It is used in cloning, in phylogenetics, in gene analysis, and in genetic fingerprinting by the police. You would think that, because this is such an influential technique, Kary Mullis made a lot of money out of it, but he did not really. And this is a true story: he claims that this technology, this new idea for doing PCR, was given to him by aliens. That's interesting. When asked what the aliens looked like: well, they looked a little bit like colorful ferrets, with all kinds of colors, and they told him that this would be the best technology to look into genetics. He figured it out, and it's one of the most valuable techniques in the world.

Kary Mullis has only written three papers in his whole life: two papers on PCR and one paper about time travel. He won a Nobel Prize for his PCR papers; not a lot of people are excited about the time travel paper. But if you are interested in reading a paper about time travel from one of the leading molecular biologists, because that's kind of what he is by his invention of this methodology, then contact me and I can give you a copy. If you want to read the paper and you go to Science, where I think it was published, then you have to pay about 20 bucks, so if you want a copy, just contact me. He got the Nobel Prize in Chemistry in 1993.
So he did get some acknowledgement, some financial acknowledgement, for developing this method, but the company that he was working for literally made billions and billions of dollars from his invention of PCR. He got a bonus in 1983 of $10,000; that was his company bonus for inventing this technology. He is also the author of a very interesting book called Dancing Naked in the Mind Field; I think you can find the book on Google Books. He's a very, very interesting person. He has his own Wikipedia page, there's so much written about him, and he's really one of my, well, I wouldn't say heroes, but he's one of these interesting figures in molecular biology who came out of nowhere, with no prior publications, published this PCR technology, became more or less world famous through that, and got a Nobel Prize ten years later. That is also very uncommon: normally it takes around 20 to 40 years before your publication or your invention is vetted in the scientific community, so to get a Nobel Prize only 10 years after inventing a methodology is really, really good. You don't see that very often.

All right, enough about the inventor of PCR. There are many, many stories that you can tell about him, and if you're really interested we can talk afterwards about Kary Banks Mullis. You actually see that a lot, that people go a little bit crazy after winning a Nobel Prize, but it's very, very interesting.

All right, so PCR: what do you need? If you want to do a PCR experiment you need some template DNA; the template DNA is the DNA that you want to amplify. You need water, a lot of water, or not so much a lot, but compared to the other ingredients you need a lot of water. You need to be able to do very precise thermal cycling. This is not really a massive requirement, but it is nowadays, if you want to do very small
volume PCRs, because then you have to be very, very precise in the thermal cycling. That means that you have to have a machine which can be at exactly 57 degrees and then very quickly go from 57 degrees to 73 degrees Celsius, or to 90 degrees Celsius, so you have to be able to change the temperature very quickly. You need a heat-stable polymerase. Nowadays almost everyone uses Taq, from the Thermus aquaticus bacterium, but there are many different polymerases out there. The polymerase is a protein which copies DNA. You need nucleotides, just the standard nucleotides that you can buy, so A, C, T and G for DNA. And you need oligonucleotides; these oligonucleotides are also called primers, and that's what we're going to learn today: how to design these primers for this experiment. And if you're like me, then you actually need an unlucky student to do the PCR experiment for you. I'm a bioinformatician, so I don't work in a lab; that's why we have bachelor, master and PhD students running around doing these PCR experiments in the lab. My job as a bioinformatician, when it comes to PCR, is only this little part: designing the oligonucleotides based on the template DNA.

All right, so in PCR we have three steps, and these three steps occur at different temperatures, which is why we need very precise thermal cycling. When we start off, we have the template DNA in solution; it's just double-stranded DNA which you, for example, extracted from a nucleus. What happens when you start heating up the water with this template DNA in it is that at 90 degrees Celsius the two strands of DNA loosen up and become single-stranded. By raising the temperature you raise the energy inside the mixture of water and DNA, and by heating it up to around 90 degrees Celsius you get single-stranded DNA. And this is because the polymerase needs
to... well, this is because we need to do things with the DNA. When the DNA is double-stranded it is more or less inaccessible, and things can't really bind to it, but by raising the temperature to 90 degrees the DNA becomes single-stranded.

The next step is then to anneal your primers to the DNA, and this happens at around 54 degrees Celsius. So what you do is: you have your mixture of water, nucleotides, primers and template DNA, you heat it up to 90 degrees Celsius, you keep it at 90 degrees for around 30 seconds, and after those 30 seconds you quickly reduce the temperature to 54 degrees Celsius. What now starts happening is that the DNA starts rebinding, but because primers are much, much smaller than the template DNA, the primers bind very quickly, before the template DNA can more or less close again. So here we see the little primers, and these primers just hybridize to the DNA at their complementary sequence; of course they can only hybridize to complementary sequences.

The next step is the elongation step. We raise the temperature to around 72 degrees Celsius, and at 72 degrees the polymerase binds to the double-stranded DNA and starts extending it in the three prime direction. Here we see that there's a primer bound at the three prime end of the first strand of DNA, and the polymerase will just keep extending the DNA until the time is up. A normal polymerase can copy around a thousand base pairs every minute, or slightly more; it might be 1,500, it might be 2,000, depending on the polymerase. It just starts copying the DNA, and of course this uses up the nucleotides that we put into the mixture. So what happens is that you get, more or less, an exponential amplification of the target DNA. Here we have the template DNA, where we have our piece of interest.
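To make the three temperature steps concrete, here is a small sketch of one cycle as a program, in Python. The temperatures, the 30 second denaturation and the copying speed of roughly 1,000 base pairs per minute are the round numbers from the lecture; the 30 second annealing time, the function names and the 500 base pair product are assumptions made only for this illustration, and ramp times between temperatures are ignored.

```python
# One PCR cycle as a list of (step name, degrees Celsius, seconds).
# The 30 s annealing time is an assumed placeholder, not a lecture value.

def extension_seconds(product_length_bp, bp_per_minute=1000):
    """Time the polymerase needs to copy `product_length_bp` bases."""
    return 60.0 * product_length_bp / bp_per_minute

def pcr_cycle(product_length_bp, bp_per_minute=1000, anneal_seconds=30):
    """Denature, anneal, extend: the three steps of a single cycle."""
    return [
        ("denature", 90, 30),
        ("anneal", 54, anneal_seconds),
        ("extend", 72, extension_seconds(product_length_bp, bp_per_minute)),
    ]

def total_minutes(product_length_bp, cycles=35, **kwargs):
    """Total hold time of the whole run, ignoring temperature ramping."""
    cycle = pcr_cycle(product_length_bp, **kwargs)
    return cycles * sum(seconds for _, _, seconds in cycle) / 60.0

# Hypothetical 500 bp product.
for name, temperature, seconds in pcr_cycle(500):
    print(f"{name:9s} {temperature} C for {seconds:.0f} s")
print("total run time in minutes:", total_minutes(500))
```

The only step that depends on the product is the extension: a faster polymerase or a shorter product shortens every cycle, which adds up over 35 cycles.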
So we designed primers to amplify this piece: we have two primers, one forward and one reverse, which are an exact match to the flanking regions of the DNA that we want to amplify. Then in the first cycle we create two copies of this DNA, and in the second cycle each of these copies is copied again. So after the first cycle you have two copies of the DNA, because you start with one and it is amplified, so you have the original one and the new one. In the second round these two copies are copied again, so we have four copies, going to eight copies, to 16 copies, and if you do 35 cycles you end up with around 34 billion copies of your target DNA. This allows you to get large quantities of the DNA between the primer sequences that you selected.

Of course this is a simplification; what really happens is slightly different. In the first cycle you don't really get any product that you are interested in. When we look at it in detail, we have the DNA of interest here, going five prime to three prime. We open up the DNA, we bind the primer, and in the first round it is extended, but the extension does not stop at the position of the other primer; it just continues. It's the same for the other strand: you have a reverse primer binding here, going backward, and a forward primer binding here, going forward. So in the end you have two long pieces of DNA from your original template, and two new pieces of DNA of more or less unknown length, which start at the position where the primer is. So you have one piece of DNA which is relatively long going to the left side, and another piece of DNA which goes to the right
side. In the second cycle we also don't get any product yet. When we open up these pieces of DNA, we have the long new strand, and we have the template DNA, which will always still be there and gives an amplification just like the previous one. On the new strand a reverse primer binds and the polymerase extends the primer. The strand made in the first cycle starts at the position of the forward primer, so when the reverse primer is extended along it, the extension stops at that position. This is where we get a single-stranded piece of DNA which matches the length that we are interested in, and the same thing happens in the other branch of the tree. Only in the third cycle do we have our first real amplification, because only then do we start copying this little piece of DNA: it opens up and is amplified, and because it already ends at the correct position, the resulting piece of DNA is now of the length that we expect. I hope this is clear. So it takes two cycles in PCR to warm up, to kind of cut the template down: we first cut the template at the five prime end and then at the three prime end, or first at the three prime end and then at the five prime end, and then in the third cycle we get our first piece of DNA which is of the correct length and double-stranded. Of course these open up again, and in the fourth cycle you actually jump directly to eight copies. So it's not, like we saw before, that you get two, four, eight, sixteen. Because of the way that PCR works, you go from zero, to zero, to two valid copies, and then directly to eight valid copies, because you still have the strands from the rounds before in the PCR reaction. All right, I hope that is clear. So if someone asks you how to
calculate the yield from PCR, then you should be able to do that. With plain doubling it is of course two, four, eight, sixteen, so you end up with around 34 billion copies after 35 cycles, but you have to remember that more or less two cycles are needed to warm up. So only after 37 cycles are you at 34 billion copies, and the formula for x cycles is not two to the power of x, but two to the power of x minus two.

All right, so let's talk about what a good primer is. A good primer is unique, and I will be saying this a lot, because it is the thing that is most difficult to achieve. The first point on the slide is lack of a secondary priming site: the primer is not allowed to bind to multiple locations in the genome; it is only allowed to bind to a single location, otherwise your primer is not specific enough.

The primer needs to have a suitable melting temperature. The melting temperature is the temperature at which double-stranded DNA opens up, here the duplex formed by the primer and its complementary sequence. That duplex needs to go from double-stranded to single-stranded at a much lower temperature than the genomic DNA, somewhere between 52 and 65 degrees Celsius. This is of course because the primer has to attach to the DNA at the low temperature, the 54 degree mark that we saw here, and stay bound there. Of course the primer also goes through the 90 degree step, so it sees the same temperature drop, but in general, if you want a very successful PCR, you want the melting temperature of the primer to be between 52 and 65 degrees Celsius.

The primer has to have an absence of dimerization capability. We will get back to that, but it means that a primer is not allowed to bind to itself, or to any of the other primers in the
in the reaction. We could do a reaction with five primers or ten primers, and these are not allowed to stick together. So when you design a forward primer and a reverse primer, you always have to make sure that their sequences are not such that they can bind together. You also always want to make sure that there cannot be any significant hairpin formation. A hairpin formation is when a primer folds back on itself. A normal revolution of DNA, if you look at a DNA helix, is around ten base pairs, so every ten base pairs the DNA revolves in a helical structure. That means that when you have a primer which is 20 base pairs long, it can actually fold back on itself. So you want to make sure that inside of the primer there is not a sequence which is complementary to another sequence within the primer itself. And most importantly, you want to have low specific binding at the three prime end. That means that you want to have a lower GC content at the three prime end of your primer. This is because, for a polymerase to work properly, the primer cannot be bound too tightly to the DNA at the three prime end. If we look here, the three prime end is where the extension starts, and this three prime end needs to be a little bit loose. If it sticks too tightly to the template DNA, then the polymerase will bind but it's not able to start amplifying the DNA. So you want to have lower binding, which means more A's and T's than G's and C's at this three prime end of the primer.
All right, so let's go through all of these in more detail. Lack of a secondary priming site means that the primer needs to be unique, and the rule is: there shall be one and only one target site in the template DNA where the primer binds. So the primer sequence shall be unique in the template. And this uniqueness of course does
not only hold for the template DNA that you are amplifying. If we are working in the lab here, we work on mice, we work on chicken, we work on goats and on cows, and these are all possible sources of contamination. When we do a PCR amplification of mouse DNA, then of course we want to make sure that our primer is not able to bind to human DNA. When I'm doing the pipetting, when I'm making the master mix, I don't want any of my DNA to be in the cup, but you can't really prevent little flakes of your skin falling into the cups that you are using. So if you are amplifying mouse DNA, you also want to make sure that the primer that you are using is unique to mouse and does not occur in human DNA, because there are always little pieces of skin floating around which can end up in your reaction mixture. By making sure that your primer cannot bind to human DNA, you exclude humans as a possible source of contamination. In our lab we always check our primers against human, against mouse and against cattle, because those are the three main species that we are working with, and we want to exclude that, when we are doing a reaction on mice, some contamination from humans or from cattle comes in. This is very easy to do, because when you have designed your primer you can just do a BLAST search against the corresponding genomes. So if you're working on plants, you design a primer which is unique to your plant, you take the primer, and you just BLAST it against the human database to make sure that your primer cannot bind to human DNA.
All right, then the length of the primer is very important in this sense, because the length of the primer has an effect on its uniqueness: the longer the primer, the more chance that it is unique. Generally we end up having primers which are around 20 to 25 base pairs, and this is because this is the amount of base pairs that you need
for a primer to be unique in the target DNA, but also not be able to bind to DNA from, for example, humans. So the longer the primer, the more chance that it is a unique primer, that there's only one binding site. However, when you extend the primer, the annealing temperature goes up, and that's the interplay here. Normally you would say: well, we can make a primer which is 80 base pairs long and that will guarantee the uniqueness. But of course then you're breaking the Tm part, the melting temperature, because the melting temperature needs to be between 52 and 65 degrees Celsius. So we can't just make primers infinitely long to make them unique; it's always a balance between the two. Generally speaking, the length of the primer has to be at least 15 base pairs to be unique, and we in our lab usually pick primers which are between 17 and 28 base pairs long. That is because when you are at 28 base pairs you're really pushing the annealing and the melting temperature towards what is maximally allowed for the reaction that you're doing.
All right, so the base composition of the primer itself. If you look at the bases, the A's, C's, T's and G's, the composition affects the hybridization specificity and the annealing temperature. An AT pair only has two hydrogen bonds, so they have two bindings to each other, while a CG pair has three. So as a rule of thumb you can say that, compared to an AT pair, a CG pair binds 50% more tightly, because there's a third hydrogen bond that keeps them together. So you have two hydrogen bonds in an AT pair and three in a GC pair, and you want to balance that. If your primer contains a large amount of C's and G's, it will bind very strongly to the template DNA, and of course this will also raise the melting temperature
because you need more energy to have a CG-rich primer go from being double-stranded to being single-stranded. So usually the average CG content should be around 50 to 60 percent, and that will give us the right melting and annealing temperature. However, melting temperature and hybridization can be affected by other factors, and the CG content is not fixed. If you are dealing with, for example, bacteria, some bacteria have a very high CG content. If the template DNA is 80% CG, then of course you have to design a primer which is complementary to that, so your primer ends up having an 80% CG content as well. The base composition is something that you can play with, because you can move your primer from left to right along the template, so it is allowed to change. But on average, especially when you're dealing with plants or humans or mice or cattle, you want to have it at around 50 to 60 percent, which is very similar to the normal CG content of a human or a mouse or a plant genome. But of course this varies.
All right, so the melting temperature is the temperature at which half of the DNA strands are single-stranded and half of them are double-stranded. If you heat up DNA in water, then at 20 degrees Celsius all of the DNA will be double-stranded, but when you start raising the temperature the strands start dissociating from each other, and at a certain point half of the DNA is double-stranded and half is single-stranded. That is called the melting temperature. The melting temperature is characteristic of the DNA composition: like I said, a higher CG content gives a higher melting temperature, because more hydrogen bonds need to be broken. You can calculate the Tm relatively easily when you have very short pieces of DNA. When the primer that you are designing is less than 13 base pairs, you can use this formula to calculate the melting temperature: it is the number of A's plus the number of T's
times 2, plus the number of G's plus the number of C's times 4. So here it's not 2 to 3, but this gets a times 4, and this is the melting temperature of your DNA. So if we have a piece of DNA which has 4 A's and T's and 4 C's and G's, then 4 times 2 is 8 and 4 times 4 is 16, so it's 8 plus 16, and that's 24 degrees Celsius at which half of the DNA strands are single-stranded. But that's of course only when they are shorter than 13 base pairs. When they are longer than 13 base pairs you have to use the more complex formula. The more complex formula is here: it is 64.9 plus 41 times the number of G's and C's minus 16.4, divided by the total number of bases that you have (Tm = 64.9 + 41 x (G + C - 16.4) / (A + T + G + C)). And the nice thing about this formula is, if you would plot it in R, going from very low to very high amounts of C's and G's and A's and T's, you end up in this 90 degrees Celsius range. Because you have 64.9 plus 41 times some number divided by the total number. So if the total reaches like 100 million, so it kind of goes to infinity, then the top part goes to half of infinity, so it's like 1 divided by 2, and that would mean that you end up with 64.9 plus around 20, which is around 85 degrees Celsius. So you can calculate the Tm of genomic DNA as well, but that will always end up being around 84 to 90 degrees Celsius, depending on the CG content. There are calculators online which allow you to calculate it automatically, but you can also easily calculate by hand what the melting temperature of your DNA will be.
So the annealing temperature. The annealing temperature is the temperature at which the primers bind to the template DNA. It is calculated from the Tm, and it's just the melting temperature minus 4 degrees Celsius. This is just a rule
of thumb: if you calculate the Tm, the melting temperature, then the annealing temperature that you want to use, the temperature at which you do the second step of the PCR reaction, is just the melting temperature minus 4 degrees Celsius.
All right, so secondary structures are things that we want to avoid. If primers can anneal to themselves or anneal to each other rather than to the template, the PCR efficiency will decrease dramatically. For example, you can have hairpin structures, in which a primer folds back on itself, and we can have self-dimerization. Here we have two times the same primer, but one copy is flipped so that it runs the other way around, and you see that now the primer is able to bind to itself. Because it can bind to itself it will preferentially bind to itself, because of course there's a lot of primer in the reaction but only very little template DNA. So there's a very big chance that if things are self-complementary then self-dimers will form, and this will reduce your PCR efficiency very dramatically. Primers always come in pairs, because we have to specify from where to where we want to copy, so we have a forward one and a reverse one, and we also need to make sure that these primers cannot bind to each other. So if we use a forward primer, then of course there is not allowed to be a complementary sequence in the reverse primer. Sometimes these secondary structures are harmless, when the annealing temperature does not allow them to take place. For example, some dimers or hairpins form at 30 degrees Celsius, but if you are doing PCR then you never get below 60 degrees Celsius. So some of these are more harmful than others. When you design primers using computer software, the software will also give you an overview, for example that this hairpin might happen at a temperature of 53 degrees
Celsius. Then you say: well, in my PCR reaction I'm never going to be at 53 degrees Celsius, so this is not an issue. However, make sure that when you do primer design you do check the primer for secondary structures like hairpins, self-dimerization, or dimerization with one of the other primers in the reaction.
All right, so I've been talking now for 52 minutes. I will stop the recording and we will do the first break, and I will be back at like 3:10, and then we will continue with the rest of the lecture. So let me stop
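As a rough sketch of the melting temperature rules from this part of the lecture, the two formulas and a very crude dimer check could be written like this in Python. The function names are made up for illustration, the code assumes that a sequence only contains the letters A, C, G and T, and the dimer check is a simple stand-in of my own, not what real primer design software does (it ignores temperature and loop geometry).

```python
# Hypothetical helper names; sequences are assumed to contain only A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_count(seq):
    """Number of G and C bases in the sequence."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def gc_fraction(seq):
    """Fraction of the bases that are G or C (0.5 means 50 percent)."""
    return gc_count(seq) / len(seq)


def tm_short(seq):
    """Short-primer rule from the lecture: 2 * (A + T) + 4 * (G + C)."""
    gc = gc_count(seq)
    at = len(seq) - gc  # valid only under the A/C/G/T assumption
    return 2 * at + 4 * gc


def tm_long(seq):
    """Longer-sequence rule from the lecture: 64.9 + 41 * (G + C - 16.4) / N."""
    return 64.9 + 41 * (gc_count(seq) - 16.4) / len(seq)


def reverse_complement(seq):
    """Reverse complement, so that both strands read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_run(primer_a, primer_b):
    """Length of the longest stretch of primer_a that can base-pair with
    primer_b. Pass the same primer twice to screen for self-dimers.
    This is an assumption-laden screen, not a thermodynamic model."""
    a = primer_a.upper()
    b_rc = reverse_complement(primer_b)
    best = 0
    for i in range(len(a)):
        # Only try stretches longer than the best one found so far.
        for j in range(i + best + 1, len(a) + 1):
            if a[i:j] in b_rc:
                best = j - i
            else:
                break
    return best


if __name__ == "__main__":
    print(tm_short("AATTCCGG"))  # 4 A/T and 4 G/C: 8 + 16 = 24
    print(tm_long("ACGTACGTACGTACGTACGT"))
    print(gc_fraction("ACGTACGTACGTACGTACGT"))
    print(longest_complementary_run("GAATTC", "GAATTC"))
```

A long run returned by `longest_complementary_run` only says that two primers could stick together; whether that matters still depends on the temperature at which the structure forms, as discussed above.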