Welcome back. We will be talking about secondary structures and a little bit more. The thing about primers is that they work in pairs, right? You have a forward and a reverse primer, and both are used in the same PCR reaction. So besides making sure that they cannot hybridize to each other, you also have to make sure that both are suited to the same reaction conditions. That means that their melting temperatures, and therefore their annealing temperatures, cannot differ too much. You cannot have a forward primer that is 30 bases long with an annealing temperature of around 63 degrees Celsius and a reverse primer that is only 20 bases long with an annealing temperature of around 50 degrees Celsius, because that won't work. So the critical rule when you design primers is that the difference in annealing temperature between the two primers should be at most about three degrees Celsius. The closer their annealing temperatures are to each other, the better your primers will work. If you are outside that window, say five or six degrees Celsius apart, your PCR reaction might still work, but your yield will be very low: the amount of DNA that you get will be lower than what you would normally get.

All right, so a very basic summary of primer design. Make sure that your primer is unique: it should only bind to the target DNA that you want to amplify and not to other DNA that might be floating around, especially human DNA if you are not working on humans. So if you are working on plants, make sure that your primer does not bind to human DNA. The length of a primer should be between 17 and 28 bases, but this varies and you can
go a little longer, and you can also go a little shorter. Of course, the shorter you go, the harder it becomes to find a primer that is unique. The GC content should be around 50 to 60 percent. You need to avoid long stretches of A's and T's or of G's and C's, so you cannot really place primers in repeats in the genome. You need to optimize base pairing to minimize false priming; as I told you, the 3' end of the primer should have low stability. This is because a 3' end that binds very tightly can be extended by the polymerase even when the rest of the primer is not properly matched, and that gives you false priming. The melting temperature of a primer should be between 55 and 80 degrees Celsius. Generally you don't want it above 70 degrees Celsius, but as a rule of thumb, if the melting temperature of your primers is between 55 and 80 degrees Celsius, you should be able to get a successful PCR reaction. Because primers work in pairs, their annealing temperatures have to be very similar: never design or order a primer pair when the difference between their annealing temperatures is more than three degrees Celsius. And you have to minimize internal structures, so you have to avoid hairpins and dimers, right?
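The rules of thumb above can be turned into a quick checklist. This is only an illustration under my own assumptions: it estimates the melting temperature with the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) in degrees Celsius, which is a rough approximation meant for short oligonucleotides (real primer design tools use nearest-neighbour thermodynamics), and it uses the melting temperature difference as a stand-in for the annealing temperature difference. The limits for base runs (at most 4 identical bases) and 3' stability (at most 3 G/C among the last 5 bases) are common rules of thumb, not values from this lecture. Function names and example primers are hypothetical.

```python
import re


def wallace_tm(primer: str) -> float:
    """Rough Tm estimate in deg C (Wallace rule): 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base, e.g. AAAA -> 4."""
    return max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper()))


def check_primer(seq: str, min_len: int = 17, max_len: int = 28,
                 gc_range: tuple = (0.50, 0.60), max_run: int = 4,
                 max_gc_3prime: int = 3) -> list:
    """Return a list of rule violations; an empty list means the primer passes."""
    if not seq:
        return ["empty sequence"]
    seq = seq.upper()
    problems = []
    if not min_len <= len(seq) <= max_len:
        problems.append(f"length {len(seq)} outside {min_len}-{max_len}")
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    if not gc_range[0] <= gc <= gc_range[1]:
        problems.append(f"GC content {gc:.0%} outside {gc_range[0]:.0%}-{gc_range[1]:.0%}")
    run = longest_run(seq)
    if run > max_run:
        problems.append(f"run of {run} identical bases")
    # Low 3' stability: limit G/C (three hydrogen bonds each) in the last five bases.
    if sum(base in "GC" for base in seq[-5:]) > max_gc_3prime:
        problems.append("3' end too GC-rich")
    return problems


def pair_is_compatible(forward: str, reverse: str, max_diff: float = 3.0) -> bool:
    """True if the two primers' estimated Tm values differ by at most max_diff deg C."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_diff


if __name__ == "__main__":
    fwd = "AGCGTACCTGAAGTCCATGA"  # hypothetical forward primer
    rev = "TCGGAACTGTTCAGCAGTCA"  # hypothetical reverse primer
    print(check_primer(fwd), check_primer(rev))
    print(wallace_tm(fwd), wallace_tm(rev), pair_is_compatible(fwd, rev))
```

Returning a list of problems rather than a single yes/no makes it easy to see which rule a candidate breaks and how far you would have to move or extend it.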
Those should not occur: a primer should not be able to bind to itself, and it should not be able to fold back on itself, because that will just make your PCR reaction fail.

All right, advanced primers. Advanced primers are primers with which you do multiple things in one go. For example, in multiplex PCR you use primers to amplify not just a single region of the genome but several regions at once. Or you are trying to amplify not just one virus but a whole family of viruses. So we have multiplex primers, universal primers, semi-universal primers and guessmers, and we will go through these four primer designs. This is relatively advanced, and normally you would not be expected to do it yourself. Guessmers especially: I think hardly anyone designs guessmers anymore, because sequencing is so cheap that you don't have to guess anymore. However, in the past I designed a whole bunch of guessmers, which was really fun.

All right, so multiplex PCR is when you have multiple primer pairs in the same tube, right? You want to amplify two parts of the genome: gene X and gene Y in the same go.
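Because a primer must not bind to itself and, in a multiplex reaction, must not bind to any other primer in the tube either, the number of checks grows quickly: with n primers there are n*(n-1)/2 distinct pairs plus n self-pairings, so 20 primer pairs (40 primers) already give 780 distinct pairs. The sketch below, under my own simplifying assumption, only flags 3'-end complementarity: it asks whether the last k bases of one primer (k = 4 here, an arbitrary choice) have an exact complementary match anywhere in the other primer, in either direction. Real tools score partial matches thermodynamically. Function names and sequences are hypothetical.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_dimer(a: str, b: str, k: int = 4) -> bool:
    """True if the last k bases (3' end) of primer a can pair with a stretch of primer b.

    Pairing is antiparallel, so the matching stretch in b, read 5'->3',
    is the reverse complement of a's 3' tail.
    """
    return revcomp(a[-k:]) in b.upper()


def pairs_to_check(n_primers: int) -> int:
    """Number of distinct primer-primer combinations (self-dimers not included)."""
    return n_primers * (n_primers - 1) // 2


def dimer_report(primers: dict, k: int = 4) -> list:
    """List (name, name) combinations, including self-pairs, with 3' complementarity."""
    flagged = []
    for x, y in combinations_with_replacement(list(primers), 2):
        a, b = primers[x], primers[y]
        if three_prime_dimer(a, b, k) or three_prime_dimer(b, a, k):
            flagged.append((x, y))
    return flagged


if __name__ == "__main__":
    tube = {  # hypothetical primers sharing one multiplex tube
        "p1": "CCCCCCCCAGGT",
        "p2": "ACCTCCCCCCCC",
        "p3": "TTTTTTTTTTTT",
    }
    print(pairs_to_check(40))  # 20 primer pairs = 40 primers
    print(dimer_report(tube))
```

Balancing the melting temperatures across all pairs is a separate check; this sketch only covers the dimer side.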
So we have multiple primers in the same tube. We sometimes also use three-primer systems, where there is an extra primer in the middle: if there is a deletion, this primer cannot bind. But multiplex PCR in general is very common, especially in sequencing projects where you are not interested in a single gene but in two or three genes at the same time, and you want to amplify these genes and then sequence them afterwards. One application of multiplex PCR is genome identification. For example, the genetic test panel that is used by the police also targets multiple parts of the genome. They use a primer mix of 20 known primer pairs, and the distance between the two primers of each pair, and therefore the length of the product, is slightly different for every individual. Some individuals have a length of 50 between the two primers of one pair, other people have 60, and this varies for each of the primer pairs. So the main design difficulty in multiplex PCR is to make sure that the melting temperatures are similar for all of the primer pairs that you are using. And of course, dimer formation becomes much more likely. If you have two primers, the chance of those two binding together is relatively low, but if you are using 20 different primer pairs, you have to check each primer against the 39 other primers to make sure that they cannot form a dimer. So making a multiplex PCR is just more involved. But in theory, multiplex PCR is nothing more than having multiple primer pairs in a single reaction: instead of amplifying one part of the genome, you amplify two, three or four parts of the genome, and this is very useful for
genome identification.

Normally a primer pair is designed to amplify one product. With universal primers, however, you can amplify multiple products, and that is why we call them universal primers. One of their main uses is to amplify the same gene from all the different human papillomavirus types. Like the flu, human papillomavirus (HPV) has many different viral variants: there is not just one virus, there is HPV1, HPV2, HPV3 and so on, well over a hundred types. So you have all kinds of viruses that look very similar but are slightly different. The strategy here is that you align the sequences that you want to amplify, and then you find the most conserved regions at the 5' end and at the 3' end. You design a forward primer in the conserved region at the 5' end, and you do the same at the 3' end with a reverse primer. You match forward and reverse primers to find the best pair. Then you still have to make sure that the primers bind uniquely in every template sequence and that they do not bind to possible contamination sources. So how does this work? Fortunately, I have a board. Take the human papillomaviruses: we have virus one, which looks like this, and virus two, which is very similar but has a small deletion in the middle.
When we align it to the first sequence, the second one has no sequence at that position. In the third one we see the same thing, but now with a bigger deletion towards the end. Say we want to amplify all of these. Then we have to find a region at the beginning that is shared between all the sequences, and a region at the end that is shared between all the sequences. We can then design a forward primer in the first region and a reverse primer in the second, and of course both primers can bind to all four sequences. So now we can amplify any of the four HPV types that we are interested in. Is that clear? It's the same as before, but now we use a single primer pair, and we first have to align all the target sequences with each other to find two regions, one at the front and one at the back, where we can target our primers. That is how you make universal primers.

You also have semi-universal primers, which follow the same idea, but now you only want to amplify, for example, the first six human papillomavirus types and not the others, so you don't want to target number seven.
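The universal and semi-universal strategies can be sketched on a toy alignment. This is a minimal illustration under my own assumptions: the sequences are already aligned (equal length, "-" for gaps), a universal primer site must be identical and gap-free in every target, and for a semi-universal site every non-target must differ from the targets in at least one position of the window. Real designs usually tolerate a few mismatches or use degenerate bases, and discriminating positions work best near the primer's 3' end; neither refinement is modelled here. For the reverse primer you would take the reverse complement of a conserved window near the end of the alignment. Function names and sequences are hypothetical.

```python
def conserved_windows(aligned: list, k: int) -> list:
    """0-based start columns of length-k windows identical and gap-free in all sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    starts, run = [], 0
    for i in range(length):
        column = {s[i] for s in aligned}
        ok = len(column) == 1 and "-" not in column
        run = run + 1 if ok else 0  # length of the current conserved stretch
        if run >= k:
            starts.append(i - k + 1)
    return starts


def semi_universal_windows(targets: list, non_targets: list, k: int,
                           min_mismatches: int = 1) -> list:
    """Windows conserved in all targets where each non-target differs in >= min_mismatches columns."""
    result = []
    for start in conserved_windows(targets, k):
        window = targets[0][start:start + k]
        if all(sum(a != b for a, b in zip(window, nt[start:start + k])) >= min_mismatches
               for nt in non_targets):
            result.append(start)
    return result


if __name__ == "__main__":
    viruses = [  # hypothetical aligned variants: a deletion in the middle, one mismatch
        "ACGTACGTAAGGCC",
        "ACGTAC--AAGGCC",
        "ACGTACGTTAGGCC",
    ]
    print(conserved_windows(viruses, 5))  # candidate sites shared by all variants
    print(semi_universal_windows(["ACGTACGTAA", "ACGTACGTAA"], ["ACGTTCGTAA"], 4))
```

In the semi-universal case a gap in a non-target counts as a mismatch, which fits the idea that a deletion there would stop the primer from binding.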
Again, you align all the HPV genes, so all of these viral genomes, to each other, and then you identify a subset whose members are more similar to each other than to the rest. In this case we want to look at types one to six. We look for the regions that are conserved within that subset and design our forward primers there. Imagine that on the board we now have a fifth sequence next to one, two, three and four, which is slightly different: it might have a little deletion here and a large deletion at this point. What we now want is a region for the forward primer where the first four are identical to each other but where the fifth one differs from them. So we are doing the same thing as before, but now we are looking for a primer that only amplifies the first four, and this quickly becomes very difficult, especially if you want to do multiple genes in the same go. If you combine semi-universal primers with, for example, multiplex PCR, it becomes a real puzzle to figure out where exactly to target your primers and what they should look like to avoid amplifying sequences that you are not interested in. So the strategy for semi-universal primers is similar to that for universal primers, but you don't only want to find a region where all your targets are similar: the other virus variants must also be different at that point.

My favorite primer design is the guessmer. You design a guessmer when you do not have a DNA sequence available: if you are working on a species that no one has worked on before, there is no template DNA sequence available.
You don't have any sequence data, but you do know that this animal has, for example, a certain protein, and you want to amplify the gene for this protein in this unknown species. So here we use the homology trick again, because we know that, for example, hemoglobin is not that different between humans and mice. Imagine that we do not have the mouse genome sequence available. Then we would use the human sequence of the gene of interest, for example hemoglobin: we take the human hemoglobin protein sequence, translate it back to DNA, and design our primers based on that. The same applies if we are interested in hemoglobin not only in mice but in other species that do not have a genome sequence available. So when DNA sequences are unavailable, a single protein or a group of related proteins can be back-translated into nucleotide sequences, and these are then used as a template to design our primers. Translating from protein back to DNA is possible, but it has its problems, because there will be slight differences. Like I told you, every amino acid is coded by a triplet, but most amino acids can be coded by more than one triplet, and these synonymous codons mostly differ in their last base.
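Back-translation and the ambiguity it introduces can be illustrated with the standard genetic code. This is an assumption-laden sketch, not a guessmer design tool: it assumes the target organism uses the standard code (NCBI translation table 1; mitochondrial genomes, for example, use different tables), writes each amino acid as IUPAC ambiguity codes, and counts how many DNA sequences could encode a stretch of protein. The stretch with the lowest count is the safest place for a guessed primer. For the six-codon amino acids (Leu, Ser, Arg) the per-position IUPAC string over-approximates the real codon set. Function names and the peptide are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Amino acid -> list of synonymous codons ('*' = stop).
SYNONYMOUS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMOUS.setdefault(aa, []).append(codon)

# Set of possible bases -> IUPAC ambiguity code.
IUPAC = {frozenset(bases): code for bases, code in [
    ("A", "A"), ("C", "C"), ("G", "G"), ("T", "T"),
    ("AG", "R"), ("CT", "Y"), ("CG", "S"), ("AT", "W"), ("GT", "K"), ("AC", "M"),
    ("CGT", "B"), ("AGT", "D"), ("ACT", "H"), ("ACG", "V"), ("ACGT", "N"),
]}


def back_translate(protein: str) -> str:
    """Degenerate DNA (IUPAC codes) for a protein; over-approximates Leu/Ser/Arg."""
    out = []
    for aa in protein.upper():
        codons = SYNONYMOUS[aa]
        for pos in range(3):
            out.append(IUPAC[frozenset(c[pos] for c in codons)])
    return "".join(out)


def degeneracy(protein: str) -> int:
    """Exact number of DNA sequences that encode this protein stretch."""
    return prod(len(SYNONYMOUS[aa]) for aa in protein.upper())


def least_degenerate_window(protein: str, k: int) -> tuple:
    """(start, peptide) of the length-k stretch with the fewest possible encodings."""
    start = min(range(len(protein) - k + 1), key=lambda i: degeneracy(protein[i:i + k]))
    return start, protein[start:start + k]


if __name__ == "__main__":
    peptide = "LLMWKL"  # hypothetical protein fragment
    print(back_translate(peptide))
    print(least_degenerate_window(peptide, 2))
```

Methionine and tryptophan have a single codon each, which is why stretches rich in them are attractive anchor points for a guessed primer.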
When you have a three-base codon, the third base of the codon is called the wobble base, and it is more or less free to vary because of the way the tRNA anticodon pairs with the codon at the ribosome. So at every third base you could make an error: you could say "I'm targeting a C here", but the animal that you are looking at actually has a T there. So what you have to do is either know the codon usage bias of the animal, which you probably don't, or use the codon table to back-translate. So we design based on the protein sequence of another animal whose protein sequence we know, we use the codon table to go back from the protein sequence to a DNA sequence, and then we design a primer based on this hypothetical DNA sequence. Of course there might be errors in it, so there might be mismatches, and therefore we have to make our primers longer. So: we back-translate the protein sequence using the corresponding codon table, we identify regions at the 5' end and at the 3' end where we are least likely to have made a mistake, and then we design and match forward and reverse primers as before. But now we make our primers around 30 or 35 bases long, so that if some nucleotides do not match, the primer can still bind despite these mismatches. If you have a primer that is 30 or 40 bases long, then as long as around 30 out of 40 bases pair, it is still possible for the primer to bind to the DNA, so it is still possible to get a product. Not only do you use longer primers, you also use a slightly higher annealing temperature, and the higher annealing temperature increases the stringency of primer annealing. Guessmers are really fun to make because there is an additional step: you go from protein to DNA code, and then, based on this hypothetical DNA code, you are
starting to design your primers. This is an iterative process, so you have to do it a couple of times; you never get it right the first time. But it's very fun to make guessmers.

All right, so primers can be designed to serve a multitude of purposes: you can do multiplex PCR, you can design universal and semi-universal primers, you can design guessmers, and there are many other strategies for designing primers for different applications. There are also many fields where primer design skills are required, for example real-time PCR, such as measuring coronaviruses. Especially when you work in a lab, you often have to design your own primers; normally that is the job of the PhD students. So the fields where primer design skills are needed include real-time PCR and population polymorphism studies, where you target microsatellites, AFLPs or SNPs.
But the basic rules of every primer design are the same. Achieve the appropriate hybridization specificity: make sure that your primer is unique, so that it can only bind to the DNA of the target species and not to DNA from any contamination source. And make sure that there is enough stability: your primer has to be able to bind, and bind properly, while its 3' end is not bound too tightly, because an overly stable 3' end is a common cause of false priming.

All right, searching in databases, because of course we have to deal with databases and genome browsers. We are going to look at Ensembl: how do we find the genomic location that we want to target our primers to, and how do we export our sequences? I initially wanted to do a live demo, but I put screenshots in the slides instead, because otherwise I would have to keep switching between Firefox and the normal window. When you want to design primers, you need to figure out which part of the genome you want to target, and you can do this using a genome browser. Genome browsers like Ensembl or UCSC visualize genetic information. You have a genetic sequence, and this sequence codes for different genes or different microRNAs. What a genome browser does is take the genetic sequence and add information on top of it: in this region there is a certain gene, here is an exon, here is an intron, then there is an exon again, and this is the promoter region. It lets you use different scales, so you can zoom in and out. But everything here is based on a coordinate system.
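Because everything in a genome browser hangs on coordinates, the annotation layer can be thought of as a list of features with start and end positions, and asking "what is at this position?" becomes an interval lookup. The sketch below assumes 1-based, inclusive coordinates (as Ensembl displays them); the gene layout is made up for illustration, and the Feature type and features_at function are hypothetical, not part of any browser's API.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    name: str
    kind: str   # e.g. "promoter", "exon", "intron"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def features_at(position: int, features: List[Feature]) -> List[Feature]:
    """All annotation features that overlap a single genomic position."""
    return [f for f in features if f.start <= position <= f.end]


# Hypothetical gene layout on one chromosome.
GENE_X = [
    Feature("geneX promoter", "promoter", 900, 999),
    Feature("geneX exon 1", "exon", 1000, 1200),
    Feature("geneX intron 1", "intron", 1201, 1800),
    Feature("geneX exon 2", "exon", 1801, 2000),
]

if __name__ == "__main__":
    for pos in (950, 1100, 1500, 1900):
        print(pos, [f.kind for f in features_at(pos, GENE_X)])
```

A real browser stores millions of such features and uses indexed interval structures rather than a linear scan, but the question it answers is the same.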
The coordinate system that we use in genetics is not fixed, and this is a little bit annoying. In some genome browsers, such as Ensembl, the first base of a chromosome is position 1, while UCSC stores positions in its database tables and in BED files counting from 0, so base one of chromosome one is position 0 there (the UCSC browser display itself shows positions starting at 1). So the first thing to remember is that there can always be an offset of one in where the genome coordinates start. The idea behind a genome browser is to integrate different information sources, for example a protein database, which is then incorporated and shown on top of the DNA sequence. You have your DNA sequence, you have the introns and exons, and you have the protein level on top of that; when you go to Ensembl, this is how they present the information to you. So, very basically, how does this work? You have your lab experiments, which are fed into a computer; you get textual data, for example XML or JSON, representing your experiment; and people put this data into a database. Of course there is not one database but many different ones, like Ensembl, PDB, PubMed, MEDLINE and so on.
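The off-by-one difference between the two conventions is easy to get wrong, so here is a minimal sketch of the conversion. It assumes Ensembl-style intervals are 1-based with both ends included, and UCSC table/BED-style intervals are 0-based with the end excluded (half-open); under those conventions only the start changes, and the interval length stays the same. The function names are hypothetical.

```python
def to_zero_based_half_open(start: int, end: int) -> tuple:
    """1-based closed [start, end] (Ensembl style) -> 0-based half-open [start, end) (BED style)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def to_one_based_closed(start: int, end: int) -> tuple:
    """0-based half-open [start, end) (BED style) -> 1-based closed [start, end] (Ensembl style)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


if __name__ == "__main__":
    # A gene starting at 3,000 in 1-based coordinates starts at 2,999 in 0-based ones.
    print(to_zero_based_half_open(3000, 3500))
    print(to_one_based_closed(2999, 3500))
```

Both representations describe the same 501 bases: 3500 - 3000 + 1 in the closed form and 3500 - 2999 in the half-open form.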
All of these databases have their own web services, and you can reach them from your laptop through those services: either you go to the website and use it there, or you use an API, so that, for example, you connect R directly to the database and send your queries to it directly. When you choose a database for your research, you have to know its coverage and how up to date it is: which organisms are in it? The Ensembl database covers many different organisms, but there are also databases that are specific to human, to mouse or to livestock species. So when you choose a database, you have to make sure that it covers what you need and that it is up to date. This is one of the things that often goes wrong, because in the end we want reproducible information and reproducible results, and many databases do not provide access to their old data sets. For example, a database used to provide information based on genome build 5, then it switched to genome build 6 and then to genome build 7, but when you published your paper, you did your analysis based on genome build 5. So the database needs to keep old versions available: either you keep a backup of the database yourself so that you can redo your analysis, or you use a database or data provider that keeps old data sets available.
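As an example of the API route, Ensembl offers a public REST service. The sketch below only builds request URLs and, if you call fetch_json, downloads JSON with Python's standard library. The two endpoints used, lookup/symbol/:species/:symbol and sequence/region/:species/:region, are taken from the Ensembl REST documentation as far as I know, but check the current docs before relying on them. Note also that the public server answers from the current release, so for reproducible work you still need to record which release and assembly you used. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"


def lookup_symbol_url(species: str, symbol: str) -> str:
    """URL to look up a gene by symbol, e.g. ABCG2 in bos_taurus."""
    path = f"/lookup/symbol/{urllib.parse.quote(species)}/{urllib.parse.quote(symbol)}"
    return f"{SERVER}{path}?content-type=application/json"


def region_sequence_url(species: str, chrom: str, start: int, end: int,
                        strand: int = 1) -> str:
    """URL for the DNA sequence of a region (1-based, inclusive) in FASTA format."""
    region = urllib.parse.quote(f"{chrom}:{start}..{end}:{strand}", safe=":")
    return f"{SERVER}/sequence/region/{urllib.parse.quote(species)}/{region}?content-type=text/x-fasta"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Download and parse a JSON response (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(lookup_symbol_url("bos_taurus", "ABCG2"))
    print(region_sequence_url("bos_taurus", "6", 1000, 2000))
    # gene = fetch_json(lookup_symbol_url("bos_taurus", "ABCG2"))  # live request
```

Keeping URL construction separate from the network call makes the queries easy to log alongside your results, which helps with the reproducibility problem described above.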
Fortunately, Ensembl is very good at this. Every time they update the database, they keep the old version available, and they have a structure where you can say "I want to go back to the database as it was in 2016", so that you can redo your analysis. In the old days, and this is becoming less and less important, the physical location of the database also mattered a lot. If you are located in Europe, a database hosted in Japan or China is of course far away, and that matters especially if you are dealing with a lot of data transfer. If you want to download a whole genome sequence, you are talking about something like two gigabytes of data, and transferring two gigabytes is easier when the database is physically close to you. Nowadays, with services like Amazon Web Services and Google's data centers, which are available all over the world, this has become less of an issue, because many databases are replicated worldwide and a copy is more or less always close to you. But five or ten years ago this wasn't that easy.
Take KEGG, for example. Nowadays KEGG is really good, because you more or less always get a version hosted close to you, within your own country or region of the world. But it used to be that KEGG was only hosted in Japan, so going to the database took time and downloading data was really slow. When you choose a database, you also have to look at which software and analysis tools it offers. But in the end, choosing a database is more or less a matter of personal preference. I am someone who uses Ensembl a lot, but I know a lot of people I work with who prefer UCSC. They contain more or less the same data, but with a different coordinate system and slightly different organisms, so your personal preference and the flexibility of the interface matter when you choose a database. And like I said, there are many different ones: on the slide you see NCBI's Map Viewer, UCSC and Ensembl, and all of these databases look slightly different. They contain very similar information, and choosing one is up to you. You can choose one and someone else can choose another, but then you run into the issue of different coordinate systems: Ensembl starts counting at base position 1, while UCSC's tables start at 0. So a gene that starts at position 3,000 in Ensembl is stored as starting at 2,999 in UCSC, and that creates some issues. Ensembl is the main database that I use, so I will use it today for this overview. When you go to Ensembl, it looks roughly like this, although it doesn't really look like this anymore.
They change the website every so often, of course; when I took this screenshot two years ago for the first version of this presentation, it looked like this. If you click on a species, you can go to the assembly information, and it will tell you everything you need to know. The most important information is this line: here you see the assembly that we are currently looking at, Btau_4.0, which was published in October 2007. So when you write a publication and say that you used the Ensembl database for Bos taurus, you have to add this information, because people need to know exactly which genome version you were working with. Genomes get updated, not very frequently but frequently enough that you have to mention which version you used, because every new genome build brings changes in where genes are located, and sometimes the genome becomes a little longer because people managed to sequence a part that was not sequenced before. So you have to mention the assembly that you worked on when you write a publication, and generally you also mention the database version (the Ensembl release) that you used. These two are the most important pieces of information for a publication. Searching in Ensembl is very easy; we already did this. You can search by gene symbol, by database ID, or by position in the genome. They give some examples, like searching for a gene name or for a single term such as "prion", and searching in databases keeps getting easier because the search tools get smarter every day. Searching by gene symbol used to be a pain, because gene symbols were not standardized. For humans, there is now the HGNC.
The HGNC, the HUGO Gene Nomenclature Committee, assigns names to genes: a gene symbol is a name approved by the committee for a verified human gene. This has a massive advantage, because it gives a unique reference for every gene in scientific articles, and you can easily search for a gene in a database. It also lets you identify gene families. If I search for CYP followed by a number, these all point to cytochrome P450 genes. If I search for HOX, these are the homeobox genes, which control the development of different parts of the body; they are genes that regulate the activity of other genes. And it lets you identify orthologous genes in other species. This used not to be the case: a single gene could have up to 15 different names. Someone would call it BBS7, other people would call it something else. Let me quickly go to Ensembl and show you the diversity of names for one gene. BBS7 is one of the genes that we have been working on a lot. If we look in Ensembl for BBS7 (let me scale this up a little bit for you), we see that it's called BBS7 now, but it used to be called by a long RIKEN clone identifier ending in "Rik". That identifier and BBS7 are the same gene, and it's now called BBS7 following the gene nomenclature committee, right?
They decided that all genes associated with Bardet-Biedl syndrome should be called BBS, but this one used to be called by that other name. And this holds for many genes: genes that have been studied a lot sometimes have five or six different names that pop up in the literature, and then of course it becomes very hard to understand which gene people are talking about if everyone uses their own name. Fortunately that got better when the community of human geneticists came together and formed the gene nomenclature committee. So it has many advantages. When we want to search for a gene, we can for example search for ABCG2, a gene involved in milk production in cattle. In Bos taurus it is the ATP-binding cassette subfamily G member 2 gene, located on chromosome 6, and this is based on the coordinates of the UMD3 assembly. In the Btau_4.0 genome build, this gene might be located somewhere else, possibly even on a completely different chromosome. The name of the gene is ABCG2, and in this case there are no synonyms.
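The value of an approved symbol is easiest to see when you have to merge gene lists from papers that used different names. The sketch below normalizes names against an alias table; the table entries are invented placeholders (a real mapping would come from HGNC or Ensembl), and the function name is hypothetical. It refuses to guess when an alias points to more than one approved symbol, which is exactly the ambiguity that standard nomenclature removes.

```python
from typing import Dict, Set

# Hypothetical alias table: alias (upper case) -> approved symbols it may refer to.
ALIASES: Dict[str, Set[str]] = {
    "OLDNAME1": {"BBS7"},
    "OLDNAME2": {"BBS7"},
    "SHAREDALIAS": {"GENEA", "GENEB"},  # ambiguous on purpose
}
APPROVED = {"BBS7", "ABCG2", "GENEA", "GENEB"}


def to_approved_symbol(name: str) -> str:
    """Map a gene name or alias to its approved symbol; raise if unknown or ambiguous."""
    key = name.strip().upper()
    if key in APPROVED:
        return key
    candidates = ALIASES.get(key, set())
    if len(candidates) == 1:
        return next(iter(candidates))
    if not candidates:
        raise KeyError(f"unknown gene name: {name}")
    raise ValueError(f"{name} is ambiguous: {sorted(candidates)}")


if __name__ == "__main__":
    print([to_approved_symbol(n) for n in ("bbs7", "OldName1", "ABCG2")])
```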
So fortunately this gene has only one name. When we look at the gene, we see on the side all the different options that we have. One of the things we can click on is the orthologues link, which shows us the same gene in different species. When you click on it, you see that there are many different animals in which this gene occurs. It highlights the gene that we selected, and you can see that for ABCG2 in cattle, the closest related version is the same gene in sheep, and the next closest is in dolphins. You would not think that cows and dolphins are genetically closely related, but they actually are: cattle and the cetaceans, the whales and dolphins, both belong to the same mammalian group (Cetartiodactyla). That is something you can learn when you have a database filled with the sequences of many different genes and genomes. All right, when we go back to the main gene page for ABCG2, you see that this gene has two transcripts, so two different splice variants. When you click on "Show transcript table", you get a small overview, and here you see the two versions of the gene. Both code for a protein that is 658 amino acids long, but they span different lengths on the genome. So this gene gives rise to two different transcripts: the encoded proteins have the same length, but the transcripts are assembled from different exons.
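When one gene has several transcripts, it helps to check which exons they share before placing primers: primers in shared exons should amplify every transcript, while a primer in an exon unique to one transcript targets only that variant. A minimal sketch, with exons represented as (start, end) tuples and made-up coordinates standing in for what the Ensembl transcript table would give you; the function name is hypothetical, and exons only count as shared if their coordinates are identical.

```python
from typing import Dict, Set, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive


def shared_and_unique_exons(transcripts: Dict[str, Set[Exon]]):
    """Return (exons present in every transcript, {transcript: exons found only in it})."""
    shared = set.intersection(*transcripts.values())
    unique = {}
    for name, exons in transcripts.items():
        others = set().union(*(e for n, e in transcripts.items() if n != name))
        unique[name] = exons - others
    return shared, unique


if __name__ == "__main__":
    variants = {  # hypothetical exon coordinates for two splice variants
        "variant_1": {(500, 600), (800, 900), (1000, 1100)},
        "variant_2": {(300, 400), (800, 900), (1000, 1100)},
    }
    shared, unique = shared_and_unique_exons(variants)
    print(sorted(shared), {k: sorted(v) for k, v in unique.items()})
```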
So here we have a situation where, in one transcript, a certain exon might be skipped or an intron might be retained. Every gene can code for any number of transcripts, and therefore for several proteins. Here we see the two transcripts. Both start at more or less the same position, but the first one skips a whole region, so its first exon lies further along, while the second version has its first exon much closer to the start of the gene. Some exons are shared: this part is always included in ABCG2, whereas this part is only present in variant two and not in variant one. So it is the same gene coding for two proteins that are put together differently, with some exons shared between the two variants and some unique to one of them. You can get these transcripts directly from Ensembl. There is a lot more information in Ensembl. In the external references you can go to the literature or to UniProt to see the protein sequences or the protein domains. You can go to WikiGene, which has information about the gene in a Wikipedia-like format. You can go to the genomic alignments, where you see the location in the genome, so you can look at the gene at the sequence level. And when you click on the phenotype button, you get the phenotypes associated with this gene. This part of Ensembl is built from QTL analysis, right?
QTL analysis allows you to associate a region of the genome with a certain phenotype, so all of those associations are listed here. Then you can go to the variation table to see all of the known single nucleotide polymorphisms (SNPs) inside this gene. We already saw the UniProt database before, but the nice thing about UniProt is that it gives a very clear description of what the protein does. Here: a high-capacity urate exporter functioning in both renal and extrarenal urate excretion, which plays a role in porphyrin homeostasis because it can mediate the export of protoporphyrin IX (PPIX) from the mitochondria. So it gives you an overview of what the gene is doing, plus the known SNPs. Now, SNPs in the context of primer design: when we know there are SNPs in this gene, we do not want to place a primer at a location where there is a SNP. The SNPs we are biologically most interested in are the missense variants, which change the protein, but for primer design we also need to account for every other variant, because any single base pair change in the genome will affect the hybridization of our primer. Imagine that we work on two different cattle breeds: I am working on Holstein and someone else is working on Fleckvieh or another breed. The issue comes in when these two breeds have different SNPs, and therefore relatively different sequences: a primer pair might work in Holstein but not in the other breed. So when you are designing primers, always make sure that your primers do not target a region containing a single nucleotide polymorphism or a known deletion.
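The rule "never put a primer on top of a known SNP" is just an interval-overlap check. A minimal sketch, assuming 1-based inclusive coordinates as shown in genome browsers; the SNP positions and primer placements are made-up numbers for illustration.

```python
def overlapping_snps(primer_start: int, primer_len: int, snp_positions: list[int]) -> list[int]:
    """Return the known SNP positions that fall inside the primer's binding site.

    Coordinates are 1-based and inclusive, as in genome browsers.
    """
    primer_end = primer_start + primer_len - 1
    return [p for p in snp_positions if primer_start <= p <= primer_end]

# Hypothetical SNP positions taken from a variation table
snps = [1050, 1120, 1187]
print(overlapping_snps(1000, 20, snps))  # [] -> no known SNP under this primer
print(overlapping_snps(1040, 20, snps))  # [1050] -> this primer should be moved
```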
You can find that in the SNP table, which is in the variation table. All right, so if we look at a certain gene, or at a certain SNP such as this one, we can see that it is a missense mutation, meaning that it changes the amino acid sequence. We can then use dbSNP to get more information about what exactly is changing. Now suppose we want the genomic region around a certain SNP: imagine that we want to PCR out the part of the genome containing the SNP so that we can send it in for sequencing. Imagine that Holstein carries the reference allele and another cattle breed carries a different allele. We can then design primers to amplify this part of the genome, and by sequencing we can figure out whether an animal has the Holstein allele or the Fleckvieh allele. So: we search for the SNP, we click the Region in detail button, we verify that the SNP is actually visible in the picture, and then we export the data as a FASTA sequence. We can use this FASTA sequence to design primers to extract this piece of DNA for sequencing, or for whatever else we want to do with it. When we design primers, there is one big complication, and that is that there are repeats in the genome, right?
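Once the region is exported as FASTA, you will usually want to read it into a script or write a trimmed piece back out. A minimal sketch of both directions, assuming the common convention that the record name is the header text up to the first space; the header used here is a hypothetical example, not an actual Ensembl header. Lower-case letters are kept as-is so that soft-masked repeats are not lost.

```python
def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [f">{header}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"

def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {name: sequence}; name = header up to first space."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line  # keep case: lower case may mean soft-masked
    return records

record = to_fasta("snp_region hypothetical_example", "ACGT" * 40)
print(read_fasta(record)["snp_region"][:8])  # ACGTACGT
```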
I told you that when you design a primer, it needs to be unique. However, repeats make up around 50 percent of the mammalian genome. So if I look at a region of the DNA and randomly select 50 base pairs, there is roughly a 50 percent chance that those 50 base pairs fall in a repeat and therefore also occur somewhere else in the genome. So PCR primers cannot contain repeats themselves, and we do not want to target primers to regions that are repeated. We need to get rid of them, and you can use a tool like RepeatMasker for that. Nowadays, in Ensembl, you can also just repeat-mask the sequence when you export it. Let me show you. Let me make this a bit bigger so that it fits better on the screen. So imagine that we want to go for the SNP from the presentation. The SNP was called (let me move this to the side so I can see) rs4370702337. We just search for this SNP, and we see that it is a cattle variant, a SNP that occurs in cows. We click on it. Ensembl is relatively slow at the moment, but here we have the SNP. It says that the reference genome has an A, and some animals have been detected that carry a T. It shows where it is located in the genome and the name of the variant, and it overlaps seven transcripts, so it does affect the coding of this ABCG2 gene. What we can then do is export the region, so we go to Region in detail. Where is Region in detail? It doesn't have Region in detail anymore.
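Uniqueness can be illustrated with an exact-match count on both strands of a reference sequence. This is only a toy sketch: in practice you would search the whole genome with a tool such as NCBI Primer-BLAST, which also tolerates mismatches, whereas this code counts exact matches in a string you pass in.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (returned in upper case)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def count_exact_hits(primer: str, genome: str) -> int:
    """Count exact, possibly overlapping matches of the primer on both strands."""
    primer, genome = primer.upper(), genome.upper()
    hits = 0
    # A set avoids double-counting a palindromic primer
    for query in {primer, reverse_complement(primer)}:
        start = genome.find(query)
        while start != -1:
            hits += 1
            start = genome.find(query, start + 1)
    return hits

# A unique primer should give exactly 1 hit in the genome
print(count_exact_hits("AAACCC", "AAACCCTTTTGGGTTT"))  # 2 -> binds twice, not unique
```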
It's so nice that they changed the database. I think it's now... yes, it's probably called Genomic context. Let's click on it. We have to wait a bit for the component to load, and then here we see where this SNP is located. When we hover over it... oh, you don't see the pop-up because the recording isn't capturing it, but the SNP is at this position. Can we zoom in a bit? No, it won't let us zoom in. So here we see the SNP in the middle, and now we want to get the region around this SNP. We go to the primary assembly location and click on it, which takes us back to the standard Ensembl website, where it again loads the component showing where the SNP is located. And here is the SNP we were interested in. Yes, now you see it: here is our SNP, located exactly here. You also see an overview of all the phenotypes that have been associated with this region through QTL mapping. For example, this SNP is associated with differences in milk protein percentage: carrying a certain allele of this SNP increases or decreases milk protein, and it also increases or decreases milk fat. When we click on it, we can see that the SNP is now in the middle. We want to export the sequence here so that we can design two primers to PCR out, for example, this piece of the genome. So we go to Export data, and we have to choose the region we want to PCR out around the exact location of the SNP. Normally you would reason as follows: sequencing 200 to 300 base pairs is very cheap, so we want to start about 100 base pairs in front of the SNP. So we go from... let me reset that.
Oh no, it's not in the middle... actually, it already is: it already selected a 100 base pair region with the SNP at position 50. But I want to make the region bigger, so instead of a 100 base pair sequence I ask for one of about 200 base pairs: I start 50 base pairs earlier and end 50 base pairs later. Then we click Next, choose text output, and it opens a window with the sequence from the primary assembly. Now we have to make sure we can find our SNP in this sequence. If we went back and zoomed into the SNP, we would see that it is located at exactly position 640. That means it is 100 base pairs from the start and 100 base pairs from the end, so that should be perfectly fine, but we do want to check that the SNP really sits exactly in the middle of the exported region. The next step is to run repeat masking on this sequence. We copy the sequence and go to RepeatMasker. Let me open that up; can I just click on it here? Yes. So on the RepeatMasker website we paste in our sequence of interest, the one we want to design primers for. For DNA source we choose "mammal, other than below", the sensitivity stays at default, and we leave everything else at the standard settings. We submit the sequence, and it starts repeat masking.
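The "is my SNP really in the middle?" check can be written down directly. A minimal sketch, assuming 1-based inclusive genome coordinates; the chromosome position 38,000,640 is a hypothetical value (only the "640" echoes the example above), and the reference allele "A" follows the SNP shown earlier. Note that 100 bases on each side plus the SNP itself gives 201 bases, not exactly 200.

```python
def flank_window(snp_pos: int, flank: int) -> tuple[int, int]:
    """1-based, inclusive start/end with `flank` bases on each side of the SNP."""
    return snp_pos - flank, snp_pos + flank

def snp_is_centered(seq: str, snp_pos: int, start: int, ref_allele: str) -> bool:
    """Check that the exported sequence has the reference allele exactly in the middle."""
    idx = snp_pos - start                    # 0-based index of the SNP in `seq`
    if not 0 <= idx < len(seq):
        return False
    centered = idx == len(seq) - 1 - idx     # same number of bases on each side
    return centered and seq[idx].upper() == ref_allele.upper()

snp_pos = 38_000_640                         # hypothetical coordinate
start, end = flank_window(snp_pos, 100)
print(start, end, end - start + 1)           # 38000540 38000740 201

toy_seq = "C" * 100 + "A" + "G" * 100        # stand-in for the exported sequence
print(snp_is_centered(toy_seq, snp_pos, start, "A"))  # True
```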
It looks in this 200 base pair sequence for any repeats, and we just have to wait until it finishes. It looks for simple repeats and then for full-length and dispersed repeats, so it goes through everything that might make the sequence unsuitable for primer design. It takes a little bit of time; let me refresh just to make sure. It says no repetitive sequence was detected, which is really good. In this case this region of the genome has no repetitive sequences, so we can just use the sequence as it is. But you always have to do this step: you have to make sure there are no repeats, because you cannot design primers in repeats. The next step is to start designing primers. Now that we have our sequence, we know that our SNP is exactly in the middle: each line here is 60 bases, so the SNP is about 40 base pairs into the second line, around here. We can design primers anywhere up to where the SNP is, so a forward primer here, upstream of the SNP, and a reverse primer here, downstream of it. Let's go back to the PowerPoint. So make sure that you always do the RepeatMasker step. Again, how does it work? Oh, earlier I took a thousand base pairs, so let me go back, because then I can show you that there can be a lot of repeats. We go to the primary assembly and take the location of this SNP. We go to Export data and say we want the SNP location with a thousand base pairs... no, around it: 500 in front and 500 behind. Then we click Next and choose text.
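The layout "forward primer upstream of the SNP, reverse primer downstream" is easy to get wrong because the reverse primer is the reverse complement of the downstream sequence. The hypothetical helper below only illustrates orientation: it naively takes the two ends of the window and does no melting temperature, GC or dimer checks, which is what a tool like Primer3 is for.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]

def naive_primer_pair(seq: str, snp_idx: int, length: int = 20):
    """Hypothetical helper: forward primer = first `length` bases,
    reverse primer = reverse complement of the last `length` bases.

    Raises if the SNP (0-based index) would sit under either primer.
    """
    if not (length <= snp_idx < len(seq) - length):
        raise ValueError("SNP lies inside a primer binding site")
    forward = seq[:length]                       # binds the minus strand, reads toward the SNP
    reverse = reverse_complement(seq[-length:])  # binds the plus strand, reads back toward the SNP
    return forward, reverse, len(seq)            # amplicon spans the whole window

print(naive_primer_pair("ATGCATGCAT" + "A" + "CCGGTTAAGG", 10, length=5))
# ('ATGCA', 'CCTTA', 21)
```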
Now we get a much longer sequence than before. Again, we first go to RepeatMasker to make sure there are no repeats. We paste it in, and for DNA source we choose "other than below", since it's a mammal. We submit the sequence; it takes a little bit of time, but not too long. Again: no repetitive sequences detected. Why is that? That's strange, because there used to be a repetitive sequence in here. Did I take a different SNP from the one I was looking at before? It might be that the genome build got better. That's interesting. Let me try this once more. This time I won't use RepeatMasker; I'll hard repeat-mask this region in the Ensembl export and then look at the text. Yes, so Ensembl actually does find a repeat, right? The repeats are the regions where it says NNNNN. Ensembl nowadays has its own built-in repeat masking, which works a bit better than the RepeatMasker website I point to in the slides. And here you can see that if we had designed a primer without masking the repeats, and picked one located here, our primer would not have worked, because it would bind not only this sequence but also other sequences in the cow genome. So let's use this masked sequence for future reference. Again, our SNP should be in the middle, and of course we need to check that as well. All right, let's go back and close Firefox.
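With a hard-masked export, repeats show up as runs of N, so the stretches still available for primers are simply the N-free runs. A minimal sketch; the minimum length of 18 bases is an assumption chosen to roughly match the primer lengths discussed earlier, and the intervals are 0-based and half-open (Python slice convention).

```python
import re

def unmasked_intervals(seq: str, min_len: int = 18) -> list[tuple[int, int]]:
    """Return 0-based, half-open intervals free of N that are at least min_len long."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[^Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]

def masked_fraction(seq: str) -> float:
    """Fraction of bases hard-masked as N."""
    return sum(base in "Nn" for base in seq) / len(seq)

# Toy hard-masked sequence: 20 usable bases, a 30 bp repeat, a short 10 bp gap, ...
toy = "A" * 20 + "N" * 30 + "C" * 10 + "NNN" + "G" * 25
print(unmasked_intervals(toy))  # [(0, 20), (63, 88)]
print(masked_fraction(toy))     # 0.375
```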
So this is what it used to look like when I ran it with the old version: you would take the DNA sequence, and here you would see a big repeat, which also occurred in another place in the genome. All right, so you can design primers by hand, but a computer does it much better. There are tools such as Primer3, which you can use as a web application, and OLIGO, which is a standalone application, but there are literally dozens of primer design tools, so you can pick any tool you want. You can calculate the melting temperature with, for example, the Promega calculator, to make sure that your forward and reverse primers have suitable, similar melting temperatures, but nowadays programs like Primer3 also give a first prediction of the melting temperature, so you don't have to do that separately. As for the recording time, we're at 52 minutes, so I'm going to continue. So, an example of Primer3... we will do that after the break. I guess you're mainly interested in getting the animated GIFs to cheer you up a bit. Let me at least change my mood box; I'll use this one. All right, I will stop the recording, and we will be back in five to ten minutes.
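To illustrate the pairing rule from earlier (forward and reverse melting temperatures within about 3 °C), here is a rough sketch using the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C). This is only a crude rule of thumb, mostly meant for short oligos; tools like Primer3 use a nearest-neighbour thermodynamic model and will give different numbers. The primer sequences are made-up examples.

```python
def gc_content(primer: str) -> float:
    """GC content in percent."""
    p = primer.upper()
    return 100 * (p.count("G") + p.count("C")) / len(p)

def wallace_tm(primer: str) -> int:
    """Rough melting temperature (°C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

def pair_ok(forward: str, reverse: str, max_diff: float = 3.0) -> bool:
    """True if the two primers' rough Tm values differ by at most max_diff °C."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_diff

fwd = "AGCTAGCTAGCTAGCTAGCT"  # hypothetical 20-mer, 50 % GC
rev = "GATCGATCGATCGATCGATC"  # hypothetical 20-mer, 50 % GC
print(wallace_tm(fwd), wallace_tm(rev), pair_ok(fwd, rev))  # 60 60 True
print(pair_ok(fwd, "ATATATATATATATATATAT"))                 # False (60 vs 40 °C)
```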