So this last presentation will be about variant annotation. As you have seen, and as you probably know if you do variant analysis, you will typically find a lot of variants, and out of these you of course want to extract the most important ones. Typically, the most important variants are those that have an effect on, for example, the amino acid sequence of a protein, because that can later affect the phenotype. When you do that, you are doing something called functional annotation: annotating variants relative to genomic features, and often those genomic features are genes. Variants with big effects on those genes are usually in coding regions or in promoter regions, because those regions are important for the protein. Less important are, for example, variants in intronic regions, because they are not translated into the amino acid sequence, and the same goes for untranslated regions. Intergenic variants are potentially even less interesting, although that very much depends on the case, especially if you cannot associate them with a gene at all.

These coding variants can have different effects on the amino acid sequence. As you know, an amino acid is translated from a triplet (a codon) in the coding sequence, so a mutation in the coding sequence can have an effect on the amino acid, but not always. For example, a synonymous mutation typically has no effect at all on the amino acid sequence: AGA is translated into arginine, and AGG is translated into arginine as well. So if the last A is mutated into a G, this mutation probably doesn't have a big effect on the phenotype.
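To make the synonymous case concrete, here is a minimal Python sketch that translates codons with the standard genetic code and checks whether a substitution changes the amino acid. It assumes DNA codons on the coding strand and uses only the standard nuclear code; the function names are illustrative, not from any annotation tool.

```python
from itertools import product

# Standard genetic code built from the usual TCAG ordering:
# the first codon position varies slowest, the third fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(codon: str) -> str:
    """Return the one-letter amino acid for a DNA codon ('*' means stop)."""
    return CODON_TABLE[codon.upper()]


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """A substitution is synonymous when both codons encode the same amino acid."""
    return translate(ref_codon) == translate(alt_codon)


if __name__ == "__main__":
    print(translate("AGA"), translate("AGG"))  # R R (both arginine)
    print(is_synonymous("AGA", "AGG"))         # True
```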
However, a single point mutation can of course also change the amino acid sequence. For example, when an AAT triplet becomes an AAA triplet, asparagine becomes lysine. The amino acid changes, and that can affect the protein structure and therefore potentially the phenotype. You can get an even larger effect when you gain a stop codon: for example, the triplet shown here would normally be translated into an amino acid, but if its C mutates into an A it becomes a stop codon. You then get a truncated protein, meaning that translation stops prematurely, which can have a very big effect on how the protein does its job. Frameshifts also have a big effect on the amino acid sequence. For example, a single-base insertion or deletion shifts the reading frame, so translation continues out of frame and you get a completely different amino acid sequence, typically also resulting in a truncated protein. These can also have a very big effect on the protein itself.

So many things can happen, and it is always nice to be able to describe these different types of effects in a standardized way. To do that, we typically use the Sequence Ontology, a set of terms organized in a layered (hierarchical) fashion that helps people, and software, to describe a particular variant. It describes both the terms, including their definitions, and the relationships between them. For example, a coding sequence variant could be a protein altering variant, a synonymous variant, or a terminator codon variant, and each of these terms has a definition, like the ones we just discussed.
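The effect types above can be sketched as a small classifier that returns Sequence Ontology term names. This is a simplified illustration, not how VEP or SnpEff implement it: it ignores start codons, splice sites and multi-codon changes, and the `SO_PARENT` map is a hand-written, assumed slice of the ontology that you should check against the current SO release. The TAC to TAA stop-gain in the test is just one possible example of a C-to-A change creating a stop codon.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Simplified, hand-written slice of the Sequence Ontology (assumption:
# verify parent terms against the real ontology before relying on them).
SO_PARENT = {
    "missense_variant": "protein_altering_variant",
    "frameshift_variant": "protein_altering_variant",
    "protein_altering_variant": "coding_sequence_variant",
    "synonymous_variant": "coding_sequence_variant",
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Map a single-codon substitution to a Sequence Ontology term name."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == "*" and alt_aa == "*":
        return "stop_retained_variant"
    if ref_aa == alt_aa:
        return "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"      # premature stop: truncated protein
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"


def classify_indel(ref: str, alt: str) -> str:
    """A length change that is not a multiple of 3 shifts the reading frame."""
    delta = len(alt) - len(ref)
    if delta == 0:
        raise ValueError("not an insertion or deletion")
    if delta % 3 != 0:
        return "frameshift_variant"
    return "inframe_insertion" if delta > 0 else "inframe_deletion"


def ancestors(term: str) -> list[str]:
    """Walk up the simplified hierarchy from a term to its most general parent."""
    chain = [term]
    while chain[-1] in SO_PARENT:
        chain.append(SO_PARENT[chain[-1]])
    return chain


if __name__ == "__main__":
    print(classify_codon_change("AAT", "AAA"))  # missense_variant (Asn -> Lys)
    print(classify_codon_change("TAC", "TAA"))  # stop_gained
    print(classify_indel("A", "AT"))            # frameshift_variant
    print(ancestors("missense_variant"))
```

The indel rule only looks at length modulo 3, which is exactly why a single-base insertion or deletion always shifts the frame, while a three-base one does not.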
If you think about proteins, it's also important to consider isoforms. As you probably know, one gene can have different isoforms resulting in different proteins. This also means that one variant might be a protein altering variant in one isoform but not in another, simply because the affected exon is not part of that transcript. So how should we handle that in bioinformatics? We want a complete report, and we want to find the variants that have a big effect, so which isoforms should we choose to report the variant effect? There are multiple ways to do that. Typically, the effects on all isoforms are reported, at least on all isoforms that have an exon in that region. But sometimes, if you only want an overview of what's possible, you can choose to report only, for example, the effect with the largest impact, so you focus on the isoform where the mutation has the largest impact on the amino acid sequence. Or you focus on the most relevant transcript, and the most relevant transcripts are typically the isoforms that are most highly expressed.

There are several tools to do such functional annotation. Of course, in addition to a reference genome you also need a functional annotation of that genome, meaning the positions of the genes, exons, and introns. Here I give three examples. One of the most frequently used ones is VEP, the Variant Effect Predictor from Ensembl. It has a web interface where you can simply upload a VCF and get an annotated VCF back, or you can use the command-line tool, which is built on the Ensembl API.
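The three reporting strategies can be compared in a short Python sketch. Everything here is hypothetical: the severity ranks are made up for illustration (VEP and SnpEff each define their own orderings), and the per-isoform consequences and expression values would in practice come from an annotation tool and your own RNA-seq data.

```python
from dataclasses import dataclass

# Hypothetical severity ranks (higher = more severe), for illustration only.
SEVERITY = {
    "intron_variant": 1,
    "synonymous_variant": 2,
    "missense_variant": 3,
    "frameshift_variant": 4,
    "stop_gained": 5,
}


@dataclass
class TranscriptEffect:
    transcript_id: str
    exons: list[tuple[int, int]]  # 1-based, inclusive (start, end) positions
    consequence: str              # predicted effect of the variant on this isoform
    expression: float             # e.g. TPM from your own RNA-seq (assumed input)


def overlaps_exon(tx: TranscriptEffect, pos: int) -> bool:
    return any(start <= pos <= end for start, end in tx.exons)


def report_all(txs: list[TranscriptEffect], pos: int) -> list[TranscriptEffect]:
    """Strategy 1: every isoform that has an exon covering the variant."""
    return [tx for tx in txs if overlaps_exon(tx, pos)]


def report_most_severe(txs, pos):
    """Strategy 2: only the isoform where the variant has the largest impact."""
    return max(report_all(txs, pos), key=lambda tx: SEVERITY[tx.consequence], default=None)


def report_most_expressed(txs, pos):
    """Strategy 3: the 'most relevant' isoform, taken here as the most highly expressed."""
    return max(report_all(txs, pos), key=lambda tx: tx.expression, default=None)


# Hypothetical isoforms of one gene and a variant at position 360.
transcripts = [
    TranscriptEffect("TX1", [(100, 200), (300, 400)], "missense_variant", 5.0),
    TranscriptEffect("TX2", [(100, 200), (350, 450)], "synonymous_variant", 50.0),
    TranscriptEffect("TX3", [(100, 200), (500, 600)], "intron_variant", 20.0),
]

if __name__ == "__main__":
    print([tx.transcript_id for tx in report_all(transcripts, 360)])  # ['TX1', 'TX2']
    print(report_most_severe(transcripts, 360).transcript_id)         # TX1
    print(report_most_expressed(transcripts, 360).transcript_id)      # TX2
```

In this toy case the two summary strategies disagree: the most severe effect is on a lowly expressed isoform, which is why reporting all overlapping isoforms is the safer default.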
Then there's ANNOVAR, also relatively frequently used. I must admit I've never used it myself, but it is used by many people. And then there is SnpEff, which we will also use in the exercises. The big advantage of SnpEff is that it supports a lot of different genomes, its database keeps growing, and you can even build a database for your own genome: you provide your genome together with its annotation as a GTF file, let SnpEff build a database from it, and then use that database for variant annotation.

People have taken this a step further. What people have noticed, and what you will probably notice too when you do variant effect prediction, is that one high-impact mutation, for example an amino acid change, might have a very big effect on protein structure, while another amino acid change has only a minor effect, depending on where it sits in the 3D structure of the protein. So people want to take this to a higher level. One thing you can do is look at conservation across species: if you find a variant that changes an amino acid that is highly conserved in many species, then you have probably found a variant with a relatively big effect, because it is probably important to keep that amino acid the way it is. You can also look at protein structure. You can imagine that AlphaFold has had a pretty big effect on that, because with AlphaFold you can predict protein structures, which helps to estimate what effect a mutation might have on that structure. People have also been using expression QTLs (eQTLs) to see whether a certain variant has a big effect on gene expression. There are quite a few databases available for this; examples are dbNSFP and other mutation databases.
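SnpEff writes its predictions into an `ANN` field in the VCF INFO column: one comma-separated entry per allele and transcript, each with `|`-separated sub-fields starting with allele, annotation, impact (HIGH, MODERATE, LOW or MODIFIER), gene name and so on. The Python sketch below assumes that layout and combines it with a conservation filter as described above. The `CONS_SCORE` INFO key, the 2.0 cutoff and the example records are hypothetical placeholders for a conservation score you would add yourself, for example from a database such as dbNSFP.

```python
# First 11 of the '|'-separated ANN sub-fields, in SnpEff's order.
ANN_FIELDS = ["allele", "annotation", "impact", "gene_name", "gene_id",
              "feature_type", "feature_id", "transcript_biotype", "rank",
              "hgvs_c", "hgvs_p"]


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO string into key/value pairs (flags get an empty value)."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def parse_ann(info: str) -> list[dict[str, str]]:
    """Return one dict per ANN entry (one per allele/transcript combination)."""
    ann = parse_info(info).get("ANN", "")
    return [dict(zip(ANN_FIELDS, entry.split("|"))) for entry in ann.split(",") if entry]


def prioritize(vcf_lines, score_key="CONS_SCORE", min_score=2.0):
    """Keep HIGH/MODERATE-impact variants whose (hypothetical) conservation score passes."""
    keep = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        info = cols[7]
        if not any(a.get("impact") in {"HIGH", "MODERATE"} for a in parse_ann(info)):
            continue
        score = float(parse_info(info).get(score_key, "nan"))  # missing -> NaN -> dropped
        if score >= min_score:
            keep.append((cols[0], int(cols[1]), score))
    return sorted(keep, key=lambda t: t[2], reverse=True)


# Hypothetical SnpEff-style records (genes, transcripts and scores are made up).
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tT\t50\tPASS\tANN=T|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1|protein_coding|2/5|c.101A>T|p.Lys34Met;CONS_SCORE=5.1",
    "1\t2000\t.\tG\tA\t50\tPASS\tANN=A|synonymous_variant|LOW|GENE1|GENE1|transcript|TX1|protein_coding|3/5|c.300G>A|p.Lys100Lys;CONS_SCORE=6.0",
    "2\t500\t.\tC\tA\t50\tPASS\tANN=A|stop_gained|HIGH|GENE2|GENE2|transcript|TX2|protein_coding|1/3|c.30C>A|p.Tyr10*;CONS_SCORE=3.0",
    "2\t800\t.\tT\tC\t50\tPASS\tANN=C|missense_variant|MODERATE|GENE2|GENE2|transcript|TX2|protein_coding|2/3|c.80T>C|p.Leu27Pro;CONS_SCORE=0.5",
]

if __name__ == "__main__":
    print(prioritize(EXAMPLE_VCF))  # [('1', 1000, 5.1), ('2', 500, 3.0)]
```

The synonymous variant is dropped despite its high score because of its LOW impact, and the last missense variant is dropped because its position is poorly conserved, which mirrors the idea that not every amino acid change matters equally.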