Welcome to the MOOC course on Introduction to Proteogenomics. Our next speaker is Dr. Karsten Krug, who will talk about the effect of mutations on signaling pathways and how they can be studied using software such as MIMP and ActiveDriverDB. He will talk about the frequency of phosphorylation and the factors that may lead to specific kinase activity. He will also talk about tools like Motif-X and PhosphoSitePlus for sequence motif analysis, and about calculating the frequency of the most recurring amino acids near the site of phosphorylation. So let us now welcome Dr. Karsten Krug to talk in more detail about the role of various mutations in signaling pathways, and about the factors that help us understand phosphorylation in a biological system.
So I am first going to give you a short motivation, so why we want to do that, and then I will be very specific and talk about two software tools that try to study the impact of mutations on phosphorylation networks. As was just presented, there are millions of single nucleotide variants known in the human genome, and many of them are associated with certain human diseases, but for most of them we do not know the exact molecular mechanism that causes this genotype-to-phenotype association. As we learned yesterday during Kelly's and David's talk, and also during the hands-on, mutations that are located in protein-coding regions can be non-synonymous, meaning they can lead to a single amino acid substitution, so they can change an amino acid in the protein sequence. There are many different synonyms for these events: non-synonymous SNVs, single amino acid variants, or single amino acid substitutions.
These all refer to the same kind of event, but you will find all of these terms in the literature. And of course many of these non-synonymous single nucleotide variants affect amino acids that can be post-translationally modified, for example by phosphorylation, acetylation or ubiquitination, and these modifiable amino acids actually occur very frequently in the human proteome. What you are looking at here are the frequencies of all 20 amino acids in the human proteome. The most frequent amino acid is leucine, but in second place we already find serine, which can be phosphorylated. Highlighted here are serines, lysines, threonines and tyrosines; lysines can be modified by acetylation or ubiquitination. These are the most well-studied modifications, and we have the technology to study them on a large scale, which is why I highlighted those amino acids here. If you just look at the overall frequencies of these four amino acids, they make up 22 percent of all amino acids that occur in the human proteome. So it is very likely that a mutation affects one of these modification sites, and we are asking what kind of consequences that implies for downstream signaling events. There have been many studies that tried to decipher these kinds of relationships, and I am just highlighting a couple of them here.
The most basic approach to take here is to look at kinase sequence recognition motifs. A kinase phosphorylates its substrate, and one mechanism that ensures the kinase specifically identifies its substrate is given by very local interactions, meaning the properties of the amino acid sequence around the phosphorylation site.
Probably many of you have seen these kinds of sequence logos, where in the center you are looking at the actual modification site. Aurora kinase B mostly phosphorylates serines, but also threonines, and if you look at the amino acid frequencies in the sequence flanking the sites in its substrates, you see a strong enrichment of arginine at position minus 2 relative to the phosphorylation site. This is what we call the sequence recognition motif of Aurora kinase B: basically, it recognizes the arginine at minus 2. Any questions on that?
Well, there are two classes of kinases. One separate class is the tyrosine kinases, which specifically phosphorylate only tyrosines. If you look at all known substrates of Aurora kinase, you find that most sites are serine sites, and then there is a small fraction of threonine sites. So Aurora kinase cannot phosphorylate tyrosines. The zero position is the site where the phosphorylation actually happens, and we are looking to the left and right of this phosphorylation site.
Okay, so again there are many tools that can generate these kinds of logos and perform these enrichment tests. The common principle of these tools is to test for enrichment of amino acid patterns that surround the phosphorylation site, compared against some background data set. For example, you have the phosphorylation site data set that you acquired in your lab, in this case maybe 20 sites or so. These can be all detected sites in your experiment, or, let's say, you specifically inhibited a kinase and now you are looking for all phosphorylation sites that are down-regulated upon inhibition. That could tell you that these are very likely substrates, either direct or indirect, of this particular kinase.
And then you compare the frequencies that you obtained here against the background data set, which can again be all detected phosphosites in your data set, or you could use all known phosphorylation sites in the human proteome, for example. As I mentioned, there are several tools; two very popular ones are Motif-X and the sequence motif analysis tool on PhosphoSitePlus. Motif-X was probably one of the first, if not the first, such tool, and was published back in 2005 by Steve Gygi's lab at Harvard Medical School.
The basic principle, again: you have your phosphorylation sites, the center is where the phosphorylation happened, and then we are looking at the surrounding amino acids. From that you can very easily build up this kind of frequency matrix. In the columns you have your offset positions, where zero is the actual phosphorylation site, and then you are looking seven amino acids to the left and seven amino acids to the right. In this case that is arbitrary; some tools use different sequence window lengths. On the y-axis you have all 20 amino acids, and then you basically just count the frequencies of these amino acids in your data. For example, in this case we would have three glycines, we would have two prolines, and so on and so forth.
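The counting step described here, and the comparison against a background frequency, can be sketched in a few lines of Python. This is only a minimal illustration, not the Motif-X implementation: the seven sequence windows are made up (they are not the ones on the slide), the function names `count_matrix` and `binomial_tail` are hypothetical, and the background proline frequency of 0.062 is taken as an approximate value. A one-sided binomial tail is used as one plausible way to do the comparison.

```python
from collections import Counter
from math import comb

# Hypothetical 15-residue sequence windows, phosphosite at the centre (offset 0).
# Invented for illustration only; four of the seven carry a proline at +1.
WINDOWS = [
    "AKRLGEGSPEELKDA",
    "MDLRRASTPVKEGLN",
    "GQEKLPRSPLDVAGT",
    "TVNDRKLSPGAEQIH",
    "LEKRRGSSAFDNVKE",
    "PAGDEIQTLKRNSVF",
    "KHVRRLDSGEMTKAW",
]


def count_matrix(windows, flank=7):
    """Count residues per offset (-flank..+flank) relative to the central site."""
    width = 2 * flank + 1
    matrix = {offset: Counter() for offset in range(-flank, flank + 1)}
    for window in windows:
        if len(window) != width:
            raise ValueError(f"expected a window of length {width}: {window!r}")
        for index, residue in enumerate(window):
            matrix[index - flank][residue] += 1
    return matrix


def binomial_tail(k, n, p):
    """Probability of observing k or more successes in n trials with rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


matrix = count_matrix(WINDOWS)
n_windows = len(WINDOWS)
prolines_at_plus_one = matrix[1]["P"]

# Assumed background frequency of proline (approximate, for illustration).
BACKGROUND_PROLINE = 0.062
p_value = binomial_tail(prolines_at_plus_one, n_windows, BACKGROUND_PROLINE)

print(f"P at +1: {prolines_at_plus_one} of {n_windows} windows")
print(f"one-sided binomial p-value: {p_value:.2e}")
```

With four prolines at +1 in seven windows, the tail probability comes out well below 0.05, even for such a small sample. A real analysis would repeat this for every amino acid at every offset and would estimate the background frequencies per position from an actual background data set.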
So it's very easy to build up this kind of frequency matrix, and this is exactly what Motif-X does. It builds up two such matrices: one is derived from your actual phosphorylation sites of interest, and the other is derived from your background data set. From those you can then calculate a binomial probability matrix, where for each position in your sequence window and for each amino acid you ask the question, for example in this case: how many times do you observe a proline at position plus 1 in your data set? You can calculate this for each amino acid and each position, and you compare it against the background frequency, which is derived from your background data set, which can again be the entire human proteome or all of your detected sites. From that you can then generate the sequence logos.
I just put up an example here, based on the seven sequence windows that we looked at a couple of slides earlier. You calculate the probability of observing k out of n. Here n was seven, because we looked at seven sequence windows, and we observed four prolines at position plus 1. The background probability of a proline in the human proteome is roughly 0.062. In R you can just feed that into the binomial test function, binom.test, and you get a p-value showing that, although the sample size is very small, four out of seven would indeed be statistically significant. This, again, you would do for all amino acids at all positions.
We don't have to go into too much detail here, but Motif-X does this in an iterative manner. It first takes all sites that you feed in and extracts the most significant sequence motif from that set, then it removes the matching sites from the initial set and repeats the analysis. That is one specific property of the software.
Okay, so now we know how we can determine these sequence recognition motifs, but what happens if a mutation occurs in this region around the phosphorylation site? There are actually three different scenarios. The first one is direct. This is the wild type here, and this is the mutated version. In the wild type you have a serine which is actually phosphorylated, and due to a mutation this serine gets changed into a histidine, so it cannot be phosphorylated anymore. Of course this can also happen the other way around: a histidine can be mutated into a serine, which may now present a new phosphorylation site that can be recognized by a kinase. So a mutation can lead either to the genesis of a phosphorylation site or to the destruction of a phosphorylation site.
The second possibility is that the mutation does not hit the exact site but happens very close to it. In this case we have a proline at plus 1 which is now mutated into an arginine, and the kinase that was able to recognize this proline can no longer phosphorylate this specific serine, because the proline is gone. In this case we would lose the phosphorylation site, and again it can also go the other way around. The third possibility is that you just change the kinase sequence motif: in the wild type it was kinase A that recognized this motif, and due to a mutation the motif has changed into another motif that can now be recognized by kinase B. So these are the simplest examples of the kinds of events that we are looking at.
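To make the scenarios concrete, here is a small Python sketch that compares a wild-type and a mutated sequence window and reports which kind of event it sees. It is a simplified illustration under stated assumptions: the function name `classify_site_mutation`, the category labels and the example windows are all hypothetical, and the sketch only looks at where the change is. Deciding whether a flanking change destroys a motif or hands the site to a different kinase needs kinase-specific binding models, which this sketch does not contain.

```python
# Residues that can carry a phosphate group in this simplified view.
PHOSPHO_ACCEPTORS = {"S", "T", "Y"}


def classify_site_mutation(wt_window, mut_window):
    """Classify a mutation by comparing equal-length, odd-length sequence
    windows whose central residue is the (potential) phosphosite."""
    if len(wt_window) != len(mut_window) or len(wt_window) % 2 == 0:
        raise ValueError("windows must have the same odd length")

    centre = len(wt_window) // 2
    changed = [i for i, (a, b) in enumerate(zip(wt_window, mut_window)) if a != b]
    if not changed:
        return "unchanged"

    wt_site_ok = wt_window[centre] in PHOSPHO_ACCEPTORS
    mut_site_ok = mut_window[centre] in PHOSPHO_ACCEPTORS

    if not wt_site_ok and not mut_site_ok:
        return "no phosphorylatable residue at centre"
    if wt_site_ok and not mut_site_ok:
        return "site destroyed"
    if not wt_site_ok and mut_site_ok:
        return "site created"
    if centre in changed:
        # For example S -> T: still phosphorylatable, but kinases may differ.
        return "acceptor residue swapped"

    # The site itself is intact; only flanking residues changed. Whether the
    # motif is lost or rewired to another kinase cannot be decided here.
    offsets = ", ".join(f"{i - centre:+d}" for i in changed)
    return f"flanking change at {offsets}"


# Hypothetical wild-type window with a phosphoserine at the centre.
WILD_TYPE = "AKRLGEGSPEELKDA"

print(classify_site_mutation(WILD_TYPE, "AKRLGEGHPEELKDA"))  # S -> H at the site
print(classify_site_mutation("AKRLGEGHPEELKDA", WILD_TYPE))  # H -> S at the site
print(classify_site_mutation(WILD_TYPE, "AKRLGEGSREELKDA"))  # P -> R at +1
```

The first two calls correspond to the direct destruction and genesis of a site, and the third to the proline-to-arginine change next to the site. Scoring the flanking change against binding models for individual kinases is the part that prediction tools add on top of this.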
There are also events that happen further away from the phosphorylation site. For example, if a mutation hits a kinase domain, which contains the catalytic function of the kinase, it can lead to aberrant kinase activity. On this slide I have just listed a couple of tools. There are many tools out there already; people started looking into this 10 years back, but recently there have also been lots of new developments. These are the two tools we are going to have a closer look at: one is called MIMP and the other is called ActiveDriverDB. I also just want to highlight this other tool here, the G2P database, so a genome-to-phospho database, which has been developed in David Fenyö's lab as well. Yes?
[Audience question, partly unclear in the recording] When I tried one of these tools, it asked for the site to be at a fixed position in the sequence, but in my sequence it was sitting at position 32, so I was not able to use it. I am a biologist, I am not a programmer. So does any of these tools solve my problem?
No, these tools don't solve your problem. This is actually one step upstream of this type of analysis. Many of the tools that take the raw mass spectrometry data, do the database search and create result reports at the phosphosite level already have these kinds of sequence windows in their result tables. This is not a peptide sequence; it is a sequence window which always has the same length, in this case always 15 amino acids, with the modification site in the center. This is something that many tools create: MaxQuant does it, Spectrum Mill does it, and I'm not sure whether Proteome Discoverer does it as well. Yes, I see your problem. What you would have to do is hire a programmer who takes the data, takes the database and creates the sequence windows. Or you just use another software, and I can highly recommend MaxQuant.
Okay, so MIMP does exactly what we've just talked about. It predicts the impact of non-synonymous SNVs on kinase-substrate interactions, so it predicts kinase binding affinities and whether a mutation rewires phosphosignaling networks. That is basically the principle here: it compares the effect between mutated and wild-type sequences. One specific property of this tool is that it uses a Bayesian approach to construct position weight matrices, which are very similar to the amino acid frequency matrices that we just looked at. This was published in Nature Methods about three years back. There is an online tool which you can just use, but there is also an R implementation of the package, and this is actually what we are going to try to use in the hands-on sessions. I hope that we will be able to do that; we'll see.
The key features: it first builds kinase binding models using known substrate sites, so very highly curated, very well known phosphorylation sites that have been determined to be substrates of a particular kinase. It uses this information to build binding models, and these models are already included in the software, so this is nothing that you have to worry about. Then, for a given phospho sequence that might be coming in from your data set, it calculates a kinase binding score for each kinase, so a score for how likely it is that this phosphosite has been phosphorylated by this specific kinase. Again, you can upload your own phospho sequences, or you can just query all phosphosite sequences in PhosphoSitePlus. Does everybody know about this resource, PhosphoSitePlus? I can again highly recommend checking it out. All the references that I am going through are included at the end of my slides, so you can just go through the papers and check them out. So the first part is calculating kinase binding specificities, and then it brings in mutation data. Again, you can upload your own mutations, or you can specifically look at TCGA mutations, and then it does its prediction.
So today, in conclusion, I hope you have learned that kinase activity is affected by the amino acids present in the surroundings of the phosphorylation site. Hence, if we know the correlation between amino acids and phosphorylation specificity, we may be able to change the expression pattern of a gene. We also heard that Motif-X follows an iterative workflow, which gives us reliable, high-confidence amino acid sequence motifs associated with the regulation of phosphorylation. Mutations can lead to the genesis, modification or destruction of phosphosites, resulting in altered pathways. Dr. Krug also introduced many tools that can be used to study the correlation between genomic mutations and signaling pathways. The next lecture will be the continuation of mutations and signaling by Dr. Karsten Krug. Thank you.