Welcome to the MOOC course Introduction to Proteogenomics. In the course so far, the emphasis was on giving you a good understanding of genomics and proteomics, the basic concepts, and the tools available for data analysis. Now we will shift gears and move on to how to integrate data from genomics and proteomics in the form of proteogenomics, because neither genomic nor proteomic platforms alone can give you the complete picture. Proteogenomic technologies and tools have the power to combine various genomic approaches with proteomic data sets to provide a comprehensive overview at the transcriptional and translational levels. In today's lecture, Professor David Fenyo will introduce you to the basic concepts of proteogenomics. So, let us welcome Dr. Fenyo for his lecture.

So far you have had an introduction first to genomics, then to proteomics, and also to machine learning. Now I will try to give an introduction to proteogenomics, where we combine the genomics and the proteomics and see what additional things we can do when we have both kinds of data. As you heard from Karl, the first step in proteomics is often to identify peptides and then proteins. One very important thing with this database search is that if the exact protein sequence we are looking for is not in the database, we will not be able to find it. We know that in cancer there are a lot of changes in the genome, and these can lead to changes in protein sequence, but if we just use the reference protein sequence database we will not be able to identify them. Now, since we have both genomics and proteomics data, we can use the genomics data to modify our database and add in all the effects of the genomic changes. That way we make a more comprehensive database, with which we should be able to see the specific changes that happen in this tumor.
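The idea of extending the reference database with the effects of genomic changes can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the pipeline used in the lecture: the coding sequence `TOY_CDS`, the two variants, and all function names are made up for the example, and the digest uses a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages).

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def apply_snv(cds, position, ref, alt):
    """Apply a single nucleotide variant; position is 1-based."""
    if cds[position - 1] != ref:
        raise ValueError("reference base does not match the sequence")
    return cds[:position - 1] + alt + cds[position:]


def tryptic_peptides(protein):
    """Simplified trypsin digest: cut after K or R, but not before P."""
    peptides, start = [], 0
    for i, amino_acid in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if amino_acid in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def variant_only_peptides(cds, position, ref, alt):
    """Peptides that appear only after the variant is applied."""
    reference = set(tryptic_peptides(translate(cds)))
    variant = set(tryptic_peptides(translate(apply_snv(cds, position, ref, alt))))
    return sorted(variant - reference)


# Hypothetical toy sequence encoding MAKLVERYGSK followed by a stop codon.
TOY_CDS = "ATGGCTAAACTGGTTGAACGTTACGGTTCTAAATAA"

# Missense variant: GTT (valine) becomes ATT (isoleucine).
print(variant_only_peptides(TOY_CDS, 13, "G", "A"))  # ['LIER']

# Stop-gain variant: TAC (tyrosine) becomes TAA (stop).
print(translate(apply_snv(TOY_CDS, 24, "C", "A")))   # MAKLVER
print(variant_only_peptides(TOY_CDS, 24, "C", "A"))  # []
```

The missense variant adds one new peptide to search for, while the stop-gain variant adds none: the truncated protein only loses peptides, so its effect has to be inferred from what is missing rather than from a new identification.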
So, just to look at a few examples: if we have a single nucleotide change, like in this case here where one base is changing, that can lead to a change in an amino acid. In this protein, we observed in the DNA that the G at position 183 changes to an A, which means that the valine at position 62 in the reference database changes to an isoleucine. The tryptic peptide containing the valine is underlined. The change we see is that we instead get a modified peptide that will have a different mass and a different fragmentation spectrum. If we go through all the single nucleotide variants that we see in the genome and see which extra tryptic peptides we get, we can then make a larger database.

This is probably the simplest change on the genomic level that propagates to the proteome; we can also have more complex changes. For example, in this case we have a more dramatic change, where in the DNA the C at position 155 changes to an A, and that means that the tyrosine changes to a stop codon. What will happen then is that most of the protein cannot be produced in our sample, in the tumor, so we can only expect to see peptides from the first part. Of course, this is a much more difficult case, in the sense that in proteomics our coverage is limited, so if we do not see something, that is not a proof that it is not there. But that is one example of things that can change.

What people do then is evaluate all the possible effects that we can have. Kelly is going to talk more about what kinds of changes there are, but I will just quickly say that most of the different ones are related to splice variants that we can get from the RNA sequencing. This one up here is simply a case where we have three exons.
According to the reference database, exon 1 is connected to exon 2, which is connected to exon 3. But in the RNA-seq we also see that exon 1 is connected directly to exon 3. We also see a lot of other cases where we get connections from the middle of an exon, from an intron, and so on. Kelly is going to talk in more detail about these things: what you can expect, and how to create these tumor-specific databases using a combination of, usually, exome sequencing, where we get the variants and the insertions and deletions, and RNA-seq, where we get the different splice variants.

So, what are the effects that we see from these genomic changes? The protein sequence can change, as in the two examples we looked at. A modification site can also change. And in a lot of cases we can have a mutation that does not change the protein sequence but changes its level, so that from a mutation you get either more of the protein or less. The same holds for modifications: mutations can lead to both an increase and a decrease of the modifications.

So, I have a question. Why do we care? What do we use these for? We now make a big catalog of changes that we see on the genomic level and that we also see on the proteomic level. Does anyone have any suggestions for what we do after we have made this catalog?

Yes. So, we can look downstream and see whether it is correlated with cancer, or whether maybe it happens in a tumor suppressor gene or an oncogene and changes things. Another thing that we can of course look at is whether these peptides are only present in cancer and not in normal samples. Then we can try to target them and see if we can, for example, activate the immune system to attack cells that display these peptides, or do targeting in other ways. There are examples of treatments like that already. Yes, so, that is what we do.
But I was thinking of what comes after that is done. Yes, that is what I was thinking. So, there are two ways: either peptides that are specific to the tumor, or a protein that is usually not active, maybe not expressed at all in most cells, but that gets expressed because of mutations. So, that is another thing that we can do. For example, in breast cancer HER2 would be an example of that: in most cells we have very low amounts, but in one subtype of breast tumors it is very highly expressed on the surface of the cancer cells, so we can then have targeted treatment for that.

So, that depends on your experimental design. You have to analyze a lot of tumors and also normal samples to see that in most of the normal samples you do not see it. What one can do, for example, is look at studies like GTEx, which looks at a lot of normal samples, because for a good treatment you want there to be pretty much no expression in any of the organs. So you can use this public data set to compare.

Another thing that we can do is go back to the central dogma of molecular biology, since we are now measuring these different kinds of molecules. We are doing whole genome sequencing, we have RNA-seq data, we measure the protein levels, and we measure how much modification we have. In CPTAC that is what we usually do. Sometimes, for some projects, we have other types of measurements, but these are at least the basic measurements that we have. So, we can look a little bit at what these relationships are. For one, as was already mentioned several times, the proteins and the modifications are the functional gene products. That is what makes the phenotype.
But then, of course, it is much easier to measure DNA and RNA; there are much more automated methods, and even though there has been a lot of improvement, it is still much more difficult to measure proteins and modified proteins. So in most studies where one only looks at, for example, RNA-seq of tumors, many more samples are analyzed. So, then the question is: it is cheaper and faster to measure RNA, so can we just measure RNA instead and not worry about the proteins and their modifications? There are many arguments for why that is not a good idea. First of all, there is a lot of additional regulation beyond this. The other thing is that transcription and translation are rather slow, but modifications can be both added and removed very fast. And then we usually have different degradation rates for RNA and proteins. So, we should not be surprised that there are substantial differences between RNA and protein measurements.

Let us look at one example. This is ERBB2 in breast cancer. Here we have copy number and RNA levels. We see that there is a group of samples that have a higher copy number of this gene, ERBB2. Of course, this is the HER2 subtype, and we see that copy number is correlated with transcript levels: when we have more copies of the gene, we will have more RNA. So, we have a nice correlation there. If we go down and look at the translation, we see that RNA is also correlated with the protein here: when we have many transcripts, we also have more protein. And finally, if we look at one of the phosphorylation sites, it is also very highly correlated with the protein levels. If there is no additional regulation, that is what we would expect: we have more copies of the gene, and we get more transcripts and more proteins.
But let us look at the same gene in another tumor type. In ovarian cancer, first of all, we do not really have any copy number changes; all the samples have the same copy number, and they pretty much have the same transcript levels also. But even though the transcript level is the same, we see quite a range of variation in the protein. It is a little bit smaller than the range of the protein in breast cancer, but it is definitely very comparable. And finally, we do not see any correlation between the protein and the same phosphopeptide for which we saw a very strong correlation in breast cancer.

So, these are observations that we see, and we can look at different genes and see large differences between different tumor types and between different subtypes. Just to look at one more example, this is one of the keratins. Here we are looking at breast cancer, ovarian cancer and colon cancer, and we do not see any changes in copy number in any of the tumors. But we see very large differences in the range of the RNA levels: in breast cancer we have a very large range, in ovarian cancer almost no change in transcript levels, and in colon cancer somewhere in between. Then, if we look at the translation, so between RNA and protein, in breast cancer we again see correlation. In ovarian cancer we see some correlation, but the range of RNA is very small. And in colon cancer, even though we have quite a bit of variation in the transcript levels, the protein levels are constant. So, we see that in this case we have very different regulation of both the transcript and the protein. And these are just two examples.
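The kind of comparison described in these examples, where copy number, RNA and protein levels of one gene are compared across the samples of a cohort, can be sketched as follows. This is a minimal illustration under stated assumptions: all numbers are invented, the cohort names `amplified` and `flat` are hypothetical, and the Pearson coefficient is chosen here only for simplicity, since the lecture does not say which correlation measure was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns None if either variable is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined without any variation
    return sxy / sqrt(sxx * syy)


# Invented per-sample measurements of a single gene in two cohorts.
COHORTS = {
    # Some samples carry extra gene copies; RNA and protein follow.
    "amplified": {
        "copy_number": [2, 2, 2, 6, 8],
        "rna": [1.0, 1.2, 0.9, 3.1, 4.0],
        "protein": [0.8, 1.1, 1.0, 2.9, 4.2],
    },
    # No copy number changes, nearly flat RNA, but variable protein.
    "flat": {
        "copy_number": [2, 2, 2, 2, 2],
        "rna": [1.0, 1.1, 0.9, 1.0, 1.0],
        "protein": [0.5, 2.0, 1.2, 3.0, 0.9],
    },
}

PAIRS = [("copy_number", "rna"), ("rna", "protein")]


def correlations_by_cohort(cohorts):
    """Correlate each pair of adjacent measurement levels in every cohort."""
    return {
        name: {f"{a}~{b}": pearson(data[a], data[b]) for a, b in PAIRS}
        for name, data in cohorts.items()
    }


for cohort, result in correlations_by_cohort(COHORTS).items():
    print(cohort, result)
```

Returning `None` for a constant variable makes the second situation explicit: when all samples have the same copy number, there is nothing to correlate against, and a near-constant RNA level gives a coefficient that says little about regulation.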
Now, if we look more globally, so this is copy number versus transcript, we see that some genes are highly correlated and others are even anti-correlated. We see this wide range for every comparison, also transcript versus protein and protein versus phosphoprotein. So, what does it mean? If we see a correlation in one case and do not see a correlation in another, how do we start thinking about this? Any suggestions? Anyone?

Yes. And we can also have a case where the proteins are produced somewhere else in the body. We see a lot of blood proteins that will be in the tumor, but there is no RNA there because they are being produced somewhere else. So, there are all these things, and we have the degradation. We have a lot of different things going on.

Ribo-seq as a proxy for translation? Yes, we can do that. Ribo-seq would be somewhere in between here: it would measure the actual translation. It would not measure the amount of protein that is there, but it would measure how much is actually being translated at the moment. So, that would definitely be going one step in between the RNA and the protein.

In today's lecture you got a broad glimpse of how proteogenomics can help to deduce changes in protein sequences due to mutations in the gene, and how to identify changes in the modification sites of proteins. You also saw how proteogenomics can help to understand changes in the level of protein expression due to changes in the gene sequence. In diseases that are characterized by changes in the protein sequence due to mutations at the genetic level, proteogenomic analysis can support the development of disease-specific databases, where the modified protein sequence information can be made available. Taking the example of ERBB2 and using proteogenomics, Dr.
Fenyo showed that there is a clear correlation between RNA, protein and phosphoprotein expression in breast cancer. However, the same did not hold true for the ovarian cancer cases, where there was correlation only between RNA and protein, but not with the phosphoprotein levels. So, you can see that there is no single clear pattern: depending on the disease-specific context, the correlations can vary, and therefore analyses of individual datasets that address specific questions are very relevant. Though proteogenomic studies may not always show a direct correlation at all levels, they still provide information that can be used to answer questions that are very relevant to disease pathobiology. In the next lecture, Dr. David Fenyo will introduce you to a few more concepts of proteogenomics in clinical studies. Thank you.