Welcome to the MOOC course on Introduction to Proteogenomics. In today's lecture, we are going to hear about a case study relevant to cancer proteogenomics. Mr. Deeptarup Biswas will talk about how cancer proteogenomics research can provide novel insights, drawing on literature reviews and some published data sets. He is a research scholar in the proteomics lab at IIT Bombay, and he will talk about how proteogenomics approaches can help in resolving various issues in diagnosing different grades of cancer or in distinguishing different subtypes of cancer, which are very difficult to understand without a good molecular-level understanding. He will also explain how correlating proteomic and genomic data can provide a much broader and more meaningful picture of the progression of cancer, and he will present the workflows of some of the published case studies in the area of cancer proteogenomics. So, let me welcome Deeptarup for today's lecture.

Welcome, participants. Till now, you have learned a lot about proteomics and genomics: how to design an experiment and which conditions need to be taken into account. But whether to consider proteomics or genomics is still a question. A number of debates are already going on, and you have also heard arguments about whether proteomics or genomics is more powerful. To support the case for proteogenomics, I want to give you a glimpse of how this powerful tool can be used in cancer diagnosis and treatment. After the completion of the Human Genome Project and the introduction of genomics into disease pathobiology, there was a hope that genomics could bring a revolutionary change in cancer diagnosis and open a path to personalized medicine. But the success of genomics-based personalized medicine was not that revolutionary. Out of an overall cohort of patients, only a few responded to the therapy predicted from their genomic profile.
Some gaps still remained after the successes of genomics. A recent paper from the Zhang group discusses the clinical potential of mass spectrometry-based proteogenomics and how it can be brought into the clinic. Personalized medicine with the help of genomics alone was not that successful, for a number of reasons. Trying to solve a problem like cancer with genomics alone is like jumping from the first hurdle straight to the last one, without taking into account the many conditions and parameters that come in between. We get a complete genomic profile, with different types of mutations and aberrations, but at the same time we miss epigenetic aberrations, transcriptional regulation, alternative splicing and protein-level profiling. All this information needs to be taken into account to understand the pathobiology of cancer, and only then can this tool be used for diagnosis and treatment. So, the message from this slide is that all the information, from DNA to mRNA to protein, needs to be considered to reach the goal and bring a revolutionary change in cancer diagnosis and treatment.

Before I move on to how proteogenomics plays a role in cancer diagnosis, I want to give a brief account of what cancer driver genes are. A cancer driver gene is defined as one whose mutation increases net cell growth. The total number of driver genes is unknown, but we assume that it is considerably less than 19,000, as given by Tokheim et al. in 2016. From the DriverDB repository, you can see that the top driver genes include TP53, EGFR and PTEN, and we all know how important these hallmark driver genes are in glioblastoma tumorigenesis.
So, here are the mutation profiles of those driver genes, where the top driver genes are PTEN, TP53 and EGFR, and we can see the mutational profile across the samples, which are on the x-axis. If we choose one of the top three cancer driver genes, say EGFR, we can ask what the expression of the EGFR gene is in glioblastoma, and we find that it is pretty high. If we compare the expression of EGFR across other cancers as well, we find that in GBM EGFR is overexpressed in both primary solid tumors and recurrent solid tumors. So, till now genomics has given a lot of information about glioblastoma, but if we take into account the correlation between exon and protein, we find that the driver score related to protein and exon position also gives some new information. This panel displays the driver score distribution over exon and protein position, which helps researchers quickly find the regions of the gene with abundant deleterious mutations.

So, now we understand that we did not consider a lot of things between genomics and precision medicine: not all mutated genes are stably expressed as proteins, and genes that are expressed can be post-translationally modified. Therefore, precision medicine that relies solely on genomics-based assays will exclude a lot of potentially relevant information, such as miRNA (microRNA). To support the previous statements and to give you a glimpse of how helpful the powerful tool of proteogenomics can be in tackling different kinds of cancer, let us look at one study. In this study, they took 169 ovarian tumor samples selected using TCGA metadata and tried to analyze, or rather correlate, the genomics, transcriptomics, proteomics and phosphoproteomics.
So, before going into the paper, let me give a glimpse of this kind of mutation and how it can lead to the lethality of a cell. The schematic has been taken from Walsh et al. 2015, where we can see the functioning of the PARP enzyme and how it helps in the repair of single-strand DNA breaks. If the PARP enzyme is inhibited, no repair takes place, which leads to a collapsed replication fork, and BRCA deficiency does not allow homologous recombination to happen. In panel C, the deficiency in homologous recombination (HR) and in base excision repair together leads to synthetic lethality.

Coming to the sample information, tumors were selected by examining the associated TCGA metadata, on the basis of putative homologous recombination deficiency (HRD): the presence of germline or somatic BRCA1 or BRCA2 mutations, BRCA1 promoter methylation, or homozygous deletion of PTEN. This clustering gives us the complete landscape of which pathways are involved, how protein and mRNA play a role, and what the correlation between protein and mRNA is in each pathway. So, till now we understand that there is a protein and mRNA correlation and that it also plays a role in terms of biological pathways. But they also tried to understand how CNA, that is, copy number aberration, in each tumor relates to the protein and mRNA levels. The blue ones are the complete profile of the data generated, whereas the black ones are the data already present in the database. From the CNA-mRNA correlation and the CNA-protein correlation, they found that two important proteins, CHD4 and CHD5, have the maximum number of CNAs. When they studied these further, they found that these two proteins are involved in chromatin organization.
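The mRNA-protein and CNA-protein correlations described above can be illustrated with a rank correlation computed per gene across tumors. The following is only a minimal sketch and not the method of the study: it assumes Spearman correlation as the measure, and the gene names (`GENE_A`, `GENE_B`) and all values are hypothetical toy data.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical abundances for two genes measured in the same five tumors.
mrna = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [5.0, 3.0, 4.0, 1.0, 2.0],
}
protein = {
    "GENE_A": [0.9, 2.5, 2.7, 4.1, 6.0],
    "GENE_B": [1.0, 2.0, 3.0, 4.0, 5.0],
}

for gene in mrna:
    rho = spearman(mrna[gene], protein[gene])
    print(f"{gene}: mRNA-protein Spearman rho = {rho:.2f}")
```

In this toy data one gene has protein levels that follow its mRNA levels and the other does not, which is the kind of gene-by-gene difference such a correlation analysis is meant to expose. The same function could be applied to copy number versus protein values.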
So, to understand the complete biological picture, they took phosphopeptides, proteins, transcripts and CNA, and they found the top pathways that play a role in the pathobiology of this cancer. Among these, PDGFR beta, which we all know is an angiogenic receptor, also shows an important correlation in terms of biological pathway. To understand the complete landscape of the cancer pathobiology, they incorporated mRNA, protein and phosphopeptide data into one picture, where we can see that PDGFR beta is upregulated at both the mRNA and the protein level. This upregulation of PDGFR beta not only gives a clue to active angiogenesis, but also shows which downstream regulatory factors are upregulated or downregulated in terms of mRNA and protein.

Further, they did a DDN analysis. DDN analysis is differential dependency network analysis, where the proteins were curated from the literature and from cBioPortal. cBioPortal helps you to get the data out of TCGA. They identified a subnetwork of 30 proteins that displayed co-expression patterns differentiating HRD from non-HRD patients. From this DDN analysis, they found that histone acetylation or deacetylation proteins come into the clusters, which include HDAC1, RBBP4, RBBP7, EP300 and HUS1. So, from this part of the study they understood that histone acetylation and deacetylation play an important role. This clue was enough to give them the idea that acetylated peptides need to be studied. So, from the global proteome data they prepared an acetylated peptide database search strategy to identify and quantify the acetylated peptides. From there they identified around 399 acetylated peptides and 50 acetylated significant peptide between HRD and non HRD.
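To make the idea of a database search for acetylated peptides more concrete: search engines usually treat lysine acetylation as a variable modification that adds about 42.0106 Da to each acetylated lysine, so every lysine-containing peptide gives rise to several candidate masses. The sketch below is only an illustration under stated assumptions and is not the search strategy used in the study. The peptide (residues 4 to 17 of the histone H4 tail) is chosen purely as an example, and the function names are hypothetical.

```python
from itertools import combinations

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-termini)
ACETYL = 42.010565  # mass shift of one acetyl group (C2H2O)


def peptide_mass(sequence, n_acetyl=0):
    """Monoisotopic neutral mass of a peptide carrying n_acetyl acetyl groups."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + n_acetyl * ACETYL


def acetylated_forms(sequence, first_residue=1):
    """List (acetylated lysine positions, mass) for every combination of lysines.

    Positions are reported in protein numbering, assuming the first residue
    of the peptide sits at position first_residue in the protein.
    """
    lysines = [first_residue + i for i, aa in enumerate(sequence) if aa == "K"]
    forms = []
    for n in range(len(lysines) + 1):
        for sites in combinations(lysines, n):
            forms.append((sites, peptide_mass(sequence, n_acetyl=n)))
    return forms


# Illustrative peptide: histone H4 residues 4-17 (an assumption for this example).
for sites, mass in acetylated_forms("GKGGKGLGKGGAKR", first_residue=4):
    label = ", ".join(f"K{pos}ac" for pos in sites) or "unmodified"
    print(f"{mass:10.4f} Da  {label}")
```

Note that forms with the same number of acetyl groups have identical masses, so mass alone cannot tell which lysine is modified. Localizing the site requires fragment ion evidence from the MS/MS spectrum.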
So, among these 15 significant acetylated peptides, they found K12 and K16, that is, acetylation of the lysines at positions 12 and 16. They validated K12 and K16 using synthetic peptides and targeted analysis with SWATH-MS. In the iTRAQ data K12 was upregulated in HRD negative, and the same has been validated in SWATH, where they found the same upregulation in HRD negative. They then went back and searched the literature further, and they found that acetylation of H4 had previously been reported to be involved in the choice of DNA double-strand break (DSB) repair pathway. The relationship is regulated partially by HDAC1, a protein also identified in the DDN analysis. So the potential role of HDAC in modulating the choice of DSB repair pathway has been identified.

So, in conclusion, from this study we understand that activation of the PDGFR pathway in patients could potentially stratify selective enrollment in trials of anti-angiogenic therapy. A recombinant humanized monoclonal antibody, bevacizumab, which blocks angiogenesis by inhibiting VEGF-A, has already been trialed in patients. The involvement of the PDGFR pathway in this cancer also brings the role of this recombinant humanized monoclonal antibody into the limelight. Apart from this, acetylation of K12 and K16 on histone H4 may provide an alternative biomarker of HRD, and a rationale for the selection of patients in future clinical trials of HDAC inhibitors, alone or in combination with PARP inhibition. So, the moral of the study is that the ability of proteomics to complement genomics provides additional insights into the pathways and processes that drive ovarian cancer biology. We understand that even the complete data we get from genomics are not enough on their own to lead to a well-profiled diagnosis and treatment of cancer.
So, all the important information, such as mRNA information, protein information and PTM (post-translational modification) information, needs to be gathered and correlated; only then can we reach a conclusion, take this information forward and validate it in clinical trials. So, now we understand how cancer driver mutations, mRNA and protein all need to be taken into account to reach a molecular target or a cancer drug. From the last study we understand that the group generated only the proteomics data themselves and tried to correlate it with the genomics, mRNA and CNA data already available in the databases. Firehose can be used to download this kind of data: if we select a disease name, say glioblastoma multiforme, we can see that all the data available in TCGA can be downloaded from here. So, in the 2016 version of the TCGA data for glioblastoma, clinical data, SNPs, methylation, miR, mRNA and mRNA sequencing data and reverse phase protein array data are already available. So, we can use Firehose to download the data.

So, now we are able to understand how proteogenomics and the correlation of mRNA and protein can give us better insights into a particular disease. But to deal with this amount of big data and prepare a panel that can help in the treatment or diagnosis of cancer, we need to think about different predictive and machine-learning-based analyses. I have taken as an example the paper "A neural network approach to multi-biomarker panel discovery by high-throughput plasma proteomics profiling of breast cancer", where in study A and study B 40 cancer samples and 40 controls were taken, whereas in study C 20 cancer samples and 20 controls were taken. They then did the proteomic analysis and found that 246 proteins were common between the three studies.
After this analysis, they took the data and tried to prepare an artificial neural network model, taking study A as the training set, study B as the testing set and study C for validation. In this kind of artificial neural network analysis, in most cases the larger share of the data, around 70 percent or more, needs to be taken for the training set, whereas around 30 percent is taken for the testing set. The model is further validated with a blind data set to check its accuracy and efficiency. In most cases the accuracy of the model needs to be more than 80 or 85 percent. Here, the artificial neural network gave three panels with five markers and an accuracy of more than 85 percent. These panels were then taken forward and checked in a larger cohort of samples to validate the data. So, like this, we can use artificial neural networks and different machine learning strategies to understand and predict the top candidates that play a key role in tumorigenesis and the further development of the cancer. So, the main concept is that the different layers of molecular information need to be brought together to understand the complete pathobiology; only then can the landscape of the disease be drawn, and that can lead to a drug target or to precision medicine.

I hope that from today's lecture you got a glimpse of what is happening in the literature: the most recent and very promising cancer proteogenomics studies, especially the CPTAC, National Cancer Institute based work. Those publications have made a huge impact and brought confidence in using genomic and proteomic tools together to provide novel insights into different types of cancer.
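The train, test and blind-validation scheme described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions and not the model from the paper: a single logistic neuron stands in for the artificial neural network, the five-marker intensities are simulated, and all function names are hypothetical.

```python
import math
import random


def make_samples(n_per_class, rng):
    """Simulate five-marker panels: cases (label 1) are shifted up versus controls (label 0)."""
    samples = []
    for label, shift in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(shift, 1.0) for _ in range(5)]
            samples.append((features, label))
    return samples


def split(samples, train_fraction, rng):
    """Shuffle the samples and split them into training and testing sets."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def predict_proba(weights, bias, features):
    """Sigmoid output of a single neuron; z is clipped to avoid overflow."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train(train_set, epochs=200, learning_rate=0.05):
    """Fit the neuron by stochastic gradient descent on the logistic loss."""
    weights = [0.0] * len(train_set[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in train_set:
            error = predict_proba(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def accuracy(weights, bias, data):
    """Fraction of samples whose predicted class matches the label."""
    correct = sum((predict_proba(weights, bias, f) >= 0.5) == bool(y)
                  for f, y in data)
    return correct / len(data)


rng = random.Random(0)
cohort = make_samples(40, rng)             # 40 cases and 40 controls
train_set, test_set = split(cohort, 0.7, rng)
blind_set = make_samples(20, rng)          # separate blind validation cohort

weights, bias = train(train_set)
test_accuracy = accuracy(weights, bias, test_set)
blind_accuracy = accuracy(weights, bias, blind_set)
print(f"train n={len(train_set)}, test n={len(test_set)}")
print(f"test accuracy={test_accuracy:.2f}, blind accuracy={blind_accuracy:.2f}")
print("passes 85% threshold:", min(test_accuracy, blind_accuracy) >= 0.85)
```

The simulated classes are deliberately well separated, so the model clears the threshold easily. With real plasma proteomics data the separation would be far smaller, which is why the blind validation set matters.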
We also got a glimpse of the workflows involved in doing these experiments, which I think can give you a good way of thinking about how you can utilize these tools of genomics and proteomics in your own research, then correlate and analyze the data, and bring out something very novel that may not be possible otherwise. So, let me thank Deeptarup again for today's lecture, and we will continue with more interesting sessions in the next lecture. Thank you.