In this course, we have discussed various technologies that provide functional information about proteins, including protein-protein interactions and other biomolecular interactions. We talked about different technology platforms, including protein microarrays, label-free biosensors and, very briefly, next-generation sequencing and mass spectrometry. The idea is that all of the data we generate from wet-lab experiments should finally be made available to the public, and the whole research community should start sharing its data. As you have seen in the last few lectures, we have been giving you a glimpse of how you can start using public databases and other resources to download data, and how you can use open-source software to analyze those data sets.
In cancer research, and in clinical research more broadly, a lot of data is being generated that is very costly and very precious. Large programs funded by different governments aim to analyze thousands of patient samples, and a new field known as proteogenomics is emerging, in which the same patient sample is analyzed at both the protein and the gene level. If we can integrate this information, we are likely to get more meaningful information from the same sample.
In today's lecture, one of my research scholars, Mr. Deepthru, is going to show you, through a review of published data sets, how proteogenomics can provide novel insights by integrating data from the genome and the proteome. He will also talk about how proteogenomic approaches can help with problems such as distinguishing different grades or subtypes of cancer, which are very difficult to understand without a good molecular-level picture.
He will also explain how correlating proteomic and genomic data can provide a much broader and more meaningful picture of cancer progression. Several excellent programs funded by the National Cancer Institute of the NIH in the US have really accelerated the whole field of proteogenomics research. Scientists are sharing data obtained from tissue microarrays, protein microarray platforms, next-generation sequencers and mass spectrometers, and making it publicly available. The Cancer Genome Atlas, or TCGA, is an excellent resource to get started with, and you can investigate further using the raw data available from its portal. Likewise, CPTAC and other ambitious programs now provide data with which one can accelerate different types of proteogenomics research. Today Deep will talk to you about the workflows of some recently published case studies that used proteogenomics in the area of cancer. So, let me welcome Deep Dhruv for today's lecture.
After the completion of the Human Genome Project and the introduction of genomics into the study of disease pathobiology, there was a hope that genomics would bring a revolutionary change in cancer diagnosis and open a path to personalized medicine. But the success of genomics-driven personalized medicine was not that revolutionary. Out of an overall cohort of patients, only a few responded to the therapy predicted from their genomic profile. Some gaps remained even after the successes of genomics. A recent paper from the Zhang group discusses the clinical potential of mass spectrometry-based proteogenomics and how it can be brought into the clinic. Personalized medicine based on genomics alone was not that successful for a number of reasons.
Trying to solve a problem like cancer with genomics alone is like jumping from the first hurdle straight to the last one, without taking into account the many conditions and parameters that come in between. We get a complete genomic profile, with different types of mutations and aberrations, but at the same time we miss epigenetic aberrations, transcriptional regulation, alternative splicing and proteomic profiling. All of this information needs to be taken into account to understand the pathobiology of cancer, and only then can it be used for diagnosis and treatment. So, the message from this slide is that all of the information, from DNA to mRNA to protein, needs to be considered if we want to bring a revolutionary change in cancer diagnosis and treatment.
Before I move on to the role of proteogenomics in cancer diagnosis, I want to give a brief account of cancer driver genes. A cancer driver gene is defined as one whose mutation increases net cell growth. The total number of driver genes is unknown, but it is assumed to be considerably less than 19,000, as stated by Tokheim et al. in 2016. From the DriverDB repository you can see that the top driver genes include TP53, EGFR and PTEN, and we all know how important these hallmark driver genes are in glioblastoma tumorigenesis. Here are the mutation profiles of those driver genes; the top driver genes are PTEN, TP53 and EGFR, and the x-axis shows the mutation profile in terms of samples. If we choose one of the top three cancer driver genes, say EGFR, we can ask what the expression of this gene is in glioblastoma. We find that the expression of EGFR in glioblastoma is quite high.
So, EGFR is one of the top three cancer driver genes in glioblastoma, and if we compare its expression across other cancers, we find that in GBM EGFR is highly overexpressed in both primary solid tumors and recurrent solid tumors. Genomics has so far given a lot of information about glioblastoma, but if we take into account the relationship between exon and protein, we find that the driver score mapped onto exon and protein positions also gives some new information. This panel displays the driver score distribution across exon and protein positions, which helps researchers quickly find the regions of the gene with abundant deleterious mutations.
So, now we understand that a lot of things between genomics and precision medicine were not considered: not all mutated genes are stably expressed as proteins, and the proteins that are expressed can be post-translationally modified. Therefore, precision medicine that relies solely on genomics-based assays will exclude a lot of potentially relevant information, such as miRNA (microRNA).
To support the previous statements, and to give you a glimpse of how powerful proteogenomics can be in tackling different kinds of cancer, let us look at one study. In this study, they took 169 ovarian tumor samples selected using TCGA metadata and tried to correlate the genomics, transcriptomics, proteomics and phosphoproteomics data. Before going into the paper, let me give a glimpse of this kind of mutation and how it can lead to the lethality of a cell. The diagram is taken from Walsh et al., 2015, and shows how the PARP enzyme functions in the repair of single-strand DNA breaks. If the PARP enzyme is inhibited, no repair takes place, which leads to a collapsed replication fork, and BRCA deficiency does not allow homologous recombination to happen.
In panel C, the deficiencies in homologous recombination (HR) and base excision repair together lead to synthetic lethality. Coming to the sample information, tumors were selected by examining the associated TCGA metadata. They were chosen on the basis of putative homologous recombination deficiency (HRD): the presence of germline or somatic BRCA1 or BRCA2 mutations, BRCA1 promoter methylation, or homozygous deletion of PTEN.
This clustering gives us the complete landscape of which pathways are involved, how protein and mRNA play a role, and what the correlation between protein and mRNA is in each pathway. So far we understand that protein and mRNA are correlated and that this correlation also matters at the level of biological pathways, but they also tried to understand how CNA, that is, copy number aberration, in each tumor relates to mRNA and protein abundance. The blue ones are the profile of the data generated in the study, whereas the black ones are data already present in the database. From the CNA-mRNA and CNA-protein correlations, they found that two important proteins, CHD4 and CHD5, are associated with the maximum number of CNAs. When they studied these further, they found that the two proteins are involved in chromatin organization.
To understand the complete biological pathway picture, they took phosphopeptides, proteins, transcripts and CNAs, and found the top pathways that play a role in the pathobiology of this cancer. Among these, PDGFR-beta, which we all know is an angiogenic receptor, also shows an important association at the pathway level. To understand the complete landscape of the cancer pathobiology, they incorporated mRNA, protein and phosphopeptide data into one picture, where we can see that PDGFR-beta is upregulated at both the mRNA and the protein level.
So, this upregulation of PDGFR-beta not only gives a clue to active angiogenesis, but also shows which downstream regulatory factors are up- or down-regulated at the mRNA and protein levels. Further, they performed a DDN analysis, that is, a differential dependency network analysis, with proteins curated from the literature and from cBioPortal. cBioPortal helps you get data out of TCGA. They identified a subnetwork of 30 proteins that displayed co-expression patterns differentiating HRD from non-HRD patients. From this DDN analysis they found that histone acetylation and deacetylation proteins come into the clusters, including HDAC1, RBBP4, RBBP7, EP300 and HUS1.
So, from this part of the study they understood that histone acetylation and deacetylation play an important role. This clue was enough to suggest that acetylated peptides needed to be studied. From the global proteome data, they set up a database search strategy for acetylated peptides in order to identify and quantify them. They identified around 399 acetylated peptides, of which 50 differed significantly between HRD and non-HRD. Among the significant acetylated peptides, they found acetylation of lysine at positions 12 and 16, that is, K12 and K16. They validated K12 and K16 using synthetic peptides and targeted analysis with SWATH-MS. In the iTRAQ data, K12 was upregulated in HRD-negative tumors, and the same upregulation in HRD-negative tumors was confirmed by SWATH. They then went back to the literature and found that acetylation of histone H4 has previously been reported to be involved in the choice of DNA double-strand break (DSB) repair pathway.
The relationship is regulated partially by HDAC1, a protein also identified in the DDN analysis. So a potential role for HDAC in modulating the choice of DSB repair pathway has been identified.
The conclusion from the study is that activation of the PDGFR pathway in patients could potentially be used to stratify them for selective enrollment in trials of anti-angiogenic therapy. A recombinant humanized monoclonal antibody that blocks angiogenesis by inhibiting VEGFA has already been trialed in patients, and the involvement of the PDGFR pathway in this cancer brings that kind of antibody into the limelight. Apart from this, acetylation of K12 and K16 on histone H4 may provide an alternative biomarker of HRD, and a rationale for selecting patients in future clinical trials of HDAC inhibitors, alone or in combination with PARP inhibition.
So, the moral of the study is that the ability of proteomics to complement genomics provides additional insights into the pathways and processes that drive ovarian cancer biology. The data we get from genomics alone is not enough to lead to well-informed diagnosis and treatment of cancer. Important information such as mRNA, protein and PTM (post-translational modification) data needs to be gathered and correlated, and only then can we reach a conclusion and take it forward for validation in clinical trials. So, now we understand how cancer driver mutations, mRNA and protein all need to be taken into account to reach a molecular target or a cancer drug. In the last study, the group generated only the proteomics data themselves and correlated it with the mRNA and CNA data already available in the databases.
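The step of correlating newly generated proteomics data with existing mRNA data can be sketched roughly as follows. This is only a minimal illustration: the lecture does not say which correlation measure the study used, so Spearman rank correlation is an assumption here, and the gene names (`GENE_A`, `GENE_B`), sample IDs and abundance values are invented toy data, not values from any paper or database.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def gene_wise_correlation(mrna, protein, min_samples=3):
    """Correlate mRNA and protein abundance per gene across shared samples.

    Both inputs map gene -> {sample_id: abundance}.
    """
    out = {}
    for gene in sorted(set(mrna) & set(protein)):
        shared = sorted(set(mrna[gene]) & set(protein[gene]))
        if len(shared) < min_samples:
            continue  # too few matched samples to say anything
        x = [mrna[gene][s] for s in shared]
        y = [protein[gene][s] for s in shared]
        out[gene] = spearman(x, y)
    return out


# Invented toy data: four tumors (T1..T4), two hypothetical genes.
mrna = {
    "GENE_A": {"T1": 2.1, "T2": 3.4, "T3": 5.0, "T4": 6.2},
    "GENE_B": {"T1": 4.0, "T2": 3.0, "T3": 2.0, "T4": 1.0},
}
protein = {
    "GENE_A": {"T1": 0.5, "T2": 0.9, "T3": 1.8, "T4": 2.5},
    "GENE_B": {"T1": 1.0, "T2": 2.0, "T3": 3.0, "T4": 4.0},
}

correlations = gene_wise_correlation(mrna, protein)
for gene, rho in correlations.items():
    print(f"{gene}: Spearman rho = {rho:.2f}")
```

In the toy data, `GENE_A` rises in both layers across the tumors while `GENE_B` moves in opposite directions, which is the kind of discordance between mRNA and protein that genomics alone would not reveal. With real data, the same function could be applied to CNA versus mRNA or CNA versus protein tables.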
So, Firehose can be used to download this kind of data. If we select a disease name, say glioblastoma multiforme, we can see that all the data available in TCGA can be downloaded from here. For the 2016 version of the TCGA glioblastoma data, clinical, SNP, methylation, mRNA sequencing and reverse-phase protein array data are already available. So, we can use Firehose to download the data.
So, now we understand how proteogenomics, and the correlation of mRNA and protein, can give us better insight into a particular disease. But to deal with this amount of data and prepare a panel that can help in the treatment or diagnosis of cancer, we need to think about predictive, machine learning-based analysis. I have taken as an example the paper "A neural network approach to multi-biomarker panel discovery by high-throughput plasma proteomics profiling of breast cancer". In study A and study B, 40 cancer samples and 40 controls were taken, whereas in study C, 20 cancer samples and 20 controls were taken. They then did the proteomic analysis and found that 246 proteins were common to the three studies. After this analysis, they used the data to build an artificial neural network model, taking study A as the training set, study B as the testing set and study C for validation. In this kind of artificial neural network modelling, in most cases around 70 percent or more of the data is used for the training set, whereas around 30 percent is used for the testing set. The model is then validated with a blind data set to check its performance. In most cases, the accuracy of the model needs to be more than 80 or 85 percent. This artificial neural network gave three panels with five markers, with an accuracy of more than 85 percent. These panels were then taken forward and checked in a large cohort of samples to validate the data.
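The train, test and blind-validation idea described above can be sketched as follows. This is a rough illustration under stated assumptions, not the model from the paper: it uses a single-neuron perceptron rather than a full artificial neural network, the marker values are randomly generated synthetic data, and the cohort sizes, the five-marker panel and the 70/30 split are simply borrowed from the numbers mentioned in the lecture.

```python
import random


def make_synthetic_cohort(n_per_class, n_markers, rng):
    """Invented data: cancer samples (label 1) have marker values shifted
    upward relative to controls (label 0)."""
    samples = []
    for label, shift in ((1, 1.5), (0, 0.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(shift, 1.0) for _ in range(n_markers)]
            samples.append((features, label))
    rng.shuffle(samples)
    return samples


def predict(weights, bias, features):
    """Single neuron with a step activation."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0


def train_perceptron(train, n_markers, epochs=50, learning_rate=0.1):
    """Classic perceptron rule: update weights only on misclassified samples."""
    weights, bias = [0.0] * n_markers, 0.0
    for _ in range(epochs):
        for features, label in train:
            error = label - predict(weights, bias, features)
            if error:
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
    return weights, bias


def accuracy(weights, bias, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(weights, bias, f) == y for f, y in samples)
    return correct / len(samples)


rng = random.Random(0)          # fixed seed so the sketch is repeatable
N_MARKERS = 5                   # assumed panel size

# 40 cancer samples + 40 controls, split 70/30 into training and testing.
cohort = make_synthetic_cohort(40, N_MARKERS, rng)
split = len(cohort) * 7 // 10
train_set, test_set = cohort[:split], cohort[split:]

# A separate blind set (20 + 20) that plays no part in training.
blind_set = make_synthetic_cohort(20, N_MARKERS, rng)

weights, bias = train_perceptron(train_set, N_MARKERS)
test_accuracy = accuracy(weights, bias, test_set)
blind_accuracy = accuracy(weights, bias, blind_set)

print(f"train n={len(train_set)}, test n={len(test_set)}, blind n={len(blind_set)}")
print(f"test accuracy:  {test_accuracy:.2f}")
print(f"blind accuracy: {blind_accuracy:.2f}")
```

The point of keeping the blind set completely separate is that its accuracy is the number to compare against the 80 to 85 percent threshold; a model that only does well on the data it was tuned on has not really been validated.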
So, like this we can use artificial neural networks and other machine learning strategies to understand and predict the top candidates that play key roles in tumorigenesis and further development of the cancer. So, the main concept is that we need to understand the complete pathobiology from the different proteins; only then can the landscape of the disease be drawn, and that can lead to a drug target or to precision medicine.
So, by now you know that a huge amount of data is available in public repositories and databases, which can be extracted and used for further analysis. Big research programs like the Human Protein Atlas and The Cancer Genome Atlas, or TCGA, as well as laboratories working worldwide, including the Broad Institute of MIT and Harvard, have shared their data in various databases. All of these researchers and scientists are making their data publicly available. More recently, the Cancer Moonshot project and the International Cancer Proteogenome Consortium also aim to investigate proteogenomic data from the same patients, with the intention of making it publicly available to the entire scientific community. All you have to do is extract the data, perform different types of analysis and draw meaningful insights. You can always define a unique question and look for the best answer in the large number of sample data sets available to you. You can also compare data from different laboratories, or even integrate data obtained from different populations. You can look at the effect of the same disease in different geographical locations, races and age groups, as well as the effect of different treatments, or at diseases that tend to recur. Many of these things can be investigated using this kind of publicly available data set.
I hope the manuscripts we have discussed today have given you a good glimpse of how genomics, transcriptomics, proteomics and sometimes even metabolomics together can provide much more in-depth information at the cellular level, which was not possible a few years ago. I hope you will be able to use some of these technologies and some of these data sets in your own research. In the next lecture, I will talk more about the various revolutions happening in the field of omics in general, and of course in interactomics and proteomics in particular, and try to give you a better sense of what exactly is happening in this whole field, which is really remarkable and revolutionary in nature. Thank you.