Welcome to the MOOC course on Introduction to Proteogenomics. In the previous lecture, Dr. Bing Zhang introduced you to the capabilities of an online tool, LinkedOmics. In today's lecture, you will be exposed to the various steps involved in analyzing a large dataset using LinkedOmics. So, let us welcome Dr. Bing Zhang for his last lecture.
So, let us go to the new analysis. There are a few steps you need to follow in order to perform this analysis. Let us start with proteomics data. Basically, we want to ask which proteins are associated with poor prognosis in ovarian cancer. In order to do this, first you have to identify the ovarian cancer dataset, the ovarian cohort, for the analysis. You can browse, but let us search for "ovarian", and then you can identify TCGA_OV. That is the cohort we want to look at, the TCGA ovarian cohort, so you click on that.
It will then take you to the next step, selecting a search dataset. As I mentioned, the goal of LinkedOmics is to allow you to start from any attribute. For example, you can go from mRNA, microRNA, mutation, or proteomics to any other attribute. But here we have a fixed question: we are interested in survival, and survival is a type of clinical data. So, within step 2 we want to define the search dataset and the search attribute. Here we want to start with clinical data, so you filter for clinical data. There is only one type, so you select it. Now you have step 2B as an option: if you are only interested in, for example, a certain stage of ovarian cancer, or a certain subtype of the cancer, you can do the analysis for that subset of tumors.
But today let us just do the analysis for the whole cohort. Next we select the search attribute within the clinical data. There are multiple types of clinical information, like platinum resistance status or overall survival, and today we are going to select overall survival. Now you have defined what to use as the query attribute: within the TCGA ovarian cohort, I am interested in overall survival.
Then I want to ask which proteins are associated with overall survival. So, in step 4, the data type, let us select the proteome. For this cohort, both Johns Hopkins University and PNNL generated proteomics data for overlapping subsets of the cohort. Today let us choose the PNNL data, so you click on "select" next to the PNNL data. After we finish this proteomics analysis, you can come back and select, for example, RNA-seq and correlate overall survival with RNA-seq, or you can select copy number and correlate survival with copy number. But now we are going to do the proteomics only.
In step 5, let us select Cox regression, which is for survival analysis, and then submit the query. Any query that has already been run by other people has its result saved, which is why it comes back so quickly. Now you get the result, and you click on "view". You can also search for a gene you are interested in, for example KRAS or some other gene. For ovarian cancer, which gene could be interesting? Does anyone have a suggestion on which gene to search for? Let us look at KRAS. It looks like it is not significant; the p-value is 0.6. TP53 is not significant either, but there is apparently a slight trend where higher expression is associated with poor survival.
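To make the Cox regression in step 5 a little more concrete, here is a minimal sketch of a one-covariate Cox model fitted by Newton-Raphson on the partial likelihood, written in plain Python. It is only an illustration of the kind of test being run, not the LinkedOmics implementation, which the lecture does not describe. The function names and the toy survival times, event flags, and protein abundances are all made up for this example.

```python
import math

def _score_and_info(beta, times, events, x):
    """Gradient and observed information of the Cox partial log-likelihood."""
    grad, info = 0.0, 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored samples only contribute through risk sets
        s0 = s1 = s2 = 0.0
        for j in range(len(times)):
            if times[j] >= times[i]:  # sample j is still at risk at times[i]
                w = math.exp(beta * x[j])
                s0 += w
                s1 += w * x[j]
                s2 += w * x[j] ** 2
        mean = s1 / s0
        grad += x[i] - mean
        info += s2 / s0 - mean ** 2
    return grad, info

def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Fit log-hazard coefficient beta for a single covariate x."""
    beta = 0.0
    for _ in range(max_iter):
        grad, info = _score_and_info(beta, times, events, x)
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_info(beta, times, events, x)
    z = beta * math.sqrt(info)               # Wald statistic, se = 1/sqrt(info)
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return {"beta": beta, "hazard_ratio": math.exp(beta), "z": z, "p": p}

# Toy data (hypothetical): months of follow-up, 1 = death and 0 = censored,
# and a standardized abundance value for one protein.
times = [5, 8, 12, 14, 20, 25, 30, 36]
events = [1, 1, 1, 0, 1, 0, 1, 0]
protein = [1.2, 0.8, 1.5, -0.3, 0.1, -1.0, 0.4, -0.9]

result = cox_univariate(times, events, protein)
print(result)
```

A hazard ratio above 1 means that higher abundance goes with shorter survival, which is the direction described for the TP53 trend above. Running one such model per protein, and then correcting the p-values for multiple testing, gives a ranked table of the kind shown on the results page.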
You can also go to the bottom of this page, which shows you the top genes, I do not remember whether it is 50 or 25, that are most positively correlated with survival and most negatively correlated with survival. We found this to be useful to get a quick view of the top genes and the detailed data. If the attribute is binary, say subtype 1 versus the other subtypes, then you just have a block like this with two parts. Survival data also has two groups, those with a death event and those still alive, but in addition you have the survival time. So, this part at the bottom shows you the people who have died and the people who are still alive, and then within each group this is the survival time. Most of these are censored data, meaning most of them have not died, but these are the ones with an actual death event. So, basically there are two groups of people, and within each group the survival time is ordered from short to long.
If you want to do some pathway analysis for this, you can click on LinkInterpreter and then do ORA or GSEA, similar to what we have done before. If you submit the query, it will give you the WebGestalt report, as we have seen before. So, we do not need to do that now. Rather, I think we can go back to perform a new analysis. This time you still do the same analysis, but you ask a slightly different question: which copy number changes are associated with survival in this ovarian cohort? After that you do another query on the RNA-seq data: which mRNA abundances are correlated with survival? After you get results from all three platforms, I can show you how to do the LinkCompare. So, you can go back to the new analysis and repeat what we have done, but just change the target dataset.
The copy number data is labeled SCNA, and if you filter the data type for SCNA you get four rows. That is because we have results at both the focal level and the gene level, and at the gene level the data can be thresholded or unthresholded. In this case we pick the one without a threshold, the third one. Yes, we pick the SCNA gene-level Firehose GISTIC2 row; that is the algorithm used to do the analysis. Sorry, at the very top you still select clinical data; it is the target dataset, in step 4, that is the somatic copy number data. Then you can view the results for copy number the same way.
Okay, let us see if you were able to get two or three results from this, for example all three of RNA-seq, copy number, and the proteome in this OV dataset associated with survival. Now you can do the LinkCompare. What you do is go to the third column, the select column, select all three of them, and then click on "compare". Now you have this table with all the results. In the first few columns you have the three results from the three platforms. Let us see: this ID 24 corresponds to this ID 24, so these are the results from RNA-seq. ID 56 corresponds to the results from copy number, and ID 57 is the results from proteomics. For each of these you have the signal strength, meaning the hazard ratio, and then the p-value and the FDR. Here we use sumZ, which is a meta-analysis method that summarizes the p-values from the three analyses, and you get the sum statistic, the sum p-value, and the sum FDR.
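As a rough illustration of how a sumZ-style meta-analysis can combine three platforms, the sketch below uses the unweighted Stouffer method: each two-sided p-value is converted to a z-score that carries the direction of the effect, the z-scores are summed, and the sum is divided by the square root of the number of platforms. The lecture only names the method as "sumZ", so the unweighted form, the function names, and the example p-values here are assumptions for illustration, not the exact LinkedOmics computation.

```python
import math
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution

def signed_z(p_two_sided, direction):
    """Turn a two-sided p-value and an effect direction (+1 or -1) into a z-score."""
    return direction * _norm.inv_cdf(1.0 - p_two_sided / 2.0)

def sum_z(p_values, directions):
    """Unweighted Stouffer combination; returns (meta z, two-sided meta p-value)."""
    zs = [signed_z(p, d) for p, d in zip(p_values, directions)]
    z_meta = sum(zs) / math.sqrt(len(zs))
    p_meta = 2.0 * (1.0 - _norm.cdf(abs(z_meta)))
    return z_meta, p_meta

# Hypothetical p-values for one gene from RNA-seq, copy number, and proteomics.
p_values = [0.04, 0.06, 0.08]

# All three platforms point the same way (hazard ratio above 1 in each).
same = sum_z(p_values, [+1, +1, +1])

# The copy number result points the opposite way.
mixed = sum_z(p_values, [+1, -1, +1])

print("same direction:", same)
print("mixed direction:", mixed)
```

With these numbers, the combined p-value for the consistent case is smaller than any of the three individual p-values, while the mixed case is weaker because the opposing z-score cancels part of the sum. This is why the direction of each hazard ratio matters, not only its p-value.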
So let us sort the results based on the p-value. As you can see, for the individual platforms we got very few significant genes; in proteomics we did not get any gene that passed the FDR cutoff. But after you integrate the three platforms, there are a lot more significant genes than for any individual platform. This indicates that although no single platform gives a signal strong enough to pass the cutoff, all of them may point in the same direction, and that gives you more confidence. So, when you put the results together, you can look at the top genes by their minus log meta FDR, meaning the integrated FDR. These are the top positive genes and the top negative genes, and you can also view the result as a heat map.
You can also redo the filtering. For example, rather than setting the FDR to 0.05, you can relax it to 0.1 and submit the filter, and this will give you more genes to look at. So, basically you can adjust this to get the genes that pass your cutoff. All these results can also be downloaded and saved by clicking on "download figures".
Again, you can go to LinkInterpreter and, for example, select GSEA. Now you can submit this sum statistic as the rank metric to rank the genes, and the enrichment analysis will be performed against this summarized statistic rather than the individual ones. The result will look very similar to what we saw before for GSEA in WebGestalt.
In today's session you got a demonstration of how the three modules of LinkedOmics work, using the TCGA ovarian cancer dataset as an example. You were also shown how the tool can be used to relate a target gene to survival using RNA-seq, copy number variation, and proteome-level data.
Finally, using the third module of the tool, it was demonstrated how the results appear and how they can be interpreted. In conclusion, we hope that you now have a fair idea of how you can use these available tools for your own research. Thank you.