Welcome to the MOOC course on Introduction to Proteogenomics. In today's lecture, Deeptarup Biswas, a PhD scholar from the Proteomics Laboratory, IIT Bombay, will give you a brief idea about pathway enrichment and network analysis. As you are already aware, next-generation sequencing and mass spectrometry based technologies can provide large omics data sets in a very short time. But how do we get meaningful information from this big data? Further analysis using pathway enrichment and network analysis becomes crucial for addressing different biological questions. We have learnt a lot about different data analysis strategies and how to get a list of significant candidates, whether genes or proteins. But until we relate these lists and expression values to biological insight, the information remains incomplete. Today, he will discuss how your data set can be used to generate functional groups, protein-protein interaction modules and pathway enrichment. He will also discuss multiple tools available online and how they can be used to relate your data set to biological function. So, let us welcome Mr. Deeptarup Biswas for today's session.
Till now, we have learnt a lot about how to design an experiment: what conditions and parameters need to be taken into account for sample collection and for performing an experiment. After that, we have learnt about statistical power, and about primary and secondary analysis to generate the expression data set. Now, the important thing is that we have got a very good pattern of differential gene regulation. So, the question comes: what next? I want to start with two important questions. How do my data relate to known biological function? Are there specific functions that are characterized by gene expression changes?
After the secondary analysis, what I feel is the most important thing to do is the tertiary analysis, which means identifying the functional groups. This identification is based on pathway enrichment, network analysis and PPI modules, that is, protein-protein interaction modules. So, in the workflow, what I have added as the last step, after the identification of the differential genes, is identifying the functional groups.
Different kinds of software generate different IDs. If we are using Proteome Discoverer, which is commercial software, or the Trans-Proteomic Pipeline, they will give you different kinds of IDs for the protein identifications. But to start a tertiary analysis, we need multiple ID types, and that is possible only through ID conversion. This is a very basic thing, but still I want to take a little time to tell you how ID conversion can be done. There are mainly three platforms which we can take into consideration. The first is DAVID. DAVID is a multiple ID conversion tool; apart from this, it can also help with different types of annotation and enrichment studies. Next is WebGestalt. I think Dr. Bing Zhang has already told you a lot about WebGestalt, and this platform can also be used for ID conversion. The next is Protein Identifier Cross-Reference (PICR). This is another platform where we can upload our IDs and get multiple converted IDs. So, DAVID, WebGestalt and Protein Identifier Cross-Reference are very simple tools where we just need to put in the list of IDs and select which ID types we want as the conversion output. But there is one more important tool, the KEGG Mapper Convert ID tool. It is important because, unlike the other platforms, KEGG Mapper does not accept any other ID. We have to get the KEGG gene IDs from the converter first, and only then can we put them into the search tool and get the KEGG pathway mapping.
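The ID conversion described above can also be scripted. The following is only a minimal sketch, not the method shown in the lecture: it assumes the KEGG REST `conv` operation (a URL of the form `https://rest.kegg.jp/conv/genes/<entries>`) and a tab-separated reply with one `outside_id<TAB>kegg_id` pair per line. The helper names `build_conv_url` and `parse_conv_response` are hypothetical, and the sample reply is illustrative rather than a recorded server response.

```python
from typing import Dict, Iterable, List

# Assumed base URL of the KEGG REST "conv" operation (not fetched in this sketch).
KEGG_CONV_URL = "https://rest.kegg.jp/conv/genes/{entries}"


def build_conv_url(uniprot_ids: Iterable[str]) -> str:
    """Build a request URL that asks KEGG to convert UniProt accessions."""
    entries = "+".join(f"uniprot:{acc}" for acc in uniprot_ids)
    return KEGG_CONV_URL.format(entries=entries)


def parse_conv_response(text: str) -> Dict[str, List[str]]:
    """Parse a tab-separated reply into {accession: [KEGG gene IDs]}."""
    mapping: Dict[str, List[str]] = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        source, target = line.split("\t")
        # Drop the database prefix in front of the accession.
        accession = source.split(":", 1)[1]
        mapping.setdefault(accession, []).append(target)
    return mapping


if __name__ == "__main__":
    # Illustrative reply for two human proteins (EGFR and TP53).
    sample_reply = "up:P00533\thsa:1956\nup:P04637\thsa:7157\n"
    print(build_conv_url(["P00533", "P04637"]))
    print(parse_conv_response(sample_reply))
```

The parsed KEGG gene IDs (for example `hsa:1956`) are the kind of identifier that the pathway search step expects.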
So, I will show you a glimpse of how, starting from a test data set, we can go to KEGG Mapper, convert the IDs to KEGG gene IDs, and then put those converted IDs into KEGG Mapper to get the pathways. I have already shared the test 1 text file; you have to open that text file. It contains a list of genes, mainly from a GBM repository data set, a processed file which I have taken, and we will copy and paste that list into the KEGG ID converter. I have already shared the link in the slide, so please go to the KEGG ID converter by copying and pasting the link.
This is the home page of KEGG Mapper Convert ID. In the first field, the outside DB, you choose what kind of ID you are putting in: NCBI gene ID, NCBI protein ID or UniProt ID. As I have given a list of UniProt IDs, I will select UniProt here. After that, the important thing is the organism. When we click this field, we have to write the name of the organism, and as I have taken the file from a Homo sapiens repository, I will select Homo sapiens; it then shows the code hsa here. In the box for outside identifiers, we will copy and paste the list of UniProt IDs, and then click the execute button. It will take a little time, and it will give you the complete conversion from UniProt IDs to KEGG IDs. As you can see, the conversion result also lists the UniProt IDs, but we will be copying only the KEGG IDs. So, we will click on the option to display KEGG IDs only, where we get just the KEGG IDs, and copy the complete list from here.
Then we have to come back to the home page, where there is an option called Search Pathway. We click on it and the Search Pathway dialog box opens. Here we have to paste the KEGG IDs which we have already converted, and select the Homo sapiens pathways.
After selecting the Homo sapiens pathways, we go down and click the execute button. After that, you can see KEGG Mapper has generated a complete profile of the different pathways and how many hits there are in each pathway. If we click on any of these pathways, it will redirect you to the complete pathway map, and you will find that the proteins that are present are highlighted with a yellow background and a red font. So, this is a glimpse of how you will use KEGG Mapper and how you will convert your UniProt IDs into KEGG IDs.
The next thing I want to show you is WebGestalt. I think Dr. Bing Zhang has already talked a lot about WebGestalt, and I feel this is one of the best tools in the omics field, as it gives complete downloadable data and downloadable images from your data sets. ORA (over-representation analysis), GSEA (gene set enrichment analysis) and NTA (network topology-based analysis) are the three important methods that WebGestalt provides, each with a sample run. Apart from this, in 2019 they also included a phosphosite sample run in this software. As Dr. Bing Zhang has already given a very good glimpse of this software, I want to show you the different kinds of downloadable images it provides. On the left, you can see it provides a complete list of different classifications, starting from biological process, cellular component and so on. In the middle, there is a complete list of the gene ontology terms coming from your data set and how they are linked. Apart from this, as I was saying about WebGestalt ID conversion, they also provide a complete conversion of your IDs into gene names and Entrez Gene IDs. It also gives a glimpse of the network you get through the protein-protein interaction module.
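To make the idea behind ORA concrete: over-representation is commonly scored with a hypergeometric test, which asks how likely it is to see at least the observed number of pathway members in the candidate list by chance. The sketch below is an illustration under that assumption and is not taken from any particular tool's code; the function name `ora_p_value` and its parameter names are hypothetical.

```python
from math import comb


def ora_p_value(background: int, pathway_size: int,
                list_size: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    background   : number of genes/proteins in the reference set
    pathway_size : number of reference genes annotated to the pathway
    list_size    : number of candidates submitted
    overlap      : number of candidates that fall in the pathway
    """
    total = comb(background, list_size)
    upper = min(pathway_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 reference genes, a pathway of 5, a list of 5, all 5 in the pathway.
    print(ora_p_value(10, 5, 5, 5))
```

In practice the p-values from many pathways are then corrected for multiple testing, which is where the FDR column in enrichment results comes from.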
So, now I will be talking about a newer but widely used tool, Reactome. Reactome is nothing but a database which can help you link your proteins, your candidates, with different kinds of pathways. This database is so robust and dynamic that it will not stop at giving a single pathway; rather, it will give you the different sub-pathways and sub-networks, followed by the individual reactions.
After clicking Analyze Data, another window opens which asks you to submit your data. Here we can submit the data in two ways. First, by choosing a file, where we can pick a text file with the names of the candidates. Alternatively, we can simply go to the box and copy and paste our candidate genes. After that we have to select Continue. Here there are two options: the first is Project to human and the second is Include interactors. Include interactors means that the proteins interacting with your candidates, and the chemical compounds linked with your candidates, mainly metabolites and also different kinds of drugs, will be shown as well. So, we will click here and start analyzing the data.
As you can see, they have given a complete list of different pathways, and they have also listed the identifiers that were not found, which may be due to updates of the databases. Now, suppose we want to check the top pathway which has come up, which is related to diseases of signal transduction. When we click on that pathway, the Reactome database will show the complete details of the pathway, and when we zoom in, we will find a complete view of the different sub-networks that are present.
Apart from this, if we come to the Expression tab, it shows nothing but the different candidates that are present in the pathway and their expression across different tissues. If we select different tissues from here, we can find the expression level of that candidate in those tissues. If we go to the Molecules tab over here, we will find a couple of categories already available: chemical compounds, proteins, DNA/RNA and drugs. This tells us what kinds of molecules are present in this pathway; the chemical compounds are mainly the metabolites, followed by the different proteins, DNA and RNA, and drugs. If I select one disease, we find that different sub-pathways come up. In these sub-pathways there are two important symbols: one is U and the other is a plus. If we keep the pointer over them, we find that U says the entry has been updated in the database and the plus says it is related to a disease. Likewise, Reactome gives a lot of information about your data set from many aspects.
Now the important thing is downloading the result file. Here is a tab for downloading, where clicking this will download the result in .csv format. The .csv file will have the complete data of the analysis. From there we have to select certain criteria, like p-value or FDR, which are already given, as you can see. On that basis we have to filter the complete analysis. After sorting and filtering the data on the basis of p-value and FDR, I have found some top pathways which I will be taking for the next part of the analysis. As you can see, this is the table I have taken from the result file, and you can find the pathway identifier, which is nothing but the unique ID of each pathway in Reactome.
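The sorting and filtering of the downloaded result file can be sketched in a few lines. This is only an illustration: the column names `Entities pValue` and `Entities FDR`, the 0.05 cut-offs, the function name `filter_pathways` and the three sample rows are all assumptions made for the example, so check them against the header of your own .csv file.

```python
import csv
import io
from typing import Dict, List


def filter_pathways(csv_text: str,
                    p_col: str = "Entities pValue",
                    fdr_col: str = "Entities FDR",
                    max_p: float = 0.05,
                    max_fdr: float = 0.05) -> List[Dict[str, str]]:
    """Keep rows passing both cut-offs, sorted by ascending p-value."""
    rows = csv.DictReader(io.StringIO(csv_text))
    kept = [
        row for row in rows
        if float(row[p_col]) <= max_p and float(row[fdr_col]) <= max_fdr
    ]
    kept.sort(key=lambda row: float(row[p_col]))
    return kept


# Hypothetical rows; identifiers, names and numbers are placeholders.
SAMPLE_CSV = """Pathway identifier,Pathway name,Entities pValue,Entities FDR
PATHWAY_A,Example pathway A,0.0004,0.01
PATHWAY_B,Example pathway B,0.2,0.6
PATHWAY_C,Example pathway C,0.00001,0.002
"""

if __name__ == "__main__":
    for row in filter_pathways(SAMPLE_CSV):
        print(row["Pathway identifier"], row["Entities pValue"], row["Entities FDR"])
```

For a real file, the text can be read with `open(path, newline="").read()` and passed to the same function.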
These are the pathway names, this is the p-value, this is the FDR, this is the species name, and these are the hits, meaning the proteins from your sample data that match this pathway. This data we can put into any kind of protein-protein interaction module, and from there we can take the .sif file or the .json file, put it directly into Cytoscape, and check what visualization comes out. As Cytoscape has different kinds of plugins, we can generate different kinds of network visualizations. But apart from this, today I want to show you a very robust visual analytics platform for comprehensive gene expression profiling and meta-analysis, that is NetworkAnalyst. NetworkAnalyst gives you different kinds of visualization, where you can do single-list analysis and also multiple gene expression analysis.
So, now we will go to the next hands-on: how the data that we have already generated from the pathway analysis can be linked to a network platform to generate a network analysis. This is the home page of NetworkAnalyst, where there are several input options at the top. The first is the gene list input, which we can use when we have a normal gene list with p-values or fold changes. This one is multiple gene expression tables, where multiple data sets with different expression can be compared. This is the gene expression table, where microarray and RNA-seq data can be analyzed, and this is the raw RNA-seq data, where we can start from the sequencing data itself. This is the network file, where an already analyzed .sif or .json file can be directly uploaded to get the visualization. After clicking the gene list input, we get this page, where we have to select the organism, that is Homo sapiens, and select the ID type. As we are taking the IDs from the same test 1 file, we know that these are UniProt IDs. Then we have to copy and paste the IDs.
After this we have to select Upload here, and if there are any duplicates or errors, they will be shown over here. If everything is fine, we select the Proceed option. First we will check the list enrichment network, which is a pathway enrichment network visualization. After clicking this, you can see there is a complete network generated from different kinds of pathways. On the left we can select the Reactome database. After this we change the background color to white and submit. After submission, you can see there is already a huge number of pathways, and we really do not need this many, as it makes the complete network very complex. So, now we will go to the sorted table that we generated from Reactome, where we have the pathways that we took on the basis of significant p-value. We will select those pathways which are present there, like this downstream signaling pathway, signaling by EGFR, signaling by FGFR and signaling by NGF, and in this way we select some of the pathways from here and extract them. After extracting, we can see there are only the few pathways which we have extracted. Just to give you a glimpse, I have selected a few pathways, but your data set may have different top pathways that you can take into account.
After this, there is a View option, and from here we will select the bipartite network, which will give you not only the names of the pathways but also the names of the proteins. As you can see, whatever pathways we selected are present there, and apart from this, the proteins that you submitted in your data set are now shown as well. We can select each pathway or each protein like this, and there is a Label option, where we choose to label the selected nodes.
So, now these proteins are labeled. In this way we can select different proteins, different candidates, according to our data set and highlight them. We can even change the color of each protein. For example, if I want to show that this protein is upregulated, I can put it in red, whereas this protein is downregulated, so I can put it in green. Here again I select the option to label the selected nodes, and it will show that IRS1 is the upregulated one whereas PINCI1 is the downregulated one. So, there is a lot that can be done with this bipartite network in NetworkAnalyst.
So, now we know how to generate a very good pathway enrichment model. In the same way we can go for the protein-protein interaction model. To get it, we have already uploaded our data set in NetworkAnalyst. We will be choosing the generic PPI. There is another very good option, the tissue-specific PPI; if someone is working with brain or with kidney, these tissues are already available in the database and they can check. But as I just want to give you a glimpse, I will choose the generic PPI, where three databases are listed. These three are protein-protein interaction databases: IMEx Interactome, STRING Interactome and Rolland Interactome. People generally use STRING, but the IMEx Interactome will give you a very big profile of the different interactors that are present, as it is mainly updated from curated literature. If I choose the IMEx Interactome, we can see there is one subnetwork with 4677 nodes, 11427 edges and 176 seeds. From here you can download the .sif file of the interactions, and we can upload it again as a network file for future use.
So, now we will proceed, and we find the software has generated a complete protein-protein interaction module, which is a big and really very complex module. Now I will show you how to make this module smaller and more informative, and how to decrease the complexity of the network. First we have to select which database we want to use. Let us go with the Reactome database, and after that we change the background color to white. There is an option called batch selection, where we can copy and paste the accession IDs of the proteins we are interested in; I have already given you these in the test 1 file. After that we click Submit. After submission, we can see the highlighted candidates within this complex network. As we are interested in these candidates, we will select and extract them from the complex network. After extraction, we find the proteins that we selected and that are present in this network. Apart from these, the other proteins that are there are the top interactors coming from this protein-protein interaction module. Then we have to make some adjustments to make this network visually interpretable. That can be done from the given layouts; different kinds of layouts are already available. For reducing the overlap, I will choose this one, Reduce Overlap, and as you can see, the complex network has gained some clarity. After adjusting it a little manually, we can download it as an image and save the file as PNG or JPG. So, now we know how, from a data set, we can generate the pathway enrichment model and the protein-protein interaction model.
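The select-and-extract step can be pictured as taking an induced subnetwork: only the interactions whose two partners are both in the selected set are kept. The sketch below illustrates that idea on a toy edge list and writes the result as tab-separated lines in the style of a .sif file. It is not how the web tool is implemented; the function names, the `pp` interaction label and the toy interactions are assumptions for the example.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def extract_subnetwork(edges: Iterable[Edge], selected: Set[str]) -> List[Edge]:
    """Keep only edges whose two endpoints are both in the selected set."""
    return [(a, b) for a, b in edges if a in selected and b in selected]


def network_summary(edges: Iterable[Edge]) -> Tuple[int, int]:
    """Return (number of distinct nodes, number of edges)."""
    edge_list = list(edges)
    nodes = {node for edge in edge_list for node in edge}
    return len(nodes), len(edge_list)


def to_sif(edges: Iterable[Edge], interaction: str = "pp") -> str:
    """Format edges as 'source<TAB>interaction<TAB>target' lines."""
    return "\n".join(f"{a}\t{interaction}\t{b}" for a, b in edges)


if __name__ == "__main__":
    # Toy interactions for illustration only, not taken from any database export.
    toy_edges = [("EGFR", "GRB2"), ("GRB2", "SOS1"),
                 ("EGFR", "IRS1"), ("TP53", "MDM2")]
    picked = {"EGFR", "GRB2", "IRS1"}
    sub = extract_subnetwork(toy_edges, picked)
    print(network_summary(toy_edges), "->", network_summary(sub))
    print(to_sif(sub))
```

Comparing the node and edge counts before and after extraction gives a quick check of how much the complexity of the network has been reduced.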
After this I will show you an example from a recent paper that got published in Nature Scientific by Fan et al. in 2018. They have used this NetworkAnalyst software to a large extent, and they have shown how this network visualization platform can be used to produce these kinds of network analysis images. In the first one, you can see they have differentiated the protein candidate list into upregulated and downregulated candidates. They have also generated different sub-networks, like these four sub-networks, on the basis of different pathways or protein-protein interactions. Now, if I want to check what different clusters are coming in my protein-protein interaction module in terms of pathways, that can also be done by checking against the curated, sorted Reactome list that we generated. Apart from this, with the help of this diagram they have given a complete view of which proteins are present and whether a single candidate is present in multiple pathways or in a single pathway. So, that is all. Thank you.
I hope today's session was informative for you all, and that you got an idea of how tertiary analysis plays an important role in data analysis. As Deep mentioned, a data set can be used in different ways to extract biological information. However, there are many parameters that need to be considered to obtain meaningful results. He also showed you how a combination of different software tools can be used to obtain a very good interpretation of your data, and what strategies need to be used to obtain more meaningful information. I suggest you practice the tools mentioned today. You can download a data set and explore these tools by yourself. In the next supplementary video, we will talk about case studies in cancer. Thank you.