Hello students. I hope the previous sessions motivated you to start playing with some of the online tools available for analyzing your own data set, and, more importantly, for learning how best to plot, visualize and present your data and begin making biological sense of it. Once you analyze your data, you end up with a shorter list derived from the very large data set you started with, whether it came from NGS, mass spectrometry or microarrays. You have now narrowed it down to the most significant proteins. Next you would like to place them in a biological context. What are their roles? Which pathways do these proteins belong to? Which other proteins do they interact with? Are they part of a network or a physiological pathway where they make more sense? Only once you start getting that idea can you plan follow-up experiments. For example, you may begin to think about a drug that might inhibit or control a given pathway. Very often these clues come only when you start from your big data set, look at the most significant list, narrow it down, and then process it further by examining pathways and networks. From there you come up with hypotheses and work out which of them are actionable enough to build the next set of experiments. Many good online tools are currently available for this. Some of them were developed with substantial resources and funding from national governments, with a mandate to make them publicly available. You can start using these resources, try them on various example data sets, and see whether you are getting comfortable with this kind of analysis.
So today my research scholar, Deeptarup Biswas, will walk you through the different steps involved in pathway enrichment analysis and network analysis using various tools. Let us start today's lecture. So far we have learned a lot about statistical power and about the primary and secondary analyses used to generate an expression data set. Suppose we now have a very good pattern of differential gene regulation. The question is: what next? I want to start with two important questions. How do my data relate to known biological functions? Are there specific functions characterized by the gene expression changes? After the secondary analysis, I feel the most important step is the tertiary analysis, that is, identifying the functional groups. Functional group identification is based on pathway enrichment, network analysis and PPI (protein-protein interaction) modules. So the last step I have added to the workflow, after identifying the differentially expressed genes, is to identify the functional groups. Different software packages generate different kinds of IDs. If we use Proteome Discoverer, another commercial package or the Trans-Proteome Pipeline, each will report different kinds of IDs for the identified proteins. But to start a tertiary analysis we need multiple ID types, and that is possible only through ID conversion. This is a very basic step, but I still want to take a little time to tell you how ID conversion can be done. There are mainly three platforms we can consider. The first is DAVID, a multiple ID conversion tool that can also help with different types of annotation and enrichment studies. The next is WebGestalt, which can also be used to convert between different kinds of IDs. The third is the Protein Identifier Cross-Reference (PICR) service.
This is another platform where we can upload our IDs and get them converted into multiple ID types. DAVID, WebGestalt and Protein Identifier Cross-Reference are all very simple tools: we just paste the list of IDs and select which ID types we want as output. But there is another important tool, the KEGG Mapper Convert ID tool. It matters because, unlike the other platforms, KEGG Mapper does not accept arbitrary IDs. We first have to obtain KEGG gene IDs from the converter, and only then can we paste them into the KEGG Mapper search tool and get the KEGG pathway mapping. I will show you briefly how, starting from a test data set, we can go to KEGG Mapper, convert the IDs to KEGG gene IDs, and then submit those converted IDs to KEGG Mapper to get the pathways. I have already shared the test1 text file, so please open it. It contains a list of genes from a processed glioblastoma (GBM) data set that I took from a public repository, and we will copy and paste that list into the KEGG Convert ID tool. I have shared the link in the slide, so please open the KEGG ID converter from there. This is the home page of KEGG Mapper Convert ID. In the first field, Outside DB, you choose which kind of ID you are submitting: NCBI Gene ID, NCBI Protein ID or UniProt ID. Since I have given a list of UniProt IDs, I will select UniProt here. The next important setting is the organism. When we click this field we type the organism name, and since I know the file comes from a Homo sapiens data set, I will select Homo sapiens; the field then shows the organism code hsa. In the box for the outside identifiers we paste the list of UniProt IDs, and then we click the Exec button.
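For readers who prefer scripting, the same UniProt-to-KEGG conversion can be sketched with the KEGG REST `conv` operation instead of the web form. This is a minimal sketch, assuming the tab-separated, two-column output of `https://rest.kegg.jp/conv/hsa/uniprot`, in which UniProt entries carry the prefix `up:`. Because the parser keys on that prefix, it accepts either column order. The helper names are hypothetical, and `kegg_ids_only` simply mimics the "display KEGG ID only" option of the web tool.

```python
import urllib.request


def fetch_kegg_uniprot_table(org="hsa"):
    """Download the full UniProt <-> KEGG gene table for one organism (network call)."""
    url = f"https://rest.kegg.jp/conv/{org}/uniprot"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_conv(text):
    """Map UniProt accession -> list of KEGG gene IDs from tab-separated conv output."""
    mapping = {}
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        # Accept either column order: the UniProt side carries the 'up:' prefix.
        up, kegg = parts if parts[0].startswith("up:") else (parts[1], parts[0])
        if not up.startswith("up:"):
            continue
        mapping.setdefault(up[3:], []).append(kegg)
    return mapping


def kegg_ids_only(uniprot_ids, mapping):
    """KEGG IDs for the submitted list, in input order, without duplicates."""
    out = []
    for uid in uniprot_ids:
        for kid in mapping.get(uid.strip(), []):
            if kid not in out:
                out.append(kid)
    return out


# Offline usage with a two-line sample (EGFR and IRS1); unmatched IDs are dropped.
sample = "up:P00533\thsa:1956\nhsa:3667\tup:P35568\n"
table = parse_conv(sample)
print(kegg_ids_only(["P35568", "P00533", "Q00000"], table))
```

Downloading the whole organism table once and filtering it locally avoids sending one request per protein.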
It takes a little time and then gives you the complete conversion from UniProt IDs to KEGG IDs. As you can see, the conversion result lists the UniProt IDs alongside the KEGG IDs. We want to copy only the KEGG IDs, so we click "display KEGG ID only", which shows just the KEGG IDs, and we copy the complete list from there. Then we go back to the home page, where there is a Search Pathway option. Clicking it opens the Search Pathway dialog box. Here we paste the KEGG IDs we have already converted, select the Homo sapiens pathways, scroll down and click Exec. As you can see, KEGG Mapper then generates a complete profile of the pathways found and the number of hits in each pathway. Clicking any of these pathways takes you to the full pathway map, where the proteins from your list are highlighted in yellow with red font. This is a glimpse of how to convert UniProt IDs into KEGG IDs and use KEGG Mapper. Next I want to show you WebGestalt, which I feel is one of the best tools for omics data because it produces complete downloadable results and downloadable images from your data sets. WebGestalt offers three main analysis methods, each with a sample run: ORA (over-representation analysis), GSEA (gene set enrichment analysis) and NTA (network topology-based analysis). In 2019 they also added phosphosite analysis, with its own sample run. I want to show you the different kinds of downloadable images it provides. On the left you can see a complete list of classifications, such as biological process, cellular component and so on.
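The hit counts KEGG Mapper reports, and the ORA method WebGestalt offers, rest on an over-representation test. A common textbook form is the hypergeometric upper tail P(X ≥ k) = Σ from i = k to min(n, K) of C(K, i)·C(N − K, n − i) / C(N, n), where N is the background size, K the pathway size, n the size of the submitted list and k the number of hits. The following is a minimal sketch of that test with Benjamini-Hochberg FDR correction; WebGestalt's exact background and correction settings may differ.

```python
from math import comb


def ora_pvalue(N, K, n, k):
    """Hypergeometric upper tail P(X >= k).

    N: background genes, K: genes in the pathway,
    n: genes in the submitted list, k: submitted genes found in the pathway.
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Usage: all 3 submitted genes fall in a 3-gene pathway, out of 10 background genes.
print(ora_pvalue(N=10, K=3, n=3, k=3))  # equals 1/120
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```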
In the middle is a complete list of the gene ontology terms coming from your data set and how they are linked. And, as I mentioned earlier regarding WebGestalt ID conversion, it also maps your IDs to gene symbols and Entrez Gene IDs. It also gives a glimpse of the network derived from the protein-protein interaction (PPI) module. Now I will talk about another widely used resource, Reactome. Reactome is a database that helps you link your proteins, your candidates, to different pathways. It is so robust and detailed that it does not stop at a single pathway: it also gives you the sub-pathways and sub-networks, down to individual reactions. After clicking Analyze Data, another window opens asking you to submit your data. We can submit the data in two ways: either choose a text file containing the candidate names, or simply paste our candidate genes into the box. Then we click Continue. Here there are two options: the first is Project to human and the second is Include interactors. Include interactors means that the proteins interacting with your candidates, as well as chemical compounds (mainly metabolites, including various drugs) that can be linked to your candidates, will also be shown. We select the options here and start the analysis. As you can see, we get a complete list of pathways, and Reactome also lists the identifiers that were not found, which may be due to updates in the databases.
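The same submission can be scripted against Reactome's Analysis Service. This sketch only builds the HTTP request and does not send it. The endpoint paths (`/AnalysisService/identifiers/`, and `/identifiers/projection` for "Project to human") and the `interactors` query parameter reflect my reading of the Reactome Analysis Service documentation and should be checked there before use. The function name is hypothetical.

```python
import urllib.parse
import urllib.request

BASE = "https://reactome.org/AnalysisService/identifiers"


def build_reactome_request(ids, project_to_human=True, include_interactors=False):
    """Build (but do not send) a POST request submitting an identifier list to Reactome."""
    # Strip whitespace, drop blanks and remove duplicates while keeping input order.
    unique = list(dict.fromkeys(i.strip() for i in ids if i.strip()))
    path = f"{BASE}/projection" if project_to_human else f"{BASE}/"
    query = urllib.parse.urlencode({"interactors": str(include_interactors).lower()})
    return urllib.request.Request(
        f"{path}?{query}",
        data="\n".join(unique).encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


req = build_reactome_request(["P00533", " P35568", "P00533", ""])
print(req.full_url, req.data)
# To actually submit: urllib.request.urlopen(req) (network call; the service answers with JSON).
```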
Now suppose we want to check the top pathway, which here is Diseases of signal transduction. When we click that pathway, Reactome shows its complete details, and when we zoom in we get a full view of the sub-networks it contains. Next, the Expression tab shows the candidates present in the pathway and their expression levels across different tissues; if we select tissues here, we can see the expression level of each candidate in those tissues. In the Molecules tab there are several categories already available: chemical compounds, proteins, DNA, RNA and drugs. These list the different kinds of molecules present in the pathway: chemical compounds, which are mainly metabolites, the various proteins, DNA and RNA, and drugs. If I select a disease pathway, we find several sub-pathways marked with two important symbols, a U and a plus. Hovering the pointer over them shows that the U means the entry has been updated in the database, and the plus means it is related to a disease. In this way Reactome gives a great deal of information about your data set. The next important step is downloading the results. There is a download option that saves the results in .csv format, and this file contains the complete data from the analysis. From there we apply criteria such as the p-value or the FDR, which are already given, as you can see, and filter the complete analysis on that basis.
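The p-value and FDR filtering of the downloaded .csv file takes only a few lines. A minimal sketch, assuming the Reactome export names its columns `Entities pValue` and `Entities FDR`; check the header of your own file and pass different column names if they differ. The 0.05 cut-offs are only illustrative, and the demo rows are placeholders, not real results.

```python
import csv
import io


def filter_reactome_csv(text, p_cut=0.05, fdr_cut=0.05,
                        p_col="Entities pValue", fdr_col="Entities FDR"):
    """Return pathway rows passing both cut-offs, sorted by FDR (smallest first)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = {p_col, fdr_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"column(s) not found in header: {sorted(missing)}")
    kept = []
    for row in reader:
        try:
            p, fdr = float(row[p_col]), float(row[fdr_col])
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric entries
        if p <= p_cut and fdr <= fdr_cut:
            kept.append(row)
    return sorted(kept, key=lambda r: float(r[fdr_col]))


demo = """Pathway identifier,Pathway name,Entities pValue,Entities FDR
R-HSA-0000001,Pathway A,0.001,0.01
R-HSA-0000002,Pathway B,0.04,0.2
R-HSA-0000003,Pathway C,0.0001,0.005
"""
for row in filter_reactome_csv(demo):
    print(row["Pathway identifier"], row["Entities FDR"])
# For a real export (hypothetical file name):
# filter_reactome_csv(open("result.csv", encoding="utf-8").read())
```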
After sorting and filtering the data on p-value and FDR, I found some top pathways, which I will take forward to the next part of the analysis. This is the table I took from the result file. You can see the pathway identifier, which is the unique ID of each pathway in Reactome, followed by the pathway name, the p-value, the FDR, the species name and the hits, that is, the proteins from your sample that match the pathway. We can feed these data into any protein-protein interaction tool, export a .sif or JSON file from there, load it directly into Cytoscape and look at the resulting visualization. Because Cytoscape has many plugins, we can generate many kinds of network visualizations. But today I want to show you a very robust visual analytics platform for comprehensive gene expression profiling and meta-analysis, NetworkAnalyst. NetworkAnalyst offers several kinds of visualization and supports single-data-set analysis, multiple-data-set analysis and even meta-analysis of multiple gene expression data sets. So in the next hands-on part we will see how to take the data we have already generated from the pathway analysis into a network platform for network analysis. This is the home page of NetworkAnalyst, which offers several input options. The first is Gene List Input, used when you have a simple gene list with p-values or fold changes. Next is Multiple Gene Expression Tables, where several expression data sets can be examined together. Then there is Gene Expression Table, for microarray and RNA-seq data, and Raw RNA-seq Data, where we can start directly from sequencing data.
Finally, the Network File option lets you load an already analyzed .sif or JSON file directly and get the visualization. After clicking Gene List Input we reach this page, where we select the organism, Homo sapiens, and the ID type; since we are using the IDs from the same test1 file, we know they are UniProt IDs. Then we paste the IDs and click Upload. Any duplicates or errors will be shown here. If everything is fine, we click Proceed. First we will look at the List Enrichment Network, a pathway enrichment network visualization. After clicking it, you can see a complete network built from the different pathways. On the left we select the Reactome database, change the background color to white, and submit. After submission you can see a very large number of pathways, and we really do not need so many, because they make the network far too complex. So we go back to the sorted table we generated from Reactome, which contains the pathways we chose on the basis of significant p-values, and we pick those pathways that are also present here, for example Signaling by EGFR, Signaling by FGFR and Signaling by NGF. In this way we select a few pathways and extract them. After extraction, only the pathways we selected remain. To give you a glimpse I have selected just a few, but your data set may have different top pathways that you can take into account.
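The idea behind a pathway enrichment network, and the "extract" step, can be sketched as follows: pathways are nodes, and two pathways are linked when their hit lists overlap. This minimal sketch assumes a Jaccard overlap threshold of 0.2, which is a hypothetical choice; NetworkAnalyst's own edge rule may differ. The pathway names and hit lists are placeholders.

```python
from itertools import combinations


def enrichment_network(pathway_hits, min_jaccard=0.2):
    """Edges (pathway_a, pathway_b, jaccard) for pathway pairs sharing enough hits."""
    edges = []
    for a, b in combinations(sorted(pathway_hits), 2):
        ga, gb = set(pathway_hits[a]), set(pathway_hits[b])
        union = ga | gb
        if not union:
            continue
        jaccard = len(ga & gb) / len(union)
        if jaccard >= min_jaccard:
            edges.append((a, b, round(jaccard, 3)))
    return edges


def extract(edges, selected):
    """Keep only edges whose two pathways are both selected.

    Selected pathways with no surviving edge are not returned here; a full
    implementation would keep them as isolated nodes.
    """
    keep = set(selected)
    return [e for e in edges if e[0] in keep and e[1] in keep]


hits = {  # illustrative hit lists, not real results
    "Pathway A": {"G1", "G2", "G3"},
    "Pathway B": {"G2", "G3", "G4"},
    "Pathway C": {"G9"},
}
net = enrichment_network(hits)
print(net)
print(extract(net, ["Pathway A", "Pathway C"]))
```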
Next, under the View option, we select Bipartite Network, which shows not only the pathway names but also the protein names. As you can see, the pathways we selected are there, together with the proteins from the data set you submitted. Now we can select individual proteins and use the Label option, "label the selected nodes", and these proteins are labeled. In this way we can select and highlight the candidates relevant to our data set. We can even change the color of each protein: if this protein is upregulated I can color it red, and if this one is downregulated I can color it green. Selecting "label the selected nodes" again shows that IRS1 is the upregulated one and PINCI1 is the downregulated one. A lot more can be done with the bipartite network and with NetworkAnalyst in general. Now we know how to generate a good pathway enrichment model, and in the same way we can build a protein-protein interaction model. To do this, we use the data set already uploaded to NetworkAnalyst and choose Generic PPI. There is another very good option, Tissue-specific PPI, for someone working on, say, brain or kidney: these tissues are already available in the database and can be checked. But since I just want to give you a glimpse, I will choose Generic PPI, where three protein-protein interaction databases are available: the IMEx Interactome, the STRING Interactome and the Rolland Interactome.
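The bipartite view with red and green coloring can also be prepared outside the browser, for example as files Cytoscape can import: a Simple Interaction Format (SIF) file linking each pathway to its proteins, plus a node attribute table holding the colors. A minimal sketch; the relation label `contains`, the log2 fold-change cut-off of ±1 and the gene names are hypothetical placeholders. Tabs are used as delimiters so that pathway names containing spaces stay intact.

```python
def bipartite_sif(pathway_genes, relation="contains"):
    """SIF text: one 'pathway<TAB>relation<TAB>gene1<TAB>gene2...' line per pathway."""
    lines = []
    for pathway in sorted(pathway_genes):
        genes = sorted(pathway_genes[pathway])
        if genes:  # a pathway without genes would only be a lone node; skip it
            lines.append("\t".join([pathway, relation, *genes]))
    return "".join(line + "\n" for line in lines)


def node_colors(log2fc, cutoff=1.0, up="#FF0000", down="#00FF00", neutral="#BBBBBB"):
    """Red for upregulated, green for downregulated, grey otherwise."""
    colors = {}
    for gene, fc in log2fc.items():
        if fc >= cutoff:
            colors[gene] = up
        elif fc <= -cutoff:
            colors[gene] = down
        else:
            colors[gene] = neutral
    return colors


def attribute_table(colors):
    """Tab-separated node attribute table (node, color) with a header line."""
    rows = ["node\tcolor"] + [f"{g}\t{c}" for g, c in sorted(colors.items())]
    return "\n".join(rows) + "\n"


fc = {"GENE_UP": 2.3, "GENE_DOWN": -1.5, "GENE_FLAT": 0.2}  # placeholder values
print(bipartite_sif({"Pathway A": ["GENE_UP", "GENE_DOWN"]}))
print(attribute_table(node_colors(fc)))
```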
People generally use STRING, but the IMEx Interactome gives a much larger profile of interactors, and its database is updated mainly from curated literature. If I choose the IMEx Interactome, we see one subnetwork with 4,677 nodes and 176 seeds, along with a very large number of edges. From here you can download the .sif file of the interactions and upload it into the network again later. When we proceed, the software generates a complete protein-protein interaction module. It is a big module, so I will now show you how to make it smaller and more informative by reducing the complexity of the network. As you can see, the network is really very complex. First we select the database we want to use; let us go with Reactome, and then change the background color to white. I have already given you the candidate list in the test1 file. There is a Batch Selection option, where we paste the accession IDs of the proteins we are interested in and click Submit. After submission, the candidates are highlighted within this complex network. Since we are interested in these candidates, we select them and extract them from the complex network. After extraction we see the selected proteins that are present in this network, along with the top interactors appearing in the protein-protein interaction module. Then we make some adjustments to make the network visually interpretable.
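Batch selection followed by extraction amounts to taking the part of the big PPI network around your seed proteins. A minimal sketch on an undirected edge list, with hypothetical helper names: `extract_subnetwork` keeps the seeds, optionally with their direct interactors, and `top_interactors` ranks non-seed proteins by how many seeds they touch, which is one simple way to pick out hub-like interactors. NetworkAnalyst's own extraction options may behave differently, and the toy network is made up.

```python
from collections import Counter


def extract_subnetwork(edges, seeds, include_neighbors=True):
    """Induced subnetwork on the seeds, optionally expanded by first-order interactors."""
    seeds = set(seeds)
    keep = set(seeds)
    if include_neighbors:
        for a, b in edges:
            if a in seeds:
                keep.add(b)
            if b in seeds:
                keep.add(a)
    return [(a, b) for a, b in edges if a in keep and b in keep]


def top_interactors(edges, seeds, n=5):
    """Non-seed proteins ranked by the number of seed proteins they interact with."""
    seeds = set(seeds)
    counts = Counter()
    for a, b in edges:
        if a in seeds and b not in seeds:
            counts[b] += 1
        if b in seeds and a not in seeds:
            counts[a] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


ppi = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]  # toy network
print(extract_subnetwork(ppi, {"B"}))
print(top_interactors(ppi, {"B", "D"}))
```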
This can also be done with the layouts provided. Several layouts are available, and I will choose this one to reduce node overlap. As you can see, the complex network now has some clarity. After a little manual adjustment we can download the network as an image and save it as a PNG or JPG file. So now we know how to generate a pathway enrichment model and a protein-protein interaction model from a data set. Next I will show you an example from a recent paper published in Scientific Reports by Fan et al. in 2018. They used NetworkAnalyst extensively and showed how this network visualization platform can produce this kind of network analysis figure. In the first panel, you can see they separated the candidate protein list into upregulated and downregulated proteins. They also generated four subnetworks based on different pathways or protein-protein interactions. If I want to check which clusters appear in my protein-protein interaction module in terms of pathways, I can do that by checking the curated, sorted Reactome list we generated. With the help of this diagram, they also gave a complete view of which proteins are present and whether a single candidate occurs in multiple pathways or in only one. That is all. Thank you. So, as you have seen today, all experiments are based on certain hypotheses, and therefore data analysis and interpretation become really crucial.
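Checking whether a candidate sits in one pathway or in several, as in the diagram from that paper, is an inversion of the pathway-to-gene table. A minimal sketch with placeholder pathways and genes:

```python
from collections import defaultdict


def pathways_per_gene(pathway_genes):
    """Invert pathway -> genes into gene -> sorted list of pathways."""
    inverted = defaultdict(set)
    for pathway, genes in pathway_genes.items():
        for gene in genes:
            inverted[gene].add(pathway)
    return {gene: sorted(ps) for gene, ps in sorted(inverted.items())}


def multi_pathway_genes(pathway_genes, min_pathways=2):
    """Genes that occur in at least `min_pathways` pathways."""
    return {g: ps for g, ps in pathways_per_gene(pathway_genes).items()
            if len(ps) >= min_pathways}


table = {"Pathway A": {"G1", "G2"}, "Pathway B": {"G2", "G3"}}  # placeholders
print(pathways_per_gene(table))
print(multi_pathway_genes(table))
```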
What is very important is that you stay unbiased when you start working with big data. Start from the big table and look at where the most significant changes are, using statistical values such as p-values. Then think about the thresholds you set, up to the high-stringency filters you applied to obtain a much shorter list of the most confident candidates, the genes or proteins you want to take forward. Based on these, you form some actionable hypotheses and design follow-up experiments to test whether your hypothesis holds. If it does not, you look at other proteins or other sets of genes on the same list and ask whether an alternative hypothesis might be more effective. So you should look at the data in an unbiased way, while at the same time staying on top of the literature and what has already been published in PubMed on that question. Build a strong foundation from the available publications, look at your data independently, and then try to map the two together: are parts of the published reports also reflected in your data set? If so, you can be more confident. For example, people may have reported certain pathways and sets of proteins as highly significant, and you may find that 50 of your top 100 proteins are among them. But the other 50 are new, unknown proteins, and that will make you curious about what they are, how to test them, and how to make sure these top hits are real.
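The two steps described here, applying stringency thresholds and then mapping the shortlist against what is already published, can be sketched as simple set operations. The cut-offs (p ≤ 0.05 and |log2 fold change| ≥ 1), the input layout and the gene values are hypothetical; in practice the published set would come from your own literature review.

```python
def shortlist(results, p_cut=0.05, lfc_cut=1.0):
    """Genes passing both a p-value and an absolute log2 fold-change threshold.

    results: dict gene -> (log2_fold_change, p_value).
    """
    return sorted(g for g, (lfc, p) in results.items()
                  if p <= p_cut and abs(lfc) >= lfc_cut)


def compare_with_literature(candidates, published):
    """Split candidates into previously reported genes and new (not yet reported) genes."""
    cand, pub = set(candidates), set(published)
    return sorted(cand & pub), sorted(cand - pub)


res = {"G1": (2.0, 0.01), "G2": (-1.5, 0.001), "G3": (0.5, 0.001), "G4": (3.0, 0.2)}
short = shortlist(res)
known, novel = compare_with_literature(short, published={"G2", "G9"})
print(short, known, novel)
```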
Pathway analysis and network analysis try to give you a much more comprehensive picture, one that is very close to the biological question you wanted to address. So it is a good idea to get familiar with these software tools and with this bioinformatics side of the analysis, so that you can look at your data in a very different way, which is not possible just by looking at an Excel sheet of proteins and genes in isolation. We will continue this discussion in the next lecture. Until then, thank you.