Welcome to the MOOC course on Introduction to Proteogenomics. In the last few lectures, we heard about various ways to analyze pathways from Dr. Kerstin Krug. Having understood how mutations affect phosphorylation, leading to alterations in signaling pathways, today Dr. Krug will talk about how one could use MIMP and GSEA in the hands-on session. He will talk about how to use GitHub to obtain the basic code and to use it without actually coding, just by adapting the code to your data and analysis requirements. Dr. Krug will also talk about two different versions of the GCT file format, GCT 1.2 and GCT 1.3, and the conditions under which each one should be used. So let us welcome Dr. Kerstin Krug for his last lecture and learn more about the usage of MIMP and GSEA.
So there will be two parts. In the first part we will try to predict mutation events affecting kinase-substrate binding. This relates to the first part of my talk. And here we will try something very experimental: we will try to use R, and I have already prepared an R notebook file. I will say more about what an R notebook is. You will find this file, hands-on MIMP.Rmd, where Rmd stands for R Markdown, in the zip file that you hopefully all downloaded. You have to open RStudio, go to File, Open File, and then open the R Markdown file, which is called hands-on/MIMP.Rmd. What you are looking at is a so-called R notebook. It is R code intermingled with plain text. Who of you has heard about Markdown in general? OK, that is great. Markdown is a very simple text-based format for creating structured documents such as HTML pages. And there are extensions of it that allow you to execute code in these documents.
Actually, there is a link here about R notebooks, and if you click on it you will get some more information. It is a very convenient way to analyze your data, to document your analysis results and your code, and also to share them with your collaborators. Maybe I quickly show you the result of this whole analysis. At the end of the day you will have an HTML document that looks like this, which you just open in your browser, and there you will have some documentation about the different steps. This is text that you entered, so you can describe what the goal of your analysis is and what the data input and output are. But you will also have all the R code that has been executed in order to get to those analysis results. This you can easily share with your collaborators, and you can just rerun everything in order to get to these results.
But right now we just focus on the notebook. As you scroll down a bit, you see that this block here starts with r. So this is actually R code in this kind of block, and you can execute the code by clicking on this little triangle here. Please try to do that. Depending on the Wi-Fi connection this might take some time, because the first part sets up the entire analysis: it downloads a couple of packages and so on. So please try to click on this little triangle here. It is the first one, in line 25, that is probably easier. So go to line 25 and click on the triangle.
This next part is probably not going to work because the website is down. There we wanted to use some data for the TCGA samples and show you how you would use this tool. You can click here on search, search mutations, and then you can upload your VCF file. You have learned about the VCF file; it is a very common format.
Depending on the pipeline that you use to call your variants, you will end up with a VCF file. It is a pretty standard data format. So it is not that you have to create a VCF file yourself; it is more that you should have gotten it from somewhere. If you have shipped your samples to a sequencing center, they do the sequencing and you get a BAM file out of that, so FASTQ files aligned to the genome give you a BAM file. Then typically you would have some pipeline running in your lab, or maybe you are collaborating with someone. They would take these BAM files and do the variant calling. The results of that are VCF files, for example. It is not a very complicated format.
So once we have installed all of the packages, we need to specify our data. Here I have chosen a mutation file, a phosphosite file, and a database file for one of the TCGA patients. To make it more convenient for you, I put everything on GitHub, so it is fetched automatically from GitHub. You do not have to go anywhere to download it; it will happen automatically. And here you actually have a link to GitHub, so you can also just copy these things. If you go there, you see it is three files: one is called mutation.txt, the other is tsize.txt, and the other is support.txt. The format is very specific to the software that we are using here. Can everybody do that? So this is now the link, and the mutation file is divided into two columns. The first column is the gene name, and the second column is the amino acid substitution that this mutation causes, easy as that. This link will be there also after the workshop, so you can always come back.
So the installation succeeded. Next we will try to render this Markdown document into an HTML report.
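As a rough illustration of the two-column mutation file described above, here is a minimal Python sketch. It assumes whitespace-separated columns and substitutions written as reference residue, position, new residue (for example R282W); that notation, the function name `parse_mutations`, and the example rows are assumptions for illustration, not taken from the workshop files.

```python
import re

# Assumed notation for a substitution: one-letter reference amino acid,
# position, one-letter new amino acid, e.g. "R282W".
SUBSTITUTION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(text):
    """Parse 'gene substitution' lines into (gene, ref, position, alt) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        gene, substitution = line.split()[:2]
        match = SUBSTITUTION_RE.match(substitution)
        if match is None:
            raise ValueError(f"unrecognised substitution: {substitution!r}")
        ref, position, alt = match.groups()
        records.append((gene, ref, int(position), alt))
    return records


# Illustrative rows only; they do not come from the workshop data.
example = "TP53 R282W\nCTNNB1 S33C\n"
mutations = parse_mutations(example)
```

Checking the file this way before handing it to a prediction tool catches malformed rows early, which is usually cheaper than debugging a failed run.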
Let me go and look at the HTML report, because it is just easier. I don't know why, but there is now a problem that I can see here. So again, the link to the HTML. However, you may first want to go and look at the HTML document; if the rendering was successful, this HTML document should be in the same folder. And here we have the entire report as HTML, which includes some description of the project and also documents all the different steps that we have done. First we install all of the packages. You also see the output that was generated by the different code chunks. Here you now have the direct link to GitHub; if you click on it, you end up on my GitHub page. Then we specify the input data. What the script does next is download the input files, and then it imports all of these files and shows the first couple of entries. You all have that in your HTML file. So this is the R code that creates that table. Does that make sense? Again, we are looking at the mutation file. We have two columns: gene, mutation. The phosphosite file is similar, and this is the code that generates this table.
What happens then is that it opens up this result page. This is what just happened when I ran MIMP, and it is also one example that I was showing earlier this morning. On this result page we see that there are three mutations that affect in total 22 possible phosphorylation events. We see that most of them are losses, meaning a motif got lost, so the phosphosite is most likely not phosphorylated anymore by the kinase that used to phosphorylate it. But we also observe two gains, meaning we now see a predicted increase in the likelihood of that phosphosite being occupied. I just go to the last page. So these are the two events where we predicted a gain.
Here I show you two examples of a phosphorylation gain. That is the wild type, and that is the mutated version. We see this aspartic acid here, which is now being recognized by a novel kinase. This motif fits that particular kinase, so we predict, or we assume, that this phosphosite is more likely to be phosphorylated now. That is the type of data that we get out of here.
The goal of this hands-on session is to explore breast cancer subtype-specific pathways. We are going to use the breast cancer data set that we have used before, but we are only looking at two subtypes to make it a bit easier: basal and luminal A. The tables that I already created contain only these two subtypes. This first exercise is optional. It involves Morpheus, and I am not sure how well this works here, so we can skip it. I can just demonstrate how you could use Morpheus to convert your protein-centric or phosphosite-centric tables into gene-centric tables. This is what you always have to do if you want to do pathway analysis. But again, this is optional; you already have the correct tables to move on.
We will do GSEA in two different ways. One is the classical GSEA, so to say, using the Java application that most of you were able to download. In the second approach we are going to use the same data set, but we will use single-sample GSEA to project our protein matrix onto pathways. Then I plan to use Morpheus again to perform some cluster analysis and marker selection on the pathways. I cannot guarantee that this is going to work, due to internet connectivity, but we will at least try. I also tried to make the slides as comprehensive as possible, so you should be able, in theory, to just go home. You have all of the data.
You go through the slides and you can repeat these exercises on your own.
Just a quick recap. We have heard about this data set a couple of times already, and now we only want to look at basal versus luminal A. Just by eyeballing, in protein space or in this proteogenomic space, you clearly see differences. Now we are interested in which pathways differentiate basal and luminal A. In this case we are only looking at the cancer hallmark pathways, a very small, compact, and well-annotated curated pathway database.
We have two different data formats. I got a lot of questions about GCT: how to create GCT files and how to open them. These files are simple text files. You can open them in any text editor, and I would highly recommend installing a text editor on your PC. I recommend Notepad++; this is what I use on Windows systems. If you just Google Notepad++, it will bring you to the right page and you just download it. That is a general recommendation from my side.
In this hands-on we are going to use two different versions of GCT. One is called 1.2. That is the older version, which has been around for 10 years now, I suppose. A couple of years ago we revised the format and came up with a new one, which is called 1.3. The only difference is that in 1.2 we only store the data, together with two annotation columns that describe the rows. That is hard-coded, so you cannot change it: you always have to have two annotation columns, and then you have your samples and your data. In GCT 1.3 you can also store metadata, which describes your experiment, for example. This is a snapshot of the data that we are going to use. Here is the data in this corner, and on top of that we have all the metadata that describes the samples.
So this is one sample. Here you have the TCGA ID, and then we have all kinds of information: which channel was used to quantify it, which subtype it is, the HER2, ER, and PR status, and so on. The advantage of this format, although it might not be very intuitive in the beginning, is that once you get used to it, it is very convenient, because you store all of your metadata together with your data. You do not have to search through your computer to find the metadata that annotates your data matrix. What is the GCT file format, and how is it organized? That is what you want to know. If you spend 10 minutes or so here, you will better understand what GCT means and how to create one. We have to use both versions because the Java application only supports GCT 1.2.
This step is now optional. Let's try to make it work. Let's go to Morpheus. Here you have different ways to import your data. You can browse your computer, you can load it from your Dropbox, you can provide a URL, or you can simply drag and drop it into this window. So that is the zip file, the one that you downloaded. I am going to quickly extract it here. If I now go into this folder, you see two folders, GSEA and single-sample GSEA. We are going to focus on the GSEA one. Everybody with me? Here you have two GCT files. One says proteome-basal-luminal-A 1.2, and the other one says proteome-gene. So this one is gene-centric and this one is protein-centric. In case we are not able to use Morpheus, we already have the gene-centric matrix; that is what I want to say here. But right now, just as an exercise, I want to show you how you would do that. For that you drag and drop this file, the one without genes, into Morpheus. Drag and drop means you do this. Does that work for everyone?
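To make the GCT 1.2 layout concrete, here is a minimal Python sketch of a reader. It assumes the usual tab-separated layout: a first line `#1.2`, a second line with the number of data rows and sample columns, then a header row `Name`, `Description`, and the sample names, followed by one row per feature. The function name `read_gct12`, the example rows, and the handling of `NA` as a missing value are assumptions for illustration. A GCT 1.3 file differs in the version line and in carrying extra counts and blocks for row and column metadata, which this sketch does not handle.

```python
def read_gct12(text):
    """Parse GCT 1.2 text into (ids, descriptions, samples, matrix)."""
    lines = text.rstrip("\n").split("\n")
    if lines[0].strip() != "#1.2":
        raise ValueError("not a GCT 1.2 file")
    # Second line: number of data rows, number of sample columns.
    n_rows, n_cols = (int(x) for x in lines[1].split("\t")[:2])
    # Third line: two fixed annotation columns, then the sample names.
    samples = lines[2].split("\t")[2:]
    ids, descriptions, matrix = [], [], []
    for line in lines[3:3 + n_rows]:
        fields = line.split("\t")
        ids.append(fields[0])
        descriptions.append(fields[1])
        # Assumption: empty cells or "NA" mark missing values.
        matrix.append([None if v in ("", "NA") else float(v) for v in fields[2:]])
    if len(samples) != n_cols or len(matrix) != n_rows:
        raise ValueError("dimensions do not match the header")
    return ids, descriptions, samples, matrix


# Tiny made-up example: 2 rows, 3 samples.
example = "\n".join([
    "#1.2",
    "2\t3",
    "Name\tDescription\tS1\tS2\tS3",
    "GENE_A\tfirst example gene\t0.5\t-1.2\t0.1",
    "GENE_B\tsecond example gene\t1.1\tNA\t-0.3",
])
ids, descriptions, samples, matrix = read_gct12(example)
```

Because the format is plain text, the same file opens equally well in a text editor or a spreadsheet; the reader above only makes the three header lines explicit.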
Here you already see that you have genes that appear multiple times, right? These are different isoforms, and this basically tells you that we cannot really resolve these isoforms. They have very similar expression patterns because many of their peptides are shared. For pathway analysis, we need to have a single row for each gene. Does that make sense? If you opened it in Excel, this is what you would get; that is the same file. You see the number of rows here and the number of sample columns, 42. Then you have the gene ID, you have some description, and then you have the data matrix. That is the 1.2 format, easy as that. This is the same file, just viewed in Morpheus. I do not want to go into too much detail, but I think in the later slides I show how to change the annotations. For example, you could say, okay, I also want to look at the description. So this is highly customizable. Maybe you have to spend a couple of minutes and just play around with your own data, but everything is possible here.
What we want to do is create gene-centric tables. You click on Tools, that is the first step. Then you click on Collapse. Then you should see a window popping up here. I am going to do the same in parallel: Tools, Collapse. Then you have to pick the field that you want to use to collapse, and in our case it is the ID column. This is the first column shown here, which contains the gene IDs. You can also specify whether you want to collapse rows or columns; we want to collapse different rows. And you can choose how you want to collapse them, by median or by mean. There is no clear answer as to what would work best.
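The collapse step can be sketched in a few lines of Python. This is only an illustration of the idea (one row per gene, combining duplicate rows column by column), not Morpheus's own implementation; the function name `collapse_rows`, the toy values, and the choice to skip missing values are assumptions.

```python
from collections import defaultdict
from statistics import mean, median


def collapse_rows(ids, matrix, reducer=median):
    """Combine rows that share an ID, column by column, using `reducer`."""
    groups = defaultdict(list)
    for gene, row in zip(ids, matrix):
        groups[gene].append(row)
    collapsed = {}
    for gene, rows in groups.items():
        combined = []
        for column in zip(*rows):
            present = [v for v in column if v is not None]  # skip missing values
            combined.append(reducer(present) if present else None)
        collapsed[gene] = combined
    return collapsed


# Toy data: GENE_A has three isoform rows, one with an outlier in sample 1.
ids = ["GENE_A", "GENE_A", "GENE_A", "GENE_B"]
matrix = [[1.0, 2.0], [3.0, 4.0], [100.0, 6.0], [0.5, 0.7]]

by_median = collapse_rows(ids, matrix, reducer=median)
by_mean = collapse_rows(ids, matrix, reducer=mean)
```

In this toy case the median for GENE_A in the first sample is 3.0, while the mean is pulled to roughly 34.7 by the single outlier.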
The median is usually more robust against outliers, so we are going to use that. Then I just click OK, and you will see that you get a new data tab here. Now you can also go back and forth. If I click here, that is the protein-centric matrix; if you click here, that is the gene-centric matrix. You see that each gene symbol is now listed only once. So it is very convenient to create these tables. If you want to download that result table, you can do so by clicking on File, Save Dataset. You can pick a GCT version, give it a name, and then just click OK. Now you are ready to do GSEA, and that is what we are going to do now. Again, here is a step-by-step manual on how to do all of this, so you should be able to do it at home.
So now it is GSEA time. I saw that a large fraction of you got it to run. Please try to open GSEA, the Java application. You should see this kind of window. I will try to do the same on my PC. If the JNLP file is not automatically associated with Java, like here on my PC, you can just right-click on it, and then you should see Java Web Start Launcher. Then you should be able to open the app, and once this is finished we should see the GSEA window. Now it asks me whether I want to run this application, and I just say yes, run. All right. It is a little bit small on my screen because of my resolution. GSEA also comes with very extensive documentation, and the entire user interface, if you pay some attention, is very intuitive, because on this start page they already describe all the different steps that you have to do: Steps in GSEA. This is exactly what we are going to do now. So what do we need for GSEA? We need the expression data.
This is the GCT 1.2 file that we just created. We also need the phenotype annotation. Because it is GCT 1.2, we do not have the metadata about our samples in the file, so we need an extra file to tell the software which samples are luminal and which are basal. That is why GCT 1.3 is so convenient: you do not have to worry about any other files, because everything is in one file. But to make this work, we have to create a phenotype label file, and I am going to show you how we do that. And we have to pick a gene set database. Here you can upload your own gene sets, you can download different databases and upload them here, or it also links directly to the MSigDB page, so you are sure that you always get the latest version.
The phenotype labels are stored in the so-called CLS format. If you follow that link, you will get more information about that format. It is again something Broad-specific that has been used for a while, and now that we have GCT 1.3 it is not really required anymore, but for this particular application it still is. What is important here is the third line. The third line contains as many entries as your GCT file has samples. In this case the CLS file has 42, in the same order as the columns in your GCT file. Then you can say, okay, the first column is basal, the second column is luminal, and so on. I already prepared that file, and you can find it in your GSEA folder; it is called phenotypelabels.cls. If I open it in WordPad, the first line starts with how many samples you have, 42. The second number specifies how many groups you have in your file, which is two, luminal and basal. And the third one always has to be one. Don't ask me why, but this is what it says on the web page.
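The three-line CLS layout for categorical labels can be sketched as follows: a first line with the number of samples, the number of classes, and a constant 1; a second line starting with `#` that lists the class names; and a third line with one label per sample in the same order as the GCT columns. The function names and the label spellings `Basal` and `LumA` are assumptions for illustration; the labels in the prepared workshop file may be spelled differently.

```python
def write_cls(labels):
    """Build CLS text from one class label per sample (labels without spaces)."""
    classes = list(dict.fromkeys(labels))  # unique labels, first-seen order
    header = f"{len(labels)} {len(classes)} 1"
    return "\n".join([header, "# " + " ".join(classes), " ".join(labels)]) + "\n"


def read_cls(text):
    """Parse CLS text and check the counts on the first line."""
    first, second, third = text.strip().split("\n")[:3]
    n_samples, n_classes, _ = (int(x) for x in first.split())
    classes = second.lstrip("#").split()
    labels = third.split()
    if len(labels) != n_samples or len(classes) != n_classes:
        raise ValueError("counts on line 1 do not match lines 2 and 3")
    return classes, labels


# Three samples in the same order as the (hypothetical) GCT columns.
cls_text = write_cls(["Basal", "Basal", "LumA"])
classes, labels = read_cls(cls_text)
```

Generating the label line from the same sample list that produced the GCT columns is one way to make sure the two files stay in the same order.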
The second line always starts with the hash sign. Then it lists both labels, a unique representation of your class labels. The third line is the important one, where the class is defined for each sample. Again, this has to be in the same order as in your GCT file. You say, okay, the first sample is basal, the second sample is basal, the third sample is luminal A. Okay?
Now let's try to import the data first. We are going to go through these steps, and this is also the order shown here on the left. The first step is Load data, so you will end up on this page. Again, you have different options for importing data. Just make sure that you select the genes version this time; that is the gene-centric version. I just drag and drop it here. So now the file is here, and I do the same with the phenotype labels. We have both files here, and in order to actually load them, you have to press this button, Load these files. There will be a pop-up window which tells you that two files were uploaded and gives the names of the files. Files loaded successfully, two out of two, which is promising, and there were no errors. I just click OK to hide this window. That's it. If you go back to my PowerPoint presentation, you will see all the steps that we have just done.
Now we go to the next page, which is called Run GSEA. Here we define the parameters that we are going to use during our pathway analysis. So please click on Run GSEA; I am going to do the same on my PC. The first set of parameters are required fields, so you have to define them: the expression data set, the gene set database, and the number of permutations. So why do we need to do permutations? Say it again? Almost. So we are doing permutations.
If we just calculated it once, we would get an enrichment score that would not tell us much, because we do not know what this score means. So we are doing permutations: we permute the class labels of the samples and repeat the entire analysis 1,000 times in this case. That gives us a distribution of enrichment scores. Then we can go back and calculate the probability that the actual enrichment score we got lies in a tail of the distribution or not, which tells us whether it is significant. We can use that to calculate p-values; that is the main purpose. So we are generating a background distribution of false-positive enrichment scores: because we randomly shuffle the class labels, the result should not make any sense. We get a null distribution, and then we look at where in this distribution our actual enrichment score falls. So that is the number of permutations, and then we have to specify the phenotype labels.
How about we just do it. We start at the top here. If you click on that, there will be only one data set loaded, so we select that one. Next, the gene set database. This might take a second or two because it is now connecting to the Broad servers. Okay, here we go. Now I click here, where it says Gene matrix (local GMT). Okay. So you have to import this database first, like we imported the GCT file and the class label file. Please go back to Load data, and then go to the single-sample GSEA folder. Right now we were in the GSEA folder. I go one folder up, and there is another one called single-sample GSEA. In that folder you will find the h.all file, version 6.1. That is the hallmark database. Again, you can simply drag and drop it into GSEA. If you now go back to Run GSEA, we should be able to see the database. Once we get the error message again. Okay, now I am able to see this file here, right?
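The permutation idea discussed above can be sketched generically in Python: compute a statistic with the real labels, recompute it many times with shuffled labels, and ask how often the shuffled value is at least as extreme. This is a simplified illustration, not GSEA's actual procedure; the difference in group means stands in for the enrichment score, and the function names, the toy values, the two-sided comparison, and the add-one correction are all assumptions.

```python
import random


def mean_difference(values, labels, group="Basal"):
    """Stand-in statistic: mean of `group` minus mean of all other samples."""
    a = [v for v, lab in zip(values, labels) if lab == group]
    b = [v for v, lab in zip(values, labels) if lab != group]
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_p_value(values, labels, statistic, n_perm=1000, seed=0):
    """Estimate a two-sided p-value by shuffling the class labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistic(values, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes stay the same, assignment is random
        if abs(statistic(values, shuffled)) >= abs(observed):
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy example: six samples per class with clearly different values.
values = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 1.0, 1.2, 0.9, 1.1, 0.8, 1.3]
labels = ["Basal"] * 6 + ["LumA"] * 6
observed, p_value = permutation_p_value(values, labels, mean_difference)
```

With only two samples per class there are too few distinct label assignments for this to be informative, which is the situation where permuting gene sets instead becomes the fallback.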
Okay, the number of permutations we just discussed. Now we load the phenotype labels. In the first panel here you can select the source file. There is only one, right? You could also have multiple phenotype label files and play around with different ones, but here we only have one. Now it lets you select which comparison you want to do: basal versus luminal, or luminal versus basal. You can just pick one. I leave it at basal versus luminal and click OK.
Now we are almost done. The next option here is actually very important: Collapse dataset to gene symbols. That is what we have already done in Morpheus. Keep in mind that this approach was developed in 2005; not this particular software itself, but the principle of gene set enrichment analysis. There was no RNA-seq and no proteomics to the extent that we know now, so this was developed for microarrays, and the software comes with the option to collapse microarray probes to genes. That is what this option is for. We are not able to use it here, so we have to deselect it and say false: use the data set as is. That is what we want to do.
For the permutation type, you can select whether you want to do the permutations on your phenotypes, meaning on your sample columns, or on your gene sets. So why would you have to change this option? Can you think of a scenario where you cannot do your permutations on the phenotype labels of your sample columns? What would that be? One is based on the phenotype and the other is based on the gene set. In option one, we would permute the sample columns and the phenotype labels. In option two, we would permute the gene sets.
We would randomly generate gene sets, nonsense gene sets, to create our background distribution. Exactly. If you have a sufficient number of samples, and I would already consider 40 or so as sufficient, you can do the permutations across your samples. If you just have two biological replicates, you cannot do permutations on two replicates, right? Then you would choose gene set. But in this case we have about 40 samples in total, so it is totally fine to do our permutations there.
Okay, so now we have filled out all the required fields. Now we can expand the basic fields, and we are just going to make some small adaptations. First of all, we can give it an analysis name: hands-on IIT workshop. What is probably the most important option here? In principle, you do not have to worry about these parameters. What you would have to worry about in some scenarios is how you do your ranking. If you remember, GSEA works on a ranking of your genes. In our case it would rank luminal versus basal, so it has to do some sort of marker selection or ranking that differentiates luminal from basal. The default option is signal to noise, which is, if I'm correct, the difference between the luminal and basal averages divided by the sum of the standard deviations of both groups. That gives you a measure. It is basically the fold change between luminal and basal, scaled by the standard deviation. If you have a high fold change but a high standard deviation as well, you end up with a lower score. Whereas if you really have a big difference, a high fold change, and a very low standard deviation, your denominator is very small and you still get a high score. I think this requires, as they say on the web page, at least three or so samples in each phenotype.
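As a small worked example of the ranking metric, the sketch below computes signal to noise in the form given in the GSEA documentation: the difference of the two class means divided by the sum of the two class standard deviations. The GSEA software additionally adjusts very small standard deviations, which is left out here; the function name and the toy values are assumptions for illustration.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """(mean_a - mean_b) / (sd_a + sd_b), using sample standard deviations."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


# Same difference in means (3.0), different spread within the groups.
tight = signal_to_noise([4.0, 5.0, 6.0], [1.0, 2.0, 3.0])    # sd 1 in each group
noisy = signal_to_noise([1.0, 5.0, 9.0], [-2.0, 2.0, 6.0])   # sd 4 in each group

# Ranking hypothetical genes from most "A-like" to most "B-like".
scores = {"GENE_X": tight, "GENE_Y": noisy, "GENE_Z": -tight}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here the tight groups give 3 / (1 + 1) = 1.5 and the noisy groups give 3 / (4 + 4) = 0.375, so the same difference in means ranks lower when the within-group spread is larger. At least two samples per group are needed for the standard deviation to be defined at all.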
You could also use, for example, a t-test; that is probably the second one that I would recommend. Or other metrics such as the Euclidean distance between luminal and basal, or correlation, and things like that. There are different metrics for ranking your genes. In general, I would recommend just leaving it at signal to noise as long as you have a sufficient number of samples, so we are going to leave it here. This next field might be interesting for you, because that is the folder where you can find the results afterwards. This is where GSEA stores its results. You can change that folder, but by default you will find the results here. With these other filters you can exclude gene sets that have fewer than 15 members or more than 500 members. These are pretty good default parameters, so you do not have to worry about them too much.
I think this should be everything that we need to actually perform the GSEA analysis. To run it, you just have to click this little Run button here. Okay, now it shows success for me too, so let's take a look at the results. You can click on success, and an HTML report should pop up which summarizes your results.
That is false detection rate, right? FDR is the false discovery rate. False discovery rate, exactly. So that has to be less? Yes. It is basically the fraction of false positives. Let's say we have 100 significant pathways. If it says you have an FDR of 5%, this tells you that about 5 pathways in your list of significant pathways are expected to be false positives. Here the default parameter in GSEA is 25%, which is very loose, but you can adjust that parameter. So we are looking at an FDR, a false discovery rate, smaller than 25%. 25 is actually pretty high; that is the default setting here. It is probably nothing that you would put in a paper, right?
But it helps you to get a first glimpse of your data. You will have all of these results in an Excel sheet as well, where you have the FDRs, and then you can filter and only look at the pathways below a certain FDR. I think this parameter was used in the original study back in 2005, and it has made it into the latest version.
So this is a very high-level summary of your results. You have these two blocks here. The first block tells you the enrichment in phenotype basal. Everybody with me? We see we have 19 samples in basal. The second block is enrichment in phenotype luminal A, where we have 23 samples. Then you get a high-level summary. I told you that the hallmark database has 50 gene sets, which is why it says here that 34 out of 50 gene sets are upregulated. They have a positive enrichment score, so they are more enriched in basal compared to luminal. In luminal we only have 16, and this should add up to 50. This does not mean that they are significant; it just tells you the direction. Here, at an FDR of 25 percent, which again is very high, the summary tells you that 5 gene sets are below 25 percent for basal and 2 gene sets are below 25 percent for luminal A. This entire page consists of hyperlinks that forward you to the actual results. Here you have a summary of your data set: you are looking at about 11,000 genes and so on. And here is the summary of your gene set database. You also have very detailed and extensive documentation about GSEA, because it is such an old approach, very well developed, well maintained, and curated. You have a lot of documentation and tutorials online.
Here you already find the direct link on how to interpret the results. You can click on it and you will find all the information you need to make sense of this result page. Here you also have a direct link to the Excel sheet, as you can see. If you click on it, it should open as an Excel sheet. These are now the pathways for basal. Here you also have the FDR value, so you can focus, for example, on the first two, which are below 5 percent, and then the ones below 10 percent.
Okay, let us look at an example. What I did now: I clicked on enrichment in phenotype basal, and then on detailed enrichment results in HTML format. This is a similar table to the one you got in Excel format, but now it is in HTML and you have these different hyperlinks. The most significant, or most differential, pathway is apparently G2M checkpoint, so something cell cycle related. For each of these gene sets or pathways you can click on GS details, and you will get these enrichment plots. Here we are looking at the G2M checkpoint signature. We have the p-value and the FDR value associated with this pathway, and we also have this enrichment plot. On the x-axis we have the genes, rank-ordered according to the differential expression between basal, which is shown on the left, and luminal A, which is shown on the right. So all the genes in this area are more abundant in basal, and here are the genes that are more abundant in luminal A. All of these vertical bars are members of that particular pathway. Just by eyeballing, you see this cluster of members here, which cluster among the genes that are very abundant in the basal subtype. And if you calculate the enrichment score...
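The curve in an enrichment plot is a running sum over the ranked gene list, and a minimal Python sketch of it looks like this. Walking down the ranking, the sum goes up at each gene set member (weighted by the magnitude of its ranking score) and down at every other gene; the enrichment score is the largest deviation from zero. This follows the published description of the method in simplified form and is not the GSEA software itself; the function name, gene names, and scores are assumptions for illustration.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Return (enrichment score, running-sum profile) for one gene set."""
    in_set = [gene in gene_set for gene in ranked_genes]
    hit_weights = [abs(s) ** weight if hit else 0.0 for s, hit in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(in_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("need set members with non-zero scores and non-members")
    running, best, profile = 0.0, 0.0, []
    for w, hit in zip(hit_weights, in_set):
        # Step up at a member (weighted), step down at a non-member.
        running += w / total_hit if hit else -1.0 / n_miss
        profile.append(running)
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero, with its sign
    return best, profile


# Hypothetical ranking of eight genes, most basal-like first.
ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
scores = [4.0, 3.0, 2.0, 1.0, -1.0, -2.0, -3.0, -4.0]

es_top, profile_top = enrichment_score(ranked_genes, scores, {"g1", "g2", "g3"})
es_bottom, _ = enrichment_score(ranked_genes, scores, {"g6", "g7", "g8"})
```

A set whose members all sit at the top of the ranking reaches the maximum positive score, and a set at the bottom reaches the most negative one, which corresponds to the two directions shown in the result page.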
If you now look at another example, for luminal A, we can also look at the snapshot of enrichment results. This is a summary of all the pathways. This is the most significant one, and this is the second most significant one. We see that both of them are estrogen-related: estrogen response early and estrogen response late. Does that make sense? We are comparing luminal A with basal, and now we are looking at luminal-specific pathways. Many of these luminal tumors are ER positive, and these are the hallmark cancer pathways that we are seeing in this set. What is shown here is the signal to noise that we have calculated; it also says that here. And this is the ranking of the genes in my entire data set according to that. We had 11,207 genes in this data set, and this is the ranking according to the signal-to-noise statistic comparing the basal and luminal A subtypes. Again, these genes here are more abundant in the basal subtype, and the genes on that side of the ranking are more abundant in the luminal A subtype. Here we see a clear enrichment. These vertical bars, again, are members of this particular gene set, estrogen response early, and we see a clear cluster of these members among the genes that are more abundant in the luminal A subtype.
I think the most difficult part is getting the data into the right format, and I gave you some hints on using Morpheus and so on.
In today's lecture, I hope you learned how to use R scripts and how to bring your data into the MIMP and GSEA tools for understanding and visualizing the data, and how to use Morpheus to convert a protein-centric or phosphosite-centric data set into a gene-centric one that can be used as input for GSEA. The next session is again a hands-on session, in which Dr. Bing Zhang will talk about how one could use the LinkedOmics tools. Thank you.