So this morning, we spent a lot of time talking about pathway enrichment analysis, and this slide just reviews what we saw, which you already know about. The typical way that you visualize the results is as a table. Pathway enrichment analysis is obviously very useful; tens of thousands of papers have used it. But viewing the results as a table, which is how almost everybody does it, is not the nicest way of seeing them. One of the things you find if you go through these lists is that there are a lot of related pathway names spread out all over the list, so if you want to see the general themes, you have to deal with that redundancy. For instance, if I go through this list, there are a lot of pathways related to the immune system. But if you weren't an immunologist, or if you hadn't learned immunology, you might not know that all of them were related to the immune system; not all of them say "immune" in the name, for instance. There's probably an immune theme here, but it's not easy to tell, and you'd have to go up and down the list to find it. So what we've done with Enrichment Map is take the network concept that I just talked to you about and visualize the results of the enrichment analysis as a network. Why would we do that? Because then we can see the relationships between the pathways, and we can see patterns that would otherwise be hard to see in a table, as I explained before. As you learned this morning, you can use GSEA to find which pathways are enriched in condition A versus B and in B versus A, so these are the up-regulated ones and these are the down-regulated ones. What an enrichment map does is take each gene set or pathway and visualize it as a node, or circle. The size of the circle is proportional to the number of genes in the gene set or pathway.
The color of the circle reflects the significance score: the stronger the color, the more significant. Up-regulated is red and down-regulated is blue. The edges, the lines connecting the circles, show crosstalk among the pathways: two pathways are connected if they share genes. You'll see this, for example, if you have multiple versions of a pathway from different databases. Many databases have a pathway called cell cycle, so you'll see all of them come back, all highly related to each other. OK, so I'll go through some use cases. In the original paper for this, we found a publicly available gene expression data set in which breast cancer cell lines were treated with estrogen. They looked at a particular time point, with three replicates treated and three replicates untreated. We used the Gene Ontology (biological process terms, or GO in general) to do the pathway analysis in this case, and the differential expression compares treated versus untreated. This is the enrichment map that results. As I said, all the nodes are pathways. You can see in the zoom-in that they have names like microtubule organizing center (sorry, my mouse pointer), microtubule cytoskeleton organization and biogenesis, and centrosome. All of these are related, so we put them into a theme called microtubule cytoskeleton. You'll see very quickly that there are major themes that come up. I should point out that the standard enrichment map view doesn't have the circles around the groups or the labels; normally you just get the red and blue nodes and the green lines, and afterwards we add the circles and the labels. We used to have to do that manually, but now we have a tool called AutoAnnotate that does it for you. You still have to edit the result to make it publication quality.
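To make the node and edge rules concrete, here is a minimal Python sketch of turning enrichment results into an enrichment-map-style network. It is not Enrichment Map's implementation. The `GeneSetResult` record, using the sign of a GSEA-style normalized enrichment score (NES) for direction, the `1 - FDR` color intensity, the overlap coefficient with a 0.5 cutoff, and the toy gene sets are all illustrative assumptions; Enrichment Map itself lets you choose the similarity measure and cutoff.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class GeneSetResult:
    """One enriched gene set (hypothetical record, not Enrichment Map's data model)."""
    name: str
    genes: frozenset
    nes: float  # GSEA-style normalized enrichment score; sign = direction
    fdr: float  # FDR q-value


def node_style(result):
    """Node size from gene count; red for up, blue for down; stronger color when more significant."""
    return {
        "size": len(result.genes),
        "color": "red" if result.nes > 0 else "blue",
        "intensity": 1.0 - result.fdr,  # assumed mapping: smaller FDR -> stronger color
    }


def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller set."""
    return len(a & b) / min(len(a), len(b))


def build_edges(results, cutoff=0.5):
    """Connect two gene sets that share genes; edge weight (line thickness) = number of shared genes."""
    edges = []
    for r1, r2 in combinations(results, 2):
        shared = r1.genes & r2.genes
        # `shared` is empty for an empty set, so the division below is never reached then.
        if shared and overlap_coefficient(r1.genes, r2.genes) >= cutoff:
            edges.append((r1.name, r2.name, len(shared)))
    return edges


results = [
    GeneSetResult("cell cycle (database 1)", frozenset({"CDK1", "CCNB1", "MCM2", "E2F1"}), 2.1, 0.001),
    GeneSetResult("cell cycle (database 2)", frozenset({"CDK1", "CCNB1", "MCM2", "PLK1"}), 2.0, 0.002),
    GeneSetResult("antigen presentation", frozenset({"HLA-A", "B2M", "TAP1"}), -1.8, 0.01),
]
print(build_edges(results))
# [('cell cycle (database 1)', 'cell cycle (database 2)', 3)]
print(node_style(results[2]))
```

The two "cell cycle" versions from different databases end up linked by a thick edge, which is exactly the redundancy the map is meant to surface; the unrelated immune set stays unconnected.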
But it definitely helps speed things up. So this is a bunch of pathways related to translation, and here are some pathways related to tight junctions. You can quickly see what's going up and what's going down. That's what I like about it: this visualization gives you a very quick overview of the data set, and from that overview you can hopefully identify interesting things to zoom in on. Question? Yeah? In this case, yes, it's a Gene Ontology term. But in general, I like to think of each node as a pathway or gene set. (Inaudible question.) Yeah. So, which pathways does this show? It might include all of the terms that go up. Usually, as Quaid mentioned, we filter by gene set size so that we don't have terms that are too big or too small. I can't remember what we used in this paper, but the maximum might have been 300 genes. So we wouldn't find the top-level terms in Gene Ontology, because they would have thousands of genes in them; we eliminate quite a few very general terms at the top. The reason we remove them is that they are often so general. If I get a pathway coming back that says "biological process", what am I going to do with that? It's too general. So by removing these terms, we increase interpretability and we decrease our multiple testing penalty. So within each of these groups, is there not much overlap? Well, there could still be overlap, and that's what the green lines show. But at least we've grouped them together so that you know which ones are related. The tool that you'll use during the lab is interactive: you can click on these nodes and see the genes, and I'll show you that in a sec. You can click on those as well and see what those genes are. And if you have gene expression data loaded, it will show you the heat map. Yeah? What about the green lines? Yes. The thickness of the green line shows how many genes are shared.
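The size filter and its effect on the multiple testing penalty can be sketched as follows. The 5 and 300 gene bounds echo numbers mentioned in this talk but are assumptions here, and the toy gene sets are invented. The Bonferroni threshold is used only because it makes the penalty easy to see; it is not necessarily the correction your enrichment tool applies (GSEA, for example, reports FDR q-values).

```python
def filter_by_size(gene_sets, min_size=5, max_size=300):
    """Keep gene sets whose size lies within [min_size, max_size] (bounds are assumptions)."""
    return {
        name: genes
        for name, genes in gene_sets.items()
        if min_size <= len(genes) <= max_size
    }


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold under a Bonferroni correction: fewer tests, looser threshold."""
    return alpha / n_tests


gene_sets = {
    "biological process": {f"G{i}" for i in range(5000)},  # too general to interpret
    "cell cycle": {f"G{i}" for i in range(120)},
    "tiny term": {"G1", "G2"},  # too small to be informative
}
kept = filter_by_size(gene_sets)
print(sorted(kept))  # ['cell cycle']
print(bonferroni_threshold(0.05, len(gene_sets)), bonferroni_threshold(0.05, len(kept)))
```

Dropping the very general and very small terms both removes uninterpretable results and shrinks the number of tests, so each remaining test faces a less strict threshold.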
Any other questions? So this is A versus B; what if we have A, B, C? You could have A, B, C. I'll show that in the next case. Here we have two time points, so this is actually four different conditions: A, B, C, D. It's treated and untreated at 12 hours, and treated and untreated at 24 hours. At each time point we did an enrichment analysis, treated versus untreated at 12 hours and treated versus untreated at 24 hours, so we got two GSEA results, and then we wanted to compare them. We used the visual properties in Cytoscape that I told you about to show this: the color of the center of the node represents the early time point, and the color of the border of the node represents the late time point. What you can see here is that many of the pathways are up or down at both the early and late time points. However, a couple, in particular this ubiquitin-independent protein degradation pathway, are white in the center and red at the border. That means it's not seen as differentially expressed at the early time point, but at the late time point it's highly differentially expressed in treated versus untreated. The nice thing about this is that, again, it provides a very quick visual summary of the results, in this case of two different pathway enrichment analyses, and I can very quickly identify the interesting parts. Here, there are some that were found at the early but not the late time point, and here are some found late but not early. So if I was interested in the difference between two time points, I could quickly see which pathways are uniquely changing. In the software itself you don't get a view exactly like this, but if you click on a node you get a heat map, and you can interpret that further. Here you can see that this protein degradation pathway is up in both treated and control at the early time point, while at the late time point it's down in treated and up in control. So the treatment reduced protein degradation.
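The center/border encoding can be written as a small rule. This minimal sketch assumes each analysis is summarized as a dictionary mapping pathway name to a (NES, FDR) pair, and it uses an FDR cutoff of 0.05 chosen only for illustration; a pathway that is missing or not significant in one analysis is drawn white for that part of the node. The pathway names and values are hypothetical.

```python
def part_color(entry, fdr_cutoff=0.05):
    """Color for one part of the node: red = up, blue = down, white = not significant or absent."""
    if entry is None:
        return "white"
    nes, fdr = entry
    if fdr > fdr_cutoff:
        return "white"
    return "red" if nes > 0 else "blue"


def two_dataset_style(early, late, fdr_cutoff=0.05):
    """Node center shows the early time point; node border shows the late time point."""
    names = sorted(set(early) | set(late))
    return {
        name: {
            "center": part_color(early.get(name), fdr_cutoff),
            "border": part_color(late.get(name), fdr_cutoff),
        }
        for name in names
    }


def late_only(style):
    """Pathways significant at the late time point but not at the early one."""
    return [name for name, s in style.items() if s["center"] == "white" and s["border"] != "white"]


early = {"pathway A": (2.3, 0.001), "pathway B": (0.4, 0.60)}
late = {"pathway A": (2.1, 0.002), "pathway B": (2.5, 0.001)}
style = two_dataset_style(early, late)
print(style["pathway B"])  # {'center': 'white', 'border': 'red'}
print(late_only(style))    # ['pathway B']
```

"pathway B" plays the role of the white-center, red-border node in the slide: not significant early, strongly significant late.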
And that's why it's coming up here. The last use case gets at the idea of the master regulator that I told you about this morning. In this case, we took a public gene expression data set where somebody had knocked out a microRNA in the mouse heart, and they collected gene expression data with the microRNA present and with it removed. We created an enrichment map, which shows that a whole bunch of pathways go up (all these red ones here) and a few pathways go down. You might expect a lot of pathways to go up, because microRNAs are negative regulators: if you remove the negative regulator, everything it's repressing might go up. But to get more insight into the mechanism, we wanted to see how the microRNA is linked to these pathways. So we took the predicted targets of the microRNA from TargetScan, which is a microRNA target prediction system, and created a gene set containing all of them. We represented that set as another node, this little triangle in the middle here. Then we used a query set analysis, which we call post-analysis in the tool, to create additional edges between this set and the other sets. These are the pink lines, and they show that there are genes in common between the predicted targets of the microRNA and the pathways. What you can see is that there are a lot of predicted microRNA targets in some of the pathways but not in others. None of the down-regulated pathways have them, which is a good positive control, because that's what we would expect. But not all of the up-regulated pathways contain microRNA targets, only some of them, and some of the connections are much stronger than others. So this gives us some insight into the mechanism.
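The post-analysis idea, adding the microRNA's predicted target set as an extra node and linking it to the pathways it overlaps, can be sketched with a one-sided hypergeometric test on each overlap. This is an assumption about how to score the edges, not a description of the exact test used in the study; Enrichment Map's post-analysis has its own test options and cutoffs. The gene universe of 20,000, the 0.05 cutoff, and the toy gene sets are made up for illustration.

```python
from math import comb


def hypergeom_sf(k, universe, n_targets, n_pathway):
    """P(overlap >= k) when a pathway of n_pathway genes is drawn at random from a
    universe of `universe` genes, n_targets of which are predicted targets."""
    upper = min(n_targets, n_pathway)
    total = comb(universe, n_pathway)
    tail = sum(
        comb(n_targets, i) * comb(universe - n_targets, n_pathway - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def post_analysis(targets, pathways, universe, p_cutoff=0.05):
    """Return (pathway, shared gene count, p-value) for pathways significantly overlapping the targets."""
    edges = []
    for name, genes in pathways.items():
        shared = targets & genes
        if not shared:
            continue  # no genes in common, so no edge
        p = hypergeom_sf(len(shared), universe, len(targets), len(genes))
        if p <= p_cutoff:
            edges.append((name, len(shared), p))
    return edges


targets = {f"T{i}" for i in range(500)}  # hypothetical predicted microRNA targets
pathways = {
    "pathway A": {f"T{i}" for i in range(20)} | {f"G{i}" for i in range(30)},  # 20 of 50 are targets
    "pathway B": {"T0"} | {f"G{i}" for i in range(100, 149)},  # 1 of 50 is a target
    "pathway C": {f"G{i}" for i in range(200, 240)},  # no targets
}
for name, n_shared, p in post_analysis(targets, pathways, universe=20000):
    print(name, n_shared, f"{p:.2e}")
```

Only "pathway A" gets an edge: a single shared gene in "pathway B" is about what chance alone would give, which mirrors the point that only some pathways connect strongly to the target set.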
This microRNA might be controlling certain pathways directly, while other pathways might be going up or down indirectly as a result of the microRNA's control. That's what we would expect if you knock down the microRNA. Yeah, right? Does that make sense? In this experiment, yeah. Correct, yeah. Well, I mean, there are no targets of the microRNA in the down-regulated pathways. Right, but there are interactions between the other pathways. Yeah, so it's indirect. Right now, we're not considering predictions of how the effects propagate after that; we're just working with one step, which is the microRNA and its targets. You could look at the data further using the analyses we'll cover on the morning of day three, looking at transcription factors and so on. For example, the microRNA regulates a pathway that contains a transcription factor, and that transcription factor regulates some other things. You could do that type of analysis. Here's an example of the same thing done with a transcription factor: Veronique and Shahina, a biostatistician we worked with, were able to do something very similar with transcription factors, and there's a paper here. You'll get these slides, so you can see the references. This is also what we used to create the autism map that I showed you at the beginning. In the autism map, we used Gene Ontology and the pathway databases KEGG, NCI, and Reactome. This was six years ago, so we have better pathways now. If we had used all of the gene sets, we would have had 14,000. Filtering to gene sets with 5 to 700 genes, which is quite a liberal filter (usually we use a minimum of 5 or 10 and a maximum of 300, 400, or 500 genes), left 6,000 pathways. Limiting it to the pathways with copy number variants that you can actually test left 3,500. That's still a lot of pathways, but it illustrates the point from this morning that once you do the filtering, there are fewer pathways than genes.
The Enrichment Map software will be covered in the lab, so you'll be able to try it out. Again, the idea is that you get a global overview of what's going on. You can select something of interest and zoom in on it: instead of looking at the pathway level, you can go down to the gene level. In Cytoscape, you can even look at gene interactions and look at the actual expression data on those interactions. We don't have all of this automated yet, and we're still working on that, but it's possible to do it yourself. So that's the software you'll work with after the break. This next slide reminds me to acknowledge Ruth Isserlin, who is in my group and who programmed the software for the original Enrichment Map paper. When she presented it at a lab meeting, she baked an enrichment map cookie, because she liked the tool. OK, so that's it for right now. In the next 35 minutes, or however long you want to stay, we're going to go over Enrichment Map. We've actually worked with the GSEA team to put Enrichment Map into GSEA, so you might have noticed there's an Enrichment Map button in GSEA; if you click it, you'll get an enrichment map from your GSEA results. But Enrichment Map is useful for any type of enrichment analysis. For example, you can download g:Profiler results and load them into Enrichment Map. Enrichment Map doesn't do the enrichment for you: it takes enrichment results from whatever enrichment tool you're using and visualizes them in Cytoscape.
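Because Enrichment Map only visualizes results, the input from any tool boils down to a table of gene sets with significance values and member genes. The sketch below reads such a table from tab-separated text. The column names (`ID`, `Description`, `p.Val`, `FDR`, `Phenotype`, `Genes`), the +1/-1 direction convention, and the comma-separated gene list are assumptions loosely modeled on Enrichment Map's generic results format, and the example rows are invented; check the Enrichment Map documentation for the exact specification before preparing real files.

```python
import csv
import io


def read_generic_results(text):
    """Parse tab-separated enrichment results (hypothetical column layout) into dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    results = []
    for row in reader:
        results.append({
            "id": row["ID"],
            "name": row["Description"],
            "p_value": float(row["p.Val"]),
            "fdr": float(row["FDR"]),
            "direction": int(row["Phenotype"]),  # +1 = up, -1 = down (assumed convention)
            "genes": frozenset(g.strip() for g in row["Genes"].split(",") if g.strip()),
        })
    return results


example = (
    "ID\tDescription\tp.Val\tFDR\tPhenotype\tGenes\n"
    "GO:0007049\tcell cycle\t0.0001\t0.003\t1\tCDK1,CCNB1,MCM2\n"
    "GO:0019882\tantigen processing and presentation\t0.0004\t0.01\t-1\tHLA-A,B2M,TAP1\n"
)
for r in read_generic_results(example):
    print(r["name"], r["fdr"], r["direction"], sorted(r["genes"]))
```

No statistics are computed here: the parser only carries over what the enrichment tool already reported (a name, significance values, a direction, and the member genes), which is all a map needs to draw nodes and overlap edges.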