Hi, everybody. We're going to start up again after the break. For the next 45 minutes or so, I'm going to tell you about a new topic, which is network analysis. Then we'll talk about enrichment map, and I'll tell you what that means. And then we're going to have another lab for another 45 minutes, where we get to use what we've talked about in this lecture. Okay, so this topic is about network visualization, and it's mainly here to introduce the next topic, which is a lot more relevant for the material that we learned this morning. We also provided some reading material in advance so that we could go over this more quickly. There's a lot of material online about learning network visualization and analysis, and also how to use Cytoscape, so we chose in this workshop to limit that and instead focus more on labs and things like that. But we're happy to answer questions. So it is going to be a little bit quick, but it will reference some of the reading material that hopefully everybody did. Okay, so at the beginning this morning, I talked about pathways and networks and how they're both representations of biological systems and processes in the cell. The idea of networks is a little bit more general than that. We assigned this primer, how to visually interpret biological data using networks, which we wrote a number of years ago. Hopefully everybody read that; it covers the basics of this topic. So I'll just provide a little bit of context and additional information that will help us move to the next section. Okay, so networks represent relationships, and a network is a general type of data structure. You can imagine the idea of relationships is very generic. You have relationships in lots of different fields and you want to understand how they work. People study these a lot in social networks, for instance, how people are connected.
And in biology, we tend to use networks for looking at molecular interaction networks or genetic interaction networks related to the cell. But you could imagine them being cell-cell interaction networks related to tissues, and people also use them for food webs in ecology. So they actually have multiple applications, even in biology. The ones that we'll talk about are more cell and gene oriented. When we see them, they tend to represent relationships like a physical relationship, such as a protein-protein interaction; a regulatory relationship, like gene A regulates gene B; or a genetic interaction, where you have an epistatic relationship that you may have learned about. If you don't do model organism genetics, you don't come across these too frequently. But the idea is that one mutation interacts with another mutation. So if you have a phenotype associated with mutation A, and a phenotype associated with mutation B, and then you mutate both, you have an organism that has a combination of both mutations. If the phenotype is unexpected, given what you know about the A and B phenotypes, then that's a genetic interaction. It means something more is going on than just two independent mutations. And then functional interactions, you can imagine, are a little bit more generic. Actually, all three types that I mentioned are, you could say, functional interactions. We're going to learn about this more tomorrow. But the idea of a functional interaction is that if genes are related somehow functionally, you could draw a functional relationship between them: they have similar sequence, or they have similar domains, or they're part of the same pathway. All of those are examples of some kind of functional relationship. And that's useful for gene function prediction, which we'll talk about tomorrow. But you also might find it coming up in other contexts.
So networks are useful for discovering relationships in large data sets. If you just had a few relationships, you could draw them out, you could just write them like A connects to B and B connects to C. You can understand that. But if you have thousands of these things, and you put them in a spreadsheet and look at them in tables, you can't really understand the global structure. So networks allow you to visualize the results and see global structure and how things are related to each other. The other advantage of networks is that they help visualize multiple different data types together. For instance, we could view protein interaction networks and overlay gene expression data on the network and see if there are certain patterns that are common. And then finally, network analysis. Those first points are mostly covered in the pre-reading material. What's not covered too much is network analysis. Network analysis is an analysis method that uses networks, and there are lots of different types. Just to get across the basic concept, I'm going to cover this idea of six degrees of separation. How many people have heard of this idea? Six degrees of separation. This is the idea that everyone in the world is connected to each other by six steps through a friendship network or acquaintance network. This idea originated in the 60s. The social psychologist Stanley Milgram did an experiment where he asked people in Boston to send postcards to somebody in New York. They had to do it through friends because they didn't have the person's address. Each time a postcard went to a friend, the friend was instructed to send a postcard back to the scientist so that he could keep track of where things were going. And quite a lot of the postcards actually made it to New York to these random people. On average, it took six hops. So that's where this idea comes from. It's probably a lot closer now with Facebook.
But it's an interesting example that people have sometimes heard about, and you can think about how it works. If you had a network mapped out, like the Facebook network, and you wanted to know if you're connected to someone else and how, that's a simple question: how am I connected to this other person, and what is the path? There's a computer science algorithm for this called shortest path by breadth-first search, and it's from the field of graph theory. In computer science, they don't call things networks, they call things graphs. We use the term networks in biology because if you say graph, most people think plot, like X versus Y. But if you're interested in learning more about this topic, search for graph theory, and you'll find something like 100 years of research in computer science or in math on this topic. This field has come up with all sorts of interesting algorithms, and they've proven that they work. So in this case, if we're interested in figuring out the path from A to B, we can just go talk to a computer scientist, and they'll say, oh, use this shortest path algorithm that was developed decades ago. It will tell you if two nodes are connected, and if so, it will find the shortest path. If they're not connected, it will say, I can't find a path. And it's guaranteed to do that; it has been proven mathematically. So the advantage of understanding that analogy and mapping biological networks to graph theory is that there's a whole bunch of these algorithms available, and they're very powerful because many of them are proven to work in particular ways. You can just go to that library and take them and say, I'm going to use this algorithm to answer this question. And so obviously, we could ask if two proteins are connected and find out how they're connected.
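To make the shortest-path idea just described concrete, here is a minimal sketch of breadth-first search in Python. The toy edge list and the function name `shortest_path` are made up for illustration; this is not code from Cytoscape or from any particular library.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Return a shortest path from start to goal as a list of nodes, or None."""
    # Build an undirected adjacency list from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    if start not in neighbors or goal not in neighbors:
        return None
    # previous[n] records the node we reached n from; it also marks n as visited.
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back from the goal to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbors[node]):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # the two nodes are not connected

# Hypothetical toy network: A reaches D in two hops via E, or three via B and C.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D"), ("X", "Y")]
print(shortest_path(edges, "A", "D"))  # ['A', 'E', 'D']
print(shortest_path(edges, "A", "X"))  # None
```

Because breadth-first search visits nodes in order of their distance from the start, the first time it reaches the goal it has found a path with the fewest hops, and if the queue empties first, the two nodes are not connected.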
Is that biologically relevant? Maybe, maybe not. It depends on our network. Just because proteins are connected in a network by some path doesn't mean that path actually exists in the cell. The proteins have to be co-expressed, etc. So there's additional biological information that you need to think about. But it illustrates the idea that you can use algorithms from computer science to solve problems, and ideally biological problems. People have spent, I guess, since around 1999 or 2000, so almost 20 years, trying to figure out different ways that they can answer biological questions with these computer science algorithms. And people have come up with all sorts of interesting types of network analysis over this time. For instance, gene function prediction, which we'll talk about tomorrow, and detection of protein complexes and other modular structures. This is useful if you have a network. I don't think many people here are generating networks as part of their data, but if you were mapping protein interactions, you would be interested in looking for modular structure that could represent protein complexes. This method is used in something we'll learn about tomorrow, so it might be mentioned. You could study network evolution. You could predict new relationships: given existing relationships, predict new ones. That's useful for completing the database. And then there are also more disease-oriented, clinically relevant types of network analysis methods, like identification of networks that relate to disease. This we'll also talk about tomorrow with the ReactomeFIViz Cytoscape app, where you can identify regions of a network where the genes are highly connected and also mutated, if you're looking at cancer mutations, for instance. People have also tried to use network information to help diagnosis. You've probably heard of the idea of biomarkers.
So this is a pattern in your data that's predictive of some outcome. Like, can I find a gene expression biomarker that predicts whether someone's going to respond to a drug or not? If you incorporate network information, sometimes you do better at those types of problems, for the same reasons I mentioned this morning about why pathways are useful. And this is related to the mutation idea of which parts of the network are mutated. So these are just some examples. These slides have some additional examples that you can search for, but in general, we're covering the major ones in this workshop, so we'll be talking about them. Okay, so that's a network. A network is these relationships, but what's missing? You could ask the same question, I guess, about the gene sets and even the detailed pathway diagrams that you might know from textbooks. These things are models of cellular processes, and they don't capture everything that's going on in the cell. In fact, we don't know most of the things that are going on in the cell. But we know for sure that they're missing out on dynamics. They typically represent static processes. It's difficult to represent something like a calcium wave in a neuron or some feedback loop with these systems. There are more detailed mathematical representations that can simulate processes, but people don't usually use them. Ideally, we would use them; they would be quite useful for predicting things like what happens if I knock out a gene. The problem is that they need such detailed information about rate constants and things like that, that it's impractical to use them, because we just don't have that information measured for most proteins. Networks also don't have a lot of detail. Usually, a gene is represented as a node or a circle. We know that genes have structure and proteins have structure, so that could be represented, but generally is not.
And also context is missing. Usually when I see a network in biology, it's the network for the cell, but it doesn't say it's a network for photoreceptors in the eye or for cardiomyocytes. Those would be different networks, and we can increasingly get information on cell-type-specific networks, but most networks that we get are a union of all information across all stages of development and cell types. Okay, so that was a quick overview with just a couple of additional points for networks, to summarize. One additional point is that when you see a network, you should understand right away what the nodes and edges mean. Sometimes they can mean different things. A node could mean a protein in one case. In another case, it could mean a small molecule or a drug or something. And you don't want to mix those two things up. Relationships are even more important. Usually in biology, there are only a few types of nodes, but there are many types of relationships. A physical protein interaction is very different from the genetic interaction that I mentioned. Maybe a more relevant example would be a relationship that says that two genes are co-expressed. That doesn't mean they physically interact. A physical interaction is a much stronger statement for function prediction than genes that happen to be co-expressed. And there are a lot of different methods available for network analysis, and I guess we're going to mention this a few times in this workshop. Earlier, Veronique was mentioning that you can go look at OMICtools, and there are certain websites that you can go to to find tools. There are 100 tools available for network analysis that I'll mention in a second, and sometimes it can be overwhelming. If you become an expert in the area, you can figure it all out.
But most people would be approaching this with a question like, I have gene expression data, and I want to do this. And then you can search for a solution, for example by going on a mailing list. So I'll talk about Cytoscape. Cytoscape has a good mailing list. You can email your question there, and you should get an answer within a week at most. Sometimes that's what you need to do: talk to people and communicate in online forums. Okay, so the next topic is network visualization and analysis using Cytoscape. Again, I'm going to go through this pretty quickly, because you guys installed it and went through it already. I'm going to give a demo again, just to quickly show you some features, so everybody knows the main features. Cytoscape is a freely available network visualization and analysis tool, developed by a collaboration of a number of different people. Our lab is one of the labs; we have two full-time software developers that work on developing Cytoscape, and there are about 10 or 12 globally. It provides basic functionality for visualizing networks, and then you can extend the functionality by downloading apps. The basic functionality allows you to visualize and manipulate networks, query them, and lay them out. You can also do database searching. Then there's the app store. This is an old picture, but there are actually, I think, over 330 apps now. If you go to the app store at apps.cytoscape.org, you can rank them by their popularity, for instance, and so you can see the ones that people are using. Those are probably the ones that are most useful, and you can read about them. But there are many others, so you can search here for things of interest and browse around the categories that are available. Cytoscape is not the only network visualization tool; there are other free ones and some commercial ones.
It's definitely the most popular, and one unique part about it, I guess, is the apps. But there's also an active community of people, and these numbers are actually out of date. I think it's something like 16,000 downloads a month now, and 5,000 people in the world start it up each day. The advantage of that is that there are people on forums who have probably done something similar to you. So if you go to the mailing lists, or if you look at tutorials, you can find a bunch of information. On the mailing list, again, you can email, and I usually get an answer. Here's a picture of a Cytoscape meeting that we had a few years ago in Toronto, and people spelled out Cytoscape just for fun. Okay, so I'm going to move to the Cytoscape demo. The main point is that Cytoscape is a useful free software tool for network visualization and analysis. We're going to use it in the next lab and also tomorrow. Okay, so the latest version is Cytoscape 3.6.1. I don't know if this computer is still recording when I do this. But okay, what I did before starting here is make this bigger in a different way, so you can see the menus. Just to show how Cytoscape works, I've loaded up some data. The data that I loaded up is a sample file that comes with Cytoscape. If you go to the Cytoscape directory, on this computer Cytoscape is in Applications. You don't have to follow along now; you can just watch what I'm doing. In Applications here, I've got a Cytoscape 3.6.1 folder, and in sample data there are some files. One of them, galFiltered.cys, is the file that I loaded up. It has a bunch of information preloaded, so it's useful for demos. A .cys file is a Cytoscape session file. If, when you installed Cytoscape, you selected to have your computer recognize those files, then if you just double-click those files, Cytoscape will load up. And this galFiltered file happens to be a yeast protein interaction network.
So you'll see here that this network is pre-laid out and the nodes have different sizes and colors. The first thing I can do here is move this network around. I can click on a node and I can move it around. If I want to select multiple nodes, I press shift, and I can select a bunch of them and move them around. Most of the time when you load up data, it might not be laid out very well. So the first thing you'll probably do is lay out your network. In the Layout menu, there are lots of different types of layouts. If I just do a grid layout, it's not going to look very good. Everything is just organized in a grid. It's a big mess. This might happen if you load up a network that you've downloaded from somewhere and it doesn't have a layout associated with it. So I'm going to apply this prefuse layout, which is one of the default layouts, and it lays it out very nicely. I think the reading material talks about those layout algorithms, but I can answer questions about it. One of the common questions is, how does it figure out the length of the connection? In this case, most layout algorithms just choose to lay out the nodes such that they don't overlap, and they try to reduce the number of crossings that the edges have. That clears things up and makes it easy to see the structure. The length of the edges is just determined by optimizing those other things, so it doesn't have any meaning. With some of the layout algorithms, if you had some weight, like a confidence value associated with your interactions, they could consider it and pull things closer together if they are more strongly connected. I forgot to download the yFiles app here, which has some nice layouts. If you haven't done that, I recommend doing it. Let's see if I can just quickly do it now. So this is yFiles, Y-F-I-L-E-S.
So I'm going to install that, and hopefully this works. Okay, so now that I've installed that, under Layout I now have a bunch of additional options. One that I like is yFiles Organic. It works pretty similarly to the other one, but it's nice, and there are some additional layouts there. One of the things you might notice when you're using Cytoscape is that the labels are disappearing here. The reason is that Cytoscape tries to speed up drawing of networks, and it can do that by getting rid of the labels. If you have a big network with lots of labels, it slows down a little bit to draw all of them. And if it's zoomed out so far that you couldn't even see the labels if they were there, it just hides them. If you want to see everything, you can go to View and click Show Graphics Details here. Then everything will always be there, and it won't do any of that hiding. So it's there, but you can see that as you get smaller and smaller, you can't really read them. But sometimes it's useful, because that effect is disconcerting. Okay, so what you should do with layouts, if you're interested in learning about them, is just try a bunch of them out. You can often undo a layout if it's not to your liking. But I would just go through this menu and try out a bunch of layouts and see what they look like, because then you'll know what types of layouts exist. Like, here is a hierarchical layout. If I had a tree, a tree network would be better to lay out with this type of layout. And let me try a circular layout. This organizes everything in a circle. Sometimes that's useful. And if you have an attribute, you can order things in the circle, like most expressed to least expressed. Okay, so I'm just going to quickly go back to this one.
Yeah, so in summary, I recommend trying out a bunch of different layouts just to learn what types of layouts there are. Another thing: sorry, I'm zooming in and out with the two-finger zoom on the Mac. You can use these buttons up here to zoom in and out as well. Usually different computers will have different shortcuts for how to zoom. If you have a mouse, it's the scroll wheel. If you have a trackpad, frequently it's two fingers. Well, it's a scroll. It's how you scroll windows, like the Finder or Windows Explorer. However you scroll those up and down, it's the same for this. And pinch to zoom doesn't work on this, so it's scrolling. Okay, so a couple of other things I want to show you. One of the things that happens sometimes with networks is that they get too complicated, so it's nice to be able to make a smaller network. One of the things that you can do is find something here. If I type in MCM1, it highlights this node in the middle. That allows me to find something quickly. I can also filter this network by some value; there happens to be a bunch of values loaded in the sample file. In this case, I'm getting all the most highly connected nodes, and the slider bar here that I'm moving is changing that automatically. And if I'm able to find some subset of the network, say I zoomed in and I wanted to just look at these nodes here, I can go to File, New Network, From Selected Nodes, All Edges, and it would make another network that's just those nodes. If I lay that out, it would look slightly different. And now if I go back to the network panel up here, I have two networks: my original network, and this little child network, which is the smaller set. And these numbers here tell me how many nodes and edges there are.
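As a rough illustration of the filter-then-subnetwork step in the demo (select the most highly connected nodes, then make a new network from the selected nodes and all edges among them), here is a small Python sketch. The node names, the edge list, and the degree cutoff are hypothetical, and this is plain Python, not the Cytoscape API.

```python
def degrees(edges):
    """Count the number of connections (degree) of every node in an edge list."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

def subnetwork(edges, selected):
    """Keep only the edges whose two endpoints are both in the selection."""
    selected = set(selected)
    return [(a, b) for a, b in edges if a in selected and b in selected]

# Hypothetical toy network with one hub node.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b"), ("c", "d")]

deg = degrees(edges)
# Plays the role of the slider in the demo: keep nodes with at least 2 connections.
min_degree = 2
selected = {node for node, d in deg.items() if d >= min_degree}

child = subnetwork(edges, selected)
print(sorted(selected))  # ['a', 'b', 'c', 'hub']
print(child)             # [('hub', 'a'), ('hub', 'b'), ('hub', 'c'), ('a', 'b')]
```

The child network is smaller than the original: node `d` and the edge to it are dropped because `d` did not pass the degree cutoff.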
And this is a collection of networks. If I have lots of these, I can collapse them. And this says there are two networks in this collection. The color of the nodes? That's what I'm going to go to next, so good question. The other thing you can do is load up information, and the tutorials tell you more about how to do this. But if I click on one of these nodes here, and I look at the bottom here, I can see that there's a bunch of information about these nodes, like names and some other variables, including some gene expression data at the end here. If I click this little show columns button, it will tell me all the columns that there are, and I can turn them on and off. By default, most of them are on. And you can do the same thing with edges. If I select an edge here, I have to click on the edge table at the bottom here. So there are node attributes and edge attributes. Here are some edge attributes, here are some node attributes. If I select a bunch of nodes, then it will just show me those nodes here. Say I select just two; here, it just shows me two at the bottom. So I can take these numbers and map them to visual properties with the Style panel. Again, there are tutorials on this. Some of these are already set in this sample file. The fill color is set to map some gene expression data. So I'm going to get it to map something else, another type of gene expression data that's here. Oops. Okay, so I switched to another type of gene expression data. If I click on this, I can change lots of options here. I can change the color to red, and now this is mapping from blue to red. So this little panel here is a very powerful visualization system that can take data that you have. In this case, the data is some expression values that range from minus 2.426 to positive 2.05, and the zero line is white.
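The blue-to-white-to-red mapping described here can be sketched as a simple piecewise linear interpolation. This is only an illustration in plain Python, not how Cytoscape implements its Style panel; the exact RGB values, the function names, and the choice to give out-of-range values a single separate color are assumptions. The range endpoints (-2.426 and 2.05, with white at zero) are the ones quoted in the demo.

```python
BLUE, WHITE, RED = (0, 0, 255), (255, 255, 255), (255, 0, 0)
OUT_OF_RANGE = (255, 255, 0)  # assumed: one separate color for values outside the range

def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB colors, with t between 0 and 1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

def expression_to_color(value, lo=-2.426, mid=0.0, hi=2.05):
    """Map an expression value to a color: blue at lo, white at mid, red at hi."""
    if value < lo or value > hi:
        return OUT_OF_RANGE
    if value <= mid:
        return lerp_color(BLUE, WHITE, (value - lo) / (mid - lo))
    return lerp_color(WHITE, RED, (value - mid) / (hi - mid))

print(expression_to_color(-2.426))  # (0, 0, 255), fully blue
print(expression_to_color(0.0))     # (255, 255, 255), white at the zero line
print(expression_to_color(2.05))    # (255, 0, 0), fully red
print(expression_to_color(3.0))     # (255, 255, 0), out of range
```

Values between the anchors get intermediate shades, so a mildly down-regulated gene comes out pale blue and a strongly up-regulated one comes out close to pure red.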
So you can set these things up and you can visualize your data. The other thing that's visualized here is the size. The size is the degree, which is the number of connections. So the more connections a node has, the bigger it is in this network. I can change that; I'm going to change this to be smaller, and now those bigger ones got smaller. So anyway, okay, the yellow is, I think, if I look at the fill color, the yellow is stuff that's out of range. So this little color here, if I click, say, purplish, then those should change to purple. This one's yellow because I had selected it. So it can be confusing. Any other questions about Cytoscape quickly? Okay, so going back to the presentation. This presentation has some additional slides that are just backups, and they talk about the concepts that I mentioned. There's also some information at the end here about different apps that are available. There's one that does enrichment analysis, but we don't use it that much anymore, because the current pipeline that we have is better. This one, find active subnetworks, is similar to something that we're going to teach tomorrow. This is a way of clustering your network data. And there's a text mining app; it's actually one of the more popular apps, and you can try it out. You can do PubMed searches, and it will try to get a network result. I was talking to someone earlier about text mining. This is an example that's kind of older, so it's not very up to date. And then these slides just go through some additional apps that you can look at if you're interested. There are also some tips and tricks, some of which I covered, but it gives you a little bit of extra information. So I'm not going to go through these, but they're in your slides for reference, basically, if you become an active Cytoscape user.
Okay, so I'm now going to move to another topic, which is enrichment maps. This is going to get more interesting, because it's tying things together from today. Hopefully I can finish this in about 15 minutes or so. So this follows on from what we learned about this morning, up to the GSEA and g:Profiler lab: that enrichment analysis is extremely useful, and frequently you get these big long lists of pathways. So it's great; as I mentioned this morning, there are thousands of papers, and almost every genomics paper does something like this type of analysis. But one of the problems that we have here is that there are a lot of similar pathways that are spread out all over this list. So if you get a lot of pathways, sometimes it's hard to quickly see the major themes. For instance, there's a bunch of pathways like adaptive immune response, regulation of inflammatory process, and myeloid lymphocyte mediated immunity. If you know a lot about biology, you can realize that those are all related to immune response and inflammation. But otherwise, you'd have to search all over the place. So a number of years ago, we developed a method of visualizing these results called enrichment map. Basically, it takes data like this and shows it to you as a network. So now you know why I introduced Cytoscape just now, because we actually use Cytoscape network visualization technology to visualize these results. The idea is that you have a bunch of gene sets, like pathways, and each pathway gets visualized as a node. The size of the node is proportional to the number of genes in the set.
And the color of the node is proportional to the enrichment score, like the normalized enrichment score or any other score that you can get from pathway enrichment analysis tools. And then the gene sets are connected if there are genes that are shared between them. Frequently, you'll have gene sets or pathways that come from different databases, but they're basically the same; like, five databases will have the Wnt pathway. Enrichment map will identify that they have a lot of genes in common. And then when you run a layout in Cytoscape, it will group all of the things that are similar based on the edges that connect them, the green lines. The thickness of the green lines is proportional to the number of genes that are shared. Okay, so we can take the GSEA example that we learned about. GSEA gives us a bunch of pathways that are enriched in condition A versus B, and also condition B versus A, so frequently we think about this as upregulated and downregulated. It doesn't mean that the pathway is actually activated. It means that the pathway is enriched in the upregulated genes, which could be negative regulators of the pathway. So just keep that in mind. And then for GSEA, we can take the significance and color the nodes. But this time, because we have up and down, we can color them two different colors, red and blue here. And the overlap between these is computed as, well, there are a couple of different ways of doing it, but it's basically a set overlap statistic. Okay, so I'll show you some examples of this in use so you can see how it works. This is some data that we analyzed. The data was published in 2007, and we found it in the database and used it in the enrichment map paper. It's a simple analysis.
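Going back to the set overlap statistic mentioned a moment ago: the talk does not say which statistic is used, so the sketch below shows two common choices, the Jaccard coefficient and the overlap coefficient, and how a cutoff on such a score could decide which gene sets get connected by an edge. The gene set names, gene names, and the 0.25 threshold are hypothetical, and this is an illustration rather than the EnrichmentMap implementation.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared genes divided by all genes in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

def similarity_edges(gene_sets, score=jaccard, threshold=0.25):
    """Return (set1, set2, score) for every pair of gene sets at or above the threshold."""
    edges = []
    for n1, n2 in combinations(sorted(gene_sets), 2):
        s = score(gene_sets[n1], gene_sets[n2])
        if s >= threshold:
            edges.append((n1, n2, s))
    return edges

# Hypothetical gene sets: A and B are near-duplicates, as if from different databases.
gene_sets = {
    "pathway_A": {"g1", "g2", "g3", "g4"},
    "pathway_B": {"g2", "g3", "g4", "g5"},
    "pathway_C": {"g9", "g10"},
}

print(similarity_edges(gene_sets))
# [('pathway_A', 'pathway_B', 0.6)]
print(overlap_coefficient(gene_sets["pathway_A"], gene_sets["pathway_B"]))
# 0.75
```

With edges defined this way, a layout pulls the two near-duplicate pathways together while the unrelated one stays apart, which is what produces the visible themes.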
We chose it because it's a simple analysis and it was pretty clean, because they used cell lines. Those are typically less variable; they have less variation than patient samples, for instance. In this experiment, breast cancer cells were treated with estrogen or not, and they were looking at how this response changed over time. So let's just focus on one time point, after a day of treatment, 24 hours. They did three biological replicates of treated versus untreated. You do all the standard processing, get your differentially expressed genes, run it through GSEA as we learned this morning, get your results, and then you can load it into enrichment map, and you'll get something that looks like this. It doesn't look exactly like this; this is a publication version of it, but we'll get to that in a second. You can see immediately that there are major themes that pop out: translation, protein sorting, and RNA transport are going up, and these ones are going down. So this visualization allows you, at a glance, to quickly see what major themes are going up and down. And if you zoom in on one of these, here we're zooming in on this one, you can see these nodes have different pathway names, microtubule organizing center, centrosome, and these are all related as a cytoskeleton theme. Okay, so that's a fairly simple idea. It's not that complicated. It's really just a visualization method. It's meant to help you interpret the results of something like GSEA. You don't have to use this method. GSEA provides nice reports, and if you're happy with that, you can use that. You usually don't need this method if you only have a few pathways that come through. This method tends to be more useful if you have dozens or hundreds of pathways, and then it's useful to identify these themes in an automated way. The other thing that is not easy to do with standard enrichment analysis is to do comparisons of multiple conditions.
So here we have two time points, 12 hours and 24 hours. And we have new features in EnrichmentMap now that make this better. I should have updated these slides to show the pie charts and things we can do now. Just to illustrate, what we did here was the same type of enrichment analysis at 12 hours and 24 hours. And we mapped the enrichment statistics from the early time point to the middle of the nodes and the late time point to the border of the nodes. And then you can see that a lot of pathways are enriched in the upregulated genes of both time points. But some of them, like these ones here, for instance, ubiquitin-dependent protein degradation, are not enriched at the early time point and very enriched at the late time point. So that's something that's different between the two time points. Here's another example of something that's different. It's part of DNA metabolism. So again, at a glance, you can answer the question of what's different between the pathways that are enriched at the early versus the late time point. And now in EnrichmentMap, we used to have a limit of just two, but now you can do more than two. In EnrichmentMap, if you load up your gene expression data, or some other kind of data that you can load up as values like gene expression data, then when you click on a node, you can see the expression values. And so here you can see that for one of these pathways in the protein degradation theme, on the y-axis are all the genes. They're not labeled here, but they are in the app. And then across the x-axis are the different samples, and these are labeled here. So these are the early samples and the late samples. You don't get this exact visual in the app, this was a manually created visualization, but there is a way of knowing what the samples are.
So you can see that at the early time point, there's not much difference between the treated and untreated. All the genes are up in both cases. But at the late time point, all of a sudden, in the treated samples, these genes go down. And so that's what causes this to get a strong signal. And so you can start seeing how you can use this tool, once you find a pathway of interest, to start zooming in and explain the data a little bit more. So now we know that estrogen treatment at the late time point lowers these genes. It's actually not lowering all the genes in this pathway, but there's a big signal of these ones going down and these ones staying up in the untreated. There's more that you can do with this tool. The last major type of analysis is something we call query set analysis, a type of post analysis. So once you've created an enrichment map based on your GSEA results, or g:Profiler results, you can add in additional gene sets. Those gene sets can be whatever you want, but it might make sense, if you know a bunch of genes that are associated with the phenotype that you're studying, like a disease, to add those as a gene set and see which pathways those genes are part of. So this little triangle here represents a gene set, and it automatically draws lines to all the other gene sets that have some of those genes in common. And so in this analysis, this was a mouse gene expression experiment that we found in the literature that had knocked out a microRNA. So if you knock out a microRNA, you expect certain pathways might go up because the negative regulator is removed. So a lot of pathways go up and a couple of pathways go down. And then we took the known targets, or the predicted targets, of that microRNA and represented that as a set here, this little triangle.
And then you can see that there's a bunch of pathways, especially in the up pathways, that have a lot of targets of that microRNA. So this might indicate that maybe this microRNA is regulating these pathways directly. This pathway doesn't have any microRNA targets, so maybe it's an indirect upregulation of this pathway. And the downregulated pathways don't have any targets, which is what we expect. Another example of this is doing the same thing with transcription factors. So in this analysis, this was a leukemia project, and we had transcriptomics data. We did an enrichment analysis and made an enrichment map, and we had a bunch of pathways that came up. We also did some analysis of transcription factors that could explain the transcriptomics data. And this is using a tool called oPOSSUM. We're not covering this one; on day three, I think we use iRegulon, which can help you in similar ways. But the top transcription factor that was predicted to be important is HIF-1-alpha here. And so again, we found HIF-1-alpha targets that are predicted or known, and we added them to the enrichment map as a little triangle. And then this shows all the pathways that those targets are part of. So again, it's not all the pathways, so it gives you a little bit more specific information about what might be happening in the sample. So we might think, if HIF-1-alpha is an important transcription factor, it's regulating these three major pathway themes as the major thing that it would do. That's our hypothesis. And EnrichmentMap is what was used to develop the autism spectrum disorder visualization that I mentioned this morning. That was done with just Gene Ontology and a few pathway databases and actually domains, just to give you a little bit of additional information. EnrichmentMap is a software tool that's an app for Cytoscape.
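To make the query set idea concrete, here is a small sketch of the overlap step, assuming the simplest possible rule: connect the query set to any pathway that shares at least a minimum number of genes with it. The function name, the `min_shared` parameter, and the placeholder gene and pathway names are all hypothetical, and the actual app may score overlaps differently.

```python
def query_set_edges(query_genes, pathway_gene_sets, min_shared=1):
    """Map each pathway to the genes it shares with the query set.

    Pathways sharing fewer than min_shared genes get no edge.
    """
    query = set(query_genes)
    edges = {}
    for pathway, genes in pathway_gene_sets.items():
        shared = query & set(genes)
        if len(shared) >= min_shared:
            edges[pathway] = sorted(shared)
    return edges


# Placeholder data: predicted targets of a regulator, and three pathways in the map.
predicted_targets = {"GENE_A", "GENE_B", "GENE_C"}
pathways = {
    "UP_PATHWAY_1": {"GENE_A", "GENE_B", "GENE_X"},
    "UP_PATHWAY_2": {"GENE_Y", "GENE_Z"},
    "DOWN_PATHWAY_1": {"GENE_C", "GENE_W"},
}

links = query_set_edges(predicted_targets, pathways, min_shared=2)
```

A pathway with no edge to the query set, like the second one here, is the kind of case that would suggest indirect rather than direct regulation.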
So when you load it up, we have an updated version now that looks slightly different. It's much easier to use. But you get the picture that you can do the things that I mentioned in here. And the way we use this is we start by visualizing all the pathways, and that gives us an overview of all the genomics data that we've applied enrichment analysis to. And then we identify interesting pathways and you can zoom in on those. One of the things you could do, if you pick out a specific pathway, is take the gene expression data and map it to a network diagram of protein interactions or a pathway diagram and overlay the gene expression data on that diagram. The tool we recommend using for that is PathVisio. We can also use Cytoscape. We haven't built that into the workflow in a very streamlined way, so it's something that we're still working on. But it's something that you can do. And it's one way of drilling down to eventually get to the level of genes or proteins, with protein expression or gene expression visualized on those genes, to look for patterns. We also have WordCloud, and I should have included a slide for AutoAnnotate. These are two additional Cytoscape apps that help visualize enrichment maps. So in EnrichmentMap, what you get is a network like this, and you can browse around it in Cytoscape. These networks have nice bubbles around them with names. We used to have to do that manually, but now we have a tool called AutoAnnotate. So it's an app that you can download that automatically draws bubbles around these themes and gives them names using the most frequent words. And I just wanted to show this last picture for fun. Ruth developed the first version of EnrichmentMap. And so she wrote all the software for that.
And when she was presenting at a lab meeting, she made an enrichment map cookie, which was awesome. So I'm not embarrassing her. Usually she's not here when I present this. But anyway, it was good because we realized that enrichment maps were useful and also delicious. OK, so that was just a joke. So that is it for this presentation. The lab is now on. And what we want to do in the lab is try to run EnrichmentMap. And so for that, are you going to say anything about introducing what the lab is? Where is it on this? OK, I didn't set that up here. Module three. OK, so in module three here, the lab practical part one, right? OK, so this creates an enrichment map from GSEA. And part two does the same thing for g:Profiler. OK, so for this lab, you'll have to run Cytoscape. Hopefully it works on everybody's computers, and then you can follow the lab, and we have about 40, 45 minutes. Say 40 minutes, that gets us to five. And so hopefully we can get through most of the lab quickly.