We are now moving on to module three, which is network visualization and analysis using Cytoscape. Here's how this one is structured: we'll look at the advantages of network visualization, and then at how to use Cytoscape. At the end of this lecture, everyone will get a chance to open up Cytoscape and try it out along with us. It's not a formal lab, just an introduction to Cytoscape so you get to play with it; in the lab after this lecture, you'll use Cytoscape a lot more. For now, I just want to introduce its basic features and show you how to create and optimize a network within Cytoscape. So we'll start with a basic introduction to what network visualization is, and, as I said, we'll demo Cytoscape at the end. Everyone has probably heard of the concept of six degrees of separation. The idea has changed a lot over the last 20 years or so, in that everyone is now part of social networks: everyone has Facebook, everyone has Twitter, and we're all socially connected in more ways than we would think. The concept was originally devised by Stanley Milgram, who ran an experiment at Harvard in the 1960s. He mailed postcards to people and asked each of them to add their name to the postcard. The goal was to reach a target person: if you didn't know that person, you mailed the postcard to somebody you thought might get it closer to them, and if you did know the target, you mailed the postcard back to Milgram. What he found was that, on average, six degrees of separation separated people. The idea was also popularized in the 1990s as a game called Six Degrees of Kevin Bacon.
In the Hollywood version, the claim was that you could link any movie to a movie featuring Kevin Bacon through shared actors, and it was never more than six steps away. So it demonstrated the interconnectivity of our society, and it's even more relevant now with Facebook and Twitter: you're often surprised that the people you're connected to know, or are related to, people you know, because the world is getting smaller and smaller. But that is only one example of networks within society. How is this relevant to biology? These are just people. In fact, we can apply the same ideas to a biological framework, and there are many different networks within biology, not just the social networks we just discussed. There are networks within our nervous system. There are networks between the cells of our body, through cell-cell communication. And there are molecular networks, such as protein-protein interaction networks and gene-gene interaction networks. Pathways are also a type of network. So there are many different network types that we can represent within biology. Here are just a few examples of the networks we see in biology. This was a Nature cover marking the journal's 150 years, and it shows a beautiful network of all the connectivity it captures. Networks are something we now see practically everywhere. So why do we want to use networks in biology? They're very powerful tools for reducing complexity. They're more efficient than tables at representing our data, and, as I always say, a picture is worth a thousand words. This is a representation of a network from last year; I think it's from April 2020.
It shows the interactions of SARS-CoV-2 proteins, which of course everybody knows about because we're all living in the SARS-CoV-2 era. In this network, the red diamonds are SARS-CoV-2 viral proteins, and the network shows how these proteins interact with human proteins. This was a protein-protein interaction study based on proteomics, and the thickness of each line represents spectral counts. They're also showing areas of the network that represent protein complexes: any cluster of nodes highlighted in yellow represents a specific protein complex described in the literature or in a protein interaction database. They're also highlighting biological processes: anything highlighted in blue is annotated with a given biological process. Those are just a few examples, but in this one picture they've managed to represent a lot of information. If you had to look at this information as text in a table, it would be very difficult to take it all in. So one of the first things networks do for us in biology is reduce complexity and efficiently show a picture of what we're trying to say. So why would we use network visualization for biological data? There are many reasons. Number one, we can represent the relationships between different biological molecules: protein-protein interactions, genetic interactions, or functional interactions can all be represented in a network. It's also a useful way to discover relationships between our entities, whether they're pathways or genes. Going through an Excel spreadsheet, it's very difficult to figure out the relationships between two genes, or to notice that one gene interacts with 100 proteins while another gene doesn't interact with any of them.
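The edge thickness in the SARS-CoV-2 figure is an example of what Cytoscape calls a continuous mapping, where a numeric attribute is scaled onto a visual property. The snippet below is a minimal sketch of a linear version of that idea, not Cytoscape's own code; the 1 to 10 width range, the edge names, and the spectral counts are all made up for illustration.

```python
def map_continuous(value, data_min, data_max, out_min=1.0, out_max=10.0):
    """Linearly scale a numeric attribute (e.g. a spectral count) onto a
    visual range (e.g. edge width). All defaults are illustrative."""
    if data_max == data_min:
        return out_min  # every edge has the same value: use the minimum width
    # Clamp so out-of-range values still land inside the visual range
    value = max(data_min, min(value, data_max))
    fraction = (value - data_min) / (data_max - data_min)
    return out_min + fraction * (out_max - out_min)


# Hypothetical viral-human edges with invented spectral counts
spectral_counts = {
    ("viral_protein_1", "human_protein_A"): 2,
    ("viral_protein_1", "human_protein_B"): 21,
    ("viral_protein_2", "human_protein_C"): 40,
}
low, high = min(spectral_counts.values()), max(spectral_counts.values())
edge_widths = {edge: map_continuous(count, low, high)
               for edge, count in spectral_counts.items()}
print(edge_widths)
```

A linear scale is the simplest choice; a real style could just as well use a nonlinear or piecewise scale if a few very high counts would otherwise dominate.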
Getting that out of a table is very difficult, but once you represent it as a picture, it becomes a lot easier. Another thing networks do very well, as the previous picture demonstrated, is integrate lots of different data. In that network, the proteins came from different organisms, shown with different node shapes and colors, and the network also brought in annotation data and complex data on top of the actual interaction data. So networks are very good at integrating different types of data. Network analysis also lets us do other useful things, such as finding subnetworks within a network. In the previous picture, we had complexes: you could clearly see groups of genes that all interact together, form a unit, and act accordingly. You can also find paths, when you're looking for the relationship between node A and node B and how they're connected to each other. Another common concept is hub proteins or hub genes. A hub is a node that is more important than other nodes in your network because it connects a lot of different things; if you remove it, parts of the network start to fall apart. So there's a lot we can learn from these networks. Just a few examples, several of which we've already mentioned: detecting protein complexes from protein-protein interaction networks, that is, proteins that tend to associate with each other to fulfill a given task. You've probably heard of many of the complexes within the cell. We can also use networks to help find gene functions. Tomorrow, I think, we'll be looking at a program called GeneMANIA, which groups genes together by their functional associations.
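Finding how node A and node B are connected is a shortest-path question. As a rough sketch (plain Python, not how Cytoscape or any particular app implements it), breadth-first search on an unweighted network returns a path with the fewest edges; the node names are placeholders.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency list: node -> set of neighbours."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search: one path with the fewest edges, or None if unreachable."""
    previous = {start: None}  # node -> the node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through `previous` to rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


network = build_adjacency([("A", "X"), ("X", "Y"), ("Y", "B"), ("A", "Z")])
print(shortest_path(network, "A", "B"))  # ['A', 'X', 'Y', 'B']
```

The number of edges on that path is the "degrees of separation" between the two nodes, three in this toy example.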
So if you have a gene you don't know much about, you can infer some of its properties from the other genes it interacts with. In other words, you can use your network to help explain something you don't know much about; this concept is called gene function prediction. There are a lot of other things we can look at as well. We can look for specific motifs. We can do network alignment: comparing two networks from different species, trying to fill in pieces that might be missing, or aligning them to see how they're similar. And a lot of what we'll be doing later today is pathway analysis using networks: how do we represent our pathways as a network to summarize the analysis we did earlier this morning? So, the basics of a network. A network consists of nodes, and a node can be anything; you define what your network is. A node can be a gene, a protein, a transcript, a drug, a microRNA, anything you define it to be. It can also be a person if you're looking at a social network. Within biology, one social network people tend to look at is co-authorship: if two people publish together, they get an edge connecting them, and you can see the networks of people who tend to publish together and what they publish. In our context, a node can also be a pathway, that is, a group of genes. The reason I mention pathways here is that they're what we'll focus on in the later part of today. An edge is a connection between two nodes. It can be a genetic interaction, a physical protein interaction, or two genes that are expressed at the same time. It could be a microRNA-target interaction, meaning the microRNA binds to the target gene's transcript and affects its expression.
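The guilt-by-association idea behind gene function prediction can be sketched as a simple vote over the annotations of a gene's neighbors. This is a deliberately simplified illustration with invented genes and terms; tools such as GeneMANIA use more sophisticated, weighted methods.

```python
from collections import Counter


def predict_function(adjacency, annotations, gene):
    """Rank candidate functions for `gene` by how many of its direct
    neighbours carry each annotation (simple guilt by association)."""
    votes = Counter()
    for neighbour in adjacency.get(gene, ()):
        for term in annotations.get(neighbour, ()):
            votes[term] += 1
    # Sort by vote count (highest first), then alphabetically for stable output
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Toy network: "unknown_gene" has three annotated neighbours (all hypothetical)
adjacency = {"unknown_gene": {"gene1", "gene2", "gene3"}}
annotations = {
    "gene1": {"DNA repair"},
    "gene2": {"DNA repair", "cell cycle"},
    "gene3": {"DNA repair"},
}
print(predict_function(adjacency, annotations, "unknown_gene"))
# [('DNA repair', 3), ('cell cycle', 1)]
```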
It could be anything that you define. Those are the basics of a network. Some of the features of networks we like to look at are topological features, the general trends we see in a network, including the number of nodes and the number of edges. We can look at node degree, which is the number of connections a given node has. We can also look at the degree distribution across the network: in certain areas of the network, many of the genes are highly connected and their node degree is much higher. And we can look at clustering coefficients, which describe how genes cluster together. These are just some of the topological features we can take from a network. I've mentioned previously the concept of a hub versus a subnetwork. In the example network I have here, MKRN1 is the hub: it is connected to a lot of other nodes, and if we take it out, you'll see that a lot of things become disconnected. A hub pulls together a lot of things that would otherwise be disconnected, so it's a vital part of the network. You can also have subnetworks, meaning you can look at a certain subsection of the network and examine just its connectivity. This example shows something else as well. MKRN1 is the focus of this network, but we also have two different types of nodes: apoptotic genes and transcription factors. There are also two different edge colors showing different relationships: when a protein interacts with another protein, the edge is blue, but when a transcription factor targets a protein, the edge is green. So there are a lot of different things we can represent in a network, and there are also different ways of presenting our data. Networks can often be very, very large.
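The topological features above are straightforward to compute once a network is stored as an adjacency list. The sketch below (our own illustration, not Cytoscape's network analysis code) computes node degree, the degree distribution, the local clustering coefficient, and the number of connected components before and after deleting a hub. The toy network and the node name HUB are hypothetical stand-ins, not the MKRN1 network on the slide.

```python
from collections import Counter


def build_adjacency(edges):
    """Undirected adjacency list: node -> set of neighbours."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degrees(adjacency):
    """Number of connections of every node."""
    return {node: len(nbrs) for node, nbrs in adjacency.items()}


def degree_distribution(adjacency):
    """Map each degree k to the number of nodes with exactly k neighbours."""
    return Counter(len(nbrs) for nbrs in adjacency.values())


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of the node's neighbours that are linked to each other."""
    nbrs = list(adjacency[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adjacency[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def count_components(adjacency, removed=()):
    """Number of connected components, optionally ignoring `removed` nodes."""
    removed = set(removed)
    seen = set()
    components = 0
    for start in adjacency:
        if start in removed or start in seen:
            continue
        components += 1  # new component found; flood-fill it
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nbr in adjacency[node]:
                if nbr not in removed and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return components


# Toy star-shaped network: HUB links four nodes, and only a-b link each other
net = build_adjacency([("HUB", "a"), ("HUB", "b"), ("HUB", "c"),
                       ("HUB", "d"), ("a", "b")])
print(degrees(net)["HUB"], clustering_coefficient(net, "HUB"))
print(count_components(net), count_components(net, removed={"HUB"}))  # 1 3
```

Removing HUB splits one connected network into three pieces, which is the "things become disconnected" effect described for a hub.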
So the next thing we need to focus on is how to use the network as a tool to express what we're trying to conclude or portray. Here I have four different examples of the exact same network, shown in different ways. In panel A, we have a mass spec analysis of 400 protein-protein interactions in a pneumonia-causing microbe. In panel B, we've taken a subset of that network; if you look carefully, you can see that its center is the subnetwork we focused on: an individual protein and all of its interactions. Panels B and C are the exact same network, just laid out very differently. In C, we've manually clustered genes that are highly interconnected. There are Cytoscape apps that will find these clusters for you, since cluster analysis is built into them. Here the genes fall into three highly interconnected clusters, and laying them out by cluster gives a different representation of the exact same picture. Further, in D, we've collapsed these clusters so you can see the genes that interact with each cluster; we've removed that level of detail in order to summarize the network. So there are different ways to present the data even though the underlying data are the same, and this matters especially when you're trying to focus on a given idea. One thing that unfortunately is missing from Cytoscape, and from network analysis generally, is dynamics. When you think of a pathway, you think of the flow of information from one step to the end. Networks can have directionality, but they lack this aspect of dynamics, of moving from point A to point B.
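Panel D's idea of collapsing clusters can be sketched as replacing each clustered node by a cluster label and counting the edges that remain between the resulting meta-nodes. This assumes the cluster assignments already exist (for instance from manual grouping or a clustering app); the node and cluster names are hypothetical, and this is not necessarily how Cytoscape implements collapsing.

```python
from collections import Counter


def collapse_clusters(edges, cluster_of):
    """Return a Counter of edges between meta-nodes.

    Nodes listed in `cluster_of` are replaced by their cluster label; other
    nodes keep their own name. Edges inside one cluster disappear, and
    parallel edges between the same pair of meta-nodes are counted."""
    meta_edges = Counter()
    for a, b in edges:
        meta_a = cluster_of.get(a, a)
        meta_b = cluster_of.get(b, b)
        if meta_a == meta_b:
            continue  # internal edge, hidden once the cluster is collapsed
        meta_edges[tuple(sorted((meta_a, meta_b)))] += 1
    return meta_edges


edges = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"),  # inside cluster C1
         ("p4", "p5"),                              # inside cluster C2
         ("p3", "q"), ("p4", "q"), ("p1", "p4"), ("p2", "p5")]
cluster_of = {"p1": "C1", "p2": "C1", "p3": "C1", "p4": "C2", "p5": "C2"}
print(collapse_clusters(edges, cluster_of))
```

Keeping the edge counts lets a collapsed view still show how strongly two clusters are linked, for example by mapping the count to edge width.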
That requires a different set of tools. Networks also miss many of the mathematical representations you get with models. So, networks in general are very useful tools, and I've briefly demonstrated that they're useful for showing relationships in large data. It's important to define and understand what your nodes and edges mean: there are lots of different things we can represent in networks, so it's important to define what we're using them for. It's also important to define the biological question and what you're trying to show your audience. There are many available methods for gene list and network analysis, and we'll look at a few of them right now. But before we move on, I want to discuss Cytoscape specifically. We're going to be using a tool called Cytoscape, which is network visualization and analysis software: an open-source platform for visualizing and analyzing complex networks. I'm showing you a few different pictures here because it's not limited to biological networks. You can represent whatever you want in Cytoscape; it's just a matter of defining what your network represents. This one, for example, shows a workflow diagram represented as a network. So there are a lot of different uses. Cytoscape, as I said, is open source, and it's supported by a lot of different institutions. It's actively developed, and new versions are still being released. The project started about 20 years ago, so it's come a long way since then; the original publication was in 2003. It's developed in Java; some of the development happens in Toronto, and there's another main hub in California. So here are a few of the things we can do within Cytoscape.
We can manipulate networks; a lot of what I've demonstrated up until now was done within Cytoscape. You can change the visual attributes of anything in your network, and there are a lot of features you can fine-tune. You can filter networks. You can bring in networks from multiple places; you don't necessarily have to arrive with a network already, because within Cytoscape you can search a lot of public databases that have already represented their data as networks. We'll go over GeneMANIA tomorrow, but there are a few others you can use directly from within Cytoscape. Another very convenient thing is the many automatic layouts available. You don't have to lay everything out by hand: the layouts optimize the separation between your nodes and can clean up your picture automatically. Another great thing about Cytoscape is that it's an open-source ecosystem that lets people develop their own apps, and over 360 apps have been developed for Cytoscape over the years. A lot of them are still maintained. Some, as they become more widely used, become core apps: the Cytoscape developers incorporate them into the core code base, they're released with every version, and you don't have to worry about them. Other apps are developed by third parties, and you rely on them to keep their app up to date and working, though many of them do. Our lab has developed quite a few Cytoscape apps that we use all the time and keep adding new features to; I'll introduce them in the second half of this lecture. Cytoscape also has a great tutorial page, which I hope everyone went through in the pre-work.
Cytoscape has been cited thousands of times and, as I said, there are over 360 apps to choose from. So now is the time to actually try out Cytoscape. As I go through this, I'd like you to open Cytoscape and try the same things I'm doing. I'll quickly go over what I'm going to do on the slides, and then I'll do it live. Here's what you'll see when you open Cytoscape. You won't see this network yet, because we haven't loaded it. On the left-hand side of the screen is the control panel, which has several different tabs. By default, I believe you should always see Network, Style, Filter, and Annotation, and also Layout Tools; those are the ones that are commonly there. If you install and use an app, sometimes it adds another tab to this panel, and the panel may get wider, so you might have two rows of tabs. The Network tab always lists all the networks you have. At the bottom of the Cytoscape main window is the table panel, which always has the node table, the edge table, and the network table. The node table holds any attributes associated with the nodes, the edge table holds anything associated with the edges, and the network table holds anything associated with the network as a whole. For example, say a node is a gene: it might have a description associated with it, and also an expression value. You can associate anything you want with your nodes or edges.
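Conceptually, the node table is a lookup from each node's name to its attributes. The sketch below reads a small tab-delimited file into that shape using Python's csv module; it is not Cytoscape's importer, the column names (name, description, expression) are assumptions for the example, and the expression values are invented.

```python
import csv
import io


def load_node_table(handle, key_column="name"):
    """Read a tab-delimited table into {node name: {column: value}}."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = row.pop(key_column)
        table[key] = dict(row)  # remaining columns become this node's attributes
    return table


# Inline stand-in for a file on disk; expression values are invented
tsv_text = (
    "name\tdescription\texpression\n"
    "TP53\ttumor protein p53\t2.4\n"
    "MDM2\tMDM2 proto-oncogene\t-1.1\n"
)
node_table = load_node_table(io.StringIO(tsv_text))
print(node_table["TP53"])  # {'description': 'tumor protein p53', 'expression': '2.4'}
```

Every value comes back as a string, so a numeric column such as expression would need converting (for example with `float`) before it could drive a continuous style mapping.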
Cytoscape lets you load these attributes very easily from an Excel spreadsheet or from tab-delimited files; hopefully we'll go through that a bit more later. The main area where your network is displayed is called the canvas. In its bottom right-hand corner there's a bird's-eye view: if you move the blue box there, your network view moves with it. On the right-hand side is the Results panel; a lot of apps report information there, so we'll be using that as well. A little more detail about basic navigation at the top: these are standard navigation tools you've seen before. You can zoom in and zoom out. The button with two arrows fits the entire network in the view, if you want to pan out to everything automatically. The one with the check mark zooms in to your selection, if you have selected nodes. These two little arrows apply your preferred layout, so if you've moved things around and want to move them back, you can do so right away. One important thing to mention: sometimes we try out different layouts and don't like what we get. Cytoscape behaves like other programs here: Command-Z or Ctrl-Z undoes your action, and you can also use Undo from the menu bar. That's especially useful if you've worked for hours getting something perfect and then accidentally clicked Apply Preferred Layout. There are a few other shortcuts too. You can hide selected nodes and edges with the eye icon. Next to it, you can select the first neighbors of the selected nodes, that is, anything connected to them. Let's say you've selected a gene and you want to know all of the genes that it interacts with.
You click on the first-neighbors button, and it highlights everything connected to the node you have selected. You can do it once, twice, or as many times as you want; each time it expands the selection to everything connected to it. On the other side of the toolbar, you can quickly import networks and tables, save your session, open a session, and search NDEx. NDEx is a library of networks affiliated with Cytoscape: if somebody publishes a paper with a network, they can put that network in the cloud, so that someone reading the paper who wants to look at the data can pull the network directly into Cytoscape and play with it. We're going to use that in a second. So now I'm going to open Cytoscape so you can follow along with me. Give me a second. Okay. Within Cytoscape, we're going to load in a network. At the top of the Network panel on the left, there's a network search bar with a little arrow next to it, which lists all the databases you can search (you may not see NDEx selected there at first). There's GeneMANIA; IntAct, a protein-protein interaction database; and I think the fourth one down should be NDEx, our library of networks. Below that, we have PSICQUIC, a service that brings together many protein-protein interaction databases so you can search them all at once; STITCH, a database of compound-protein interactions; STRING, which, like GeneMANIA, tries to collect all the different connections between genes; and lastly WikiPathways.
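Clicking the first-neighbors button repeatedly amounts to a breadth-first expansion of the selection. A minimal sketch in plain Python, using a hypothetical four-node chain (the function names are ours, not Cytoscape's):

```python
def build_adjacency(edges):
    """Undirected adjacency list: node -> set of neighbours."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def expand_selection(adjacency, selected, hops=1):
    """Grow a node selection by `hops` rounds of "select first neighbors"."""
    selection = set(selected)
    frontier = set(selected)
    for _ in range(hops):
        reached = set()
        for node in frontier:
            reached.update(adjacency.get(node, ()))
        frontier = reached - selection  # only newly reached nodes expand next round
        selection |= frontier
    return selection


chain = build_adjacency([("A", "B"), ("B", "C"), ("C", "D")])
print(sorted(expand_selection(chain, {"A"}, hops=1)))  # ['A', 'B']
print(sorted(expand_selection(chain, {"A"}, hops=2)))  # ['A', 'B', 'C']
```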
We're going to use NDEx, and I've searched for SARS because it's the world we're living in right now, so why not? If you click Search, it pulls up a bunch of networks, but I want you to choose a specific one. In the results table, click on Name to sort by name, then scroll down to the entries starting with IMEx. IMEx is an international consortium that curates protein-protein interaction data to a shared standard, and there are several IMEx networks; I want you to pick the IMEx IntAct coronavirus dataset (full detail). Next to it you'll see a little green download button; click it, and it will pull the network into Cytoscape for you. Give it a second to load. Okay, now you should have a network in front of you; let me go back to my screen. Maybe you should stop here and check that people got the network loaded; you don't want to lose people right at the end. Is this the practical, or are you just showing it? It's just a demo, but I'd love people to follow along. Yeah, go ahead. But are they doing it at the same time, or not? I don't think so; the practical is after. Okay, sorry, I take it back. No, it's okay. It's supposed to be a demo, but I want you to play along with it as well. It's not a lab, it's just a demo. Okay, perfect. Hopefully I'm not going too quickly for people to keep up. So here are a few things to look at. Right away you'll see there are lots of different colors in this network, representing a lot of different things. I want to briefly go over what we have here. On the left-hand side of Cytoscape, next to the Network tab, you'll see a tab called Style.
This is where the person who made this network defined all of its visual attributes. If you look at the node table as well, you'll see that each node has a lot of information associated with it: a name, an alias, and a taxonomy identifier. If you scroll down, you'll see that multiple taxonomies are represented, along with a lot of other information. The people who created this network took that information and mapped it to visual properties. When you open the Style panel, you'll see Node, Edge, and Network tabs: we can define visual properties for nodes, for edges, or for the network. You'll also notice three columns. Def. stands for default, which is how a node looks by default. Map. stands for mapping: you take an attribute and map it to a visual property. Byp. stands for bypass, for when a given node doesn't fit into the class you're defining and you want to set its appearance directly, bypassing everything else. None of these nodes have bypasses, but you'll see they do have mappings. For me, the third property from the top is Fill Color (it may be different for you). If I click the arrow on the right-hand side to expand it, you'll see their whole mapping. At the top you'll see the column is species, and each species is mapped to its own color. They probably didn't do this manually, because, as you can see, there are tons of different colors. Don't worry: given a column of unique values, Cytoscape can automatically assign a color to each one. So every species gets a different color, and that's what defines the colors here.
Since we see a lot of blue, I'm going to guess blue is human. If I scroll down, I don't see human at first. Yes, here it is: human is a shade of blue. So there's a lot of information in this one attribute, and that's just one example. You can then collapse it and look at some of the other properties they've defined. Shape is another one: if you expand it, you can see that genes are represented as diamonds, peptides and proteins as circles, and small molecules as triangles. Certain things are defined as different types, but they're given the same shape because they're basically the same kind of thing. So there are lots of different attributes you can associate with or map to visual properties. Here's a brief overview of what they did in this network. The node properties were mapped to shape and fill color. The actual network has tons of colors, but I've cut that down here to just the main ones. These interactions were probably pulled from lots of different publications, with experiments done in different species, but the gist of this network is that you're looking at SARS-CoV-2 proteins, which are red, and human proteins, which are blue. There are also a lot of mouse and rat proteins, because interactions were tested in those species as well. The other thing I focused on here is edge color: as you can see, there are a lot of different edge colors in this network, and they're defined by the experimental method used to detect each interaction.
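The three Style columns follow a simple precedence: a bypass overrides a mapping, which overrides the default. The sketch below models that precedence for one property (fill color), plus an automatic discrete mapping that gives each distinct value its own evenly spaced hue, roughly what happens when Cytoscape generates colors for a column like species. The function names, colors, node names, and the HSV palette scheme are our own illustrative assumptions, not Cytoscape's API or its actual color generator.

```python
import colorsys


def auto_palette(values):
    """Assign each distinct value an evenly spaced hue, returned as '#RRGGBB'."""
    distinct = sorted(set(values))
    palette = {}
    for i, value in enumerate(distinct):
        r, g, b = colorsys.hsv_to_rgb(i / len(distinct), 0.6, 0.9)
        palette[value] = "#{:02X}{:02X}{:02X}".format(
            round(r * 255), round(g * 255), round(b * 255))
    return palette


def resolve_fill_color(node, attributes, default, mapping, column, bypass):
    """Bypass beats mapping, and mapping beats default."""
    if node in bypass:
        return bypass[node]
    value = attributes.get(node, {}).get(column)
    if value in mapping:
        return mapping[value]
    return default


# Hypothetical nodes: n4 has no species value at all
attributes = {
    "n1": {"species": "Homo sapiens"},
    "n2": {"species": "SARS-CoV-2"},
    "n3": {"species": "Mus musculus"},
    "n4": {},
}
mapping = {"Homo sapiens": "#0000FF", "SARS-CoV-2": "#FF0000"}  # human blue, virus red
bypass = {"n3": "#00FF00"}  # one node styled by hand
for node in attributes:
    print(node, resolve_fill_color(node, attributes, "#CCCCCC", mapping, "species", bypass))

generated = auto_palette(a["species"] for a in attributes.values() if "species" in a)
print(generated)
```

Note that n3's species has no mapping entry, but its bypass still wins, while n4 falls through to the default gray.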
So there's a lot of information in this network. A lot of the time you don't want to show a massive network; when you're trying to focus on your conclusion, you'll zoom in to a much smaller network like this one, which is better at portraying the picture. One thing I also want you to try is different layouts. There are lots of layouts in Cytoscape: go to the Layout menu in the top bar and you can choose any of them. There's the grid layout, which obviously isn't very good for this network but can be useful in certain cases. The one I tend to like is the prefuse force-directed layout. So there are different types of layouts you can try out. Lastly, I want to mention that there are other types of networks you can load into Cytoscape. Cytoscape is optimized for network representations, not necessarily for traditional pathway diagrams, but pathways are available. Here I have an example pathway that I loaded from WikiPathways, the androgen receptor pathway. You can see that these published pathways in the cloud do have that beautiful flow. This isn't using any Cytoscape layout: somebody manually laid out this pathway to create this view, and it's available within Cytoscape. That kind of view just isn't available for every network; only specific networks you pull in carry this information. Before we go on to the next part, I'm going to pause and ask if there are any questions, because I don't want to jump straight ahead. We're good? Okay, give me one second.
Okay, the next part we're going to go over is a specific Cytoscape app called EnrichmentMap, and we're going to connect what we did this morning with what we just covered about network biology. How is all this Cytoscape material relevant to the enrichment analysis we did this morning? Here I've listed the different apps we'll be using within Cytoscape for the rest of the afternoon. Hopefully by the end of this lecture, you'll understand how to transform the enrichment results you got this morning from g:Profiler and GSEA into a beautiful network, and you'll understand the difference between a network as we just discussed it and an enrichment map, which is a specific type of network. Hopefully you'll also be able to summarize your enrichment results using apps such as AutoAnnotate, which pulls together several apps, including clusterMaker and WordCloud. In the first part of this morning, we generated a lot of enrichment results, from GSEA and from g:Profiler, and hopefully you got a chance to look at the results. They were those beautiful spreadsheets that everyone loves: we can do so much with them, and we see them over and over again. But how does that help us understand which pathways our genes are enriched in? It's a huge list that nobody wants to go through. So how do we translate these results into something we can use more efficiently? The general framework of pathway and network analysis is: you have your experimental data, and you use it to create a gene list, whether that's a thresholded list for g:Profiler or the whole ranked list for GSEA. You run it through an enrichment test.
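For a gene list, one common enrichment test is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), which g:Profiler uses for over-representation analysis; GSEA instead works on the whole ranked list. Below is a minimal sketch of the hypergeometric p-value for a single pathway, with invented counts. Real tools also correct for testing many pathways at once, which is not shown here.

```python
from math import comb


def hypergeometric_pvalue(universe_size, pathway_size, list_size, overlap):
    """P(X >= overlap): chance that a random list of `list_size` genes drawn
    from `universe_size` genes contains at least `overlap` of the
    `pathway_size` pathway genes."""
    upper = min(pathway_size, list_size)
    if overlap > upper:
        return 0.0
    # Count the ways to draw i pathway genes and (list_size - i) other genes
    favourable = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(universe_size, list_size)


# Invented example: 20,000 genes, a 100-gene pathway, a 200-gene list, 10 shared
print(hypergeometric_pvalue(20_000, 100, 200, 10))
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so only the final division is done in floating point.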
You give it a collection of pathways, and at the end you get a list of enriched pathways. So you've gone from one list to another: from a list of genes to a list of over-represented pathways. How do we get past that? Again, this morning we had our ranked list of genes, up and down. We used g:Profiler first and then GSEA with a collection of pathways, and we output the pathways that are up-regulated and the pathways that are down-regulated. In the previous part of this talk I covered networks. We know the basics: a network consists of nodes and edges, and we can define what our nodes are. Traditionally people think of networks as protein-protein interactions, gene-gene interactions, or drug-protein interactions. Now we're going to think of them a little differently, as an enrichment map. What is an enrichment map? It's a network where each node is a pathway or gene set. So instead of an individual gene, each node is a group of genes. The node size corresponds to the number of genes in that pathway, and the color represents whether the pathway is up-regulated or down-regulated, in an example where we're comparing class A to class B. The color can also represent which single-cell cell type a pathway is associated with; it can be whatever we define it as. An edge between two pathways represents the genes they have in common. Why is this so important? Many of the pathways we pull in are closely associated and highly similar, especially for data sources such as Reactome, which we'll discuss tomorrow. Reactome is hierarchical, meaning a sub-pathway exists within another sub-pathway, which exists within another pathway.
They're all related to each other and have genes in common. There's a lot of redundancy within pathway databases, and also between them: you have GO describing a pathway and Reactome describing the exact same pathway. They might represent it slightly differently, but both results will come out of our analysis because we used as many pathways as we could get. So a lot of our results are somewhat redundant. When we create an enrichment map, where each pathway is a node and pathways that share genes are connected, we reduce some of that redundancy: pathways that are highly similar group together, and we can summarize our data better. Now for the details of the overlap. We have a collection of gene sets, where each node is a gene set and we connect them by the number of genes they have in common. It's not a direct interaction; it's the genes they share. How do we do that? We use a statistic called the overlap coefficient to connect gene sets. For every pair of gene sets, we calculate the intersection, the genes they have in common, and divide it by the size of the smaller of the two sets. So if one set has five genes and the other has more, we divide by five. That gives us a measure of how much the two gene sets have in common, and that's how we translate the results into a network. We start with our results in tabular form, where each row is a gene set with its p-value or FDR value, and we translate each row into a node.
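The overlap statistic described above is the overlap coefficient, |A ∩ B| / min(|A|, |B|). Here is a minimal Python sketch, with Jaccard similarity included only for contrast; the function names are illustrative, not EnrichmentMap's code.

```python
def overlap_coefficient(a, b) -> float:
    """|A ∩ B| / min(|A|, |B|); 0.0 if either set is empty."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return len(a & b) / smaller


def jaccard(a, b) -> float:
    """|A ∩ B| / |A ∪ B|, shown for comparison."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# The example from the text: a 5-gene set compared with a larger 10-gene set.
small = {"G1", "G2", "G3", "G4", "G5"}
large = {"G1", "G2", "G3", "G4", "G6", "G7", "G8", "G9", "G10", "G11"}
print(overlap_coefficient(small, large))  # 4 shared / 5 -> 0.8
print(jaccard(small, large))              # 4 shared / 11 genes in the union
```

Because it divides by the smaller set, the overlap coefficient reaches 1.0 whenever one gene set is contained in another, which is exactly the sub-pathway situation in hierarchical databases like Reactome; Jaccard would score such a pair lower when the parent pathway is much larger.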
Whenever two nodes are sufficiently similar, we draw an edge between them, and we're left with a network representing all of our pathways. There are several use cases for this analysis. The first is what we'll go over in the lab: a single two-class comparison. You're comparing two classes, you create your ranked list, you run GSEA, and in the end you get a network of all the enriched pathways, with red for up-regulated and blue for down-regulated pathways. Here is the first demonstration of an enrichment map. We have clusters of pathways that we've annotated with overall themes, because each cluster represents similar pathways. The network looks large, and you might think you have a lot of results, but really it's that the functions grouping together are heavily annotated ones. So the size of the network doesn't matter, because we can collapse it to just a few categories that are up-regulated and down-regulated, and you can zoom in on any one of these clusters for more detail. If we look more closely, each of these nodes represents an individual pathway, all associated with the microtubule cytoskeleton. How do we create these labels? We'll show you in a second. Another use of an enrichment map is comparing two different enrichment analyses. The tool has progressed over the years: it started with one analysis, and now you can compare two. In this case you give EnrichmentMap two different pathway analyses, and we use node attributes a little differently: the inside of the node shows one pathway analysis result, and the node border shows the second.
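The table-to-network step can be sketched as follows. Assumptions: each result row carries a gene set name, its genes, a NES and an FDR value; the FDR and overlap cutoffs are illustrative placeholders, not EnrichmentMap's defaults; and red/blue follows the up/down convention described above. `GeneSetResult` and `build_enrichment_map` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class GeneSetResult:
    name: str
    genes: frozenset
    nes: float   # normalized enrichment score; sign gives up/down
    fdr: float


def overlap_coefficient(a, b) -> float:
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def build_enrichment_map(results, fdr_cutoff=0.05, overlap_cutoff=0.5):
    """Each significant row becomes a node; similar pairs become edges."""
    kept = [r for r in results if r.fdr <= fdr_cutoff]
    nodes = {
        r.name: {"size": len(r.genes), "color": "red" if r.nes > 0 else "blue"}
        for r in kept
    }
    edges = []
    for a, b in combinations(kept, 2):
        score = overlap_coefficient(a.genes, b.genes)
        if score >= overlap_cutoff:
            # Keep the shared genes so an edge can be inspected later.
            edges.append((a.name, b.name, score, sorted(a.genes & b.genes)))
    return nodes, edges
```

Comparing every pair is quadratic in the number of significant gene sets, which is fine for a sketch; the thresholds are what keep the resulting map from becoming a hairball.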
Now you can highlight where the two analyses differ and also where they agree. If a node is all red, you know it's the same in both analyses; if the inside is white and the border is red, that highlights a difference. Within the enrichment map you can also zoom in on the expression of a given pathway. Here I'm highlighting an individual node that differs between our classes: we were comparing estrogen-treated cells at 12 hours versus 24 hours. If you zoom in on this gene set, which differs between the two time points, you can see that the expression is clearly different between 12 and 24 hours. Within EnrichmentMap you can zoom in on the individual genes. When we did the GSEA lab, I mentioned the concept of the leading edge: the genes that are actually driving the enrichment. At this point you can zoom in and see the genes associated with this pathway that is clearly different between the two conditions. So there's a lot of information we can represent in an enrichment map. The next use case is highlighting genes of interest that are part of given pathways. We call it a query set analysis, or post-analysis. This is where, as I mentioned in Slack, you would use a microRNA analysis. Say you have a microRNA that you know is overexpressed in a given experiment or dataset. You've done a pathway analysis on the expression differences between control and disease, and now you want to find out which pathways contain the targets of that up-regulated microRNA. You can add those targets as a post-analysis, and it will highlight the connections between them.
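The leading edge mentioned above can be sketched from GSEA's running-sum statistic: walk down the ranked list, step up at gene-set members (weighted by |score|, i.e. weight 1) and down at non-members, take the point of maximum deviation from zero as the enrichment score, and keep the members at or before that peak (or at or after it when the score is negative). This is a simplified illustration with no permutation testing or normalization, and it assumes the gene set contains at least one gene with a non-zero score and does not cover the entire list.

```python
def running_enrichment(ranked, gene_set, weight=1.0):
    """ranked: (gene, score) pairs sorted from most up- to most down-regulated."""
    hits = [gene in gene_set for gene, _ in ranked]
    hit_total = sum(abs(score) ** weight for (_, score), hit in zip(ranked, hits) if hit)
    miss_step = 1.0 / (len(ranked) - sum(hits))
    running, curve = 0.0, []
    for (_, score), hit in zip(ranked, hits):
        # Members push the sum up in proportion to their score; others pull it down.
        running += abs(score) ** weight / hit_total if hit else -miss_step
        curve.append(running)
    return curve


def leading_edge(ranked, gene_set, weight=1.0):
    """Return (enrichment score, leading-edge genes in ranked order)."""
    curve = running_enrichment(ranked, gene_set, weight)
    peak = max(range(len(curve)), key=lambda i: abs(curve[i]))
    es = curve[peak]
    positions = range(peak + 1) if es >= 0 else range(peak, len(ranked))
    return es, [ranked[i][0] for i in positions if ranked[i][0] in gene_set]


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
print(leading_edge(ranked, {"A", "B", "E"}))  # peak after B, so leading edge is A and B
```

These leading-edge members are the kind of genes the heat map highlights for a selected GSEA gene set.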
This shows the predicted targets of microRNA miR-125a. Anywhere you have an edge here, that pathway contains one of the microRNA targets, and you can click on the edge to see which gene it is. There are different statistics you can use. You can just look for the presence or absence of a given gene, but you can also use something like the Mann-Whitney test to highlight pathways that contain highly ranked genes that are also microRNA targets. So that's another use case for the enrichment map. Last, Gary showed this picture during his session this morning. It shows different subtypes of the cancer in question. Here you actually have multiple datasets: there are nine different pathway analyses, and the nodes are colored by which subtypes they belong to. I want to mention that the networks I'm showing you all look very beautiful, but they don't start like this. Unfortunately, it takes a lot of work to get to a figure like this, and it can be overwhelming at times. Sometimes it's better to simplify the network as much as possible and be clear about what you're trying to show; I've spent hours and hours making some of these figures, and it's not as easy as I'm making it seem. I say this about this figure in particular because many pathways are found in several of these datasets, so organizing it must have been very challenging. That being said, there's still a lot of information to be found here. Okay, those are a few use cases of the enrichment map, which we'll use in lab three. Lastly, you've already seen the basic network in Cytoscape.
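A post-analysis like the microRNA example can be sketched in two parts: an edge wherever a pathway shares at least one gene with the signature set (presence/absence), plus a one-sided Mann-Whitney U test asking whether the shared genes sit higher in the ranked list than all other genes. Exactly which genes EnrichmentMap compares in its Mann-Whitney option is not stated here, so this comparison is an assumption; the sketch also uses a normal approximation without tie correction and is meant for small illustrations only. The function names are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_greater(x, y):
    """One-sided Mann-Whitney U test (H1: x tends to be larger than y).

    Normal approximation without tie correction; returns (U, p_value).
    """
    n1, n2 = len(x), len(y)
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, 0.5 * erfc(z / sqrt(2))


def post_analysis(pathways, signature, ranked_scores):
    """pathways: name -> genes; signature: e.g. predicted microRNA targets;
    ranked_scores: gene -> ranking score (higher = more up-regulated)."""
    edges = {}
    for name, genes in pathways.items():
        shared = set(genes) & set(signature)
        if not shared:
            continue  # presence/absence: no signature genes, no edge
        in_scores = [ranked_scores[g] for g in shared if g in ranked_scores]
        out_scores = [s for g, s in ranked_scores.items() if g not in shared]
        p = mann_whitney_greater(in_scores, out_scores)[1] if in_scores and out_scores else None
        edges[name] = {"shared_genes": sorted(shared), "mann_whitney_p": p}
    return edges
```

Clicking an edge in the real app shows the shared gene; here that corresponds to the `shared_genes` entry.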
When you load Cytoscape, as I said, many apps add a tab here, so now you'll see an EnrichmentMap tab. This is the input panel for EnrichmentMap. I've already created my enrichment map here. To see the input panel, you click on the little plus sign, and the lab will walk you through the input files it needs. There are a few things you can adjust once you've created your network. You can adjust the q-value (FDR) cutoff you used when creating the network; it will remove nodes that don't pass that threshold. You can do the same thing with your edges: if your network is too connected, you can play with these slider bars to reduce its connectivity and make sure it doesn't look like a hairball. The "add signature gene sets" option here is the post-analysis ability I mentioned before. You can also change how the nodes are colored. By default they're colored by their NES, the normalized enrichment score. If you have multiple datasets, you can use the drop-down here to color by dataset; that was the fourth example, with multiple datasets, and Cytoscape will color it for you automatically. In the table browser, EnrichmentMap adds an extra tab called Heat Map. I've selected a bunch of nodes here, so the heat map shows the expression of all the genes associated with my selection. Only if you select an individual node, and only if you're using GSEA results, will you see something like this. This is your heat map, expanded: in the picture I showed a second ago it was collapsed, and this expanded version shows all of the expression values.
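The two sliders and the Heat Map tab can be imitated on plain Python data: nodes failing the q-value cutoff disappear together with their edges, edges below the similarity cutoff are dropped, and a heat map for a selection shows the union of the selected gene sets' genes. All names, data shapes, and cutoff values here are hypothetical.

```python
def apply_sliders(node_fdr, edges, q_cutoff, similarity_cutoff):
    """node_fdr: gene set -> FDR q-value; edges: (a, b, similarity) tuples."""
    kept_nodes = {name for name, q in node_fdr.items() if q <= q_cutoff}
    kept_edges = [
        (a, b, sim)
        for a, b, sim in edges
        # An edge survives only if both endpoints survive and it is similar enough.
        if a in kept_nodes and b in kept_nodes and sim >= similarity_cutoff
    ]
    return kept_nodes, kept_edges


def heatmap_rows(selected, gene_sets, expression):
    """Expression rows for every gene belonging to any selected gene set."""
    genes = sorted(set().union(*(gene_sets[name] for name in selected)))
    return [(gene, expression[gene]) for gene in genes if gene in expression]
```

Raising `similarity_cutoff` is the code equivalent of dragging the edge slider to thin out a hairball.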
Anything highlighted in yellow here is the leading edge: the genes contributing to the enrichment of the individual gene set I've selected. Again, this is only for GSEA results. One other thing I want to mention: once we've created our enrichment map, a large interconnected network of gene sets, the next thing we'd like to do is annotate it. You saw that many of the figures had large networks with circles drawn around groups of nodes. Those circles are drawn computationally, so I don't have to go through the network manually. We have another app for this, called AutoAnnotate. AutoAnnotate uses two other apps: one of them, WordCloud, is ours, and the other, clusterMaker2, is, I believe, a core app within Cytoscape, but it's not our app. Together, clusterMaker2, WordCloud, and AutoAnnotate take your network and cluster it, finding groups of nodes that are highly interconnected. They then look at all of the node labels using WordCloud. You may have seen word clouds made from someone's speech, where the words they say most often are shown biggest. It's the same concept: WordCloud takes the descriptions of all the pathways in a cluster and counts how many times each word appears within the selection, and the three most frequent words are used to annotate that part of the network. It doesn't always do a great job, and you don't have to keep the label it generates. You can change labels manually, and we often do, because even though the label says one thing, the actual functions might be a bit different, or I might interpret them differently. It's just there to make it easier to annotate your network quickly. Oops, there we go. Sorry, my machine is acting up.
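The cluster-then-count idea can be sketched in a few lines. Two loud assumptions: connected components stand in for the clustering step (AutoAnnotate delegates clustering to clusterMaker2, which offers more sophisticated algorithms), and the small stop-word list is hypothetical, not WordCloud's.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"of", "the", "and", "in", "to", "by", "via"}  # hypothetical list


def connected_components(nodes, edges):
    """Stand-in clustering: each connected group of nodes is one cluster."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, cluster = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            cluster.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(cluster))
    return clusters


def label_cluster(cluster, labels, n_words=3):
    """Label a cluster with its n most frequent words across node descriptions."""
    counts = Counter(
        word
        for node in cluster
        for word in labels[node].lower().replace(",", " ").split()
        if word not in STOP_WORDS
    )
    return " ".join(word for word, _ in counts.most_common(n_words))
```

As in the app, the generated label is only a starting point; renaming a cluster is just replacing the returned string.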
Let me try this again. There we go. So I went over it verbally, but this is what AutoAnnotate does: it clusters the network, then for each cluster it finds the most frequent words in the node labels, takes those three words, and places the annotation there. By default, labels are scaled so that the larger the cluster, the larger the label. Personally, I don't like that, because a cluster having many nodes just means the function is very well annotated; it doesn't necessarily mean it's more important. A function represented by a single node could be just as important; maybe it isn't well described yet, and it could be just as interesting. So when I create my enrichment map and annotate it, the first thing I usually do is change that: there's a setting in AutoAnnotate where you can turn off scaling by the number of nodes. This is what you'll see in Cytoscape. After you run AutoAnnotate, all of your clusters are listed on the left-hand side. You can click on an individual cluster to highlight it, and you can right-click on it to rename it or remove it. On the right-hand side there's an AutoAnnotate display window, and right here is the feature I just mentioned, "scale font by cluster size". If you unselect it, the cluster labels will all be the same size. Another feature I find very useful in AutoAnnotate is the ability to collapse clusters. As I keep saying, the size of a cluster isn't what matters; it just indicates that the function happens to be very well annotated in the data.
So an easy way to simplify the network is often to collapse those clusters. What I have here is an example of this large network collapsed into just its important clusters, and it's much easier to convey a message with a smaller, less busy network. That's another very useful feature of AutoAnnotate. Last but not least, here is another example: the autism dataset Gary mentioned this morning, which shows everything in one figure. We have an enrichment map with our red nodes. We have post-analysis edges to the triangles, which represent genes associated with autism. And we have functions that were annotated, though not with AutoAnnotate, because this is an old network; in fact, many of the features we have now were added because we were doing so much manually that we figured out how to do it automatically. This is just another view of a beautiful network, and hopefully you'll be able to create a network like this too; not today, but one day.
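Collapsing clusters, as described above, amounts to replacing each cluster with a single summary node and keeping one edge between two summaries whenever any of their members were connected. Here is a minimal sketch with hypothetical names; nodes outside every cluster stay as they are.

```python
def collapse_clusters(clusters, edges):
    """clusters: label -> member node names; edges: (a, b) pairs of original nodes.

    Returns (summary node sizes, sorted edges between collapsed nodes).
    """
    owner = {node: label for label, members in clusters.items() for node in members}
    sizes = {label: len(members) for label, members in clusters.items()}
    collapsed = set()
    for a, b in edges:
        ca, cb = owner.get(a, a), owner.get(b, b)
        if ca != cb:  # edges inside one cluster disappear into its summary node
            collapsed.add(tuple(sorted((ca, cb))))
    return sizes, sorted(collapsed)
```

Keeping `sizes` lets you display how many pathways each summary node stands for without letting cluster size dominate the picture.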