Okay, so we'll get started. Just like we said previously, all this material is available under a Creative Commons license, it's freely available to redistribute, and it's all recorded. We're going to change gears a little bit right now. We've done a lot of enrichment analysis with both g:Profiler and GSEA, and now we're going to move into a new program, Cytoscape, where we're going to be representing data as networks. So this is network visualization and analysis with Cytoscape. In the first part of this lecture I'm going to go over some of the basics of network visualization, and then I'm going to go into the details of Cytoscape. We've changed it a little bit this year, so module three actually consists of three labs, and the first lab is a Cytoscape primer. I will walk through Cytoscape and you guys are welcome to use Cytoscape along with me. I would also recommend just doing the primer, because it shows you some of the features and all the different things you can do with Cytoscape. So we'll go over an introduction, network basics, network visualization, and how we're going to use it for network analysis, and then we'll also go through a demo. Okay, so the concept of networks in general is not necessarily related to biology. Networks have been around for a long time, and the concept is that everything is connected. The initial experiment was done, I think, in the 50s or the 60s, and it was a social experiment done by Stanley Milgram. It's kind of like a chain letter, I guess. Maybe everyone's had those, maybe not, but I remember when I was a kid those emails and messages that you had to send to six people, and you couldn't send it to the same person.
So the concept with Stanley Milgram was that if you received a letter and you knew somebody on that list, you were asked to send the letter back to their lab. They were trying to figure out how socially connected people were. And the conclusion that came from this, you've probably heard it: six degrees of separation. It was also a game in the 90s, Six Degrees of Kevin Bacon, where you can connect any actor ultimately back to Kevin Bacon. It probably doesn't apply today; I don't know who the actor would be nowadays. But the idea was that you can create these networks of people, and that people are connected by shorter distances than you would think. The same thing is relevant in biology. In biology we have lots of different networks that we can generate, and they help summarize globally what's happening. There are networks everywhere. We have molecular networks. We have cell-cell communication networks, where two cells communicate with each other. The nervous system is a network of signals that go from your brain to your hand; that's another type of network. There are locations in these networks that are more important than others, meaning that there are hubs within that network. And then with things like Facebook and Instagram you have your massive social networks as well, so networks really are everywhere. Why do we want to use networks in the context of biology? A picture is worth a thousand words, so we can represent a lot more when we represent it as a network. It reduces the complexity, and it's more efficient than tables. We just did g:Profiler analysis and GSEA analysis, and both of them generated tables. Tables are great for summarizing things, but there's more information that we can garner from that table that we can represent better within a network.
So this is an example of a SARS-CoV-2 protein interaction network, and it was published in Nature in 2020. The reason I use the SARS-CoV-2 network is that when the pandemic started, so much research shifted focus and went into SARS-CoV-2 and COVID. This is an example of multiple aspects of information being represented in one network. The red diamonds are viral proteins, and any connection between a protein and something else is an interaction that the protein has with it. But there are also additional layers of information. The thickness of the edge is the strength of that association. This was done with a proteomics pull-down, so the stronger the edge, the more times you actually saw the connection between that gene and that protein. And then there are also areas highlighted in yellow and in blue, which represent protein complexes or a specific biological process. So we've represented a whole lot of information in a picture here, and there's no way you could represent the same amount of information in a table. It also helps highlight certain aspects. You can easily see from this network which viral proteins are associated with more things. There are genes that are by themselves, but there are also genes that are very highly connected, the hubs in this network. Okay, so why would we use network visualization for biological data? In the picture before, I demonstrated some of these things. It helps represent relationships between biological molecules. A list can be daunting and overwhelming, and the second you represent it as a picture you reduce some of that redundancy, much better than Excel. And you can also visualize multiple things at once.
In the previous network we had different types of proteins, we had different types of interactions, and then layered on top of that we had different types of information, all summarized in that picture. And once we have that network, there are a lot of different programs that we can use to analyze it and find other aspects of it. One of the things we can look at is subnetworks. If there's a particular gene that you're interested in, you can select that gene and just get its environment, so you no longer have to look at the whole picture. That lets you focus a lot more easily. You can also look for paths between two genes. You have two genes of interest and you want to know how they're connected. Maybe they're not connected directly, but maybe there's some sort of path between them, some pathway from A to B. And another thing we can look at is finding central nodes, or hubs, in the network. The concept of network biology doesn't only apply to biology, obviously; networks in general are everywhere. In 2003, I think it was, there was a massive blackout on the east coast. Nobody had power from New York all the way to Toronto. And what it turned out was, I think it was like a Homer Simpson type of moment, somebody had spilled coffee at one of these hub transformers, and it totally brought down the system on the eastern network. So that exists in reality, but it also exists in the cell. You can have a gene that is highly connected to multiple processes, and then an individual mutation can bring down a whole set of functions. So in disease we see some proteins being more important than a typical protein.
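Two of the analyses mentioned above, pulling out the subnetwork around a gene of interest and finding a path between two genes, can be sketched in a few lines of Python. This is only an illustration on a toy network: the gene names and the helper functions (`build_adjacency`, `neighborhood`, `shortest_path`) are hypothetical and are not part of Cytoscape.

```python
from collections import deque

# Toy undirected network; the gene names are invented for illustration only.
EDGES = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_D"),
    ("GENE_B", "GENE_E"),
    ("GENE_F", "GENE_G"),  # a separate, unconnected component
]


def build_adjacency(edges):
    """Turn an edge list into a dict mapping each node to its set of neighbors."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def neighborhood(adjacency, node, steps=1):
    """Return the node plus everything reachable within `steps` edges of it."""
    selected = {node}
    frontier = {node}
    for _ in range(steps):
        # Expand by one step: neighbors of the current frontier not yet selected.
        frontier = {n for f in frontier for n in adjacency.get(f, ())} - selected
        selected |= frontier
    return selected


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unconnected."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back from the goal to the start to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(adjacency.get(current, ())):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


ADJ = build_adjacency(EDGES)
```

Calling `neighborhood(ADJ, "GENE_A", steps=2)` gives the gene, its neighbors, and the neighbors of its neighbors, and `shortest_path(ADJ, "GENE_A", "GENE_F")` returns `None` because the two genes sit in separate components.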
Gary also mentioned, when we were looking at the autism data, that it turned out that one pathway or a few pathways were disrupted by different genes. It was the actual pathway that was important; the pathway was the hub. It didn't matter which gene in that pathway got affected, but if the pathway got affected, all of a sudden you had different aspects of the disease showing up. So when we represent our data as networks, there are a lot of things that we can now look at. These are just a few examples, a few pretty pictures. Another really nice thing about using networks is that all of a sudden you can represent your data as a pretty picture, and everyone loves a pretty picture. I know I do. So now you can represent things in different ways. I have here an example of detecting complexes from a protein-protein interaction network, so groups of proteins that work together to do a certain function. We're also going to be demonstrating this a little bit later in the modules, I think tomorrow, but there's a picture of GeneMANIA, which helps with gene function prediction. You have a given set of genes and you know their function, and you have a new gene that is also associated with those genes, so you can now potentially infer its function from the known proteins. You can transfer that information from a known gene to an unknown gene. There are many other ways we can use network analysis. I mentioned this already, but we can look at subnetworks, so you can focus in on the individual proteins you're interested in and look at their surroundings. Another thing people like to do is motif analysis, where you look for sets of connections within the network that share the same pattern, or motif, of interactions.
Another interesting thing that we can do is network alignment between species. You can analyze a network in one species and compare it to another. Potentially in one species there could be missing genes, and you can infer which genes might be present and which might not be there. Another thing that you can use networks for is pathway association. So, just basic network information: the two main aspects of a network are the node and the edge, and you can define them however you like. A node can be a person, a node can be a power station, but in our context a node can be a gene, a protein, a transcript, a drug. It can be anything. You define what the aspects of your network are. And then, once we have our nodes, we also have the connections between the nodes. Depending on what your node is, your edge is going to be different; your interaction, your relationship, is going to be different. You can have a genetic interaction, which gives you a gene-gene network. You can have a physical protein interaction, which gives you a protein-protein interaction network. You can have co-expression, where two genes that are expressed similarly have an interaction between them. The reason I specifically mention this is that we've done enrichment analysis, and initially we're not going to be creating traditional networks of protein-protein interactions. We're actually going to be creating networks where each node is a gene set, and the edges between them are the genes they share. We'll go into that more in the second half of this lecture. One of the things you can use your network for is looking at topological features, which is what I meant when I mentioned how your network is connected. I've highlighted here a hub in our network. A hub is something that is more connected than other things in the network.
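To make the node, edge, and hub vocabulary above concrete, here is a minimal sketch, assuming a toy list of typed interactions. The node names and the helper functions are invented for illustration. The tab-separated `source / interaction type / target` line it produces follows the layout of Cytoscape's simple interaction format (SIF), and the "hub" is taken here simply as the node with the highest degree, which is only one of several possible definitions.

```python
from collections import Counter

# (source, interaction type, target); every name here is made up for illustration.
INTERACTIONS = [
    ("TF1", "physical", "GENE1"),
    ("TF1", "physical", "GENE2"),
    ("TF1", "coexpression", "GENE3"),
    ("GENE2", "genetic", "GENE3"),
    ("GENE4", "physical", "TF1"),
]


def to_sif_lines(interactions):
    """Format each edge as 'source<TAB>type<TAB>target', one edge per line."""
    return ["\t".join(edge) for edge in interactions]


def degrees(interactions):
    """Count how many edges touch each node, ignoring the interaction type."""
    counts = Counter()
    for source, _kind, target in interactions:
        counts[source] += 1
        counts[target] += 1
    return counts


def hubs(interactions, top=1):
    """Return the `top` most connected nodes, most connected first."""
    return [node for node, _count in degrees(interactions).most_common(top)]
```

In this toy network `hubs(INTERACTIONS)` picks out `TF1`, which touches four of the five edges. Swapping the node names for gene sets and the interaction type for "shares genes with" gives the kind of network used later for enrichment results.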
And then within this network, I've also highlighted that you can have a cluster. You can also integrate different data types into the network. This one over here has different shapes of nodes, because they're trying to highlight different types of genes. So there are a lot of graphical aspects that you can use to aid your visualization. This is another example of visualization, over here on the left. It's actually the exact same network every single time, but there are different ways of representing and overlaying the data. In figure A you have a hairball, so if somebody was looking at that, they'd find it difficult to see what was happening. In C, where you've divided things according to complexes and the things that interact with those complexes, it's a lot simpler to understand what's happening. And finally in D, you've reduced some of that complexity and collapsed those complexes. So what kind of network you show depends on what message you're trying to convey. I would argue that you should never show a hairball, but if you go through a lot of publications, they show them a lot. I guess it's more to say, look at all of the data we have, we have a ton of data. It's not informative. You want to highlight what is important in the network. So those are the different ways of representing networks. Now, unfortunately, networks are good, but there are things missing from the networks that we're going to look at. One of the things that's missing is dynamics. There could be ways to try to represent change over time, but it's not so easy. Another thing you're going to notice you can't represent is cellular location. A network is not a flow diagram.
Some of the apps we'll look at try to represent the context, like cell nucleus and cell wall, but ultimately that's not the way a network is visualized in Cytoscape, and it takes a lot of work by the person who created the app to build that sort of structure. There are other programs that represent those dynamics and localizations a little bit better. So that was basically how networks are useful for large data sets, and how it's important to understand what your nodes are and how we're going to define them. And there are many different methods available for network analysis. Okay, so what are we going to use for network analysis and visualization? Hopefully everybody has installed Cytoscape already. Cytoscape is an open source visualization software that we use to represent networks. These are just a few pictures, and all of them were made using Cytoscape. Please know that any polished figure I show is very rarely generated directly from Cytoscape. There is a lot of manual work when it comes to laying out these pictures. People see published pictures and think that's just what comes out, but it never comes out like that. We've tried to automate as much of the process as possible, but when it comes down to it, making those perfect, beautiful figures takes a lot of time. I'm just showing a few examples of different figures you can create. You don't actually have to be in the biological sphere to work with Cytoscape. One kind I've had to create a lot is publication networks; I guess for grants people want to show who they've published with. There's an app within Cytoscape called the Social Network app, and it can grab your publication records and create a publication network for your lab or for yourself.
So there are a lot of different uses of Cytoscape that are not necessarily pathway or network analysis. It is a broad consortium of different companies that have worked over the years to build Cytoscape. It's been around since, I'm going to say, 2001. It was developed in Java, just like GSEA. I think that nowadays Java might not have been the platform of choice, but 20 years ago it gave you a cross-platform program, and that's also why GSEA is written in Java. There are new iterations of Cytoscape where they're trying to push it more to the web and make it usable from the web, and those are developments that are coming down the pipeline. Some cool features of Cytoscape: there are a whole bunch of automatic layouts that you guys will get to play around with. You can manipulate your networks kind of like in Adobe Illustrator. You can lay things out, and it tries to help you make those figures using additional attributes within the network. You can line things up, sort things, and change the layouts. You can filter and query your network. If you're looking for an individual protein, or for an individual function, you can search by that, so it makes it easy to find things. And then there's also the ability to load in networks. There are multiple apps you can use, and there are different types of networks that you can load in. One of the newer ones that's pretty cool, the second one over there, is NDEx. The concept there is that they've made a cloud system where, if you've published a network, you can upload it to NDEx and then give somebody that NDEx key, and they can download your network with all of the features that you've associated with it, as well as all of the layouts.
And you can even share things privately. If you're working on a network and you're not ready to publish or release it, you can upload that network privately and then send it to your collaborators or to your PI, and they can download it, play around with it, and look through it. So it's a very good way to share networks. Some of the other apps that are available to pull in networks: there's GeneMANIA, which we're going to be looking at tomorrow, I think it is. Another one is PSICQUIC. PSICQUIC is basically a consortium of lots of different consortiums that hold a lot of protein-protein interaction data. There are multiple databases that are part of PSICQUIC, and you can search for a protein to get all of its protein-protein interactions and pull those networks in. Another beauty of Cytoscape is that anybody can develop their own app for it, so it's an ecosystem. Currently there are something like 316 regularly used apps, and we're going to be going through a few of them over the next few days. That extends the usage of Cytoscape to many, many different frameworks. This is just a list of some of the apps that are available. It's an active community. Unfortunately I can't get updated statistics, because some of the Google Analytics data is no longer available, but on a daily basis, you know, 10,000 users, over 18,000 downloads, and there are multiple active apps currently available. These are just some of the top apps. I don't love this picture, because they don't normalize for length of time, so some of these apps are in the top apps just because of how long they've been around. An example of that is BiNGO. BiNGO, I think, was one of the first apps in Cytoscape. So although it's not used as often anymore, historically it was, and so it's still in that top set of apps.
What I've tried to highlight are the apps that we're actually going to be looking at over the next few days. So we're going to be looking at the STRING app. yFiles is a set of layouts; they're very good layout algorithms within Cytoscape. I don't think it comes standard with Cytoscape, but I think we asked you to install it during the pre-work. We're also going to be using EnrichmentMap, which is what we're going to go into next, as well as GeneMANIA and Reactome. I've highlighted cyREST. cyREST is an app that allows you to communicate with Cytoscape from anywhere, and I use it from R all the time. A lot of the manual aspects of Cytoscape can get repetitive, so if you are an R user I would highly recommend learning how to use cyREST, because then you can talk directly to Cytoscape from a given pipeline and it can do a lot of the work for you, even though eventually you still have to go back and sit down with Cytoscape to create the final figures. There are two apps that we're not going to be going through that I just wanted to highlight. One of them is ClueGO. It's up there. It's another enrichment analysis tool, with some very cool visualizations. Traditionally ClueGO only worked with GO, but thankfully they've extended it to a bunch of other pathway databases as well. It's only available for human and mouse, but if you are working with human or mouse I definitely recommend looking at it. Another one of the apps, which I hadn't used before, is called cytoHubba. The reason I highlighted it is that it's a relatively new app and highly cited. It's a way of helping find the important parts of your network, and it offers some very cool visualizations as well, so if you are working with protein-protein interaction networks I highly recommend checking that one out too.
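Coming back to the cyREST point above: the idea of talking to Cytoscape from a pipeline can be sketched from Python as well as from R. This is a minimal, hedged sketch, not the course's workflow. It assumes Cytoscape is running locally with cyREST listening on its default port (1234), and the helper names (`cyrest_url`, `get_json`, `list_network_suids`, `apply_layout`) are hypothetical wrappers written for this example. Nothing is sent to Cytoscape until one of the request functions is called.

```python
import json
import urllib.request

# Assumption: Cytoscape is open on this machine and cyREST uses its default port.
BASE_URL = "http://localhost:1234/v1"


def cyrest_url(*parts, base=BASE_URL):
    """Join path pieces onto the cyREST base URL."""
    return "/".join([base.rstrip("/")] + [str(p).strip("/") for p in parts])


def get_json(url, timeout=10):
    """Send a GET request and decode the JSON reply."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def list_network_suids():
    """Ask Cytoscape for the identifiers (SUIDs) of the networks in the session."""
    return get_json(cyrest_url("networks"))


def apply_layout(network_suid, algorithm="force-directed"):
    """Ask Cytoscape to apply a named layout algorithm to one network."""
    return get_json(cyrest_url("apply", "layouts", algorithm, network_suid))
```

With Cytoscape open and a network loaded, something like `apply_layout(list_network_suids()[0])` would re-run the layout without touching the mouse, which is the kind of repetitive step worth scripting.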
Cytoscape is a useful and free software tool that is actively worked on within the Bader lab. There are actually a few people here who are core developers on Cytoscape. Between when we did the pre-work and today, Cytoscape was actually updated. We were really hoping that we could just send you 3.10, but it all happened close to that time. For those people working on Mac laptops, I would recommend using 3.10, because it was optimized for the M1 chip. So if you guys have those new chips, and I know a lot of people do: in general there are a lot of issues with M1 and M2 chips, so 3.10 is definitely advisable. The thing about Cytoscape, obviously, is that you have the ability to use all these additional third-party apps that help expand its usefulness. Okay, so at this time, guys, I would recommend you open up Cytoscape, and you can go along with me. I'm going to go through some of the basics of Cytoscape. Cytoscape is a Java application. It is a little bit of a pig on memory, so when you have large networks, it does require a lot of memory. If you are working with large networks, just keep that in mind, because it can get frustrating if you have a computer that can't provide a lot of memory. On the left-hand side over here is what's called the control panel. On the control panel, you'll often see that when you open up an app, a new tab will open up for that specific app over here. That's important to know for the steps you guys are going to go through, because we refer to these different panels. If you know where each panel is, it makes it a little bit easier. So on the left-hand side, you have the control panel. On the bottom of the screen, you have the table panel. And on the right-hand side of the screen, you have the results panel.
The results panel is often closed, but a lot of apps will put additional information into it, so it's one you have to be aware of. You have the bird's-eye view at the bottom, which can sometimes get in the way over here, so you can expand and collapse it. But it's an easy way to navigate a larger network without having to zoom in and zoom out; you can jump around by dragging this square. The network canvas is where the networks themselves are shown. And the network manager over here is usually the top tab, and it allows you to jump between different networks. On the top bar, which is mostly how I refer to it, there are a bunch of different things, just so you know. This over here opens and saves networks to NDEx. NDEx is that cloud-based network sharing app. You can open a session, and you can save a session. And over here, these are shortcuts to importing a network or importing a table. With the one that looks like a network, you're importing a network, and with the one that looks like a table, you're importing data to annotate your network with. That's what tends to be in the table. You can zoom in and out. This one over here will zoom to the whole network, whereas the one with the check mark will zoom to whatever you've selected. So when you have large networks, you can do a search for a given protein or a given term and it might highlight multiple nodes. You can then click this check mark so that it focuses in on what you've selected. The two arrows over here apply your preferred layout, which I think by default is set to the Prefuse force-directed layout. The house icon is neighbors. I didn't introduce that concept, but within a network, if I have a given node and I want to find all of the things that are connected to that node, I would select the node and then select all of its neighbors.
Anything that is connected to a given object is one of its neighbors. If you want to expand your network even more, you can select a node, select its neighbors, and then select the neighbors of its neighbors. That's how you grow your network if you're interested in a certain subnetwork. And then the last one I'm going to highlight here is this eyeball. If you want to focus on a certain part of the network, you can choose to hide or unhide nodes instead of creating a subnetwork, for example. But you can also create a subnetwork. Okay, so now, if everyone has Cytoscape open, in the search bar at the top of your network panel over here, make sure that it's set to NDEx. You can type in coronavirus. What this is going to do is search the NDEx database and return a bunch of networks. Hopefully the third one down is this network, this IMEx network, coronavirus disease, and if you click on the green button, it should download it. Let me just do the same thing. Over here in the network panel, I'm changing this to NDEx, and I type coronavirus and search. Oops, sorry. There you go. And then I click on it, and it will download it and open it up for me. So this is what the network will look like, and over here I have its link to the webpage, so this is actually available online. Technical difficulties, apparently. Anybody else? Did it load for anybody else? Okay, so it might look slightly different depending on the layout that was applied. There are also multiple options there. So what we've done here is we now have the whole network, and you can play around with it if you want.
One thing I guess I didn't mention how to do is visual styles. There's a lot of information encapsulated in this network. In the control panel you'll see there's a Style tab, and the Style tab allows you to set visual properties. There are a lot of visual properties that you can set, for nodes, for edges, and for the network. For example, for this network here, I can expand the fill color, and the fill color is representing the species that the data is coming from. As you can see, they didn't do this manually; the colors were probably just assigned automatically. You can go in and change individual ones if you want. This is just fill color, and there are even more parameters that aren't shown here, so there are multiple things that you can set. The power of this is obviously great. The problem is that, as a person, there are only so many pieces of information you'll actually be able to digest. You can put too much information in here, and then it won't be interpretable by the person looking at it. So yes, there are tons of things that you can set, but just know that there is a limit to what you can effectively visualize. And yet there's still a lot of information visualized in these networks. Another aspect of this is that you can change the layout. This is a basic Prefuse layout, but if I go to my layouts over here, a really good one is the yFiles organic layout. If I apply that layout right now, it will lay the network out slightly differently. A few other layouts you can also try are the attribute circle layout, and there's a whole bunch of others available. Another good one is the CoSE layout. This one takes a little bit of time, though. But within the primer, you will have an opportunity to try out all these things.
Here are some examples of the different layouts that you can try. By default it should be the Prefuse force-directed layout. I've zoomed in on a subsection of this network, which is why it doesn't quite look the same. Here's an example of a circular layout. Within Cytoscape there are also different networks that you can load in, and I've already demonstrated one way of doing that. This is an example from WikiPathways. Within a pathway there's often a flow. This is what I mean when I say Cytoscape is not necessarily built to represent this flow, but there are some apps that try to do it, and this is an example of a WikiPathways pathway. With WikiPathways, you can choose to import it as a pathway or import it as a network. It works very nicely as a pathway, but you can't use any of the layouts within Cytoscape when you have it represented as a pathway, because then you'll lose this beautiful flow. There are other aspects of network analysis that you can't do once it's in this format, but there are reasons why people would want to see it as an individual pathway. You can change the coloring on these nodes, so you can show where your genes are affected within the given pathway. This one is WikiPathways, which, as you can see, is another option for downloading networks. So before I go on to the next part of this lecture, are there any questions specifically about Cytoscape? Now we're going to go into a specific bunch of apps that we use in Cytoscape, and we're going to talk about the enrichment apps. Over here I just have a few icons for the different apps that we're going to be using today. The main one is EnrichmentMap, and it's a Cytoscape plugin. Okay, so hopefully, by the end of this lecture, you will understand how to transform your results from g:Profiler, GSEA, and other enrichment algorithms into a network.
Okay, so right now we've done module two and generated a whole bunch of files, and they're all tables. We now want to take those tables and represent them in a different format, one that's easier to analyze. We want to understand the difference between a network and an enrichment map; the enrichment map is a type of network that we are going to define. As I just said, our results from lab two look like this. These are the tables, the lists of pathways that were enriched for the two different scenarios we're looking at. But when it comes down to it, there's a lot of redundancy in these lists. If you end up looking at a list, you're going to focus in on one aspect instead of looking at the results in general. In general, what we started with is a set of experimental data, either as a ranked file or as a thresholded list. We've given it a bunch of pathways, because we want to find out whether or not our genes are enriched in those pathways. The result is another table with our enrichment results, where each pathway is associated with an enrichment score. So we've done g:Profiler and we've done GSEA. With both g:Profiler and GSEA, we have the ability to generate two different subsets, or as many subsets as we want, because it could also be a cluster type of analysis. We've taken our ranked list, whether it's thresholded or not, run it through pathway analysis, and generated enrichment results for condition A and condition B, up-regulated and down-regulated, disease versus control. Now, we've also talked about networks. The networks we've talked about are protein-protein interaction networks or gene-gene interaction networks, where each node is an individual gene. But our results are tables of pathways, not tables of genes. Yes, each pathway is associated with a bunch of genes, but the entries are not individual genes.
They're a group of genes. So what we're going to do is create an enrichment map. Now, an enrichment map is a network where each node is a pathway. And the number of genes that two pathways have in common is the connection between those nodes, or those pathways, right? So, we have a given pathway or gene set, which is a node. The size of that node can be the size of that pathway. It doesn't have to be, but I think by default that's what enrichment map does. And the color indicates the direction and the strength of its association with the phenotype, right? So in this context, in our lab, we were doing immunoreactive versus mesenchymal. So, I think blue in this context is the immunoreactive. They were the negatives in our ranked list. And the red is the mesenchymal subset, right? So over here, we have two different sets of pathways. One's associated with the mesenchymal set, and the other one's associated with the immunoreactive. But in a traditional disease versus control, your red would be up-regulated in disease and your blue would be down-regulated in disease. Now, the connection between our nodes is the amount of overlap between those two gene sets. Now, the reason why we do this is that, inherently, the pathway databases are highly redundant. That's true even if you just look at GO, which is hierarchical, right? So in GO, every term is actually very related to a lot of other terms. So your top 10 list could all be referring to the exact same pathway. And that's an overload of information you don't want. You want to simplify it, you want to summarize it. And so the way we do this is we take our enrichment results and we create a network from them. So how are we calculating this overlap? So just briefly, so you understand.
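The node and edge definitions above can be sketched in a few lines of Python. This is only an illustration of the idea, not the enrichment map app's own code: the pathway names, gene symbols, scores and the `build_enrichment_map` helper are all made up for the example, and the edge weight here is just the raw count of shared genes.

```python
from itertools import combinations

# Hypothetical enrichment results: pathway name -> (gene set, enrichment score).
# Negative scores stand for one phenotype, positive scores for the other.
results = {
    "immune response": ({"CD4", "CD8A", "IL2", "IFNG"}, -2.1),
    "T cell activation": ({"CD4", "CD8A", "IL2"}, -1.8),
    "EMT": ({"VIM", "SNAI1", "ZEB1"}, 2.4),
}


def build_enrichment_map(results):
    """Return (nodes, edges) where each node is a pathway."""
    nodes = {}
    for name, (genes, score) in results.items():
        nodes[name] = {
            "size": len(genes),                       # node size = gene set size
            "color": "red" if score > 0 else "blue",  # direction of enrichment
        }
    edges = {}
    for a, b in combinations(results, 2):
        shared = results[a][0] & results[b][0]
        if shared:  # connect two pathways only if they share genes
            edges[(a, b)] = len(shared)
    return nodes, edges


nodes, edges = build_enrichment_map(results)
print(nodes)
print(edges)
```

The two immune-related pathways end up connected because they share genes, while the third pathway stays on its own, which is the redundancy the network is meant to make visible.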
Okay, within the enrichment map app, you actually don't need to know these details, but the way it works is that there are three different metrics you can use in order to connect your pathways. There's the Jaccard coefficient, there's the overlap coefficient, and there's also the combined coefficient. So I think generally, by default, we use the combined, but the way the nodes are connected is that you calculate the overlap between two gene sets, and then you divide it by either the size of the smaller set, for the overlap coefficient, or the size of the union of both sets, for Jaccard. And depending on which enrichment analysis you're using, you might want to use a different type of connectivity. So you can play around with those within the enrichment map app, which you're going to get to. And so, for the typical output here: I don't know if everyone opened up the tables that were associated with GSEA or with g:Profiler. GSEA generates a lot of tables. There are actually two important tables in there, but it generates a lot of results, which can be a little bit overwhelming. Generally you have two main tables, which represent your up-regulated and your down-regulated results. And so these are the two tables we're going to be giving enrichment map, as well as a few other pieces of information. And then we can translate this table into a network. So this is the typical output from enrichment map. And so now I'm just going to briefly go over a few quick examples of how we can use enrichment map. So a basic example is a single enrichment case where you have treatment versus control. So in this context, these were cells that were treated with estrogen and cells that were not treated with estrogen, after 12 hours and 24 hours. We ran a basic enrichment analysis to generate a basic enrichment result. This is actually one of the first analyses that we did.
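As a rough sketch of the three similarity metrics mentioned above, assuming gene sets are plain Python sets: the function names are invented for this example, and the combined coefficient is written as a weighted mix of the other two with a constant `k`, which is an assumption about how the combination is defined rather than something stated in the lecture.

```python
def jaccard(a, b):
    """Shared genes divided by the size of the union of both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(a, b):
    """Shared genes divided by the size of the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def combined(a, b, k=0.5):
    """Assumed form: a weighted mix of overlap and Jaccard."""
    return k * overlap(a, b) + (1 - k) * jaccard(a, b)


# Hypothetical gene sets: a small pathway fully contained in a larger one.
big = {"A", "B", "C", "D"}
small = {"A", "B"}

print(jaccard(big, small))   # 0.5
print(overlap(big, small))   # 1.0
print(combined(big, small))  # 0.75
```

The example shows why the choice matters: a small set nested inside a larger one scores 1.0 on the overlap coefficient but only 0.5 on Jaccard, so hierarchical collections such as GO connect much more densely under the overlap coefficient.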
And again, I said it already before, enrichment map will not generate this picture. It will generate a more complicated picture, which you then have to work through and move things around in to generate a figure like this. Right. So we have circles around functions that are related to each other. There's an app we use called AutoAnnotate that helps with calculating what those summaries should be, but it's not perfect. The way it works is that it uses a clustering algorithm to calculate the clusters, and then it grabs the names of all of the nodes in a given cluster. And it looks at the words in the names and it tries to guess what the best description for that cluster is. And so a lot of them, unfortunately, end up not being correct, or not the best words you can use. So this is kind of where you start from, and you're encouraged to look at the sets in each one of these clusters and devise your own summary for it. But we're just trying to help in the process, right? So you can zoom in on an individual cluster here. Right. And so you can see that this one is microtubule cytoskeleton, and you can actually look at the individual annotations on the nodes for this cluster, so this actually is a decent summarization of this sub-cluster. So another example of a use case for Cytoscape is using multiple time points. In the first one we were just using an individual time point, which is the 12 hours, but this data set actually consisted of multiple time points, 12 hours and 24 hours. So originally, the way Cytoscape was structured, we took advantage of multiple features in order to encapsulate more information. So in this network, the inside of the node is actually 12 hours, and the outside of the node is 24 hours.
So you can actually highlight differences between these two situations based on the presence or the absence of that enrichment, because you're trying to see what's changing between the 12-hour scenario and the 24-hour scenario. So that's what you get when you run enrichment map with two data sets; this is the use of those two visualizations. Before I go into what's changed since this part was developed, I'm just going to say that another feature of enrichment map is that you can delve into the details of an individual pathway. You have an expression file associated with your analysis and your ranks, and you can load this expression file into Cytoscape when you create your enrichment map, so that when you click on an individual node, you can actually see the differential expression of individual genes within that pathway. So what I'm highlighting over here is that for these pathways, where there is no expression at 12 hours but there is expression at 24 hours, you can see a clear difference in the expression patterns between your 12 hours and 24 hours. And so enrichment map offers not only a summarization of your global pathway analysis, but you can also use it as a tool to delve into the details of individual pathways. And, you know, if there's a subset of your network that's of interest, you can highlight it and then actually look at the genes that are affected. That's another thing that you'll see when you do the lab. And we can also highlight the leading edge analysis. When you ran GSEA, I don't know if you had a chance to look at your results, but you'll see that there was a column called core enrichment associated with each pathway. Basically, the enrichment score, as it's calculated, reaches a peak at some point, and that's called your leading edge. So any genes that are to the left of that peak are considered the important genes; those are the genes that are driving the enrichment.
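To make the leading edge idea concrete, here is a simplified sketch. It uses an unweighted running sum (step up on a gene in the set, step down otherwise), which is an assumption made to keep the example short; GSEA's actual statistic weights the steps by the ranking metric. It also only handles the positive-score case described above, where the leading edge lies to the left of the peak. The gene names and the `leading_edge` function are hypothetical.

```python
def leading_edge(ranked_genes, gene_set):
    """Return (peak score, genes in the set at or before the peak)."""
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0, []

    running, best, peak = 0.0, float("-inf"), -1
    for i, hit in enumerate(hits):
        # Step up for a gene in the set, step down for any other gene.
        running += 1 / n_hits if hit else -1 / n_miss
        if running > best:
            best, peak = running, i

    # Leading edge: members of the set ranked at or before the peak.
    edge = [g for g, hit in zip(ranked_genes[: peak + 1], hits) if hit]
    return best, edge


# Hypothetical ranked list, most up-regulated gene first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
pathway = {"g1", "g2", "g5"}

score, edge = leading_edge(ranked, pathway)
print(score, edge)
```

Here the running sum peaks after the second gene, so `g1` and `g2` form the leading edge and `g5`, although it belongs to the pathway, does not.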
So, when you click on an individual node within enrichment map, you will actually highlight your leading edge genes. And those aren't necessarily more important, but sometimes it's interesting to look at those specific drivers of those pathways, and so that's another feature we have here. So, another thing, which was mentioned a few times before in multiple talks, is the concept of a post analysis. So, you can run a whole pathway analysis, and maybe you have additional experiments that you want to pull in. It could be drugs, it could be transcription factors. You could have an individual transcription factor that you're interested in, or you could be interested in a certain type of drug that might affect some of these pathways. So what I have highlighted over here is the concept of the post analysis. So this is a network, and what we're highlighting here are the targets of an individual microRNA that we know is associated with these hematopoietic progenitor and stem cells. And so, over here we've done a post analysis, where all of the targets of this microRNA are highlighted. There are actually quite a few that it affects. But it doesn't only have to be microRNAs. Within enrichment map, if you know what you're looking for, you can look for a specific example like this miR-125, but there's also the ability to do a broad analysis. So you can actually grab all the drugs and say, no, I want to see drugs that target my network with a given statistic and a given threshold, and it will actually list a bunch of drugs that potentially could be important to your network. And lastly, Gary already showed this picture, but another nice thing about enrichment map is that it is not limited to just two data sets. There's no actual limit or threshold on the number of data sets you can use. So you can put in here 20, 50, 100.
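The post-analysis idea described earlier, testing an extra gene set such as a microRNA's targets against every pathway with a given statistic and threshold, can be sketched as follows. The lecture does not name the statistic, so a hypergeometric test is assumed here as one reasonable choice, and the gene names, universe size and `post_analysis` helper are all hypothetical.

```python
from math import comb


def hypergeom_pvalue(k_overlap, set_size, signature_size, universe_size):
    """P(overlap >= k_overlap) for a pathway of set_size genes drawn from
    a universe containing signature_size signature genes."""
    upper = min(set_size, signature_size)
    tail = sum(
        comb(signature_size, k) * comb(universe_size - signature_size, set_size - k)
        for k in range(k_overlap, upper + 1)
    )
    return tail / comb(universe_size, set_size)


def post_analysis(pathways, signature, universe_size, alpha=0.05):
    """Return (pathway, shared gene count, p-value) for pathways passing alpha."""
    hits = []
    for name, genes in pathways.items():
        shared = len(genes & signature)
        p = hypergeom_pvalue(shared, len(genes), len(signature), universe_size)
        if p <= alpha:
            hits.append((name, shared, p))
    return hits


# Hypothetical targets of one microRNA and two pathways, in a 20-gene universe.
targets = {"A", "B", "C", "D"}
pathways = {
    "pathway 1": {"A", "B", "C", "X"},
    "pathway 2": {"W", "X", "Y", "Z"},
}

hits = post_analysis(pathways, targets, universe_size=20)
print(hits)
```

A broad analysis over many signature sets, for example one per drug, would just repeat this loop for each set and report the sets that connect to at least one pathway under the chosen threshold.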
The problem is, this picture is very, very nice because there's a large division in which pathways are affected by your data. But often you can have clusters, like over here, where there are functions that hit multiple data sets. There still is the ability to put in as many data sets as you want; it just becomes a little bit more difficult when you're trying to visualize it, especially for noisier data sets. So, I've mentioned a few of these, but within the enrichment map app, the control panel over here is the main window you'll be using in order to look at your network. There are a few slider bars that you can use over here, which can help reduce the connectivity in your network or reduce the number of nodes returned in your network. So, you can create your enrichment map but then feel like there are just too many hits there, so you can actually reduce the p-value or q-value threshold. And you can also play around with the connectivity to see if you can separate your nodes a little bit more. Over here in the bottom part are the heat maps that I showed previously. So clicking on an individual node, you can see all the genes and their ranks associated with them. And then, as I mentioned before, you can select an individual node, and if your analysis was done in GSEA, this is important. If your analysis was not done in GSEA, then you're not going to be able to get this information. The leading edge is something specific to GSEA, and also something specific to an individual node. If you select a cluster you will not see this highlighting over here. This is what I mean: for this given node that I've selected, these highlighted genes are the ones that are part of the leading edge. So, a few other apps we're going to be looking at include the AutoAnnotate app. The concept of this app applies once you've created your pathway enrichment results.
We want to be able to annotate clusters within the network. And what AutoAnnotate does is first it uses another app called clusterMaker, and it clusters your network. And then, for each cluster, it uses another app called WordCloud, where it tries to guess what the best annotation is for that cluster. As I said, it basically takes all of the words that are associated with that cluster. It counts the words for that cluster as well as for the network, and it tries to normalize certain words that are very common in the network and highlight the unique words of that cluster. And what this does is it clusters the network, it generates these clusters and it tries to generate these labels. And over here I'm just showing you the raw labels that are calculated for just a subset of the network. You can see the size of the label is actually proportional to the size of the cluster. Now, I always change that so that they're all the same size. People see a network and they see large clusters and they assume that the cluster is important. But the size of the cluster does not actually indicate importance at all. It just means that that process is very well annotated in the databases. Often you'll have singular nodes that might be more interesting to you because they represent processes that might be more associated with whatever you're studying, but you have to know that the size of the cluster does not indicate importance. And I find that people, I think just by human nature, have a hard time ignoring the size of the cluster, because size often indicates importance, but in this context it doesn't. And that's why there's kind of an alliance doing the double.
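The word-counting idea behind the cluster labels can be sketched like this. It is a deliberately simple stand-in and not the actual WordCloud or AutoAnnotate algorithm: each word is scored by how much of its network-wide usage falls inside the cluster, and the node names and the `label_cluster` function are hypothetical.

```python
from collections import Counter


def label_cluster(cluster_names, network_names, n_words=2):
    """Pick the words most specific to a cluster of node names."""
    in_cluster = Counter(w for name in cluster_names for w in name.lower().split())
    in_network = Counter(w for name in network_names for w in name.lower().split())

    def sort_key(word):
        # Fraction of this word's network-wide uses that fall in the cluster,
        # then raw count in the cluster, then alphabetical for stable ties.
        ratio = in_cluster[word] / in_network[word]
        return (-ratio, -in_cluster[word], word)

    return sorted(in_cluster, key=sort_key)[:n_words]


# Hypothetical node names: one cluster plus the rest of the network.
cluster = [
    "microtubule cytoskeleton organization",
    "regulation of microtubule polymerization",
    "microtubule based process",
]
rest = [
    "regulation of immune response",
    "regulation of cell cycle",
    "immune system process",
]

print(label_cluster(cluster, cluster + rest, n_words=1))  # ['microtubule']
```

Common words such as "regulation" and "of" are pushed down because they appear all over the network, but words beyond the first can still be arbitrary, which matches the advice to treat the generated label as a starting point and rename it yourself.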
Okay, so this is just a screenshot of all the clusters that are generated for this network, and you can actually click on an individual one and rename it. And as I said before, there is no requirement to keep the name that it has guessed. Often, it will not be the best representation of that cluster, but hopefully it's giving you a clue or an idea of what you think that cluster should be called. And another thing that we can do within enrichment map is collapse those clusters. So, as I said, the size of the cluster is not important. And often it represents very well annotated features or functions within the pathway database. So you can actually reduce the complexity of your network by collapsing each one of those clusters, and this is an example of a summary of a subset of that network, where the clusters are collapsed.
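Collapsing clusters, as described above, amounts to replacing every cluster with a single node and keeping one link between two clusters whenever any of their member pathways were connected. A minimal sketch, assuming the cluster assignment is already known; the pathway names, cluster labels and `collapse_clusters` helper are made up for the example.

```python
from collections import Counter


def collapse_clusters(edges, cluster_of):
    """Collapse a pathway network into a cluster-level summary network.

    edges: iterable of (pathway_a, pathway_b) pairs
    cluster_of: pathway name -> cluster label
    Returns (pathways per cluster, links between clusters with edge counts).
    """
    sizes = Counter(cluster_of.values())
    links = Counter()
    for a, b in edges:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb:  # edges inside a cluster disappear when it collapses
            links[tuple(sorted((ca, cb)))] += 1
    return dict(sizes), dict(links)


# Hypothetical network: two clusters joined by a single edge.
cluster_of = {
    "p1": "microtubule", "p2": "microtubule", "p3": "microtubule",
    "p4": "cell cycle", "p5": "cell cycle",
}
edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5"), ("p3", "p4")]

sizes, links = collapse_clusters(edges, cluster_of)
print(sizes)  # {'microtubule': 3, 'cell cycle': 2}
print(links)  # {('cell cycle', 'microtubule'): 1}
```

The five-node network shrinks to two nodes and one link, and the cluster sizes are kept only as an attribute rather than as visual weight, in line with the point that cluster size does not indicate importance.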