Okay, so I'm going to give an introduction to network visualization and analysis with Cytoscape, then a brief demo of Cytoscape, and then I'll talk about Enrichment Map, and then we can have a lab. I'll try to keep it short so we can extend the lab time as much as possible, because that's usually quite valuable. We'll talk more about networks tomorrow, but the main reason I'm giving the Cytoscape demo today is that Enrichment Map is a network visualization of the results that we just generated. Okay, so the things that I'll cover are a brief introduction to networks, which you'll hear more about tomorrow, an introduction to network visualization, a discussion of Cytoscape, network analysis fairly briefly, and then a demo of Cytoscape. You guys have used it already, but having a tour might be somewhat useful. Okay, so we assigned this paper for you guys to read in advance to save time in the workshop, so we can go over this stuff more quickly, but if you didn't get a chance to read it, it's just two or three pages about how to visually interpret biological data using networks. The main idea is that networks represent relationships, and if you have any kind of data that's rich in relationships, then it's much easier to visualize it and reason about it as a network rather than as a table. Relationships can be represented as a table as well, like A connects to B, B connects to C, and you have this long table of relationships, but if you visualize it as a network, it's much more useful for discovering relationships in large data sets that might be interesting.
There are different types of relationships that people typically care about in biology: physical relationships like protein-protein interactions; regulatory relationships, like a microRNA or transcription factor regulating a gene's expression; and genetic relationships like synthetic lethality, where gene A when knocked out doesn't do anything, gene B when knocked out doesn't do anything, but knock them both out together and the system fails. That's a special type of genetic relationship, and there are others. Functional interactions are, in my view, any kind of connection between genes that are functionally related. I mentioned some of those this morning. Networks are useful for discovering relationships and also for visualizing data, including visualizing multiple different types of data together, and you can see interesting patterns quite readily when you visualize data as networks. And finally, they're quite useful for network analysis. So, has anyone heard of the idea of six degrees of separation? Anybody heard of this? Does anybody know about the experiment that was used to define it? So the social psychologist Stanley Milgram, famous in his field, in the 60s wanted to figure out how closely people are connected. So he came up with this experiment where people would send postcards from Boston, and the postcards had to get to a particular person in New York. They had the person's name and occupation, I think, but they didn't have the address. This was before the internet, so people had to send postcards to people they thought would be closer to this person. So probably they'd send postcards to friends in New York or something, and then it would somehow filter through to that person.
And each time someone received a postcard, they were instructed to send a postcard back to the scientist, who could then track where the postcards were going. And after trying this out, it turned out that most of the postcards actually were able to reach their destination. And on average it took six hops. So that's where this idea comes from. I'm sure it's much closer now with, you know, Facebook. But the point is, it's a fun example that illustrates the concept of a network, in this case a social network. And in the 60s, this person had to create this elaborate postcard experiment, which probably took a long time, to figure out how people were connected. Basically, it tried to answer the question: are people connected, and if they are, how are they connected? In computer science, if you have a network represented in the computer, you can answer this question very easily using standard algorithms. For instance, there's a standard algorithm that people learn in first or second year computer science called shortest path by breadth-first search. It's proven mathematically to find a path between two nodes if they're connected. And if they are connected, it will find the shortest path, or one of the shortest paths. If they're not connected, it will clearly say they're not connected. Now, this is an example of how you can use algorithms from computer science to answer a question about a network. And it turns out there are a lot of these algorithms. In the field of graph theory in computer science, people call networks graphs. We don't usually use that term, because when most people think of a graph, they think of a plot, so it's clearer to say network. But in graph theory, there's a rich library of algorithms that answer all sorts of questions like this. This is a simple example.
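As a minimal sketch of the shortest-path-by-breadth-first-search idea described above, the following Python function finds one shortest path in an unweighted, undirected network, or reports that the two nodes are not connected. The function name `shortest_path` and the toy edge list are made up for illustration; they are not taken from the talk or from any particular tool.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Return one shortest path from start to goal as a list of nodes,
    or None if the two nodes are not connected."""
    # Build an undirected adjacency list from (a, b) pairs.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # previous[n] remembers the node from which n was first reached.
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through 'previous' to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, ()):
            if nxt not in previous:  # visit each node only once
                previous[nxt] = node
                queue.append(nxt)
    return None  # the search ran out of nodes: not connected

# Hypothetical toy network: A reaches D in two hops via E; X and Y are separate.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D"), ("X", "Y")]
print(shortest_path(edges, "A", "D"))
print(shortest_path(edges, "A", "X"))
```

Because breadth-first search explores all nodes one hop away before any node two hops away, the first time it reaches the goal it has used the fewest possible edges. That guarantee holds for unweighted networks only.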
And so you might be able to use these algorithms to answer biological questions. In this case, the simple example would be: I have a big network of protein interactions, and I want to know whether my two proteins of interest are connected, and if they are, how they're connected. This algorithm is guaranteed to give an optimal solution to that. So we don't really have to think so much about it. We just take the algorithm and use it. And as I mentioned, there are many algorithms, and people have figured out different ways of using them to answer questions in biology. Now, this particular example is very simple, and the shortest path might not be the most biologically relevant one, but at least it illustrates the idea. People have spent maybe about 15 years thinking about this network analysis field and have come up with all sorts of interesting network analysis methods that are available to use. Gene function prediction, which we'll hear about tomorrow, is based on network analysis. There's clustering of networks, which you can use to find protein complexes in protein interaction networks, and you can use it to cluster other types of data. Looking at network evolution and predicting new interactions are some of the other basic applications. And there are also more applied uses in disease research, for instance identifying a region of a network that is enriched in genes that are associated with a disease in some way, like mutated in the disease or differentially expressed in the disease. People have also looked at subnetwork-based diagnosis. So can you use network information to improve biomarker detection? People have found that you can. And there's looking at how specific mechanisms might be affected by mutations. So these are just some examples.
And I provided these here with specific links to specific software that's available, mostly in Cytoscape, just as a quick example. Again, we assigned pre-reading so we could make this part shorter and focus on other, more interesting topics. So this is just a very quick overview, but if you have any questions, let me know. What's missing in the typical network analysis? I kind of mentioned this earlier, when someone asked the question about tissue-specific data. Think about this particular slide in general with all of the data that we're using, like pathway databases and, tomorrow, network databases: these pathways and networks are typically represented as static processes. Even though we know they're dynamic, we just see one snapshot of them, and it usually represents the set of things that could happen at any time. You could do detailed mathematical simulations of pathways. People typically don't do those because we usually don't have a lot of the information that's required, like rate constants for enzymatic reactions. We also don't consider the structure of the protein. Usually genes or proteins are just represented as single elements, like atomic elements, even though we know that there's a lot of detailed structure within genes and proteins, like promoters and domains and things like that. So usually we don't see that information when we look at pathways and networks, but obviously it's important. And then the context is something we mentioned this morning. Okay, so that's very quick. The main point is that networks are useful for seeing relationships in large data sets. A key point is that it's important to understand what the nodes and edges mean. Whenever you see a network it could represent different things, and you have to ask what the circles and lines mean. We use the terminology nodes for the circles and edges for the lines.
And what we'll see more of tomorrow is the different biological questions that you can answer using network information. Because there are so many tools and topics and different types of questions that people have answered with network analysis, often it's good to determine your question first and then search for a solution. So you can say, I have this problem, and go to the Cytoscape help desk mailing list and ask, is there any tool out there for doing an analysis of this type of data, and you usually get some kind of recommendation in response. So moving on quickly, network visualization and analysis can be accomplished using Cytoscape. Cytoscape is a free software tool that is a desktop Java application, similar to GSEA. So you can download it and install it on your computer, as you guys have done. And again, we assigned trying out Cytoscape as pre-work, so you could go through the tutorials and we don't have to do that here. But I will give you a quick demo and just a couple of pieces of information about it. So as I mentioned, it's free software. It's the most popular software available for network analysis. There are others, but this is the one that most people use. It's developed by a large consortium of individuals. My lab is one of the labs that contribute to this. People in San Diego and San Francisco and Seattle and Paris and Amsterdam and other places, and some companies, are involved in building the software. It's an open source software tool, which means that all the source code is freely available and people contribute to it in a team effort. And we do that because we need this type of software in our own research and we don't want to build it by ourselves. So anyone who's available to build it is welcome to contribute.
By default it handles network visualization and some analysis, but most of the analysis comes from downloading apps, which I guess you guys also looked at. So these are the basic things that you can do with it: manipulate networks, filter and query, lay out the network, and there are some databases you can search. The real power comes from apps. So this is an old picture. There's a Cytoscape App Store, and there are over 300 apps that currently extend the functionality of the system. There are lots of users. I think these numbers are out of date, but it's a very large community of users, not just in biology. And the reason I'm saying that is because that community of users has created a good knowledge base online that you can take advantage of. So documentation, data sets, mailing lists, and there are tutorials, which you guys know about. I forgot to actually list the mailing list here, but on the Cytoscape homepage there's a link to the Cytoscape help desk. If you ever have any questions about Cytoscape, you can ask on that mailing list, and it's pretty much guaranteed that you'll get a response within a week. We make sure that all the questions are answered every Thursday, in case you're wondering. This picture just illustrates that there's a community of people. These are all developers of Cytoscape, and they spelled out Cytoscape a few years ago, from our building. So that's just a fun picture. Okay, so speeding through this again, Cytoscape is useful free software for network visualization and analysis, it provides basic network manipulation features, and apps are available to extend its functionality. Okay, I'm going to switch to a demo here. Okay, so this is Cytoscape 3.4. We recommended that you guys use this one because we've tested our apps for this workshop with it.
The current version of Cytoscape is 3.5 point something, and 3.6 is under development. We release a new version roughly every six months or so, and there are various new features, but the core features have been very stable for a long time. So this one is fine for our use here. Okay, so the first thing I'm going to do is just quickly load up some sample data. I'm just loading up a session file that comes as an example. It's a yeast network of protein-protein interactions and protein-DNA interactions. I think most people tried this out and were able to see networks. So Cytoscape allows you to browse around, you can click and move things around, you can select different things and move them around. Each of these circles, as I mentioned, is a node, and the lines are edges. Once you get a network into Cytoscape, one of the first things that you might do is lay the network out. In the Layout menu, there are a number of layouts, like yFiles Organic. If you click on these things, you'll get different layouts. I'm just going to go through a couple of examples. Here's a hierarchical version of this network. You can see that it's tried to lay it out in a hierarchy. Here's a circular layout, so it tried to identify cycles. The main point I wanted to mention here is that there are lots of layouts, and if you're interested, I recommend that you try a bunch of them out just to see which ones are available. Typically you'd use one type of layout called force-directed, which happens to be one of these yFiles layouts, yFiles Organic. They have different names. Sometimes they're interesting and useful names, sometimes not so interesting. But in general, anything that says force-directed or spring-embedded is the same type of algorithm, and that's kind of the standard one to use. But you can try other ones. If you have tree-like data, then a hierarchical layout is better.
And so you can try these out and see which ones work for your network. The force-directed layout algorithms basically model the network nodes as repelling each other, like charges of the same sign, so they'll push away from each other. And then the edges are usually pulling the nodes together. So if you have nodes that are connected, they'll be pulled together, but the nodes will repel each other, so they'll kind of bounce around and pull apart from each other. What that does is reduce overlap. Nodes shouldn't be on top of each other, otherwise it gets confusing. And also the edges shouldn't have too many crossings. If they have too many crossings, then you get this hairball, where it looks like everything's connected to everything else. So these layout algorithms try to reduce the visual overlap of different components, and they try to identify major structures in the network. If you have regions that are highly connected, they'll be grouped together. Let's see if I can look at an example here. You can use this box to move around. So here, for instance, this region is highly connected. Okay, so you can use your mouse scroll wheel, or two-finger scrolling on the Mac, to zoom in and out. And with these buttons here, you can zoom into something of interest, or zoom out to the whole network, or zoom in and out incrementally. Let's see. You can also take some part of the network and make a new network out of it. So I'm going to say new network from selected nodes, all edges, and that makes a new network just from the parts that I selected. And I can lay that out again. I'm just going to click this preferred layout, which is one of the force-directed layouts. If you are interested in just a part of the network, you can select it, make a new network out of it, lay that out, and usually get a better layout.
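The force-directed idea described above (every pair of nodes repels, every edge pulls its two nodes together) can be sketched in a few lines of Python. This is a toy version in the spirit of the classic Fruchterman-Reingold scheme, not the algorithm that Cytoscape or yFiles actually implements; the function name, the force formulas, the iteration count and the cooling factor are all assumptions chosen for illustration.

```python
import math
import random

def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: [x, y]} after a simple spring-style relaxation."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # assumed preferred distance between nodes
    step = 0.1                       # maximum move per iteration, cooled over time

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion: every pair of nodes pushes apart, like charges of the same sign.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force

        # Attraction: each edge pulls its two endpoints together, like a spring.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force

        # Move each node along its net force, but never farther than 'step'.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            move = min(length, step)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
        step *= 0.98  # cool down so the layout settles

    return pos

# Hypothetical toy network: a triangle with one extra node hanging off it.
layout = force_directed_layout(["a", "b", "c", "d"],
                               [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
```

The all-pairs repulsion loop is also a reason why larger networks are harder to lay out: the work per iteration grows with the square of the number of nodes in this naive version.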
The bigger the network is, the harder it is to lay it out. So if you make smaller versions of the network by filtering it somehow, then you'll usually get a better view, although it's separated from the big network. Here on the left, you can see that you can go back and forth between these two networks that I just made. These numbers here show you the number of nodes and edges. Let's see. Importantly, if you click on a node or select some set of nodes, you can see information about these nodes. This information just happens to be loaded up in the session file, the project file, that I loaded, but you are able to add your own data. I won't go through that too much. So these are a bunch of example pieces of information. And then it's important to know that the node table shows information about nodes and the edge table shows information about edges. The network table, which we don't use too much, has information about the whole network. And there could be additional tabs here that pop up with different apps. Let's see. A couple of other things that I wanted to mention. One is selection. You can create a filter that selects by different criteria. I'll show you exactly where this comes from in a second, but someone had previously computed a bunch of statistics about this network, including the number of connections per gene. That's called the node degree, and it just happens to be loaded up here already. There are apps that allow you to compute that yourself. If you're interested, I can tell you more about that during the lab. But just to illustrate how this filter works, you can create a filter based on a node or edge attribute, and then you can change the filter. As I'm changing it here, it's live updating. So this is now selecting everything with between 8 and 18 connections, and you can see that only a few nodes are selected. Let's see what happens there.
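To make the node degree filter described above concrete, here is a small Python sketch that counts connections per node from an edge list and then selects the nodes whose degree falls in a range. The function names and the toy edge list are hypothetical; inside Cytoscape the same selection is done interactively in the filter panel rather than in code.

```python
from collections import Counter

def node_degrees(edges):
    """Count how many edge endpoints touch each node (a self-loop counts twice)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def select_by_degree(edges, low, high):
    """Return the set of nodes whose degree is between low and high, inclusive."""
    return {node for node, d in node_degrees(edges).items() if low <= d <= high}

# Hypothetical toy network: 'hub' has 3 connections, 'a' and 'b' have 2, 'c' has 1.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
selected = select_by_degree(toy_edges, 2, 3)
print(sorted(selected))
```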
So you can create any number of filters and make Boolean combinations. And then, I think lastly, I wanted to illustrate this Style panel. So for any attribute here, in this case, for instance, I have a bunch of gene expression data loaded up. These are log2 fold change gene expression data points that come from experiments where people have knocked out individual genes, and they captured the expression values and the p-values associated with those. So "sig" is significance, and it's a p-value. And "exp" is the expression value, and it's log2 fold change. You can take any of the information in the table and map it to visual properties. So I'm going to take these expression colors. I think somebody already mapped these in this session file. If I click on fill color, that's the color of the nodes. You can see that somebody took the GAL1 expression values and used a continuous mapping function to map them to a blue-to-yellow gradient. So you can change that. Let's change it to one of the other expression values, like GAL80. I change that, and then everything's updated here. So now the GAL80 expression values are mapped to color. And if I double click here, I can change this. If I don't like yellow, I can change it to red. And now everything's mapped from blue to red, et cetera. So this is highly configurable. I can change the border width. I can make the border width bigger, like make big thick borders, or even thicker borders. Okay. So now the borders are really thick. And I can map the border color. I'll just quickly add another section here. So now I've mapped two different gene expression values to the nodes. One is on the border and one is on the center. And you can sort of see here that I'm basically now comparing two gene expression signatures.
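A continuous mapping like the blue-to-yellow gradient described above is, at its simplest, a linear interpolation between two colors. The Python sketch below shows that idea under stated assumptions: the function name, the two-stop gradient and the value range of -2 to 2 are illustrative choices, and Cytoscape's own continuous mapping editor supports more control points than this.

```python
def continuous_color(value, vmin, vmax, low_rgb=(0, 0, 255), high_rgb=(255, 255, 0)):
    """Map a number in [vmin, vmax] to an RGB color between low_rgb and high_rgb.
    Values outside the range are clamped to the nearest end of the gradient."""
    if vmax == vmin:
        raise ValueError("vmin and vmax must differ")
    t = (value - vmin) / (vmax - vmin)  # position along the gradient, 0..1
    t = max(0.0, min(1.0, t))           # clamp out-of-range values
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low_rgb, high_rgb))

# Hypothetical log2 fold change values mapped onto a blue-to-yellow gradient.
for log2_fc in (-2.0, -0.5, 0.0, 1.5, 2.0):
    print(log2_fc, continuous_color(log2_fc, -2.0, 2.0))
```

Swapping `high_rgb` for `(255, 0, 0)` gives the blue-to-red version, and applying a second mapping of the same kind to the border color is what lets one node carry two expression values at once.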
And I think I made the border too thick, because it's occluding the center. Yeah. So you can see now that there are certain patterns, like I might see things that are upregulated in two conditions, or downregulated in one and upregulated in another. So this illustrates how you can overlay multiple different types of data together on the network, and how you can integrate that data. If you have different experiments, you might be able to integrate the data in different ways. And there are more complicated ways of doing this. For one, you can actually plot something like a time series as a chart in the software, and each node will get a little chart on it. There's a whole range of charts: pie charts and line charts and bar charts. So this is the power of Cytoscape, this Style panel here, where you can take any data and map it to any visual property. And there are lots of different visual properties available: different colors, widths, shapes, lines, the thickness of the lines, and so on. Okay, so that's Cytoscape. We did the introduction to Cytoscape so that I can tell you about this, which is what we really wanted to talk about today, and which will be the focus of the lab. And that is helping to interpret the results of the enrichment tests that we've learned about in the previous part of this day. Okay, so enrichment analysis, as we've learned, is an excellent idea. It's been used in tens of thousands of papers. Pretty much everybody who does genomics runs this type of pathway enrichment analysis by default, and it generates these nice tables of pathways and scores. But one of the problems with it, as I described this morning, is that there's a lot of redundancy in the pathway databases.
So when you look at these long lists, you often notice that there are pathways that are related to each other, like B cell mediated immunity and myeloid cell differentiation. If you were an immunologist, you would immediately recognize a number of these pathways as being related to immunology, but they don't always use the word immune or something like it. So if you know a lot of biology, you can relate these things together, but especially in long lists, it's time consuming, and you'd also like to group all of the similar things. So my lab developed a tool that helps visualize the results of enrichment analysis. We call it Enrichment Map. There's one other tool like this, called ClueGO, that's available in Cytoscape. Enrichment Map is a bit more modular because it supports GSEA, and ClueGO just does Gene Ontology enrichment analysis with a hypergeometric test. So we made Enrichment Map to support a wider variety of enrichment analysis methods. The basic idea is that you can take your gene sets and visualize them as a network, and I'll explain how this works. So we learned about ranked gene lists and GSEA, and we know that we can get pathways that are enriched in the top part of our ranked gene list, and also in the bottom part. So what we do is we take all the data from GSEA or g:Profiler that you just used, and we load it into Enrichment Map. Enrichment Map takes each pathway and converts it into a node in Cytoscape. The significance score is translated to the node color. Just like I showed you in Cytoscape how you can take gene expression and map it to the color of a node, in this case we took the significance score from the pathway enrichment analysis and mapped it to the color of the node.
The node size is proportional to the number of genes in the pathway, the gene set size. These aren't mapped correctly here, but bigger nodes will be bigger pathways with more genes. And then connections between pathways indicate crosstalk, or shared genes. The thicker the edge, the more overlap there is between the pathways. So cell cycle and spindle have a lot of genes in common, and so they get a thick connection. Proteasome shares no genes with spindle, so it doesn't have a connection. In this case we used GSEA, so pathways that are enriched at the top of the list are colored red, and those enriched at the bottom of the list are colored blue. In general you can think of these as enriched in A versus B, and these as enriched in B versus A. For those of you who are interested, these edges are just computed using this simple overlap score. Actually, there are a couple of scores available. So I'll just go through the use cases of Enrichment Map. For this presentation we took a published data set of gene expression measured from breast cancer cells that were treated with estrogen or untreated. They had multiple time points. At one of the time points they had three replicates of estrogen-treated cells versus three replicates of untreated cells. We ran it through our pathway enrichment analysis with GSEA, and then we made an enrichment map. So this is what the enrichment map looks like. You can see it provides a very nice visual overview of the pathways. Each of these circles is a pathway, again, and all of these pathways are related to each other. They have connections, and we labeled them as all related to translation. You can zoom in on one of these things.
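The talk does not spell out the edge score on the slide, so the following Python sketch shows two standard gene-set similarity measures, the Jaccard coefficient and the overlap coefficient, as plausible examples of a "simple overlap score". The function names, the cutoff of 0.25 and the toy gene-set memberships are all assumptions made for illustration; the memberships are not real annotations.

```python
def jaccard(a, b):
    """Shared genes divided by all genes in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

def similarity_edges(gene_sets, score=jaccard, cutoff=0.25):
    """Return (set_x, set_y, score) for every pair scoring at or above the cutoff."""
    names = sorted(gene_sets)
    edges = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            s = score(gene_sets[x], gene_sets[y])
            if s >= cutoff:
                edges.append((x, y, s))
    return edges

# Made-up gene sets: cell cycle and spindle share two genes, proteasome shares none.
toy_sets = {
    "cell_cycle": {"CDK1", "CCNB1", "BUB1", "PLK1"},
    "spindle": {"BUB1", "PLK1", "KIF11"},
    "proteasome": {"PSMA1", "PSMB5"},
}
toy_edges = similarity_edges(toy_sets)
print(toy_edges)
```

In a visualization the score would then drive edge thickness, and pairs below the cutoff, like proteasome and spindle here, get no edge at all.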
So you can zoom in and see the actual pathway names, like microtubule organizing center and centrosome, and in this case these are all Gene Ontology terms. We would use biological process terms by default, though in this particular case these might be cellular component terms as well. So again, you can see very quickly that there's a bunch of pathways that are up and a bunch of pathways that are down. I usually call a group of pathways a theme, like a functional theme. So this is a translation theme. This is a microtubule cytoskeleton theme. And again, you can just view these much more quickly. So the basic idea is that this is a visualization technique that helps you more quickly process all the results of enrichment analysis. Okay, so here's an example where you have two time points. In this case, just like I showed you in Cytoscape, we're mapping the enrichment score at the early time point to the middle of the node and the enrichment score at the late time point to the border of the node. You can see that most of the nodes are all red or all blue. That means that the pathways are enriched at both time points. But a couple of pathways, like ubiquitin-dependent protein degradation, are only enriched at the late time point. At the early time point it's not enriched in treated versus control. And you can actually look at the heat map if you load your expression data into Enrichment Map. You can see that indeed, for this particular pathway, which is APC-dependent protein degradation, the gene expression is basically similar in treated and untreated at the early time point, but at the late time point it's very different. And so that's what this is highlighting. And here's an example of the reverse pattern. Okay, so the third and last case is using an additional gene set and relating it to all the pathways that you've visualized.
So here we took gene expression data that was measured in a microRNA knockout in mouse heart. All of these red and blue circles here represent pathways that are going up and down. And then, because this was a microRNA knockout experiment, we took the predicted targets of this microRNA. We represented that set of predicted targets as another node, in this case a little triangle here. And then we basically computed a statistic that looks at the overlap of the targets with these pathways. What we can see is that some of the pathways have a lot of microRNA targets in them, and some of them don't. As you might expect, the pathways that are going up have a lot of microRNA targets in them, because when you remove the microRNA, which is a negative regulator, you expect its targets to go up. And the ones that are going down don't have that. So in general, that's validating the idea that the microRNA is targeting these processes. But it's not targeting all the processes that go up. Some of them don't have any microRNA targets. So this would give you some information, perhaps, about which pathways are being directly regulated by the microRNA and which ones are indirect. That starts giving you more information about mechanism than just looking at the pathways. You might start actually explaining why the pathways are going up and down. And this is another example, from a paper that Veronique and Shahina from our group participated in, where we took the gene expression data and used a tool, similar to one that we're going to learn about on day three, to predict the transcription factors that are important in the gene expression analysis. We then mapped those targets to pathways in the same way that I just explained with the microRNA, again to see which pathways a particular transcription factor of interest might regulate.
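The talk does not name the statistic used to score the overlap between the predicted targets and each pathway. One common choice for this kind of question is a one-sided hypergeometric test, sketched below in Python purely as an assumption; the function name and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_pvalue(overlap, pathway_size, target_size, universe_size):
    """Probability of seeing at least 'overlap' targets in a pathway of
    'pathway_size' genes drawn at random from a universe of 'universe_size'
    genes, of which 'target_size' are predicted targets."""
    max_overlap = min(pathway_size, target_size)
    total = comb(universe_size, pathway_size)
    tail = sum(
        comb(target_size, k) * comb(universe_size - target_size, pathway_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical numbers: a 50-gene pathway containing 12 of 300 predicted targets,
# in a universe of 15000 measured genes.
p = hypergeom_pvalue(overlap=12, pathway_size=50, target_size=300, universe_size=15000)
print(p)
```

A small p-value here would suggest the pathway holds more predicted targets than expected by chance, which is the pattern described for the pathways that go up in the knockout. With many pathways tested, the p-values would also need multiple-testing correction.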
You can look at that paper if you're interested in more details. The autism example that I showed you this morning was made with this Enrichment Map idea. It happened to use these pathway databases, just as an example to show what we did here, and this is how many pathways we used. We wouldn't use all of these by default, as I've mentioned many times today. So Enrichment Map is an app that you can get for Cytoscape, a free app again. It allows you to load up your results from GSEA or g:Profiler and visualize them like this in a fairly straightforward way. Once you've identified interesting pathways, you can zoom in on them, as I explained. So this is the enrichment map. It gives you a bird's eye view, an overview, of the whole experiment. You might identify a theme of interest, and then you can go even further and identify one specific pathway. So this is the Reactome apoptosis pathway represented as a gene set, and it's just one little circle in this big diagram. But actually, if you go and look in Reactome, it's a complicated pathway, and you can overlay your gene expression data on it and identify more interesting information. In this case we realized that one particular complex in this pathway was really differentially expressed, as opposed to some of the other things. There are also a couple of additional apps available. I should have mentioned one called AutoAnnotate. We also have a WordCloud app that helps you summarize a theme. By default Enrichment Map gives you this map, but you don't have these bubbles that I show here. In this case these bubbles were manually added, and this was a while ago.
Now we have a system called AutoAnnotate that automatically adds those bubbles, and it automatically chooses a label for each bubble based on the names of the pathways that it groups. But you can also look at all of those names, and at the frequency of different words in a group of pathways, using the WordCloud app, and you might use that to get a better name for some of these annotations. We'll go through that in the lab. Ruth Isserlin programmed this originally, and when she presented it in a lab meeting she was really excited and made a big cookie to bring to the lab meeting about it, which I always like showing because she was so excited about the software. So that's basically it. That's the introduction to Enrichment Map.