I'm going to give you an overview of network visualization and analysis with Cytoscape. It's a pretty simple, quick overview, and the reason we're doing this now is that we gave everybody pre-reading that covers most of what we want to talk about. Hopefully you were also able to install Cytoscape on your computer. I don't know if we asked this earlier, but was there anyone who wasn't able to install Cytoscape, or had problems installing it, that we could look at?
So we're going to talk about network analysis and visualization. Then, for the later part of the afternoon, we're going to return to enrichment analysis, but with a network view of the results, which is helpful. That's why we're doing Cytoscape first. It's a little bit of a diversion, but we're still going to come back and focus on the main topic of today, which is enrichment analysis.
OK, so this morning I introduced you to this sort of generic pathway analysis workflow. There was another paper that we published quite a while ago that focused on using biological networks to analyze gene expression data with Cytoscape. It's a little bit out of date. We're working on an updated version, but that's not ready yet for this course. I'm just referencing it here in case you're interested.
So I'm going to talk about networks in general and what they mean, then network visualization, then Cytoscape, and then a little bit on network analysis.
OK, so I included this paper in the pre-reading. It's just a few short pages, and it covers a lot of what I want to mention. The basic take-home message is that networks represent relationships, and they can be different types of relationships. You can have nodes connected by physical interactions, where the nodes represent proteins.
And the edges, or relationships between the nodes, can represent protein-protein interactions. We might call that a physical interaction, because the proteins are touching each other. You can have a regulatory interaction, like A activates B. You can have a genetic interaction, like two genes being synthetic lethal, which means that if you knock out gene A, nothing happens, and if you knock out gene B, nothing happens, but if you knock out genes A and B together, then the cell dies. So that means those two genes are important and they're buffering each other: one is a backup for the other. And you can have functional interactions, which basically means that the two genes are functionally related in some way. There are lots of different types of relationships that are useful for relating the function of gene A to gene B, like sequence similarity. If one gene has a very similar sequence to another, it's likely that the other gene has a similar function. If they're co-expressed across many conditions, they probably have a similar function. If they have similar protein domains, they probably have a similar function. So you get the idea.
I'll tell you a few reasons why we are talking about networks. First, they're useful for discovering relationships in large data sets, where they're just much better than tables in Excel. If you have a spreadsheet with columns of relationships, it's hard to see the big picture from that. Networks provide a visual way of seeing the big picture, so you can identify the relationships. The main patterns that we like to look for are covered in this primer. Networks are also useful for visualizing multiple data types together, to see interesting patterns that you couldn't see if you were looking at each data type separately as a table.
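To make the idea of typed relationships concrete, here is a minimal sketch of how the interaction types described above could be stored as a network in plain Python. The node names, the `build_network` and `neighbors_by_type` helpers, and the choice to treat every edge as undirected are all assumptions for illustration (a regulatory edge like "A activates B" is really directed); this is not how Cytoscape stores networks internally.

```python
from collections import defaultdict

# Hypothetical interactions: (node_a, node_b, interaction_type)
interactions = [
    ("ProteinA", "ProteinB", "physical"),    # proteins touch each other
    ("ProteinA", "ProteinC", "regulatory"),  # A activates C (direction ignored here)
    ("GeneD", "GeneE", "genetic"),           # e.g. a synthetic lethal pair
    ("ProteinB", "ProteinC", "functional"),  # e.g. co-expression
    ("ProteinA", "ProteinB", "functional"),  # one pair can have several edge types
]

def build_network(edges):
    """Build an undirected network: node -> neighbor -> set of edge types."""
    network = defaultdict(lambda: defaultdict(set))
    for a, b, kind in edges:
        network[a][b].add(kind)
        network[b][a].add(kind)
    return network

def neighbors_by_type(network, node, kind):
    """Return the sorted neighbors of `node` connected by an edge of type `kind`."""
    return sorted(n for n, kinds in network.get(node, {}).items() if kind in kinds)

network = build_network(interactions)
print(neighbors_by_type(network, "ProteinA", "physical"))
print(neighbors_by_type(network, "ProteinA", "functional"))
```

Keeping the edge type on every edge is what lets you later ask "what do the nodes and edges mean?" of any view you draw from the data.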
And then network analysis is another reason why we want to think about networks in biology, and I'll give you a very, very basic example. The idea is that there are a lot of biological questions that can be answered using network analysis algorithms, and many of these algorithms were worked out in the field of computer science. People in computer science, and before that in math, going back more than 100 years, have studied the concept of a graph, which in math and computer science means a network. We don't use the term graph too frequently here, because if you just ask people what a graph is, they usually think of a plot, like on graph paper. But just so you know, that's the jargon used in computer science, and the field is called graph theory. The interesting thing about graph theory is that it has developed a lot of algorithms for graphs that answer all sorts of questions.
So this slide just illustrates an example. How many people have heard of the concept of six degrees of separation? So a few people. This is the idea that everybody in the world is connected to everybody else through a social network by at most six links. So if you take any two people in the world, there's some connection, through Facebook or something, of six links or fewer. This was popularized by an experiment by Stanley Milgram, a famous social psychologist, who wanted to investigate how people know each other. He designed an experiment where he asked people to mail a postcard through their friends from Boston to New York. So people in Boston mailed postcards, and they didn't know the address. They just knew the name of the person and that they lived in New York.
Each time a postcard was mailed and the next person got it, they were supposed to send a postcard back to the researchers, so the researchers could track where the postcards were going. A whole bunch of postcards actually got through to the target person, and they found that the average path length was only six steps. So that's where this idea came from.
If you have a network of all sorts of people and how they're related, like the Facebook network, you might want to know how two people are connected. From graph theory, there's an algorithm that has been mathematically proven to give you the shortest path between two points, in terms of the number of links. It's called shortest path by breadth-first search. If two nodes are connected, it will tell you the shortest path, and if they're not connected, it will tell you that. The nice thing is that it's mathematically proven to always be correct. So if you have a protein interaction network and you want to know whether two proteins are connected and how, you can use that algorithm. You can just take it from computer science. You don't have to do all the theory and the math proofs that they did to make sure that it works. You can just use it.
Now, you might ask, is it biologically relevant to know that one protein is connected to another by a shortest path? Maybe that represents an important pathway. But maybe it's not really an important pathway. Maybe the proteins along that path don't actually talk to each other. So you do need to consider the biological relevance of the result. But this example illustrates the point that you can take work from the field of computer science and use it to answer specific questions. And it saves a lot of time, especially if the question is a good match.
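As a rough illustration of the shortest-path idea above, here is a minimal breadth-first search sketch in Python. The function name `shortest_path`, the toy network, and the single-letter protein names are made up for this example, and the network is assumed to be undirected and unweighted, stored as a dictionary from each node to its set of neighbors.

```python
from collections import deque

def shortest_path(adjacency, source, target):
    """Return one shortest path (list of nodes) from source to target,
    or None if the two nodes are not connected."""
    if source == target:
        return [source]
    previous = {source: None}   # node -> node we reached it from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in previous:
                continue        # already visited
            previous[neighbor] = node
            if neighbor == target:
                # Walk back from the target to the source to rebuild the path.
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

# Toy undirected interaction network with hypothetical protein names.
toy_network = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
    "F": set(),                 # an isolated protein
}

path = shortest_path(toy_network, "A", "E")
print(path, "links:", len(path) - 1)       # one of the two 3-link paths
print(shortest_path(toy_network, "A", "F"))  # None: not connected
```

When several shortest paths exist, as between A and E here, this sketch returns just one of them. As noted above, whether that particular path means anything biologically is a separate question.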
So over the past 15 years of network research in biology, people have come up with all sorts of different methods that use these network algorithms to answer questions in biology. Tomorrow, Quaid will talk to you about gene function prediction. (The problem with this pointer is that it doesn't appear on my screen. Let's see if this works. Yeah.) In gene function prediction, if you have a gene whose function you don't know, you can try to find out how it's connected, via protein interactions and other types of interactions, to other genes. If you know the function of those other genes, that might tell you something about the function of your gene of interest. So again, it's using a network idea.
People have found that you can look for dense regions in a network. In a protein interaction network, dense regions typically represent protein complexes. So if you have a big protein interaction network that you might get from proteomics experiments, which started generating this kind of data around 2001, 2002, you can run graph clustering algorithms to identify the protein complexes and predict new ones. People have used networks to study network evolution, how pathways change between species. You can also predict new interactions and new functional associations. One approach that people used was to find dense regions in a network where there are a lot of connections but some are missing. Maybe those connections just have not been identified experimentally, and you would predict that they should occur. That actually works reasonably well at predicting new protein interactions.
People have also analyzed a lot of disease data and come up with a lot of interesting ways of using networks in disease informatics, I guess you could call it.
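Going back to the gene function prediction idea mentioned above, here is a minimal sketch of the simplest possible version: let the annotated neighbors of an unknown gene vote on its function. This is a plain neighbor vote written for illustration, not the method that will be presented tomorrow, and the function name `predict_function`, the gene names, and the annotations are all hypothetical.

```python
from collections import Counter

def predict_function(network, annotations, gene):
    """Return (function, vote_fraction) for the annotation most common among
    the gene's neighbors, or None if no neighbor has any annotation."""
    votes = Counter()
    for neighbor in network.get(gene, ()):
        for function in annotations.get(neighbor, ()):
            votes[function] += 1
    if not votes:
        return None
    function, count = votes.most_common(1)[0]
    return function, count / sum(votes.values())

# Hypothetical data: geneX has no known function, its neighbors do.
network = {"geneX": {"g1", "g2", "g3"}}
annotations = {
    "g1": {"DNA repair"},
    "g2": {"DNA repair"},
    "g3": {"translation"},
}

print(predict_function(network, annotations, "geneX"))
```

In this toy case two of the three neighbor annotations are "DNA repair", so that is the suggested function. Real methods weight the different interaction types and look beyond direct neighbors.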
So one interesting paper, describing PinnacleZ from UCSD, tries to identify regions of a protein interaction network, or another network, that are transcriptionally active in a disease context versus control. That might help you identify pathways, or regions of a network, that are correlated with the disease. People have used network information for diagnosis of disease, where it has some of the same advantages as pathway information, and also in a GWAS type of approach. So a lot of the types of analyses that you can do with pathways you can do with networks, and there are some things that are more natural to do with networks. The nice thing is that there are plenty of methods out there that could be useful for you.
Networks are fairly simple models. If I give you a protein interaction network, it's a model of what's going on in the cell, but it's missing a lot of information. Networks, and pathways as well, or I guess I should say gene set versions of pathways, are typically represented as static. So you might not see in the network that there's an ionic wave going through a neuron, or a feedback loop that's negatively regulating the pathway. There are mathematical simulation tools available if you really know a lot about the pathway and you want to simulate it and see how it works at every little microsecond. You can do that, but we're not covering it in this course. Usually, to do that, you need to accumulate a lot of detailed data, which we often don't have.
Another thing that's missing is detail about the atomic structure of the protein. Proteins have a structure. They have domains. They have phosphorylation sites. We usually don't see that in a network, although sometimes you do. And then context, which was mentioned earlier, like the cell type and the developmental stage, we don't always see in the network.
Usually when we look at a network, it's the combination of all the interactions in the organism, whether they occur in one tissue or another. So if you want to filter that based on gene expression in the brain, for instance, you can do that and get a more brain-relevant network.
OK, so that was a very quick overview, again because we provided pre-reading. The main point is that networks are useful for looking at your data and identifying relationships and patterns in it. It's very important when you see a network to understand what the nodes and edges mean. Often in biology nodes represent genes or proteins, but they don't have to. They could represent other biological concepts. And the edges often mean some functional or physical interaction, but they could be sequence similarity relationships or something else. So whenever you see a network, always ask what the nodes and edges mean.
And there are many methods available for gene list analysis using networks. Sometimes there's an overwhelming number of them. I'll show you in a minute the number of tools that are available for Cytoscape. There are hundreds of them, so it's hard to know which tool is right for answering your question. There are two ways of handling that. You can try to be an expert and read through everything, but that's very time consuming. The other way, if you have a very specific question, like "I have my gene expression data and I want to find pathways that predict outcome," is to ask on a mailing list or forum, like Biostars or the Cytoscape mailing list. With a specific question, people might be able to recommend an existing tool for you.
OK, so any questions?
So the question is about networks often being visualized on a flat plane. They're 2D visualizations. You can make a 3D view, but a 3D view of a network is often difficult to work with unless you can rotate it.
The reason is that when you view a network, you want to reduce the overlap between the circles and the lines, and the layout algorithms work to reduce that overlap. If you have it in 3D and you don't get a chance to rotate it, there's a lot more overlap, and that overlap harms your ability to see the relationships, because you can get lost following an edge. If you have a 3D view and you can rotate it, then you can eliminate that, and sometimes the 3D view helps you by putting more information on the screen. In terms of more dimensions than that, you could try to add more somehow, but usually we're limited to two or three.
Can you do it in 3D? Yeah. So the question is, can you encode something else in the third axis? Yes, you could. In network visualization, it's really up to your imagination what you want to do and how you want to encode the information. For that particular case, I'm not familiar with a feature in Cytoscape that lets you do it easily, but with simple scripting you'd actually be able to do it. Any other questions?
OK, so now I'm going to talk about Cytoscape, just the basics, and I'm going to give a demo. Cytoscape is a freely available network visualization and analysis tool, originally developed in Seattle at the Institute for Systems Biology by Trey Ideker and Benno Schwikowski. Then Trey moved around, and now he's in San Diego, and Benno's now in Paris, and as they moved, they brought Cytoscape with them. I'm one of the contributors to Cytoscape, along with some other people in San Francisco and some companies. So it's an open source project that has grown in popularity because multiple people have contributed to it. That's a nice thing about open source projects.
And this workshop series in general tries to highlight open source projects, because they're freely available and scientists can contribute to them, make them better, and customize them. So from a development point of view, it's usually a more efficient way of getting things done.
By default, Cytoscape mostly provides features to visualize a network, and then you can download apps that help you with many other types of network analysis. By default, you can manipulate networks and move them around. You can lay the networks out, which helps with visualization. You can filter, and you can search in databases. But the real power of Cytoscape is that there's an app store. The number on this slide is a bit outdated. I just checked, and I think there are over 280 apps that have been developed by people the world over and contributed to the Cytoscape ecosystem. You can search the app store to find analysis packages and other things that might be useful for you.
There are other network visualization and analysis tools out there, a number of them. There's another one developed in Toronto called NAViGaTOR. But Cytoscape is the most popular one, mainly because it was freely available and open source from an early point, so a lot of people contributed to it. And because there are a lot of contributions, there are a lot of users and a lot more features available for it. This number is actually old, too. I think we're at 14,000 downloads a month, which is a lot, and 250,000 people have downloaded it in its lifetime. Because of that, there are a lot of users on the mailing list, a lot of documentation, a lot of answers to questions online, and a lot of apps that people have contributed. You can build your own apps. Right now, that requires programming in Java. In the future, we're trying to make it easier.
So the main take-home message here is that it's an active community, which you can take advantage of. I'm not trying to sell Cytoscape. You can use other tools if you want. Some of the other network analysis packages might have unique features, and you can use them if they're useful, but Cytoscape probably has the most features associated with it. It's a nice, active community. We had a Cytoscape conference a few years ago where people spelled out "Cytoscape" here.
OK, so again, I'm not going into too much detail because we gave people pre-reading so we could save time during the lectures. So I think I've covered the quick summary.