Some background about network visualization and analysis, and then I'll talk about a particular type of pathway analysis visualization called Enrichment Map, which was used to make the autism network that I showed first thing yesterday morning. We'll also go over Cytoscape, which everyone is familiar with now. The reason we're still giving some intro material is that we felt, and I think this is based on feedback from previous classes, that it's better to get you started by getting your feet wet with lots of different tools rather than spend a whole day going over concepts and only at the end get to actual tools. So we put more tools at the beginning, even though we hadn't gone over all the concepts, and we'll fill in some of the concepts afterwards. You've now actually used networks and looked at networks, so some of these concepts will be partially familiar, but we'll go through them to round out the conceptual side of the lectures.

Okay, so just to summarize, or come back to what we've been talking about in this course: there's a general network analysis workflow. I showed this yesterday, so I think everybody's seen it now. What we learned yesterday and this morning are ways of finding interesting areas of networks. For instance, taking your gene list and converting it into a network with Reactome FI, then identifying modules and doing pathway enrichment analysis on those modules: that's a little workflow within here that helps you identify interesting networks, and then you can look at genes and go out to Reactome and drill down to understand molecular mechanism.
This little image is updated from an older version that was published previously, so I think this paper, Integration of biological networks and gene expression data using Cytoscape, is still valuable although it's much older now. It still talks about different aspects of the same workflow.

Okay, so Lincoln also mentioned some of this, and I just wanted to go over the background about networks again fairly quickly to fill in some gaps. Networks in general represent relationships in biology. Those relationships are often physical, regulatory, genetic or functional. Physical relationships are protein-protein interactions. Regulatory relationships are transcription factor to target; there's an arrow there. Genetic interactions are more often found in model organism research, where you have a logical relationship between two genes in terms of genotype and phenotype. For instance, you knock out gene A and nothing happens, you knock out gene B and nothing happens, but you knock them both out together and the cell can't survive. That's a synthetic lethal genetic interaction, and there's a lot of discussion about genetic interactions in pretty much any field of genetics. Functional interactions are more general. They are any kind of interaction that can be used to help relate the function of two genes or proteins, so there's a wide variety. For instance, if two proteins have similar sequences, they might have similar function. If they have similar expression profiles over many experiments, they might have similar function. So any type of pattern that relates genes, where you can say they might have similar function, is a functional interaction. The Reactome FI network is composed of functional interactions, which are themselves combined from many other types of relationships. So why do we actually bother with networks?
One, they're useful for discovering relationships in large data sets; they're much easier to use than finding relationships in tables, which is what you would have to do if you didn't have this idea of a network. Two, they're very useful for visualizing multiple types of data together: you can have a network, overlay different types of information on it, and see how they're related. And of course they're useful for network analysis. This is an example from the pre-reading that we sent out, the primer on how to interpret biological networks, where we have a lot of different information overlaid on this network. That's useful, and we'll come back to this later.

I think Lincoln mentioned the difference between pathways and networks as well. My take on it is that pathways are networks; they just have a lot more detail. There can be different types of detail. You could have a metabolic pathway, where it's really important to have the steps that transition between metabolites. You can have a regulatory pathway, where it's really important to think about the logical circuits of the cell, how one part of the cell is controlling another part of the cell. And you have signaling pathways, which are about information flow, so they represent that information. So there are actually lots of different ways of representing pathways, and there's no real, official definition of a pathway. One way of thinking about it, as a sort of general definition, is that it's any gene that's related to a process. So how could you figure that out? You have a way of stimulating a process and a way of reading out a process. Maybe you have a drug that you put on a cell, and that activates a receptor, and some pathway occurs, and then a gene gets expressed.
So you can read out, you can check that the gene's expressed, and you can control the activation of the pathway. Then, if you knock out genes in the system, any gene that affects that transition, that changes the ability of the signal to go through, is part of the pathway. That's one way of thinking about it. Metabolic pathways are more biochemically oriented; there you might think about flux of material through a system, where you're really just talking about the major channels of material flowing through the cell. I'm sure many people here know this, but sometimes people ask what the official definition of a pathway actually is, and there's no real, very specific definition.

So, network analysis. On the one hand, networks are useful for seeing relationships and visualizing data, but the most powerful reason people have gotten excited about networks in biology is that there is a large amount of work being done with network analysis in other fields. In computer science and math, network analysis has been discussed for more than 100 years. People didn't call it network analysis, they called it graph theory; some people may have heard that term. Computer scientists and mathematicians call networks graphs. That's confusing, so we don't use that.
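To make the graph vocabulary concrete, here is a minimal sketch, assuming a small made-up set of gene-gene interactions. The gene names and the helper names `build_adjacency` and `shortest_path` are hypothetical and not from the lecture. It stores a network as an adjacency list and runs one classic graph-theory algorithm, breadth-first search, to find a path with the fewest hops between two nodes:

```python
from collections import deque

# Hypothetical toy interactions: each pair is an undirected edge.
EDGES = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_D"),
    ("GENE_A", "GENE_E"),
    ("GENE_E", "GENE_D"),
    ("GENE_F", "GENE_G"),  # a separate, disconnected component
]


def build_adjacency(edges):
    """Turn an edge list into a node -> set-of-neighbours mapping."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search: return one fewest-hops path, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}  # also serves as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the recorded predecessors.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(adjacency[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


adjacency = build_adjacency(EDGES)
path = shortest_path(adjacency, "GENE_A", "GENE_D")
no_path = shortest_path(adjacency, "GENE_A", "GENE_G")
print(path)
print(no_path)
```

Edges are treated as unweighted here, so "shortest" only means fewest hops. Whether such a path means anything biologically is a separate question.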
Most people, when you say graph, think plot, so we don't use that terminology in biology. We say network because it's more intuitive. But at the same time we can go look at graph theory and see all the algorithms that have been developed over more than a century, and some of those might be useful for biology. So at the beginning of this network analysis field, people looked to graph theory and said: I have all these biological questions, I can just take all these algorithms from computer science and they answer my question, I don't have to develop anything new. That was extremely powerful, and lots of computer scientists came up with all sorts of interesting ideas for solving biological problems. A lot of those are used in the tools we are using today. The Reactome FI system that finds the modules is a network clustering system. People have been working on network clustering for a long time, so now when you want to do that in biology to find modules, you just choose one of the at least 1,000 different algorithms that exist to do that. They chose one that worked well and was fast, so we don't have to develop it ourselves.

So here's an example of how this works, a very simple example. How many people have heard of six degrees of separation? For people that haven't heard of it, this is the idea that everybody in the world is connected to everyone else by at most six hops in terms of friendship. This was developed in the '60s; I'm sure it's like two or three now with Facebook. But it's an interesting idea. The person who came up with it was a sociologist named Stanley Milgram, who wanted to learn about social networks and did an experiment where he took postcards and gave them to people in Boston and said: send them to this person in New York; you have the person's name and what they do, but you don't have their address
and you're not allowed to look up any information about them; you just have to use your friend network and send the postcards through your friends to see if they will eventually get there. Half the postcards got to the person. Each time a postcard was sent on, he asked people to send a postcard to him as well so he could track where they were going, and in the end he found that it took on average about six steps before it got to the right person. So that was really interesting.

Now we have this Facebook network, so you have it all on the computer. If you want to do that same experiment, if you want to know how two people are connected, what could you do? Which path would you take? It turns out that computer scientists have solved all these problems of how to search these networks and find paths between people. For instance, there's a shortest path algorithm, breadth-first search. It searches from a node outwards until it reaches all of the other nodes, and eventually it will tell you the shortest path. The good thing about that algorithm is that it comes with a couple of mathematically proven guarantees. One, it's guaranteed to find a path if one exists. Two, it's guaranteed to give you a shortest path; there might be more than one shortest path, but you are guaranteed to get a shortest one. So that's interesting. It's a standard algorithm; every undergraduate in computer science learns it. If you have this giant biological network and you want to know how two proteins are connected, you could just apply that algorithm and find that path.

Now, is that biologically relevant? It's definitely a solution from computer science, but that path is the shortest path; does that mean anything in biology? Maybe it means something interesting, maybe not. It doesn't consider some important aspects of biology. So we do have to be careful
when we're moving algorithms from computer science over to biology. Whenever you're using network analysis, if you eventually use it enough, you can identify these core algorithms that are used over and over again; just be aware of what's missing, what they're not taking into account.

You've seen a few already and you'll see more later, but network analysis is useful for all sorts of things. Basically, a number of algorithms from computer science have been ported over, plus a lot of new algorithms are being developed in computational biology, to answer all sorts of interesting questions: predicting the function of a gene; finding modular structures, which could be protein complexes or pathways; studying network evolution, so if you have networks from different species, there are alignment algorithms for networks just like there are alignment algorithms like BLAST for protein sequences; and predicting new protein interactions. One of the ideas there is that you find a region of the network where almost everything is connected, so maybe everything should be connected, and those additional connections you can fill in as predicted new interactions. That's one idea, but there are many ideas like that.

There are also a lot of applications of network analysis to disease. People have published papers about identifying disease-specific subnetworks, which could be networks that are transcriptionally active in your cases versus your controls, or your experiment versus your control. People have used networks for diagnosis, and I mentioned this briefly yesterday morning. There have been a couple of papers that are quite highly cited now where, instead of looking for genes associated with, in this case, breast cancer metastasis, they used networks, and they searched for
networks that contained proteins that were connected, whose genes were all differentially expressed, and that were correlated with outcome in this breast cancer study. All three of those things were searched for at once, and these networks came out, and they actually overlapped a lot better than the individual gene signatures. That's what I was mentioning yesterday. Similarly, people have applied this to GWAS studies as well.

Okay, so there was a question about this yesterday, and I'm sure you have been thinking about it: what's missing in networks and pathways? We mentioned yesterday, in response to a question, that pathways and networks basically don't have any context. They're representing dynamic processes as static networks, so you're not seeing an animation of the process when you look at a network or a pathway; you see a textbook-like picture. So that's missing. I mentioned yesterday morning that one place to get that information is your own experiments; your own experiments provide important context. A network is also only useful for representing a certain part of what we understand about how the cell works. A feedback loop is not easily represented. A calcium wave in a neuron is basically impossible to represent with the typical network and pathway representation. A lot of things that we know occur in the cell just don't show up on these pathway diagrams. There are more detailed modeling representations that exist, if you ever need to do this. We're not going to cover it here, but people have developed a huge field of work to help simulate pathways, and if you know a lot about a pathway you can predict all sorts of things about it, like: if I knock out this gene, what's going to happen to everything else in the pathway? A lot of detail is missing from the protein
structures as well. We know a lot of detail about the atomic structure of proteins, and proteins are made up of domains and have binding sites, and none of that is really represented. And I mentioned context.

Okay, so just to quickly summarize: I think everyone understands that networks are useful for a wide range of things, and there are many, many tools available for network analysis. I think I mention this later, but I'll just mention it now. Because there are so many methods available, how do you know what method to use for your problem? If you go to the Cytoscape App Store, which I'll mention, I think there are more than 200 apps now. Each one of those pretty much represents a method that somebody's developed, and that's not all the methods that have been published. In this course we're teaching you the major classes of methods, and we highlight the most useful and user-friendly tool in our opinion, so that might actually be fine for anything that you do. However, if you come across something that you want to do that's not covered by that, there are two ways of dealing with it. One is obviously to become an expert in everything, but that's not feasible unless you're really getting into the area. The other way is just asking experts. You can ask any of us in this course; that's a good opportunity to do that. There are also mailing lists. For network analysis, the Cytoscape discussion mailing list is particularly useful if you have a question like: I want to answer this biological question, here's my experiment, is there some way that I can do this? In relation to Cytoscape at least, people will answer if there's an app available. You can also go to the App Store and see what the popular apps are, or just look for publications; if you go to Google Scholar and see how many people have cited the publication, that's usually a
good indication that something is working well. Okay, any questions?

Okay, so again, this is basic. This is a figure from the pre-reading; hopefully everybody did the pre-reading. This just reminds everybody that there are different ways of thinking about a network. Often, if you're representing a network in a spreadsheet, you'll have a set of relationships in just two columns, usually gene A connected to gene B, going down those columns. Sometimes you can have a score associated with that, like an interaction strength. This is the standard way of representing networks, and you can also represent networks as a heat map, which we don't really cover that much but is maybe useful. So why would people use one of these versus the other? If they're sending a network file, it needs to be represented as a table, so it would be represented like this. This is how we prefer to visualize networks. But if the networks are really, really strongly connected, or everything's connected to everything else, this visualization breaks down. You can't really see anything because there are too many edges in the network. In that case a heat map visualization is much better, because it's using all the space and there's no possible conflict between edges crossing over each other.

Network visualization is really dependent on automatic network layout, so we'll talk about that. If you didn't have network layout algorithms, which again are something that was taken, stolen, from computer science, then when you draw a network it would look like this mess. Once you apply a layout algorithm it looks a lot nicer and you can see structure, so that's very important. Just a practical point: there are many, many different layout algorithms, and in the lab I urge you to just try all of the
different ones out that are in Cytoscape. The major class of network layout algorithms that people tend to use first is called force-directed. The way it works is it sets up a system where nodes are repelling each other, so there's some force pushing nodes apart. Usually it's based on physics, and the forces will be like charges: if you put two magnets with the same pole together, they'll repel. The physical formulas that model those processes are usually embedded in these algorithms. Then edges are usually pulling the nodes together, so you have this tension between edges pulling nodes together and nodes pushing each other apart. The pulling edges are usually represented as springs. The reason you want nodes to repel each other is that you don't want all the nodes overlapping on top of each other, and the reason you want edges to pull is that you want things that are closely connected to be close to each other in the network. Here, for instance, all of these guys are highly connected and they're all close to each other, so we can see this nice blue region here, whereas here they're just all over the place. These blue guys have been pulled together because they're connected to each other; there are a lot of forces pulling them together.

So force-directed layout is definitely the first thing that you should try with any network. In general it's good up to a certain size of network, around 500 nodes. With bigger networks you have problems: as I mentioned, you get a hairball. If you have too many edges you'll just get a lot of edge crossings and you won't be able to see anything. How many people have seen this already in their Reactome FI trials? If that ever happens, the practical advice is to reduce the number of edges somehow. For Reactome FI, I think every edge is associated with a confidence value, and I don't know if you guys go over
this in the documentation, but if too many edges come back you can filter and remove the lower-confidence edges. Do you guys do that? Yeah. So that's probably not that big of an issue with Reactome FI, but in general, if you're working with networks, you'll see in GeneMANIA that sometimes the networks get really big.

There are also other types of network layout algorithms. If you happen to be working with a network that is more like a tree, like a pedigree or a phylogenetic tree, then there are specific layout algorithms that lay that out hierarchically. And generally you need to do extra work to get a publication-quality version of a network. You should not just rely on the network layout algorithm, because it's not perfect. If you want a publication-quality figure, you should do two things. One, manually adjust the layout so that nothing is overlapping. Two, save the network as a vector graphic, which is PDF, and then load it into a drawing program like Adobe Illustrator or CorelDRAW (there are free alternatives as well) and adjust things so that you can emphasize certain things. Some of the labels might be overlapping, so you can move them around; you can add bubbles or arrows or color things appropriately. If you take care to do that, you'll get a much better looking network than if you just print out whatever comes out of Cytoscape.

Okay, so please interrupt me with questions. Actually, one thing I forgot to add to the slides: people often ask what the meaning of the length of the edges is when you run a network layout algorithm. That doesn't have any biological meaning. The length of the interaction is just set by where the nodes are positioned after running the layout algorithm. So if you have a
long connection or a short connection, they're all considered the same by the standard layout algorithms. There are versions of the layout algorithms, also available in Cytoscape, for when you want the length of the edge to relate to a confidence value that you have on the connections. You can use an edge-weighted spring, force-directed layout and it will consider those weights: shorter edges will be stronger interactions and longer edges will be weaker interactions. If you want to maintain that in your figure, then you have to be careful when you're adjusting the layout, but usually the adjustments that are needed are fine-tuning adjustments. The biggest problem when you're visualizing networks is overlap, so you want to reduce that overlap. Nodes and labels shouldn't overlap each other, otherwise you can't see something. The other big problem is edges that cross. The layout algorithm is trying to remove as many edge crossings as it can, but often it's impossible to remove all of them, because for some networks, and in practice for most networks, it's just not possible to draw them in 2D without edge crossings. But if you strategically move things around, you might be able to reduce them. You might also be able to move whole sections of the network around so that you switch their positions. Sometimes that's useful for layout purposes, and sometimes it's useful for biological purposes. Maybe you want to group certain types of proteins together. The network layout algorithm doesn't necessarily know where they should go, but you want to put all the nucleolus proteins here, or nuclear proteins here, and cytoplasmic proteins here. So you can move around these modules and clumps if you want. When we get into the Enrichment Map later, we do a lot
of that moving about and editing, and most of it's done in Cytoscape, and then we do what I mentioned in terms of making a publication-quality figure. Any other questions?

So the question is, when would you actually use these strengths? Often we don't have that strength information for interactions. In some systems that we use, like Reactome FI, the confidence or strength of the interaction is there, and then we can choose to use it. It may not be that useful for visualization purposes, but it's often very useful for filtering. Especially if you get too much information, you can remove the weaker interactions and then focus on the stronger ones. That's often a very valuable technique for dealing with too much information, as I mentioned here. So if you have too many connections, you can filter by strength, if you have this information, and often you get really amazing results: you'll see structure in a hairball that looks like this. There is actually structure in here. You can sort of see these modules, and if I were to do some filtering on this I might really see those coming out and see how they're connected to each other, whereas they're hidden right now.

I don't know if that answers your question. I wouldn't worry too much about the edge weights, because often we don't have them. You'll see in GeneMANIA that some of the networks have weights and some of them don't. Sometimes there's a natural weight that comes with a network; if you're using co-expression networks, the natural weight is the correlation coefficient. So it really depends on your networks. With protein interactions you rarely have it, with correlation-based networks you do have it, and with functional interaction networks you often have it, so it depends a little bit on the
type of network that you're using. Yeah. So I wouldn't say that it's important to think about all the time. It's not a core concept that we need to worry about too much, but if you happen to have that information then it could be useful for filtering, and then you may want to visualize it. That doesn't come up too often in papers that are typically published with network analysis.

So I mentioned filtering. The other obvious way to deal with too much information is to zoom in, and often you want to zoom in on some area of interest, like a set of genes that's involved in a phenotype or a set of genes involved in a pathway.

Okay, so you've played with networks and you've seen how they're laid out. Today we'll get into a few more options for doing things, especially in relation to the use of Cytoscape. There are a lot of different ways to visualize network information, and there are a lot of attributes that you can use for nodes and edges that can be mapped to various types of visual attributes. The core idea of a network just has nodes and edges; there's nothing else, that's it. What we've added in biology is a label: clearly you need a name for the node, and you usually don't have a name for the interaction. Then often there's lots of information that we attach to nodes, because nodes represent genes and genes have lots of information, like Gene Ontology categories. So we can attach whole spreadsheets to each node, basically, and to each interaction. Those can be mapped to visual attributes. For instance, if I have a set of functional categories, say 10 Gene Ontology categories, I can map those to colors and have different colors in the network that are related to function.
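The mapping idea can be sketched in a few lines. This is a minimal illustration, assuming a made-up node table and made-up color choices. The names `discrete_color`, `continuous_size` and `style_nodes` are hypothetical, and this is not Cytoscape's API, only the concept of a discrete mapping (category to color) and a continuous mapping (number to node size):

```python
# Hypothetical node table: one row of data per gene (all values invented).
NODE_TABLE = {
    "GENE_A": {"function": "kinetochore", "expression": 2.0},
    "GENE_B": {"function": "nucleosome", "expression": 8.0},
    "GENE_C": {"function": "replication fork", "expression": 5.0},
    "GENE_D": {"function": "unknown", "expression": 0.5},
}

# Discrete mapping: each functional category gets its own color.
CATEGORY_COLORS = {
    "kinetochore": "#1f77b4",
    "nucleosome": "#2ca02c",
    "replication fork": "#d62728",
}
DEFAULT_COLOR = "#cccccc"  # for categories without an assigned color


def discrete_color(category):
    """Look up the color for a category, falling back to the default."""
    return CATEGORY_COLORS.get(category, DEFAULT_COLOR)


def continuous_size(value, lo, hi, min_size=20.0, max_size=60.0):
    """Linearly interpolate a node size from a numeric attribute."""
    if hi == lo:
        return min_size
    fraction = (value - lo) / (hi - lo)
    fraction = max(0.0, min(1.0, fraction))  # clamp to [0, 1]
    return min_size + fraction * (max_size - min_size)


def style_nodes(table):
    """Compute a color and size for every node in the table."""
    values = [row["expression"] for row in table.values()]
    lo, hi = min(values), max(values)
    return {
        name: {
            "color": discrete_color(row["function"]),
            "size": continuous_size(row["expression"], lo, hi),
        }
        for name, row in table.items()
    }


styles = style_nodes(NODE_TABLE)
for name, style in styles.items():
    print(name, style)
```

The same pattern extends to edges, for example mapping a correlation value to edge width, by keeping the data table and the visual rules separate.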
And really this is up to your imagination and how you draw the network, but it's a useful concept to think about: you have data in your network and you can visualize it. That idea of mapping data to visual attributes in the network is one of the core parts of Cytoscape, and the network that we published in that primer uses Cytoscape. So maybe I'll just go through it quickly. The color is related to some functional term: kinetochore, nucleosome, replication fork. The edge width is related to the correlation of gene expression. An edge is there if it was found as a protein interaction. The size of the node is proportional to the maximum expression level of the transcript in this experiment, which is tracking the cell cycle in yeast. Then we used a force-directed automatic layout. We manually moved some of these things around, but mostly it was the force-directed layout that made it look like this. Then we saved it as a PDF, and in Adobe Illustrator we added this purple circle here, this purple cloud I guess, we added this arrow, and all these little clouds. We moved the labels around a little bit so they were more visible, so that nothing was overlapping the labels and you can actually read them, and we added these large labels here in the drawing program. Actually, some of these things you can now add in Cytoscape; it's called annotations on the network, and we can look at that. But they probably look better if you add them in Adobe Illustrator.

As we discussed in this primer, obviously it's useful for data relationships. There's this idea of guilt by association, where you can predict the function of genes if they're linked to a lot of genes of known function, and we'll talk a lot about that tomorrow, with Quaid telling you about GeneMANIA. These
dense clusters, you've seen how they're useful in the Reactome FI plugin, and global relationships between these clusters might tell you something interesting. Okay, any questions about network visualization? Yep? Sorry, what was the question? You can just have the name, and the names can be linked to colors. I'll show you how that works, in Cytoscape at least. Okay, any other questions about network visualization?

So this sort of summarizes what we've talked about, but because it's pretty basic I think we'll go over it quickly, and we can move on to network visualization and analysis using Cytoscape. I'll talk about Cytoscape as software, some of the basics and some examples. Everyone has used Cytoscape, so the goal of this section is, again, to fill in gaps and answer questions about Cytoscape: can Cytoscape do this or that? That's usually the most useful thing.

Cytoscape is free software for network visualization and analysis, originally developed at the Institute for Systems Biology in Seattle by Trey Ideker and Benno Schwikowski. Trey was a PhD student with Lee Hood, who coined the term systems biology, I think, plus he did a lot of other interesting things like invent automated DNA sequencing. Benno was faculty there, and they both worked on the first version of Cytoscape. Then they moved to different places; Trey is now in San Diego and Benno is in Paris. Multiple people have joined the consortium over time. After Cytoscape was made, it was made freely available as open source. Open source means that the source code is made public and anybody can contribute, and it turns out lots of people contributed to this project. I got involved pretty early because I was working on something similar in my graduate work, developing a network visualization system. But when I saw Cytoscape, it looked like they were ahead, so I
said, okay, I'm not going to work on mine anymore, let's just collaborate and work on this together. So that's one nice thing about open source development: you can focus more on science. You don't have to reinvent the wheel; you can collaborate on developing tools that everybody needs, and then you can develop the more important stuff that's more related to your research. Oops, looks like this was cut off here. As you guys have seen, Cytoscape provides lots and lots of different functionality, and it's very useful for visualization. It's not the only network visualization software; there are probably quite a few others out there. This is by far the most popular one, and the only one that's really available for free and open source.

Yeah, so Ingenuity is more about pathway analysis. I can mention that in general, since I think it's related to this entire course. I don't think of Ingenuity as a network visualization system. It does have some network visualization parts in it, but mostly, the good thing about Ingenuity is that it's a commercial product with a really nice system that they've made. You have to pay for it, but it works well, it's smooth, and they have documentation and support. Another good thing is that they've spent a huge amount of money, like 50 million dollars, collecting the biggest network database that they can collect. They don't give that out to anybody; it's only accessible via Ingenuity Pathway Analysis, and you basically can't buy it. So compared to public data like ReactomeFI, the Ingenuity Pathway Analysis database is bigger than anybody else's. That's interesting; how useful that is in terms of actually getting better
results is probably case-dependent. But the disadvantage is that it has a very limited set of features compared to what's publicly available. They've taken like three types of analyses, and all of the analyses that we talk about in this course are more advanced than those statistically, and there are better concepts in general, because they're really kind of stuck in analysis that was popular originally seven or eight years ago. For instance, they have an enrichment analysis, but they don't do what GSEA does with the ranking. They're limited to having a cutoff of focus genes that you analyze, so that misses out on what we discussed as advantageous for GSEA. They also have a way of finding networks that are enriched in your gene list that's pretty good, I think, and it's an interesting analysis. They have a lot of additional data, like drug interactions, that they make very easy to see. But there's a wider variety of things available publicly, I'd say. We have one paper where we really compared Ingenuity and Cytoscape on the same data set, and I could tell you that reference; it was published last year, I think. What we could see is that a lot of the same themes came up in either one, but there were actually some unique to one and some unique to the other. So in that case there was a little bit more information that you got by doing both, but if you had done one or the other, in terms of the types of themes that you saw enriched, the major ones were all there. So if you have a license to it, it's definitely useful. Yeah, I think it is useful, mainly because the database that they have is the biggest, and it's nice and user-friendly. But, you know, you can get
away with not having something user-friendly as long as it's good, right? But really the key thing scientifically is that their database is very different from the public databases. I'm sure they pull in all the public databases as well. Yeah, so it's like the controversy over the public genome project and the private genome project: the private genome project took all the public data to help their assembly. But that's fine, and that's why we're making data available for free, so everybody can use it.

Okay, so Cytoscape is really focused on networks; it's not really focused on gene lists. This course really talks about gene lists: starting with a gene list and then doing analysis on the gene list. With Cytoscape there's a step that you need to do to convert your gene list to a network, and the ReactomeFI app is very useful because it does that for you, right? You give it a gene list, it gets a network for you, and then it does some analysis. We'll talk about GeneMANIA tomorrow, which is another app that can do that for you. But by default Cytoscape doesn't do that: when you just download Cytoscape and start running it, it starts with a network. That's particularly useful if you have a network already, and many people do. If you're studying protein interactions, for instance, mapping them with proteomics, the first thing that you want to do with your data is visualize the network, and for that Cytoscape basically can't be beat. Right off the bat it's useful for that. But that's one confusing aspect of how it fits in this course: this relationship between gene list and network. So just keep in mind that when we go through Cytoscape we're talking about networks, but somehow you have to get the network from somewhere, right? Lincoln talked about networks and pathway databases, and we've looked at ReactomeFI, so there are ways to do that, but what I'm
talking about here is sort of the typical Cytoscape case, where you load a network and that's where you start. Network information comes from databases and is loaded in, and then you do some analysis on that network. Cytoscape has lots of different features. These slides are out of date, but I'll go over the release version. By default it doesn't really help you do much analysis; it's just a visualization, filtering, and querying system. The automatic layout is there, and there are databases that you can search, which is useful. Probably the most valuable aspect of Cytoscape is that it's a freely available system and it allows people to write extensions, which we call apps now. We used to call them plugins, so now we have an app store; that was more fashionable, I guess. How many people have gone to apps.cytoscape.org already? Okay, only a couple. We can look at that, but apps.cytoscape.org has over 200 apps that you can download, and they do different types of analysis, like the ReactomeFI app.

Cytoscape has thousands of users, and I think these statistics are even out of date now; I think it's like 8,000 downloads per month, and thousands of people are using it per day. There's quite a lot of documentation. The reason I'm mentioning that is not to say, oh, it's great that everybody's using it; the reason is that it's an active community of users that helps new users as well. There's a very busy mailing list, and if you ask a question, there's a bunch of people on that mailing list who will help answer it. We try to guarantee an answer to the question within a week, so you shouldn't wait longer than one week. There is quite a bit of documentation
and data sets that people are making available. Tutorials are available, which you guys went through. There's an annual conference; the next one will hopefully be in Boston sometime this year. And there are all these apps that are being developed by this community as well. So if there's something missing in Cytoscape, you can build it if you know how to program, or if you have a friend who knows how to program. Many people have done this, and usually each one of these apps has a publication. This is just a picture from a Cytoscape meeting that we had in Toronto a few years ago, where everyone's spelling out Cytoscape, for fun. So it's a fun community.

Okay, so Cytoscape is useful free software for network visualization and analysis. It's not the only one. By default it provides basic network manipulation features, but you really need apps and data to make it useful for any particular task. That's one of the difficulties with Cytoscape, because it's a workbench. Imagine you're building something with tools and you come to a workbench: you have all the tools, but how do you know which order to use the tools in to actually make a table or something like that? Whenever you have a workbench situation you're always faced with that problem: you have lots of tools, so how do you string them together into useful things? That's why in this course we're talking about workflows and telling you about some of these useful workflows. There are many in Cytoscape, and once you get to know it, there are lots of things that you can do with it. It's very powerful, but that comes at the cost of having to learn it.

Okay, so I'm going to demo Cytoscape features here. It's actually Cytoscape 3.1.0 that we recommended for this class, although just a few days ago or maybe a week ago
we released 3.1.1, which fixed a lot of bugs. Okay, so I think everyone saw this: the welcome screen is the first thing that comes up, and this was added to really try to help people get started with some network data. If you're working on a different organism, you can click to download its network. If I click on any of these buttons, the networks that are going to be downloaded are actually quite big, because it's all the protein interactions for a whole organism; it could be more than 100,000 connections. It should download pretty quickly, but then you'll get a network that will be difficult to work with and to visualize. So this is useful for certain workflows that need a network to start with, where you then query the network with your gene list to identify interesting parts; you can start with a network from here.

I loaded up the sample session file that comes with Cytoscape, and I'll use that to show you how it works. As everybody saw, you can zoom in and out and move nodes around, and you can select a bunch of nodes and move them around. There are lots of different layout algorithms. If I just click this button here, Apply Preferred Layout, it will lay things out using the force-directed layout, and that is the Cytoscape force-directed layout. There's a way of setting the preferred layout: if I want any layout here to be the one that gets run when I click this button, I can do that in the layout settings. You can see there are all sorts of interesting layouts, so I urge you to try these out yourself and see how they all look. Okay, maybe I should mention a couple of things about specific layouts, so yFiles used to be our
favorite layout. yFiles Organic is a force-directed layout and used to be the one that I recommended everybody start with, because it's really good and fast. The one problem with yFiles is that it's a commercial product, so some people had issues with that, because you couldn't use it in your own app. So we made a Cytoscape force-directed layout, which is one of these layouts. Actually, it used to be called the Cytoscape layout, but I think we renamed it to one of these force-directed layouts. There's a bunch of other layouts that can lay things out based on attributes. There's a bunch of attributes that have already been loaded up; I'll talk about how those work. I'll just try a... okay, no, that wasn't the one I wanted... a group attributes layout, where I group things by some attribute. In this case it's the number of connections, so it doesn't really make sense, but you may have noticed that there are circles here. There are multiple circles, and each of these is a different attribute value. If I had gene functions attached to the network, I could lay out the network based on functions, and each function would be grouped into its own circle. Okay, so you get the idea.

You can do some manual positioning of nodes, which is helped by these functions here. You can rotate sets of nodes, so that's kind of fun. You can scale nodes, which is very useful if there are a lot of things bunched up together and you just want to spread them out, like that. And then there's also aligning and distributing, so you can align them all to one side or the other, and that might be helpful for manual layout. Okay, so that's layout. I also talked about what to do if you have a hairball and you want to filter. One thing that you can do if you have too
many nodes and edges is use this little search box at the top to type search terms, and it gives you an example search query. You can use a star for anything. If I want all the genes that start with Y, it's everything, because these are yeast genes, so let's try R. So here, basically, it found all the genes that start with R, and it highlights them. If I have some highlighted and I want to zoom in on them, I can just click to zoom to the selected region and it will give me those right away. Okay, more. Usually if you want to cut a bunch of nodes out from a larger network, you select some nodes and then create a new network based on those selected nodes. This allows you to zoom in on that network, and you can re-lay it out, and now it's more manageable as a smaller network that you can play with. The old network is up here, so you can go back and forth.

Yeah, the normal layout, as I mentioned, doesn't consider the thickness or anything; it's just positioning the nodes. But if you do have some value on the edge (and I'll just look here, there is a value that was calculated, and I'll talk about all these attributes in a sec), then you can use the edge-weighted layout on that attribute. In this case it's called edge betweenness. So if I do that... now what happened here? It didn't look like it worked very well; the network was made really small. I think the problem is that these numbers are too big, and I need to change some settings here, which I didn't set up. Normally these settings expect edge weights that are between zero and one, and I think these numbers are zero to 20,000 or something like that, so it's not working very
well. I don't think I'm going to be able to fix this quickly; I'll see if I can fix it later. But in general it is possible to use some edge weight to lay the thing out. We only use that occasionally, but sometimes it's quite useful. If you're working with co-expression networks, or any kind of correlation network, it's actually very useful; for most other networks it's not that important. Yes? Yeah, and sometimes those network weights will come with the network from wherever you're getting it from.

Okay, so I think you guys noticed that when you select a node or an edge, there are attributes at the bottom here that show you the type of data that's loaded up. I will try to load up some more data to show you how that works, but there are attributes for nodes, edges, and the network. We rarely use attributes for the network, but node attributes we use all the time. In this case we have names, and we have a bunch of values that were computed based on the structure of this network that relate to how highly connected nodes are. There's also gene expression data that was loaded up from experiments that had three different knockout conditions in yeast, and the expression ratio and the p-value associated with each are loaded up, so there are three experiments and six columns related to that. And that's what's loaded up here.

I could also load up additional data. One thing that I can load up is Gene Ontology information. So if I go to File, Import, and I select yeast, and let me try the yeast slim ontology, I will import that. I don't know how long that will take on this network connection. Okay, there's a problem here... I think it loaded it all up, so
now I have just one node here, and I have a lot more information: molecular function, the evidence code, a reference, biological process. What didn't get loaded up here is names for these, so I think there was some issue with loading this up; maybe I'll try another network to see how that loads. Oh no, maybe it... okay, it put it somewhere else. Sorry, it's actually here. There is a column here that shows you all the different GO term names: vacuole organization, translational initiation. So now that I've loaded that up, I can start searching for, you know, "translation" here, and it highlights all the nodes that have translation attached to them in some way. Okay, this is going to show next... I'll just turn that off. I think you guys noticed these abilities to zoom and to move the network around.

Okay, a couple of other things I wanted to show you with Cytoscape. Let's do this one next, so we'll just zoom in here. You can right-click on nodes and edges, and a lot of functionality is built into these right-click menus. You can edit the network, you can add edges, you can select the first neighbors, and you can group the nodes. If you group them (let's call it group one), now these nodes are grouped. If there's some interesting function and I want to reduce those, I can go to this group menu and say collapse groups, and then they collapse into a single group node. This is now a group, not a protein, so it has proteins inside. I can expand the group, and then it goes back to its original nodes. That's sometimes useful for grouping things. We don't have a really good way of visualizing those groups yet.
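As an aside, the collapse step described above can be pictured as a small graph operation. The following is only a minimal Python sketch under stated assumptions: the function name `collapse_group`, the edge-list representation, and the example node names are all hypothetical, and this is not how Cytoscape itself implements groups. It replaces every member of a group with a single group node and hides edges that lie entirely inside the group; expanding simply means going back to the original edge list.

```python
def collapse_group(edges, members, group_name):
    """Return the edge list of the collapsed view.

    edges      -- list of (source, target) pairs (hypothetical representation)
    members    -- set of node names that belong to the group
    group_name -- name of the single node that stands in for the group
    """
    collapsed = set()
    for source, target in edges:
        # Any endpoint inside the group is redirected to the group node.
        s = group_name if source in members else source
        t = group_name if target in members else target
        if s == group_name and t == group_name:
            continue  # edge lies entirely inside the group, so it is hidden
        collapsed.add((s, t))
    return sorted(collapsed)


# Hypothetical example: proteins B and C are put into "group 1".
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
collapsed_edges = collapse_group(edges, {"B", "C"}, "group 1")
print(collapsed_edges)
```

Because the original edge list is left untouched, expanding the group in this sketch needs no extra work: the uncollapsed view is still `edges`.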
So that's something that will be coming later. There's a little bit more grouping that's possible if you have another network that you want to visualize within this one. What I did was add a nested network, and so now when you zoom in here there's actually a little network in here. That's an interesting function if you want to build up networks hierarchically, but most of the time you probably won't be doing that. Okay, so that's nodes. You can do the same thing if you click on edges; you'll get a different set of menu items. When you start Cytoscape, if you don't have any apps installed, there's nothing in this Apps menu, but if you add apps, some of those apps could add functionality into these menus.

And then if you right-click on just the background, there's some interesting functionality that we call annotations. You can add a shape annotation, an image annotation, text, or bounded text. So for bounded text, I want it to be a rectangle using this font, and this is what it will look like. Let's make the font bigger, and we'll bold it, and now I can put it here. Something odd is happening as I'm using it, but... oh yeah, here's the text that I want to add. So I can type in something like "cell cycle", change my settings, and then put it where I want it to be. It's not the best user interface, and that's why I kind of prefer Adobe Illustrator right now; I think in the future this annotation system will get better. I can delete it, but it's not easy to move it around, and it doesn't look as good as it would in Adobe Illustrator. So while you can do that, it's there and you can save it, and that's definitely useful, it's not always the best looking. So I can delete it and
delete this one as well. But there are other things that I can add. I can add a shape annotation, like a custom shape or a rounded rectangle, and I'll just add that around here. So if I'm making a network, I can put nodes into a rectangle, and that rectangle will just sit there. Sometimes that's useful.

Okay, the Select menu allows you to select all the nodes and edges, or select nodes that are first neighbors of selected nodes. If you have a gene list in a file, you can select all the nodes that are in that gene list, using "from ID list file". Basically, that allows you to put all of the identifiers of your genes in a file, and then if you have a network loaded, like the whole human network, you can say select all the genes that are listed in this file. It will select them, and then you can do things with them: you can move them to your network, or you can just lay those out.

Okay, so getting into more interesting things. There was a question about why these edges are thick here. This little window here for Style allows you to set up visual styles. How many people have played with visual styles already? A couple of people. Visual styles are one of the most powerful features of Cytoscape. They allow you to make all the visual magic that we've seen in network visualization happen. There are just a few concepts here. One is that there's a whole list of visual properties, like the color of nodes. Right now nodes are colored by some value; I happen to know that they're colored by gene expression. Let me change it to the default, so this is probably how the network would look if I just loaded it up. Then I have all these attributes here, and I want to color things by attributes. One way I can do it is I can just select a bunch
of nodes and say, okay, I want these nodes to be red. That's often the first thing that people try. It's not the recommended way of doing things in Cytoscape, and I'll tell you why in a sec, but it's sometimes useful. You do that by selecting these nodes, going to Fill Color, clicking this little box here, and saying I want them all to be red. Okay, so now I unselect them and they're all red. What I did there was set up what's called a bypass. There are three columns here. There's the default; I'll just change the default to another color, so you see that all the nodes change color when I change the default, if there's no other setting on them. The red nodes I set to be red, so I bypassed any setting that's automatic, including the default, and so they'll stay red. The problem with a bypass is that it's not dependent on the data, and the powerful aspect of network visualization is visualizing data. So let's get rid of the bypass: I just click here and remove the bypass, and now they're all default.

Now I can use this middle column, which is a mapping. If I click on this I get two options. One is a column, and I'm going to choose gal1RGexp, which I know is gene expression values. They're log2 ratios, so they range from some negative number to some positive number, with zero meaning no change in expression in experiment versus control, positive meaning the experiment is higher than the control, and negative meaning the experiment is lower than the control. The other is a mapping type. There are three different mapping types: continuous, discrete, and passthrough. What I want here is a continuous mapping. A continuous mapping maps a range of numbers to a continuous visual attribute
like color, and a discrete mapping would be, you know, taking Gene Ontology terms and labeling them by color: each term gets a color. If I double-click on this I get an editor, and I can say, okay, I want things that are under-expressed to be red and things that are over-expressed to be blue, and everything in the middle is going to be... let's add another point here... white. I'm going to set that to zero: I can double-click on this number and type zero, and so now zero is white. So let's click OK. As you can see, as I'm changing this, the network is changing automatically, so that's kind of useful. Okay, so this node is below my threshold, so it gets colored black, and I think there are some nodes that might get colored white.

So there's a wide range of visual properties here. If you click this little Properties button you can see only certain ones are checked, so those are the ones that are active. The most commonly used ones are checked by default, but there are additional ones. So you can play with the width of the border, the label, the label color, the label font, the label size, and the shape of the node. Maybe I want all my nodes to be octagons, so now they're all octagons. I can change the size, et cetera. This is useful sometimes if you want the labels to all be inside the box. So you might say that you want the shape to be a rectangle and the size to be slightly bigger. Let's try 50... that's not big enough, so let's try 60, and now the labels are in the box. Unfortunately there's no automatic way of sizing
things according to the labels, like there should be. By default, if you just want a rectangle, you can uncheck this Lock node width and height, and now the height and the width are separate, not just one size for the node, so the height can be smaller, and it'll be smaller here. Okay, let's see what else. Transparency is kind of nice sometimes for edges where you have weights, which I keep on mentioning.

Okay, one additional thing that's kind of hidden but sometimes useful: if I change this to a discrete mapping, and let me choose the biological process evidence code, now I have Gene Ontology evidence codes here. If I want to color nodes by evidence code, then I can set up colors here. Usually you just pick a color, but if you want a whole bunch of colors, you can right-click, choose a mapping value generator, and just say Rainbow, and it will give you a whole bunch of colors automatically. Now my network is rainbow colored, for whatever use that is in this case. Usually that would be useful if you have some categorical data where you want each category to be a separate color.

Okay, so I showed all these visual properties for nodes; you have the same things for edges. Edge width, for example, you can set to be a continuous value based on edge betweenness. Let's just try it, and now you can see the width of the edges is modified according to this edge betweenness, which I can talk about exactly what that means. Yeah? So the curvy lines are called edge bends. There are a couple of options in the Layout menu that do the curvy lines. One of them is Bundle Edges, and this might be a little bit buggy in this version, because I know there was a bug here.
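As an aside on the mappings demonstrated earlier, the two ideas can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cytoscape's own code or API: the function names `continuous_color` and `rainbow_colors`, the control-point values of -2 and 2, and the black out-of-range color are all chosen here for the example. The continuous mapping interpolates linearly between control points (red for under-expressed, white at zero, blue for over-expressed), and the rainbow generator spreads hues evenly over the categories of a discrete mapping.

```python
import colorsys

RED, WHITE, BLUE, BLACK = (255, 0, 0), (255, 255, 255), (0, 0, 255), (0, 0, 0)


def continuous_color(value, points, out_of_range=BLACK):
    """Map a number to an RGB color by linear interpolation.

    points is a list of (value, rgb) control points sorted by value,
    with distinct values. Numbers outside the control points get
    out_of_range (an assumption made for this sketch).
    """
    if value < points[0][0] or value > points[-1][0]:
        return out_of_range
    for (v0, c0), (v1, c1) in zip(points, points[1:]):
        if v0 <= value <= v1:
            t = (value - v0) / (v1 - v0)  # position between the two points
            return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def rainbow_colors(categories):
    """Give each distinct category its own hue, evenly spaced."""
    names = sorted(set(categories))
    colors = {}
    for i, name in enumerate(names):
        r, g, b = colorsys.hsv_to_rgb(i / len(names), 1.0, 1.0)
        colors[name] = (round(r * 255), round(g * 255), round(b * 255))
    return colors


# Hypothetical log2 expression ratios: red at -2, white at 0, blue at +2.
expression_points = [(-2.0, RED), (0.0, WHITE), (2.0, BLUE)]
print(continuous_color(0.0, expression_points))
print(rainbow_colors(["IDA", "IMP", "TAS", "IDA"]))
```

The point of the sketch is the contrast with a bypass: here every color is computed from the data, so changing a value or a control point changes the picture.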
So let me bundle all the edges. What it tries to do is put the edges together a little bit if they're similar. It doesn't really do much in this network, but here's an example where these two edges were kind of close to each other and then they got bundled, so they overlap in this part. Because they're both going into the same node, it's not losing information, but it's saving you space, basically. So now these edges have bends in them. I can clear the edge bends and they disappear. Also, if I have more than one edge between two nodes... let's see if I can quickly set that up here. Let me select two nodes and add an edge connecting the selected nodes. What happened there? I don't know if an edge actually got created; let me try that again. So now I've created a second edge. Usually you don't have two edges between nodes, but sometimes you can have multiple different interaction types represented by those different edges. In this case the first edge is always straight and the second one is bent by default. You can also add a little control point: on the Mac you press Alt and click on an edge. You can add as many of them as you want, and so you can actually drag the edges around, and that might help you a little bit. So it's the Alt/Option key on the Mac, and I think it's Control on Windows... it's Alt on Windows as well. Okay. Those can be removed as well. Clear All Edge Bends clears everything, but there's actually a key to remove them, and I never use that feature, so I forget what it is now. Does anybody know? TAs? I think it's just clicking Alt again on the same node gets rid
of it. Okay, Option, sorry. Yeah, clearing all edge bends will get rid of all those bends. Yeah? What kind of sets are you... so yes, I think the answer is yes, but what kind of sets are you talking about? Yeah, if you have an attribute called groups, and you say all these nodes are part of group one and all these nodes are part of group two, group one can get diamonds and group two can get triangles. That's very easy to set up: you'd have an attribute with that information, and then you would set up a node shape mapping that is a discrete mapping, so it deals with discrete or categorical data, and you just say if it's group one it's a diamond, if it's group two it's a triangle. Okay, any other questions? Okay, so I should get going. There are just two more things that I wanted to mention.

Filters allow you to do more complicated queries on the network. You can select on any of your attributes here. So this is node degree; node degree is the number of connections per node, and in this case I'm selecting the highest-degree nodes. So let me select the nodes that have five or more connections. Then I can also select nodes that have five or more connections and that are also over-expressed in this experiment. And if I want to do OR instead of AND, I can choose that here. In general these filters are automatically applied. You can create new filters and rename them, and import and export them. There's this little button at the bottom here called Chain, which is not very useful at this point, but it allows you to chain filters together so you can make more complicated filters. A couple of other things I didn't mention: the app manager. The app
manager allows you to install apps so you can go in and see you know um look at apps look at all the apps here um and uh and install them probably what's more useful is if you go to the app store and um if you're browsing around the app store like one of the things that we um just released is word cloud and I want to install word cloud um I can just click this button here and it will it will install it for me into side escape so um now it's it's here in my side escape window so by going on the web you just click and install so that would be a much easier way of just installing apps so um the um the other thing I didn't really talk about was importing data so just just one thing to mention one last thing to mention is um I loaded up a session that has a lot of information um loaded up um but if you don't have a session already created you uh need to load it in somehow so you either load it in from a database like reactom fi you download an app you give it a gene list it gives you some networks that's probably mostly the way that you probably do it if you had network information you want to load up you usually have a file um and um there's a whole bunch of sample files here like um galfilter.xls which is a spreadsheet um an excel spreadsheet and then I get this this um dialog box that opens up that says um okay what do you want to do here um I'm selecting column one as the source and column two is the target so now these two columns are highlighted these are my interactions um all these additional columns I might want to load up um I can load them up as um you know additional attributes that will be edge attributes in this case because these are in each row here is an interaction and these these two things to find the nodes that are part of that interaction or participating in the interaction and the rest are attributes of the edge so if I load that up I will get a network that and this is this will be the default and set escape you you um um get a network that's not 
laid out um so the first thing you do is is laid out and that's obviously much more much more useful notice that when um I zoom out after a certain point the labels disappear this is an optimization to basically not take up too much computer resources while you're looking at really big networks um but if you want to have um and and so their little details of the visualization are turned off as you zoom out um if you want to have them always on you can go to uh view and say show graphics details and now whenever I zoom out there the labels are always be there it doesn't really matter too much with this network because it's kind of small if I had a really big network that might be an issue okay um any other questions there's lots of other features um you know everything that you see here usually there's a right click menu and you can do um various different things like I can delete these networks um I can rename them um there's lots of lots of options any questions general questions about side escape okay yeah no this network is a old protein interaction network that was published in science in 2001 um that was related to this yeast glucose metabolism how it was generated by yeast to hybrid protein interactions and maybe it also contained additional protein interactions from other sources so it was experimentally defined well reactome doesn't give you the full change um so usually usually then when you get a network loaded in you'll just get the connections and you may get some attributes on the nodes and maybe a edge weight like an edge like interaction strength or confidence um but um and so you have to load in all of your attributes yourself so normally what you do is you have your network and if you want to load in attributes you'd set those up in a excel file maybe you download that from bio mart like we saw yesterday and and you progressively load up data you apply the data to the network and it's all saved inside escape and then when you save your your 
sessions I didn't mention like just saving saves a session it saves everything that you've done um if I have nodes selected I have filters defined everything gets saved in your session you can take that to another computer and load it back up and everything that you've done will be the same um so that session just saves all of that information that you're loading up once it's loaded in then you can you can work with it you can do visual mappings um you can filter based on it you can analyze data based on it there there um the betweenness and other things actually there's a couple of tools that are kind of here um by default so one of them is called network analyzer so you can click analyze network and what it will do is um you can say treat the network as directed or undirected so undirected means um there's no direction of the connections of the interactions and then I click okay and it gives me a bunch of statistics about the network like clustering coefficient connected components no degree distribution average clustering coefficient all these graphs uh shortest path length distribution shared neighbor distribution betweenness centrality so um I don't find these super useful for biological insight but there's a lot of um there's a couple of reasons why you'd want to do this so um there's an idea that if you have a really big network and uh certain nodes are central in that network that they're more important and people have shown that by search looking at how protein interaction networks correlate with essentiality and the the proteins that are most highly connected in the network are correlated quite strongly with essentiality so the more highly connected they are the more likely they are to be essential genes in the system um and so one thing that you can do when you have a network is look for important nodes um you can have these this tool calculates a whole range of different values of importance um centrality there sometimes they're called centrality 
measures the simplest one is degree how many connections the node has the more connections the more important it is another one is how many shortest paths go through the node and that's called the betweenness centrality so how important is this node like if you took this node out the network would like if you took all the nodes that had high had a lot of paths going through them the network would fall apart so that they're important for keeping the structure of the network so sometimes those are useful statistics anyone has any questions about any of that stuff or or just let me know any other questions yeah so that is the confusing part of kind of teaching cytoscape in this course and how we've kind of structured it so the course is all about gene lists and as I said cytoscape when you use it and every document that you look is all about networks so um normally you would if you didn't have this course and you learn cytoscape and you want to learn it the first thing that you need to think about is getting network data into it into the into the program and so you would have some networks that you've got from somewhere um which may be difficult for you to get but there are lots of resources available for downloading networks but um you'd have that that network data that you collect from somewhere and then you load it in the welcome screen of cytoscape kind of tries to help by having single buttons that download some standard data sets they may not be the ones that you want um so in the context of this course where we're starting from gene list there are two apps that help you load up a gene list and get a network view that you can analyze in cytoscape one is reactom fi and the other one is gene mania which we'll talk about tomorrow so that's why we're talking about those yeah yeah so you can load up cytoscape and then immediately start reactom fi just like you guys did put in your gene list and then you're good and you have your whole analysis pipeline that's done 
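To make the two centrality measures described above concrete, here is a minimal sketch in Python of degree and betweenness centrality on a small undirected network. It is not how NetworkAnalyzer is implemented; it uses Brandes' algorithm, a standard way to count shortest paths, and the six-node network `TOY_NETWORK` and the function names are made up for illustration.

```python
from collections import deque

# Hypothetical toy network (adjacency lists): two triangles, A-B-C and D-E-F,
# joined by the single edge C-D.
TOY_NETWORK = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E", "F"],
    "E": ["D", "F"],
    "F": ["D", "E"],
}


def degree(adj):
    """Degree: the number of connections each node has."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness centrality for an undirected, unweighted
    network: for each node, the number of shortest paths between other
    node pairs that pass through it (Brandes' algorithm)."""
    score = {node: 0.0 for node in adj}
    for source in adj:
        # Breadth-first search from source, counting shortest paths.
        order = []
        preds = {node: [] for node in adj}
        n_paths = {node: 0 for node in adj}
        dist = {node: -1 for node in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {node: 0.0 for node in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both ends, so halve the totals.
    return {node: value / 2 for node, value in score.items()}


if __name__ == "__main__":
    print(degree(TOY_NETWORK))
    print(betweenness(TOY_NETWORK))
```

In this toy network C and D have only one more connection than the other nodes, but every shortest path between the two triangles runs through them, so they carry all of the betweenness. Those are the nodes whose removal would make the network fall apart.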
Okay, but the reason why we're talking about Cytoscape more generally is because there are all these other apps, and it's a really powerful tool because of all this functionality. So learning about it in general is useful. It's useful now during this course for the few workflows that we're doing, but also later, exploring the App Store is a really interesting thing to do if you're interested in learning about different types of network analysis that are out there that we would never be able to cover; it would take weeks and weeks to cover everything.
Would you recommend one of those two tools, then, to get the network first, rather than taking one of the species and getting them? Yeah, so I guess ReactomeFI only supports which organisms? Human only, right? So that's the limitation of ReactomeFI. GeneMANIA supports, I think the next release will have E. coli in it, but otherwise it's all eukaryotic systems, seven or eight organisms, whereas Cytoscape is general. So if you have other organisms and you can get that network from somewhere, you can still do the whole thing.
Yeah, there are different types of analyses, but say you wanted to run the same type of ReactomeFI analysis to identify modules. There is an app in Cytoscape called clusterMaker that can do the clustering for you, and there's an app called BiNGO, which I'll mention in a second, and a number of others, that can do the pathway enrichment analysis on any set of genes in Cytoscape. So even though ReactomeFI is really focused on human, if you're working on C. elegans, for instance, or anything else, you could recreate that workflow using other apps in Cytoscape: load up the C. elegans network, select your genes, get the subnetwork that's just related to your genes and maybe their neighbors, then use clusterMaker to identify the modules, and then take those modules and use BiNGO to do the pathway enrichment analysis. That would be one workflow that would be similar to the ReactomeFI workflow. Any other questions?
Okay, so let me just skip forward, because these are all just backup slides. This is the workflow that we used last year, which I updated yesterday morning. It's more Cytoscape oriented, but again it's answering the question about gene list versus network. GeneMANIA and ReactomeFI are useful for getting networks, and there are actually other ones: Agilent Literature Search, STRING, iRefWeb. You have different types of networks that you might be working with. They all have one thing in common: you're visualizing them in Cytoscape, which is useful for all types of networks. Then there are different types of analysis that you might do, like pathway enrichment analysis, regulatory network analysis, gene function prediction, which we'll talk about tomorrow, and module detection, which is what you learned about with ReactomeFI. And here are the names of the apps that are available for doing these different things. So this clusterMaker I mentioned for module detection, and BiNGO for pathway enrichment analysis.
In the rest of this slide deck there are a bunch of apps mentioned. Some of them are old apps that I probably should update, but they're still useful. One good thing to mention is that not every app that's ever been published for Cytoscape is currently available for Cytoscape 3. Cytoscape 3 is fairly new, so apps are still being ported over. If there's an app or a plugin that you like and it only works on an old version of Cytoscape, you can always go back to that old version. It works roughly the same as the current one, and that's totally valid.
There's a really interesting app, Agilent Literature Search. If there isn't a lot of information in databases about your genes, you can type them into Agilent Literature Search and it will go to PubMed and try to extract relationships automatically using text mining. So this is another way to convert a gene list into a network. It might not work with really big gene lists, but you can type in a bunch of genes and it will go to PubMed, find all the abstracts related to those genes, and then extract interactions based on text mining. It will look for sentences that say things like "A interacts with B" and then draw a connection between A and B. You can actually click on the edges that it creates and it tells you what sentence each came from, and if you don't like the sentence you can delete it. It's a good way of getting a network that you might not be able to get easily from other sources. There's an app called NetMatch that finds little feedback loops and things like that. At the end of this is Cytoscape tips and tricks. A couple of these things I didn't mention, but you can read through them; I don't really need to go over them. This is just a couple of things that might be useful if you end up using Cytoscape a lot.
Okay, so I think we're still going till three, right, Michelle? Yeah, okay. Any more questions about Cytoscape?
Okay, so what we're going to do for the rest of the day is talk about enrichment maps, and we'll see how much I can get through before the break. Then in the afternoon session we'll have a lab on making your own enrichment maps, which Veronique will lead. So, we learned yesterday about what I call a pathway enrichment test, and what people more generally call a gene set enrichment test.
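As a reminder of what such a test computes, here is a minimal Python sketch of the simplest version, a one-sided hypergeometric (over-representation) test for a plain gene list. This is an assumption chosen for illustration: it is not the rank-based GSEA statistic used later in this lecture, and the function name and the toy numbers are made up.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_gene_list, n_overlap):
    """Probability of seeing at least n_overlap pathway genes in a gene list
    of size n_gene_list drawn at random from n_background genes, of which
    n_pathway belong to the pathway (one-sided hypergeometric test)."""
    max_overlap = min(n_pathway, n_gene_list)
    total = comb(n_background, n_gene_list)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_gene_list - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 4 in the pathway, a gene list of 3,
    # and all 3 listed genes fall in the pathway.
    print(enrichment_p_value(10, 4, 3, 3))
```

A small p-value means the overlap between the gene list and the pathway is larger than expected by chance. In practice every pathway is tested, so the p-values also need multiple testing correction, which is where the FDR thresholds mentioned later come from.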
The general idea is that you have your experiment and your pathways, and you want to find which pathways are enriched in the experiment. So that's great. I took a few popular tools that have published a paper, like DAVID and GSEA and a few others, and counted the citations for them, and from just a few papers I counted 20,000 citations. So there's a huge number of papers using this method. It's very useful. However, one of the problems that you may have noticed when you're doing this is that there's a lot of redundancy in these terms. If I look at this list: taxis, chemotaxis, okay, those are related, I can see they both say taxis. But I have to really know a lot about biology to see that some of these things are related. There's a bunch of immunity things, but myeloid cell differentiation is probably immunity, lymphocyte... Some of these things are really related to immunity but they might not say immunity: inflammatory response, leukocyte activation. So there's actually a whole bunch of terms here related to immunity, and they're spread out all over this list. It would be really nice if I could put those all together automatically.
So this is a list, and it's not really the best way of visualizing this, because there are relationships between these terms. What would we want to do if we had a list where there are relationships? Visualize it as a network, exactly. If we have relationships that we want to look at, it's hard to see them in this table. So we have developed a visualization technique called enrichment map, and there are actually a couple of other similar ideas out there, that converts this data into a network. This is what I showed for the autism paper.
So the way this works: we do a lot of gene expression analysis, Bernice does a huge amount. We take the experiment, our ranked gene list, and we run GSEA, which you learned about yesterday. In this case we don't just get enriched pathways; we get pathways that are enriched in the genes going up and pathways that are enriched in the genes going down, and we color those red for going up and blue for going down. The enrichment map is a network where each node represents a pathway and each edge represents the crosstalk between the pathways. Crosstalk is measured based on shared genes. So spindle and cell cycle have a lot of genes in common, and these two pathways have fewer genes in common. The thickness of the edge is proportional to the number of genes shared, the amount of crosstalk between these pathways. Also, the size of the nodes, which is not shown here effectively, is proportional to the number of genes in the pathway, and the color of the nodes is proportional to the enrichment score that we looked at yesterday. That might be the normalized p-value or some other score that comes out of GSEA or g:Profiler.
Okay, so there are three major use cases of the enrichment map that we think about right now. One is that you want to visualize a single pathway enrichment analysis result. Here we took some gene expression data that was published for breast cancer cells that were treated with estrogen. There were three replicates, treated versus untreated. All the processing was done to identify differentially expressed genes, we used GSEA with a Gene Ontology database to find our enriched pathways, and then we made an enrichment map with the Cytoscape Enrichment Map app. So the enrichment map takes the results of GSEA and draws this nice thing, except for two things: it doesn't currently make the nice labels here and it doesn't currently make the nice bubbles here. Everything other than the labels and the bubbles is done automatically in Enrichment Map, all the coloring and the construction of the network, etc. Then we usually use PowerPoint or something like that to quickly annotate this. In the future we hope to automate that. Here's a zoom in, so you can see that each of these circles is a different GO biological process term, so it's basically a different pathway.
I should quickly mention that this is useful for getting a very quick overview of the functional themes that are coming out of your enrichment analysis. There are a few functional themes, and these ones are going up and these ones are going down. So instead of looking at this table and working out how everything relates, in a few seconds I can look at this and get a very quick sense of what's going on. Obviously you have to do some more work to interpret it. But thinking back to our workflow, where we identify interesting pathways and then drill down or zoom in, this is very useful for identifying interesting pathways, because you could say, oh, I know all of this... oops, Peter, I like when it does that. I know all of these things, but oh, this is interesting, maybe this is a new pathway that you didn't really expect to find, and so that might be something to follow up on. So maybe you could zoom in on those genes in there.
Okay, enrichment map is also useful for comparison of two enrichments. Say you have two time points. Here we have an early time point and a late time point, so we're interested in seeing how gene expression is changing over time. Now we can make an enrichment map that's basically two enrichments on top of each other. It's a simple visualization technique. The score for the enrichment at the early time point is the color in the center of the node, and the score for the enrichment at the late time point is the color at the border of the node. If I'm thinking about two different time points, I might be interested in seeing a change between them. Most of these pathways are not changing: they're red in the middle and red on the border, or blue in the middle and blue on the border, so there are no big changes. But there are a few places, like here if you zoom in, where there are pathways that are enriched early but not enriched late, and the opposite pattern here. So we can zoom in on those. In enrichment map, if you happen to have gene expression data and you load it up, you can click on these nodes and see a heat map view like this. It won't have these treated/untreated labels, but it will show you the actual heat map. You can see that indeed the genes in these pathways are really differential at 12 hours, but at 24 hours they're similar in experiment versus control. Similarly, these are the same at 12 hours and different at 24 hours. This can be really interesting, because when you cluster the whole data set you might never see these little patterns, but certain pathways are really acting differently. You can zoom in on those and see really important big changes just within those pathways, changes that are very likely to be hidden when you look at all the genes together. We call it a two-color enrichment map, and it's much better than having two tables and matching up all the themes.
The third and final use case is what we call query set analysis. Yesterday I was talking about how we'd like to explain the results that we see with some mechanistic interpretation. This is kind of a trivial example, because we already knew the regulator that's being tested. This is a paper that was published a few years ago where they knocked out a microRNA in the heart and measured gene expression. We did our normal GSEA analysis and enrichment map visualization and found that there's a whole bunch of pathways that go up and some pathways that go down when you remove the microRNA. That makes sense, because a microRNA is a negative regulator, and if you remove a negative regulator, it's like holding a balloon and letting go of the balloon: a lot of things go up, and most of the things are going up. Then we added one more gene set here, which is the set of all microRNA targets that are predicted by a microRNA target prediction system called TargetScan. Enrichment map shows you the overlap of this gene set with all these pathways, just like the crosstalk between the pathways. You can see really nice strong links between this and vesicle trafficking and angiogenesis, so a lot of predicted microRNA targets are in these pathways. Some other pathways don't have significant overlap. So this might tell you something about how direct the regulation of these pathways by this microRNA is. There are no links to the pathways that are going down, which makes sense again, because that's not an expected direct connection.
You can also do more of this type of analysis, and this is in the integrated assignment. If you didn't know the transcription factor that might explain why your pathways are going up and going down, you could try to search for it using oPOSSUM, and you talked about things like that yesterday. In this case, this is just an example.
Veronique did the analysis with Shahina in our group and found a transcription factor that seemed to be important, HIF-1 alpha, in this particular experiment. So we had an enrichment map where the targets of the transcription factor are highlighted, and all these pathways here seem to be regulated by this transcription factor. Presumably, if we looked at these other transcription factors, they might highlight different parts of this network. I don't know if we ever did that. Did you look at the other transcription factors when you did this analysis? No. Yeah, I think that's because it turned out that this transcription factor was really the most interesting one. But presumably other transcription factors that explain the data might explain different parts of this pathway map. I really like that idea, because it starts helping you interpret these maps. Even though these maps give you a nice, very clear visualization of the results, they don't do the interpretation for you, and you never get away from doing the interpretation. If the computer could do interpretation for you, it could do your homework, it could drive your car. Eventually it's going to be able to drive your car, but it turns out interpretation of scientific data is a lot more difficult than driving a car, so we're not at that stage yet. These tools are really meant to help your interpretation. They're meant to develop a number of hypotheses that can be tested, and obviously that's useful.
Here's the enrichment map that we made for the autism paper. It looks really nice because we updated it in Illustrator; Illustrator just does a better job of graphics. As you can see, we added these bubbles, all these various different types of bubbles. We actually have three different enrichment analyses that were done here. The circles are the enrichment analysis that was done on the copy number variants. The triangles are the enrichment analysis that was done on a list of just intellectual disability genes, about a hundred of them. And we also did the same thing on known autism genes, also about a hundred; these are the parallelograms here. So we're showing pathways that are enriched in known intellectual disability genes, pathways that are enriched in known autism genes, and pathways that are newly discovered in the enrichment of this copy number experiment. Then you can put those all together and see how they're related to each other. Interestingly, even though the pathways that were affected by the deletions and the copy number variants are interesting in the central nervous system, they're not exactly the same pathways that were enriched in the intellectual disability genes, but they're really connected to each other with a lot of genes in common. So this really helped support the results, because it functionally connected these new pathways to the existing known pathways. The other thing we did here, using Cytoscape visual mapping, was to map the false discovery rate of the enrichment to a color from white to red. So you can see how we put together a lot of the things that I just told you about today to make this nice figure. Let's zoom in.
This is just some background on what that was. The pathway sources were all of Gene Ontology, KEGG, NCI, Reactome, and Pfam domains. This was the number of gene sets that were tested. The paper is referenced.
The Enrichment Map app is really quite nice. You'll get a chance to play with it later, but it gives you little interactive slider bars that you can slide to change the cutoffs interactively, for your FDR threshold for instance, and your enrichment map will automatically update itself. Usually what I do when I look at this type of data, and we will go over this more in the lab, is play with these slider bars. I make them really stringent to see what the strongest signal is in the data. That's useful to know: okay, if I have anything, I have this really strong signal here, and obviously a strong signal should survive a stringent FDR cutoff. Then gradually I loosen it and see which functional themes pop up at different thresholds. A functional theme, what I call a functional theme, is one of these modules. It contains many pathways, like many Gene Ontology biological process terms, and some of them will be stronger than others. If any of them are strong, I consider that functional theme strong. So that's useful.
This is just an interesting idea about zooming in. We have an enrichment map to summarize the data. We could find a region that's really interesting, and one of these pathway nodes might be a pathway in Reactome. You learned that you can go to Reactome and see the pathway, so you can actually overlay your gene expression data on that network using Reactome's website, and then notice that maybe a lot of the signal is not coming from the whole pathway but just from a particular complex here. So that's how we zoom in and drill down to more detailed information.
We use GSEA. You learned about GSEA yesterday. We have a collaboration with GSEA, so enrichment map is actually going to be part of a future version of GSEA. You'll be able to create these directly from GSEA, which makes it easier.
We'll try this WordCloud plugin in the afternoon, but this is an interesting tool that helps summarize all of the pathway information that's here. I can select a set of nodes in Cytoscape and then I get a little word cloud. You often see these on the internet to summarize a lot of text: words that appear frequently are drawn bigger. So it's just another tool to help you quickly see what's going on in a given cluster or module.
And this is an enrichment map cookie. Ruth Isserlin, who programmed Enrichment Map, got really excited about the project, and so when it was her turn to present at lab meeting she baked it into a cookie. So I can tell you, and this is my old joke, that enrichment maps are not only useful, they're tasty.
Okay, so the enrichment map lab we'll do after the break, and the goal will be to try out Enrichment Map. There will be time to load g:Profiler results and GSEA analysis results from the tutorial online. You can use the results from the integrated assignment, which is not liver anymore; that was from last year and I'm about to update it. There's a bunch of tutorials, and ideally you could try it with your own data as well. Okay, any questions?
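For the lab, it may help to see how the edges of an enrichment map can be derived. The sketch below, in Python, follows the idea from this lecture: each pathway is a node, and two pathways are connected when they share genes. The similarity measure (Jaccard coefficient), the 0.25 cutoff, the function name, and the toy gene set memberships are all assumptions for illustration, not the Enrichment Map app's actual code or real annotations.

```python
from itertools import combinations

# Hypothetical toy gene sets; memberships are illustrative only.
TOY_GENE_SETS = {
    "cell cycle": {"CDK1", "CCNB1", "PLK1", "BUB1", "AURKA"},
    "spindle": {"PLK1", "BUB1", "AURKA", "KIF11"},
    "inflammatory response": {"IL6", "TNF", "CXCL8"},
}


def enrichment_map_edges(gene_sets, min_similarity=0.25):
    """Return one edge per pair of pathways whose gene overlap is large
    enough. Each edge is (pathway_a, pathway_b, n_shared, similarity),
    where similarity is the Jaccard coefficient: shared genes / all genes."""
    edges = []
    for name_a, name_b in combinations(sorted(gene_sets), 2):
        shared = gene_sets[name_a] & gene_sets[name_b]
        union = gene_sets[name_a] | gene_sets[name_b]
        if not union:
            continue  # two empty gene sets have no defined similarity
        similarity = len(shared) / len(union)
        if similarity >= min_similarity:
            edges.append((name_a, name_b, len(shared), similarity))
    return edges


if __name__ == "__main__":
    for name_a, name_b, n_shared, similarity in enrichment_map_edges(TOY_GENE_SETS):
        # Edge thickness would be drawn in proportion to the shared genes.
        print(f"{name_a} -- {name_b}: {n_shared} shared genes, similarity {similarity:.2f}")
```

With these toy sets, only cell cycle and spindle end up connected, which is why related terms cluster into functional themes while unrelated ones stay apart. Raising `min_similarity` gives a sparser map with tighter themes.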