Okay, so for the last part of the day we're going to learn about one part of this workflow, which is visualizing and identifying interesting pathways and networks. So far today we've talked about GSEA and g:Profiler. Those are tools that allow you to input a gene list or a ranked gene list and perform pathway enrichment analysis, or gene set enrichment analysis in general, and the results are these tables of pathways with p-values that you've seen. What we're going to work on this afternoon is how to visualize those in a better way, and you'll see what I mean when we get there.

But the first part of what I'm going to talk about is an introduction to Cytoscape. The reason I'm talking about Cytoscape first is that the visualization method we'll talk about later, which is called Enrichment Map, uses Cytoscape and the idea of networks. So this next bit of the lecture is a little bit of a sidetrack into Cytoscape, and it will become more useful tomorrow when people are talking about different types of network analysis, many of which use Cytoscape. What I'll do today is give you a demo of Cytoscape and an introduction to Cytoscape and network analysis algorithms in general, just to provide you some context, and then also answer questions about Cytoscape. But it's really just learning the basics of Cytoscape so that we can move on to the enrichment map. Then tomorrow you'll do ReactomeFI, which is a network analysis method that uses Cytoscape, and there'll be other things tomorrow morning that are mostly using Cytoscape.

Okay, so to reduce the amount of time that we spend talking about these basics of Cytoscape, we already gave out some pre-reading material, like this primer that we wrote a little while ago, "How to visually interpret biological data using networks", so hopefully you read
that. It's pretty short, and as a result I don't have to go through all the details of it. We also asked people to try out Cytoscape at home, and that also helps reduce the amount of time we spend doing basic stuff here. So I'm just going to give you a quick overview of some concepts in networks that we can discuss, and then we can move on from there.

So, networks. I told you this morning about the difference between pathways and networks. The EGF receptor pathway could be represented as a network of connections between the proteins. Generally the idea of networks is that they represent some kind of relationship between aspects of your data. They could represent physical interactions, or regulatory interactions, or genetic interactions like synthetic lethals, or functional interactions that relate genes based on their functional relatedness: if two genes are co-expressed they might be part of the same process, or if they have similar sequence they might have similar function.

In general, networks are useful for discovering relationships in large data sets, and that's better than looking at tables in Excel to figure out how things are related. It's also useful to visualize multiple different types of data together, which you might use to see interesting patterns. And then there's a whole bunch of network analysis methods and network visualization methods that are very valuable in biology, and that's really one of the reasons why people talk about networks a lot and use networks a lot. So I'll talk a little bit more about that.

The idea of networks has been studied for a long time in computer science. There's a field called graph theory, and computer scientists call networks graphs. We don't use that term in biology because most people, when you ask what's a graph, think of a plot. So if I tell you graph theory you probably think the
theory of scatter plots or something like that. But just so you know, that field of graph theory has been working for more than a hundred years on the math around networks and how they work, and people have developed all sorts of algorithms that are available to analyze these networks.

One of the powerful things that happened maybe 10 or 15 years ago in biology is that when we started getting more network-type data, like protein interactions, people realized that you could take a lot of the methods that have been developed in computer science and math for a long time and apply them to biological questions. Not everything in that field is applicable, but there are very useful things from that field that you can just bring over and start using, and some of them are very powerful.

So just as a very simple example, how many people have heard of this concept of six degrees of separation? For people that haven't heard of it, this is the idea that everybody in the world is connected to everyone else by at most six hops, and it's probably a lot less now with Facebook and email. This idea came from an experiment in the 60s by somebody named Stanley Milgram, who was interested in studying psychology and other fields. He did this experiment where somebody in Boston would send a postcard to someone in New York, but they didn't have the person's address, so they had to go through their friend network. Each time they sent it to someone, that person was supposed to be closer to this person in New York, and eventually someone would know the person in New York personally and would actually be able to send the postcard. And each time they sent the postcard on, they were supposed to send a postcard back to the
experimenter, back to Stanley Milgram, so he could track where the postcards were going. He found that on average it only took six hops before it got to the person, even though the person in Boston never knew the person in New York or their address. So that's where this idea came from.

The question that raises is: how are people connected? What's the shortest path that I can go through a network of people to get from one person to another? In computer science there's an algorithm for computing that, so if you had all the Facebook links you could just find out how people are connected. The algorithm is called shortest path by breadth-first search. It just goes through the network and finds all paths, and what's interesting about it is that it's mathematically proven to find a connection between nodes if one exists, and if it exists it will be the shortest path. There might be more than one equally short path, so it'll find all of them. If there's no connection between the people, it will say there's no connection. It's guaranteed mathematically to be correct, and it's guaranteed to be as fast as it can be.

So it's nice that these people in computer science came up with this method, and then you can just use it to answer this question. You might be able to use it in biology to find out if two proteins are connected in the cell somehow through a series of protein interactions, and if so, how. The question then would be: is that path biologically relevant? Maybe, maybe not. Maybe you need to consider additional information. But that's the key concept: taking some algorithm from computer science, in this graph theory area, and applying it to networks in biology.

People have used networks for gene function prediction and for identifying protein complexes and other modular structures in large networks. So here's sort of an
example. We have this really big network, and one of the little regions here is a protein complex that you can predict. People have studied network evolution, so if you have protein interaction networks from different species you can compare them and see how the pathways evolved. People have tried to use this to predict new protein interactions or new types of interactions. And more recently a lot of people have been using this for disease applications: identification of subnetworks, which are sets of genes that are related to disease and are also potentially predictive of the outcome of the disease. If you have these genes expressed at this level and they're all connected, that little network and the associated expression data would predict the outcome of particular diseases. People have also been doing this with GWAS studies, for instance, instead of looking at individual genes. I'm not going to go into too many details of this because we'll talk mostly about this tomorrow; it's just a very quick introduction to some of the many things that people have used network analysis for.

Okay, so what's missing? The networks that we talk about don't have information about dynamics represented, so everything's a static network. There are ways of doing mathematical simulations, but we don't cover them in this course; there's software like Virtual Cell that helps you do that. We don't have detail about atomic structures, and the cell type or developmental stage is often not represented in these networks. But they have been very useful for a number of different types of applications.

Most of these points are really taken from the primer that we assigned for pre-reading, so I can just review them even though I didn't talk about them specifically. One of the issues with networks is that you might have different types of networks, so the nodes don't always have to
represent proteins or genes. They could represent anything, and we'll see an example of that later, towards the end of this lab. And then there are many methods available for gene list and network analysis, and we'll talk about some recommended ones tomorrow. So that's just a very quick summary and introduction.

Okay, so Cytoscape. Did anyone not get a chance to install Cytoscape on their computer? Everyone's got it? Did anyone have trouble running through the tutorial? If there are any issues that come up in the lab, we can work through those with you. So I'm now going to switch to an introduction and demo of Cytoscape. Again, you've seen this a little bit, but often this helps broaden the understanding of what this tool can do.

Cytoscape is free software for network visualization and analysis. It does some things by default, and then you can add apps that add additional functionality. It was developed originally at the Institute for Systems Biology in Seattle, and then the people that were there, Trey Ideker and Benno Schwikowski, went on to other places and they've kind of taken Cytoscape with them. I'm involved in Cytoscape development; I started after I met them, originally in 2002, and since then a number of other groups have joined. So it's an open-source, free software development project.

The basic idea of Cytoscape is that it allows you to visualize networks and manipulate them. You can filter and query them, there's automatic layout, and there are ways of getting data into Cytoscape. Your gene list can be converted to a network and then visualized in Cytoscape, and again we'll see the specifics of that tomorrow. Cytoscape has an app store, so there are hundreds of apps, I guess over 200 apps now, that provide different functionality that you could look at. There are other tools out there for network analysis, but Cytoscape is the most popular one. It has
the most active community, so there are thousands and thousands of people using it, and there are over 8,000 downloads per month. So it is actually the standard network analysis tool in biology, and other fields use it as well. The good thing about that is that people are investing effort into developing more functionality for it, and there's also a lot of help: documentation, data sets, mailing lists, tutorials, an annual conference, etc.

And these apps that people are developing: you can build your own. You have to know the Java programming language right now; in the future you'll probably be able to write apps in R or other languages. So this requires knowing programming or knowing someone who knows how to program, but if you want to develop your own added functionality, you can. The mailing lists are very useful. If you ever start using this software and you have questions about it, you can email the mailing lists and you're pretty much guaranteed to get an answer within a week. Every Thursday we have a conference call to make sure all the questions are answered, but they're usually answered faster than that, so there are people waiting to answer questions. And this is just a fun picture from a conference that we had in Toronto a few years ago, where the people who were at the conference spelled out Cytoscape.

Okay, so the take-home message is that Cytoscape is a useful, free software tool for network visualization and analysis. The simple software that you download provides basic functionality for visualizing, manipulating and filtering networks. Almost all the power comes from apps that you download and install.

Okay, so I'm going to give a demo of Cytoscape. The rest of the slides are in a few different categories. The next bunch is just slides that provide screenshots so that you can remember what I showed, because when I move into the demo I won't be in PowerPoint anymore. And then there's some
I'll talk about other slides afterwards. Okay, so this Camtasia is still working.

Okay, so when you start Cytoscape up you get this welcome screen, and you can load in data from a couple of different places. One thing that I'm going to do is open a network file here. I'm going to go to my Cytoscape directory, where I installed it, and I'm going to load up some sample data. I always forget which one I like to use. This one. Click OK.

Okay, so this is a sample network that I've loaded in here. It already has a bunch of visual styles associated with it, and let me just change it a little bit here. Okay, so I've loaded in some data. I didn't really tell you how the data was loaded in, but I'll talk about that in a bit. I just wanted to show you: I'm zooming in, I can move nodes around, I can select a bunch of different nodes. The circles are called nodes (they don't have to be circles) and the lines are called edges in typical graph theory language. I can select some edges here, and I can go to the Layout menu and click Rotate, and I can rotate these around or scale them. So if they're really bunched up, sometimes I can just use the scale feature to expand them. That's just a quick demo of some layout functionality.

Usually what happens when you load up data in Cytoscape is you'll see a layout that might look like this. This is not very useful. It's just a default grid layout, so all the nodes are just arranged in a grid. So the first thing that you usually want to do is go to the Layout menu and apply some layouts. Sorry, the Mac operating system now has this full-screen mode that I really don't like, because one of the things it's done is it's gotten rid of the toolbar, which is pretty important, and it just disappeared. So somehow I need to figure out how to get this working without losing it. Okay, that's right. Okay, so you can click buttons here to load and save and
pull in information from places. You can zoom with these guys. If I just select a little region here, I can zoom in just to that region. These are all yeast gene symbols in the sample data. And then this button here lays out the network with a default layout algorithm. These layout algorithms, as we explain in the primer, work to reduce the overlap of nodes and reduce the crossings of edges, so it just pulls the network apart a little bit more so that you can see it. There are a whole bunch of different ones, so during the lab at some point you can try different ones. One that I like is called yFiles Organic, and it actually looks pretty similar to the one that I just used. But you can also have other types of layout algorithms, like this circular layout. Let's try this one. Okay, so this tries to lay it out as a hierarchy. It doesn't really work well for this network because it's not really a hierarchy, it's not like a tree, but if you had the Gene Ontology loaded in here, or a phylogenetic tree for instance, that might be useful. Okay, so let's just go back to this. That's just a demo of some layouts.

Normally what you would do is load in data. I haven't really told you too much about how to load in data, so I'll do that in a bit, but I just wanted to go through all these features fairly quickly. You can select parts of the network here and put those in a new network. So I'm going to make a new network from selected nodes, all edges, and that gives me a new network. That's useful for zooming in on a region of interest. I'm just going to zoom in here and click on some of these nodes to show you the data that's automatically loaded up in this network. When I click on a node, I can see at the bottom here, in the node table, the node that I've clicked and whatever additional
data is loaded up. The data that's loaded up in the sample file is gene expression data, and it also happens to be visualized as colors on the nodes, so red is, I think, overexpressed and green is underexpressed. Let me select a few nodes here. You can see at the bottom here there are a few different types of expression data loaded up, and these are log fold changes in the expression data columns.

Okay, so one of the things that I could do is go to the Select option here and create a filter, a default filter. I just click this plus button and I'm going to select gene expression that is all the negative fold changes. So now all of the negative fold change nodes are selected. I can change this so it's really only selecting the most negative fold change ones, and then, if I want, I can zoom out and see where those are. So I've selected a bunch of nodes in different places here. If I wanted to move those to a new network, I could, and I can just see how connected they are. In general they're not connected, so none of the genes that were low in fold change are connected. Let me just close that and let's look at the positive ones. None of these are connected either. Oh, it's because I made a new network from selected nodes, all edges, so some of these are connected. So you can slice and dice the network however you want.

And just so you know, this demo is not really explaining how to use Cytoscape to answer specific questions. It's really just to give you a quick overview, some of which you've already seen and some of which you haven't. The filters can be made arbitrarily complex, you can chain them together, and whatever data you have associated with the network you can load up and filter on.

Let's see. The other important thing, probably the most useful thing in Cytoscape, is visualizing data on the
network itself. I already have data visualized here that is showing different nodes as having different colors and different edges as having different colors, so I'll tell you how that works. Let me just reset things. Let's try Minimal. There's something going on with this that's out of sync, so I want to try opening up a new session. Okay, sorry, I opened up another session file, which I think will allow me to do what I want, which is change the colors here and show you how that works.

So I showed you that I have gene expression data associated with these nodes, and you can see that here. Actually, this has a lot of additional data associated with it. This session file shows an interesting visualization, which is charts. You can attach charts to your nodes, so if you have data you can show it over time, and there's a way of adding charts here. But let me just clear this. Yeah, so this is a simple style that we can get started with. Your network, when you load it in, might look like something very simple like this.

So I'm going to change the fill color so that it colors nodes based on how connected they are. There happens to be a column in here called Degree, which is the number of edges connecting to a node. I'm going to select Continuous Mapping, and I'm going to double-click on that, and then I can change these colors here. I'm going to set this to be light blue, maybe, and make this red. So now the more connected a node is, the more red it is. I'm just going to click this triangle as well and make this light blue as well. So now you can zoom out and see what the most highly connected nodes are: they're going to be red. And I forgot to show that this little window here allows you to move around in the network. So what I've just shown is how to use Cytoscape to take data that you have in your little spreadsheet associated with
nodes and visualize it. In this case I took this column here called Degree. I can click this little button here to float the window. You don't have to follow along with all this; it's just really quick to go over and show you some of the things that it can do. So I'm looking at this spreadsheet of all of the data that's associated with all the nodes in the network, and one of the columns that I just happen to have in this network is Degree. I can click on these headings to sort them, just like you might be able to do in Excel, and if I click again it sorts the other way. So I can see that the most highly connected node is this node, and that should be colored as red as it can be. I accidentally closed the table panel, so I'm going to go to View and show the table panel again. There, it comes back.

As I select nodes here, the selection color is kind of red, so it's hard to see. I picked red, which is the same as the current selection color, so let's try and select a blue node. I can select nodes from this table as well. The color is not changing as I expected, but it's okay. Any questions so far?

Okay, so the idea of this style, as I was trying to explain, is that there's some data associated with the nodes. It could be gene expression data or whatever other data you might have loaded in, and these visual styles allow you to map this data to some visual property. So I can color the nodes based on a gradient according to how connected they are. Let's try another one here. I can change the border so that it's colored according to some other gene expression data. I'm just going to cancel that, because I can't see the borders here; I think the border width is set to zero. So I'm just going to click this to set it to five, and now the border is thicker in general.
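[Editor's note: as a rough illustration of what a continuous mapping does, a numeric column such as degree can be linearly interpolated between two colors. This is only a sketch of the idea, not Cytoscape's actual implementation; the function names, node names, degree values and RGB constants below are all made up for the example.]

```python
def lerp_color(low_rgb, high_rgb, t):
    """Blend two RGB colors: t=0 returns low_rgb, t=1 returns high_rgb."""
    return tuple(round(lo + (hi - lo) * t) for lo, hi in zip(low_rgb, high_rgb))


def continuous_mapping(value, vmin, vmax, low_rgb, high_rgb):
    """Map a numeric value in [vmin, vmax] onto a two-color gradient."""
    if vmax == vmin:
        return low_rgb  # every node has the same value, so there is no gradient
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp values that fall outside the range
    return lerp_color(low_rgb, high_rgb, t)


# Hypothetical colors and node table column (node name -> degree).
LIGHT_BLUE = (173, 216, 230)
RED = (255, 0, 0)
degrees = {"geneA": 1, "geneB": 4, "geneC": 9}

d_min, d_max = min(degrees.values()), max(degrees.values())
colors = {
    node: continuous_mapping(d, d_min, d_max, LIGHT_BLUE, RED)
    for node, d in degrees.items()
}
```

[In this sketch the least connected node gets the light blue end of the gradient and the most connected node gets pure red, which is the behaviour described in the demo.]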
That's 10. Okay, so these borders are getting thicker. I'm just going to set different points here, which is kind of explained in the tutorial. I'm setting this middle point to be zero, and this point is the high end, so I'm going to set this up to be red for overexpressed and blue for underexpressed in the border. When I click OK, I should have borders that are potentially different from the centers. So here's a border that is red, which means that the gene expression is high in the particular gene expression experiment that I selected. So the border is red and the center is blue, because I had set the node fill color to be based on degree. I'm actually going to change it to use another gene expression value. The one I used for the border was gal1RG, and this one is gal4RG, which are two different transcription factors, and I'm going to set that up to be similar to the other one here. So make this dark blue.

Okay, so now I've set the node center to be colored according to one expression experiment and the border to be colored by another expression experiment, and now I can look around in this network for differences in these expression values. So here's a node that is slightly down in one experiment and slightly up in the other. But these genes are acting in the same way in both, because they have both a red center and a red border.

You can add additional information into the visualization in a similar way. You could change the label color. I can change the shape; there are lots of different shapes to work with, and I can make custom shapes. You can change the transparency and the size of the node, so size is another way of adding information. This is what I meant by the network view allowing you to visualize a
lot of different types of information together. There are a few different types of information being shown here right now. One is that these lines are representing protein interactions, and also protein-DNA interactions in this case. I didn't show you that, but if you select the edge table and look at the different interaction types here, you can see that there are different types, and you can change their appearance in the style. I can just click on Edge here and change the line type. For the interaction type I'm going to make a discrete mapping: pd I'm going to say is a dashed line, and pp, which is protein interaction, will be this weird arrow. So now you can see that some of these edges have changed. So there are different types of interactions and there are two different types of gene expression data from two different experiments, and I can just go on to my heart's content and change the visualization based on the data that I have. This is very useful if you have a network and a lot of different types of genomics data: you can overlay it and then you can see patterns that you might not be able to see easily in a table, and you can use the selection criteria to select different aspects of it and visualize it in different ways.

So I think that's pretty much it for what I wanted to show. That's the basics of Cytoscape. There are some fun things that you can do here. Maybe this one; I think I can generate a... okay, it's not working. Sorry. There are ways of generating these things automatically. Let's see. Okay, so I think that's pretty much it. That quick demo, as I said, is just meant to quickly show you some of the functionality. The next lecture is really about using Cytoscape and using some of these things that I showed you and that you learned in the tutorials that you did before class. And so I think
we'll just go back to the lecture slides, unless there are any questions about this. Yes?

So Cytoscape really starts off with network information. You can convert a gene list to a network, and we'll learn about that tomorrow with GeneMANIA; basically most of tomorrow will be focused on that kind of conversion and then using the networks. But if you have network information, like protein interaction information from different studies, you could load them both up and take their intersection, for instance. One of the tools here is this Merge Networks tool, so you can choose to take a union or intersection or difference of different networks, and that could tell you what's different between the studies. There might also be ways of visualizing the data so that you create a network that shows how they're related, like if they're sets and how they overlap. That may not be the most practical answer, but you'll see how that relates to the next part. Yes?

Yeah, and you can create charts. Like I showed you, when I loaded up the galFiltered session file to start, there were these little charts, so you can show data over time or over multiple conditions as charts. I think that's accessible... I could possibly get stuck here because I don't remember where the charts are. But anyway, I'm taking up too much time, I think. Any other questions?

Okay, so the next slides in this slide deck are mostly screenshots from Cytoscape, just for reminder purposes. There's an older workflow that is more focused on Cytoscape, which shows you how to get information into Cytoscape in different ways. You can have a gene list that's converted to a network with tools like GeneMANIA, and you might have different types of networks that you're loading in. Cytoscape is used for visualization, and then there are different apps. For instance, there's an app in Cytoscape called BiNGO that does the pathway enrichment analysis that
we talked about today, and then we'll talk about these other ones tomorrow. The next few slides I'm not going to go through in detail, but they show a few different types of apps that have been useful for people, so you can look through them and see if any are useful for you. As I said, there are 200 apps, so we can't go through all of them; this is just a way of highlighting a few potentially interesting ones in your slides. So for instance there's a text mining app that allows you to type in a set of genes, and then it will search PubMed for abstracts that relate to those genes, extract a network out of those PubMed abstracts and show you that network. You can curate it, and you can fix errors that it might have introduced. That's called the Agilent Literature Search app. And there are lots and lots of apps.

The last few slides are tips and tricks with Cytoscape. If you end up using it a lot, you might want to look at these to see some of the tips that we recommend. For instance, if you use it a lot and it starts running out of memory, you can increase the memory. These are kind of advanced options.

Okay, so this is the topic that we really wanted to get to, which uses Cytoscape, and I'll tell you how it works. We learned this morning that enrichment analysis generally works like this: you have your experimental data, which can be gene expression values, sorted; you have a pathway database like Gene Ontology; and you have your enrichment test, which finds out which pathways are enriched in your sample. So you have spindle, apoptosis, each with some p-value or q-value. This is generally excellent; tens of thousands of papers have used this idea. You get a table like this. One of the problems with it is that there are a lot of similarities between the pathways that result from these
analyses. If you get a long list of pathways, in this case for instance, there are a lot of immune-related pathways, but unless I know a lot about immunity I might not know that some of these are directly related to immune response, because they don't say "immune" in them. As a biologist you can sort of see how they relate, but the problem is that these relationships are not obvious in this table format, and it'd be useful if you could see them more clearly.

So Enrichment Map is a visualization method that visualizes the results of enrichment analysis as a network. Remember, I mentioned that networks are good for visualizing relationships. So in this case we take the results of g:Profiler or GSEA and make an enrichment map out of them, and I'll show you how that works. Here's GSEA, for example. We have our genes that are up and our genes that are down. We run GSEA and we get pathways that are enriched in condition A versus B, or enriched in condition B versus A. Then we can convert that to an enrichment map, where pathways like cell cycle and spindle are represented as nodes, and the edges connect pathways when they have overlap in genes. That might represent crosstalk between the pathways, or it might just mean that one pathway is a more general version of another; for instance, spindle and cell cycle have a lot of genes in common.

The significance score here is used to color the nodes, with a deeper red color being more significant. If you're using GSEA and you have things that are enriched in genes that are going up and things that are enriched in genes that are going down, you've got two different colors here: red means the pathways are enriched in genes going up, and blue means the pathways are enriched in genes going down. And what's not really shown here is that the size of the node is proportional to the number of genes in the pathway, so some pathways are
very general and they'll have many genes, so the node will be bigger. This is one of the overlap statistics that's used to compute how thick this edge is: if there's more overlap between these sets, the edge will be thicker. So you can see how Cytoscape visual styles can be used to make this kind of visualization.

I'll go through fairly quickly the different uses of this visualization, which are from the Enrichment Map paper. This is using the same MCF-7 cell experiment that Veronique told you about in the last lab. One of the three uses of Enrichment Map is a single enrichment. That's what you would normally do if you just have your experiment versus control, for instance, so just one two-way comparison. In this case, as we talked about earlier, this cell line is treated with estrogen or untreated, so untreated is the control, and they did this at multiple time points. Just looking at the 24-hour time point, they had a few replicates. We just used the Gene Ontology database, we ran GSEA as you learned about, and then we were able to visualize the results: instead of a table, we visualize them as this enrichment map. You can see all the things I mentioned. The size of the nodes is proportional to their number of genes. There are green lines connecting the nodes, the nodes are pathways, and the thicker the line, the more genes the pathways have in common. One of the things that we've done with this enrichment map is drawn bubbles around groups of nodes. Okay, so the immediate thing that you see when you do a layout in Cytoscape is that pathways that have a lot of genes in common all get pulled together into dense little networks. You see all these networks here, and we've manually drawn bubbles around them and labeled them according to their theme. But if you zoom in on one of these, you can see the actual pathway names; these are all Gene Ontology names that are
from the original GSEA results: centrosome, microtubule cytoskeleton, spindle pole. All of these are really related; they're all related to the microtubule cytoskeleton. So instead of looking at a very long list of results from GSEA, you now have this much simpler visualization. Right now these bubbles are manually created, although we have a method for doing this automatically that will come in a future version of Enrichment Map this year. Veronique, that's not available yet, right? The automatic annotation is not in the app store, right? It's just in the development version. Okay, and there's a link to the development version on the wiki. So the nice thing about this is that it gives you a visual overview of your enrichment results, and it should be faster to review and identify interesting themes.

Okay, here's another use case: comparison of two enrichments. In this case we looked at 12 hours, and we did a GSEA on the 12-hour time point and another GSEA on the 24-hour time point. This is relevant to one of the questions that was asked: how do you compare different data sets? If you have different data sets that you are doing enrichment analysis on, like two different time points, each one versus control, then you can do pathway enrichments for both of them with GSEA or g:Profiler, and then you can visualize an enrichment map where the node border is colored according to the enrichment in one, in this case the 24-hour time point, and the node center is colored according to the enrichment score in the early, 12-hour time point. So if I'm interested in pathways that are differentially regulated between these two time points, each compared to control, I can immediately see that a lot of pathways are actually the same. For instance, RNA transport: all the RNA transport pathways are enriched in genes that are going up in both time
points. But here's a segment of the enrichment map that shows that ubiquitin-dependent protein degradation is not enriched at the early time point and is enriched at the late time point. There are also some other differences around here, the reverse kind of situation in DNA metabolism. So very quickly I can see that most of the pathways are the same; there are just a couple of little areas that are changing between these two time points. You can go into Enrichment Map and actually see a heat map visualization of the gene expression data. You can click on these nodes and see that indeed, in this case, this ubiquitin-dependent protein degradation, or APC-dependent protein degradation, Gene Ontology term is very similar between experiment and control at the early time point, but it's very different at the late time point. Similarly here. The visualization doesn't really look like this; it's not a little arrow that pops out, but you do get the heat map. This was made up for a figure to make it a little bit nicer.

The third thing that you can do that's quite useful is, once you've done an enrichment analysis, you can query it with an additional set of genes. In this case I'm looking at pathways that are differentially active in a mouse model where a microRNA was knocked out in the heart, and as a result certain pathways went up and certain pathways went down. You can then take the microRNA's predicted targets and represent them as an additional node, in this case a little yellow triangle, and you can calculate the overlaps between all the genes that are targets of this microRNA and the genes that are in all these pathways. What you see is that some of these pathways are very strongly enriched in targets of this microRNA and others are not. And, as kind of expected, there are no targets in the pathways that are going down, because when you remove a repressive microRNA you expect a
lot of pathways potentially to go up, and those are the pathways that ideally would be direct targets of the microRNA. But you see that some pathways that are going up don't have a lot of microRNA targets in them and others have many. So presumably the pathways that have a lot of microRNA targets are the ones the microRNA is regulating more directly, and these other ones might be more indirect. You can do this type of analysis with disease genes or any other additional set of genes that you're interested in, to see how those genes relate to the pathways that you have going up and down, and that ends up being quite useful.

For instance, one example is a paper that we reference here, where Veronique did this analysis with Shahina in my group. We had pathways that were going up and down in a particular cell line, and we used oPOSSUM, which is a tool that's available online for predicting whether transcription factors are responsible for the gene expression pattern that you see. It highlighted HIF-1 alpha as the transcription factor that most explains the data in sort of a simple way. Then we took the targets of HIF-1 alpha and layered them onto this enrichment map, and you can see that it highlights a bunch of pathways that the HIF-1 alpha transcription factor is potentially controlling. That starts providing a little bit more insight into possible regulators.

Okay, here's the enrichment map that I showed you this morning, from the autism spectrum disorder data set. In this case we used Gene Ontology and the KEGG, NCI, and Reactome pathway databases, and also Pfam domains. Just for your information, there were 14,000 gene sets represented here, but after filtering we only got 3,500 in the end that were relevant to the data set.

So Enrichment Map is a Cytoscape app. I think you have all installed it. It
allows you to create enrichment maps from GSEA data, or from g:Profiler or other tools. There's a heat map functionality; if you have GSEA data, the leading edge is colored in yellow. This part here helps you choose a cutoff: you can move this slider bar, and as you change it the enrichment map will update, so you can change your q-value cutoff. What I like to do is make the q-value cutoff really stringent, and you'll see which functional themes, which blobs in this network, are very stringently there. As you make the q-value threshold less stringent, you'll see other themes popping up. By doing that you can get a sense of which themes are most stringent and which ones are less stringent. It's a little bit harder to do that in a table format, so that's just a valuable visualization tool.

To reiterate that point, the way I think about this is that Enrichment Map and pathway enrichment analysis give you a 10,000-foot view of your data in terms of pathways, and as I mentioned this morning, you want to use this to identify pathways that are interesting. In this example we zoomed into apoptosis, and one of these nodes, one of these circles in the enrichment map, is a specific pathway in Reactome. What we then did is we went to the Reactome pathway database, loaded up the actual pathway itself in Cytoscape, overlaid our, in this case, protein expression data, and we could see that only a couple of parts of this pathway were really differential. We then further zoomed into one little neighborhood, which was proteins that interact with caspase. So that shows you what you can do with these tools to zoom into something potentially interesting and also keep your data in context.

There is another app called WordCloud that allows you to select nodes in the enrichment map and get a summary of the words that are
associated with them, so it helps you in your exploration of an enrichment map. But we actually have a lot of this automated now.

This is a cookie that Ruth Isserlin, who made Enrichment Map, baked when she was presenting at lab meeting. She was just so excited about the project. So it's just a fun slide that shows that enrichment maps are not just useful but also delicious.
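
As a rough companion to the description of how an enrichment map is built (pathways as nodes, node size from the number of genes, edges weighted by gene overlap, and a q-value cutoff that decides which pathways are shown), here is a minimal Python sketch. It is not the Enrichment Map app's code. The gene sets, q-values, and the function names `jaccard` and `build_enrichment_map` are made up for illustration, and the Jaccard coefficient is only one possible overlap statistic; the talk does not say which one was on the slide.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two gene sets: shared genes / all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_enrichment_map(gene_sets, q_values, q_cutoff=0.05, min_overlap=0.25):
    """Return (nodes, edges) for pathways passing the q-value cutoff.

    nodes: pathway name -> {"size": number of genes, "q": q-value}
    edges: list of (pathway_a, pathway_b, overlap) with overlap >= min_overlap
    Both thresholds are illustrative defaults, not the app's settings.
    """
    nodes = {
        name: {"size": len(genes), "q": q_values[name]}
        for name, genes in gene_sets.items()
        if q_values[name] <= q_cutoff
    }
    edges = []
    for a, b in combinations(sorted(nodes), 2):
        weight = jaccard(gene_sets[a], gene_sets[b])
        if weight >= min_overlap:
            edges.append((a, b, weight))  # a thicker edge would mean more overlap
    return nodes, edges


# Hypothetical toy input; real gene sets would come from GSEA or g:Profiler output.
gene_sets = {
    "cell cycle": {"CDK1", "CCNB1", "PLK1", "AURKA", "BUB1", "TP53"},
    "spindle": {"PLK1", "AURKA", "BUB1", "KIF11"},
    "apoptosis": {"TP53", "CASP3", "CASP9", "BAX"},
}
q_values = {"cell cycle": 0.001, "spindle": 0.01, "apoptosis": 0.2}

nodes, edges = build_enrichment_map(gene_sets, q_values)
for a, b, weight in edges:
    print(f"{a} -- {b}: overlap {weight:.2f}")
```

Calling `build_enrichment_map` again with a looser `q_cutoff` mimics moving the slider: more pathways pass the filter, and new themes can appear as extra nodes and edges.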