So I'm Lincoln Stein. I'm Director of Informatics and Biocomputing here at OICR, and I work on problems involving curated pathway databases. You'll hear a lot today about one of the projects I work on, the Reactome database of biological pathways and reactions. Robin Haw, who you know well from the past few days, is our outreach coordinator for Reactome, and he'll be leading you through an exercise after this lecture. So I have to make it move. Okay, there we go.
So yesterday you learned about gene over-representation analysis. We're now going to go into a little more detail on more advanced topics in pathway and network analysis. As I think should be clear to you by now, the reason people do pathway analysis is to do dimensionality reduction. You're studying a type of cancer, you're looking at mutated genes, or hypermethylated or hypomethylated regions, or microarray changes, which affect hundreds or thousands of genes. In order to make sense of this large number of genes, you need to reduce them. One problem in identifying driver genes among all the noise in a cancer genome is the number of hypotheses. If you're looking at thousands of genes, you have multiple hypotheses, and your chance of identifying any single gene as a driver gene rather than a passenger gene suffers from a loss of statistical power. So by reducing the number from hundreds of genes to dozens of pathways, you can increase the statistical power. You can find meaning in that long tail of rare cancer mutations that almost all of us see. And you can take a small number of pathways and turn them into biological stories to generate hypotheses that you can test.
So pathway and network analysis is a very broad term. It's applied to almost everything. It's any analytic technique that makes use of biological pathway information to gain insight into a tumor or other biological system. It's very rapidly evolving. There are a huge number of approaches that various groups have taken.
What I've done in the wiki is to give you a link to a review article, co-written by myself and members of the International Cancer Genome Consortium's Mutation Consequences Working Group, where we review all the pathway and network analysis techniques out there. I encourage you to take a look at this. This manuscript is under review, so please don't distribute it. It's unpublished data. A lot of the slides I'm showing you are taken out of that review article.
So first, let's discuss what the difference is between a pathway and a network. Pathways are a series of directed reactions, linked temporally and causally, and are kind of the basic building blocks of our biological knowledge. They are the biochemical pathways that you learn in introductory biochem in undergrad. They are the signaling pathways, the kinase cascades, mitotic spindle assembly. They are a very detailed description of a series of directed interactions between macromolecules: proteins, DNA, RNA, lipids, and so forth. And so here is a simplified example of the EGFR, the EGF receptor, pathway, showing some of the initial steps. You have the EGF receptor and its ligand, EGF, which associate with each other to form a multimer; it's actually an EGFR dimer and an EGF monomer. There is a negative regulator here, LRIG1, which inhibits EGFR dimerization. This complex is enzymatically active and converts ATP into ADP to give a phosphorylated product, which then activates downstream steps, ultimately leading to cell growth. And there's a positive activator of this event, SHC1.
Now, networks, on the other hand, are a series of bimolecular interactions which broadly cover both well-known biology and less well-known biology. There are a variety of network types: directed networks, undirected networks. I think Gary discussed those yesterday. You had that? You have yet to do Cytoscape. Oh, you have yet to do Cytoscape.
Well, you have undirected connections between two molecules, such as an interaction where they're peers, and then you have directed ones, where one is doing something to the other. So in a protease cleavage reaction, there's the protease and there's its target, and that would be represented in a network by an arrow to indicate directed information flow. We can represent the EGFR pathway as an EGFR network. What we've done here is we've taken all the green round-cornered rectangles and converted them into directed interactions. We have EGF that's interacting with EGFR. It's being inhibited by LRIG1, and it's interacting with SHC1. And then we're also able to bring in other things that we know less well. Let's say we've done a proteomics pull-down of the active EGF receptor complex, and we found a number of other molecules that are co-immunoprecipitated, co-pulled down. We can say, okay, these are interacting. We don't know exactly how, so they're undirected interactions.
The advantage of the pathway view is that you understand the mechanism, but it is typically the product of human curation: going into the literature, extracting these pathways, and putting them down in a diagram. And pathways typically only cover a portion of the genome space. The best pathway databases cover maybe a third of the well-annotated genes in the genome. Networks, in contrast, actually tell you less about mechanism, but they're more expansive. They can bring in less well characterized relationships, such as those from yeast two-hybrid screens, microarray-based co-expression, literature mining for things that are known to interact in some way, genetic suppressor and enhancer screens, you name it. They can all be brought into a network. So networks tell you about more genes, but there's less information about how they're actually interacting with each other. So this I know you've covered.
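To make the distinction between directed and undirected interactions concrete, here is a minimal Python sketch of the EGFR example as a mixed network. It is only an illustration: the effect labels, the direction assigned to the SHC1 edge, and the names PARTNER_A and PARTNER_B (standing in for unnamed pull-down hits) are assumptions, not data from the slide.

```python
# Toy mixed network loosely following the EGFR example; labels are illustrative.
DIRECTED = [  # (source, target, effect): the source acts on the target
    ("EGF", "EGFR", "activates"),
    ("LRIG1", "EGFR", "inhibits"),
    ("SHC1", "EGFR", "activates"),  # direction and effect assumed for illustration
]
UNDIRECTED = [  # co-purified partners with unknown mechanism (hypothetical names)
    ("EGFR", "PARTNER_A"),
    ("EGFR", "PARTNER_B"),
]

def regulators(gene):
    """Return (source, effect) pairs for directed edges pointing at gene."""
    return sorted((s, e) for s, t, e in DIRECTED if t == gene)

def partners(gene):
    """Return every molecule linked to gene, ignoring direction and mechanism."""
    linked = {s for s, t, _ in DIRECTED if t == gene}
    linked |= {t for s, t, _ in DIRECTED if s == gene}
    linked |= {b for a, b in UNDIRECTED if a == gene}
    linked |= {a for a, b in UNDIRECTED if b == gene}
    return sorted(linked)

print(regulators("EGFR"))  # mechanism-aware view, pathway-like
print(partners("EGFR"))    # broader but mechanism-free view, network-like
```

The directed list keeps who acts on whom, which is what a pathway preserves; the combined partner list covers more molecules but says nothing about mechanism.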
In any pathway or network analysis, you need two basic ingredients. First, you need a list of altered genes, proteins, or RNAs. This is what you provide from the experiments you've performed: up-regulated RNAs, proteins which are mutated, a series of changes in ubiquitination in a tumor cell line that's been inhibited by an shRNA, you name it. Second, you need a database or other source of the pathways or networks that you're going to apply this information to, in order to interpret the list in the context of the pathway or the network.
So we're going to talk about where you get the second ingredient. Basically, you get it from databases. Pathway databases are typically curated databases where biological experts, PhDs, have gone into the literature and extracted the well-known biological knowledge. They provide a biochemical view of biological processes. They typically come with pretty pictures which you can intuit easily. They capture cause and effect, and they're human interpretable. The disadvantages of pathway databases are that they don't cover the genome very well and, because curation is a subjective, human-driven process, different databases disagree on the boundaries of pathways. So if you go to KEGG and you go to Reactome and you ask for the EGFR pathway, you'll get the same core pathway, but you'll get different things at the branches, where people disagree about where EGF receptor signaling stops and another pathway, say growth regulation, starts.
So we're going to talk about a few well-known pathway databases here. The most well-known is the Kyoto Encyclopedia of Genes and Genomes, KEGG. It has human-curated pathways in human, many different vertebrates, and many, many prokaryotes. It's large and quite comprehensive. It gives you pathway diagrams like this one and provides services that allow you to find which pathways your gene list is over-represented in, using the tools that Quaid and Gary talked about.
And it will then give you colorized views of those pathways, in which it has lit up the genes on your list on the pathway diagram, so that you can start to understand what's going on.
The Reactome database that Robin and I work on is a human-centric curated database of pathways. All the work is done on human pathways taken from the primary literature. It's quite comprehensive; for human, it's probably the most comprehensive of the pathway databases, with very high standards for curation and a very low error rate. And like KEGG, it provides these nice, browsable pathways. This is actually a Google Maps-style view. You can click on individual molecules or reactions or the connections between them, and you'll get a little summary down here which gives you information. You can dig into this in great detail and do analyses right in the diagram, including overlaying your own data on top of the map, so you can see your data in the context of the map, and doing over-representation analysis. Robin will take you through some of the things you can do with Reactome.
So I've already told you most of this. As of the most current release, we're covering 1,500 pathways in human and 7,327 human proteins. And this is very detailed information. If we say that there's been a phosphorylation event on a protein, we indicate which amino acid is affected. If there are multiple RNA isoforms that are expressed in different tissue types, we identify which isoform is relevant. We provide a Google-style reaction diagram. You can do some basic over-representation analyses on the site. And we also have a service in which we take the heavily curated human pathways and project them onto other species, using orthology information, to produce automatically generated pathway diagrams. If you're looking at other vertebrates, this is pretty good inference. If you're going down to invertebrates or yeast or prokaryotes, it becomes increasingly dicey, as you can imagine.
And Reactome is open access and free to use.
So now we turn to network databases. Networks can cover less well-understood relationships. They can be used for genetic interactions, where you know that there's a suppression or enhancement or an association, but you don't know what the physical basis for that is. Physical interactions from high-throughput screens. Regulatory relationships such as co-expression: you know from a large number of microarrays that two genes are always linked, so when one is highly expressed, the other is highly expressed. You don't know the mechanism for that, but you can express it in a network. Things like Gene Ontology term sharing, and in fact adjacency in pathways. If you know that there's a molecule that forms a multimer, you can say that the components of the multimer interact with each other and are connected by a network arc.
Network databases can be built completely automatically or via curation. Typically, they're a combination of the two. They provide more extensive coverage of biology, although the underlying evidence is more tentative. If a pathway database says that two molecules are interacting with each other, there's usually very good evidence that that's occurred. If a network database says that, you need to take it with a grain of salt and look for the underlying evidence. Here are some popular curated networks. BioGRID, which is based here in Toronto, is curated interactions from the literature: 529,000 genes, obviously not all from human, they're from multiple species, and 167,000 interactions total. The IntAct database, which is a European project, is again curated interactions from the literature: 60,000 genes and 203,000 interactions. And the MINT database, based in Italy, has 31,000 genes and 83,000 interactions.
One of the interesting things looking at this is the different ratios between genes and interactions. That indicates differences in the curation and automatic inference algorithms used by each network database, rather than differences in biology. In addition to those three that I gave you, there are another 900 or so network and pathway databases, which can of course be very confusing. Which one do you choose? I want to point you to a nice resource for collecting information from network and pathway databases, called Pathway Commons. A number of years ago, the largest network and pathway databases got together and agreed on a common language to describe pathways and networks, a language called BioPAX. The databases have all submitted their holdings to this resource called Pathway Commons, which allows you to browse networks and pathways from multiple databases. There's also a resource called WikiPathways, which is a community-hosted pathways database and is also accessible through Pathway Commons.
Okay, so now we've got the two ingredients. You have your gene list, and we have information on networks and/or pathways. What are you going to do with this? This is another figure from that review article. There are basically three broad classes of analysis you can perform. The first is gene enrichment studies, which you've already covered intensively. This takes the rich network and pathway information that you've gotten, throws away most of it, and divides genes into a series of buckets or bags, named after Gene Ontology processes or pathways or subcellular macromolecular complexes such as the ribosome. You can slice them in many different ways. And then, using statistical tests on your gene list, it asks: are the genes in my gene or protein list statistically over-represented in any of those buckets?
And so what it will give you is a series of over-represented gene or protein buckets, which will tell you something about the underlying biological process.
The second class of analysis uses more of the topological information in the network or pathway. It's actually mostly used for network analysis. Using the entire network, you overlay the information from your gene list, and you ask: are there any topologically unlikely groupings of the genes in my gene list? Are there subsets of the genes in my list which are interacting with each other more than you would expect by chance from a randomly selected set of proteins? From that you can extract clusters of genes, little subnetworks, which are altered in your cancer or other biological system. This is called de novo subnetwork construction and clustering. The advantage is that it allows you to discover new relationships among genes, proteins, and other macromolecules which were not previously put into a bag.
The most sophisticated type of analysis is pathway-based modeling. Here you keep all the detailed regulatory interactions, you operate directly at the pathway level, and you try to build models of how the altered genes in your system are working with each other to change biological activities. So you may have 15 different mutations. You're going to try to integrate them together and say, what effect is the aggregate of this going to have on EGFR signaling? I might have an EGFR amplification and a KRAS activation. They're on the same pathway. Are they synergistic with each other, or are they canceling each other out? This will help tell you.
So what are these three things used for? Gene enrichment is typically used to find out what biological processes are altered in my cancer or other biological system. It won't tell you exactly how they're altered or what the effect is, but it gives you a very strong head start on coming up with hypotheses.
With de novo subnetwork construction and clustering, you can actually do discovery of new pathways. You find a group of genes which are interacting and are also altered in your cancer, and you ask, well, what is this? This is not a biological process we've seen before. What exactly is it? Is it significant? It is also very useful for identifying clinically relevant tumor subtypes. You can see different patterns of active subnetworks in different subsets of tumors, for example, and try to relate those to clinical characteristics such as response to chemotherapy or patient disease-free survival.
The third one gives you the most detailed information. It's also, not coincidentally, the hardest type of analysis to perform. How are the pathway activities altered in a particular patient or tumor? What is the integrated effect of multiple mutations, copy number changes, methylation changes? And it also helps you make predictions about targetable pathways in a particular patient or tumor type.
So enrichment of fixed gene sets you've now covered. I'm not going to go through it in any detail, although Robin will have an exercise using Reactome for gene set enrichment. It is by far the most popular form of pathway and network analysis, because it's easy to perform, there are lots of good end-user tools, and the statistical models are very well worked out at this point. The disadvantages are that there are many possible gene sets, and typically when you do one of these analyses, you'll get many interrelated processes that are up-regulated or down-regulated. You need to sort through those lists, and that's what gives rise to things like the enrichment map that Gary talked about, where you take multiple GO processes and try to put them back together into über-processes. And the bags-of-genes metaphor gets rid of the regulatory relationships among the genes, so you lose some of that information. That's all I'm going to say about gene set enrichment.
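As a reminder of what the bucket test does, here is a minimal sketch of a one-sided hypergeometric over-representation test with a Bonferroni correction, using only the Python standard library. The function names and all the numbers in the example call are made up for illustration; real tools use their own gene universes and often other multiple-testing corrections.

```python
from math import comb

def overrepresentation_pvalue(hits, list_size, set_size, universe_size):
    """P(X >= hits) when list_size genes are drawn at random from a universe
    of universe_size genes, set_size of which belong to the gene set."""
    upper = min(list_size, set_size)
    favourable = sum(
        comb(set_size, i) * comb(universe_size - set_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(universe_size, list_size)

def bonferroni(p, n_tests):
    """Conservative correction for testing n_tests gene sets at once."""
    return min(1.0, p * n_tests)

# Hypothetical numbers: 8 of my 50 altered genes fall in a 100-gene pathway,
# out of a 20,000-gene universe, and 1,500 pathways were tested.
p = overrepresentation_pvalue(hits=8, list_size=50, set_size=100, universe_size=20000)
print(p, bonferroni(p, n_tests=1500))
```

The correction step is where the dimensionality argument shows up: testing a few hundred or a few thousand gene sets costs far less statistical power than testing every gene individually.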
Now we're going to look in more detail at subnetwork construction and clustering. In this class of analysis, you start with a network, you apply your list of altered molecular entities, and you identify topologically unlikely configurations, such as all the genes which are up-regulated in your system clustering together in a tiny little corner of the network. You can then extract those clusters of unlikely configurations and apply biological annotation to them to understand what their role is.
So here is an example of doing this with Reactome, and Robin, if there's time, will take you through an example of using Cytoscape to do this sort of thing. Reactome has created a network representation of the pathway database called the Reactome Functional Interaction Network. What we did here is we took our curated pathways. I'm having trouble finding my mouse here. Curated pathway databases. We added a lot of uncurated interaction information: yeast two-hybrid, gene co-expression, literature mining for co-mention of genes in the same paragraph, and so on. We put that together with a little bit of machine learning to reduce false positives, and ended up with a network of 11,000 proteins and 270,000 interactions. If you remember back to the network databases, that's a very high ratio of interactions to proteins, indicating how much richness there is in the underlying pathway databases. We then have a system that runs in Cytoscape for extracting and clustering altered genes from your gene list to give a series of altered modules. Typically there will be 10 to 30 of them, and then you can annotate them and look at them in more detail. So this is actually a previous version of the network, when it was just a little bit under 11,000 proteins.
So this is a little corner of that network, showing that there's a lot of non-random clustering just in the network itself, because of course every time you have a subcellular complex such as the ribosome, it ends up being a little cluster of highly interacting proteins or genes. We cover about half of the named genome, and we've tuned things so that the false positive rate is kept below 1%. I'm showing just 5% of the network here; it's much larger than that.
So here's an example of using the FI network to discover novel biology. This is a slide from, I guess, back in January of this year. We've been doing pancreatic cancer whole genome sequencing and have identified mutated genes in the coding regions of 52 pancreatic cancers. What you're looking at here are the top 50 or so genes that we've identified, there are hundreds of them, and their frequencies. This is pretty typical for a cancer mutation spectrum: there are a couple of genes, KRAS and P53, which are very frequently mutated. KRAS is mutated 95% of the time. P53 is mutated 50 to 60% of the time. And then there's this long, long, long tail that goes way off the screen, of genes which are infrequently mutated. Some of them are passenger mutations, some of them are rare driver mutations which are contributing to the disease, but we don't know which ones are which. None of them meet the threshold for statistical significance just on the basis of recurrence. If we apply this simple extraction and clustering system to that list, however, you see very non-random association of these genes. In fact, we see multiple modules coming out. The largest module involves KRAS, as you would expect. The area of each node corresponds to the number of patients who had a mutation in that gene. And you see a penumbra of genes in that long tail that interact with KRAS or with each other.
And it involves genes drawn from EGFR signaling, ErbB signaling, FGF signaling, and, interestingly, axon guidance. Yes, please. (Audience question about whether the modules are significant.) Sure. Yeah, yeah, yeah. Sure. So you have to distinguish between whether it's biologically significant or statistically significant. For statistical significance, there are simple procedures: you take multiple random draws from the genome of a series of genes of the same size distribution, recluster them, and ask, how often do I see one of these clusters coming up by chance? And if you do that, there are some pathways which are false positives. So module 7, which is extracellular matrix, focal adhesion, and integrin signaling, which you'd think is very biologically significant: it turns out there's a big number of large, frequently mutated passenger genes in that set, which also interact with each other, and they come up all the time. So you end up X-ing that out. And Cytoscape gives you tools for doing that analysis and removing them. In terms of biological significance, that's up to you.
We can't do it on a single patient, but the interesting thing about this type of clustering is that even when we had only 10 patients, we were essentially getting the same list of modules that we got at 50, and now we're up to about 80, and the map hasn't changed. It fills in a little bit, but the basic structure hasn't changed. So it's actually really, really powerful. And we've done experiments in which we took the full set of genes and started introducing false negatives. We just dropped genes out at random to simulate what happens when you're looking at low-coverage sequencing or low-cellularity tumors. The map remains cohesive and essentially unchanged until you get down to about 60 to 70%, at which point the modules start falling apart. So it's actually very tolerant of a high false negative rate. Okay? Yeah?
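The random-draw procedure for statistical significance can be sketched as a small permutation test. This is a simplified illustration, not the procedure implemented in the Cytoscape tools: it samples genes uniformly rather than matching the gene size distribution mentioned in the talk, and it uses the size of the largest connected component as a stand-in for the clustering step. The toy network and every name below are assumptions.

```python
import random

def largest_component(genes, adj):
    """Size of the largest connected group among genes, using only edges in adj."""
    genes = set(genes)
    seen, best = set(), 0
    for start in genes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj.get(u, ()):
                if v in genes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def empirical_pvalue(gene_list, adj, universe, n_perm=1000, seed=0):
    """Fraction of random gene lists whose largest module is at least as big."""
    rng = random.Random(seed)
    observed = largest_component(gene_list, adj)
    hits = sum(
        largest_component(rng.sample(universe, len(gene_list)), adj) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)  # add-one estimate avoids reporting p = 0

# Toy network: genes g0..g4 form a chain; g5..g99 have no interactions.
universe = [f"g{i}" for i in range(100)]
adj = {g: set() for g in universe}
for i in range(4):
    adj[f"g{i}"].add(f"g{i + 1}")
    adj[f"g{i + 1}"].add(f"g{i}")

altered = ["g0", "g1", "g2", "g3", "g4"]
print(largest_component(altered, adj), empirical_pvalue(altered, adj, universe))
```

A module made of large, frequently mutated passenger genes would need the size-matched sampling to be flagged correctly, which is the point of the module 7 example.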
(Audience question: could the list also be differentially expressed genes?) Absolutely. It's just a gene list. It can be two-fold up-regulated genes. It could be genes which are near hypomethylated regions. It can be any gene list that you give it, or it can be a list of proteins from a proteomics experiment. (Audience question about ranking.) No. This method doesn't use a ranking. You select the threshold and give it the list. Now, I'll talk about HotNet in a moment, which allows you to attach values to genes so that you can distinguish between the two-fold up-regulated genes and the more strongly up-regulated ones. Okay. Any questions? Okay.
And the fascinating thing about this type of clustering is that it can reveal subtypes within the tumor. If I were to take this list of 200 recurrently mutated genes and just do a hierarchical clustering of them, I would see no substructure at all, because the vast majority of genes are mutated only a couple of times in the whole set and they don't cluster together. I wouldn't be able to distinguish different subtypes of the tumor. However, if I reduce this to the 10 or so modules and cluster on the basis of which modules are mutated in each tumor, a very striking pattern emerges. What we're looking at here is a hierarchical clustering, with the modules going across, and we've color-coded the modules to indicate how many genes in each module are mutated in any given patient, any given donor. Then we have the patient samples running along the y-axis. You can see four clear apparent tumor subtypes here, one of which is, say, KRAS-module negative and P53-module negative. When we went back to those samples, it turned out they were not pancreatic ductal carcinoma. They were ampullary carcinomas that had been misdiagnosed, so we picked that up. That's nice. We have one group here where two of them are ampullary and one of them is still under debate about what it is. It may be a rare KRAS-negative ductal carcinoma.
Type two is KRAS positive, P53 positive, PI3 kinase positive. Type three is a different combination of those. Of course, we immediately went and looked for histopathological differences; aside from the ampullary carcinomas, there didn't seem to be any. We looked for survival differences and didn't see any. You win some, you lose some. But in another case, we did the same thing in breast cancer, and this is actually RNA expression differences. We did exactly the same thing with mRNA expression arrays in estrogen receptor positive breast cancer, and we found a module involving cell cycle and Aurora B kinase signaling which is a strong prognostic indicator of poor outcome. This is a Kaplan-Meier curve showing disease-free survival in these patients. Estrogen receptor positive patients typically have a good prognosis and are expected to live a long time. However, the subset of patients who had high expression in this module had a much more rapid recurrence rate, so much so that they have the same prognosis as triple-negative patients. That would be a patient population that you might want to pull out and treat more aggressively.
Here are some popular network clustering algorithms. GeneMANIA, which Quaid talked about. Has it been introduced, or is it coming up? It's coming up. GeneMANIA is a website which has several hundred different networks all integrated together, and it gives you a very nice interface in which you enter a few genes from your gene list and it finds other genes which are highly related to those, pulls those clusters out, displays them, and lets you annotate and explore them. It's most useful for finding genes related to your process that you might have missed experimentally. The HotNet system from Ben Raphael's lab at Brown University uses a sophisticated clustering algorithm which is designed to avoid a major artifact.
If you consider a gene like P53, P53 has been studied very extensively for two decades and has a lot more interactions than most other genes do, just because it's been more heavily studied. If you do a naive clustering, a P53 module will typically come out just because it has so many interactions. What HotNet does is model the network as a lattice of metallic wires. You introduce your mutated genes or your up-regulated genes as a series of hot spots on the lattice, and the heat diffuses out using a physical model. A gene like P53, which is so heavily interconnected, has many connections leading out, so it loses heat more rapidly than other genes. That compensates for the differences in annotation. So it does a better job at finding high-quality modules which are less prone to these artifacts. In fact, after the publication of that algorithm, we added an option to the Reactome functional interaction network's clustering to run HotNet, and it does a better job.
There is another Cytoscape app I want to draw your attention to, called HyperModules, written by a post-doctoral fellow in Gary Bader's lab. It is designed specifically for finding network clusters that correlate with clinical characteristics. If you have experimental data sets, arrays or sequencing data, you can present them to the network along with clinical characteristics you're interested in, such as response to Adriamycin. It will try to find clusters that are not only statistically unlikely in your set but are also more strongly correlated with the clinical characteristics you're looking for. And then there's the Reactome functional interaction network Cytoscape app. It has a long name, which Robin will be talking to you about, and it offers multiple clustering and correlation algorithms, including some of the more sophisticated ones that I'll talk about next.
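The heat-on-a-lattice idea described for HotNet can be illustrated with a generic diffusion sketch. This is not HotNet's actual implementation: it is a simple iterative scheme in which each node keeps re-injecting a fraction beta of its source heat and passes the rest to its neighbours in equal shares. The toy network, the gene names, and the parameter values are all assumptions chosen to show why a hub delivers less heat to each neighbour.

```python
def diffuse(adj, sources, beta=0.5, n_iter=200):
    """Iterate heat flow: each step re-injects beta * source heat and every
    node splits (1 - beta) of its current heat equally among its neighbours."""
    heat = {g: sources.get(g, 0.0) for g in adj}
    for _ in range(n_iter):
        nxt = {g: beta * sources.get(g, 0.0) for g in adj}
        for g, nbrs in adj.items():
            outflow = (1 - beta) * heat[g]
            if not nbrs:
                nxt[g] += outflow  # isolated node keeps its own heat
                continue
            for v in nbrs:
                nxt[v] += outflow / len(nbrs)
        heat = nxt
    return heat

# Toy network: HUB has ten neighbours (heavily studied gene); A has only B.
adj = {"HUB": {f"L{i}" for i in range(10)}, "A": {"B"}, "B": {"A"}}
for i in range(10):
    adj[f"L{i}"] = {"HUB"}

# One unit of heat (say, one mutation) on the hub and one on gene A.
heat = diffuse(adj, {"HUB": 1.0, "A": 1.0})
print(round(heat["B"], 3), round(heat["L0"], 3))
```

In this toy case B ends up much hotter than any single neighbour of HUB, even though both source genes started with the same heat, which is the compensation for heavily annotated genes that the lattice picture describes.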
So pathway-based modeling is the last topic, and this is both the most sophisticated and the most inaccessible, the hardest for you to use. In this case you're applying a list of altered molecules to biological pathways, not to networks. It preserves the biological, functional relationships among them, and in particular these algorithms are designed to allow you to integrate multiple different types of molecular alteration into the same model. If you have copy number variations and expression changes and single point mutations, you can put those all into the same model and get a prediction of what their effect is. This shades very quickly into systems biology.
Pathway-based modeling is typically very specialized for a particular biological question. For biochemical systems such as metabolomics, where you have yeast growing in a fermenter and you want to make predictions about what will happen when you increase the amount of maltose, this is classically described using partial differential equations and Boolean models. I'm picking out one of the most easily accessible tools, called CellNetAnalyzer, from University of Padua. For systems of up to about a dozen genes, this will make predictions about what happens on the metabolic side. Once the system gets larger than that, these models become intractable to solve, and you need a lot of detailed information: you need the Km values for each reaction, you need the binding coefficients, all the very detailed biophysical measures. So in cancer, typically these are off the board; you can't use them. For studying signaling cascades, and particularly kinase cascades, there are network flow models, which use information theory to study what happens to a signal, a piece of information, as it goes through the cascade, using models that were developed for communications networks.
Two examples are NetPhorest and NetworKIN, for studying phosphorylation cascades. If you're doing proteomics and you're studying kinases, these are the tools you should turn to. If you're working on expression arrays and you're interested in decoding or deconvoluting the transcriptional regulatory network, to find, say, master regulators or master switches which are altered in your disease and which are affecting a series of genes underneath them, you should turn to the transcriptional network-based reconstruction methods. I recommend Andrea Califano's ARACNE software. What this does is use information theory to sort primary from secondary effects and then construct a hierarchy of transcription factors. It solves the problem where you have one transcriptional regulator and 15 targets in your system of interest, both the regulator and all 15 targets go up simultaneously, and you don't know which one is regulating which. It's actually able to use information theory to sort that out and find what the master regulator is.
And lastly, and most recently, there are probabilistic graphical models, or PGMs, which are a very general form of pathway modeling for cancer analysis. I'm going to go into more detail on this because it is probably the most relevant for work in cancer biology, and an algorithm called PARADIGM is the paradigm for this whole class of models. PARADIGM comes out of Josh Stuart's lab at UC Santa Cruz and was published about three and a half or four years ago. It starts with individual pathways taken out of a database. I'm showing a very simple example here of P53 and its MDM2 repressor; it's actually a ubiquitination reaction. P53 leads, by a whole series of other pathway steps we're not showing here, to apoptosis. PARADIGM takes this biological model and turns it into a probabilistic graphical model using the central dogma. So this single MDM2 node here actually expands into four different things in the model: the MDM2 gene,
that is, the DNA; the MDM2 RNA; the MDM2 protein, via translation; and the MDM2 active protein, via a post-translational modification. Each of these has a series of weights attached to it, which indicate how changes in the activity of the gene will change the activity of the RNA. So, for example, if you have an amplification of the gene, you expect the RNA to go up by some level, and the weight can be a correlation coefficient. Same thing with P53: the P53 gene, P53 RNA, P53 protein and the active protein. And then there's one interaction between these two pathways: MDM2 ubiquitinates P53 and will end up inhibiting the P53 active protein. The weight in this case will be a negative weight. All right, so that's a nice little model, and it follows the central dogma, so that makes sense.
PARADIGM does this across all annotated pathways, and then you can apply your gene list to it. Your gene list can be a series of mutations that affect the DNA, a series of RNA fold changes that affect the RNA, copy number changes that affect the genes. Maybe you've got some mass spec data on the protein indicating the protein has gone up or down, or has had a post-translational modification. It will then integrate all this information together and give you a readout on the pathway activity: has the combination of all these alterations caused apoptosis activity to go up, to go down, or to remain the same?
And this technique actually works quite well in the general case. Here, and this is in the 2010 paper from the Stuart lab, it has been applied to glioblastoma multiforme from the TCGA, where they had RNAs, they had CNV changes, and they had point mutations and methylation. What we're looking at here are individual patient samples, members of the different pathways that went into PARADIGM, and the prediction of the changes in pathway activity. So it's showing that there is a group of patients here, group number three, which have very strong predicted
inhibition of the GATA transcription factor pathway here, an increase in the EGFR pathways, and so on and so forth. Group number two has a different pattern. So we're representing the effects on each patient as a series of pathway activity changes, and you can then go into this and use it to do what-if experiments. For example, you can take the model and say, okay, I have a patient who has a predicted increase of activity in EGFR. Let me go through the list of 2,000 or so clinically approved drugs that have known drug targets, apply them to the model, and see if I can reduce EGFR activity to normal. Or maybe I want to knock out an essential pathway in this particular tumor in a synthetic lethal fashion. So you can ask that of the model and actually find, in some cases, drug targets that are specific to that tumor, to facilitate precision medicine.
So there's good and bad news. The bad news is that, although it's open source, it's distributed as source code and it's hard to compile. They don't provide you with any pre-formatted pathway models, and there's no documentation on how to run it. And even if you get it up and running, it takes weeks to run one of these simulations on a cluster. So the thought experiment that I told you about, of doing a virtual screen for targeted drugs, you can't actually do that in practice. It would take years to do it.
There's also some good news. Recently the software developers on Reactome have incorporated the PARADIGM algorithm into the Cytoscape app, and we've increased the performance about 50-fold, so you can now start to do those experiments and you can do analyses in real time. And we've pre-populated it with Reactome-based pathway models. It's in alpha testing now. It's not really ready to demo at this point, but it looks very promising and will be available within a few months, I believe.
So that's the end of the talk. I've ended the talk, which you have copies of, with a series of
URLs and background reading on each of the three classes of algorithm. We'll take some questions and maybe have a quick break, and then Robin is going to give you a practical exercise using some of the tools and techniques I talked about.
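As a companion to the P53 and MDM2 example in the talk, here is a minimal sketch of the central-dogma expansion. It is a toy, not the PARADIGM implementation: PARADIGM does probabilistic inference on a factor graph, while this sketch only pushes signed evidence downstream along weighted edges. The function names, the node labels and the weights of 1.0 and -1.0 are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical level names for the central-dogma expansion described in the talk.
LEVELS = ["dna", "rna", "protein", "active"]


def expand_entity(name, weights=(1.0, 1.0, 1.0)):
    """Expand one pathway entity into DNA -> RNA -> protein -> active edges.

    The weights are illustrative placeholders, not fitted values.
    """
    nodes = [f"{name}:{level}" for level in LEVELS]
    return [(src, dst, w) for src, dst, w in zip(nodes, nodes[1:], weights)]


def build_graph():
    """Toy two-entity model: MDM2 inhibits active P53, which promotes apoptosis."""
    edges = expand_entity("MDM2") + expand_entity("P53")
    edges.append(("MDM2:active", "P53:active", -1.0))  # negative weight: inhibition
    edges.append(("P53:active", "apoptosis", 1.0))     # stands in for the omitted steps
    return edges


def propagate(edges, evidence):
    """Push signed evidence downstream, clipping each node to [-1, 1].

    This is simple signed propagation, NOT the factor-graph inference
    that PARADIGM performs.
    """
    parents = {}
    for src, dst, w in edges:
        parents.setdefault(src, [])
        parents.setdefault(dst, []).append((src, w))
    # Visit every node after all of its parents.
    order = TopologicalSorter(
        {node: [s for s, _ in ps] for node, ps in parents.items()}
    ).static_order()
    activity = {}
    for node in order:
        total = evidence.get(node, 0.0) + sum(w * activity[s] for s, w in parents[node])
        activity[node] = max(-1.0, min(1.0, total))
    return activity


# An MDM2 amplification (+1 at the DNA level) lowers the apoptosis readout.
result = propagate(build_graph(), {"MDM2:dna": 1.0})
print(result["apoptosis"])  # -1.0
```

In this toy, an observation at any level (a copy number change at `dna`, a fold change at `rna`, mass spec data at `protein`) enters through the `evidence` dictionary, which mirrors how the talk describes the different data types being attached to different nodes of the expanded model.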