Good morning, everybody. Hope you're enjoying the course so far. I'm going to give you an introduction to pathway analysis. Here is our Creative Commons copyright. So we're on pathway and network analysis. The objectives for this morning are to understand what is meant by the terms pathway analysis and network analysis. They're thrown around sometimes interchangeably, sometimes not. It's a source of confusion even within the field. I'm going to talk about where the data comes from for performing this type of analysis, go over the three main paradigms people have for performing pathway and network analysis, and then show some specific examples of using one network analysis framework, based on Reactome, to analyze biological data. And that will lead into the practical exercise that Robin Haw will lead you through after my lecture. So I think it will be familiar to you that the reason people are interested in pathway analysis is the fundamental dimensionality problem that biology has. We have genomes that have tens of thousands of genes. In the human genome there are about 20,000 coding genes and twice as many non-coding genes. Whenever you look at a biological perturbation, whether it's a disease or a model system, you'll typically find changes in thousands of genes. And given that we can't do tens of thousands of experiments, we're usually doing tens to hundreds of experiments, that means there's a huge problem with multiple hypothesis testing. So the reason one wants to do pathway analysis is to reduce these changes that you measure in thousands of genes to small numbers, dozens of altered pathways, and then do your hypothesis testing on the pathways. That increases your statistical power, and it lets you find meaning in the long tails that we frequently see in cancer and other diseases. So for example, if you look at autism spectrum disorder, you find many, many, many extremely rare germline mutations.
How do you make sense of many singleton events? Another advantage of pathway analysis is that it relates the changes you see to the biology, to biological pathways that people have worked on for many years and in many cases have an intuitive understanding of. It lets you identify hidden patterns in long lists of genes that have non-explanatory names. It allows you to create mechanistic models, to test your hypotheses with experimental interventions, and to predict the function of unannotated genes; something on the order of 20% of the genes in the genome we really don't know anything about. And it gives you a framework for doing quantitative modeling, sort of cell engineering and quantitative prediction, and it lets you develop robust molecular signatures. So pathway and network analysis, broadly speaking, is any analytic technique that uses information on biological pathways or networks to gain insight into a tumor or other biological system. And I'm sorry, I'll keep talking about cancer because that's the field that I work in. It's my go-to. It's a rapidly evolving field. There are multiple approaches. And there's little consensus on what the best practices are in pathway and network analysis. OK, so now let's talk about what the difference is between a pathway and a network. So a pathway, when I use the term and when many people use it, refers to the classic type of mechanistic model that we learn in Biochemistry 101, where there is an ordered series of events occurring in the cell involving macromolecules, organelles, and other entities that interact with each other in a particular sequence of events to yield a biological outcome. So here, for example, we're looking at a simplified version. Where's my pointer? Here we go. There's the right button. Wrong button. OK, here we're looking at the EGF receptor being bound by the EGF ligand, creating an active dimer. This then hydrolyzes ATP.
And it leads to the phosphorylated EGF receptor as a product, which then activates downstream events, ultimately causing EGF signaling. It has a series of inhibitors and other regulators. The advantage of looking at a biological process this way is that you can understand what it's for, and you can imagine experiments in which you alter different parts of it to test your hypotheses. However, these types of models are computationally very difficult to work with and also require a degree of curation and previous knowledge that often you don't have. So if you focus just on curated pathways of this sort, you're going to miss a lot of biology that's not yet explored. So the alternative is networks. And in a network, you take this fairly complex and highly structured pathway, and you break it down into a simple model where you just have a series of interactions. So instead of this idea of the EGF receptor binding to EGF to form a dimer, we just have EGF interacting with and activating the EGF receptor, without your actually knowing what the mechanism really is. So there are positive interactions, which activate a molecule. There are negative ones that inhibit. And then there are interactions where you know the molecules physically interact somehow, but you don't know the direction or exactly what they're doing. And although this lacks a lot of explanatory power and is less satisfying, computationally it's usually easier to work with the simplified network view than the pathway view. So are people clear on that? Maybe I'll speed up a little bit. So when you do pathway or network analysis, you basically need two ingredients. First, you need a list of the genes, proteins, RNAs, and other macromolecules that are altered in your system, because we're typically looking at perturbations, either ones that you create in the lab or naturally occurring ones that occur because of a disease.
And you need a source of pathways or networks with which to do the analysis, your reference set. So let's talk about where these come from. So pathway databases are the older of these ingredients. They go back to textbooks like Lehninger. There are multiple pathway databases. Their advantages are that usually they're curated. Often they're created by a team of PhD-level curators who are actually manually constructing these pathways and drawing them out for you. They provide a biochemical view of biological processes. They capture cause and effect, and they give you human-interpretable visualizations. They're intuitively easy to understand. The disadvantages are that they have sparse coverage of the genome. And if you look at different databases, they will actually disagree on where one pathway begins and another pathway stops, which makes perfect sense because the cell isn't organized as chapters in a textbook. And where we draw boundaries between EGF signaling and RAS signaling and other GPCR signaling is really a historical artifact that doesn't really reflect biology. So I don't know why these are called reaction network databases, but for pathway databases, two prominent examples are Reactome and KEGG. Reactome is a database that Robin and I work with. KEGG is a very well-established and well-known database from Japan. They describe biological processes as a series of biochemical reactions. So the pathway databases usually use a very straightforward model in which the reaction is the center of the data model. A reaction takes multiple inputs, which can be anything from a protein to a drug. Those inputs interact with each other in some way to produce one or more outputs. And they may be regulated in various ways. They may involve catalysts or inhibitors that alter the rate at which the reaction occurs. This is a very general model. It can be used for things like a proteolytic reaction where the inputs are, say, ATP and a large protein.
And the outputs are ADP and the cleaved protein products. It could be a binding reaction, such as EGF and the EGF receptor outputting the active dimer. Or it could be something like a macromolecule that's inside a cell being moved by a transporter to become a macromolecule outside the cell. So it's fairly general. Now KEGG is the Kyoto Encyclopedia of Genes and Genomes, based in Kyoto, Japan. It's a very, very large database that includes many other things besides pathways. It has genome sequences, a protein database, a chemical compound database, and it spans multiple prokaryotes and eukaryotes. The pathway portion of KEGG is a set of manually drawn pathway maps that have been created by expert curators. It used to be a free resource, but more recently, because of public funding issues, it's gone to a subscriber model. So you have access to part of it for free; to get access to more, you have to subscribe and pay a license fee in order to download. A typical KEGG pathway diagram will look like this, very similar to what I showed you before, showing individual genes, or individual proteins in this case, complexes consisting of multiple proteins, the reactions that they participate in, and their positive and negative relationships. Reactome is more focused than KEGG. It focuses really on human biology. In contrast to KEGG, it's completely open source and open access. Reactome is able to do that because of continued funding from the National Institutes of Health, which I pray we will continue to get. It is a curated database that encompasses metabolism, signaling, and multiple other biological processes. It covers roughly 50% of the coding portion of the genome. It has just about 11,000 genes in it as of the last release, out of the 20,000 that are known. Every pathway is traceable to the primary literature via the efforts of curators and experts who are brought in to work on each of the pathways.
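The reaction-centered data model described above (inputs, outputs, and optional catalysts or inhibitors) can be sketched as a small data structure. This is only an illustration: the class and field names (`Entity`, `Reaction`, `participants`) and the compartment labels are made up for this sketch and are not the actual Reactome or KEGG schema.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Entity:
    """A physical entity: protein, complex, small molecule, drug, and so on."""
    name: str
    compartment: str = "cytosol"  # hypothetical default, for illustration only


@dataclass
class Reaction:
    """Reaction-centered record: inputs -> outputs, with optional regulators."""
    name: str
    inputs: List[Entity]
    outputs: List[Entity]
    catalysts: List[Entity] = field(default_factory=list)
    inhibitors: List[Entity] = field(default_factory=list)

    def participants(self) -> Set[str]:
        """Names of all entities that take part in this reaction, in any role."""
        roles = self.inputs + self.outputs + self.catalysts + self.inhibitors
        return {entity.name for entity in roles}


# Binding reaction from the lecture: EGF + EGF receptor -> active dimer.
egf = Entity("EGF", "extracellular")
egfr = Entity("EGFR", "plasma membrane")
dimer = Entity("EGF:EGFR dimer", "plasma membrane")
binding = Reaction("EGF binds EGFR", inputs=[egf, egfr], outputs=[dimer])

# Transport reaction: the same molecule changes compartment via a transporter.
cargo_in = Entity("cargo", "cytosol")
cargo_out = Entity("cargo", "extracellular")
transporter = Entity("transporter", "plasma membrane")
transport = Reaction("cargo export", [cargo_in], [cargo_out], catalysts=[transporter])

print(sorted(binding.participants()))
print(sorted(transport.participants()))
```

The point of putting the reaction at the center is that binding, cleavage, and transport all fit the same record; a gene list for enrichment can then be derived by collecting `participants()` across the reactions of a pathway.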
Reactome is peer reviewed, so every pathway gets reviewed by at least one other outside expert. It cross-references to many other bioinformatics databases, and it provides a basic set of data visualization and analysis tools, starting with a browsable, Google Maps style reaction diagram that lets you zoom in and out and pan around. It lets you find pathways containing your genes of interest. You can do gene overrepresentation analysis, which will be the topic of this morning's exercise. Although it emphasizes human, you can use it to match human pathways with pathways in other species, including most model organisms. So here is one representation of the pathway browser. This is actually a hand-drawn view. As you zoom into this, it becomes increasingly detailed. And after a certain point, if you zoom in enough, it becomes a pathway diagram with individual genes and proteins being shown. If you zoom out, you get a more abstract display that shows the entire genome and lets you show overrepresented pathways. In this view, we've actually done a pathway enrichment analysis in which we've uploaded a data set from an RNA-seq experiment from a cancer collection of some sort. And it shows overrepresentation in a variety of signaling pathways. There is a list of the individual signaling pathways that are overrepresented, along with their p-value and false discovery rate, here. There is a picture showing the overrepresented pathways, with a little orange bar showing what portion of all the genes and proteins in the pathway are covered in your set. And then there's a kind of hierarchical display of signal transduction and all the sub-pathways and sub-sub-pathways within it and their overrepresentation. Why would you prefer these over GO or something like that? Oh, we're going to go into that in just a second. OK, so now networks. So pathways capture only the well-understood portion of biology.
Networks can cover less well-understood relationships, including genetic interactions from a suppressor/enhancer screen, physical interactions, things like CRISPR studies, GO term sharing, co-expression networks. You name it, you can put it in a network. Network databases can be built via curation, but more typically they're built automatically from high-throughput experiments, such as CRISPR screens, for example. They have extensive coverage of biological systems. But the relationships and underlying evidence are more tentative. Typically, there will be false positives and false negatives in the network. And you have to be careful in understanding the limitations of the networks you work with. There are multiple sources of curated networks. Two that I want to bring to your attention: the first is the BioGRID consortium, which was actually started here in Toronto. It's a collection of curated physical and genetic interactions from the literature. 65,000 interactors and 1.1 million interactions. It's quite a large database. And then there is the IntAct database, based at the EBI in the UK. Again, it uses curated interactions from the literature. It's somewhat smaller, 694,000 interactions, but in general the standards are a bit higher. Just a look at IntAct as an example here. Very straightforward user interface. If you're interested in what interacts with P53, you type the gene name TP53 in the search bar, and it'll give you a long list of 11,128 molecules that interact with P53. And you can dig into this and identify what type of interaction it is, how the interaction was defined, what the evidence for it was, and the primary literature references. And then I'm going to close out this section by talking about a database of pathway databases, Pathway Commons. This is an effort that Gary Bader, who was here yesterday, helped launch with Chris Sander about eight or nine years ago to bring into a single database information from multiple pathway and network databases.
There are about a dozen that contribute to this. And it has a simple interface. You can search for a protein or other macromolecule that you're interested in, say P53. And it will list all the interactions, from both network and pathway databases, that have something to say about P53 and give you pointers to the source database. So if you're looking for everything that's known about a particular gene, protein, or RNA, this is a good place to start. Then, built on top of these databases, there are quite a few visualization tools that let you manipulate pathways and networks. These can become quite large, as you can see. We're focusing on Cytoscape during this class, because it's far and away the most widely used of these tools. OK, so now I'm going to address your question. So there are basically three types of pathway and network analysis that you can do. There's enrichment of fixed gene sets. There is de novo subnetwork discovery and clustering. And then there's pathway-based, systems biology style modeling. So the first and easily most widely used one is the one that you discussed yesterday, where you have a collection of gene sets, such as GO, in which genes have been parsed into a series of fixed sets, each corresponding to something biologically relevant, such as a subcellular compartment or a biological process. And then you use one of many, many enrichment statistic tools to test whether a list of genes that you have is over- or underrepresented in one or more of those gene sets. And we're going to go over in a second what the advantages and disadvantages of each are. The second most widely used style is subnetwork construction and clustering. Here, you don't start out with a preconceived notion of the boundaries of your gene sets. Instead, you have something like a big pathway or a big interaction network, which has maybe 10,000 genes in it.
And you then apply your data set to this network in order to discover highly enriched subnetworks within that network. So it lets you discover relationships, biologically significant relationships, which are not previously defined by your list of GO terms. And then the last one is pathway-based modeling. Here, you start out with a biochemical style pathway network with inputs and outputs and a biological endpoint. You apply your perturbed genes to this, and you have it predict what the result is. So for example, if I have a cancer cell, a tumor, and I have identified that a gene has been deleted in this cell, pathway modeling will allow you to predict what the effect on KRAS signaling, or the effect on proliferation or contact inhibition, would be by applying modeling principles. Each of these has a different role. So the simplest one is asking what biological processes are altered in this cancer. It doesn't tell you why they're altered or what the effect of the alteration is or even what direction it is, but it gives you the first clue that I'm dealing here with neurotic targeting or axon guidance. The second allows you to discover new pathways that are altered in this disease. Here, I'm using cancer because this was originally done for a cancer talk. It also lets you identify clinically relevant subtypes of the disease based on differences in which pathways are affected. And then the third allows you to get at the effect of the perturbations. How are the activities of the pathways altered in this particular patient, in this particular mouse, in this particular cell line? So we're going to go over these now in more detail. So, enrichment of fixed gene sets, such as GO analysis. It's easily the most popular form of pathway and network analysis. And the big advantage of it is that there are mature, well-tested tools that you can apply to this. Because the software is good and mature, gene set enrichment is easy to perform.
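As a rough illustration of what a fixed gene set enrichment tool does under the hood, here is a minimal over-representation sketch: a one-sided hypergeometric test per gene set, followed by a Benjamini-Hochberg correction to get the kind of p-value and false discovery rate columns mentioned earlier. The gene sets, the hit list, and the universe size of 20,000 are toy assumptions, the function names are made up for this sketch, and real tools differ in details such as how the background universe is defined.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the set."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(hits, gene_sets, universe_size):
    """Test each gene set for over-representation among the hit genes.

    Assumes every hit gene belongs to the background universe.
    """
    hits = set(hits)
    names = list(gene_sets)
    pvals = []
    for name in names:
        members = set(gene_sets[name])
        overlap = len(hits & members)
        pvals.append(hypergeom_tail(overlap, len(members), len(hits), universe_size))
    fdr = benjamini_hochberg(pvals)
    return sorted(zip(names, pvals, fdr), key=lambda row: row[1])


# Toy gene sets and hit list, for illustration only.
gene_sets = {
    "EGFR signaling": ["EGFR", "EGF", "KRAS", "GRB2", "SOS1"],
    "DNA repair": ["TP53", "BRCA1", "BRCA2", "ATM", "RAD51"],
}
hits = ["EGFR", "KRAS", "GRB2", "TP53"]
results = enrich(hits, gene_sets, universe_size=20000)
for name, p, fdr in results:
    print(f"{name}: p={p:.3g}, FDR={fdr:.3g}")
```

Note that the test only sees set membership: it says nothing about direction of change or about relationships between the sets, which is the limitation discussed next.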
The statistical models are well worked out. The big disadvantage is that there are many different possible gene sets. You can take GO, which is a very popular choice. You can use pathways. You can take, from Reactome or KEGG, all the genes involved in EGF signaling. But there are many other possible gene sets. And the results that you get depend very largely on which gene sets you choose. Gene sets are heavily overlapping. If you perform a gene set enrichment analysis using gene sets from multiple different sources, you're going to get partial overlaps. And you're going to have to reconcile that. I think Gary might have talked about enrichment maps yesterday? Yeah. OK, that's one of the tools that you use for resolving those overlaps. And the other problem is the boundaries. When you have bags of genes, there are arbitrary boundaries between them. And if there are regulatory relationships between, say, two GO processes, that may be obscured. So, de novo subnetwork construction and clustering. The basic statistical model is: given a network of gene or protein interactions, and a gene set that you've developed from a disease or a perturbation experiment, it will find topologically unlikely configurations. It'll identify groups of genes that came up in your analysis which are interacting with each other more frequently than you would expect by chance. And then from that, you can try to pull out biological relationships that may not have previously been discovered. Do I have advantages and disadvantages? Should. All right, well, I'll just go through it. The advantage of this is that you can discover new relationships. The disadvantage is that it's much more work. You have to annotate the clusters and make sense of what you've found. Here's an example of doing that using the Reactome FI network, and it's similar to your exercise this afternoon.
The Reactome functional interaction network is actually a hybrid network that is based on a core of curated pathways from Reactome and other databases, and a set of uncurated but large-scale interaction networks, which were then combined using a bit of machine learning to create a very conservative gene interaction network, which minimizes the number of false positives. So as these go, it's relatively small. It's only got 11,000 proteins in it and 270,000 interactions. But we believe that the rate of false positives is less than 5% in this set. So a very simple application of this is you take the FI network. You apply a set of disease genes to it. And in this case, these are genes which are recurrently mutated in pancreatic cancer. And then you pull out from this network interacting sets of genes which are clustering together in ways that are statistically unlikely. And then when you annotate this according to what pathways have contributed to each of these clusters, you typically get out biologically interesting clusters. We have one for PI3 kinase, one for P53 and DNA repair, one for cell cycle checkpoints. And the difference between this and a GO analysis is that these modules are discovered. So they don't correspond directly to GO categories. And so it may bring in genes which are not annotated as belonging to DNA repair. But we've discovered that in this disease, at least, they're highly associated with other genes involved in DNA repair. Yeah, then you have to go to the lab. And you actually start knocking in and knocking out the genes. And you measure. You do DNA repair assays using RAD51 assays. So it's all in silico. This is all hypothesis generation. And people who work in this field face the double-edged sword that you can get your results really quickly. I mean, this analysis takes minutes. But then it'll take a couple of years to get it to the point where you can publish it. Ah, you laugh. You laugh.
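To make the idea of pulling interacting sets of disease genes out of a larger network concrete, here is a deliberately simplified sketch. It takes the subgraph induced by the hit genes, reports its connected components as candidate modules, and uses random gene sets of the same size to ask whether the largest module is bigger than expected by chance. This is a stand-in under stated assumptions, not the clustering method used for the Reactome FI network: the edge list is a toy, and the function names are made up for this sketch.

```python
import random
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (gene_a, gene_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def modules(adj, hits):
    """Connected components of the subgraph induced by the hit genes."""
    hits = set(hits)
    seen, found = set(), []
    for start in sorted(hits):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            component.add(gene)
            for neighbor in adj.get(gene, ()):
                if neighbor in hits and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        found.append(component)
    return sorted(found, key=len, reverse=True)


def largest_module_pvalue(adj, hits, n_perm=1000, seed=0):
    """Empirical p-value: how often do random gene sets give a module this large?"""
    rng = random.Random(seed)
    genes = sorted(adj)
    hits_in_net = [g for g in hits if g in adj]
    if not hits_in_net:
        return 1.0
    observed = len(modules(adj, hits_in_net)[0])
    at_least = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(hits_in_net))
        if len(modules(adj, sample)[0]) >= observed:
            at_least += 1
    return (at_least + 1) / (n_perm + 1)


# Toy interaction network and mutated-gene list, for illustration only.
edges = [
    ("KRAS", "RAF1"), ("RAF1", "MAP2K1"), ("TP53", "MDM2"), ("TP53", "ATM"),
    ("ATM", "BRCA1"), ("SLIT2", "ROBO1"), ("PIK3CA", "AKT1"),
]
adj = build_adjacency(edges)
hits = ["TP53", "ATM", "BRCA1", "KRAS", "SLIT2", "ROBO1"]
found = modules(adj, hits)
print(found)
print(largest_module_pvalue(adj, hits, n_perm=200))
```

The discovered components are then annotated after the fact, for example by running an enrichment test on each one, which is the extra work mentioned above.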
Did you find anything different? Oh, actually, we'll go back to this data set and show that we found some pancreatic cancer subtypes, which I think were interesting. Yeah, it's different from what you would get from a GO analysis. OK, yeah, this is just repeating what I said before about the Reactome FI network. The current iteration is 12,000 genes, almost 300,000 functional interactions, about 61% coverage of the genome. And the false positive rate is less than 1%. But its false negative rate is about 80%. And that's because there's a lot of biology that we don't know. We're just seeing a tiny slice of the network here. So I'm going back to the pancreatic cancer example here. This slide didn't translate very well. But it's showing the distribution of mutated genes, with KRAS mutations appearing in 49 out of 50 tumors. And then there's a long tail of genes that are mutated in fewer than five samples in our set. All right, so here's a closer look at that network map that I showed you before. We've done a little bit of pruning here to make it easier to see, showing a number of things that are known coming out, such as P53 signaling, and things which were unknown, such as axon guidance pathways being mutated in pancreatic cancer. This we actually did get published as a Nature publication a number of years ago. It was after quite a lot of validation by experimental groups. And you can do interesting things with it. So if you try to cluster the tumor samples just by which genes are mutated, you basically get no clustering at all. Most tumors have KRAS mutations. And then the next most frequent ones are only there in about half of the tumors. If, however, you cluster the tumors by which of these discovered network modules are mutated, you actually get four types of tumor defined by common altered pathways. There is a KRAS negative one. There's one that's P53 and DNA repair and axon guidance positive, and so on and so forth.
They actually have slightly different histological appearances and different responses to chemotherapy, although it's not strong enough to make a decent actionable signature. Here's an example from breast cancer, where we actually got a good prognostic signature out. Here we looked at RNA expression levels in estrogen receptor positive breast cancers, which generally have a good prognosis. We did a similar network-based clustering and discovery, and identified this module, which combines both cell cycle M-phase and Aurora B kinase signaling, and which is highly variable from one tumor to another. And when you project this onto patient survival, you find this very nice prognostic signature, in which patients with high expression of the genes in this module have a much worse prognosis than patients with low expression in the same module. And it turns out that patients in this group have a similar survival curve to patients who are triple negative, which is generally associated with a poor prognosis. So we can go into a group of patients who ordinarily would have a good prognosis and would be treated less aggressively, identify those who are more likely to have aggressive disease, and potentially give them a more aggressive therapy. So that's a useful case. This would not have come out of a GO process analysis, because GO happens to cut right through this module, and individually the two processes do not add up to a signature. So there are multiple network clustering algorithms. The ones I'm going to pull out are, first, GeneMANIA, which is an algorithm developed collaboratively between Quaid Morris and Gary Bader. It uses a birds-of-a-feather principle in which you provide it with a few genes that you're interested in, and it will identify other genes which are more closely related to them than you would expect by chance. So it'll pull out clusters of related genes. Its great advantage is that it has a wonderful, easy-to-use web interface, very convenient to use.
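The birds-of-a-feather idea, starting from a few query genes and scoring every other gene by how closely it is connected to them, can be sketched with a random walk with restart over an interaction network. This is a generic guilt-by-association technique chosen for illustration and is not claimed to be GeneMANIA's actual algorithm; the network, the node names, and the `restart` and `n_iter` parameters are all assumptions of this sketch.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (gene_a, gene_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def random_walk_with_restart(adj, seeds, restart=0.3, n_iter=50):
    """Approximate steady-state visiting probability of a walker that
    keeps jumping back to the seed genes with probability `restart`."""
    nodes = sorted(adj)
    seeds = set(seeds) & set(nodes)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(n_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            # Split the walking share of this node equally among its neighbors.
            share = (1.0 - restart) * prob[n] / len(adj[n])
            for neighbor in adj[n]:
                nxt[neighbor] += share
        prob = nxt
    return prob


def rank_candidates(adj, seeds):
    """Non-seed genes ordered from most to least related to the seeds."""
    scores = random_walk_with_restart(adj, seeds)
    others = [n for n in scores if n not in set(seeds)]
    return sorted(others, key=lambda n: scores[n], reverse=True)


# Toy network: C touches both query genes, D and E sit further away.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = build_adjacency(edges)
scores = random_walk_with_restart(adj, {"A", "B"})
print(rank_candidates(adj, {"A", "B"}))
```

Deciding which scores are higher than expected by chance would still need a null model, for example by repeating the walk with randomly chosen seed genes.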
HotNet, which was developed by Ben Raphael at Brown, now at Princeton, is another very popular algorithm. What it does is model the network as a metallic lattice of little ball bearings connected by wires to other ball bearings, and it introduces hot and cold nodes. If your gene is overexpressed, it makes that ball bearing hot. If it's under-expressed, it makes it cold, and then it propagates the temperature across the network. So it finds related sets of genes and turns individual hot nodes into hot and cold areas of the network. It's unclear why that physical model has anything to do with biology, but in practice, what it does is correct for biases in annotation. So for example, P53 is so well studied that it's got more connections than all the rest of the genome combined, just because it's been studied so heavily. However, in this model, because it has so many connections, the heat from a hot P53 diffuses very quickly to other genes, and so it doesn't have the disproportionate impact on the network that it would have with more naive techniques. There is a network analysis module built into Cytoscape called HyperModules that is specifically designed to find network clusters that correlate well with clinical characteristics. So it tries to explain differences in some sort of phenotype, such as aggressive cancer versus indolent cancer, by finding network clusters that associate with that. It's specifically designed for signature discovery. And then there is the Reactome functional interaction network Cytoscape app, which actually takes multiple clustering and correlation algorithms, including HotNet, described up here, Paradigm, which I'll describe later, and the survival correlation analysis that we used to discover the breast cancer signature, and lets you do it all in an interactive user interface. Yes, you have a question. How many samples or cases do you need to make this worthwhile?
Are you talking about hundreds of thousands or tens? Nope. Typically, to get statistical significance, you need hundreds of samples or perturbations. The more the better, but you can do it on as few as, say, 50, depending on the effect size. OK, other questions? Yeah? A couple of slides back, how did those values come out? Oh, is this one right? Oh, those numerical values, I'm trying to remember what they are. It's a weighted score from 0 to 1 corresponding to the number of genes in that module which are mutated in that specimen. OK, so if all of the genes in the module are mutated, it will be 1. But it's been weighted for the frequency with which each gene is mutated in the population. Yeah, each of these groupings is a module. We weight each gene by the frequency with which it's mutated in the pancreatic cancer population as a whole. So I can't really see that, but I think that's KRAS. Yeah, that's KRAS. The big orange, the big greenish guy is KRAS. It's very frequently mutated, so it's large. And the index is the average over the genes mutated in that patient, weighted by the frequency with which each is mutated in the population, so that more frequently mutated genes get down-weighted. And the modules are actually determined by a spectral clustering method, the same kind of method that's frequently used to analyze social media networks. OK, now we're going to talk about pathway-based modeling. This is the method that provides the greatest explanatory power. It's also the hardest to do. Here, you apply a list of all your genes, proteins, or RNAs to a biological pathway. The model preserves the biological relationships among them, so it doesn't conflate a kinase reaction with a proteolytic reaction. It tries to integrate multiple molecular alterations together to yield lists of altered pathway activities.
So for example, say I have a tumor that has an EGFR amplification and an inactivating KRAS mutation, and EGFR exerts its downstream effects via KRAS. Ordinarily, EGFR amplification will increase EGFR pathway activity. However, if KRAS is inactivated, that will counteract it. So in pathway-based modeling, you would present both mutations to the model. And it would tell you that in this case, the EGFR pathway is not activated even though the receptor is amplified. Is everyone following that somewhat arbitrary example? OK, great. So when you have a complex system, and in particular cancer, where each tumor will typically have somewhere between four and 10 driver mutations, it will tell you how they relate to each other and try to predict the network activities. It easily shades into systems biology. The types of pathway-based modeling that are out there are very varied. There are classic systems biology biochemical models that use partial differential equations. CellNetAnalyzer is an example of this. They're mostly suited for biochemical systems, such as metabolomics. And the modeling is very computationally intense. Typically, you're talking about pathways that aren't larger than a few tens of genes or macromolecules, maybe up to 100. There are network flow models, which use information theory to propagate changes through a series of positive and negative regulatory relationships. They're mostly applied to kinase cascades. So in the world of kinase cascades, NetPhorest and NetworKIN are used. There are ones that are specialized for modeling regulatory networks in the transcriptome, such as ARACNe from Andrea Califano at Columbia University. And then there are various graph models, probabilistic graph models and Boolean graph models, such as Paradigm, which I'll talk about in a bit, which are a very general form of pathway modeling and are increasingly being used for cancer analysis. So I'm going to give an example of this. This is Paradigm.
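The EGFR and KRAS example above can be written down as a tiny three-state logic model, which is about the simplest possible form of pathway-based modeling. This is a toy sketch and not Paradigm or any published tool: the linear EGFR to KRAS to RAF to ERK wiring, the state encoding (+1 above baseline, 0 baseline, -1 below baseline), and the rule that a lost gene cannot relay an upstream signal are all simplifying assumptions made for illustration.

```python
# Each node lists its (upstream node, sign) pairs: +1 activates, -1 inhibits.
# Toy linear cascade, assumed here purely for illustration.
PATHWAY = {
    "EGFR": [],
    "KRAS": [("EGFR", +1)],
    "RAF": [("KRAS", +1)],
    "ERK_signaling": [("RAF", +1)],
}
# Nodes in upstream-to-downstream (topological) order.
ORDER = ["EGFR", "KRAS", "RAF", "ERK_signaling"]


def infer_activity(perturbations):
    """Propagate perturbations through the pathway.

    `perturbations` maps a gene to "gain" (for example amplification) or
    "loss" (for example an inactivating mutation). Returns the inferred
    activity of each node relative to baseline: +1, 0, or -1.
    """
    activity = {}
    for node in ORDER:
        # Sum the signed upstream signals and clip to the three-state range.
        signal = sum(sign * activity[up] for up, sign in PATHWAY[node])
        signal = max(-1, min(1, signal))
        state = perturbations.get(node)
        if state == "loss":
            activity[node] = -1  # a lost gene cannot relay upstream signal
        elif state == "gain":
            activity[node] = +1
        else:
            activity[node] = signal
    return activity


amplified_only = infer_activity({"EGFR": "gain"})
amplified_but_kras_lost = infer_activity({"EGFR": "gain", "KRAS": "loss"})
print(amplified_only["ERK_signaling"])
print(amplified_but_kras_lost["ERK_signaling"])
```

Presenting both alterations to the model at once is the point: looking at the EGFR amplification alone would predict an activated pathway, while the combined input does not.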
Paradigm was written by Josh Stuart and David Haussler at UCSC about six, seven years ago, and it's widely used in the cancer field. And what it does is use the central dogma as its framework, allowing it to model simultaneously changes at the gene, RNA, and protein levels. So here's a simple pathway. P53 activation leads to apoptosis, and it's inhibited by MDM2. In practice, you'd have a much more complicated one. What Paradigm does is break that down, so this simple pathway suddenly becomes much larger, because it has nodes for the gene, which creates the RNA, which creates the protein, which is then modified to become the active protein. Each of P53 and MDM2 gets its own parallel chain: the gene is needed to create the RNA, which is needed to create the protein. And then the MDM2 active protein inhibits, there's a little box here, the active P53 protein. So you can then introduce, from a single tumor analysis experiment, every perturbation you've found. If you've done genome sequencing, you have a series of mutations, and some of them inactivate MDM2, so you can model that. You can measure changes in the transcriptome, increases or decreases at the RNA level. You can model copy number changes affecting P53, so a P53 deletion, or you could add really arbitrary information, such as mass spec on the P53 protein. And then the model will integrate all these together and tell you what the anticipated effect on apoptosis is, which you can then go in and try to verify in the lab. It's been widely used to discover cancer subtypes. It's basically a more sophisticated version of the Reactome FI network subtype discovery that I showed you. Here it is applied in 2010 to a TCGA glioblastoma multiforme data set. What we're looking at here are individual patients in the columns. Each patient has multiple RNA expression changes and DNA-based copy number and point mutations.
And what we're seeing on the rows is what Paradigm predicts for pathway activities in the GATA pathway, the EGFR pathway, the HIF-1-alpha pathway. And you can see that there are multiple distinct clusters of patients that are distinguished by differences in their inferred pathway activity. And because it's integrating multiple sources of information, you can see patients who have GATA pathway inactivation cluster together even though they have very different molecular changes. They might have a mutation. They might have a methylation-induced RNA change. They could have an amplification or a deletion. There's good and bad news about Paradigm. The bad news is that the Paradigm algorithm itself has been made almost impossible for people to run. It's not distributed any longer because they made a company around it. They never distributed any pathway models. The documentation is scant, and it takes a long time to run. The good news is that the Reactome FI Cytoscape app re-implemented Paradigm based on the published algorithm. We include Reactome-based pathway models, and we've improved performance so that it can be run in hours to days rather than the weeks to months of the original version. So we're basically out of time. There are just a few pages at the end of your handouts which give you links to the tools that I've mentioned here. And we're going to take some questions and answers and then go into our coffee break and then have the practical.