Again, my name is Robin Hall. I'm the project manager for the Reactome and JBRA projects, and I work at OICR. These slides are released under a Creative Commons license, so please reuse them. I would very much like to acknowledge that these resources and slides were created by myself and others, including Veronica Pearson, Gary Bader, Lincoln Stein and Guanming Wu, and I've also incorporated some slides from the EBI training resources. This afternoon I'm going to follow on from Veronica's talk this morning about going from genes to networks. A lot of what Veronica introduced you to is very much applicable here, and I'm going to reinforce some of those ideas in the context of networks. Now, just as a sidebar: throughout the discussions of the last few days you've been talking about LOH studies and things like that. I did that kind of work a very, very long time ago (I'm not going to tell you how old I am), and in those days there was no reference sequence and no next-gen sequencing. It was all PCR, large gels and radioactive probes. That's actually how I irradiated my eyebrow. Much of the quantitative work was really challenging. Nowadays you talk about having hundreds or thousands of samples; we were lucky to have dozens. So a lot has changed. Today I really want to put all of this into the context of networks and network analysis. By the end of this lecture, you will understand the principles of network theory and analysis, and you'll know a bit more about the sources of network data and the analytical approaches that can be used not just for analysis, but also for visualization and data integration. I'll also give an overview of the Reactome FI (functional interaction) network and the ReactomeFIViz Cytoscape app. Now, I use this image from a paper by Barabási from many years ago, which shows a pyramid structure of informatics.
The idea here is that as you go from the bottom of the pyramid to the top, the information content and the level of complexity increase exponentially. The context of this talk is that network analysis incorporates prior biological knowledge to analyze genes, proteins and other biological entities as groups, in a biological context. As Veronica introduced earlier, biological systems are often represented as networks, which are complex sets of binary interactions or relations between different entities. Essentially every biological entity interacts with other biological entities, from the molecular level all the way up to the ecosystem level. Biological network analysis historically originated from the tools and concepts of social network analysis and the application of graph theory to the social sciences. My take on the definition of network analysis is that it's an analytical technique that makes use of biological or molecular network information to gain insights into a biological system. I say this every year, but it's a rapidly evolving field, and it still is. It keeps adjusting to new technologies and new types of data: high-throughput technologies have changed, analysis tools have had to evolve to meet that need, and so there are many different approaches available. The reasons we do this are that networks are intuitive to scientists, just like pathways; you can analyze multiple data types in the context of a network; and there are a number of methods available to automate that analysis, which is really helpful. I think network analysis satisfies a number of common use cases in biological research: identifying hidden patterns within gene lists, creating models to explain a lot of the work that we do in the lab, and predicting the function of unannotated or understudied genes.
We'll talk a little more about that later in this talk, and in the lab, in the context of another project that we've been working on. Network analysis also establishes a framework for quantitative modeling and assists in the identification of molecular signatures. Now, the most common reason people use network analysis is to help analyze gene lists. As an example, researchers from The Cancer Genome Atlas project identified 127 genes across 12 different cancer types, which they classified as cancer driver genes based on their mutation frequency. Looking at this list, it's difficult to see what these 127 genes are doing and why their mutation may cause cancer. What network analysis tries to do is map these genes onto biological networks so that we can understand their functional interactions and possible biological roles. Veronica talked a little about this earlier, and I'm going to expand on the idea that the nature of the underlying edge information dictates what types of network analysis can be performed. For this reason, it's useful to highlight the main types of edges that can be found in a network. The first, on the left, is the undirected edge. This type of edge is found, for example, in a protein-protein interaction network. It's a simple connection between nodes without any additional information, and typically the evidence behind the relationship only tells us that protein A binds protein B. On the right we have the directed edge. This is the kind of connection found in metabolic, signaling or gene regulatory networks, where there is a clear flow of information from one node to the next. Both directed and undirected edges can also have a weight, a quantitative value, associated with them.
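The three edge types just described (undirected, directed and weighted) can be captured in a very small data structure. The sketch below is a minimal illustration using only the Python standard library; the `Edge` class and `neighbors` helper are hypothetical names, not part of Cytoscape or any other tool, and what the weight means (confidence, fold change, similarity) is left to the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    directed: bool = False   # undirected by default, e.g. "A binds B"
    weight: float = 1.0      # optional quantitative value (meaning is an assumption)


def neighbors(edges, node):
    """Nodes reachable from `node` in one step, respecting edge direction."""
    result = set()
    for e in edges:
        if e.source == node:
            result.add(e.target)
        elif not e.directed and e.target == node:
            result.add(e.source)   # undirected edges can be walked from either end
    return result


edges = [
    Edge("A", "B"),                                    # undirected PPI edge
    Edge("TF1", "GENE2", directed=True),               # directed regulatory edge
    Edge("KIN1", "SUB1", directed=True, weight=0.8),   # directed and weighted
]
print(sorted(neighbors(edges, "B")))       # undirected edge is visible from B
print(sorted(neighbors(edges, "GENE2")))   # directed edge is not followed backwards
```

The point of the sketch is that direction changes which traversals are legal, which is why the edge type constrains the analyses you can run.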
Edge weights can depict concepts such as the reliability of an interaction, the quantitative change in expression that one gene induces in another, or how closely related two genes are in terms of sequence similarity. Edges can also be weighted by topological parameters, which I'll get into later. Different types of information can be represented as networks in order to model the cell, and different types of data produce networks with different characteristics in terms of connectivity, complexity and structure. Some of the most common types of biological networks are shown in the next few slides. Metabolic networks are commonly represented with two types of nodes, metabolites (substrates) and enzymes, while the reactions are represented by the edges. The edges can be unidirectional or bidirectional, and they can also represent the direction of metabolic flow or the regulatory effects of a specific reaction. Now on to genetic interaction networks. A genetic interaction is a deviation from the expected phenotype when multiple genetic mutations are combined, where the individual mutations alone do not show that deviation. This has been demonstrated in a lot of model organisms, in particular budding yeast, where most genetic interactions are measured using a single phenotype, typically growth rate under standard laboratory conditions. For example, a yeast cell with a knockout of gene one is viable, and a cell with a knockout of gene two is also viable. But if you knock out both genes together in a double knockout, the combination is lethal, so the cell is non-viable.
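The double-knockout example can be turned into a number. A common way to score a genetic interaction is to compare the observed double-mutant fitness with the fitness expected if the two mutations acted independently. The sketch below assumes a multiplicative null model (expected fitness = product of the single-mutant fitnesses); the fitness values and the -0.08 cutoff are hypothetical choices for illustration, not values from the talk.

```python
def interaction_score(w_a, w_b, w_ab):
    """Genetic interaction score (epsilon) under a multiplicative null model.

    Fitness is relative to wild type (1.0). The expected double-mutant fitness
    is w_a * w_b; epsilon is observed minus expected.
    """
    return w_ab - w_a * w_b


def genetic_interaction_edges(fitness, double_fitness, threshold=-0.08):
    """Return (gene_a, gene_b, epsilon) for pairs scoring below `threshold`.

    The threshold is an arbitrary illustrative cutoff for negative interactions.
    """
    edges = []
    for (a, b), w_ab in double_fitness.items():
        eps = interaction_score(fitness[a], fitness[b], w_ab)
        if eps < threshold:
            edges.append((a, b, round(eps, 3)))
    return edges


# Hypothetical growth measurements: each single knockout is viable,
# but the gene1/gene2 double knockout is lethal (fitness 0).
fitness = {"gene1": 0.9, "gene2": 0.95, "gene3": 1.0}
double_fitness = {("gene1", "gene2"): 0.0, ("gene1", "gene3"): 0.9}
print(genetic_interaction_edges(fitness, double_fitness))
```

Only the synthetic-lethal pair becomes an edge; the gene1/gene3 pair behaves exactly as the null model predicts, so its score is zero and no edge is drawn.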
In genetic interaction networks, genes represent the nodes and the edges represent the genetic interactions between them. Next are gene regulatory networks. It's common to represent transcriptional regulatory networks with nodes that are a combination of genes and transcription factors, and edges that represent regulatory interactions, including the effect of a transcription factor on the expression and activity of other factors. Then there are cell signaling networks, which represent the communication systems that control activities within the cell. Signaling pathways represent an ordered sequence of events and model the flow of information within the cell: proteins, genes and metabolites are represented as the nodes, and the flow of the signal through the pathway is conveyed by the edges. Finally, there are protein-protein interaction networks, probably the most commonly used type of interaction network in biological research. They represent the physical contacts between proteins in a cell, or indeed across an entire organism. Protein interactions are essential to almost every process in the cell, so understanding protein-protein interactions is crucial for understanding cell physiology in the normal state and in cancer and other diseases. Protein-protein interaction information can represent both stable and transient interactions. Stable interactions are those associated with protein complexes, like the ribosome or hemoglobin in the blood. Transient interactions are brief interactions that may modify or carry a protein, leading to some further change; examples include protein kinases and nuclear pore importins. They constitute the most dynamic part of the interactome.
The interactome is the totality of the protein-protein interactions that happen either within the cell or within a specific biological context. The development of large-scale protein-protein interaction screening techniques has made it possible to generate large volumes of interaction data and to make that data available through interaction databases. These databases are really useful for building your network, which is traditionally one of the first steps in performing protein-protein interaction network analysis. There are different sources of protein-protein interaction information. If you're generating this data yourself, you could in theory use your own experimental data, but it's preferable to use a curated data source. There are a number of primary protein-protein interaction databases, and these resources have curation teams whose job is to extract the experimental evidence reported in the literature using a manual or semi-automated curation process. I'll talk a little more about text mining in a moment. These databases are primary providers of protein-protein interaction data, and depending on the database, they can capture a great deal of detail about each interaction: the source, the type of interaction, the molecular entities involved, state changes, and other evidence that supports the interaction. When you're looking at these different databases, bear in mind that some provide higher-quality data sets based on gold-standard curation processes, whereas others simply list interactions stating that A interacts with B, and that's all the evidence they provide.
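Building a network from a database export is usually the first concrete step. Many interaction databases offer a tab-separated PSI-MITAB download; the sketch below assumes the MITAB 2.5 column layout (columns 1 and 2 hold the interactor identifiers, column 7 the detection method) and turns rows into an undirected adjacency dictionary. The two example rows are hypothetical, although P04637 and Q00987 are the UniProt accessions for human TP53 and MDM2.

```python
def parse_mitab(lines):
    """Yield (id_a, id_b, detection_method) from PSI-MITAB-style rows.

    Assumes tab-separated rows where column 1 and column 2 hold the interactor
    identifiers (e.g. 'uniprotkb:P04637') and column 7 holds the interaction
    detection method, as in the MITAB 2.5 layout.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        # A cell may list several identifiers separated by '|'; keep the first
        # one and strip the database prefix such as 'uniprotkb:'.
        id_a = cols[0].split("|")[0].split(":", 1)[-1]
        id_b = cols[1].split("|")[0].split(":", 1)[-1]
        yield id_a, id_b, cols[6]


def build_network(rows):
    """Build an undirected adjacency dict from (id_a, id_b, method) rows."""
    adj = {}
    for a, b, _method in rows:
        if a == b:
            continue  # self-interactions (e.g. homodimers) are skipped in this sketch
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Two hypothetical rows in MITAB order (unused columns filled with '-').
rows = [
    "\t".join(["uniprotkb:P04637", "uniprotkb:Q00987", "-", "-", "-", "-",
               'psi-mi:"MI:0018"(two hybrid)']),
    "\t".join(["uniprotkb:Q00987", "uniprotkb:P04637", "-", "-", "-", "-",
               'psi-mi:"MI:0019"(coimmunoprecipitation)']),
]
network = build_network(parse_mitab(rows))
print(network)
```

Because the adjacency uses sets, the same pair reported twice (here by two different methods) collapses into a single undirected edge; keeping the evidence is a separate step.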
IMEx started as an international collaboration between the major interaction data providers, who agreed to share the curation effort and to use common data exchange formats, which are listed here below. These data exchange formats support data integration, analysis and visualization. It's often necessary to integrate protein-protein interaction data from multiple sources, since no one database has a full representation of all the available protein-protein interaction data, and there are also inconsistencies in curation styles and redundancies between these databases. So it's important to be aware of these different resources, how they generate their data, and the type of data they contain; for example, is it experimentally derived or predicted? Another challenge with these resources is that they use different identifier types, so you may need to map between identifiers in order to create your network. I touched on this a moment ago, and I want to expand on it further: it's important to understand the type of experimental evidence that supports the interaction data, since some people place more trust in some methods than in others. There is also the question of whether interactions demonstrated in vitro actually occur in vivo, and vice versa. What I'm trying to say is that protein-protein interaction detection methods have limitations as to how many truly physiological interactions they can detect.
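Integration across databases combines the three issues just raised: identifier mapping, redundancy and evidence quality. The sketch below is a minimal illustration under stated assumptions: the database names and the identifier map are hypothetical, unmapped identifiers are simply dropped, and "at least two distinct detection methods" is an illustrative reliability rule, not a recommendation from any particular resource.

```python
from collections import defaultdict


def merge_sources(sources, id_map):
    """Merge interaction lists from several databases into one undirected network.

    sources: {database_name: [(id_a, id_b, detection_method), ...]}
    id_map:  {source_identifier: canonical_identifier}; unmapped IDs are dropped.
    Returns {frozenset({a, b}): {(database_name, detection_method), ...}}.
    """
    evidence = defaultdict(set)
    for db, interactions in sources.items():
        for a, b, method in interactions:
            ca, cb = id_map.get(a), id_map.get(b)
            if ca is None or cb is None or ca == cb:
                continue
            # frozenset makes A-B and B-A the same edge, removing redundancy
            evidence[frozenset((ca, cb))].add((db, method))
    return evidence


def filter_by_methods(evidence, min_methods=2):
    """Keep pairs supported by at least `min_methods` distinct detection methods."""
    return {pair for pair, ev in evidence.items()
            if len({method for _db, method in ev}) >= min_methods}


# Hypothetical databases that use different identifier types for the same proteins.
sources = {
    "db_one": [("P04637", "Q00987", "two hybrid")],
    "db_two": [("MDM2", "TP53", "coimmunoprecipitation"),
               ("TP53", "BRCA1", "two hybrid")],
}
id_map = {"P04637": "TP53", "Q00987": "MDM2", "TP53": "TP53",
          "MDM2": "MDM2", "BRCA1": "BRCA1"}

evidence = merge_sources(sources, id_map)
print(len(evidence))                 # 2 distinct pairs after mapping and deduplication
print(filter_by_methods(evidence))   # only TP53-MDM2 is supported by two methods
```

Keeping the (database, method) evidence on each edge, rather than collapsing it early, lets you change the reliability rule later without re-merging.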
All of these methods produce false positives and false negatives. Yeast two-hybrid approaches, pull-downs and mass spectrometry of complexes, for example, pick up sticky proteins that tend to bind everything, typically proteins that are very abundant within the cell. There are also computational approaches that extract gene or protein-protein interaction relationships from text, either full-text articles or PubMed abstracts. Initially there were problems with recognizing genes (for example, is hedgehog a gene or a species?), but a lot of improvements have been made in natural language processing, and this is making text mining of these interactions much more accessible. There are tools out there such as Pathway Studio and PathText, and there are text-mining community efforts such as REACH and BioCreative that are trying to improve these approaches, because text mining is much faster than manual curation. However, manual curation typically creates the gold-standard data set for generating a network. Moving along to network topology: this is a key principle of network analysis, because you need to understand the complexity of the network in order to extract useful information that you would not learn by examining the components individually. The topological features of a network are useful for identifying relevant participants in the network and substructures that may be of biological significance. You can apply topological properties to the entire network or to individual nodes and edges, and we'll talk a little more about that shortly. Protein-protein interaction networks show small-world effects.
In other words, the number of steps separating any two nodes is small, no matter how big the network is. This connectivity allows for an efficient and quick flow of signals within a biological network. However, it poses an interesting question: if a network is so tightly connected, why don't perturbations in a single gene or protein have more dramatic consequences for the network? Biological systems are extremely robust. There's a lot of redundancy, and they can cope with a relatively high number of perturbations in single genes or proteins. To explain how this is possible, we have to look at another fundamental property of protein-protein interaction networks: they are scale-free. The number of connections each node has is called its degree, and in a scale-free network the majority of nodes have only a few connections, whereas a few nodes are connected to many other nodes in the network. So if failures occur at random nodes, they will mostly hit the vast majority of proteins, which have a low degree, and the likelihood that a hub is affected is small. The hubs in this diagram are the large nodes, and the regular nodes are the smaller ones. Even if a hub fails, say we lose this one here, the network generally will not lose its connectedness, thanks to the remaining hubs. If we lose a few more hubs, maybe this one here, there still might not be much of an effect. But if we start losing critical hubs, say this one here, we run into problems, because the network starts breaking apart into isolated pieces. Typically, these hubs, the larger nodes, are proteins that are essential, so losing them is lethal. For example, cancer proteins are typically hub proteins.
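The robustness argument can be checked on a toy network. This is a minimal sketch under stated assumptions: the three-hub graph is hypothetical rather than real PPI data, and the damage caused by removing nodes is measured as the size of the largest connected component that remains, found with breadth-first search.

```python
from collections import deque


def make_graph(edges):
    """Undirected adjacency dict from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes (BFS)."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


# Toy network: three hubs in a chain, each with four low-degree neighbours.
edges = [("H1", "H2"), ("H2", "H3")]
for hub, prefix in (("H1", "a"), ("H2", "b"), ("H3", "c")):
    edges += [(hub, f"{prefix}{i}") for i in range(1, 5)]
adj = make_graph(edges)

degree = {node: len(nbrs) for node, nbrs in adj.items()}
top_hub = max(degree, key=degree.get)

print(len(adj), top_hub)                   # 15 nodes; H2 has the highest degree
print(largest_component(adj))              # intact network: one component of 15
print(largest_component(adj, {"a1"}))      # lose a low-degree node: 14 stay connected
print(largest_component(adj, {top_hub}))   # lose the main hub: largest piece shrinks to 5
```

Removing a typical low-degree node barely changes connectivity, while removing the main hub fragments the network, which is the asymmetry described in the talk.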
For example, TP53 is one such hub. A path is a sequence of connections, and the distance, or shortest path, between two nodes is defined as the number of edges along the shortest route connecting them. Centrality concepts were first developed in social network analysis. Centrality gives an estimate of how important a node or an edge is for the connectivity or the information flow in the network, and there are different metrics to calculate it. I mentioned degree a moment ago. Degree is a local centrality measure: it doesn't take the rest of the network into consideration, and the importance we give to its value depends strongly on the network size. In a small network we might put more weight on a node's degree, whereas in a much larger network we may not put as much emphasis on it. That's degree centrality. There are also global centrality measures that take the whole network into consideration. One is betweenness centrality, which identifies nodes that lie on the shortest paths between other nodes. These nodes are powerful to the extent that information has to be conveyed through them, and the question is how many shortest paths pass through each node; I'll relate that to this diagram in a moment. Another is closeness centrality, which measures how close a node is to all the other nodes. It's a useful measure that estimates how fast information would flow from a given node to the other nodes in the network, and it's typically based on shortest-path distances. If we look at this example, we have a blue node right in the center. In terms of degree, it has connections to many other nodes.
If we look at closeness, the question is how close this node is to all the other nodes. Going from a node on one side to a node on the other side takes two hops, one, two, via the blue node. Betweenness tells us that the nodes on the left are connected through the blue node to the nodes on the right, and vice versa. There are other network features to be aware of. The first is obviously the size of the network, meaning the number of nodes and edges. Then there's the density of the network, the proportion of possible connections that actually exist. Finally, there are higher-order organization motifs such as feedback loops, cliques and other small subnetwork patterns that are overrepresented compared with a randomized version of the same network. One of the more important characteristics of protein-protein interaction networks is modularity, which we'll be demonstrating in the lab. High transitivity, or a high clustering coefficient, means that the network contains communities: groups of nodes that are more densely connected internally than to the rest of the network. Looking for communities in a network is a nice strategy for reducing its complexity and extracting functional modules, such as protein complexes, that reflect the underlying biology. Clustering analysis is the process of identifying these communities, and there are several terms commonly used when talking about it; I'll get to those in a moment. One important point is that no assumption is made about the internal structure of these communities; we're just looking for high-density regions. It's also important to note that finding the optimal community structure exactly is algorithmically extremely complex (that's a tongue twister), so it's only feasible for very small networks.
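The centrality and network-feature definitions above can be computed directly. This sketch uses a hypothetical bow-tie graph standing in for the blue-node diagram (two triangles sharing a central node "C"); betweenness uses Brandes' algorithm for unweighted, undirected graphs and is left unnormalised, and closeness is defined here as (reachable nodes - 1) divided by the sum of shortest-path distances.

```python
from collections import deque


def make_graph(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Shortest-path distance (number of edges) from source to every reachable node."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj, node):
    """(reachable nodes - 1) / sum of shortest-path distances from `node`."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist, queue = {s: 0}, deque([s])
        while queue:                       # BFS that counts shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                       # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted from both ends


def density(adj):
    """Fraction of all possible edges that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))


# Bow-tie: two triangles sharing the central node "C" (the blue node).
adj = make_graph([("L1", "L2"), ("L1", "C"), ("L2", "C"),
                  ("R1", "R2"), ("R1", "C"), ("R2", "C")])
bc = betweenness(adj)
print(len(adj["C"]), closeness(adj, "C"), bc["C"])               # 4 1.0 4.0
print(round(closeness(adj, "L1"), 3), bc["L1"])                  # 0.667 0.0
print(density(adj), round(clustering_coefficient(adj, "C"), 3))  # 0.6 0.333
```

The central node scores highest on degree, closeness and betweenness, because all four left-right pairs must route through it, while the outer nodes have perfect local clustering but no betweenness.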
There are too many to cover in this workshop, but there is a variety of clustering algorithms, some of which have been incorporated into the ReactomeFIViz app that we'll see later today, and there is a variety of other apps available through the Cytoscape App Store that allow you to perform this type of clustering analysis on a network. These include the Markov clustering algorithm (MCL), fuzzy C-means and k-means (I'm having a mental panic there), Chinese whispers clustering, label propagation clustering, and the Newman-Girvan algorithm. There are a number of software tools out there that will help you perform network analysis and visualization. You've already heard about Cytoscape; it is by far the best tool. Cytoscape and a number of the other tools we'll be talking about succeed because they're open source and open access, and there's a lot of plug-in or app support, so you can integrate other data types or apply different visual layouts. And of course, many data scientists and bioinformaticians use Bioconductor and R, or Python, so there are a lot of other tools out there that support this type of analysis. Some are standalone, multi-platform tools, so you can run them on your Mac, PC or Linux box. And, as someone has pointed out, we should discourage the use of Excel. Yes, I should maybe put an X over that. It's interesting; unfortunately, many of us still use Excel. A lot of these network analysis tools need ways to get data in, so tabular data can be imported into them. But yes, I should maybe scrub that one out. It was brought up, and I'm very much in agreement there. I obviously fell for that as well in the early days.
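Of the clustering algorithms listed above, label propagation is simple enough to sketch in a few lines. This is a minimal, deterministic variant written for illustration: the published method visits nodes in random order and breaks ties at random, whereas this sketch visits nodes in sorted order and breaks ties by keeping the current label or else taking the smallest one. The two-module graph is hypothetical.

```python
from collections import Counter
from itertools import combinations


def label_propagation(adj, max_iter=100):
    """Asynchronous label propagation with deterministic tie-breaking.

    Each node starts with its own label and repeatedly adopts the label most
    common among its neighbours. On a tie, a node keeps its current label if it
    is among the winners, otherwise it takes the smallest winning label.
    """
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            counts = Counter(labels[nb] for nb in adj[node])
            if not counts:
                continue  # an isolated node keeps its own label
            top = max(counts.values())
            winners = {lab for lab, c in counts.items() if c == top}
            if labels[node] not in winners:
                labels[node] = min(winners)
                changed = True
        if not changed:
            break  # converged: no node changed its label in a full sweep
    communities = {}
    for node, lab in labels.items():
        communities.setdefault(lab, set()).add(node)
    return list(communities.values())


# Two dense modules (4-cliques) joined by a single bridging edge L4-R4.
edges = (list(combinations(["L1", "L2", "L3", "L4"], 2))
         + list(combinations(["R1", "R2", "R3", "R4"], 2))
         + [("L4", "R4")])
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

for community in label_propagation(adj):
    print(sorted(community))
```

Because labels spread faster inside dense regions than across the single bridge, each clique settles on its own label; no assumption about the internal structure of the communities is needed.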
A lot of these other tools also provide API access, for example Cytoscape, or igraph and other Python tools, so you can automate the analysis and process a lot of data. I'm not trying to say "set it and forget it", but you can basically set up a workflow, press the execute button, walk away, and come back to the analyzed data at the end. As for Excel, it's on everybody's desktop, and that's one of the reasons it's so commonly used. The issue is that, depending on the format of the cell you're copying and pasting into, certain gene names get renamed. Genes like MARCH get converted to dates; MARCH is a good example. There are a bunch of these that get renamed, and some people have had fun going back into publications and finding these changed names, so they make it all the way into publications. There are actually some high-impact papers out there that focused on the wrong gene and got as far as publishing before realizing that it wasn't the gene they were working on, simply because of gene synonyms. That's another issue: be very careful with your gene names as well. So yes, I wholeheartedly agree. What I'll do next time is put a big warning sign over Excel: danger, danger, a little time bomb, or something like that. Anyway, here are the steps in a typical network analysis workflow. You typically upload your data to create a network.
You then navigate through that network and analyze some network feature you're interested in; for example, you could be doing a clustering analysis. Once you've identified modules, those tightly connected groups of interactions, you label the clusters with pathway annotations, just as Veronica described earlier for pathway enrichment analysis. Finally, once you've cleaned up your network image, you might create a figure from it for a paper. In the final section here, I'm going to talk a little about pathway- and network-based modeling. This approach attempts to infer how pathway and network states are disrupted in disease, while also integrating multiple data types, such as lists of altered genes, proteins or transcripts. The idea is to preserve as much of the biological relationship information as possible. There are traditionally two approaches. The first is network-based methods, which apply graph theory to discover relationships among the nodes in a pathway, where each node represents a biological entity such as a gene or protein, and each edge represents the interaction between a pair of nodes. The second is mathematical modeling, which learns and analyzes the underlying network by transforming the reactions and entities into matrix form. This is seen, for example, in Boolean networks, which are used to study large signaling networks, and in ordinary differential equations, which can be used for quantitative modeling of small gene regulatory networks.
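To illustrate the Boolean network formalism just mentioned: each node is ON or OFF, and its next state is a logical function of the current states of its regulators. The rules below are a hypothetical toy (DNA damage activates p53, p53 induces MDM2, MDM2 inhibits p53) chosen only to show a negative feedback loop; they are not a validated model. All nodes are updated synchronously until a state repeats, which reveals the attractor.

```python
def simulate(rules, state, max_steps=50):
    """Synchronously update every node until a previously seen state recurs.

    Returns (trajectory, attractor): the visited states and the repeating cycle
    (a fixed point is a cycle of length 1). The attractor is empty if no state
    recurs within max_steps.
    """
    trajectory = [state]
    seen = {tuple(sorted(state.items())): 0}
    for step in range(1, max_steps + 1):
        # every rule reads the *previous* state, i.e. a synchronous update
        state = {node: rule(state) for node, rule in rules.items()}
        key = tuple(sorted(state.items()))
        if key in seen:
            return trajectory, trajectory[seen[key]:]
        seen[key] = step
        trajectory.append(state)
    return trajectory, []


# Toy rules (illustrative only): a negative feedback loop between p53 and MDM2.
rules = {
    "damage": lambda s: s["damage"],                 # external input held constant
    "p53": lambda s: s["damage"] and not s["mdm2"],
    "mdm2": lambda s: s["p53"],
}
start = {"damage": True, "p53": False, "mdm2": False}
trajectory, attractor = simulate(rules, start)
for state in attractor:
    print(state)
```

With these rules the system never settles into a fixed point; it cycles through four states, which is the qualitative signature of negative feedback in a Boolean model.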
Then there are flux balance analysis and stoichiometric methods, which are used to model metabolic pathways. Both of these approaches use qualitative and quantitative measurements to infer the activities of the various components of the pathway or network, which is somewhat akin to systems biology. There is a variety of software out there that tries to do this. Classical pathway modeling was developed to study and model metabolic pathways; for example, CellNetAnalyzer is a MATLAB toolbox for the structural and functional analysis of biological networks. It incorporates several algorithms for metabolic flux analysis and for studying the flow of information through signaling and regulatory networks. More relevant to disease are KinomeXplorer, NetPhorest and NetworKIN. These focus on computational modeling of signaling pathways, specifically elucidating the phosphorylation events associated with a given phenotype or disease condition. And since expression studies are still very popular, tools like ARACNe can process expression profiles to model the regulatory networks in mammalian cells. There is also a whole host of Cytoscape apps available through the App Store, such as Amino Petroscape, that do other types of pathway- and network-based modeling. In the next couple of slides, I'm going to talk about PGMs, and in particular a tool called PARADIGM. PGMs, or probabilistic graphical models, are a widely used technique in machine learning and statistics for modeling complex dependencies among multiple variables.
These methods include Bayesian networks, which are used to learn cellular networks from gene expression data. You can also apply PGMs to cancer network analysis, where the goal is to integrate multiple data types to find significantly altered pathways or networks and to link biological pathways or networks to patient phenotypes. One tool that does this is PARADIGM, developed by Vaske et al. It allows you to integrate multiple simultaneous alterations. To do that, you have to create something called a factor graph, in which a single protein in a pathway is expanded into four nodes: gene copy number, gene expression, protein level and protein activity. As an example, we have a small fragment of the p53 apoptosis pathway shown on the left, and on the right is the same fragment converted into a factor graph, with the gene, the transcript, the protein and the protein state. To each entity type you can apply a different experimental data set, and through this integrative approach come up with a model, typically summarized in a heat map like this one. In the original PARADIGM study of glioblastoma multiforme data from The Cancer Genome Atlas project, the authors identified informative subtypes of GBM. Samples and entities were clustered using hierarchical clustering, and visual inspection revealed four different cluster assignments. The first was characterized by HIF-1 alpha acting as a master regulator of the transcriptional response to hypoxic conditions. In the remaining clusters, two out of three had elevated EGFR signatures and an inactive MAP kinase cascade involving interleukin transcriptional cascades.
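The four-node expansion described above is essentially a data-structure transformation, and it can be sketched as follows. This is only the skeleton of a factor graph under stated assumptions: the level names, the rule that transcriptional regulation targets the mRNA variable while other interactions target the activity variable, and the data-type mapping are illustrative choices, and the sketch does not implement PARADIGM's factor potentials or inference.

```python
# Variable levels per gene, following the four-node expansion described in the talk.
LEVELS = ("genome", "mRNA", "protein", "active")


def expand_gene(gene):
    """One gene -> four chained variables: copy number -> expression -> level -> activity."""
    nodes = [f"{gene}:{level}" for level in LEVELS]
    edges = list(zip(nodes, nodes[1:]))
    return nodes, edges


def build_factor_graph_skeleton(pathway_edges):
    """pathway_edges: (regulator, target, kind); kind is 'transcriptional' or 'post-translational'."""
    genes = sorted({g for reg, tgt, _ in pathway_edges for g in (reg, tgt)})
    nodes, edges = [], []
    for gene in genes:
        gene_nodes, gene_edges = expand_gene(gene)
        nodes += gene_nodes
        edges += gene_edges
    for reg, tgt, kind in pathway_edges:
        # Assumption: transcriptional regulation acts on the target's mRNA,
        # other interactions act on the target's activity.
        level = "mRNA" if kind == "transcriptional" else "active"
        edges.append((f"{reg}:active", f"{tgt}:{level}"))
    return nodes, edges


def attach_evidence(observations):
    """Map (gene, data type) measurements onto the variable each data type observes."""
    observed_level = {"copy_number": "genome", "expression": "mRNA",
                      "protein_abundance": "protein"}
    return {f"{gene}:{observed_level[dtype]}": value
            for (gene, dtype), value in observations.items()}


pathway = [("TP53", "MDM2", "transcriptional"),      # p53 induces MDM2 transcription
           ("MDM2", "TP53", "post-translational")]   # MDM2 acts on the p53 protein
nodes, edges = build_factor_graph_skeleton(pathway)
evidence = attach_evidence({("TP53", "copy_number"): -1, ("MDM2", "expression"): 2.3})
print(len(nodes), len(edges))   # 8 variables, 8 edges
print(evidence)
```

The measurement values are hypothetical; the point is that each data type attaches to a different layer of the same gene, which is what lets a PGM combine copy number, expression and protein data in one model.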
Interestingly, the mutations and amplifications in EGFR that they found had previously been associated with high-grade glioblastomas and other high-grade gliomas, so the analysis was consistent with previously established evidence. Now let's talk a little more about the ReactomeFIViz app, which performs pathway and network analysis. Veronica briefly introduced you earlier to the Reactome pathway database. ReactomeFIViz is another tool in our arsenal, which supports multi-omics data analysis using the Reactome functional interaction network. You can feed a variety of input data into the app, and it provides different analysis types. Today we're going to focus on pathway enrichment analysis and gene set enrichment analysis, but as I mentioned in the last few slides, you can also do Boolean network and probabilistic graphical modeling. In this screenshot, using the pathway visualization and analysis features, you can load the pathways in the Reactome pathway database into Cytoscape, and then visualize the Reactome pathways either in their native pathway diagram form or converted into a network view. You can use these tools to perform enrichment analysis on a set of genes, and the results can be overlaid onto pathway and network diagrams; for example, here the purple nodes show the overlay of an enrichment analysis. Oops, I apologize, I must have removed the actual slide, but there is also a network view of the same diagram, where you just see nodes and edges, again with the same analysis of the gene list. This is gene set enrichment analysis (GSEA). Veronica introduced you to this concept earlier; it's a very widely used enrichment analysis tool.
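The overlap-based pathway enrichment mentioned above is often computed with a one-sided hypergeometric test: given N genes in the background, K of them in a pathway, and n genes in your list, how surprising is an overlap of k or more? The sketch below is a generic illustration of that test using only `math.comb`; the gene names, pathways and background size are hypothetical, it is not presented as ReactomeFIViz's exact implementation, and a real analysis would also correct the p-values for multiple testing (for example with an FDR procedure).

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for the overlap X when n genes are drawn from N, K of which are in the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def pathway_enrichment(gene_list, pathways, background_size):
    """Return (pathway, overlap, p-value) tuples sorted by p-value (no FDR correction)."""
    genes = set(gene_list)
    results = []
    for name, members in pathways.items():
        overlap = len(genes & members)
        p = hypergeom_sf(overlap, background_size, len(members), len(genes))
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical background of 10 genes and two hypothetical pathways.
pathways = {"pathway_A": {"G1", "G2", "G3"}, "pathway_B": {"G4", "G5", "G6", "G7"}}
for name, overlap, p in pathway_enrichment(["G1", "G2", "G3"], pathways, background_size=10):
    print(name, overlap, round(p, 4))
```

Here all three input genes fall in pathway_A, which happens by chance with probability 1/120, while pathway_B has no overlap and a p-value of 1.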
I think the key point to mention here is that when Veronica was describing gene set enrichment analysis, the gene sets used for the analysis came from the MSigDB database and encompassed many different types of gene sets. In ReactomeFIViz, the only gene sets used for the analysis are the Reactome pathways, so other pathways or annotations that are available through MSigDB are not available through this GSEA tool. You can also overlay drugs onto Reactome pathways, including FDA-approved cancer drugs. Using this feature you can view the targets of these drugs and the affinities for these molecules, with data extracted from the BindingDB resource, and through linkouts you can go to the source information and view the supporting evidence for each interaction. I'd like to focus a little bit more on the Reactome FI network and ReactomeFIViz, which applies a variety of computational algorithms to analyze gene lists, expression data and mutated genes in the context of the network. The goal here is to reveal the relationships among these genes, to elucidate the mechanism of action of drivers and potentially their interactions with rare mutations that are consistent across data sets, and to facilitate hypothesis generation on the role of the genes in disease phenotypes. I've already mentioned functional interactions, but I wanted to explain a little more about what they are. The functional interaction network is a reliable biological network based on manually curated pathways and extended with verified interactions. The point is to create pairwise relationships from the reactions, which are the units of the pathway. You break each reaction down into a variety of binary interactions, some of which are shown here on the bottom right. As for how this is done en masse, we import a lot of data sets.
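Breaking a reaction into binary interactions can be illustrated with a deliberately simplified rule: every pair of proteins that takes part in the same reaction, including the members of a complex, becomes one undirected pair. The dictionary layout and role names are assumptions for illustration; the real Reactome FI rules distinguish roles and directionality, which this sketch ignores, and it leaves reaction outputs out.

```python
from itertools import combinations


def participants(entities):
    """Flatten reaction participants; a complex is given as a list of proteins."""
    for entity in entities:
        if isinstance(entity, str):
            yield entity
        else:
            yield from entity


def reaction_to_fis(reaction):
    """Return undirected FI pairs for every two proteins sharing a reaction."""
    proteins = set()
    for role in ("inputs", "catalysts", "regulators"):
        proteins.update(participants(reaction.get(role, [])))
    # Sorting makes each pair canonical, so (A, B) and (B, A) collapse to one FI.
    return set(combinations(sorted(proteins), 2))


fis = reaction_to_fis({"inputs": ["A", ["B", "C"]], "catalysts": ["D"]})
```

Here one reaction with four proteins (one free input, a two-member complex and a catalyst) yields six pairwise FIs.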
These come from a number of other pathway databases, including KEGG, PANTHER and NCI, as well as some transcription factor to target data sources. We create a training set, apply it to a naive Bayes classifier, and use features from a variety of other protein-protein interaction data sets, some human or mammalian and others from other eukaryotes, as well as gene expression and GO annotations. We create a predicted FI list from this, combine it with the annotated FIs, and ultimately create this large functional interaction network. We update the network every year, and it currently covers about 13,000 proteins and just over 436,000 interactions. What I'm showing in this slide is how your experimental data set is integrated into the Reactome FI network. Just imagine this was the entire network. Obviously it would look like a hairball, a tightly wound ball of string, but you project your genes into the network. They could carry different types of information: genes that are upregulated or downregulated, or different types of mutations. Those genes, those nodes, have edges associated with them within the network, so you can start building up connected regions. Now, it's quite possible that there's not a high degree of connectivity, so we can insert these linkers, shown as triangles, which improve the connectivity within your data set. You then have additional edges incorporated into the subnetwork, and then you remove the data that's not part of your data set, and you're left with a subnetwork that hopefully explains the gene relationships in your data set. So let's go back to the original 127-gene list that I introduced earlier.
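The projection and linker steps can be sketched on a toy network stored as a dictionary of neighbor sets. The linker rule used here (add any non-input gene that touches at least two input genes) is an assumption chosen for simplicity and is not necessarily how ReactomeFIViz picks linkers.

```python
def build_subnetwork(fi_network, genes, use_linkers=False):
    """
    fi_network: dict mapping each gene to the set of its FI partners (symmetric).
    Returns the undirected edges of the subnetwork as sorted (a, b) tuples.
    """
    genes = {g for g in genes if g in fi_network}   # project genes onto the network
    keep = set(genes)
    if use_linkers:
        # Simple linker rule (assumption): a non-input gene that touches at
        # least two input genes is added to bridge them.
        for node, partners in fi_network.items():
            if node not in genes and len(partners & genes) >= 2:
                keep.add(node)
    # Drop everything else: keep only interactions whose endpoints are both retained.
    return {tuple(sorted((a, b))) for a in keep for b in fi_network[a] if b in keep}


net = {"A": {"X"}, "B": {"X"}, "X": {"A", "B"}, "C": {"D"}, "D": {"C"}}
without_linkers = build_subnetwork(net, ["A", "B", "C"])
with_linkers = build_subnetwork(net, ["A", "B", "C"], use_linkers=True)
```

Without linkers the three input genes share no edges at all; adding the single linker X connects A and B, which is the connectivity gain the slide illustrates.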
Now we're going to do some gene set-based network analysis using this data set, so you can construct a subnetwork for the 127 cancer genes. Then you can apply what we call high-performance spectral partitioning. Essentially, that's a clustering algorithm to identify genes that are tightly connected to one another. In the case of the Reactome FI network, the nodes within a module are recolored, so genes in different clusters are highlighted with different colors. Then, to understand the network module functions, we can perform module-based pathway enrichment analysis. That means taking the genes within each individual module and performing pathway enrichment analysis on them. You can then take those pathway labels and annotate the modules: we can see here signaling by EGFR, which would be signaling by receptor tyrosine kinases, what looks like a cell cycle component, the p53 pathway, signaling by Notch, and so on. So we've basically reduced 100 or so mutated genes down to a handful of pathways. ReactomeFIViz also allows you to visualize cancer drugs in the FI network context, as a simplified relationship between the cancer drugs and their targets. Drugs are rendered as green diamonds and drug-target interactions as blue edges, and once a specific cancer drug or target interaction is selected, a table view appears that presents additional information about the drug, the targets and the binding affinity. By combining the Reactome FI network with gene expression data, it's possible to search for network modules that are related to patient overall survival.
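As an illustration of spectral partitioning, here is one bisection step of Newman's leading-eigenvector modularity method, written with the standard library and a shifted power iteration. This is a sketch of the general technique, not the app's implementation, which may split recursively and handle large networks differently.

```python
def modularity_matrix(adj):
    """B[i][j] = A[i][j] - k_i * k_j / (2m) for an undirected adjacency matrix."""
    n = len(adj)
    k = [sum(row) for row in adj]
    two_m = sum(k)
    return [[adj[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]


def leading_eigenvector(mat, iterations=2000):
    """Power iteration on mat + shift*I; the shift makes every eigenvalue non-negative."""
    n = len(mat)
    shift = max(sum(abs(x) for x in row) for row in mat)  # Gershgorin bound
    v = [float(i + 1) for i in range(n)]                  # non-uniform start vector
    for _ in range(iterations):
        w = [sum(mat[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def spectral_bisect(adj):
    """Split nodes by the sign of the leading eigenvector of the modularity matrix."""
    v = leading_eigenvector(modularity_matrix(adj))
    group = {i for i, x in enumerate(v) if x >= 0}
    return group, set(range(len(adj))) - group


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
first, second = spectral_bisect(adj)
```

The sign pattern of the eigenvector separates the two densely connected groups; in a real module search each group would then be tested for further splits.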
The first step is to calculate gene expression correlations for the genes involved in the functional interactions, and then assign those correlations to the functional interactions, essentially converting an unweighted network into a weighted one. Then you use the MCL network clustering algorithm to identify the modules. Once the modules have been identified and annotated, the next step is to perform survival analysis using a Cox proportional hazards or Kaplan-Meier model. I believe Lauren will talk a little bit more about survival analysis tomorrow. Basically, the take-home message for Kaplan-Meier (KM) plots is that they're drawn as survival probability versus time elapsed for different groups of samples; an example is shown here on the right. A log-rank test is then run to check the significance of the differences between the plotted lines. The example we're looking at here is a breast cancer data set. All the samples were divided into two groups: samples having low expression of the module genes, represented by the red line, and samples having high expression, shown as the green line. This particular module consists of 31 genes sharing elements of the cell cycle and mitotic apparatus assembly, and the expression of these genes was significantly related to breast cancer patient survival across five independent sample sets. Shown here are two sets of pathway annotations: those in orange are from the NCI-PID resource, and the other annotations are from Reactome. The conclusion from the study was that patients with low expression of the module genes fared better than patients with high expression. So, potentially, a single network module, or a set of modules, could be used as a molecular signature.
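For readers who want to see what sits behind a KM plot and its log-rank p-value, this is a textbook sketch of the Kaplan-Meier estimator and the two-group log-rank test (chi-square with one degree of freedom). It is not the server-side R code the app uses, and the toy survival times below are made up.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data[i:] if tt == t]   # all observations at time t
        deaths = sum(tied)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(tied)                          # events and censored leave the risk set
        i += len(tied)
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p) with 1 degree of freedom."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        d_a = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    return chi2, erfc(sqrt(chi2 / 2))   # upper tail of chi-square with 1 df


curve = kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1])
chi2, p = log_rank([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
```

In the module-based setting, the two groups would be the low- and high-expression samples for one module, each drawn as its own KM curve.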
As I mentioned earlier, we've implemented a PGM-based functional impact analysis using the Reactome FI network, which allows you to integrate multiple omics data types. The current version of ReactomeFIViz supports four data types: CNV, mRNA expression, DNA methylation and somatic mutations. The node size you're seeing in this network view reflects the impact scores inferred from the model, and from the results you can view a variety of impact scores alongside the observed experimental data from your samples. ReactomeFIViz also implements features for users to conduct single-cell RNA-seq data analysis and visualization. To do this we've installed a variety of packages, including Scanpy for routine single-cell RNA-seq data analysis and visualization and scVelo for RNA velocity-based data analysis. Finally, in the remaining slides, I want to talk about another project I've been working on, called Illuminating the Druggable Genome (IDG). This is funded by the NIH, and the goal of the IDG project is to better understand the properties and functions of understudied proteins within commonly drug-targeted protein families. Certainly a lot of experimental data sets contain a long list of understudied, poorly annotated proteins. The IDG project looked at the proteome and classified proteins into four groups. Tclin represents targets that have at least one approved drug. Tchem represents proteins that have at least one ChEMBL compound. Tbio represents targets that don't have any drug or small-molecule activities but do have some biological function, with publications and data to support that evidence. And finally there's what we call the Tdark group, which represents the understudied proteins.
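The four-level classification described above can be written as a simple decision rule. The boolean fields are hypothetical stand-ins: the actual IDG criteria rely on curated drug and activity data and bibliometric thresholds, so treat this only as a summary of the ordering Tclin, Tchem, Tbio, Tdark.

```python
from dataclasses import dataclass


@dataclass
class TargetEvidence:
    # Hypothetical, simplified evidence flags for one protein.
    has_approved_drug: bool
    has_active_compound: bool
    has_functional_evidence: bool   # stand-in for "publications and data"


def development_level(ev):
    """Assign the first level whose condition holds, from most to least developed."""
    if ev.has_approved_drug:
        return "Tclin"
    if ev.has_active_compound:
        return "Tchem"
    if ev.has_functional_evidence:
        return "Tbio"
    return "Tdark"
```

The order of the checks matters: a protein with an approved drug is Tclin even if it also has active compounds.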
These are targets about which virtually nothing is known. The project Reactome undertook here was to develop a web portal that integrates resources collected from the Target Central Resource Database (TCRD), the database that underlies the IDG project. We also incorporated data from other sources to provide a pathway-centric view of understudied human proteins. Specifically, our tools allow users to search for a protein and visualize the protein, its Reactome-annotated pathways, and pathways that are accessible via one-hop pairwise interactions. So, just to take a moment: you could have a protein that we've already curated, so when you search Reactome it hits one of the pathways; these are called annotated pathways. But in another case you may have a protein that's understudied and that we've not curated, but that does interact with another protein we have curated. Here we're showing a pathway diagram with an overlay of the four target development levels from the IDG project. With the click of a button, you can convert that pathway diagram into a functional interaction view, again with the four IDG development levels colored, and you can toggle between these displays to explore these networks and pathways and to integrate other types of data. In that sense, we support the overlay of protein and messenger RNA expression data collected from the TCRD, which in turn has collected a variety of cell- and tissue-specific expression data sets from a whole host of studies, principally GTEx, TCGA and so on. This screenshot shows the overlay of an expression data set from a human proteome study. In fact, I think I actually made this into a movie, so I can just demonstrate. There we go.
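The difference between annotated and interacting pathways amounts to a one-hop lookup. In this sketch the interaction and pathway tables are hypothetical in-memory dictionaries with placeholder identifiers, not the TCRD or Reactome data structures.

```python
def find_pathways(protein, interactions, pathway_members):
    """
    interactions: dict protein -> set of interacting proteins (hypothetical FI data).
    pathway_members: dict pathway name -> set of curated proteins.
    Returns (annotated, interacting) sets of pathway names.
    """
    # Annotated: the protein itself is curated in the pathway.
    annotated = {name for name, members in pathway_members.items() if protein in members}
    # Interacting: the pathway contains at least one direct (one-hop) partner.
    partners = interactions.get(protein, set())
    interacting = {name for name, members in pathway_members.items() if members & partners}
    return annotated, interacting


pathways = {"Pathway A": {"P1", "P2"}, "Pathway B": {"P3"}}
fis = {"TDARK1": {"P1"}}
annotated, interacting = find_pathways("TDARK1", fis, pathways)
```

An uncurated protein like the placeholder `TDARK1` has no annotated pathways, but it still reaches Pathway A through its partner P1.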
Basically, as you click this little button here, the play button, we're toggling through each of the different studies, and you can see... let me run that again. Just watch this little bar here: there are different tissues, and you can see the expression of those genes. Let's move to the next slide. We've also implemented features to overlay pairwise relationships together with target development information. We're using Cytoscape Web, which allows us to display this network view in a pop-up window, with the functional interaction network alongside the traditional pathway diagram view, as well as other protein-protein interactions and protein-drug interactions. We also get a table view that allows you to interact with the drugs, sorry, with the interaction data. So in summary, we've developed this unique web portal to provide a pathway- and network-centric view of understudied human proteins. I think I'm going to leave it there in terms of my talk. In the remainder of the slides, I've provided URLs to a variety of network databases out there, some of the other de novo network construction and clustering software packages, some more details about pathway modeling, and other software and tools. The slide says we're on a break, but I should really start by saying it's time for questions. So this is, again, released under a Creative Commons license. I will start by saying the learning objectives of this lab module are to perform pathway- and network-based data analysis using the ReactomeFIViz app, and to search the IDG Reactome portal to understand the role of understudied proteins in the context of Reactome pathways. Now, the ReactomeFIViz app, just to take a moment to explain, has many features integrated into it.
I do apologize, but I simply don't have time to talk about or demonstrate all of them today. But I think the most important features are listed here: support for pathway enrichment analysis and GSEA, and a variety of summary views such as Voronoi tessellations to give a holistic view of pathways. We've integrated different modeling approaches, whether Boolean network or PGM-based pathway modeling. You can construct functional interaction subnetworks on which you can perform network clustering. You can overlay a variety of annotations from the NCI Cancer Gene Index, GeneCards or COSMIC, and some drug annotations as well. Through the new single-cell RNA-seq data analysis and visualization features, we provide additional support for mouse pathways. And finally, you can perform survival analysis to potentially identify prognostic signatures in your data sets. So, let's talk a little bit about pathway enrichment analysis. I introduced this already in the previous lecture slides; this goes into a little more detail. You can navigate through the pathway hierarchy here on the left and display pathways on the right of the screen. A number of features in the Cytoscape app require a left click to select something and then a right click to see pop-up menus, and through these pop-up menus you can perform a variety of pathway-related functions. As I mentioned earlier, you can also transition from the Reactome pathway view into the Reactome FI network view, and you can see the reactions in the bottom panel that contribute to this functional interaction network view. And of course, you can toggle back to the pathway view if you choose.
Now, from either the functional interaction view I just showed or the pathway diagram view, you can perform pathway enrichment analysis. When you select that feature, you get a little pop-up where you select your data set, which can be in one of the formats listed here, and you simply press OK. In the background here, the genes in your list that are hits on the pathway are colored purple, and the same would appear in the FI view: those nodes would be colored purple as well. You can also perform gene set enrichment analysis, a widely used rank-based pathway enrichment analysis approach. ReactomeFIViz supports gene set enrichment analysis for Reactome pathways using a gene score file. The gene score file may contain t-scores from a differential gene expression analysis or some other type of score that allows you to rank the list. For significant pathways produced by GSEA, you can overlay the gene scores to investigate the locations of gene products with extremely high or low scores and to understand the potential impact of these extreme scores. You can also visualize cancer target information on Reactome pathways. The so-called cancer target data set was constructed by our collaborator Guan Meng Wu at Oregon Health and Science University by collecting and aggregating drug-target interactions from four drug databases: DrugBank, the Therapeutic Target Database, IUPHAR and BindingDB. He assigned one of three evidence levels to each drug-target interaction: the first is a database annotation alone; the second is a database annotation plus a literature reference; and the third is a database annotation, a literature reference and some additional assay supporting the drug-target interaction.
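The core of GSEA is a running-sum enrichment score computed down the ranked gene score list. The function below sketches the weighted Kolmogorov-Smirnov-style statistic from the original GSEA method, using weight 1 by default; it leaves out permutation-based significance, and ReactomeFIViz's own implementation may differ in its details.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """
    ranked: list of (gene, score) pairs sorted from highest to lowest score.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_weight = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    if n_miss == 0 or hit_weight == 0:
        raise ValueError("gene set must partly overlap the list with non-zero scores")
    running = best = 0.0
    for (_, score), is_hit in zip(ranked, hits):
        # Step up by the gene's weighted score on a hit, down by 1/n_miss otherwise.
        running += abs(score) ** weight / hit_weight if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
top = enrichment_score(ranked, {"A", "B"})      # set concentrated at the top
bottom = enrichment_score(ranked, {"D", "E"})   # set concentrated at the bottom
```

A positive score means the gene set is concentrated among the highest scores (for example, large positive t-scores), and a negative score means it sits at the bottom of the ranking.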
Again, as you interact with elements in the diagram, you can right-click and fetch cancer drugs, and you get overlays like the one you can see here in the background. Moving along, let's talk about de novo subnetwork construction and clustering. As described earlier in the lecture, you can apply a list of altered genes, proteins or RNAs to a much larger pre-constructed functional interaction network. You can then identify topologically unlikely configurations by filtering away the unnecessary interactions, extract clusters of tightly connected gene interactions, and annotate those clusters with pathway and other functional annotations, essentially reducing the hundreds of genes in your list down to a handful of altered or disease modules. The current implementation of the Reactome FI app recognizes four different file formats. The first is a simple list of genes, which is a traditional input for Cytoscape. There is also the gene-sample pair file: the first column is again the gene name; the second column is the sample number, which is essentially equivalent to the mutation frequency; and the third, optional, column contains the actual names of the samples. There's also the option to upload an NCI mutation annotation file, or NCI MAF file. This has multiple columns running from left to right, continuing along here and here. Again, the first column is the gene name or gene symbol, and the additional columns hold different annotations: the source of the sequence data (Broad, NCI build and so on), the type of variant, the reference allele, the other alleles and so forth, and again the sample information. So it's a rather long, complicated file, but these types of files can be uploaded into the Reactome FI network. And finally, there's the microarray data file.
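A reader for the gene-sample pair format might look like the following. The column meanings follow the description above; skipping a non-numeric header line, the semicolon separator for sample names and the `min_samples` cutoff parameter are assumptions for illustration.

```python
import csv
import io


def read_gene_sample_pairs(text, min_samples=1):
    """
    Parse tab-delimited gene/sample-number lines:
    column 1 = gene, column 2 = number of samples, optional column 3 = sample names.
    Genes seen in fewer than `min_samples` samples are dropped.
    """
    genes = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue                      # blank line
        if len(row) < 2:
            raise ValueError(f"expected at least two columns, got {row!r}")
        gene, count = row[0].strip(), row[1].strip()
        if not count.isdigit():
            continue                      # assumed header line
        samples = row[2].split(";") if len(row) > 2 and row[2] else []
        if int(count) >= min_samples:
            genes[gene] = (int(count), samples)
    return genes


text = "gene\tsamples\nGENE_A\t12\tS1;S2\nGENE_B\t3\nGENE_C\t1\n"
parsed = read_gene_sample_pairs(text, min_samples=3)
```

With a cutoff of three, the placeholder gene mutated in a single sample is dropped before any network is built.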
Just to point out, it should be a tab-delimited file with headers. The first column should again be gene names, but each subsequent column contains expression values for the different samples. What I'm trying to say is that if you have a spreadsheet or table with gene name, fold change and some statistical value, those files should not be uploaded into the Reactome FI network; the microarray data file refers only to normalized expression data. The steps in performing a gene set-based analysis are as follows. From the Reactome FI app, you select the gene set/mutation analysis feature, and you get a pop-up dialog where you can select the version of the network you'd like to use. Some people have contacted us who have papers in review and need to re-analyze their data with an earlier version, and you can do that through our tool. You upload your data in one of the formats I just mentioned and select the data type. There is an option to set a sample cutoff: if you have a gene-sample pair list or an NCI MAF file, you may be more interested in genes that are more highly mutated, and not interested in genes that have been mutated in only one sample. So sometimes the cutoff could be three, four or much higher. Then you can select additional parameters for the network analysis. The fetch functional interactions option adds the edge attributes for the interactions. If you didn't do this now, you could do it later in the network, but it's very fast to do now, so it's easier just to select it.
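A quick pre-upload sanity check along these lines can catch a fold-change table before it is mistaken for a normalized expression matrix. The list of suspicious column names is a hypothetical heuristic, not a check the app is known to perform.

```python
import csv
import io

# Hypothetical heuristic: header names that suggest a summary-statistics table.
SUSPECT_COLUMNS = {"foldchange", "fold_change", "logfc", "log2fc",
                   "pvalue", "p.value", "fdr", "padj"}


def load_expression_matrix(text):
    """Load a tab-delimited matrix: header row, gene names first, one value per sample."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one gene row")
    header = rows[0]
    if SUSPECT_COLUMNS & {name.strip().lower() for name in header[1:]}:
        raise ValueError("looks like a fold-change/statistics table, not normalized expression")
    matrix = {}
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} columns, got {len(row)}")
        try:
            matrix[row[0]] = [float(x) for x in row[1:]]
        except ValueError:
            raise ValueError(f"line {line_no}: non-numeric expression value") from None
    return header[1:], matrix


samples, matrix = load_expression_matrix("gene\tS1\tS2\nGENE_A\t5.1\t4.9\nGENE_B\t7.0\t8.2\n")
```

A file with columns such as `log2FC` and `pvalue` is rejected, while a genes-by-samples matrix loads as one numeric vector per gene.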
As I mentioned earlier, you can use linker genes to increase the connectivity within your data set when generating the subnetwork. When I'm analyzing data, I typically won't select linker genes until I know what has been generated from the initial upload of the data set. If it's a small network, I might consider adding linker genes, but if it's a much larger network, or one that I think already contains a significant amount of data, I'm not necessarily going to select this feature. Then you click OK and run it, and you'll be presented with this kind of view in Cytoscape, with the network view here and a table view below which, depending on which tab you select, shows information about the nodes, the edges and the network; as you add more analysis results, additional tabs will appear here. You can of course use the Cytoscape features you were introduced to earlier today: you can zoom in and out and rearrange the network, and depending on your interactions you can select nodes and edges and view different annotations. If you select the white space here and right-click, you can access a variety of network features to perform downstream data analyses, and I'll talk about those very shortly. But first I'd like to talk about some of the functional interaction annotations available through the network. We have some rather detailed information here in the edge attributes: there is some directionality, and there's a score, a numerical value, associated with the predicted functional interactions. For example, arrow edges represent interactions reflecting either activating or catalyzing functions, and inhibition events are shown by a T-bar.
A solid line indicates that the proteins or genes interact with one another as part of complexes or as inputs, and a dotted line represents a predicted functional interaction. When you select an edge, it is highlighted in red; you can then right-click with your mouse or trackpad and invoke the ReactomeFI feature "query FI source" to read more details about the source of that particular functional interaction. In this case, we're looking at an annotated FI, so you will see its source: the reaction, or the transcription factor to target data source for that interaction. You may get a link out to a database, and there may even be some publication information if it's available. When you select a dotted line, it is again highlighted in red, and you can right-click and perform the same "query FI source" to view the evidence for the predicted functional interaction. There are nine different sources that can support a predicted functional interaction, and in this case you're seeing three of the nine here, giving a score of 0.96. An annotated FI would have a score of one, so a score of 0.96 indicates that the interaction is reliable and more likely to occur, possibly as a physiological interaction within the cell. An interaction with a much lower score, I would say less than 0.5, would certainly not be as reliable. The next step in the analysis is to extract clusters, that is, to perform module detection. We run the spectral partition algorithm from Newman and Girvan on the Reactome FI network, and this is typically used for most analyses. For gene expression data sets that you might be analyzing, we use the MCL algorithm.
The typical output of the clustering is demonstrated in this hypothetical network. Before clustering, all the nodes are colored green, and as I mentioned earlier, afterwards the nodes are recolored according to the module they belong to. Once you've identified the network modules, the next step in the analysis is to label these clusters with pathway or GO term annotations. Again, this enrichment analysis feature is accessible by selecting a piece of white space in the background of the network, right-clicking, selecting ReactomeFI, going down to "analyze module functions" and selecting one of the annotations you want to apply. I'd like to point out clearly that when you have performed the clustering analysis, you want to select "analyze module functions". If, on the other hand, you have a small network and have not performed clustering analysis, I would suggest trying "analyze network functions", which treats that much smaller network as a single module, in a sense. Let me go back again; I do apologize. The results of the enrichment analysis are shown in these tables. The first column indicates the associated module within the network. By default, module numbering starts at zero, so bear that in mind. Then you have the gene set, where the letter indicates the source database (R for Reactome, K for KEGG, for example), followed by additional numbers such as the ratio of proteins in the gene set, the number of proteins in the module, and so forth. There are also statistics from the enrichment analysis; as Veronica introduced earlier, there is a p-value and an FDR value here as well. Finally, the nodes column lists the genes within the module that correspond to this particular gene set.
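Assuming the FDR column uses the common Benjamini-Hochberg procedure (the exact method is an assumption here), adjusted values can be derived from the raw p-values as follows.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])   # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each adjusted value is never smaller than its raw p-value, and the ordering of the gene sets is preserved.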
There are filters that can be applied either to use a more stringent FDR or to restrict your focus to modules with more than a certain number of genes. Okay. Now, another function you can perform on the Reactome FI network is to view detailed annotations from a variety of data resources for selected nodes, or in some cases for all nodes within the network. The example here is the overlay of data from the NCI Cancer Gene Index. You can view it at the individual node level: you select a node, right-click to bring up the drop-down menu, select ReactomeFI, and then select "fetch cancer gene index", and you'll see in the background here a list of evidence and annotations associated with that particular gene in the NCI Cancer Gene Index. Alternatively, you can apply these annotations to all the nodes within the diagram using a slightly different feature: you go to ReactomeFI and then select... I think I've actually used the same image here, apologies, I should have changed it, but the point is there is a way to select the cancer gene index for all nodes. You'll get a hierarchy of the diseases in the NCI Cancer Gene Index, and as you click on individual disease terms, the corresponding nodes in the network diagram that have these annotations are highlighted in yellow. You can also see from this window that you can overlay GeneCards information. And in the next slide I have the option to show the overlay of cancer target information as well. The loaded drugs and interactions are rendered as green diamonds and blue edges, respectively. And once you select a specific cancer drug.
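Applying the two filters to the results table could be sketched as follows; the `ModuleHit` record, its field names and the placeholder rows are hypothetical, chosen to mirror the columns described above.

```python
from dataclasses import dataclass


@dataclass
class ModuleHit:
    module: int        # module numbering starts at 0, as in the results table
    gene_set: str
    module_size: int   # number of genes in the module
    fdr: float


def filter_hits(hits, max_fdr=0.05, min_module_size=5):
    """Keep hits that pass both filters, most significant first."""
    kept = [h for h in hits if h.fdr <= max_fdr and h.module_size >= min_module_size]
    return sorted(kept, key=lambda h: h.fdr)


hits = [ModuleHit(0, "Pathway A", 20, 0.001),
        ModuleHit(1, "Pathway B", 3, 0.0001),   # module too small
        ModuleHit(2, "Pathway C", 12, 0.2),     # FDR too high
        ModuleHit(3, "Pathway D", 8, 0.01)]
kept = filter_hits(hits)
```

Tightening either threshold simply shortens the table; it never reorders the surviving hits beyond sorting them by FDR.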
There is the option to show this display table, with additional annotations associated with the drug, the targets, the binding affinity for the molecules and the supporting evidence. You can also overlay annotations from the COSMIC database and preview the COSMIC annotations for a selected gene; the example in the screenshot is for the TP53 gene. And finally, we have module-based survival analysis, where the goal is to discover prognostic signatures and disease modules in data sets. The Reactome FI app runs two server-side R scripts that perform Cox proportional hazards or Kaplan-Meier survival analysis, and the results are shown here. In the lab, we'll run through an experimental data set, hopefully to generate a Kaplan-Meier model and predict a prognostic signature from an ovarian cancer data set. In the remainder of the talk, I want to talk about using the IDG Reactome web portal to better understand the role of understudied proteins in a pathway context. I have to apologize for the video content: we literally did a user interface update a few days ago, so the videos I'm about to show present slightly different web page colors that might look different from what you see, but I think you'll still understand the underlying features of the tool. I'll also do my best to talk about the features as these videos run, and I may run some of the videos more than once if necessary, but here goes. So the home page consists of a variety of features that allow users to launch the path... whoops. Sorry about that, I just put my computer to sleep there. Let me just run that again, since I had to stop. There we go. I was closing my computer.
So the homepage consists of a variety of features that let users launch the pathway browser within the IDG website, as well as the Reactome pathway browser, read some documentation and search for proteins of interest, which hopefully will come up in the next moment or so. There's also a bit more information about the Illuminating the Druggable Genome project. So the search here is for NTN1. We're presented with results in two panels. The first is the hierarchy of Reactome-annotated pathways; that is, NTN1 has already been annotated by Reactome, so there are pathway annotations associated with that protein, and we can navigate through the Reactome hierarchy. Below that is the interacting pathways card. Let me just pause for a moment to explain this. Just to remind you, interacting pathways are identified by the fact that there's a protein that interacts with NTN1 and has been annotated in Reactome. You can see here a list of five genes that have been identified to interact with NTN1. We then perform a pathway enrichment analysis on that gene list, and you get the resulting annotations in this table. By clicking a link, you can view some of this information. The functional interaction score illustrates the strength of the interaction: 0.9 is the highest you'll see here; one would be the best, but you won't see that for interacting pathways, and you can modify the functional interaction score cutoff. Now, in this window you're seeing a different way... let me just pause for a second. A variety of resources are used as source information to generate the interactions that link the understudied protein to the pathway annotations, and you as a user can select individual pairwise relationship data sets as your source information.
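Adjusting the functional interaction score and choosing data sources both amount to filtering the interaction list before the enrichment step. In this sketch the record layout, the source labels and the identifiers are hypothetical.

```python
def select_interactors(protein, fi_records, min_score=0.9, sources=None):
    """
    fi_records: iterable of (partner_a, partner_b, score, source) tuples.
    Returns partners of `protein` with score >= min_score and, when `sources`
    is given, only those supported by one of the selected data sources.
    """
    partners = set()
    for a, b, score, source in fi_records:
        if protein not in (a, b) or score < min_score:
            continue
        if sources is not None and source not in sources:
            continue
        partners.add(b if a == protein else a)
    return partners


records = [("P1", "TDARK1", 0.95, "PPI"),
           ("TDARK1", "P2", 0.60, "PPI"),
           ("TDARK1", "P3", 0.92, "GO"),
           ("X", "Y", 0.99, "PPI")]
default_partners = select_interactors("TDARK1", records)
```

Lowering the cutoff or adding sources widens the partner list, and the enrichment analysis is then rerun on the new list, which is why the interacting pathway results change.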
Once you've chosen, you can select based on protein interaction information or on gene similarity information, such as Gene Ontology information. When you select that information, you simply click the Add button, and this adds a feature below where you can start adding different data sets; we're not going to see this in this example. You can then recreate these interacting pathway results. So I'm just going to continue again. You can see all these different types, and you can also see that cancer data sets from The Cancer Genome Atlas project are available in there as co-expression data; you can select the biosources and add them. Now, going back here: if you were to select a pathway stable identifier, you would open up the pathway diagram. This is just the overview as we navigate into the pathway view. You can see pink highlighted borders around some nodes, which indicates that these are the hits: the nodes that contain NTN1. Given the way we curate Reactome, these could be individual nodes or complexes. Now if we just reload and do a search with CDKL4: this is actually an understudied protein, and it interacts with a protein that is a component of another Reactome pathway, so it's a one-hop interaction. You can see the pathway view again and all the events here, and you can download that gene list if you wish for your offline records. As you update the functional interaction score, you can update the list. You can also search, for example, for TP53-containing pathways. Selecting a specific stable identifier, I'm now seeing a new pathway. Again, what we're seeing here is the interacting pathways, with pink highlighted entities within the diagram.
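The one-hop idea described above (keep only relationships from the selected sources that pass a score cutoff, collect the partners of the understudied protein, then find the pathways that contain those partners) can be sketched in a few lines of Python. Everything here is hypothetical: the relationship tuples, source labels, scores and pathway memberships are placeholders, and this is not the portal's implementation.

```python
# Hypothetical pairwise relationships: (protein_a, protein_b, source, score).
RELATIONSHIPS = [
    ("QUERY", "P1", "protein-interaction", 0.9),
    ("QUERY", "P2", "gene-similarity", 0.6),
    ("P3", "QUERY", "protein-interaction", 0.8),
    ("P4", "P5", "protein-interaction", 0.95),
]

# Hypothetical pathway memberships.
PATHWAYS = {"Pathway X": {"P1", "P4"}, "Pathway Y": {"P2"}, "Pathway Z": {"P3", "P6"}}


def one_hop_interactors(query, relationships, selected_sources, min_score):
    """Partners of `query` from the selected sources at or above the score cutoff."""
    partners = set()
    for a, b, source, score in relationships:
        if source not in selected_sources or score < min_score:
            continue
        if a == query:
            partners.add(b)
        elif b == query:
            partners.add(a)
    return partners


def interacting_pathways(partners, pathways):
    """Map each pathway to the interactors it contains (the nodes to highlight)."""
    hits = {}
    for name, members in pathways.items():
        overlap = partners & members
        if overlap:
            hits[name] = sorted(overlap)
    return hits


partners = one_hop_interactors("QUERY", RELATIONSHIPS, {"protein-interaction"}, 0.7)
print(interacting_pathways(partners, PATHWAYS))
```

Adding a source or lowering the cutoff simply widens the partner set, which is why the interacting-pathway results change when you add data sets or adjust the functional interaction score.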
You can see that they contain an interactor of the selected term, which in this case was CDKL4. So these pink nodes represent CDKL4 interactors within this pathway diagram. The other thing to mention here is that as we've been navigating through these diagrams, you've been seeing different colors, and these represent the four distinct target development levels of the IDG program: Tdark, Tbio, Tchem and Tclin. Now moving along to the functional interaction view. There we go. Starting here, in the pathway diagram, the node context menu, which you're now seeing, contains two tabs, Molecules and Pathways; it gives you more information about the selected protein and links to other pathways that contain this entity. On any diagram there's also a Cytoscape view button; just click it, and it toggles between the pathway view and the FI view. Clicking the gear wheel lets you select different layouts for the functional interaction view. Again, the protein nodes are colored according to the IDG protein classification. Hovering over an individual node presents you with gene information, such as the UniProt identifier. You can click on nodes, and if you right-click you'll see additional links out to other resources relevant to these proteins: you can look at IDG, UniProt and Pharos, the target data resource, which is essentially the main repository of data for the IDG project. You can close that window, and you can click on the edges; as you do that, the information is updated in the pathway hierarchy on the left. The edges themselves can be directed, with an arrow conferring directionality, and as I said, hovering over an edge presents you with information about that interaction.
Right-clicking on an edge presents you with a menu displaying the sources of information for that edge. Moving along. There are two types of data we can overlay onto the pathway diagram and the FI view. One is expression data from 19 different sources in the Target Central Resource Database; the other is pairwise interaction data sets. What we're seeing right now is navigating through six selected data sets. If I just go back a second... there we go. We've selected the expression type, here HPM protein, and selected six data types; you can select up to 12 expression data sets. Hit the Overlay button, and then you can either navigate individually or hit the little movie button, the arrow in the middle, to step through the expression for each of the different tissues here. You can convert this pathway to the functional interaction view and play that little movie again to see the expression values for the individual proteins. Then select Overlay Relationships, and now we can look at some pairwise interaction overlays. You can select those interactions based on different relation types and data sources, and you can choose whether they represent positive or negative interactions. You can see that the edge color now changes, to red for a positive regulation interaction. You can then display these interactions in the pop-up window, which I'll go through in a moment. The other thing to point out is the little decorators in the pathway view: clicking the one on the right shows the protein-protein interactions, and in another video I'll show that you can also display the protein-drug pairwise interactions.
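As a rough illustration of what an expression overlay "movie" does (one frame per tissue, each protein colored by its expression value), here is a minimal Python sketch. The blue-to-yellow gradient, the shared min/max scale and the protein and tissue values are all assumptions made for illustration; the portal's actual palette and scaling are not described in the talk.

```python
def expression_colour(value, vmin, vmax, low=(0, 0, 255), high=(255, 255, 0)):
    """Linearly interpolate between two RGB colours and return a hex string.

    The default blue-to-yellow palette is an assumption, not the portal's.
    """
    frac = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    frac = min(max(frac, 0.0), 1.0)  # clamp values outside the range
    r, g, b = (round(lo + frac * (hi - lo)) for lo, hi in zip(low, high))
    return f"#{r:02x}{g:02x}{b:02x}"


def overlay_frames(expression, tissues):
    """One colour map per tissue, like stepping through the overlay 'movie'.

    expression -- {protein: {tissue: value}}; a single scale is shared across
    all tissues so colours are comparable from frame to frame.
    """
    values = [v for per in expression.values() for t, v in per.items() if t in tissues]
    vmin, vmax = min(values), max(values)
    return {
        t: {p: expression_colour(per[t], vmin, vmax) for p, per in expression.items() if t in per}
        for t in tissues
    }


# Hypothetical expression values for two proteins in two tissues.
expr = {"P1": {"liver": 0.0, "brain": 10.0}, "P2": {"liver": 5.0, "brain": 2.0}}
for tissue, colours in overlay_frames(expr, ["liver", "brain"]).items():
    print(tissue, colours)
```

Using one shared scale rather than rescaling per tissue is a design choice: it keeps a given color meaning the same expression level in every frame, at the cost of compressing tissues whose values all sit in a narrow range.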
In the table below, you can see information about those protein-protein interactions: the overlay value, such as the IDG classification; whether it's a positive or negative regulation interaction; and the source of the interaction, for example BioGRID, BioPlex or STRING. This is human data. When you click the little View button, it highlights the individual interaction. I think that's the end of that video, so on to the next one, where I'll talk a little bit about viewing drug-target interactions. There we go. Drug-target interactions are overlaid on the pathway diagram by selecting one of the little decorators on the left side, the little purple one, and then you can see these nice interactive views with all these drugs. You can look at each drug individually to get more information about it, and you can click on an edge to see the drug-target information in the table below. You can also transition to show the drugs in the FI view. Again, you can click on individual nodes or edges and see that information in the display there. And yes, if you click on a node, you can then click this little Rx button, and a pop-up appears with your node of interest and all the drugs that interact with that particular target. Just to explain a little bit more about that pairwise pop-up: it allows you to visualize a variety of different interactors, and typically you'll see 10 interactors. The source nodes are represented as circles and interactors are represented as triangles, colored to represent the currently overlaid expression. Overlaid drugs are represented as purple hexagons. The edges are colored and dashed to help users identify which interaction set each interactor belongs to.
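The Rx pop-up described above essentially groups drug-target rows by target protein. A minimal Python sketch of that lookup follows; the record fields mirror the columns mentioned in the talk (drug, target, action type, activity type, activity value), but the field names, drug names, targets and values are all hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugTarget:
    """One row of a drug-target table (field names are illustrative)."""
    drug: str
    target: str
    action_type: str
    activity_type: str
    activity_value: float


def index_by_target(rows):
    """Group drug-target rows by target protein, as the Rx pop-up does for one node."""
    by_target = defaultdict(list)
    for row in rows:
        by_target[row.target].append(row)
    return by_target


# Hypothetical records; the drug names, targets and values are placeholders.
ROWS = [
    DrugTarget("DRUG_A", "P1", "INHIBITOR", "IC50", 7.2),
    DrugTarget("DRUG_B", "P1", "ANTAGONIST", "Ki", 6.5),
    DrugTarget("DRUG_C", "P2", "INHIBITOR", "IC50", 8.1),
]
by_target = index_by_target(ROWS)
for row in by_target["P1"]:
    print(row.drug, row.action_type, row.activity_type, row.activity_value)
```

Building the index once and then reading one target's list keeps each pop-up lookup cheap, which matters when a pathway contains many target nodes.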
In the pairwise relationship table, users can view all the available interactions, the overlay values and the interaction type. And in the drug-targets table, users can view information about the displayed drug, including the protein it interacts with, the action type, the activity type and the interaction value.