My name is Dick Kreisberg, and I'm from the Institute for Systems Biology in Seattle. I work for Ilya, and in particular I'm a software engineer, so you won't be hearing any interesting cancer results from me. This talk is about our approaches to tool building, especially in the face of the challenges that TCGA data presents. Specifically, I want to talk about visual analytics and how it pertains to the sorts of problems we face in our work. We have these large heterogeneous data sets, as Ilya discussed earlier, and we look for advantageous ways to explore them. The goal is to allow a person using our tools to quickly formulate testable hypotheses about relationships between different parts of the data, or even external data such as pathways and outcomes. And I'd like to say right away that there are many excellent tools aside from ours, especially within the TCGA community: tools from places like UCSC, the Broad, and MSKCC.

When we talk about visual analytics, there are several things to keep in mind. The first is how we encode the information. You might see in this slide that there's a lot of encoding going on: color, shape, size, and annotation. The goal is for the user to be able to easily reason about what they're looking at. The display needs to be as close to their mental model as possible, so that when they're performing complex analyses and operations in the tool, the abstraction isn't so high that it distracts them. We also want to manage scale in these tools, so that viewing the data is cognitively and visually manageable; we don't want to clutter the display or overwhelm the user. And hopefully we can make it interactive, so that manipulating the data lets the user focus on the useful abstraction rather than on how they're operating the interface.
We would also like to annotate: to provide additional information on demand, not up front. Information like metadata, research literature, error estimates, and so forth. The bigger goals are to connect the data to other sources and tools, like the ones I mentioned before, and place them nearby in the analysis and visualization, so that the user can gather additional evidence in support of their hypothesis. And of course, at the end of the day, you want to be able to share that insight with others: to collaborate, to distribute and collect the different analyses, and to link them out to everyone else so that you can accumulate more and more evidence.

So, quickly, the technologies that allow us to do this; you might call them emergent technologies. For us the biggest one is the first: browser-based graphical rendering. In the tools I'll be showing, much like the demonstration Ilya gave, we use SVG, where the browser renders the image directly. There's also Canvas, and WebGL, a forthcoming standard that lets you accelerate rendering on the graphics card while still staying within the browser experience. There are cloud services that allow you to scale up your computation and your data management. Then there are the new so-called NoSQL databases, which give us adaptive data models. Often in TCGA a new analysis is born, and from it a new data model is created; in the classical SQL sense it's hard to integrate those models quickly, and NoSQL technology allows us to do that on the fly. Graph databases, in the same manner, allow us to be adaptive about the types of relationships that we declare to exist between the data. And finally, graph computation is a new direction where you can reason over data that, as in our case, is structured as a graph.
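To make the browser-based rendering idea concrete, here is a minimal sketch, not the actual Regulome Explorer code: an SVG visualization is just markup that the browser draws directly, so it can be generated programmatically from data. The `featureColor` mapping and the feature records are hypothetical placeholders.

```javascript
// Hypothetical color scheme for feature types (not the tool's real palette).
function featureColor(type) {
  const colors = { expression: "#d62728", methylation: "#1f77b4", cnv: "#2ca02c" };
  return colors[type] || "#999999";
}

// Build an SVG document string from a list of data features; in the browser
// this string could be inserted into the page for the renderer to draw.
function renderFeatures(features, width, height) {
  const circles = features
    .map(f => `<circle cx="${f.x}" cy="${f.y}" r="4" fill="${featureColor(f.type)}"/>`)
    .join("");
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">${circles}</svg>`;
}

const svg = renderFeatures(
  [{ x: 10, y: 20, type: "expression" }, { x: 30, y: 40, type: "methylation" }],
  100, 100
);
```

Because the output is plain markup rather than pixels, every rendered element stays addressable, which is what makes the hover, filter, and link-out interactions described later possible.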
And you can do that in a distributed manner across large clusters of computers. So the overall process is: integrate data sources, perform analyses, store the associations, bring in other types of data, and in the end visualize it with tools. The one I've highlighted is a screenshot of the circular view from Regulome Explorer, the genome-level view. In my mind there's another arrow that points from number five, the highlighted one, back to number one, because it's really an iterative process: the results are understood and internalized, and then the user proposes a new data matrix to run the analysis on, or a new analysis to run.

I'm not as brave as Ilya, so I won't be doing a live demonstration; instead I've just got some screenshots. This is the genome-level view I mentioned, and the analysis here was done with Random Forests, created by Timo Erkkilä, who will be speaking tomorrow. It portrays the human chromosomes around the periphery, and each node represents a feature; the colors describe whether it's gene expression, methylation, copy number variation, and so on. As Ilya showed, you can bring up the raw data, which is critical for the user to be able to confirm, or possibly reject, the association that the analysis produced. I want to emphasize that the circular view is very much a high-level, 30,000-foot view of the data. No one particular insight is just going to jump out from bringing up all these results. But it does allow the user to cognitively manage the data, using filtering tools like the one on the right side of the screen, and to bring up the data and iterate through it, paginate through it. Here's another tool; in this case the analysis is a metric that measures the aggressiveness of tumors in colorectal cancer.
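The geometry behind a circular genome-level view like the one described above can be sketched as follows: lay the chromosomes end to end around the circle, then map a (chromosome, position) pair to an angle and on to x/y screen coordinates. This is an illustrative sketch, not Regulome Explorer's code, and the chromosome lengths here are made-up toy values.

```javascript
// Toy chromosome lengths; a real tool would use actual genome coordinates.
const chromLengths = { chr1: 1000, chr2: 800, chr3: 600 };
const total = Object.values(chromLengths).reduce((a, b) => a + b, 0);

// Map a genomic position to an angle in radians around the circle.
function toAngle(chrom, pos) {
  let offset = 0;
  for (const [name, len] of Object.entries(chromLengths)) {
    if (name === chrom) return 2 * Math.PI * (offset + pos) / total;
    offset += len;
  }
  throw new Error("unknown chromosome: " + chrom);
}

// Convert that angle to screen coordinates at a given radius, e.g. for
// placing a feature node on the periphery of the circular view.
function toXY(chrom, pos, radius) {
  const a = toAngle(chrom, pos);
  return { x: radius * Math.cos(a), y: radius * Math.sin(a) };
}
```

Drawing an association between two features is then just a chord or curve between their two computed points, which is why this layout scales to showing thousands of genome-wide associations at once.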
On the right side there's a linear view, which is brought up whenever you want to zoom in to the chromosome or sub-chromosome level. You can pan and zoom through the data, and from the pop-up shown there you can look at the data specifically or link out to other tools like the UCSC genome browser. Here's another one, in this case all-pairs significance, and you can see that hovering over the bars shows more information. Those pop-ups can actually be pinned to the screen and set off to the side, so that you can collect your data together and come back to look at it later.

And finally we have the network view, which Brady has a poster about in the hall. Here we're bringing together many different kinds of associations, whether from protein domain mapping, the mining of MEDLINE, Random Forests, or many other sources. This is going to be more closely integrated with the entire tool set very soon, but the idea is that the circular view is not ideal for many types of data, and often what you're really trying to get at is the network, the topology of the graph itself. As our analyses built on graph computation, which I mentioned earlier, become more sophisticated, this tool will be ideal for bringing in and reviewing those results. So in the near future there are a couple of directions: the emerging technologies of graph databases and graph computation, and, what I'm excited about, exploring the very abstract network topologies that can be created. You can begin to use them to identify explanatory variables and drive further analysis in the right direction by overlaying these graph outputs, whether cross-cancer or cross-analysis, and bringing in external information.
More concretely, we plan to soon add explicit retrievable states, so that for any given screen you've brought up, at any point in your analysis, if you'd like to share it with a colleague you can just send it to them as a URL-based link, rather than sending directions about where to click and how. And of course user data import is very, very important: being able to bring in your own data and look at it in the context of other analyses, or simply to get the benefits the tool provides.

The website is up at explorer.cancerregulome.org. The tool is open source, hosted on Google Code, both Regulome Explorer itself and the underlying rendering tools that create the visualizations you saw. We're always looking for more people to use it and give us feedback, or even work on it with us, which is very exciting for us. I'd like to acknowledge Ilya; Jake and I worked on the genome view; the network view is Andrea Eakin and Brady Bernard; and the analyses were done by Vesteinn Thorsson, Sheila Reynolds, and Timo Erkkilä. I won't go into all the acknowledgments, but there's a list of some of the technologies we've leveraged so far in creating these kinds of tools, and our funding and so forth is at the bottom. Thank you. Any questions?

Well, you probably are not expecting a question from me, Dick, but I get asked this a lot: can people load their own data into Regulome Explorer? How do you see that happening? Would somebody upload a data set, or, if it's a very large data set, presumably you would have to not move it but work close to the data?

Right.
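The retrievable-state idea mentioned above can be sketched very simply: serialize the current view into URL query parameters so a colleague can restore the exact screen from the link alone. This is a sketch of the planned feature, not the shipped implementation, and the state keys (`view`, `chrom`, `analysis`) are hypothetical.

```javascript
// Encode a view-state object into a shareable URL.
function stateToUrl(base, state) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(state)) {
    params.set(key, String(value));
  }
  return `${base}?${params.toString()}`;
}

// Decode the state back out of a URL so the tool can restore the view.
function urlToState(url) {
  const params = new URL(url).searchParams;
  return Object.fromEntries(params.entries());
}

const link = stateToUrl("http://explorer.cancerregulome.org/view", {
  view: "circular", chrom: "chr9", analysis: "random-forest"
});
```

Since the entire state lives in the link, sharing requires no server-side session; the recipient's browser reconstructs the view on load.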
So that comes back to a concrete, formalized data specification, which we're pretty much at at this point. If they want to match our format, then yes, we could provide some sort of upload, or be able to retrieve the data from them and display it. But if it's a different abstraction on the data, a different data model, then it's also possible to take the tool, the underlying code itself, modify it quickly, and serve it from their own side, next to their data where it lives.

Okay, if there are no other questions, thanks again, and I'd like to thank all the speakers of the session.