Live from Las Vegas, Nevada. Extracting the signal from the noise. It's theCUBE covering IBM Edge 2015, brought to you by IBM.

We're back, welcome to IBM Edge 2015. This is theCUBE. Very excited to have Gustavo Stolovitzky here. Gustavo is a program director of Translational Systems Biology and Nanobiotechnology at IBM, and Donna Dillenberger is an IBM Fellow at IBM Research. Folks, welcome to theCUBE. That was a mouthful, and I feel smarter just being here. So really, thanks for coming on. Gustavo, you gave a keynote this morning. Donna, I don't know, have you been speaking at this event?

Yes, I gave a session on Z Analytics.

Great, so let's start there. So Z Analytics, wait a minute. I thought Z was a transaction processing system. Analytics on Z? Z13 changed the world.

That's right.

That used to be taboo.

Yeah, Z is a transactional system, but it can also do analytics, and not just analytics on a small amount of data, but analytics on healthcare data, genomic data, global climate modeling, and some of the models that partners have asked us to work with them on. They would take a week to run on Intel; they take less than an hour on Z. They're faster than Cray systems. So Z is not just a transactional system, Dave.

So I want to come back to that, because I heard Cray systems, I heard Intel. So actually, let's stay on that for a second. I'm thinking high-performance computing.

Correct, correct. So Z is now participating in that space. We have the fastest microprocessor in the industry. Couple that with 10 terabytes of memory: you don't always have to be doing IO to get to your data. You can do all of those calculations in memory, and you have a very powerful high-performance computer.

Gustavo, you gave a talk today on the wisdom of the crowds, a topic that is strikingly interesting and something that we love here; we started our research company with crowdsourcing. So give us a synopsis of your keynote this morning.

Sure.
So it's clear that we live in a data-rich world, but we don't yet really know how to extract actionable information from this big data, especially in the healthcare industry. We are trying, the community is trying. We know that there is information there that is important for health. But we could do better if, instead of one person analyzing a big data set, everybody analyzes it, with more eyes on it. So in 2006, with the support and encouragement of IBM Research, we launched the DREAM Challenges, which are open science competitions in which we curate big data sets, putting together data from different sources and making it available for everybody to address a question that we pose, and typically the questions are very important, very pressing in the biomedical field. For example, one question that we asked recently is: can an algorithm look at early clinical data from a patient affected by ALS, amyotrophic lateral sclerosis, a terrible disease, and predict its progression? That's very important, because the progression can be very fast, like Lou Gehrig, who died in two years, or very slow, like Stephen Hawking, who has been living with it for more than 50 years. So it's essential for a patient to know whether he or she is likely to live for decades or only a few years. So empowering a physician to answer these kinds of questions, taking advantage of all the data out there, by facilitating an algorithm that does that, is very much needed in ALS as well as in other diseases.

So that's a data challenge.

That's right.

So how did you guys go about solving that, and what's the relationship here? You're providing the infrastructure for that, is that right, and the ideas behind it?

Yeah, Gustavo's the head of these DREAM challenges.
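The ALS question Gustavo describes, predicting progression from early clinical data, is at its core a trend-extrapolation problem. The actual challenge entries were far richer than this, but the basic idea can be sketched as a least-squares fit to a patient's early functional scores and a projection forward (every number below is hypothetical):

```python
def fit_slope(months, scores):
    """Least-squares slope of a functional score versus time."""
    n = len(months)
    mt, ms = sum(months) / n, sum(scores) / n
    num = sum((t - mt) * (s - ms) for t, s in zip(months, scores))
    den = sum((t - mt) ** 2 for t in months)
    return num / den

def predict_score(months, scores, future_month):
    """Fit a line to the early visits and extrapolate it forward."""
    slope = fit_slope(months, scores)
    intercept = sum(scores) / len(scores) - slope * sum(months) / len(months)
    return intercept + slope * future_month

# Hypothetical early visits: months 0-3, score dropping one point per month.
early_months = [0, 1, 2, 3]
early_scores = [40, 39, 38, 37]
print(predict_score(early_months, early_scores, 12))  # -> 28.0
```

A steep fitted slope suggests fast progression, a shallow one slow progression; the value of the challenge was in finding algorithms and features that predict progression better than this kind of naive extrapolation.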
Other DREAM challenges that we've worked on together include rheumatoid arthritis, where we got a lot of anonymized data for patients, their clinical data and their genomic data, to see which of those traits would predict whether those patients would respond to a drug for anti-inflammation. And my group at IBM Research provided the IBM cloud for these crowdsourcing models, and one of the finalists in this challenge won, he said, because he was able to iterate through more models on the Z system than he was able to on his Intel system.

Right, so what happens is that if you invite the crowd to participate in trying to solve a big problem, it's not necessarily a given that they will have the high-performance computing needed to do that, right? So enabling them with the Z system meant they were limited only by their talent and skills, not by the high-performance computing. And I think that's what we are doing in our challenges.

Yeah, in fact, the crowd's not going to have access to low-cost compute resources. So you put those compute resources to the crowd, and what happened?

Well, for example, in the rheumatoid arthritis challenge, we were able to predict with reasonable accuracy, though we need to do more, whether an algorithm can determine from clinical data and genetic data whether a patient is going to respond or not to anti-TNF, a particular type of medicine. That was the question that we were asking. You know, as Donna was saying before, the complexity of the problem is immense, because we have 22,000 genes, and we have an enormous amount of clinical information as well, and it's growing. So the number of features that you have to handle in order to predict is huge: what is the handle that you use to predict whether a patient responds or not? You have 22,000 genes to choose from to help you.
Trying to find the needle in the haystack there is a big combinatorial problem, and you need a big system to do that, a system that is scalable and powerful.

And as these models were running on the Z system, what we did was also look at the characteristics of that workload, and then we took the math models that were being used, optimized them, and built hardware accelerators for them as well, so that they would exploit the vector instructions, and that's another reason why they run so well.

So it's interesting, because I would think that you could throw a thousand, you know, Intel boxes at this problem. People talk scale-out, and that's been an approach in the HPC world for a long time; it's always the arms race over who's got the fastest supercomputer. So help us understand: explain again why Z wins out.

Well, for one thing, Z is not only a scale-up machine but also a scale-out machine. On one Z box, which is about the size of two refrigerators, you could very quickly provision 6,000 virtual machines. The number of blades and boxes that you would need to provision 6,000 virtual machines in a data center is enormous.

The hot room.

Right, and on the mainframe you could do it within seconds. It's scale-up because, instead of always having to partition your data, partition which genes are going to influence an outcome, and then merge the results back (because these genes influence each other; when one is turned on, it affects other genes), you could put all of that genetic data in 10 terabytes of main memory and have it all interact at once. So in these types of models, it's better to have as much memory as possible, as opposed to splitting and merging the data.

You know, it's interesting. The wisdom of the crowd you talked about this morning: the crowd has always been a very efficient handicapper.
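Picking predictive genes out of 22,000 is often approximated by first scoring every gene independently against the outcome and keeping the best scorers, a step that benefits directly from holding the entire expression matrix in memory. A minimal illustrative sketch on synthetic data (this is not the challenge teams' actual method; all names and numbers are invented):

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def top_genes(expression, response, k=10):
    """Rank genes by |correlation| with the drug response; keep the top k.

    expression: dict of gene name -> per-patient expression levels
    response:   per-patient drug-response values
    """
    scored = sorted(((abs(pearson(levels, response)), gene)
                     for gene, levels in expression.items()), reverse=True)
    return [gene for _, gene in scored[:k]]

# Synthetic data: 200 genes, 50 patients; only gene_0 drives the response.
rng = random.Random(1)
expr = {f"gene_{i}": [rng.gauss(0, 1) for _ in range(50)] for i in range(200)}
resp = [x + rng.gauss(0, 0.3) for x in expr["gene_0"]]
print(top_genes(expr, resp, k=5))
```

Scanning all 22,000 real genes this way is embarrassingly parallel, but interactions between genes are exactly what a per-gene scan misses, which is why keeping the full matrix in one memory space and fitting joint models matters.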
Before I had four kids, I used to go to the racetrack a lot. Now I have better things to spend my money on, but handicappers know that a two-to-one shot comes in more often than a three-to-one, more often than a four-to-one, a five-to-one, and so forth, and every now and then a long shot comes in. But of course the favorites only win about a third of the time, so that's a problem too. But suddenly, you know, there are books like Wikinomics, which inspired Wikibon; you see Wikipedia; I met Jimmy Wales, and that was another inspiration. Why do you think it is that suddenly there's this awareness that the crowd is good at solving problems?

Well, it's not a new awareness. One of the famous stories that kickstarted this wisdom-of-crowds concept starts in 1906, when a famous statistician called Francis Galton wanted to prove that democracy is not good, because democracy is to some extent a wisdom, and sometimes a madness, of crowds. But you can test the crowd in areas where there is an objective answer, not on questions like who is the best president, because we cannot test whether a president who was not elected would have been good. So in order to disprove the goodness of democracy, Galton said, let me go to this country fair and watch this group of people playing a game: guess the weight of this ox. He said, okay, this is my opportunity to show that these people don't know what they are talking about. So they had to hand in cards saying the weight of the ox is such and such pounds, and he collected the cards, went home, did the mathematics, and computed the average, and the average was one pound away from the true weight of the ox. And it makes some sense if you think about it, because each person's guess has a piece that is information and a piece that is noise. You know, eyeballing something, you are more or less in the ballpark.
Some people will guess high, some people will guess low. So if you put together all that information, it seems that the noise cancels out and the information remains.

And actually theCUBE, journalism, is the wisdom of the crowds, because the more you disseminate information, the more information you collect. Instead of just getting our knowledge from one scribe or one monk, that's the whole reason why books are so valuable, right? The more information is disseminated, the more the crowd helps to disseminate what each one of us knows.

So you actually have a crowdsourced business. You're absolutely right, that's how we started: this philosophy.

Yeah, it's important to understand that it is not the case that crowdsourcing and the wisdom of crowds are always better than individuals. Prediction markets sometimes fail.

Sometimes fail.

So in order for the wisdom of crowds to work, the individual solutions have to be somehow predictive; they cannot be random. They have to be predictive, and they have to be diverse: each of them should look at the elephant from a different perspective. And if that's the case, then you can aggregate them, and the aggregate is very often going to be better than the best of the individual solutions. So when we do our challenges, it's often the case that we aggregate the solutions, and the aggregate is pretty good, very often the best. But one important characteristic is that it is robust. Suppose that we ask the 6,000 people at this conference something, and one person says, I know the answer. And now we ask something else, and a different person says, I know the answer. The person who makes the best guess is not always the same, but the aggregate is always going to be at the level of the best guess, whoever makes it. So it's a robust answer. And that's what we are trying to obtain when we run these crowdsourced strategies.
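Galton's anecdote, and the noise-cancellation argument it illustrates, can be checked numerically. A minimal simulation (a hypothetical 1,198-pound ox, with the crowd size and noise level invented for illustration) shows the average of many independent noisy guesses landing far closer to the truth than a typical individual guess:

```python
import random
import statistics

def simulate_crowd(true_weight=1198.0, n_guessers=800, noise_sd=100.0, seed=42):
    """Each guess is the true weight plus that person's own random noise."""
    rng = random.Random(seed)
    guesses = [true_weight + rng.gauss(0, noise_sd) for _ in range(n_guessers)]
    # Error of the crowd's average versus the error of a typical individual.
    crowd_error = abs(statistics.mean(guesses) - true_weight)
    typical_error = statistics.median(abs(g - true_weight) for g in guesses)
    return crowd_error, typical_error

crowd_error, typical_error = simulate_crowd()
print(f"crowd off by {crowd_error:.1f} lb; typical guesser off by {typical_error:.1f} lb")
```

The same simulation also shows the caveat Gustavo raises next: if the individual guesses share a common bias rather than independent noise, averaging cannot remove it.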
I've often wondered, I don't know if you guys are sports fans, but I'm a football fan, and I've often wondered if the crowd could sometimes call better plays than the coaches. But anyway.

Yeah, and actually the Z13 is perfect for this, for trying to aggregate as many responses as possible. You need a system that's capable of massive concurrency and massive data input, not little servers with small memory footprints.

Well, it's interesting too. We've talked a lot on theCUBE recently about innovation: a lot of talk about Moore's law beginning to attenuate, and about innovation coming from combining technologies. There's this whole idea of the second machine age and the things that computers do better than people, but the more interesting discussion is, what are the things that people do better than computers? We didn't think self-driving cars were possible five years ago. The interesting example that I recently read was that Deep Blue beat Garry Kasparov at chess, but a computer is not the greatest chess player in the world; it's a combination of computer and human. So when you think about the wisdom of the crowd, and you think about things like Watson increasingly taking over human tasks, we see it all around us. What is that last mile of human? Is that the crowd? What are your thoughts on that?

I wouldn't say taking over, but augmenting: helping us, giving us more options to choose among, instead of using our brain cycles to find the patterns. Computers will enable us to reach the crowd and get responses back to a question we have, and then we as humans can decide what those choices mean to us.

You know, there was also an interesting crowdsourcing project. Well, I don't know whether to call what we do crowdsourcing, really. What we use is an expert crowd. It's not a million people, right? It's more like 200 people, all of them with PhDs, so it's a very special crowd.
I wanted to tell you that there was another interesting example, in which Garry Kasparov played against a crowd. I don't know whether you are familiar with this one. It seems that Garry Kasparov played against the computer and against the crowd, and the crowd was organized in such a way that they would vote on moves and produce moves, and someone who was a master in chess would decide which one to use, but the choices were coming from different people. And apparently there was a move that confused Kasparov immensely. Eventually he won, but he said that it was the most difficult game he ever played.

So there we go: the killer combination is humans and computers. Donna, I'll give you the last word, we're out of time, but where do you see these challenges going? I mean, what's next for you guys? Do you have dream challenges? Do you have grand challenges?

We would like to help people be as healthy as possible. I think if we could help with cancer, that would be an immense achievement. So that would be another health challenge that I'd like to see us continue to make progress on.

Right, yes. These challenges serve many purposes. On the one hand, they benchmark algorithms: we don't know which algorithms are good or bad. We can have any app on our cell phones that makes predictions, but who validated it? Most of these apps are not FDA approved. We want to be sure that there is a rigorous assessment of the value of the algorithms. And we want to create communities that can talk with each other and move forward in the direction of the wisdom of crowds, but also take a collaborative look at the problem. And we want to democratize data in such a way that data is not in silos but is available for everybody to use. Finally, what we want to do is create algorithms that empower physicians to make decisions that help their patients.

And then Dr. Watson in my pocket.
Exactly, and you know, for journalism, instead of reporting on events that have happened, analytics that predict what will happen.

I love it, FutureCUBE. Gustavo and Donna, thanks very much for coming on theCUBE. It was really a pleasure meeting you and having you.

Thank you very much.

All right, keep it right there, buddy. We'll be back with our next guest. We're live from Edge 2015. This is theCUBE, we'll be right back.