Live from Madrid, Spain. It's theCUBE, covering HPE Discover Madrid 2017. Brought to you by Hewlett Packard Enterprise. Welcome back to Madrid, everybody. This is theCUBE, the leader in live tech coverage. My name is Dave Vellante, and I'm here with Peter Burris. This is day two of HPE Hewlett Packard Enterprise Discover in Madrid. This is their European version of a show that we also cover in Las Vegas, a roughly six-month cadence of innovation and organizational evolution of HPE that we've been tracking now for several years. Sharad Singhal is here. He leads software architecture for The Machine at Hewlett Packard Enterprise, and Matthias Becker is a postdoctoral researcher at the University of Bonn. Gentlemen, thanks so much for coming on theCUBE. Thank you. You know, we talk a lot on theCUBE about how technology helps people make money or save money, but now we're talking about something even more important, right? We're talking about lives and the human condition, and specifically hard problems like Alzheimer's. So, Sharad, why don't we start with you? Talk a little bit about what this initiative is all about, what the partnership is all about, and what you guys are doing. So we started on a project called The Machine about three, three and a half years ago. And frankly, at that time, the response we got from a lot of my colleagues in the IT industry was, you guys are crazy, right? We said, we are looking at an enormous amount of data coming at us. We are looking at real-time requirements on larger and larger processing coming up in front of us. And there is no way that the current architectures of the computing environments we create today are going to keep up with this huge flood of data. We have to rethink how we do computing.
And the real question for us, those of us in research at Hewlett Packard Labs, was: if we were to design a computer today, knowing what we know today as opposed to what we knew 50 years ago, how would we design it? And this computer should not be something which solves problems of the past. This should be a computer which deals with problems of the future. So we are looking for something which would take us through the next 50 years in terms of computing architectures and what we would do there. In the last three years, we have gone from ideas and paper studies, paper designs, and mock-ups made out of plastic to a real working system. Around the Las Vegas show, we announced that we had the entire system working with actual applications running on it: 160 terabytes of memory, all addressable from any processing core, across 40 computing nodes. And although we call it memory-driven computing, it's really thinking in terms of data-driven computing. The reason is that the data is now at the center of this computing architecture as opposed to the processor, and any processor can reach into any part of the data directly, as if it were addressing local memory. This provides us with a degree of flexibility and freedom in compute that we never had before. And as a software person, I work in software. As a software person, when we started looking at this architecture, our answer was, well, we didn't know we could do this. Now, given that I can do this, and I assume that I can do this, all of a sudden the programmers started thinking differently, writing code differently, and we suddenly had essentially a toy to play with, if you will, as programmers, where we said, you know, this algorithm I had written off decades ago because it didn't work, but now I have enough memory that if I were to think about this algorithm today, I would do it differently.
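The programming model Singhal describes, where many workers address one large pool of data in place rather than copying it through a storage stack, can be loosely illustrated with Python's standard `multiprocessing.shared_memory` module. This is a toy sketch of the idea on a conventional machine, not HPE's actual Machine API; all names here are invented for illustration.

```python
from multiprocessing import shared_memory
import numpy as np

# One large in-memory array, addressable by name from any worker.
N = 1_000_000
shm = shared_memory.SharedMemory(create=True, size=8 * N)
data = np.ndarray((N,), dtype=np.float64, buffer=shm.buf)
data[:] = np.arange(N, dtype=np.float64)

def worker_view(name: str, n: int) -> float:
    """A worker attaches to the shared pool and computes in place, no copy."""
    existing = shared_memory.SharedMemory(name=name)
    view = np.ndarray((n,), dtype=np.float64, buffer=existing.buf)
    total = float(view.sum())  # reads the shared data directly
    existing.close()
    return total

result = worker_view(shm.name, N)

shm.close()
shm.unlink()
```

The point of the sketch is only that the data never moves: every worker sees the same bytes, which is the property that made previously written-off algorithms viable again at 160 TB scale.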
And all of a sudden a new set of algorithms, a new set of programming possibilities opened up. We worked with a number of applications, ranging from just running Spark on this kind of environment to large-scale simulations, Monte Carlo simulations. And people talk about improvements in performance on the order of, oh, I can get you a 30% improvement. In the example applications we saw anywhere from five, 10, 15 times better, to financial analysis and risk-management problems which we can do 10,000 times faster. So many orders of magnitude here, because you don't have to wait for the horrible storage stack. That's correct, right? And these kinds of results gave us the hope that, as we look forward, these new computing architectures we are thinking through right now will take us through this data tsunami we are all facing, in terms of bringing all of the data back and essentially doing real-time work on it. Right. Matthias, maybe you could describe the work that you're doing at the University of Bonn, specifically as it relates to Alzheimer's, and how this technology gives you possible hope to solve some problems. So at the University of Bonn, we work very closely with the German Center for Neurodegenerative Diseases, the DZNE, and in their mission they are tackling diseases like Alzheimer's, Parkinson's, multiple sclerosis and so on. And, well, in particular, Alzheimer's is a really serious disease, and while for many diseases like cancer, for example, the mortality rates have improved, for Alzheimer's there's no improvement in sight. So there's a large population that is affected by it, and there is really not much that we currently can do. So the DZNE is focusing its research efforts, together with the German government, in this direction. And one thing about Alzheimer's is that by the time you show the first symptoms, the disease has already been present for at least a decade.
So if you really want to identify causes or biomarkers that point you in this direction, once you see the first symptoms, it's already too late. So at the DZNE, they have started a cohort study in the area around Bonn. They are now collecting data from 30,000 volunteers, and they are planning to follow them for 30 years. And in this process, we generate a lot of data. Of course, we do the usual surveys to learn a bit about them and their environments, but we also do much more detailed analyses. We take blood samples and analyze the complete genome, and we also acquire imaging data from the brain. We do MRI at extremely high resolution with some very advanced machines we have. And all this data accumulates, because we do not do this only once; we do it repeatedly for every one of the participants in the study, so that we can later analyze the time series. When, in 10 years, someone develops Alzheimer's, we can go back through the data and see whether there's something interesting in there. Maybe there's the one biomarker we are looking for, so that we can predict the disease better in advance. And with this pile of data that we are collecting, we basically needed something new to analyze it and to deal with it. When we heard about The Machine, we immediately thought, this is the system we will need. Let me see if I can put this in a little bit of context. So Dave lives in Massachusetts. I used to live near Framingham, Massachusetts. I was actually born in Framingham. You were born in Framingham. And one of the more famous studies is the Framingham Heart Study, which tracked people over many years and discovered things about heart disease and the relationship between smoking and cancer and other really interesting problems. But that was a paper-based study built on interviews.
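The longitudinal design Becker describes, repeated measurements per participant that are revisited years later for early signals, can be sketched in miniature. Everything below is hypothetical: the data, the `rising` heuristic, and the threshold are invented stand-ins, while the real study works on genomes and high-resolution imaging rather than a single scalar per visit.

```python
from statistics import mean

# Toy longitudinal records: one candidate-biomarker value per study visit.
cohort = {
    "participant_001": [1.0, 1.1, 1.3, 1.8, 2.6],  # steadily rising
    "participant_002": [1.0, 1.0, 0.9, 1.1, 1.0],  # stable over visits
}

def rising(series, factor=1.5):
    """Flag a participant whose late-visit average greatly exceeds the early one."""
    half = len(series) // 2
    return mean(series[half:]) > factor * mean(series[:half])

flagged = [p for p, s in cohort.items() if rising(s)]
```

The scan itself is trivial; the hard part the interview describes is running this kind of look-back over petabytes of genomic and imaging time series for 30,000 people, which is where the in-memory architecture matters.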
So for each of those people, they might have collected, you know, maybe a megabyte, maybe a megabyte and a half of data. You just described a couple of gigabytes of data per person, 30,000 people, multiple years. So we're talking about being able to find patterns in data about individuals that would number in the petabytes over a period of time. Very rich detail is possible, but if you don't have something that can help you analyze it, you've just collected a bunch of data that's sitting there. So is that basically what you're trying to do with The Machine: the ability to capture all this data and then do something with it, so you can generate those important inferences? Exactly. With all these large amounts of data, we not only compare the data sets for a single person; once we find something interesting, we also have to compare the whole population that we've captured against each other. So there's really a lot we have to parse and compare. This brings together the idea that it's not just the volume of data; I also have to run analytics across all of that data together, right? So every time a scientist, one of the people doing biology studies or informatics studies, asks a question, they say, I have a hypothesis that this might be a reason for this particular evolution of the disease, or occurrence of the disease. They then want to go through all of that data and analyze it as they are asking the question. Now, if answering that question takes three days of compute, I have lost my train of thought. But if I can get that answer in real time, then I get into this flow where I'm asking a question, seeing the answer, making a different hypothesis, seeing a different answer. And this is what my colleagues here were looking for.
But if I think about, again, going back to the Framingham Heart Study, I might run a query on a couple of related questions against a small amount of data, and the technology can turn that around quickly. But when we start looking for patterns across brain scans with time series, we're not talking about a small problem. We're talking about an enormous amount of data that can be looked at in a lot of different ways. I have one other question for you related to this, because I presume the quid pro quo for getting those 30,000 people into the study is that you'll be able to help them, to provide prescriptive advice about how to improve their health as you discover more about what's going on. Have I got that right? So we're trying to do that, but there are limits to this, of course. Of course. For us, it's basically collecting the data, and people are really willing to donate everything they can from their health data to enable these large studies. To help future generations. So that's the quid pro quo. Okay. But still, the knowledge is enough for them. Yeah, their incentive is that they're going to help people who have this disease down the road. I mean, even if it is not me, if it helps society in general, people are willing to do a lot. Yeah, of course. Oh, sure. Now, The Machine is not a shipping product yet, right? So how do you get access to it, or is this futures? So when we started talking to one another about this, we did not yet have the prototype. Okay. But remember that when we started down this journey for The Machine three years ago, we knew back then that we would have hardware somewhere in the future. But as part of my responsibility, I had to make sure the software would be ready for this hardware. It does me no good to build hardware when there is no software to run on it.
So we have actually been working on the software stack, and on how to think about applications on that software stack, using emulation and simulation environments. We have simulators which are essentially instruction-level simulators for what The Machine, or our prototype, would have done, and we were running code on top of those simulators. We also had performance simulators: if we write the application this way, this is how much we think we would gain in terms of performance. And all of that code we were writing actually ran on our large-memory machines, Superdome X to be precise. So by the time we started talking to them, we had these emulation environments available, and we had experience using these emulation environments on our Superdome X platform. So when they came to us and started working with us, we took the software they brought and started working within those emulation environments to see how fast we could make those problems, even within those environments. So that's how we started down this track. And most of the results we have shown in the study, the results we are quoting in this forum, are measured on the Superdome X platform. Even in that emulated environment which is emulating The Machine, on Superdome X, for example, I can only hold 24 terabytes of data in memory. I'm saying "only" 24 terabytes because I'm looking at much larger systems. But an enormously large number of workloads fit very comfortably inside 24 terabytes. And for those particular workloads, the programming techniques we are developing work at that scale, right? They won't scale beyond the 24 terabytes, but they'll certainly work at that scale. So between us, we then started looking for problems, and I'll let Matthias comment on the problems that they brought to us. And then we can talk about how we actually solved those problems.
So we work a lot with genomics data, and usually what we do is build a pipeline, so we connect multiple tools. And we thought, okay, this architecture sounds really interesting to us, but if we want to get started with it, we should pose them a challenge, so they can convince us. We went through the literature and took a tool that was advertised as a near-optimal solution: prior work was taking up to six days of processing, and this tool was able to cut that to 22 minutes. And we thought, okay, this is a perfect challenge for our collaboration. So we went ahead and put this tool on the Superdome X, and it was already running in just five minutes instead of 22. And then we started modifying the code, and in the end we were able to shrink the time down to just 30 seconds. So that's two orders of magnitude faster. So we took something they were able to run in 22 minutes, which had already been optimized by people in the field who wanted this answer fast. And then when we moved it to our Superdome X platform, the platform is extremely capable. Hardware-wise, it compares really well to other platforms which are out there. That time came down to five minutes, but that was just the beginning. And then, as we modified the software based on the emulation results we were seeing underneath, we brought that time down to 13 seconds, which is a hundred times faster. We started this work with them in December of last year. It takes time to set up all of this environment, so the serious coding started in around March. By June, we had a nine-X improvement, which is already close to a factor of 10. And from June up to now, we have gotten another factor of 10 on that application. So I'm now at a hundred X faster than what the application was able to do before. Two orders of magnitude in a year. In a year. Okay, we are out of time, but where do you see this going? What is the ultimate outcome that you're hoping for?
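A large part of speedups like the one just described typically comes from restructuring a tool pipeline so that stages hand data to each other in memory instead of each tool writing intermediate files for the next to re-read. A toy illustration of that chaining, with invented stand-in stages rather than the actual genomics tool from the study:

```python
def step_filter(reads):
    """Toy stage 1: keep reads above a minimum length (stand-in for QC filtering)."""
    return [r for r in reads if len(r) >= 5]

def step_count(reads):
    """Toy stage 2: count 3-mers (stand-in for a downstream analysis tool)."""
    counts = {}
    for r in reads:
        for i in range(len(r) - 2):
            kmer = r[i:i + 3]
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts

# In-memory pipeline: stage 1's output is passed directly to stage 2,
# rather than serialized to disk and parsed back in between tools.
reads = ["ACGTAC", "ACG", "TTTTTT", "ACGTACGT"]
counts = step_count(step_filter(reads))
```

At the scale of a whole-genome pipeline, eliminating those intermediate write/parse round trips, plus keeping the full working set resident in memory, is the kind of restructuring that can turn minutes into seconds.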
For us, we're really aiming to analyze our data in real time. Oftentimes we have biological questions to address, we analyze our data set, and then in a discussion a new question comes up, and we have to say, sorry, we have to process the data, come back in a week. And our idea is to be able to generate these answers instantaneously from our data. And those answers will lead to what? Just better care for individuals with Alzheimer's, or potentially, as you said, making Alzheimer's a memory? So the idea is to identify Alzheimer's long before the first symptoms show, because then you can start an effective treatment and you can have the biggest impact. Once the first symptoms are present, it's not getting any better. Well, thank you for your great work, gentlemen, and best of luck on behalf of society. Really appreciate you coming on theCUBE and sharing your story. You're welcome. All right, keep it right there, everybody. Peter and I will be back with our next guest right after this short break. This is theCUBE. You're watching live from Madrid, HPE Discover 2017. We'll be right back.