Live from the Sands Convention Center, Las Vegas, Nevada, extracting the signal from the noise: it's theCUBE, covering HP Discover 2015. Brought to you by HP. And now your host, Dave Vellante.

Welcome back to HP Discover 2015, everybody. This is theCUBE, our live mobile studio. We go out to the events and extract the signal from the noise. Check out hpdiscover.social, our digital experience, with all the social data coming into the stream. Bill Mannel is here. He's the vice president and general manager of the Apollo server group at HP, and he's joined by Peter Løngreen, COO at the Technical University of Denmark. Gentlemen, welcome to theCUBE. Good to see you.

Thank you. Thanks for having us.

So, Apollo. We were here a year ago and did the reveal. I think we had it live in theCUBE, guys. Antonio was there, and the astronaut was walking around. That was really cool. So now we're a year in. What's the update?

Well, we've had a great first year. We've had a lot of shipments of both the Apollo 6000 and the Apollo 8000, and a great set of customers. The 6000 has gone to a combination of university customers, such as the Technical University of Denmark and the Minnesota Institute, as well as a lot of commercial customers. On the 8000 side, we've had a number of very interesting installations, including one recently in Poland, which is actually our largest.

So, Peter, I love talking to people at universities, because you're open. You're not hiding everything from your competitors, maybe a few things, but generally you speak freely about what's going on. Tell us about the university and your role there.
Yeah, what we decided last year, after some intensive years of discussion at both the political and the academic level, was to split the Danish national HPC budget into two parts: one dedicated to life sciences and the other to more general HPC application areas. The reason is that the life science contribution to the Danish economy is substantial. A pharmaceutical company like Novo Nordisk, for example, contributes more than 6% of Danish GDP. So it's an important area, and the relations between industry, academia, and healthcare in Denmark are quite unique. That's why we built this 16,000-core system based, as Bill mentioned, on the Apollo 6000 platform.

Okay, so tell us more about the system and what you're doing with it.

We're covering a wide range of life science areas. Currently we have more than 800 different algorithms on the system, and they change constantly; the development turnaround time for algorithms in this booming field is down to six months. So we've built a system flexible enough to accommodate that, which is very much in contrast to more traditional high-performance computing systems. The system is also built to be easy for different kinds of profiles to use, because life science is a diverse domain. It involves biologists and computer scientists, but also doctors, and many doctors don't have a computational background. So the system needs to be flexible and easy to work with, and we've managed to establish that.

So tell me more. How would you approach this problem with a traditional HPC system?
Well, I would say the challenge is that we could not actually use a more traditional HPC system, because it would be optimized for perhaps five or six different codes, there would be very strict policies regulating access, and it would not be built in a way that lets you manage sensitive data. That's very important, because in many cases we handle sensitive data like patient records and medical journals. So we cannot use those kinds of systems. We need our own, and we need the scale we've now reached.

So Bill, is this a common story you're hearing in the customer base? Maybe you could talk a little more about how people are using Apollo.

Sure. Customers use Apollo in a variety of ways. Sometimes it's running a few applications very well; other times they have a very large, diverse user base, hundreds of different users with all different types of applications. We see both across the customer base. The former is typically more common among our commercial clients. Oil and gas customers, for instance, use the Apollo 6000 a lot; they have a seismic application that they run all the time to try to find oil.

So Peter, what are you ultimately after? What's the outcome you're trying to achieve, and how close are you to getting it?

What we're really trying to achieve is to enable the different domains of industry, healthcare, and life sciences to access a number of the same datasets in a secure manner. We would like to make data available, whether it's for conducting research, for actually curing people, or for generating new drugs and new treatments. That's what we're ultimately after, and HPC can bring us part of the way. On top of this system we are building a private cloud that enables these different stakeholders to access data securely without violating statutory acts.
How does it work with the stakeholders? What's the commercial relationship between the university and the stakeholders? Are they funding the initiative and then getting value back? How does that all work?

It's a combination, actually. They are partly funding it, but in some cases they're also paying hourly for the use of CPU cores, memory, and storage. So we have different business models that try to accommodate the specific needs of each individual domain.

So what's this private cloud you're talking about? What does it look like? Can you unpack that for us? What's under the covers?

Together with HP, we have developed quite a unique concept that combines OpenStack technology with more proprietary systems. It allows people to access sensitive data in the cloud without others having access to it. That's pretty unique: you can actually provision bare metal without violating the security requirements you need to uphold.

So Bill, I wonder if we could talk about the market you guys are going after. How do you look at it? How do you segment it? How big is it? Maybe break that down for us.

Sure. We're going after the high-performance computing market with these Apollo products. It's roughly a $10 billion market, so about 10% of the overall server market. We tend to break it into segments. At the low end is what we call the workgroup segment, which tends to be small companies or small departments within existing companies. Then you move into the departmental range, which is bigger than workgroup, so larger companies. Then comes the divisional range, which might be one division of a major company, say an aerospace company. And finally there's what we call the supercomputing segment.
Those tend to be systems of half a million dollars or more that run large jobs. Maybe they have lots of users, or they're capability systems that run one application across the entire machine.

A lot of people talk about HPC as the harbinger of big data. I wonder if you could talk about data-driven apps. It's one of the four transformation areas Meg Whitman talked about on day one. How has that been a tailwind for you? What are people doing?

Oh, sure. In reality, big data has only recently become in vogue in the enterprise, but in the HPC world we've been using big data for decades. We had to worry about...

Big data, big deal, right?

Yeah, exactly. We had to worry about big data relative to climate models, weather models, and life science models, so we've had to manage it and secure it and do all those things. The good thing is that we've done pioneering work in HPC and big data, and now we can bring that into the commercial space, offer it to more enterprise customers, and give them an easier journey, if you will.

So has the big data theme changed what you do? You were joking with Bill that big data has always been around in your world, the HPC world. But things like Hadoop, and now Spark and YARN, and all these other innovations: how has the HPC world, your world, leveraged them?

I'm glad you asked that question, because what we're really challenged by is the growth of data. Look at some of the new technologies for generating data, like next-generation sequencing, where you sequence whole genomes. When Craig Venter sequenced the first human being, the price was, I think, around two billion US dollars; it has now come down to below $1,000. And that means the doubling time of our data is now down to below six months.
And that poses a big challenge, because these data need to be combined with a lot of other types of data; otherwise you cannot translate what they really mean. You need to combine them with patient journals, for example, medical journals, literature. There you really have a big data problem. That's also one of the main reasons we established this cloud on top of the supercomputing infrastructure: to be able to access all of this data, which, combined the right way, will reveal correlations we've never thought about and enable precision medicine, where people can, based on their own genome, get more relevant recommendations on how to live, what to do and what not to do, and also get concrete recommendations about which treatments to take, and so on. So in this area we're looking at a revolution in the way we do medicine.

How does a consumer actually access that information? What's the channel?

I would say they don't, as of yet; we are still doing research. But we predict that within five to ten years, a huge part of the population will have their genome sequenced, and that means people will have to take much more personal responsibility for their health than we know today. That's the big change that's coming, and in fact we can predict fairly accurately that it will happen.

Well, you're seeing it today, and you're right, they're narrow cases. You see it with breast cancer in some cases and other diseases; diabetes is certainly another one. But you're talking about a massive shift in the way a consumer is able to understand his or her risks and make personal decisions about how to deal with them.

Exactly, because we can predict the future here. We know the price of generating data is going down, and we also know the price of computing and storage is going down.
So it's only a question of time before people ask for these kinds of services, and we predict it will happen within the next five to ten years. There's no way the political level can avoid taking up that discussion.

Well, it's almost like the politicians are protecting the past from the future, but it's a tsunami that's coming whether they bless it or not; that information is going to be available. And if I hear you right, you're saying the data is doubling now every six months?

That's just for one data type; you have other data types as well. For example, in Denmark we did one study on 6.2 million patient records. We've also done a study together with the US that involved 150 million patient records. Ultimately, you'd like access to all the patient journals that are available, because the more data you get, the more precise you can become.

Yeah. Do you see the day where sampling is passé, where sampling is dead? Are we there yet in some cases?

I like that question, because I think at some point we will probably get there, but that's a controversial statement.

Well, it's happening in certain industries. In financial services with risk, remember, it used to take months and months and then you'd get a call: hey, you'd better check your statement. Now it's within seconds.

I would say there's one thing that goes against it, and that's the fact that our genomes constantly change. They mutate. So we cannot do just one sequencing, because it could change within a week or two. That's a massive data problem. And if you combine it with the fact that your own genome is just a small part of the organism that constitutes you, then sequencing your whole organism, with bacteria and so on, would amount to 160 petabytes. That's where we ultimately would like to get.
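To get a feel for the growth the speakers describe, here is a small back-of-the-envelope sketch. The six-month doubling time and the 160-petabyte whole-organism figure come from the conversation; the 1.5 PB starting point is an assumption taken from the storage numbers mentioned later in the interview, used purely for illustration.

```python
import math

def months_to_reach(current_pb, target_pb, doubling_months=6):
    """Months until a dataset grows from current_pb to target_pb,
    assuming a fixed doubling time (the ~6 months cited above)."""
    doublings = math.log2(target_pb / current_pb)
    return doublings * doubling_months

# Illustrative only: growing from ~1.5 PB toward the ~160 PB
# whole-organism figure takes on the order of 40 months.
print(round(months_to_reach(1.5, 160), 1))
```

Even with these rough inputs, the point of the exchange holds up: exponential doubling turns petabytes into hundreds of petabytes within a few years, far faster than hardware refresh cycles.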
So there's no end in sight to what you can potentially do with lower-cost compute, lower-cost storage, and faster processing.

Correct.

Wow, that's exciting. So the curve is actually reshaping. The technology business used to follow Moore's law; you're talking about the curve actually steepening. The innovation curve.

Yes. And that's exactly why we need to deploy clouds, because there's no way we can generate all these data ourselves. We need a division of labor where we can burst into different clouds to get access to the data. Moving these data across networks doesn't make sense; they're simply too big.

So Bill, that's interesting. The customer here is moving faster than the traditional Moore's law curve. People talk about Moore's law coming to an end; we've been predicting its end for a long time and haven't been right, but things like multicore are keeping it alive. How do you keep up with demand where that curve is bending?

Well, the interesting thing is that more and more technologies are becoming available. We've got accelerators coming on board: a variety of many-core processors that are much more dense, if you will, with many more cores than the more common Xeon processor. FPGAs are being used. There's a whole range of technologies for interfacing with data and creating data lakes so you can share it. The demand out there is really driving a lot of technological change and helps foster that level of innovation. In a lot of cases we're trying to catch up, so there's a lot of investment in that area as well.

Yeah, and you're seeing flash storage as another piece of the puzzle. Are you using flash storage?

We're actually planning to deploy flash.
For some databases that are updated regularly, we would like to get up to 200,000 IOPS if possible. We need that.

But you're not using flash today? This is all on spinning disk?

We are using some flash, but not at the magnitude we will within a year.

And I guess in certain use cases that's okay, because you're doing a lot of parallelism.

Correct.

But still spinning. We have a lot of individual jobs; sometimes we have up to 20,000 jobs in our queuing system. So the workload is really quite diverse. It depends on the data type and the problem type you're dealing with.

What do you use for a resource scheduler? Is it homegrown, or something off the shelf, a job scheduler?

I can't remember which one we're currently using, sorry.

Are you using Hadoop?

No, we're not, because Hadoop would give too big a performance degradation. Perhaps in the future.

Spark and YARN?

Yeah, but with the kinds of data we're working with, Bill, it's trickier to use systems like Hadoop.

Okay. Well, that's evolving. But in your customer base, are they pretty much staying away from Hadoop, or are you seeing a lot of Hadoop adoption?

It depends on the particular industry. We see a lot of Hadoop in industries where there's a lot of data and a lot of relatively parallel processing. As an example, we have a customer in the auto industry that's using Hadoop to understand the data now created by all the cars out there. Most cars today are internet-of-things devices in themselves; they constantly upload data back to the home office, and the customer uses Hadoop to understand the quality and warranty data in that stream.
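The 200,000-IOPS target explains the move to flash. A quick sketch shows why: the per-device figures below are rough 2015-era assumptions on our part (a 15K RPM disk delivers on the order of 200 random IOPS, an enterprise SSD on the order of 50,000), not numbers from the interview.

```python
def devices_needed(target_iops, iops_per_device):
    """Devices needed to reach a target random-IOPS figure,
    ignoring RAID and controller overhead (ceiling division)."""
    return -(-target_iops // iops_per_device)

# Hitting 200,000 IOPS with spinning disks vs. SSDs:
print(devices_needed(200_000, 200))     # on the order of 1,000 disks
print(devices_needed(200_000, 50_000))  # a handful of SSDs
```

Under these assumptions, a random-I/O database workload that would need roughly a thousand spindles fits on a few SSDs, which is why flash makes sense there even while the parallel batch jobs stay on disk.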
So I have to ask you, Bill: you go to Denmark, you meet with Peter and his team, they tell you how they're using Apollo, they're happy, but they have a lot more they need to get done. What kinds of questions do you ask him about needs and futures in that conversation?

A lot of times I'll ask Peter what his workloads look like, what his user environment looks like, how heavily the system is used, is it 90% or 95% utilized, are people waiting in the queue, how much additional processing power do they need? Are there additional needs for storage? What are their plans for securing their data, and for long-term storage, for example? Those are some of the questions.

So what about some of those questions? Are you out of capacity? Do you need more, bigger, faster, better?

I can give you an example with storage. In November last year we deployed a three-petabyte file system. From the 1st of January 2015 up until now, we have already generated one and a half petabytes. So you see, we're getting into situations where we need to expand our systems, and that's going to continue for the foreseeable future.

And how about security? What's that conversation like?

Security we handle through our cloud deployment, combined with some things we develop ourselves. We have 50 developers working on virtualizing bioinformatics workflows and applications so that you avoid moving data around, and that's important, because then you don't move sensitive data around. People don't like that. It's kept safe, it's compliant, and it's done in a way where people can rely on us never allowing others to gain access to their data.

Now, Peter, same question for you. Bill comes to town, you describe what's going on. What kinds of questions does your team have for a senior executive from HP?
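Peter's storage example can be turned into a simple capacity projection. The 3 PB capacity and 1.5 PB consumed come from the conversation; the elapsed time of roughly 5.5 months (January to mid-June 2015) is our assumption, and the linear fill rate is a deliberate simplification given the doubling behavior discussed earlier.

```python
def months_until_full(capacity_pb, used_pb, elapsed_months):
    """Months of headroom left, extrapolating the fill rate
    observed so far as a constant (linear) rate."""
    rate = used_pb / elapsed_months          # PB per month
    return (capacity_pb - used_pb) / rate

# 3 PB file system, 1.5 PB written in ~5.5 months:
print(round(months_until_full(3.0, 1.5, 5.5), 1))  # → 5.5
```

So even under the optimistic linear assumption the system fills within the year, which matches Peter's remark that expansion "is going to continue for the foreseeable future"; a six-month doubling time would exhaust it sooner.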
I think we've already touched upon it, but some of our questions would definitely be: how are you going to help us address this big issue of data doubling times down to six months, and how do we manage to feed our supercomputing systems fast enough in the future? Because when data becomes that big, it really becomes a problem to do so. Those are some of the questions my team asks Bill and his team.

You must love those gnarly problems.

That's why I always bring engineers with me.

All right, gentlemen, we're out of time, but thanks very much.

All right, thank you.

It was really great to have you.

Thanks for having us.

Keep it right there, everybody. We'll be right back with theCUBE at HP Discover 2015.