Hi everybody, we're back. This is Dave Vellante of Wikibon.org and this is SiliconANGLE.com's continuous coverage of HP Discover. We're in Frankfurt, we're live. This is theCUBE, where we go out, we extract the signal from the noise, we try to find the guests that are really experts in their particular domains. We try to cover events like this like a blanket. We have with us Alistair Veitch, who's a member of HP Labs; he's the director of storage and information management platforms within HP Labs. And he's been very intimately involved in a couple of innovations, notably StoreOnce and Express Query, that have come out of HP Labs. Now, HP Labs has been criticized by a lot of outsiders, and frankly some insiders, as not bringing enough innovation to market that can be commercialized. These are two examples in very recent history that we've seen, and Alistair, you're involved in both of them. First of all, welcome to theCUBE.

Thanks Dave, great to be here.

Good to have you. So we're here at Discover, 9,000 people. Big event. We did HP Discover in Las Vegas earlier this year in June. I guess it's similarly sized, a little bigger actually, but comparable. So how are you spending your time here? You're just meeting with customers? Doing stuff like this?

Yeah, stuff like this, meeting with customers, manning a couple of the booths. We've got some demos here showing everybody who comes along, our partners, customers, some of the internal sales, how these new technologies work and what's coming up in the future.

So in the last, say, 10 years in this industry, we've seen a real shift in the way that large organizations behave, and one of those changes is they've gone out and started making many, many, many acquisitions. HP certainly participated in that trend, as have Oracle, IBM, EMC, virtually everybody. It's necessary, and a big part of the reason for that is R&D is risky. There's a lot of hit and hope in R&D.
But at the same time, you've got to do it, because some of the world's greatest innovations come out of R&D, and you've got to complement that with inorganic acquisitions. Now, as I said up front, with HP Labs there was sort of a gap in the commercialization of some of those inventions. Two that are recent, StoreOnce and Express Query, came out of your group. So let's talk about those. Let's start with StoreOnce. Take us back to the beginning. When did you guys have the idea? I mean, data de-duplication is not anything new. You guys came up with it, I guess, what, a couple years ago?

Three, three years ago, with the first set of announcements.

Yeah, okay, so take us back to when you guys first started working on it.

So we probably started working on this six years ago, actually. We were talking with our partners in the storage business, and they knew they had to move to online disk-based devices, basically. They came to us, and in talking to them, we had some people who had been looking at different forms of de-duplication: storing large data and removing the duplicate parts. And we realized we had a really good match here. So we focused down on that particular research and figured out how we could apply that to virtual tape library systems, basically, which is what the StoreOnce system is. We figured out what the optimal algorithms were, what the real problems were, and how we could make that the most efficient system that we could. We came up with what we think are some really, really great algorithms for doing this. And the big advantage our stuff has is that it lets you not only de-duplicate the data, but when you come to restore it, for instance, it's many times, you know, five times faster than our competitors on the restoration. Now, a lot of people think, okay, it's just de-duplication. Lots of people can do that. We do that, but we also optimize for that restore process. And for backup, that's really important.
A lot of people don't think about that.

Now, is that capability algorithmic, or is it architectural, or a combination of those?

It's a combination of those. It's in the algorithms we use to de-duplicate, and then in how we lay out the data and how we store it to enable us to retrieve it really quickly.

Okay, and I presume there are patents associated with much of this?

Yes, there are.

Can you talk about that a little bit? I mean, how many have you filed, or are they public at this point? Dozens, a handful?

There's several dozen. Several dozen patents and claims around all those core technologies, yes.

Okay, now, one of the other things that HP touts with regard to its StoreOnce technology is the ability to take that technology and put it in a lot of different places. So I infer from that, you could use it for backup on hardware devices, like the StoreOnce B60, whatever hundred it is. But also, you could put it into software in theory. You could put it onto primary storage devices. Is that true?

So it's more optimized for the backup sort of streaming applications.

And that's because of performance, or?

The nature of the algorithms. You want different sorts of algorithms for primary storage a lot of the time. But one of the interesting things, you know, you say software: we've taken that core de-duplication technology and we've integrated it into Data Protector. So that allows us to get that really high efficiency of the de-duplication. We can do it client side, backup side. We can move that data around the system. We don't have to essentially re-hydrate the data to move it somewhere else. We keep it in its de-duplicated form and can do lots of things with it.

So I was poking at primary storage before, and you confirmed that you've got to have different algorithms so as not to get in the way of performance, I would presume.
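To make the de-duplication idea discussed above concrete, here is a minimal Python sketch. It is not the StoreOnce algorithm, which the conversation does not describe; it assumes the simplest possible scheme, fixed-size chunks identified by a SHA-256 hash. The names `DedupStore`, `CHUNK_SIZE`, `backup`, `restore`, and `dedup_ratio` are all hypothetical. It shows why repeated backups of the same data de-duplicate so well, and that a restore is a walk over the stored chunk list.

```python
import hashlib

# Hypothetical fixed chunk size, chosen only for illustration.
CHUNK_SIZE = 4096


class DedupStore:
    """Toy content-addressed chunk store (illustrative, not a product design)."""

    def __init__(self):
        self.chunks = {}        # SHA-256 digest -> chunk bytes, each stored once
        self.backups = {}       # backup name -> ordered list of chunk digests
        self.logical_bytes = 0  # total bytes the clients asked us to back up

    def backup(self, name, data):
        """Split data into chunks and store only chunks not seen before."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.backups[name] = recipe
        self.logical_bytes += len(data)

    def restore(self, name):
        """Rebuild the original bytes by following the backup's chunk list."""
        return b"".join(self.chunks[d] for d in self.backups[name])

    def dedup_ratio(self):
        """Logical bytes backed up divided by bytes physically stored."""
        stored = sum(len(c) for c in self.chunks.values())
        return self.logical_bytes / stored if stored else 0.0


if __name__ == "__main__":
    store = DedupStore()
    # Four distinct chunks, backed up unchanged on ten consecutive nights.
    data = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
    for night in range(10):
        store.backup(f"night-{night}", data)
    print(store.dedup_ratio())
    print(store.restore("night-3") == data)
```

Because a backup is just an ordered list of chunk references, it can also be copied elsewhere as references plus any missing chunks, which is one way to picture moving data without re-hydrating it. How a real system lays chunks out on disk for fast restore is exactly the part this sketch leaves out.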
But at the same time, you see, you guys have had a lot of success at 3PAR with things like thin provisioning, and others have had as well, with things like compression and de-duplication. Can you leverage what you're doing now, and how do you see that going into primary storage? Because I'd like all my storage to be more efficient.

Yes, yes. Certainly you can, and we are. There's lots of interesting things to do there. You know, the thin provisioning that we already have actually buys you a lot. Compression is built into some of our systems now as well, and the de-duplication opportunities are there. It's just that they don't tend to be as extreme an advantage in primary storage, because in backup, you know, you're saving the same thing over and over and over again. There's a huge amount of opportunity in backup. The opportunity's there in primary storage, it's just not as much.

Yeah, you might get a 10 to one or a 15 to one or a 40 or 50 to one, depending on the mix of data, right?

Yes.

In backup, whereas in primary, you might be lucky if you can get two to one.

Yeah.

So, okay. So I think I'm inferring that the value proposition is higher for backup. That's why you started there. Let's talk about Express Query. Can you describe for our audience what that is, and then we can talk about some of the use cases?

So Express Query is something we've just announced. We did all the announcements in the last day or so. Express Query is a capability in our StoreAll product line, which we've just announced as well. And basically we're looking at this issue of metadata management in our storage systems. So metadata is information about the data.
And we've been working for a couple of years on the premise of knowing what you have. Especially in these big data systems, you've got millions, hundreds of millions, billions of files that you have to manage, and you have to figure out what they are and who touched them and when they touched them and everything about them, basically. And all of that information is metadata. So we wanted to store all of this and then be able to query that information. We started out looking at, okay, we'll just throw this data into a database, because that's what they're supposed to be good for, and realized very quickly that for storage purposes, a conventional database just didn't cut it. We found all sorts of performance problems as you tried to scale up to these billions of pieces of information about files. And you're also updating it very frequently. You're putting new files into the system or writing to files, changing various attributes, pieces of this metadata. And if you're trying to do queries at the same time, with a large amount of information and these rapid rates of updates, we figured out that the conventional databases just weren't cutting it. So we had to develop our own sort of specialized database technology to store this sort of information. And then we spent some time integrating it into the product group's new StoreAll product. And now we can essentially keep all of this information, store it very, very efficiently, and query it extremely fast. So we did a benchmark on 500 million, half a billion, files in a single file system. For instance, you may have to scan and find all of the files that match a certain attribute: maybe in Autonomy's use case, or a backup use case, you have to find all the files that have been changed in the last hour or day or week, since the last time you looked.
Or maybe you're a system administrator and you want to find all the files that have been written by a certain user in the past day or week, or just all of the files that are greater than a certain size because you're running out of space. On these very large file systems, it can take days, literally days, to get a response back to one of these queries. And using Express Query, we get that 100,000 times faster. It's a second.

100,000 times faster. I mean, so something that takes half a day to run a job on, you can do in less than a second.

Yeah, absolutely. Yep. Yeah, right. Significantly less than a second.

That's amazing. So the enabler there was a combination of things, but part of it was that you developed your own database system. What's the nature of that system?

So, it's optimized for, as I said, the rapid incoming information. In a conventional database system, what you do is process that right then and there, and you're trying to update indices and things like that. What we do is actually put a lot of that processing into the background, where it doesn't affect the foreground workload or the other operations that are going on in your file system. We update everything in the background, make sure it's consistent, and then we make it available to the user for query.

So, I was really intrigued by this announcement, the StoreAll announcement generally, but specifically the Express Query piece of it, because we've been pounding on this metadata theme. If we have time, I want to talk to you about that. But what about applying this technology to Hadoop, where you've got these big batch jobs, and bringing real time to that environment is very difficult? Is there a play there, or is it oil and water?

It's a tenuous connection, to be honest with you.
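The background-processing idea described above, where updates are accepted cheaply and index maintenance is deferred, can be pictured with a small Python sketch. This is an assumption-laden toy and not the Express Query design: the class `DeferredIndex` and its methods `record_update`, `merge`, and `changed_since` are hypothetical names, and the merge step is called explicitly here where a real system would presumably run it on a background thread or process.

```python
import bisect


class DeferredIndex:
    """Toy metadata index that defers index maintenance off the write path."""

    def __init__(self):
        self._pending = []   # raw updates appended by the write path (cheap)
        self._records = {}   # path -> (mtime, size) as of the last merge
        self._by_mtime = []  # sorted (mtime, path) pairs, rebuilt at merge time

    def record_update(self, path, mtime, size):
        """Foreground path: just append; no index work happens here."""
        self._pending.append((path, mtime, size))

    def merge(self):
        """Background step: fold pending updates in, then publish the index."""
        pending, self._pending = self._pending, []
        for path, mtime, size in pending:
            self._records[path] = (mtime, size)
        self._by_mtime = sorted((m, p) for p, (m, _) in self._records.items())

    def changed_since(self, t):
        """Paths modified at or after time t, as of the last merge."""
        start = bisect.bisect_left(self._by_mtime, (t, ""))
        return [p for _, p in self._by_mtime[start:]]


if __name__ == "__main__":
    index = DeferredIndex()
    index.record_update("/data/a.log", 100, 10)
    index.record_update("/data/b.log", 200, 20)
    print(index.changed_since(150))  # nothing visible until a merge runs
    index.merge()
    print(index.changed_since(150))
```

The trade-off in this sketch is that queries see a consistent but slightly stale view, in exchange for writes that never wait on index updates. A "changed since the last look" query then becomes a lookup in a sorted index instead of a scan over every file.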
You can use StoreAll quite effectively with Hadoop-type systems and data processing, because we have the multiple servers, it's a scale-out system, and the data is partitioned across them. But the Express Query set of features is not something that you generally use in a Hadoop-type system. They're sort of taking these large files and running the map and reduce across those files.

Yeah, but once I find my nuggets, so I've got all this data, I want to search on that data. I want to query that data, so why wouldn't it be?

Yes, that you certainly can do. And one of the great features that we've put into Express Query is the ability for users or administrators or new applications to actually define their own types of metadata. So you can put arbitrary pieces of information in and attach them to your files or directories. If you've got an image library, for instance, you might want to tag all of the images with, you know, this is an image of a fish or a plant or whatever it was, or my holiday. And then later on you can go back and search these. Another example I like to use is, you know, imagine a medical imaging application. You've got images coming off of an MRI machine, for instance. You might want to tag those images with the identity of the machine, the technician running it, the doctor, the patient. All of that sort of information you can now store into the file system directly and then search it extremely efficiently and retrieve it.

Yeah, so now the other piece I wanted to talk about, and I think we do have time, is this whole metadata piece. We just put out a piece on wikibon.org around defining software-led infrastructure. And we followed that up with the software-led storage piece. But essentially what we said is, look, software-led infrastructure is running services as software on top of commodity and standardized hardware. Now that in and of itself really isn't new, right?
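The custom-metadata idea from the MRI example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Express Query interface, which the conversation does not show: the class `TagCatalog`, its `tag` and `find` methods, and the sample paths and tag names are all hypothetical. It keeps an inverted index from each (key, value) pair to the files carrying it, so a search is a set intersection and never a scan over every file.

```python
from collections import defaultdict


class TagCatalog:
    """Toy catalog of user-defined file metadata with an inverted index."""

    def __init__(self):
        self._tags = defaultdict(dict)  # path -> {key: value}
        self._index = defaultdict(set)  # (key, value) -> set of paths

    def tag(self, path, **attrs):
        """Attach or overwrite arbitrary key/value tags on a file path."""
        for key, value in attrs.items():
            old = self._tags[path].get(key)
            if old is not None:
                # Drop the stale index entry before recording the new value.
                self._index[(key, old)].discard(path)
            self._tags[path][key] = value
            self._index[(key, value)].add(path)

    def find(self, **attrs):
        """Return sorted paths whose tags match every given key/value pair."""
        sets = [self._index.get((k, v), set()) for k, v in attrs.items()]
        if not sets:
            return sorted(self._tags)
        return sorted(set.intersection(*sets))


if __name__ == "__main__":
    catalog = TagCatalog()
    catalog.tag("/mri/scan1.dcm", machine="MRI-2", patient="p-17")
    catalog.tag("/mri/scan2.dcm", machine="MRI-2", patient="p-42")
    print(catalog.find(machine="MRI-2", patient="p-42"))
```

Keeping the index keyed by tag value is what makes "all scans from this machine for this patient" cheap to answer, at the cost of a little extra work each time a tag is written.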
I mean, everybody's been talking about Google doing that for a long time now. And the enterprise business is now delivering that. What is new, we think, is bringing together silos of metadata that are currently locked into, you know, individual purpose-built systems. We saw Express Query and StoreAll as a way to potentially, over time, enable that.

Yes.

Does that make sense to you? And can you maybe talk about that a little bit?

Yeah, absolutely. So you're perfectly correct, right? There's a huge amount of value locked up in our storage systems just in general. And that information is siloed. So one of the things that we're trying to do with our storage business is reduce the number of platforms and build up these common sets of things, so we can integrate some of that information over time. I mean, it's going to take time. Nothing happens overnight in this business. But there's a huge potential there for taking that wider view over all of your information resources. And now, between StoreOnce, our new 3PAR systems, and StoreAll, that potential is definitely there to integrate and make those things work better together.

So the pressure on HP Labs in general, and I know it's almost like an academic institution, this hands-off, hey, let's do something. But the pressure to commercialize the R&D has accelerated in the last couple of years. You've got two successes and they happen to be in storage, so why? Why is that? Why those areas, and why the successes? What's the process that has led to that innovation and commercialization?

So there are many, actually, other successes out of HP Labs. I mean, I'm just representing the storage piece here.

But those are two that Meg Whitman talks about, and it's not just the storage guys talking about them. It's the senior leadership team, right?

So what we focus on is, we're an interesting organization. We have this blend of the academic and the business.
So we actually do the technology development and we do the research. So I have an interesting job: it's somewhat like a professor, but it's also somewhat like a product development manager, with engineering and even marketing aspects of things. And what we have to do is bridge that gap between the product and the research. So what we do is put the teams together, myself and my team; the research team works with the engineering team to actually build out these systems, make sure they work, and really work hand in hand to get these things out to market, and into a product that people can use and that adds value.

All right, Alistair, thanks very much. We're out of time, but I really appreciate you coming on theCUBE and sharing your insights with us. Good luck, congratulations, and we'd love to have you back sometime.

All right, great.

Keep it right there, everybody, we'll be right back with our next guest. This is Dave Vellante of Wikibon. We're live, this is theCUBE, SiliconANGLE's coverage here in Frankfurt. We'll be right back.