Okay, we're back here inside theCUBE, our flagship program, where we go out to the events and extract the signal from the noise. This is the Strata Conference, O'Reilly Media's big data event. We're talking about Hadoop, analytics, data platforms, and how big data is coming into the enterprise through the front door, as we heard yesterday. I'm John Furrier with Dave Vellante of Wikibon.org, and we're here with Jack Norris, a CUBE alumni and a favorite guest here. You're the chief marketing executive at MapR. You guys are leading the charge with your distribution of Hadoop. Welcome back to theCUBE. Thank you. Okay, so let's chat about what's going on. What's your take on all the big news out here around the distributions, all the big power moves? You guys have a relationship with EMC, okay, an exclusive relationship with those guys. Intel's got a distribution, Hortonworks is with Microsoft. A lot of things are going on. So this is your wheelhouse. What's your take on the Hadoop action here? Well, there's an article in Forbes where I think they said it best: this is showing that MapR has had the right strategy all along. What we're seeing is that there's a fairly low bar to taking Apache Hadoop and providing a distribution. So we're seeing a lot of new entrants in the market, and there are a lot of options if you want to try Hadoop, experiment, and get started. And then there's production-class Hadoop, which includes enterprise data protection, snapshots, mirrors, the ability to integrate. And that's basically MapR. So start in test and dev with a lot of options, and then move into production-class with MapR. So break it down for the folks out there who are dipping a toe in the water and hearing all the noise, because right now the noise level is very high. Yep. Right, with the recent announcements. But you guys have obviously been doing business in this area for many years.
So when people say, hey, I want to get a Hadoop distribution with enterprise capabilities, what should they be looking for? Because it's not that easy to squint through the noise. So could you share with the folks out there what to look for, like the table stakes, the checkboxes? Because there are a lot of claims, a lot of noise, this and that, a lot of different options. Some teams have more committers than others. So that's all noise. But what are the key things that customers need to know? You're smiling. So there are three areas. One is how it integrates into your enterprise. With Hadoop, you have the Hadoop Distributed File System API; that's how you interact. Well, if you're also able to use standard tools with standard file and database access, it makes it much, much easier. MapR is unique in supporting NFS and making that happen. That's a big difference. The second is dependability. There are high availability capabilities, and then there's data protection. I'll focus on snapshots as an example. You've got data replicated in Hadoop. That's great. But if you have a user error or an application error, that's replicated just as quickly. So you need the ability to recover and go back in time. If I can say, hey, I made a mistake, can I go back two minutes earlier? Snapshots make that possible. MapR is unique in snapshot support. And then finally, there's disaster recovery mirroring, where you can go across clusters, mirror what's going on across the WAN, and recover in the case of a disaster where you lose a whole cluster or a whole section. And those aren't available in other distributions? Those aren't available either. Those are enterprise features. They've been announced; snapshots has been on the JIRA list for over five years. Yeah, okay. So I wonder if I could follow up. And then there's the third, because I said three and only covered two. The third is performance and scale.
That would be four. Let it be four. You've got integration, dependability, and speed. Dependability, DR. Oh, okay. So DR is part of dependability: snapshots and DR. Okay, so let's talk about the performance, because obviously Google's a big partner of yours in the US, and we just had them on theCUBE. It's Strata, so you have to have a record. Do you have a record? I think you have a record. We did, we did. Okay, okay. EMC, take that. Well, you work with EMC. So, I mean, talk about the performance real quick. Then we'll talk about some of the EMC conversations, but performance first. You had diverse performance benchmarks, Google and within the enterprise. Can you talk about those two? So, what we announced this week was the MinuteSort world record. MinuteSort runs across technologies; it's simply how much data you can sort in 60 seconds. If you look at the previous record, it was done in the labs at Microsoft with special-purpose software, and they did 1.4 terabytes. Hadoop hasn't held the record since 2009, several years now, because it's got features that work against performance, things like checkpointing and logging, because it assumes you've got long-running MapReduce jobs. So we set the record with our distribution of Hadoop, so we had kind of one hand tied behind our back given that technology. Secondly, we set it in the cloud, which is the other hand tied behind our back, because it's a virtualized environment. So you set the record just with your legs. Just with the legs. And 1.5 terabytes in 60 seconds. Very proud of that. Well, the cloud is interesting, because we've been doing a lot of labs testing, Dave and I and our teams, on cost, right? And it's an interesting benchmark, because you don't always look at the nuance.
When comparing the cost of cloud performance versus bare metal, most people don't factor in the setup cost of deployment. Exactly. So, can you quickly talk about that? How significant, what order of magnitude is it? And what are your customers doing there? So, the previous Hadoop record took 3,400 servers, about 27,000 cores, 13,000, almost 14,000 disks, and did 600 gigs, actually a little less than that, 578. And on Google, we did it with 2,200 virtual instances, 8,000 cores, and did 1.5 terabytes. And the cost to spin up the Google version? The cost, basically, if you look at that, assume conservatively $4,000 per server. That's $13.8 million worth of hardware previously, and the cost to do that run on Google was $20.33. Well, you got a discount, didn't you? I mean, come on, that's obscene. Yeah, you're a partner. Yeah, I mean, does it really cost that much? I mean, is that what they would charge a customer? Actually, that's based on that one minute; if you look at the actual charges, it'd be about $1,200. Okay, so it's not millions. No. It's in the thousands. Yep. Okay, that's impressive. We'll have to go look at those numbers, like we're going to look at Greenplum's numbers in the next couple of weeks. We'll talk about the Google relationship in a minute. Let's get into that. Where's that going? Very excited about it. We're actually deployed throughout the cloud. We've got multiple partners; Google's in limited preview, so we've got a number of customers testing that and doing some really interesting things in the cloud. So we monitor the data center market, obviously, with our proprietary tools that you know about, VFinder and CrowdSpots, and the thing is that the data center vertical is interesting, right?
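For reference, the figures quoted in the conversation pencil out roughly as follows. This is a back-of-envelope sketch using only the numbers stated above, not official benchmark reports; note the quoted $13.8 million total implies slightly more than 3,400 servers at the $4,000-per-server assumption.

```python
# Back-of-envelope comparison of the two MinuteSort runs, using only the
# figures quoted in the conversation (not official benchmark reports).

# Previous Hadoop-based record: on-premise hardware
prev_servers = 3400
cost_per_server = 4000                      # conservative $/server assumption
prev_hardware_cost = prev_servers * cost_per_server
prev_sorted_gb = 578                        # ~578 GB sorted in 60 seconds

# MapR run on Google Compute Engine: virtual instances
gce_sorted_gb = 1500                        # 1.5 TB sorted in 60 seconds
gce_billed_cost = 1200                      # approximate actual billed charges

print(f"Previous hardware outlay: ${prev_hardware_cost:,}")
print(f"GB sorted per dollar, previous run: {prev_sorted_gb / prev_hardware_cost:.5f}")
print(f"GB sorted per dollar, cloud run:    {gce_sorted_gb / gce_billed_cost:.2f}")
```

Even using the billed $1,200 rather than the $20.33 attributed to the single minute, the cloud run sorts on the order of four orders of magnitude more data per dollar of hardware spend, which is the point being made.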
If you look at the sentiment analysis of the conversation in just the Twitter data, it's Facebook, Apple, these companies, and when we dig into the numbers, it's not so much the companies; it's the fact that their data center operations are being looked at as the leading indicator for where CIOs are going. So, I want to ask you: in your conversations with your customers, what are the conversations around moving to the cloud, and where are they on that transition? Because we hear, yeah, we're going to the cloud for all the benefits you were mentioning, and Google and Facebook are the gold standards of architecture, not necessarily a cut-and-paste architecture, but people see the benefits of what they're doing. So what are your conversations with your enterprise customers around the cloud and cloud architecture, and what features besides replication and disaster recovery are they looking at? Well, it's basically workload-driven and data-set-driven. So data that's already in the cloud is kind of a natural first step: why don't I do the analysis there as well? Things like Google Earth and digital advertising data are really interesting candidates for that. Also periodic workloads: they have workloads that need to spin up and spin down, and the cloud works really well for that. And in some cases it's driven by their own environments. They've got data centers that are approaching capacity, and they need to do offloads, so they look at the cloud because it's easy to get up and running quickly and use it as an alternative. I wanted to come back to one of your three value props, particularly the dependability piece and specifically the snapshots. Somebody asked me one time, a couple of years ago, how do you back up a petabyte? And his answer was, well, you don't.
So I want to ask you how your customers are protecting their data and what you guys are bringing to the table there. So, snapshots is not a bolt-on feature. It's a low-level feature based on the underlying data architecture. When we architected the platform from the beginning, snapshots was a core feature. And if you use a technique called redirect-on-write, you're not copying the data. So you can do a petabyte snapshot basically almost instantaneously, because you're tracking the pointers to the latest blocks that have been written. So if the data change rate is low, basically the data is not changing, you can snapshot every minute and not have any additional storage overhead. Right, okay, and you can set that. So your MapR technology will allow me to set that, dial it up, dial it down. So we support logical volumes. You can set policies at the volume level and say, well, this volume is critical data, and then set policies: critical data is snapshotted every minute. And then I can change what the definition of critical data is; maybe it's every five minutes, et cetera. So you can set up these different policies on volumes and have snapshots happen independently for each of those volumes. And I can do that by workload or data set or by application or whatever. It's actually provided as a service, as opposed to a one-size-fits-all approach. Exactly, and that also corresponds to user access, administrative privileges, and other features and policies within the cluster. How about this whole trend toward bringing SQL into Hadoop? What's your take on that, and what's your angle on it? So interactive SQL is an important aspect, because you've got so many people in the organization trained to leverage SQL. But it's one of many use cases that needs to run across a big data platform.
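To make the redirect-on-write idea from the snapshot discussion concrete, here is a toy sketch. It is illustrative only, not MapR's actual implementation: the key property is that a snapshot copies the block-pointer table rather than the data blocks, so it is near-instant regardless of volume size, and each logical volume carries its own snapshot-interval policy.

```python
class Volume:
    """Toy model of redirect-on-write snapshots. A snapshot copies only
    the block-pointer table, never the data blocks, so it is near-instant
    even for a petabyte volume. Illustrative sketch only."""

    def __init__(self, name, snapshot_interval_s):
        self.name = name
        self.snapshot_interval_s = snapshot_interval_s  # per-volume policy
        self.pointers = {}      # block_id -> current data version
        self.snapshots = []     # saved pointer tables (shallow copies)

    def write(self, block_id, data):
        # Redirect on write: new data lands in a new location; only the
        # live pointer table changes. Versions referenced by earlier
        # snapshots remain untouched.
        self.pointers[block_id] = data

    def snapshot(self):
        # Cost is proportional to the pointer table, not the data size.
        self.snapshots.append(dict(self.pointers))

    def restore(self, index):
        # "Can I go back two minutes?" -- reinstate an earlier table.
        self.pointers = dict(self.snapshots[index])

vol = Volume("critical-data", snapshot_interval_s=60)   # snapshot every minute
vol.write("b1", "good-version")
vol.snapshot()
vol.write("b1", "user-error")       # the mistake replicates just as fast...
vol.restore(0)                      # ...but the snapshot rolls it back
print(vol.pointers["b1"])           # prints "good-version"
```

Because unchanged blocks are shared between the live table and its snapshots, a volume whose data is not changing can be snapshotted every minute with essentially no additional storage overhead, which is the claim made above.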
So there's a range of big data analytics: batch analytics, interactive capabilities with SQL, database operations, NoSQL, search, streaming. All of those are functions that need to run across a platform. So it's a piece, but it's not the big driver, because what we've seen is that there's a high arrival rate of machine-generated data, and machine-generated responses to that data, for digital advertising, for recommendation engines, for fraud detection, can really move the needle for an organization and produce huge swings in profitability. Yeah, and move the ball down the field big time. Yeah, and an interactive piece with a human element involved doesn't really scale and work on a 24-by-7 basis. Jack, final question. We're over now by a minute, but I had one parting question. Obviously, it's a very competitive landscape right now. The stakes are higher because the demand and the market opportunity are massive. What's MapR's business strategy going forward? No change in direction? Is it going to be same old, same old? Do you guys have any new things coming down that you see in the marketplace? Yeah, we've got a huge lead when it comes to mission-critical, enterprise-grade features. And our focus is one platform. So the ability to support enterprise Hadoop, enterprise HBase, and provide those full capabilities for ease of use, for dependability, for performance. We've seen a lot of companies test on one distribution and switch to MapR, and we'll continue to see that in the future. Well, we will say we've been covering this big data space going on four years now, Dave and I, and we've watched all the players pivot a few times. You guys have not. You guys have been true to your mission from day one. And we know where you stand; everyone knows where you stand. Enterprise grade, it's a good strategy. I think everyone's putting that on their label now. Enterprise-grade washing, we call it.
But congratulations, MapR, inside theCUBE. We'll be right back with our next guest here on day three, wall-to-wall coverage at O'Reilly Media's Strata. We're going to do our news hour next from 12 to one. We'll be right back after this short break.