Live from Midtown Manhattan, theCUBE's live coverage of Big Data NYC, a SiliconANGLE Wikibon production. Made possible by Hortonworks, we do Hadoop, and WANdisco, we make Hadoop invincible. And now your co-hosts, John Furrier and Dave Vellante. Hi buddy, we're back. This is Dave Vellante with Jeff Kelly. We're with Wikibon and this is theCUBE, SiliconANGLE's continuous production. We're here at Big Data NYC, right across the street from the Hilton where Strata Conference and Hadoop World is going on. We've got a multi-time CUBE guest: Jack Norris, the CMO of MapR, is here. Jack, welcome back to theCUBE. Good to see you. Thank you, Dave. So by the way, thank you so much for the support. As you know, we're across the street here at the Warwick Hotel. MapR, you guys have always been so generous supporting theCUBE. We can't thank you enough for that. So really appreciate it. Thank you. So we were able to listen to your keynote yesterday. We weren't broadcasting head to head yesterday and had an opportunity to hear your keynote. So first of all, how did that go? I wanna ask you some questions about it. It was really well received, and I think people were kind of clamoring to try to separate the myths from reality on Hadoop. So you had three myths that you talked about, one related to the distros. I'd actually like to get into some of those. So the first myth was around the distribution battle. So take us through that. So the impression that it's a knock-down, drag-out, competitive battle across Hadoop distributions was the first myth. And the reality is that all of the distributions share the same open source Apache code. And this is one of the first markets that's really been created, or one of the first open source technologies that's really created a market. I mean, look what's happened here with this whole big data and Hadoop. But given that early stage, there's the requirement to really combine that open source code with additional innovations to meet customer needs.
And so what you see is those aggregators that are taking open source. You see others that are taking the open source and then adding maybe a management utility, a couple of different applications on top. And then our approach at MapR is we're taking the open source with those management innovations, doing some development in the open source community with things like Apache Drill, and then really focusing on the underlying architecture, the data platform, and providing innovations at that layer. So of the three major distros that we talk about all the time, you guys, Hortonworks and Cloudera, you guys have been consistent the whole time. As has Hortonworks, right? Cloudera basically put out a post recently saying, hey, we're kind of going in a different direction, sort of what I called tapping out of the Hadoop distro piece of it. So there's a lot of discussion around it. You're putting forth, hey, it's not an internecine war, but does it matter is my question. Well, I think if you take a step back, the Hadoop ecosystem is incredibly strong, growing very, very quickly, the fastest growing big data technology, one of the top 10 technologies overall. And I think it's because we are sharing the same API. It is possible for customers to learn on one, develop, and move seamlessly to another. And in the keynote, I talked about the difference with the NoSQL market, where there is no consensus. And customers have to figure out not only what's the right workload, but what's the technology that's actually going to have some staying power? Well, that's a powerful comment. Amazon turned the data center into an API. The Hadoop community is essentially turning data access into an API. And that is a very powerful and leverageable concept. Okay, your second myth was around the whole NoSQL. Yes. Piece of it. Help me, you put up a slide.
I thought I read Jeff Kelly's reports and I thought I knew them all, but there were a couple in there that I didn't recognize. You probably knew them all, but so take us through myth number two. And I'm sure we missed some. There wasn't room on the slide for any more. But, yeah, it's basically about the consensus. There is no real consensus. There's no common API. There's no ability to move applications seamlessly across NoSQL solutions. If you look at one NoSQL solution, and that's HBase, it has a big inherent advantage because it's integrated with Hadoop. This whole trend is about compute and data together. So if you've got a NoSQL solution that's on that same massive data store, big leg up. And then we got into the, well, if you've got HBase, and it's included in all the distributions, and all the distributions share the same open source, then obviously it must run the same across all distributions. And there we shared some pretty interesting data to show that when you make architectural changes and innovations underneath, you can dramatically change the performance of not only MapReduce, but of NoSQL. Yeah, so okay, so not all NoSQL is created equal. Not all HBase is created equal is essentially what you're saying there. Now the third piece was Hadoop is enterprise ready, right? So you guys were first to say, well, we have a Hadoop platform that's enterprise ready. Way ahead on that, got criticized a lot for going down that path, shrugged and said, okay, we'll just keep doing business with customers. And you've been, again, very clear and consistent on that. So talk about the third myth. And that's, is Hadoop ready for prime time? And I think the way to combat that myth is by customer examples and showing the tremendous success that customers are enjoying with Hadoop.
And we don't have time on theCUBE here to go through all of them, but I like to point out 90 billion auctions a day with Rubicon. They've surpassed Google in terms of ad reach, and they're doing that on MapR. 1.7 trillion events a month with comScore, that's on MapR. You look in traditional enterprise, a single retailer with over 2,000 nodes of Hadoop. I mean, it's a key part of their merchandising and retail operations, combining all sorts of data feeds and all sorts of use cases there. Financial services with over 1,000 nodes: risk mitigation, personalized offers, streamlining their operations. It's dramatic. And then we shared some of the more interesting ones, esoteric ones like garbage and whiskey and weather prediction. Okay, but your customers consider these, even as diverse and eclectic as they are, they consider these mission critical applications, right? Oh, absolutely. And I think that's the difference, because what we're talking about is not Hadoop as this cache, right? This temporary processing where we can do some interesting batch analytics and then take that and put that someplace else. And yes, there are applications like that, but companies soon realize that if I'm going to use this as a key part of my operations, and it's about data and compute, then I want a consistent permanent store. I want a system of record. So all of the SLAs and high availability and data protection features that they expect in their enterprise applications should be present in Hadoop. That's where we focus. Let's run down a couple of those. What are some of the key capabilities that you need in an enterprise-grade platform that MapR is addressing? Well, let's take business continuity, because that's important if you're really going to trust data there. And one of the big drivers as you expand data is how much am I gonna spend on it?
And if you look at a large investment bank, with $270 million of their budget, not total but incremental, going to address the additional capacity, there's a big emphasis on, let's look at a better way to do that. So instead of spending $15,000 a terabyte, you can spend a few hundred dollars a terabyte. That's a huge, huge advantage. And that's the focus of Hadoop. But to do that well, the features that are in enterprise storage have to be present. And we're talking about mirroring, and not a copy table function, but replication. That's how organizations do it, right? We're talking about recovery too. And for recovery, you can't back up a petabyte of information through a copy function, right? You have to do a snapshot. And the snapshots have to be consistent, right? And we're not saying anything that an enterprise administrator doesn't know. There is some confusion when you're more on the developer side as to what these features are and the difference between a fuzzy snapshot and a point-in-time consistent snapshot. So let's talk a little bit about the enterprise data hub, this concept that Mike Olson of Cloudera introduced yesterday. Tell us a little bit about your take on Mike's, I guess, definition, essentially trying to name the category of what Hadoop can do and where it sits in the architecture. Did you agree with his characterization? I mean, if you look at that description, it's about, I'm taking important data and I'm putting it in Hadoop and I'm combining a lot of different data sources. And it's been referred to as a data lake and a data reservoir and a data, I mean, we've heard a lot of terms. We worked with an outside consultant who was originally an architect at Teradata. It's been about eight months, almost a year ago now, since he defined an enterprise data hub. He went through kind of the list of requirements, and once you move from a transitory to a permanent store, then that becomes an enterprise data hub.
And an enterprise data hub can be used to select and process information, maybe it's ETL, and serve some downstream applications. It can also be useful to do analysis directly on it to serve different business functions. But the system requirements that he established for that, I think, are absolutely true: you have to have the full data protection, you have to have the full disaster recovery, you have to have the full high availability, because this is going to be important data serving the organization. If it's data that you can lose, if it's data that you don't really care about having highly available, then it's a very narrow use case that that data hub serves. So you're saying the enterprise data hub isn't ready for prime time in your view? No, I'm saying that there are requirements, and we have companies today that have deployed an enterprise data hub and they are quite successful with it. And the quotes are that the ETL functions they're doing on that hub are 10 times faster and 10 times cheaper than what they were seeing. You're saying your enterprise data hub's ready for prime time, the other guy's isn't. Right. Did I understand? It's a better sound bite, Dave. Well, I think, I agree with that. Okay, but it's nuanced, right? And so, you know, the customers, because, you know, vendors, they're all saying the same thing to the customers, right? So you've got your messaging that you've proven out over the last several years, and then the entire market starts to use the same terminology. So this is why I like Jeff's question about what are those things that make a difference? We're in a little bit of this, you know, kind of marketing fog here in the relatively early stages. I think the best response there is customer proof points. And I think some education in the very beginning, you know, when they're in development and test. It's really important to understand, you know, what is Hadoop and what can I use it for and what data source am I gonna leverage?
I think the features that we're talking about really start to show up as you deploy in production and as you expand its use in production. And there we've enjoyed tremendous success. I don't think anybody would argue that you have a lead in this space. I wouldn't, I don't think you would either. This space being robustness, enterprise readiness, mission criticality. Is your lead increasing, decreasing, staying the same? What's your sense? Well, it's hard because there's no, you know, there's no external service that's out there, you know, interviewing every customer and giving numbers. I do know that we passed 500 paying customers. I do know that we've got significant deployments, and you can measure those in terms of number of nodes, you know, in the thousands of nodes, and you can measure those in terms of use cases. So we've got, you know, one company that has passed 20 different use cases on the same cluster. I think that's an interesting proof point. We're scaling in terms of the number of people in an organization that are trained in leveraging the data in MapR, again in the thousands. So, you know, I think this market is so big and so dynamic that this isn't about, you know, one company's success at the expense of everyone else's. It's not a zero-sum game. I think, you know, we're all here kind of raising this boat and focusing on this paradigm shift, but when it comes to production success, that's our focus, and I think that's where we've proven that. One thing I really wanna get your opinion on, you know, as Hadoop matures, with some of the innovations you guys are doing and making the platform, you know, basically a multi-application platform, you can do more things with Hadoop. And we've been talking about this on theCUBE: as that happens, you as an industry are gonna start bumping up against the EDW vendors and some of the other database vendors in the traditional world, and now you're doing some of the things that those tools can do.
Now, you know, two years ago it was very much that this is all very complementary, Hadoop and your EDW. There's no overlap, we're gonna all play nice. But increasingly we're seeing that there is an overlap. How do you view that? What is your relationship with those EDW vendors, and what are you hearing from customers when you go into customer accounts? So, I mean, there's a lot in that question. Yes, well. I think the first comment though is, don't look at Hadoop through the single data warehouse lens. And if you look at trying to use Hadoop to completely replace an enterprise data warehouse, well, there's a few decades of experience there. There are many organizations that have a lot of activities that are based on that data warehouse, and that's where we're seeing a data warehouse offload that is complementary, but it gives organizations this lever to say, well, I'm gonna control the fill rate, and I'm gonna take some of the data that's no longer really active and put that on Hadoop and really change my ability to manage the cost in a data warehouse environment. The other thing that's interesting is that the types of applications that Hadoop is doing, I think, are creating a new class. It's about operations and analytics kind of combined together, taking high arrival rate data and making very quick micro changes to optimize, whether that's fraud detection or recommendation engines or taking sensor data and predictive analytics for maintenance, et cetera. It's just a tremendous number of applications. In some cases leveraging a new data source, in some cases doing new applications, but it's just opening things up, and I think organizations are moving to be very data-driven and Hadoop is at the center of that. That's great, and you control the fill rate. That's another really good sound bite.
And these things that you mentioned, this high arrival rate data, this fraud detection, predictive analytics, maintenance, these are things that you're doing today with MapR, right? Yeah, absolutely. Great. All right, Jack, well listen, always a pleasure. Thanks very much for coming by. Great to see you again. Thank you. Keep it right there, everybody. We'll be right back with our next guest. This is theCUBE, we're live from the Big Apple.