theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks. We do Hadoop. And headline sponsor WANdisco. We make Hadoop invincible. Okay, welcome back everyone, live here in Silicon Valley in San Jose. This is Hadoop Summit. This is SiliconANGLE and Wikibon's theCUBE. It's our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE. I'm joined by my co-host Jeff Kelly, top big data analyst in the community. Our next guest is Jack Norris, CMO of MapR. Security, enterprise: that's the buzz of the show. It was the buzz of OpenStack Summit, another open source show. And here this year, you're just seeing move after move after move, talking about a couple of critical issues. Enterprise-grade Hadoop: Hortonworks announced a big acquisition, went all in, as they said. And now Cloudera follows suit with their news today. Are you sitting back saying they're catching up to you guys? I mean, how do you look at that? Because you guys have the security stuff nailed down. So tell us how you feel about that. No, I think, I mean, if you look at the Hadoop market, it's definitely moving from a test, experimental phase into a production phase. We've got tremendous customers across verticals that are doing some really interesting production use cases, and we recognized very early on that really meeting the needs of customers required some architectural innovation. So combining the open source ecosystem packages with some innovations underneath to really deliver high availability, data protection, disaster recovery features. Security is part of that, but if you can't protect the data, if you can't have multi-tenancy and separate workflows across a cluster, then it doesn't matter how secure it is; you need those as well. I've got to ask you a different question since we're here at Hadoop Summit. Because we get this question all the time at SiliconANGLE and Wikibon.
You guys are so successful, but I just don't understand your business model. We're free content and we have some underwriters. So you guys have been very successful, yet people aren't looking at MapR as, you know, the quiet leader. Like you're doing your business, you're making money. Jeff shared some numbers with us that in the Hadoop community about 20% are paying subscriptions. That's unlike your business model. So explain to the folks out there the business model, and specifically the traction, because you have paying customers. Yeah, oh no, we've got over 500 paying customers. We've got at least one $1 million customer in seven different verticals. So we've got breadth and depth. And our business model is simple. We're an enterprise software company that's looking at how to provide the best of open source as well as innovations underneath. You provide the most open distribution of Hadoop, but you add that value separately to that, right? So it's not so much that you're proprietary at all, right? Can you clarify that? Right, so if you look at this exciting ecosystem, Hadoop is fairly early in its life cycle. If it's in a commoditization phase, like Linux or relational databases with MySQL, open source kind of equates to the whole technology. Here at the beginning of this life cycle, or early stage of the life cycle, there are some architectural innovations that are really required. If you look at Hadoop, it's an append-only file system relying on Linux. And that really limits the types of operations, the types of use cases that you can do. What MapR has done is provide some deep architectural innovations to provide a complete read-write file system, to integrate data protection with snapshots and mirroring, et cetera. So there's a whole host of capabilities that make it easy to integrate, enterprise-secure, and scale much better.
Do you feel like you were maybe a little early to the market, in the sense that we heard Merv Adrian, in his keynote this morning, talk about how it's about 10 years in when you start to get these questions about security and governance. And we're about nine years into Hadoop. Do you feel like maybe you guys were a little early and now you're at a tipping point where, as more and more deployments get ready to go to production, this is going to be an area that's going to become increasingly important? I think our timing has been spectacular. Because we kind of came out at a time when there were some customers that were really serious about Hadoop. We were able to work closely with them and prove our technology. And now as the market is just ramping, we're here with all of those features that they need. And the issue is that an incremental improvement to provide those kinds of key features is not really possible if the underlying architecture isn't there. And it's hard to provide online real-time capabilities in an underlying platform that's append-only. So the HDFS layer, written in Java and relying on the Linux file system, is kind of the weak underbelly, if you will, of the ecosystem. There are a lot of important developments happening. YARN on top of it, a lot of really kind of exciting things that we're actively participating in, including Apache Drill. And on top of a complete read-write file system and integrated Hadoop database, it just makes it all come alive. Yeah, I mean those things on top are critical, but it's the underlying infrastructure. We asked the Wikibon community about that: what are the things that are really holding you back from Hadoop in production? And the biggest challenges they cited were high availability, backup and recovery, and maintaining performance at scale. Those are the top three. And that's kind of where MapR has been focused since day one.
So if you look at a major retailer: 2,000 nodes of MapR, 50 unique applications running on a single cluster, 10,000 jobs a day running on top of that. If you look at the Rubicon Project, they recently went public, with a hundred million ad auctions, I mean a hundred billion ad auctions a day on top of that platform. Beats Music, they just got acquired for $3 billion. Basically it's the underlying MapR engine that allowed them to scale and personalize that music service. So there are a lot of proof points in terms of how quickly we scale, the enterprise-grade features that we provide, and kind of the blending of deep predictive analytics in a batch environment with online capabilities. So I've got to ask you about your go-to-market. Obviously Cloudera and Hortonworks have different business models, just talking about that. But Cloudera got the massive funding. So you get this question all the time. How do you counter that army in the arms race? What's your answer to that? I think Dan Woods just wrote an article in Forbes and he says cash is not a strategy. And I think that was an excellent article, where he goes in and says that in this fast-growing market, an amount of money doesn't necessarily translate to architectural innovations or speeding the development of that. This is a fairly fragmented ecosystem in terms of the stack that runs on top of it. There's no single application or single vendor that kind of drives value. So an acquisition strategy is somewhat limited. So is your field sales force direct or indirect, or both, a mix? How do you handle that? Because Cloudera's got feet on the street, and every squirrel will find a nut. If they're parking sales reps in the SDs and all the enterprise accounts, the squirrel's going to find a nut once in a while. And they're going to actually try to engage the client. So I guess it is a strategy if they're deploying sales and marketing.
And I think the beauty about that is, in fact, we're all in this together in terms of sharing an API and driving an ecosystem. It's not a fragmented market. You can start with one distribution and move to another without recompiling or without doing any sort of changes. So it's a fairly open community. If this were a vendor lock-in or, you know, then spending money on brand, et cetera, would be important. Our focus is on our execution. You have direct sales? Yes, we have direct sales. We also have partners, and it depends on the geographies as to what that percentage is. We had John Schroeder on with HP at Big Data NYC. What's the update on the HP relationship? Oh, excellent. In fact, we just launched our application gallery, our app gallery, to make it very easy for administrators and developers and analysts to get access and understand what's available in the ecosystem. That's available directly on our website. And one of the featured applications there today is an integration with the MapR Sandbox and HP Vertica. So you can get early access, try it, and get the best of kind of enterprise-grade SQL on top of MapR. Basically, it's the first Hadoop app store. Yeah. If you want to call it that, right? So, like, how many apps are available? We launched with close to 30, with a whole wave kind of following that. So talk a little bit, speaking of Vertica, about kind of SQL on Hadoop. There's a lot of talk about that, some confusion about the different methods for applying SQL on Hadoop. MapR takes an open approach; I know you support things like Impala from a competitor, Cloudera. Talk about that approach from a MapR perspective. So I guess our perspective is kind of unbiased open source. We don't try to pick and choose and dictate what's the right open source based on either our participation or some community involvement.
And the reality is, with multiple applications being run on the platform, there are different use cases that make different sense. Whether it's a Hive solution, or Drill, Drill's available, or HP Vertica, people have the choice. And it's part of a broad range of capabilities that you want to be able to run on the platform for your workflows, whether it's SQL access or MapReduce or a Spark framework, Shark, et cetera. So yeah, because there are so many different options: there's Spark, you can run HP Vertica, you've got Impala, you've got Hive and the Stinger Initiative. Is that whole kind of SQL-on-Hadoop ecosystem still working itself out? Are we going to have this many options a year or two years from now? Or are they complementary, and potentially each has its role? I think the major difference is kind of how it deals with the new data formats. Can it deal with self-describing data sources? Can it leverage a JSON file? Does it require centralized metadata? And those are some of the perspectives and advantages that Apache Drill has to expand the data sets that are possible, to enable data exploration without dependency on an IT administrator to define that metadata. So another, maybe not always as exciting, area: taking workloads from existing systems and moving them to Hadoop is one of the ways that a lot of people get started with Hadoop, whether it's those data transformation workloads or something in that vein. So I know you've announced a partnership with Syncsort, and one of the things that they focus on is really making it as easy as possible to make those moves. So talk a little bit about that partnership, why that makes sense for you and what it's going to bring to your customers. I think it's a great proof point, because we announced that partnership around mainframe offload. We have, well, comScore and Experian in that press release.
And if you look at a workload on a mainframe going to Hadoop, that seems like, that's really an oxymoron. But by having the capabilities that MapR has, and making that a system of record with that full high availability and that data protection, we're actually an option to offload from mainframe, offload from SAN processing, and provide a really cost-effective, scalable alternative. And we've got customers that had tried to offload from a mainframe multiple times in the past unsuccessfully and have done it successfully with MapR. So talk a little bit more about kind of the broader partnership strategy. We're here at Hadoop Summit. Of course, Hortonworks talks a lot about their partnerships and kind of their reseller arrangements. Cloudera seems to take a little bit more of a direct approach. What's MapR's approach to partnering, as that relates to reseller arrangements and things like that? I think the app gallery's probably a great proof point there. The strategy is an ecosystem approach. It's having a collection of tools and applications and management facilities, as well as applications on top. So it's a very open strategy. We focus on making sure that we have open APIs at that application layer, that it's very easy to get data in and out. And part of that is the architecture: presenting a standard file system format, allowing non-Java applications to run directly on our platform, supporting standard database connections, ODBC and JDBC, and providing database functionality in addition to kind of this deep predictive analytics. Really, it's about supporting the broadest set of applications on top of a single platform. Because what we're seeing in this modern architecture is that data gravity matters, and the more processing you can do on a single platform, the better off you are, the more agile, the more competitive.
Right, so you're partnering with people like SAS, for example, to kind of bring some of the analytic capabilities into the platform. Can you tell us a little bit about any work you're doing there? Yeah, there are companies like SAS and Revolution Analytics and Skytree, and I mean just a whole host of companies on the analytics side as well as on the tools and visualization side, et cetera. Yeah, well, I bring up SAS because I think they get the whole data gravity situation: they've got to go to where the data is and not have the data come to them. So I give them credit for kind of acknowledging that big data truism, that it's all about going to the data and not bringing the data to the compute. Jack, talk about the success you've had with the customers. That's a pretty impressive number. We're talking about 500 customers. Merv Adrian from Gartner was on with us earlier. He was actually reiterating, not mentioning MapR, he was just saying, that what you guys are doing is right where the puck is going. Some think the puck is not even in the same rink, some other vendors. So I've got to give you props on that. So I want you to talk about the success you've had, specifically around where you're winning and where you're successful, and where you guys have struggled and need to improve. Yeah, there's a whole class of applications that I think Hadoop is enabling, which is about operations and analytics. It's taking this high-arrival-rate, machine-generated data and doing analytics as it happens and then impacting the business. So whether it's fraud detection or recommendation engines or supply chain applications using sensor data, it's happening very, very quickly. So a system that can tolerate and accept streaming data sources, that has real-time operations, that is 24 by 7 and highly available, is what really moves the needle. And those are the examples I used with ads, the Rubicon Project, and cable TV.
What's the primary outcome your clients want with your product? Is it stability in the platform? Is it agile development? Is there a specific, is there an outcome that's consistent across all your wins? Well, big picture, some of them are focused on revenues. Like, how do we optimize revenue? Either it's a new data source, or it's a new application, or it's an existing application and exploding the data set. Some of it's reducing costs, so they want to do things like a mainframe offload or data warehouse offload. And then there are some that are focused on risk mitigation. And if there's anything that they have in common, it's that as they moved from kind of test and looked at production, it's the key capabilities that they have in enterprise systems today that they want to make sure are in Hadoop. So it's not anything new. It's just like, hey, we've got SLAs, and I've got data protection policies, and I've got a disaster recovery procedure, and why can't I expect the same level of capabilities in Hadoop that I have today in those other systems? So, final question. Where are you guys heading this year? What are your key objectives? Obviously, you got these announcements out, a flurry of announcements, good success. State of the company, how many employees? Where are you guys at? Give us the quick update on the numbers. So, we just reported this incredible momentum where we've tripled quarter growth year over year. We've added a tremendous amount of customers. We're over 500 now. So, we're basically sticking to our knitting, focusing on the customers, elevating the proof points here. Some of the most significant customers we have in the telco and financial services and healthcare and retail areas, you know, view this as a strategic weapon, view this as a huge competitive advantage, and it's helping them impact their business. That's really spurring our success.
We're growing at an incredible clip here, and it's a great time to have made those calls and those investments early on and be kind of reaping the benefits now. I've always said, since the first Hadoop Summit when Hortonworks came out of Yahoo and this whole community kind of burst open, you had Hadoop World, now O'Reilly runs that. It's a whole different vibe in itself. This one's still got the developer vibe. So, I've got to ask you, we've always been a big fan. I mean, everyone has enough beachhead to be successful. It's not about MapR versus Hortonworks or Cloudera. That's why I always kind of smile when everyone goes, oh, Cloudera or Hortonworks. I mean, they're two different animals at this point, doing two different things. You guys are over here. Everyone has their, quote, swim lanes or beachhead. There's not a lot of super competition. Do you think it's going to be this way for a while? What's your forecast? At what point do you see more competition? 10 years out? I mean, Merv was talking about a 10-year horizon for innovation. I think that the more people learn and understand about Hadoop, the more they'll appreciate this kind of set of capabilities that matter in production and post-production, and it'll migrate earlier. And as we focus on more developer tools like our sandbox, so people can easily get experience and understand kind of what MapR is, I think we'll start to see a lot more understanding and momentum. Awesome. Jack Norris here inside theCUBE, CMO of MapR. Very successful enterprise-grade Hadoop player, leader in the space. Thanks for coming on, we really appreciate it. We'll be right back after this short break, here live in Silicon Valley at Hadoop Summit 2014. We'll be right back.