Okay, we're back here live in New York City for Strata Hadoop World. This is siliconangle.com and Wikibon's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. This is where all the action's happening for big data. This is Big Data Week in New York City. theCUBE was just in Las Vegas last week for Information On Demand, and we're here now with all the emerging startups changing the game in big data. I'm John Furrier, the founder of SiliconANGLE, and I'm joined by my co-host.

I'm Dave Vellante at wikibon.org. Go there for all the open source research where practitioners and peers get together. We're here with John Schroeder, who's the CEO and co-founder of MapR. John, welcome to theCUBE.

Hey, thanks for having me.

Well, thanks for coming by. You guys joined in the Hadoop craze when it was really kind of exploding. You had a different approach. Now Hadoop has completely gone mainstream, and we've heard that from many of our guests. But it's not just Hadoop that's big data anymore. A lot of other things are happening. So share what you guys are doing with Hadoop, with your customer base, your technology. We last spoke at Google I/O, where you had this most incredible benchmark with Google Compute Engine. Similar things are going on here. So give us the update on MapR and unpack your value proposition relative to what's going on in Hadoop.

Yeah, and actually what we're doing is pretty mainstream for an open source-related company. We've got a free distribution for Hadoop. We've got a paid-for enterprise distribution that we provide 24x7 support on. It does have enterprise enhancements, so we've done some things to really improve the Hadoop experience: really focusing on making it easier to build, deploy, and run Hadoop applications in production, and really improving the reliability with things like backup and recovery, high availability, disaster recovery, mirroring, things like that.
And then in general, we can speed up the Hadoop environment quite a bit. So those are really kind of the three pillars for what we do. And we made some new announcements this week as well, so we've got some more product coming out.

So are you, by nature, an impatient guy? Is that what happened? You were watching the Hadoop movement and said, ah, it's not going fast enough, I'm going to accelerate it? Take us back to the early days.

Well, actually, my first company was in the business intelligence space. So analytics has been something I've worked on since the early to mid '90s. So it's a sweet spot for me. Certainly I could see that big data was a high-priority issue for really any large company or even federal agency, starting with Web 2.0 but rolling across the more mainstream market segments. And then you could really see that Hadoop was the platform that was going to be able to support the broadest range of use cases. If you go back to 2008, there were so many different competing NoSQL databases and NoSQL technologies. You could see Hadoop growing from its roots, where it was going to start from batch, predictive analytics, but it could grow to support a much broader set of use cases. Now we're seeing it grow to do more interactive query processing and even some real-time capabilities. So that's really what I saw out there, and I partnered with a Google technologist who had a vision for how to put together an architecture that would really service the market. And we were off to the races in 2009 and have a great customer base we've built up now.

You guys take the best of the Hadoop concept and really bolt on commercial, industrial-grade features to make it stable and run well in normal business environments, enterprise environments, whether it's BI or whatnot.
But performance is obviously the number one thing people want to get at when they want to run these workloads. You've got Moore's Law always happening, and one of the main problems with the infrastructure today is that we talk about the data tsunami on one end, but the other big problem is we need more compute power and performance to crunch the data, right? So talk about that, because you set a world record by sorting a terabyte in, what, 54 seconds, something like that, and this speaks to the performance. So share with us what's going on in this whole performance sector and why it is important.

Yeah, well, really what you're looking at with big data, the whole revolution here, is around having basically infinite capabilities to store and process data, right? So that's becoming the increasingly affordable resource out there. And that's what's really spurring this kind of industrial revolution with these new applications and use cases, and we need to continue to push the envelope on that. So we've been partnering quite a bit with some of the cloud providers. We've got a great partnership with Amazon, but we've also done a lot of work with Google, and this week we announced that we'd spun up a thousand-node cluster in the Google Compute Engine cloud service and completed a TeraSort faster than any other Hadoop TeraSort had run before, in 54 seconds, and we did that on a much smaller hardware footprint as well. So we had about 40% fewer servers, disks and cores than the previous benchmark. So a pretty dramatic improvement in the performance of Hadoop and a new Hadoop world record.

So the stats were about 1,000 servers, 1,003, and 4,000 cores versus the 1,460 previously, which was the benchmark record set by Yahoo.

Right, exactly, yeah, and that was on-premise equipment.
This is also in a virtualized environment, so it's also a testament to how well Google Compute Engine has been architected that they didn't introduce any latencies or performance issues. But it's a combination of both. I think it's a common vision for how you really push performance and scale to their limits.

So, MapR. People are talking about moving up the stack, and obviously there's the big hype about HBase; obviously HBase is a big part of the growth of Apache. How are you guys looking at that? Because you're talking about Drill and some other things. Talk about your view on that.

Yeah, so first there's kind of our way of really creating a standard platform that customers feel comfortable deploying applications on. Everything that touches the application has to be industry standard. So if a customer makes an investment building out an enterprise application, they need to be able to move it between the different distributions with very little switching cost. So everything that's done at that layer needs to be defined within open source. So we're involved with a project called Apache Drill. We're really excited about this: building interactive query-type technology into Hadoop, so it extends the investment that you made in your Hadoop cluster to support another set of use cases. In the case of HBase, we've got a well-defined API out there, and almost half the customers that we work with that run Hadoop clusters also use HBase. But they do struggle with some things. So we added some enterprise enhancements to that, and it's really along the same pillars. How do you make it easier? How do you make it more dependable? How do you make it faster? And there are some implementation issues: if you use HBase, you know that compactions cause a lot of outages and issues with running HBase. Region servers are hard to manage. Region splits are difficult to do and problematic. So we've solved those. We put in standard backups so you can back up and do point-in-time recovery.
You can do mirroring for disaster recovery. And then again, we did a performance speed-up.

So what's the path for a customer? You talk about existing HBase customers. You see them everywhere. What's the path to go from where they are to where you want to take them?

They absolutely move unchanged. It's a very low switching cost process. I mean, basically this is all application-compatible, binary compatibility. So you don't even have to recompile an app to move it over there.

So on MapR, the value proposition that you're stating is that you want customers to bring you in for those features that are needed for enterprises. What's next? What's on your horizon? Because M.C., your partner, was on a panel I ran in Silicon Valley around big data monetization. The monetization conversation really is about utility. So above you, as you move up the stack, is applications. One, how do you guys play in that, above the stack, above you, and what's your vision there?

Yeah, some of it is that we are contributing development resources to make it easier to build apps. So we've deployed a file-based interface for Hadoop, and that makes it easier to build apps. We're contributing resources to the Apache Drill project. I think our HBase implementation is easier to build apps for. But then we also have a number of partners out there, Drawn to Scale and Datameer and Hadapt and a number of others, that are building technologies that make it easier to build apps on the Hadoop stack as well. Further, I think cloud's a big deal. So, you know, we're an OEM with Amazon, so MapR can be chosen off a menu within Amazon Web Services, and you're charged by Amazon. Amazon pays us a royalty. And then we've done quite a bit of partnership, like when you and I last spoke at the Google I/O conference, where we're bundled into Google Compute Engine. And we feel like cloud services are going to be a really, really big innovation.

Workday thinks cloud's a big deal too, that's all.

Yeah, yeah, yeah.
Talk about Google Compute Engine, because right now people like to deploy Hadoop and equivalent technologies on bare metal. And the cloud can be a good place to go, but some say it's not baked yet. And even Google Compute Engine is really not available yet to the crowds. Talk about that dynamic. It's not that cloud isn't right for Hadoop. Maybe it is and maybe it isn't right now, but it certainly will be. That's a destination that's pretty clear. So there's a dynamic there. Can you share, or clear up, or disagree?

Yeah, so I think really the biggest issue with the cloud is where your data resides. So if you've got a petabyte of data, it's hard to move it into the cloud. If you're on Amazon and you're already running within the Amazon cloud for your applications, and that data is already resident there, it's a little easier to do analytics on it. If your app isn't there, how do you get your data into the cloud, right? So that's an inefficient process. But you look at new applications. We have a server vendor that is a customer of ours. If you look at auto manufacturers, or actually even if you look at beverage machines like the Coke Freestyle machines, these are all devices where there are going to be hundreds of thousands to millions of them spread across the world. And they want to be able to do basically telemetry or telematics apps to log back from those devices into a cloud. Well, they need to do that into a regional cloud. So you need to have a data center in Asia-Pac. You need to have a data center on the Eastern Seaboard and the Western Seaboard and so forth. Well, who else can do that but the big cloud providers, right? So Amazon and Google will have that presence, and then you still have the ability to do a global aggregation, right?
So I think as you bring those new apps on board, well, that data's got to come out of basically a device, whether it's a car or a Coke machine or a server or whatever, and it can make its way into the cloud just as easily as it could to your on-prem equipment. So I think there are some use cases there that are just no-brainers for the cloud.

Can you talk a little bit more about M7?

Yeah, sure.

You guys made that announcement. Where are you at? Do you have customers? Just give us an update there.

Yeah, we'll have some reference customers that we'll talk about tomorrow at the keynote session. And it is in beta, so it'll be available by the end of the year. It's basically binary compatible, plug-and-play compatible with your Hadoop application, but there's, I don't know how technical we want to get in this session.

You can geek out if you want for a minute. Geek out.

But there are things that happen within HBase called compactions, and they cause an IO storm. So basically your HBase performance will go in the tank for a period of time while these compactions happen. We basically eliminated that process, so it's all really clean. If you look at region splits, within HBase you have what you'd call key ranges, like A through F, G through B, you know. And those key ranges get overloaded, so you have to do manual region splits, and that's a real headache. That probably causes 30 to 40% of our support calls, as customers have tried to do a region split and ended up with a problem. So we've made that all go away. That's all automated. We automatically balance all the regions. Backups are a really big issue. Disaster recovery through mirroring is a really big issue. Performance is a big issue.

These are table stakes for enterprises.

Right, right. So really, if you look at it now with M7, if you were thinking about doing an app with Mongo, you might look at HBase, and M7 is a more scalable alternative.
And if you're looking at, like, a big blob store where you wanted a key to be able to access a blob, instead of using something like Riak or Scality, you could look at HBase and say, I've got a strong consistency model, I've got the ability to back up the data, and I've got a set of programming APIs that I could use to actually build apps or deploy use cases against that same data.

Hmm, awesome. So what do you think about Impala, Cloudera's big announcement? Because it's so much about real time. We're watching Cloudera; they're the lead horse in the Hadoop race.

Oh, you mentioned Hadapt too, sort of playing in that same sandbox.

Yeah, I think there are a number of SQL and interactive query initiatives out there. So Hadapt's already got product out there, and Hive's been out there for a while. Drawn to Scale has a SQL engine that's a full read-write database. I think with Apache Drill in open source, we're getting a good implementation done that's going to be available on every distribution. So I wouldn't say it's, you know, coming from a business intelligence platform where slice-and-dice query was really important in 1995, it's still important. I don't see it as that much of a breakthrough. I think it's something that's needed, and there's quite a few...

It's an aspirin, it's an aspirin.

Yeah, there's quite a few things you can choose from too, right? There's already been a couple of things on the market, and then there's some new things coming out.

What's a breakthrough in your mind that's coming that hasn't hit the scene yet, that people are working on?

I think the most exciting thing we're seeing even now, that'll be deployed in 2013, is around some lightweight OLTP. So we're working with a telco provider that's building a telco billing application on HBase. And that's a big change, right?
We're always talking about predictive analytics, and here you've got really kind of a lightweight OLTP, and you can see why. You know, 20 years ago you had one phone number, then maybe you gained a few cell phones in the family. Well, now you've got a data plan, and there are so many transactions to be tracked that you need something more scalable than the traditional technologies, so HBase fits the bill for that. We also have a shipping company that we're doing basically a package tracking application with for logistics, and similarly, there's just so much activity there that HBase is a great platform for it. So I think the real breakthrough you're going to see next year is we're going to be talking about some lightweight OLTP.

John, final question, we're getting to the break here, but I want you to answer this final question in two parts. One, as an entrepreneur, you've been on the startup scene, you've done multiple successful startups. It's addictive, you know how it is; we don't want to go there on that question. But give advice for startup entrepreneurs out there, because the market's growing like crazy. So give your perspective as a serial entrepreneur and experienced entrepreneur: advice in this growing ecosystem on how to navigate, what to do, how to get their companies or ideas off the ground. And the second part of the question is your vision for the next five to 10 years in terms of the ecosystem. Obviously, looking back over the last two years, a lot's changed, and looking forward, say five to 10, what would you see as the ecosystem change? So, advice just to either a geek entrepreneur or other companies on how to navigate this growing marketplace.

So that's within the big data market.

Yeah, the whole big data ecosystem.

Within the big data market, I'd say with almost anything you start as a startup, you have to start with customers.
You've got to find something that's super high priority, something they need to get done to drive their business, to either make money for their business or save money for their business, and make sure that's a top-three priority for them. If you interviewed CIOs and CTOs in 2008, that's what you'd hear in every single interview: big data analytics is critical to my business. And then look for where you can add a value proposition. Where could you help? What sort of issues are they running into, or what sort of opportunities could you open up for them by creating a new technology? And that's really where you start, and then you get to the technology part of it. Then go see if you can have the vision, or partner with the right technology team that can create a vision, for an architecture that can provide that value proposition. And then after that, it's all just execution, right?

Okay, your vision for the next five years as this ecosystem is growing. We've got Hadapt getting some success with the SQL thing, you're seeing Impala, YARN, all kinds of new tools, a lot of stuff going on. Hadapt, Hortonworks, Cloudera, MapR: a huge, growing ecosystem. What's your vision for that ecosystem?

I think we're having a good slugfest for who owns the platform, right? And there's a couple of players there. I think that's spoken for, and we're going to slug it out in that arena. And then I don't think the market's ready for a vertically integrated stack yet, right? I mean, this is like being in the database market in 1991 or '92, where you had Oracle, Sybase, Informix, and they'd slug it out based on better technology or better marketing or better sales, whatever. And that's the mode we're in. And then we need other vendors to address other layers of the stack. So we've yet to see really a turnkey application, right? So it'd be great to see some companies really come out with a turnkey one. We are partners with Boeing, where they embed us in one of their applications. But we could see more of that.
I think at the app dev layer, we'll continue to see a lot of innovation. But it probably won't be for 10 or 15 years that we see really a consolidated stack or vertical integration.

And that's when Oracle jumps in, John.

Yeah, yeah, let's take credit for it.

Yeah, maybe this time there'll be a new grand winner. One, great content, great advice for young entrepreneurs trying to make a difference in the world. And obviously the vision is that it's a platform war right now, and it's all great stuff, great innovation and disruption. So it's an exciting time. John Schroeder, CEO of MapR. Thanks for coming on to theCUBE. Appreciate it. This is SiliconANGLE. We'll be right back after this short break.