Back live in Silicon Valley in San Jose, California for the continuous coverage of SiliconANGLE.tv and Hadoop Summit 2012. This is ground zero for the alpha geeks in big data. This is the tech elite; we call them tech athletes, and we're excited to cover it on the ground and extract the signal from the noise here. This is theCUBE, our flagship telecast. I'm joined by my co-host Jeff Kelly from wikibon.org, the best analyst in the business. Jeff, welcome back. Another segment into day one. Loving every minute. Okay, we're here with our guest, Jack Norris, the CMO of MapR. Jack, welcome back to theCUBE. You've been on a few times. So you guys have some news?

Yes, yes.

So let's get right to the news. You guys are a player in this business, so share your news with the folks.

Excellent, we'll jump right in. Two big announcements. Today we announced that Amazon is integrating MapR as part of their Elastic MapReduce service, and both editions, our free edition M3 as well as M5, are available directly with Amazon in the cloud.

So what's the value proposition? Why would a customer say, all right, I want to do this in the cloud, MapR on the Amazon cloud, rather than doing it on premise?

Okay, so there are a lot of value propositions all rolled up into one here. First of all, the cloud allows them to spin up very quickly. Within a couple of minutes, you can get hundreds of nodes available. And depending on where you're processing the data, if you've got a lot of data in the cloud already, it makes a lot of sense to do the Hadoop processing directly there. So that's one area. A second is you might have an on-premise deployment and need disaster recovery. MapR provides point-in-time snapshots as well as wide-area replication, so you can use mirroring, and having Amazon available as a target is a huge advantage.
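As a rough illustration of the "spin up hundreds of nodes in minutes" point, here is a sketch of building an Elastic MapReduce launch request that selects a MapR edition. This is an assumption-laden sketch, not the confirmed integration mechanics: the `NewSupportedProducts` field and the `"mapr"` product name are drawn from the EMR API of that era and should be verified against the EMR documentation, and the actual AWS call is left commented out.

```python
# Sketch: assembling an EMR run_job_flow request that selects a MapR edition.
# "NewSupportedProducts" and the "mapr" product name are assumptions based on
# the historical EMR API; verify against the EMR API reference before use.

def mapr_emr_request(name, edition="m5", nodes=100):
    """Return keyword arguments for an EMR run_job_flow call (boto3-style)."""
    return {
        "Name": name,
        "Instances": {
            "MasterInstanceType": "m1.large",
            "SlaveInstanceType": "m1.large",
            "InstanceCount": nodes,               # hundreds of nodes, minutes
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Select the MapR distribution: M3 (free) or M5 (enterprise features).
        "NewSupportedProducts": [
            {"Name": "mapr", "Args": ["--edition", edition]}
        ],
    }

request = mapr_emr_request("analytics-cluster", edition="m5", nodes=100)
# boto3.client("emr").run_job_flow(**request)    # real call omitted in sketch
print(request["NewSupportedProducts"][0]["Args"])
```

With a request shaped like this, switching between the free M3 edition and the M5 edition is a one-argument change.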
And then there's also a third application area where you can do the processing of the data in the cloud and then synchronize those results to an on-premise cluster. So basically, process where the data is, then combine the results into a cluster on-premise.

So you don't have to move the raw data on-premise.

Exactly, it's all about doing the processing where the data is.

Well, the whole value proposition of big data in general is to move data as little as possible. You bring the computation to the data if you can.

Yep.

So what's your take on this event? This is the fourth Hadoop Summit, Hortonworks has now fully taken over the show. Talk about what you see out here in terms of how the other vendors play, the attendees, the vibe you're seeing here.

There's a lot of excitement. I think the big difference from last year, which seemed to be very developer focused, is that we're seeing a lot of presentations by customers. A lot of information was shared by our customers today, and it was fun to see that. comScore shared their success, Boeing, YapMap; it was great for us.

Fantastic. So let's look at Amazon. Amazon, first of all, is the gold standard for public cloud, right? They've knocked it out of the park; everyone knows Amazon. But they've been criticized on the big data front, because of the cycle times involved on EC2, by some developers. For web servers, spinning up and down is no problem, and we're seeing businesses like Netflix run on Amazon, so Amazon is no stranger to running at scale in the cloud. But Hadoop has kind of been a kludgy thing for Amazon. So talk about why Amazon and you guys are a good fit. Obviously the market reach is great, so you now have a huge addressable market. Are you helping solve some of that complexity on the MapReduce side? What's the core?
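The disaster-recovery workflow described above, point-in-time snapshots plus mirroring to a remote target such as an Amazon-hosted cluster, could be driven from MapR's command-line tool. This sketch only assembles the `maprcli` invocations rather than executing them; the flag names follow MapR's documented volume commands but should be treated as assumptions and checked against the docs.

```python
# Sketch: building maprcli commands for point-in-time snapshots and a
# mirror volume fed from a remote (e.g. cloud-hosted) source cluster.
# Flag names are assumptions based on MapR's CLI docs; verify before use.

def snapshot_cmd(volume, snapshot_name):
    """Command to take a point-in-time snapshot of a volume."""
    return ["maprcli", "volume", "snapshot", "create",
            "-volume", volume, "-snapshotname", snapshot_name]

def mirror_cmd(mirror_name, source_volume, source_cluster):
    """Command to create a mirror volume replicating a remote source."""
    return ["maprcli", "volume", "create", "-name", mirror_name,
            "-type", "mirror",
            "-source", f"{source_volume}@{source_cluster}"]

# Example: protect an on-premise volume, then mirror it to a cloud target.
print(" ".join(snapshot_cmd("sales_data", "nightly-2012-06-13")))
print(" ".join(mirror_cmd("sales_mirror", "sales_data", "onprem.cluster")))
# Execution (omitted in sketch): subprocess.run(snapshot_cmd(...), check=True)
```

The same mirror mechanism works in both directions, which is what enables the "process in the cloud, synchronize results on-premise" pattern.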
I guess my first response would be, I think every customer would love to have that type of kludge if they could have the success that Amazon has had with Hadoop. They have a huge number of Hadoop deployments, and they've been very, very successful. I think we're...

Well, you know what I mean. It's kludgy everywhere right now; that's the problem. Amazon has huge scale, and Hadoop's not a natural fit there.

And by not a natural fit, you mean for the data component, and the HBase component, for example. Where Amazon has made it very frictionless is the ability to spin up Hadoop to do the analysis. The gap that was missing is some of the HA capabilities, the data protection features, the disaster recovery, and that's where MapR now gives options to those customers. If they want those kinds of enterprise-grade features, they now have an option within EMR: they can select M5 and get moving. If they want performance and NFS, they've got the M3 option as well.

Well, congratulations. I think it's a great deal for you guys and for Amazon customers. My question for you is, as you explore the enterprise-ready equation, which has been a big topic this week, what does that mean to you? Because it means different things to different people. It depends on how high up toward OLTP you go, right? How far from batch toward real-time transactional levels. Low-end batch, no problem, but as you get closer to real time, it becomes a bit of a gray area. And obviously there's security with HDFS as well.

Yeah, so Hadoop represents a strategic platform, right? Deploying it in an organization, moving from an experimental, lab-based setting to a production environment, creates a different set of feature requirements. How available is it? How easy is it to integrate? How do I protect that information, and how do I share it? So when we say enterprise grade, we mean you can have SLAs.
You can put the data there and be confident that the data will remain there, that you have point-in-time recovery for an application error or a user mistake, and that you can have disaster recovery features in place. And then the integration is about not reinventing the wheel to get access to the information. Hadoop is very powerful, but it requires interacting through the HDFS API. If you can leverage it, as you can through MapR, with NFS, standard file-based access, standard ODBC access, and open it up so I can use a standard file browser and standard applications to see and manipulate the data, that really opens up the use cases. And then finally, what we announced in 2.0 was multi-tenancy features. As you share that information, all of a sudden the SLAs of different groups come into play: these guys need it immediately, and some low-priority batch jobs are going to impact that. So you want the ability to protect, to isolate, to secure information, and basically have virtual clusters within a cluster. Those features are important in the cloud, but they're also important on on-premise platforms.

That's also great for the hybrid cloud environments out there. Cracking the code on multi-tenancy, that's huge. Right now most enterprises are in private cloud because it's basically an extension of their data center, and you're seeing a lot more activity in the hybrid cloud as a gateway to the public cloud.

And frankly, people who are experimenting with Apache Hadoop and the other distributions find that the policies are either at the individual file level or the whole-cluster level. It almost forces the creation of separate physical clusters, which kind of goes against the whole Hadoop concept.
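The NFS point above is that once the cluster is mounted as a file system, any standard tool or program can read and write cluster data with ordinary file I/O instead of going through the HDFS API. A minimal sketch, where the mount point `/mapr/my.cluster` is purely illustrative and a local directory stands in for it so the code runs anywhere:

```python
# Sketch: writing and reading "cluster" data through an NFS mount with plain
# file I/O. MOUNT is a stand-in for a hypothetical /mapr/<cluster> mount;
# any directory behaves identically, which is exactly the appeal of NFS access.
import os

def append_records(base_dir, relative_path, records):
    """Append newline-delimited records using ordinary file operations."""
    path = os.path.join(base_dir, relative_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a") as f:
        for rec in records:
            f.write(rec + "\n")
    return path

# With a real cluster this might be MOUNT = "/mapr/my.cluster".
MOUNT = "./demo_mount"   # local stand-in so the sketch is runnable
p = append_records(MOUNT, "logs/clicks.txt", ["click:home", "click:cart"])
print(open(p).read().splitlines())
```

A standard file browser, a shell script, or a legacy application could read the same path with no Hadoop client installed, which is what "opens up the use cases" here.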
So the ability to manage a logical layer, to have separate volumes where you can apply policies that apply to all the content underneath, makes it much, much easier for administrators to deal with these multiple use cases.

And with Amazon, Amazon has always been one of those cases for the enterprise, and this has been talked about for years: put the credit card down, go play on Amazon, but then bring it back into the IT group for certification. So I think this is a nice product for you guys to bring that comfort to the enterprise: hey, come play on Amazon, it's bulletproof, enterprise ready. So congratulations.

We're very excited about it.

And can we talk use cases? What are you seeing in terms of evolving use cases as Hadoop continues to become more enterprise grade, depending on your definition? How is that impacting what you're seeing, even if it's just the mindset, now that it's starting to gain acceptance, okay, we can trust that our data's going to be there, et cetera?

So there's a huge range of use cases that differ by industry and by the kind of data set being used, everything from a deep store where you can do analytics, where you're selecting the content, to something that's very analytic and machine-learning intensive, where you're doing sophisticated clustering algorithms, et cetera. Where we've seen an expansion of use cases is around real-time streaming: you get streaming data sets entering into the cloud, and some of the more mission-critical data, moving beyond just clickstream data, things where dropping a few records is not a big deal, toward the kind of content the business trusts.
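The volume-level policy model described earlier, where a policy set once on a volume covers all the content underneath it, could be scripted as one volume per tenant, each carrying its own quota and replication settings. As before, this only assembles hypothetical `maprcli` invocations; the flag names follow MapR's documented volume commands but are assumptions to verify.

```python
# Sketch: one volume per tenant, each with its own quota and replication
# policy, approximating "virtual clusters within a cluster". Commands are
# assembled, not executed; verify flags against the maprcli documentation.

def tenant_volume_cmd(tenant, quota_gb, replication=3):
    """maprcli command creating a policy-carrying volume for one tenant."""
    return ["maprcli", "volume", "create",
            "-name", f"{tenant}_vol",
            "-path", f"/tenants/{tenant}",      # illustrative mount path
            "-quota", f"{quota_gb}G",
            "-replication", str(replication)]

for tenant, quota in [("marketing", 500), ("fraud_ml", 2000)]:
    print(" ".join(tenant_volume_cmd(tenant, quota)))
```

The administrative win over file-level or cluster-level policies is that isolation, quotas, and protection are set once per volume instead of per file, and without standing up a separate physical cluster per group.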
So talk a little bit about the streaming aspects, because of course when we think of Hadoop, we think of a batch system. Streaming data into Hadoop is something we haven't heard a lot about, so how do you guys approach that?

So one of the artifacts of HDFS, which is a distributed file system that stores data in the underlying Linux file system, is that it's append-only. As an administrator, you decide how frequently to close the file, on an hourly basis or every eight hours, because you have to close the file for other applications to see the data that's been written. One of the innovations we pursued was to rewrite that and create a dynamic read-write layer, so you can continue to write data and any application sees the latest data that's written. You can mount the cluster as if it's storage and just continue to write data there. It really opens up what's possible; companies like Informatica, with their Ultra Messaging product, integrate directly with MapR and get those advantages.

So what kind of advantages does that provide to the end user? Translate that into real business value. Why is that important?

Well, one example is comScore. comScore handles 30 billion objects a day as they go out and try to measure the use of the web. Being able to continually write and stream that information, scale to handle it in real time, do analytics, and turn around data faster has tremendous business value to them. If they're stuck in a batch environment where the load times lengthen to the point where they can't keep up, they're actually reporting on old news, and I think the analogy is that forecasting rain a day after it's wet isn't exactly valuable.

Jack, the enterprise-ready deal with Amazon is obviously a big story, a big coup for the company. What's next for you? I want to ask that and make sure you get out there what's on your agenda for the next year.
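To put comScore's figure in perspective, 30 billion objects a day implies a sustained ingest rate in the hundreds of thousands of objects per second, assuming arrivals spread over the full day. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope: sustained ingest rate implied by 30 billion objects
# per day, assuming a uniform arrival rate over 24 hours.
objects_per_day = 30_000_000_000
seconds_per_day = 24 * 60 * 60        # 86,400

rate = objects_per_day / seconds_per_day
print(f"{rate:,.0f} objects/second")  # roughly 347,222 per second, sustained
```

At that rate, hourly file-close cycles mean each reporting pass is working against data that is already more than a billion objects stale, which is the "forecasting rain a day after it's wet" problem in concrete terms.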
But then I want you to take a step back a year, maybe a year and a half, and look at how much has changed in this landscape. Share your perspective, because the market has gone through an evolution where there was a market opportunity and then everyone said, oh my God, it's bigger than we actually thought. I mean, Jeff Kelly's groundbreaking report about the 50 billion dollar market is now being talked about as too low. Big data has absolutely opened up into something huge, and it's changed some of the tactics around strategies: your strategy, Hortonworks' strategy, even Cloudera's. And it's still evolving. So what's changed for the folks out there from a year or a year and a half ago to today? And then, looking ahead at the next 12 months, what's on your agenda?

Well, if you look back, I think we've been fairly consistent. I'm not going to take credit for the vision of our CEO and CTO, but they recognized early on that Hadoop was a strategic platform, and that to be a strategic platform that applied to the broadest number of use cases and organizations required some areas of innovation. In particular, how it scaled, how it was managed, and how you stored and protected the information needed a rearchitecture. And I think architecture matters when you're going through a paradigm shift; having the right one in place creates the ability to speed innovation. If there's anything that's changed, I think it's that the speed of innovation has increased even further in the Hadoop community. It's created a focus on these enterprise-grade features, on how we store this valuable information and continue to extract value from it.

And one of the observations I'll make is that it really forces everyone to just mind their own business and get the products out, you know what I'm saying? We've seen the product focus become the number one conversation.

What we've seen is customers start and then expand rapidly.
Some of that's due to data growth, but a lot of it is due to more and more applications being delivered. The value gets extracted from the new platform, and success breeds success.

Well, congratulations on all your success. It's a great win with Amazon Web Services: it makes things a little bit easier and more robust, with more features for them and you, and more revenue for both. And I want to personally thank you for your support of theCUBE. We've expanded with the new Studio B you saw, for extra interviews, and we want to expand the conversation. Thanks to your generous support, we can bring independent coverage out to the market and this great community. Thanks for helping us out; we appreciate it.

So thank you.

Okay, Jack Norris with MapR. We'll be right back to wrap up day one. Jeff and I will give our analysis right after this short break.