Okay, we're back here live in New York City for Big Data Week. This is SiliconANGLE.tv's exclusive coverage of Hadoop World, Strata plus Hadoop World, a big event, Big Data Week. We just wrote a blog post on SiliconANGLE.com calling this the South by Southwest for data geeks, and it's my prediction that this is going to turn into quite the geek fest. Obviously the crowd here is enormous, it's packed, and it's an amazing event and we're excited. This is SiliconANGLE.com. I'm the founder, John Furrier, and I'm joined by my co-host. I'm Dave Vellante of Wikibon.org, where people go for free research and peers collaborate to solve problems, and we're here with Jack Norris, who's the vice president of marketing at MapR, a company that we've been tracking for quite some time. Jack, welcome back to theCUBE. Thank you, Dave. I've got to hand it to you. We met quite a while ago now, well over a year ago, and we were pushing at you guys, saying, well, you know, open source, and you said, look, we're solving problems for customers. We've got the right model, we think. This is our strategy. We're sticking to it, and watch what happens. Like I said, I have to hand it to you. You guys really have some great traction in the market. You're doing what you said. So congratulations on that. I know you've got a lot more work to do. Yeah, and actually the topic of openness is one that's pretty interesting. If you look at the different options out there, all of them are combining open source with some proprietary. Now, in the case of some distributions, it's very small, like a proprietary ODBC driver. But I think it shows that, for any solution combining the two, making it more open is important. So what we've done is make innovations, but while we've made those innovations, we've opened up and provided APIs like NFS for standard access, like REST, like ODBC drivers, et cetera. So it's a spectrum. Actually, we were at Oracle OpenWorld a few weeks ago.
You listen to Larry Ellison talk about the Oracle Public Cloud, and he makes a very strong case that it's open. You can move data, it's all Java, so it's all about the standards. Yeah, absolutely. But it's really all about the business value. That's what the bottom line is. So we had your CEO, John Schroeder, on yesterday. John and I were both very impressed with essentially what he described as your philosophy: we announce a product when we already have customers for that product. That's impressive. He was also giving some good feedback to startup entrepreneurs out there, and obviously there's a lot of action going on in the startup community. He said the same thing: get customers, and that's it. Use your tech, but don't be so locked into the tech. Get the customers, understand the needs, and then deliver that. You guys have done great, and I want to talk about the show here. You guys have a big booth, a big presence here at the show. What are you guys learning? How's the positioning? How's the new M7 news hitting? Give us a quick update. So, a lot of news. It first started on Tuesday, when we announced the M7 edition. I brought a demo here for you all, because the big thing about M7 is what we don't have. We're not demoing region servers. We're not demoing compactions. We're not demoing a lot of manual administrative tasks. So what that really means is that we took the stack, and if you look at HBase, about half of Hadoop users today are adopting HBase. So there's a lot of momentum in the market, and it's used for everything from real-time analytics to kind of lightweight OLTP processing. But it's an infrastructure that sits on top of a JVM, that stores its data in the Hadoop Distributed File System, that sits on a JVM, that stores its data in a Linux file system, that writes to disk. And so a lot of the complexity is in that stack. And so as an administrator you have to worry about how data gets written across that.
And you've got region servers to keep up when you're doing writes. You have things called compactions, which increase response time. So it's a complex environment. And we've spent quite a bit of time collapsing that infrastructure. With the M7 edition you've got files and tables together in the same layer, writing directly to disk. So there are no region servers. There are no compactions to deal with. There's no pre-splitting of tables and trying to do manual merges. It just makes it much, much simpler. Let's talk about some of your customers in terms of the profile of these guys. I'm assuming, and correct me if I'm wrong, that you're not selling to the tire kickers. You're selling to the guys who actually have some experience with Hadoop and have run into some of the limitations, and you come in and say, hey, we can solve some of those problems. Is that right? Yeah, that's a pretty good characterization. I think part of it is when you're in the evaluation process and you first hear about Hadoop, it's kind of like the Gartner hype curve, right? This stuff does everything. And of course you've got data protection, because you've got things replicated across the cluster. And of course you've got scalability, because you can just add nodes, and so forth. Well, once you start using it, you realize that, yes, I've got data replicated across the cluster, but if I accidentally delete something or if I've got some corruption, that's replicated across the cluster too. So things like snapshots are really important, so you can return to what it was five minutes before. Performance, where you can get the most out of your hardware. Ease of administration, where I can cut this up into logical volumes and have policies at that whole level instead of at an individual file. So there's a bunch of features that really resonate with users after they've had some experience. And those tend to be our kind of key customers.
There's another phase too, which is when you're testing Hadoop, you're looking at what's possible with this platform. What type of analytics can I do? When you go into production, now all of a sudden you're looking at how does this fit in with my SLAs? How does this fit in with my data protection policies? How do I integrate with my different data sources? And can I leverage existing code? We had one customer, a large systems integrator for the federal government. They had a million lines of code that they were told they'd have to rewrite to run with other distributions, and that they could use just out of the box with MapR. So let's talk about some of those customers. Can you name some names and get specific? Sure. So actually, we had a keynote today, and we had this beautiful customer video that we had to cut because of time. So it's running in our booth and it's streaming on our website. And I think we've actually got some of the bumper here that we kind of inserted. But I want to give a shout-out to those customers, because they ended up on the cutting room floor. Yeah, it's good, we've actually been running it here this week. So one was Rubicon Project, and they're an interesting company. They're a real-time advertising platform and auction network. They recently passed Google in terms of number one ad reach, as mentioned by comScore. And there was a lot of press on that. I particularly like the headline that mentioned those three companies, because it was measured by comScore, and comScore is a MapR customer, and Google's a key partner. And yesterday we announced a world record for the Hadoop TeraSort running on Google. So M7 for Rubicon allows them to address and replace different point solutions that were running alongside Hadoop. And it potentially simplifies their architecture, because now they have more things done with a single platform. It increases performance and simplifies administration.
Another customer is Ancestry.com. You know, maybe you've seen their ads or heard some of their radio spots. They do a tremendous amount of data processing to help with family services and genealogy and to figure out family backgrounds. One of the things they do is DNA testing. So for an internet service to do that advanced technology is pretty impressive. And, you know, it's $99, I believe, and they'll send you a DNA kit. You spit in the tube, you send it back, and then they process that and match and give you insights into your family background. So for them, simplifying HBase meant additional performance, so they could do matches faster, and really simplified administration. So, you know, in Molina Graham's words, it's simpler because they're just not there, those components. Jack, I want to ask you about enterprise-grade Hadoop, and then about Ted Dunning, because he was mentioned by Tim Estes in his keynote speech. So you have some rock stars in the company and on the management team. We had your CEO on, and we've interviewed M.C. Srivas at Google I/O, when we were on a panel together. So I've gotten to know your team. Solid team. I want to talk about Ted in a minute, but I want to ask you about the enterprise-grade Hadoop conversation. What does that mean now? I mean, obviously you guys were very successful at first. Again, we were skeptics at first, but now your traction and your performance have proven there's a market for that kind of platform. What does that mean now, at this event today, as this is evolving, as the ecosystem is not just Hadoop anymore, it's other things? Yeah. There are three dimensions to enterprise grade. The first is ease of use. And ease of use from an administrator standpoint: how easily does it integrate into an existing environment? How easily does it fit into my IT policies? Do you run in a lights-out data center? Does the Hadoop distribution fit into that? So that's one whole dimension.
A key to that is, you know, complete NFS support. So it functions like standard storage. A second dimension is dependability, reliability. So it's not just, you know, do you have a checkbox HA feature? It's do you have automated stateful failover? Do you have self-healing? Can you handle multiple failures and automated recovery? So, you know, in a lights-out data center, can you actually go there once a week and then just replace drives? And a great example of that is one of our customers had a test cluster with MapR. It was a POC, they went on to do other things, and they had a power failure. They came back a week later and the cluster was up and running, and they hadn't done any manual tasks there, and they were just blown away. The recovery process for the other distribution is a laundry list of... So I've got to ask you... What's the third one? The third one is performance. And performance is, you know, kind of raw speed. It's also how do you leverage the infrastructure? Can you take advantage of the network infrastructure, multiple NICs? Can you take advantage of heterogeneous hardware? Can you mix and match for different workloads? And it's really about sharing a cluster for different use cases and different users. And there are a lot of features there; it's not just raw speed. So ease of use, fitting into the existing IT infrastructure and policies; then the whole what happens when something goes wrong, how do you automate that; and then speed. Easy, dependable, fast. And we did the same thing, making HBase easy, dependable, fast with M7. So the talk of the show right now, since you had the keynote this morning, is that MapR marketing has dropped the big data term and is going with Data Cosm. Is that true? So Joe Hellerstein just had a tweet. Joe is a Cal Berkeley professor of computer science, and he's now CEO of a startup. What's the name of the startup? Trifacta.
He's had a good couple of epic tweets this week. So shout-out to Joe Hellerstein. But Joe Hellerstein's tweet just says, MapR marketing has decided to drop the term big data and go with Data Cosm, with a shout-out to George Gilder. So kind of an intellectual kind of humor. So what's your response then? Is it true? What's happening? You're the VP of marketing. Yeah. Well, if you look at the big data term, I think there's a lot of big data washing going on, where architectures that have been out there for 30 years are now all about big data. So I think there's a need for a more descriptive term. The purpose of Data Cosm was not to try to coin something or try to change the big data label. It was just to get people to take a step back and think, and to realize that we are in a massive paradigm shift. And with a shout-out to George Gilder, he recognized what the impact of making compute available meant. He recognized with Telecosm what bandwidth would mean. And if you look at the combination, we've got all this compute efficiency and bandwidth. Now Data Cosm is basically taking those resources and unleashing them and changing the way we do things. And I think one of the ways to look at that is the new things that will be possible. There's been a lot of focus on SQL interfaces on top of Hadoop, which are important. But I think some of the more interesting use cases are taking this machine-generated data that's being produced very, very rapidly and having automated operational analytics that can respond very quickly to change how you do business: how you're communicating with customers, how you're responding to different risk factors in the environment for fraud, et cetera, or just improving your response time to kind of cost events. Yeah, I met it earlier. He called it actionable insight. Then he said assigning intent and being able to respond.
Well, it's interesting that you talked about the George Gilder riff and getting to the concept, abstract concepts. But he also was very big in supply-side economics. And so if you look at the business value conversation, one of the things we pointed out yesterday and in this morning's opening review was that the top conversation is insight and analytics as a killer app. Right now the app market has not developed, and that's why we like companies like Continuuity and what you guys are doing. Under the hood, it's being worked on at many levels, performance and those three things. But analytics and insight is a no-brainer. The other one's business value. So when you look at that kind of Data Cosm, I can see where you're going with that. And that's kind of what people want, because it's not so much like, I'm Republican because he's Republican, George Gilder, and he bought The American Spectator, everyone knows that. So obviously he's a Republican. But politics aside, the business side of what big data is implementing is massive. I guess that's a Republican concept. But not really. I mean, business is all parties. So relative to Data Cosm, I mean, no one talks about e-business anymore. We were talking to IBM at the IBM conference, and they were saying, hey, that was a great marketing campaign, but no one says, hey, are you in e-business today? So we think that big data is going to have the same effect, which is, hey, do you have big data? No, it's just assumed. So that's what you're basically trying to establish, that it's not just about big data. Yeah, let me give you one small example from a business value standpoint. And Ted Dunning, you mentioned Ted earlier, is our chief application architect and one of the co-authors of the book on Mahout, which deals with machine learning.
He dealt with one of our large financial services companies, and one of the techniques on Hadoop is clustering, nearest neighbors, different algorithms, and they looked at a particular process and they sped up that process by 30,000 times. So there's a blog post that's on our website. You can find out additional information on that. On this one point. But I think, to your point about business value and what does Data Cosm really mean, that's an incredible speedup in terms of performance, and it changes how companies can react in real time. It changes how they can do pattern recognition. And Google did a really interesting paper called The Unreasonable Effectiveness of Data, and in there they say simple algorithms on big data, on massive amounts of data, beat a complex model every time. And so I think what we'll see is a movement away from data sampling and trying to do an 80-20, to looking at all your data and identifying where are the exceptions that we want to increase, because they're revenue exceptions, or that we want to address, because it's a cost or a fraud issue. Well, that's why I would give a shout-out to the guys at Digital Reasoning. Tim Estes, one, plugged Ted. I idolize him in terms of his work. Obviously his work is awesome. But two, he brought up this concept of the understanding gap, and he showed an interesting chart in his keynote, which was the data explosion. It's straight up, right? It's a massive amount of data. 64% unstructured by his calculation. Then he showed a flat line called attention. So as data has been exploding over time, attention, user attention, is flat, with some uptick maybe. So users and humans can't expand their minds fast enough, so machine learning technologies have to bridge that gap. That's analytics, that's insight. There's a big conversation now going on about more data or better models, with people trying to squint through some of the comments that Google made and say, does that mean we just throw out the models?
But the question I have is, do you think, and are your customers talking about, okay, well, now that they have more data, can I actually develop better algorithms that are simpler, and is it a virtuous cycle? Yeah, I think, I mean, there's a lot of debate here, a lot of information. But I think one of the interesting things is, given the compute efficiency that we have and given the bandwidth, you can take a model and then iterate very quickly on it and arrive at insight. In the past, with that amount of data and that amount of time to process, it could take you 40 days to get to the point you can now reach in hours. Right, so I mean, a great example is fraud detection. So we used to sample, and six months later, hey, your credit card might have been hacked, and now you get a phone call or you can't use your credit card or whatever it is. But there are still a lot of use cases, you know, weather is an example, where modeling and better modeling would be very helpful. But excellent. So Data Cosm, are you planning other marketing initiatives around that, or is this sort of tongue-in-cheek fun, throwing a little red meat out there, chum in the water, as John says? You know, what really motivated us was, you know, theCUBE's here talking for the whole day. What could we possibly do to help give them a topic of conversation?
Okay, Data Cosm. Now of course we found that on our proprietary HBase tool, you know. Jack Norris, thanks for coming in. We appreciate your support. You guys have been great. We've been following you and will continue to follow you. You've been a great supporter, and I want to thank you personally while we're here. MapR has been a generous underwriter in support of our great independent editorial, and we want to recognize you guys. Thanks for your support, and we continue to look forward to watching you guys grow and kick ass. So thanks for all your support, and we'll be right back with our next guest after this short break. Thank you.