Okay, we're back here live in Silicon Valley, San Jose Convention Center, this is theCUBE, this is Hadoop Summit, this is Hortonworks, Yahoo, the ecosystem of big data developers in the Hadoop ecosystem talking tech. We're theCUBE, we're here at ground zero for two days of live coverage. I'm John Furrier, the founder of SiliconANGLE, and I'm joined by my co-host. I'm Dave Vellante at wikibon.org. Merv Adrian is here. Merv gave a keynote talk today. He just completed a big survey. Merv's with Gartner, the lead big data analyst down there. Merv, welcome to theCUBE. Thanks, great to be here, and I can't take credit for the survey. Couple of my colleagues did it, that's the nice thing about being in a big research firm. You get to take advantage of other people's work. So how'd that work? You sort of gave them input to help design it, and then they go off and execute it, and then you get the data dump, right? Yeah, the big research firm process is very collaborative, we share content early, we share content late, we design survey instruments, we analyze the information together, and we write the research together. So I got to help shape some of the questions, but I can't take any of the credit. They did a great job. It's our second year. It's a thing we call the Gartner Research Circle. It's a set of people who are enterprises. The mean for this particular survey was about 3.2 billion in size, 5,100 employees. Global, every geography represented. And it's the second year we've done the survey. So you get some longitudinal comparability, which is very useful when you're looking at an emerging market like this. I'm really excited that you're on theCUBE because, one, we've been following your career obviously in big data since the beginning, since it started, actually.
And we do a lot of the events, I'll see all the Hadoop Summits and Hadoop Worlds and Strata and whatnot, and you're active on Twitter, so we see you out there every day, you're doing some great work, really one of the best analysts on the planet in terms of your data. You go deep, you poke at things that people aren't seeing, and so just kudos to you. Thanks for coming on theCUBE and sharing your awesome insight. So let's get right to it. So what are you seeing right now? Let's go in and talk about some of the things that you're seeing here today. Obviously, you mentioned security in your keynote. You got Storm, you have Knox, you have... A lot of different stuff going on. What are you seeing? What are you looking at right now? What's emerging out of this summit that you could share with the folks that wasn't mentioned in the keynote? Sure. Well, the theme there was maturity, the fact that a lot of things are actually getting close to being used by real people for real things. Obviously, there's real deployment going on now. Serious use of the technologies is underway. And I have to tell you, I was reminded during my prep for the speech, looking at a little bit of past coverage, and I remember seeing something from a year ago about YARN and how cool 2.0 was gonna be and here are all the great things. And I heard almost exactly the same speech today, and that's an artifact of the open source process. Things are visible in the open source community long before they're ready for mainstream consumption. And that's a good thing, but it also is a challenge because people come up and remind you, you said this last year and it's still not shipping. What's the deal? Commercial software companies get to cover that. They don't talk about it until they're reasonably ready to put it in beta. So what we talked about last year is real now. And some things that we hadn't talked about last year are real now. So search is really showing up.
The folks at Lucidworks are really excited about everybody coming back to Lucene. We're seeing the deal they did with MapR. We're seeing the work that Cloudera is doing to leverage this and other people are doing search as well. So once you bifurcate the data resource into the pre-curated optimized data warehouse side and what everybody seems to be increasingly calling the data lake, the undifferentiated pile that we're gonna go and pull stuff out of from time to time, search becomes the absolutely impossible to avoid first step to do anything useful there. So that I think is gonna be huge. The rise of interactive analytics on Hadoop is probably the second biggest top of mind theme right now. Even though there's a lot of, I think, misstatement about it being real time, there's a continuum here. Batch is at one end and real time is at the other. And what's in the middle is something called interactive and that's what we're doing now. There's a little activity in real time, but not much. So I got to ask you, obviously, Dave and I were talking about this on the intro, is that Hadoop has had a run, there's still so much demand for it, but there's been an effort to almost force commercialization a little bit too early, whether it's making it more of a business-oriented event or trying to push for monetization. We saw just two years ago the big data fund from Accel and Cloudera saying it's going to be a tsunami of apps, but yet only analytics kind of hits the stage. So I got to ask you, do you see applications coming next? Do you see platform stability and community coherency or stability as job one to the bigger picture of app developers? Because MongoDB has got a lot of traction on the developer side because of how easy it is with the LAMP stack. So where are we with Hadoop? I mean, are we yet to hit that wave of apps coming in or are we still stuck on the analytics and data warehousing piece and platform stability? How do you see that?
I think we're in the early moments of the platform becoming accepted as a vehicle for moving forward with a variety of different applications. There are a couple of necessary preconditions here and really the gaps that I talked about at the end of the presentation are those preconditions. We have to have confidence in the security of the platform. That's the number one problem. We just heard Intel talking about it. There are vendors out there who currently are making their play on it. Zettaset is talking about their distribution as the secure Hadoop distribution because the market wants that before they rely on it. We need a governed platform and we heard a little bit today about Falcon. We need security and we heard about Knox. For the open source community to catch up with what commercial software platforms already provide, they've got to fill those gaps. Once those gaps are filled, we'll see the kind of explosion you're talking about. You called it the weakest link in your talk today and it clearly is, you put your data in Hadoop. The biggest problem, what I said was the biggest issue we see year after year for at least five years in Gartner surveys of CEOs, when we ask them about IT, what's the number one thing? It's trust. Can I trust what these guys are handing me? And I spent a decade building governance for data in my organization and in come these hoodies with their new Hadoop platform with a bunch of data. Where did the data come from? We downloaded it from the internet. Well, what did you use on it? We got these open source tools. Okay, who's the owner? What's the provenance? How private is it? I don't know the answers to any of those questions. Okay, so you want me to rely upon the results you've derived from working with that data? I don't think so. My customers, Gartner's customers, are mainstream organizations for whom that trust in the data remains the number one issue and until they can have confidence in it, that barrier is still going to be there.
So one of the big findings of your survey was that 31%, and it was consistent from year to year, last year to this year (last year there were 470 people, this year over 650), 31% said they have no plans over the next two years to initiate big data investments. Yeah. So on the one hand, that sounds pretty big. On the other hand, I'm not that surprised depending on who you were talking about. Now were you predominantly talking to IT people or was it sort of a mix? Were you going after the line of business folks and would you expect that response to be different if you were going after those? This was very skewed to IT and it was across all industries and all geographies and sizes as I said. But at the end of the day, it's IT that the CEO and CFO rely on if they're going to make business decisions. And certainly a lot of experimentation happens outside and I talk with the guy I call Chuck with the cluster in the closet who's off experimenting outside the IT organization and coming up with exciting new things. But at some point if there's value, it comes back to IT and the suits are right. They are the guys who get called in to clean it up and make sure that it's always available and that it can always be relied on. And for that to happen, to your earlier question, we've got to fill those gaps. And I don't know that the third of people who aren't ready to invest in big data are not doing so because they don't think the platform is ready. I think they're not doing so because they don't have a real understanding of the value. I would say... confidence, maybe, too. Yeah, confidence is an issue but what I'm really going after here is that I get as many questions still at Gartner about what is this thing? Is it really just hype? Is there real value there? We still have some more selling to do to get some of those guys to believe that this isn't just the latest marketing campaign from the IT organization.
Yeah, because business value oftentimes trumps some of those security concerns and the confidence concerns. Quite often. Because of business value they'll oftentimes take the risk but we've seen this movie before, Merv. I mean, the three of us have been around for a while. We started with distributed computing. We're certainly seeing it now with the cloud. Do you think that the dissonance between shadow IT, lines of business, whatever you want to call it, and core IT, that gap will close faster than it has historically? Are we going to make the same mistakes that we have over and over and over again and create a mess and then ask IT to come in and clean it up? What are your thoughts on that? Well, if I could restate your question, I think you're asking me do even people who do know history find themselves doomed to repeat it? I think the answer to that is yes. The truth is we do keep doing this and as we look at wave after wave of technology innovation, when its value is proven it becomes hardened, it gets put into place and it becomes the new religion until the new prophet appears in the wilderness, and we're here at the Hadoop Summit but there's already people running around here talking about forget that and let's start talking about Spark, okay? So there's always innovation at the edge and consolidation at the center. That's a good thing, it's a very healthy thing, and one of the most interesting metaphors I heard today was the peloton metaphor, and if you heard that at the end, if you know bike racing, the peloton is all those cyclists that were together, and every once in a while a few guys break away, and if you're watching on TV or if you're one of those riders, that lasts for a while until the peloton catches up.
It's a wonderful metaphor for the open source community that people will come up with extraordinary innovations and folks out here are going to say you know what, I can write that, I can reproduce that code and I can do it out here in the open source community. Well that's democratization, that's competition, that's a free market. And you talked about alternative architectures to Hadoop. Are you suggesting that open source ultimately will be the peloton that catches up to those alternative architectures or can they add enough value fast enough? This is an entirely new market dynamic. We've seen it develop with increasing rapidity since I guess Linux was the first really big poster child, right? This time around the cycle is a lot quicker. Yeah, sure is. I mean nine years ago they were adding the distributed file system to Nutch, right? And here we are with 2,500 plus people, 70 sponsors selling products and an enormous number of clients I talk to every day who are calling saying, okay, I need to get involved in this thing. How's it going to help me and how do I get started? So we talk about those guys all the time. So I talked to a CIO, he was talking about compliance issues, that's an inhibitor, but he's still doing a huge POC, billion dollar operating budget they have, and they do a huge POC on Hadoop. Big issue is compliance. So I was a little skeptical there in terms of confidence, but still value, just looking at the value. But I got to ask you about alternatives because we're living in an API world where there's layers of abstraction where value can be created at any different layer. Think OSI stack, whatever analogy you want to use. Amazon has certainly shown that, hey, I can produce a kick-ass cloud stack and do big data. Dave and I were also commenting throughout the past month, at all the different CUBE events we've done, that the industry standard bodies now are open source communities. Almost as a matter of fact, I'm pretty confident on that.
But I got to ask you about the role of Amazon as an agitator, as an innovator, as a commoditizer, and also OpenStack. So now these are market forces that are somewhat adjacent to Hadoop pure play. So is that another angle that people have to catch up to or what's your take on that? How does that all shake out for you? I tend to divide things into twos a lot and I look at the web-native organizations out there, businesses that started on the web, and I think that it's unlikely that they'll ever go to an on-premise world. They may have a tiny fraction of their infrastructure at some point in the future on-prem, but if they started in the cloud, they're probably going to stay in the cloud. For the rest of the world, tracking what's going on at Amazon is a little tricky because they're quite opaque. They report large aggregate numbers and they tell us stories. The stories are useful, but it's not always clear how representative they are. So we know that they started five and a half million Elastic MapReduce clusters between March of last year and March of this year. That's a pretty amazing number. Even if half of them were Hello World, that's a lot of Elastic MapReduce in the cloud that nobody bought CapEx for. Nobody went out and bought servers and storage and ramped up, they just used an interface. Great place to experiment and we know of people who are doing useful things in the cloud. What we don't know until we really do some serious formal research, and I hope we can get to it, is even those poster children, how much of their portfolio is inside Amazon? Lots of people have some stuff out there and it's working very well for them, no knock. Shadow IT, some tire kicking, no problem, test dev, no big deal. And some production apps. Right now, Amazon's stated vision and goal is to be the high volume, low margin provider. And that's a very, very interesting differentiation from all the other commercial IT providers. Most of us like to think about extracting margin.
Certainly the Wall Street guys who follow those companies want to think about margin. Amazon is determined to drive margin down and to dominate the market by making themselves unassailable. It's really hard to compete with them. And bless you. And at the same time, preserving the value proposition that you would normally expect to be associated with high margins. Merv, I want to come back to this notion you were talking about, about the open source cycle. Sure. And the pace of innovation and how much faster it is now than it was, say, during the Linux days. We asked this of all our smart guests but we want to hear your opinion. Are you going to ask me too? Yeah, we are. Will there be a Red Hat of Hadoop? Not in the next three years. Well, it took Red Hat a long time, right? Because the world kind of left them alone. But John Furrier's latest answer was, well, the Red Hat of Hadoop might just be Red Hat. Not yet. They haven't moved in that direction. But look, I think it's clear, as I said in my speech, that at every layer, I've got this six-layer model, pick your own number of layers. But at every layer of the stack, there are multiple alternatives. Even the things that Apache calls core canonical Hadoop, like HDFS itself or MapReduce, are substitutable by other pieces already, some of which are open source, some of which are not. Every distribution is a composition. And there's lots of room for more compositions. And the idea that anytime in the near future, the set of possible technologies and functional capabilities is going to become so uniform that somebody can really own it, seems unlikely to me. I don't think we're anywhere near the end of the innovation cycle where new things are going to continue to pop up in the stack. I love the way you laid out the horses on the track. You had leading pure plays. You had others, specialists. You had the mega vendors and the mega partners. Like a NetApp or a Dell, for example.
And a good way to look at sort of the ecosystem. Well, my customers, Gartner's customers, tend to skew to the somewhat conservative mainstream adopters. And as interested as they might be in a pure play and as much as they might see value in something truly new and innovative, they don't want to get fired. They also have a large investment in the rest of their portfolio. And if they're, pick one, pick an IBM shop because there's lots of them. And they've got Tivoli and they've got Guardium and they've got DataStage. And they say, if I'm going to drop this new rock in the middle of this lake, how bad do I want the ripples to be? If this thing can enter the water smoothly and not disturb the surface, that makes a lot of sense. That gets back down to your value question, which is understanding the consequences is a way to assess the value. I was talking to one bank in Boston, a financial institution, they said, hey, I got a lot of NetApp filers. I got some EMC drives running email. I have Red Hat and I got IBM services integrating all this stuff. So Amazon, what are you, high? I need OpenStack. That's a direct quote. So I'm like, okay, so why OpenStack? It was, honestly, OpenStack gives me a warm blanket of comfort because I know that I might be able to tool it and develop with it. So do you hear that same feedback? I mean, is that something consistent that you're hearing? Because again, that's legacy. Is he saying OpenStack or is he saying HP OpenStack, for example, or Rackspace? No, he's just saying OpenStack in general, OpenStack Summit, we just came back from OpenStack Summit. So that was essentially saying, OpenStack's the dream scenario for him because he now can look at the ripple effect and say, okay, I know what I'm dealing with with OpenStack. It's modular, I can code, I can take the best practices of this and plug it into there. To your first point, they want safety too, right?
Amazon was a little bit risky, it was a little bit of a black box for them, they were concerned, so that's what we were referring to. People don't want black boxes unless they are completely confident that the black box is going to operate seamlessly without errors and they have confidence in the person providing the black box. So you have to separate the question of which architecture do I want from the question of who am I getting it from? And the mega-vendors on that chart are people who are perceived as being strategic partners to their customers. And so they've already made that mental leap that I can rely on these guys. And so if I'm going to get this technology from them, I have a level of comfort. And if I'm an alpha and I want to play with every piece of technology myself, that's a different kind of company. There are plenty of them, there always will be, but they represent a small slice of the market. So if the Gartner customers are ultimately controlling the chessboard and the spending, which remains to be seen. Yeah, I wouldn't suggest that. No, I'm just saying, if in fact that's the case, well they have historically, and certainly in the enterprise with mobile that's changing, but to the extent that they are able to control that, that would say the cartel ends up winning. Oracle will make it that position. The rich get richer. When has it ever been any different? The oligopoly. Okay, so we're getting time crunched here. Merv, we'd love to have you on theCUBE as long as we can because you're a great friend to us and on Twitter. You Follow Friday me a bunch of times and I get you back there. So there's nobody in front of us right now going like this. We're high-five. No, no, we have us, the airplanes are lining up as Mark Hoppe has always said. They're all wanting to land and they're running out of fuel. So I got to ask you the final question though, which is, what are you looking at now?
So you, 27 days ago, I saw your tweet, I got a lot of briefings, my presentation's in flux. You had a lot of changes going on. You had a great keynote here, but what's changed since your keynote in your mind from the conversations, what you were expecting, not expecting, and what are you looking at now? What are you poking at? Since this morning? What are you poking at on the horizon from the conversations, from the feedback? I might have missed that. I like that trend. What are you eyeing in terms of, what rocks are you picking up and looking under? What areas are important to you? Well, I start with where I perceive the gaps and I've kind of already identified a few of those, security and governance and administration. And certainly we heard a lot about that here. I'd say we heard more about that this morning than at any other Hadoop event I've ever gone to. Suddenly we've got Knox to talk about. We've got Falcon to talk about. Didn't have those things before. As an analyst, that means I've got some work to do. I haven't dug very deep into those things yet. And I have a little bit of a safety net. My clients, of course, are, to some degree, lagging the leading edge. They'd like me to come back and report from it. They're not ready to jump there yet themselves. So I have a little time to go do some digging and understand and try to put it in context. That's my next job. They want to understand where their blind spots are, as John often says. They rely on us to say, you've got some perspective. You've taken a look at this. Is it time for us to think about it yet? And I have to give them a reasonable answer. No, or maybe, or you're late. Well, Merv is doing some great work out there. Merv is the analyst, top analyst at Gartner in the big data space. Between Gartner and Wikibon, we've got the world covered. Obviously some of the firms have to do some great work. Congratulations on the survey. An excellent keynote here at Hadoop Summit 2013.
This is where all the developers are. This is where the communities are galvanizing, making the decisions, hardening the platforms, trying to onboard developers, get the white spaces, gaps filled, security, governance, administration. All this stuff is killer. This is the cloud, big data story. Thanks for coming on theCUBE. We'll be right back with our next guest after this short break.