Welcome back, we are here live in Silicon Valley, heart of Silicon Valley, San Jose Convention Center. This is SiliconANGLE and Wikibon's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. This is our fourth season with Hadoop Summit and Hadoop World. We've been covering the big data ecosystem since it was a small cast of characters. One of them was Yahoo; now there's Hortonworks, Cloudera, among others. There's a huge ecosystem developing, and the theme here is enterprise grade and solidarity in the community, really sticking together, not forking the community, staying together. I'm John Furrier with Dave Vellante, and we're going to discuss the Hadoop ecosystem here with one of the Hortonworks executives. Again, I'm John Furrier, I'm joined by my co-host. I'm Dave Vellante, wikibon.org. Sean Connolly is here, he runs strategy for Hortonworks. Sean, welcome back to theCUBE, good to see you again. Thanks, thanks Dave, nice to be here, John. So you gave a great keynote today. I understand it's 2,500 people here, is that right? I think it might be a little north of that, but yeah, it's a good crowd. 3,000, did we hit 3,000? And it's almost 70 sponsors as well, which is way up from the 40-something. It's a great venue here, really good space and a lot of activity. You basically put forth the architecture, the new data architecture, and I wanted to talk a little bit about what you said in your keynote. You showed the old way, and then said, okay, here's the new way. The old way and the new way look similar, but you've got this new piece in here, all this unstructured data, you've got Hadoop, all these new tool sets, but they coexist, right? It's not a rip and replace, and that was one of your key messages here. Talk about that a little bit and what you're seeing out there.
Yeah, I think the way we look at it, and particularly in the enterprise customer base when we talk about it, it's how do you extend the typical transactional processing to get at that interaction and observational data, which is all those new sources? And so if you can pull in a data architecture and introduce a peer to the existing systems and integrate it tightly, then you actually have a really vibrant solution for enterprises as they begin to evolve their architectures towards being able to deal with these staggering volumes of data. So we're seeing a lot of that, as Hadoop really has emerged as that additional enterprise data system that's uniquely qualified to store massive amounts of data at scale, at cost structures that, frankly, we're hearing from our customer base are 10 to 100x lower per terabyte, so very significant financially. Well, and I want to ask you about that, because when you look at some of the initial instantiations of Hadoop, they started with a blank piece of paper. There were things that you couldn't do with the traditional RDBMS, and so it was interesting to see that layout, that kind of schematic that you put forth, and it was to me anyway a very enterprise-oriented vision. So that's what you're seeing now: we're seeping into the enterprise, and the enterprise requirements are starting to come in. So talk about where we are. You talked about this event last year. Geoffrey Moore was here, talked about crossing the chasm. Have we crossed that chasm? Is the enterprise ready? We heard in Merv's talk that 31% aren't adopting, but the balance are, nearly 70%. I'm a half-full kind of guy, so I like the fact that it's about 70% who have some sort of big data strategy. As far as where things are, you're correct. Last year's tone was different than this year's tone, and it's by design and by maturation of the market in general. We deal with a lot more mainstream enterprise users.
Actually, CIO.com had a survey where big data went from about 13% of CIOs who had a plan to 37%. And if you look at the details behind that, almost 60% of those respondents deemed themselves laggard adopters, not even early adopters. And it came up as, I think, one of their top two or three initiatives. Exactly. So what we hear a lot from our customers is, how do we arm them with a message so they can help bring the enterprise across, right? There are a lot of entrenched skills in transactional processing, right? So if we can pitch a narrative where you're extending your architecture in a way that can help you deal with these new ways and extend your skills and integrate with some of your tools, then it works really well, and it gives people a path forward versus "let me just rewrite everything from scratch again." Sean, I've got to ask you, you've been on theCUBE multiple times. We talk off camera a lot. But you're a strategy guy over there, which means you dabble on the product side and look at the marketplace. You look at the chessboard in the marketplace, but also understand the product. So really two questions here. What's changed in the past year for you guys? Since we last talked, what's the biggest change that's happened in the Hadoop ecosystem that you can point to? And then the other thing is, what are the success parameters now in open source? If open source is going to lead and win, as Merv was pointing out, it's going to take catching up when the big players move in, and it's going to take some teamwork. So what are the telltale signs that we're winning? So what's happened in the past year, and how do we know when we're winning? So there are a couple of factors in there. If you look at the message that was delivered at this conference, this new technology called YARN enables more workloads, interactive workloads, streaming workloads. Why is that important?
Because we're seeing enterprises want and need that type of solution, and want to be able to leverage this platform for a wider range of use cases. That was not a conversation we were having 12 months ago. 12 months ago it was, "I have a pilot or POC and I'm a little stuck in the mud on how to get this thing going forward." We see a much broader swath of the market, really, I would argue, a fair number of mainstream brands who have been dealing with the technology for a while and are looking at either solving very particular application needs or are starting to build out what, in my session, I phrased as an agile data lake architecture, as the way to describe this next-gen architecture. That's phrasing that I'm hearing back from the customer base. I don't know where this data lake came from. I look at it as an ocean. It's turbulent, a lot of different currents, a lot of different, you know. If you look at mainstream enterprises, a lot of times they'll start with a pond, grow it into a lake, right? And then they'll go from there. Glass of water. To your point though, Sean, the emphasis on security that we've been hearing is a big deal. So it's like these enterprises, maybe it's a line of business, say, "Ah, I got this shiny new toy," and now it's, "Whoa, whoa, whoa. Wait a minute, that data, that governance, that policy, is there any security edict that you're adopting? Huh? Oh, wait a minute, time out. Let's bring that to the table." That's one of the things, if you listen to me talk about a lot of these things: I'm very pragmatic as to where Hadoop is. It is on the right side of the chasm. It's moving up. There's still a lot more work left to do, right? And in my session, for instance, I brought up InMobi, who's a mobile advertising provider, where they've rolled it out at scale over the past three years and have built up technology that they contributed into open source for data life cycle management.
At that moment, and then afterwards, I had about a half dozen people saying, "We're doing similar things. How do we get involved in the open source community?" So, to your point, it's the momentum of the open source model, and the fact that it innovates faster than single vendors, that we're seeing play out very well. Yeah, it's funny. We always talk, Dave and I, and you've been following our careers at SiliconANGLE and Wikibon for a while, and you know we're open source content. So Dave and I always talk to folks about trying to get some support and expand our mission. And we always get the, "What do you guys do? Where's the value?", right? And so that's a question that Merv just brought up. In early pioneering businesses, people say, "Where's the value?", right? And so a lot of early adopters will see that, but a lot of the mainstream guys that you guys are now bringing to the market, crossing the chasm this year, are always just saying, "Where's the value?" What can you point to specifically, saying, this is the value of Hadoop? Yeah, so I'll give you an example. A common pattern we're seeing across retail chains who have both web and brick and mortar is that it gives them the place, the data lake, where they can bring in a true 360 degree view of the customer and how they interact, whether it's through mobile, web, brick and mortar, or what have you, and finally get that into a place where it's a reasonable cost, and you can begin to really dissect it and analyze it and join it for true 360 degree view analytics on that customer. So then you can begin to project how you can target them much better. That's always been talked about, that's been the promised land, and we're starting to see customers for whom that's one of their first areas. They have other areas, but as far as filling the data lake, that tends to be one of the first ones, particularly if they have a mixed way of engaging customers or suppliers or what have you.
That is a rinse and repeat type of thing for very mainstream, retail oriented brands. And that has been the promised land, and in many ways the old BI and data warehouse business, and I'll say it, failed to live up to that promise. And the reason was that the vision was there on paper, but in implementation it would take days or weeks or sometimes months to actually get to that point, and the market moves so fast. So you talk about this notion of bringing analytic and transactional data together. And that's kind of what your diagram showed. And then you start to bring in, and I like the way Merv just said it, you've got batch, you've got real time, and you've got interactive in the middle. And he said today we're sort of in the interactive phase, but you have things like streaming technology that are coming into play to allow machines to make decisions faster than humans can. So do you agree with that sort of spectrum, and where are we on that spectrum? I clearly agree with that spectrum. I think it needs to be a multi-purpose workload platform. The other point that Merv made was, it's adding the context on the transactions back into the mix, and that context may have come through web channels, mobile channels, or other new sources of data. Historically, that was exhaust. Now it can actually be brought together. The other point is we have a customer, Neustar, and Mike at Neustar talks about how it's a catch basin: the data goes through the existing systems, but he can get it into Hadoop so he can do the new analytics on it and not disturb what's running in the business, right? And so that flexibility and the cost structures are really what drive it for him, and that's what we're seeing people use it for. All right, Sean, we've got to wrap. We just got the, Sean says the plans are backing up, but last but not least, you run strategy.
Just summarize the bumper-sticker Hortonworks strategy for the audience. So we make no bones about it. I kind of like the tagline we rolled out recently. People say, you know, "What do you do, Hortonworks?" And if you go to our website now: we do Hadoop. We're focused on making this an enterprise viable data platform. We feel it's that important for the market. And we highlight the engineers who are behind the tech, driving it, working hard. Those tech athletes, as we say on theCUBE. I've got to say, great show. What I love about Hadoop Summit, and a testament to you guys, Hortonworks and Yahoo, is that you make it kind of about the business, but it's really about the people and the code, right? So it's about, you know, the developers. Yeah, we have a business mind. We all kind of want to make money and there's a business to be done here. But as a community, this is where the action is. And I've always said, and this has been a theme of our summer tour, these open source communities are the new standards bodies. This is the new IETF, this is the new IEEE. Those standards, the stacks, are being developed and people are voting. They're voting with their code, they're voting with their feet, and they're voting by writing the apps. So congratulations on having this great event. I hope you guys keep it pure and don't get it too commercial oriented. There's going to be an executive track in the future, but for the most part, great event. Thank you. Sean Connolly here, Hortonworks VP of Strategy. Great guests on theCUBE. We'll be right back with our next guest after this short break, live from Hadoop Summit. This is theCUBE, right back. Thanks, John.