I'm John Furrier, the founder of SiliconANGLE. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. theCUBE has been part of the ecosystem here, the Hadoop ecosystem, for what is now our fourth season. We've been at pretty much all the Hadoop Worlds except for the first, original one, and the first Hadoop Summit under the Yahoo Hortonworks brand. We're really excited to be here, broadcasting live for two days. We're live in Silicon Valley in San Jose for Hadoop World. I'm John Furrier and I'm joined by my co-host.
I'm Dave Vellante of wikibon.org. Rob Bearden is here, he's the CEO of Hortonworks. Just came off the keynote. Rob, welcome to theCUBE. Good to have you on.
Great, thank you for having us, and more importantly guys, thank you for being here. It's very important that we get the exposure of the community out to an even broader format.
You made the point in your keynote. You said there are two things that have to take place. One is we've got to harden Hadoop, and the second is we have to deliver data services. And you really stressed, by the way, we can't do this alone. It has to be done as a community. So I wonder if you could talk about that a little bit and talk about this event in that regard.
So Hadoop is an enormous platform and it's open source for a reason. But the bigger opportunity for Hadoop is to become truly an enterprise data platform. And when we get Hadoop as an enterprise data platform, it has the ability to generate the next-generation data architecture through the enterprise. But note this is too big for any one company to do alone. And what's critical path and table stakes is that the entire community come together to collaborate and build those projects out at the level of enterprise viability that we then take through a very disciplined, structured productization, packaging, QA and release process.
So then we can give a distribution that's truly enterprise viable, that has all of the data services and is architecturally enabled for an enterprise to manage mixed workloads in a mission-critical environment for both batch and interactive query.
Rob, you obviously have a lot of experience with JBoss, Spring, doing open source companies and successfully doing open source. Everybody says, ah, open source, I don't really get it. How do you make money? You've shown that you can make money. But Hortonworks and the whole Hadoop movement seems to be much bigger than anything that you've ever done. A lot of people are sort of asking, okay, when we talk about this new data architecture, when we think about Hadoop, are we talking about a completely new way of managing and architecting our data infrastructures, or are we sort of evolving our existing world? And there seems to be sort of a tug of war there, right? The IT guys obviously don't want to rip and replace. There's a lot of innovative people in lines of business and shadow IT that are driving really hard. What's your take on all that?
There shouldn't be a tug of war, is the short answer. And I think that's part of us evolving the tech to be truly an enterprise data platform for what Hadoop does well. And the other side of that is we must work with the ecosystem so that the ecosystem understands that Hadoop is not here to try to disintermediate what it does so well, whether that's a transaction environment, a BI platform, or an EDW platform. Hortonworks' role and mission is, number one, first and foremost, to be deeply engaged and involved in the community and to be a steward of helping Hadoop evolve to become that enterprise data platform.
Number two, a core part of our strategy and overall corporate model is to go to market and enable reference architectures with the other core data platforms, for example EDW, BI, the transaction environments, and let them extend the data platforms and the data sets that they process so incredibly well, but give them the ability to bring in this net new data set that leverages Hadoop and the benefit of the bargain of open source, and the ability to manage this very, very large data set very cost effectively and architecturally efficiently on commodity environments, and do it transparently to their users, right? And then what's been able to evolve from that is that the enterprise understands there is the ability to now unlock the value of all the data that they have and put it into a central repository of Hadoop, pre-process it, and then either access it and manage it there in Hadoop or push it to the platform which does so well what it does, whether it be analytics or whether it be transaction environments. And so this new data architecture is emerging very, very rapidly, and what it's doing is enabling the enterprise to move from doing analytics and transactions post-transaction to being interactive with their customer and supply chain pre-transaction.
Yeah, Rob, one of the things that Dave and I have been talking about on theCUBE, now in our fourth season, is the modern infrastructure, the modern era of computing. You've been through a few eras yourself with open source. But again, we're tracking the conversations here at Hadoop Summit, and it's the normal stuff that we've normally seen: open source, big data, business intelligence, NoSQL, you know, the normal buzzwords in the community. But now, this year more than ever, there's a pressure to accelerate the enterprise grade, and you guys talk about this in words like hyperscale, high availability.
So there's a lot of work going on at Hadoop in the platform to make it enterprise ready, yet there's a whole burgeoning community of developers. We're seeing the success of, say, MongoDB and others in the LAMP stack, but you have a developer marketplace that's saying, hey, we want to build apps. Okay, we want to code, we want DevOps. So Hadoop is kind of being forced into that conversation of enterprise grade, modern infrastructure, converged infrastructure with flash. I saw Fusion-io was with MapR today, an announcement. This teases out this enterprise grade and then also this growth in developers. So there's a lot of demand on the developer side and pressure on the platform to fill those white spaces. So talk specifically about what you're seeing on the white spaces in the platform. Obviously YARN's moved the ball down the field on that, and on the management side. What are you guys doing on the platform to make it enterprise grade, and then what's the message to developers up the stack, because there are guys who just want to code and write apps?
Exactly, and what we want to do is help make the market function as big and as fast as we can. Step number one, make Hadoop an enterprise viable data platform. So that's first and foremost: continue to innovate the core. You heard a lot about what we're doing with YARN and the hardening of 2.0. That's number one. Number two, we provide those data services. So things like monitoring and management and provisioning, DR, full-stack HA, and all the things that an enterprise requires and expects to manage large volumes of data at scale for mission-critical applications. And then we take that through an incredibly disciplined productization, packaging, QA and release process, and, I think this is the operative point, it's 100% Apache licensed, directly on or as close to the Apache trunk as possible. Because when we do that, we de-risk it for the enterprise to consume it and deploy it, and for the ecosystem to do the same.
Is that why OpenStack's popular right now? I mean, OpenStack, you guys had a participation at OpenStack, we were there doing three days of live coverage. The enterprise loves OpenStack because it sends the message of modularity and insight. How is that affecting some of these Apache, 100% Apache situations?
I think it triangulates very well to that, and you see the correlation to the all-open environment, and that's what the ecosystem is asking for. And as these markets form and get very big, that then gives a very large net new market for the developer community to come and add value and build the next generation of applications that leverage the OpenStack platform or leverage the big data sets. And here's the parallel. As web-based technologies and web-based content evolved, that gave a whole next generation of applications that were able to evolve and be built, and whole new computing models that evolved. Big data has the ability to do the exact same thing, maybe bigger.
There were two things you said in your keynote in this regard. One was you sort of called out people that are trying to fragment Apache Hadoop. The second thing is you called out those with what you said were disingenuous licensing practices. Let's talk about each of those. What did you mean? Maybe we can unpack that a little bit.
I think they're actually highly correlated. And by that I mean, there's a couple different ways. There are very large entities who are threatened by Apache Hadoop truly becoming an enterprise viable data platform, because they can't afford to lose control or not have this data set under management. Because if they don't have this data set under management, they lose footprint and pricing control with that customer, right? And so they are very incented to do whatever they have to do to try to keep this net new and rapidly growing data set underneath their roadmap and footprint and their technology.
And so when they take core components of Hadoop and then surround them with proprietary componentry in the stack and then provide that as an offering, it still has not allowed the enterprise to reach its full potential. And it still locks them in to a very specific direction and to very specific price points, versus getting the leverage that a truly enterprise viable Apache Hadoop platform can give them.
So you're specifically talking about layering proprietary services, bogarting the code basically, layering proprietary services on top of it and trying to lock people into that pricing model. That's specifically what you're saying.
And our mission is to make Apache Hadoop truly enterprise viable as a data platform at every layer in the processing and storage stack.
Okay, but so if I'm a big buyer, somebody might say, hey Rob, I buy that, I love the vision. I know the community eventually is going to get there, but I've got to solve this problem today. So I'm going to actually go down that path. What would you say to that CIO who's actually spending some money on those that are essentially committing that crime, if you will?
Well, I think what we do is we show them what others like them have achieved with a completely open platform. And I think that's the value of what this community's accomplished over the last 12, 18, 24 months: the evolution of each of the projects, the evolution of Apache Hadoop through the distributions, like ours of HDP, has truly gotten to the point where the CIO can manage petabytes of data on thousands of nodes in production in a highly predictable, stable, reliable environment. And what we're doing now is taking that to the next level with YARN-based architectures, enabling them now to go from just batch processing to actually batch and interactive, mixed workloads. But the last piece, or the last mile, of this is truly enabling the ecosystem to embrace and adopt Hadoop as an extension of their overall platform or service.
A great example of that's what Microsoft's doing. Another great example of that is what Rackspace is doing. Another great example of that is what Teradata is doing, to have a reference architecture that lets their platforms continue to do what they do so well, but extend the dataset exponentially by leveraging and getting the benefit of the bargain of Hadoop to put this huge dataset under management. Because-
Let's talk about the community, you mentioned that, because this show here is, I was talking to Arun, co-founder of Hortonworks and your colleague. And this show is about the community, it's about the developers. I won't say it's a sell-out, though some people use that term. But there's kind of a, not an undertone, but kind of an undercurrent of, are we commercializing too fast? Okay, that's one. And two, everyone wants a primetime platform for the enterprise, whether it's service providers or a large enterprise, a bank, down to just a generic enterprise who wants scale-out open source. A lot of stuff to do there. So there seems to be a lot more work to do to get that platform, to get those developers on-boarded. At the same time, there's all this pressure to commercialize, go public. So two questions. Talk about the state of the ecosystem relative to doing the right thing on commercialization, platform development. And also, you know, do we have the right metrics for this community? Because now you have competition. You have Pivotal, Greenplum now, Underwall, you've got IBM, you've got Intel. You guys aren't the only show in town, Amazon's on the horizon with their stack. So there's competing stacks, maybe not one size fits all, but again, this ecosystem is growing. Are we over-commercializing? And what are the key metrics for the companies in the Hadoop ecosystem?
Right, so I think to your point, there are a number of new entrants that have come in.
And I think it says that there's huge validation, that the major market makers want to ensure that they have this data under management on their platforms. We believe it is important.
Our breakout sessions are about to begin. We can still talk, I think it's hard.
We believe it's important to accelerate the market and make the market function even bigger and faster by doing it all 100% in open source. And by doing that, the market gets bigger, it gets faster, it's de-risked, and the ecosystem then adopts and pulls through. Now, I think the bigger you make the market, the easier it is to monetize, quite honestly. And in our model, you pay us or you don't. We either add value with the value-add services that we provide, and if that's value, great, pay us. If not, continue to move Hadoop and the platform into the enterprise and get a lot of data under management very quickly.
Well, you guys just got $50 million in financing. That's a great shot in the arm of adrenaline, as I said on my Forbes post. That's a signal of the health of the ecosystem. So congratulations on the financing. It's really hard to do, especially in this market, to have that validation. But there's also talk of M&A, companies going public in the space, the leaders, and Cloudera's been rumored to go public, and that's kind of a rumor, unconfirmed, but still it points to the frothiness of the commercialization. So you've been here before, a little bit different landscape, you've got competition, you mentioned that. What are the key metrics of success for you guys, Cloudera, and the folks leading this industry, who are building an industry? Is it consultancy? Is it subscription? Is it the recurring revenue? What are the key business metrics that you guys are doing? And share with us some data around you guys doing more consultancy. Is there more subscription revenue? How's that all shaking out?
Because there is a pot of gold at the end of the rainbow, so to speak, in either liquidity or a market that's growing.
From our standpoint, our model is based on providing subscription services, should you choose to want those from us, to provide the level of enterprise support and SLAs that you can get from any other enterprise player. There's a lot of reasons we think we're better positioned and better at that than anybody in the world to do it. That's a different discussion for another day. The whole point that's important is that we go and enable this market to function very big, very fast. And the way we do that is innovate the core of Hadoop, make it an enterprise viable platform by adding all the other incremental services, and take it through a productization and packaging process that's all 100% open source with an Apache license.
But you're going to do a road show for that 50 million. I'm sure people thinking about going public want to show some metrics, scalable metrics, as they say. Consultancy isn't scalable. Dave and I know that business. You know, we have our research consultancy. What is scalable is the recurring side. So what's the mix of your business, percentage-wise? Can you share numbers, or just kind of order?
70/30, subscription versus services. We actually, in our model, want to have a de minimis services revenue stream. And even of that 30% of our overall revenue stream, half of that will be training. We think that's important for us, to do world-class knowledge transfer. For the other portion of our services, we want to really make the market big so that the SIs and the regional integrators have the ability to go in and add a lot of value, and let them do that consultancy work. We can never hire enough people fast enough.
Training is low-hanging fruit. You can knock down training all day long, but that's not 70. You're talking 30.
I'm talking 30% training. And 70% subscription services for support. And it's a real, yeah, exactly.
And the bulk of the consulting is training.
That's right. At least 50%. And it's the means to then, because we want to knowledge transfer to the consultancies out there so that they then have the ability to go add value with their customers, and knowledge transfer, and enable Hadoop into the enterprise architecture.
The other thing that we're hearing too, honestly, you look at Amazon, it has a great record with developers, and that's a little bit different market. Although they do have some big data stacks and some technology in their stack that's going to be compelling. The issue with Hadoop has always been, hey, make it easier. So what can you share about progress that's different from the last time we sat and talked?
There's been a lot of innovation that's happened at the management platform layer for provisioning and monitoring and managing and getting a better view of it. We still have work to do there. The community's piling in and supporting heavily. The ability to run in multiple data architectures. So whether that's on a Windows environment, a Linux environment, whether that's in a virtualized or a cloud environment, or sitting within the appliance, right?
Well, we've got to go and wrap up. Thank you for your time. I want to ask you one final question, kind of leave the soundbite out there. What do you want to share with the folks out there about this show? What should they walk away with? Why this show this year? What's different? What's the big positioning in the community? What's new about this year's show?
Well, this is a show very, very simply about engaging the community and bringing the community in and letting the community have a voice in what they want to understand about Hadoop. The community actually sets the agenda and sets the priority for each of the tracks and the content and talks within the track. The community votes on which papers make it in there.
And our job is to be a steward of that community, facilitate that process, finance it, or take the risk of the financing out of it. But our role as a steward is to facilitate the process of the community understanding what Hadoop is, where it's going, and how they can get value from the different use cases of the many enterprises and ecosystem partners who are doing it. And we'll continue to play that role, and we think it's vitally important with this show for it always to be for the community, with the community. And you'll never see us deviate from that, and never is a long time.
Okay, Rob Bearden, the CEO of Hortonworks. As we always say, enterprise grade is what everyone wants to do. And the standards bodies of today are the open source communities, and that's the model of how things are getting ratified inside the community. A great message, and we'll see what comes out of the show. Thanks for coming on theCUBE. We'll be right back with our next guest after this short break. This is theCUBE. I'm John Furrier with Dave Vellante. We'll be right back with our next guest. Rob. Thanks.