theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks. We do Hadoop. And headline sponsor WANdisco. We make Hadoop invincible. Okay, welcome back. We're here live in Silicon Valley, in San Jose, for Hadoop Summit 2014. I'm John Furrier, the founder of SiliconANGLE.com. I'm with Jeff Frick, general manager of our CUBE business, commentating here in Silicon Valley. Herb, the president of Hortonworks, is our guest, a CUBE alumnus. He runs the show, makes the trains run on time, you and Rob Bearden and your team. Really amazing what you guys have done coming into the market. Going back, we had a Yahoo guy on yesterday, a senior vice president, talking about how good Hadoop has been as an ecosystem from the days coming out of Yahoo, and Hortonworks as a company. You guys got your wings, the market's on fire, the growth is phenomenal. And your show here is about serious business. Yet people are still collaborating, and it's still got that vibe of a young industry. So what's your take on that? How young are we still? I mean, obviously the business is going on. What's your view on this market right now? Yeah, it's a great question, John. First of all, thank you for having me on theCUBE again. It's always great to be here. I appreciate the show you guys run. Thank you. Our pleasure. But what we're seeing in the market right now, and I think some of the guys said it yesterday, even Merv Adrian with Gartner, is that Hadoop has come a long way, but it's still so early in the cycle in terms of what the possibilities are. And what really changed all that is that last year Apache YARN came out into the market. Now you're starting to see, and Merv from Gartner said it really well yesterday, that Hadoop 2.0 was not only the inflection point, it's really the beginning of Hadoop. That's the beginning of enterprise Hadoop.
As this market moves away from "what could I do around batch analytics" to "what are the possibilities", which become almost endless if I can use this as a true data platform for both economic savings and other data monetization opportunities, that's very different from what we've seen before. The other thing Merv mentioned, and I'm quoting here from our CrowdChat (if you have any questions, go to crowdchat.net slash Hadoop Summit and we'll answer your questions if you're following along): he says maturation cycles take 10 years for big inflection points like big data. So we are early, he did comment on that. Then he said it's not a blunt instrument, it's a sophisticated stack that supports many use cases. So he's talking about the old positioning of Hadoop as just a pile of data versus now, where it's sophisticated and nuanced. Can you explain what he means by that? Sure. This goes back to what Hadoop needed. If you look back a couple of years ago, Hadoop needed an operating system to say, how can I go control all these resources and do more than just batch analytics? And as YARN came out, it opened up the possibilities of, okay, how can I stream data into the platform? There was a great demo this morning in one of the keynotes where they showed a series of trucks and where those trucks are by GPS location. Now I want to stream in data about different violations, about what's happening with those trucks, through Storm and other things. And I want to manage all that in the platform. Now I want to query right now, this second. I want to get a result back in a second or two against all of these trucks that are running, and I want to make a decision and go impact the movement of that truck out there. That's a real-time platform. You're moving into the whole world of complex event processing, a real-time platform, totally different from where it was before. That's where we're going.
So Jeff and I were talking on our intro segment this morning about how serious this business has become when you start hearing "legacy Hadoop", or I forget what they said on stage yesterday, "the previous version". I mean, it's not that young, and it's not that legacy, like the old one. These are dog years, Jeff. Yeah, dog years. But here we have three segments of players in the ecosystem. You have your startups, from Series B and Series C funding to pre-IPO, and you have public companies. So the whales, the tunas, and the minnows, if you want to call it a fish analogy. Yeah, Cisco's here, Intel, IBM. They're looking at companies, who they're going to buy, who they're going to partner with. So there's a lot of biz dev going on there. The other companies are looking for funding. What does that mean to you guys as stewards of this ecosystem in Hadoop? You guys are the leader in the open source community. How do you manage that? Because now you have the big whales in here. Everyone's in the sandbox. How do you deal with that? How do you foster more collaboration and more innovation? It's a great question. Actually, that's what's interesting about this. I'll say it this way: we bet right a couple of years ago in the thought process around this, which is that if Hadoop becomes a central component of a data architecture, which it is, then all the whales and all the big companies will want to participate, because the market's large, there's a lot of money, and everybody wants to monetize components of that. And the whole goal from the very beginning was to become the platform for Apache Hadoop, to do everything through open source and work through Apache Hadoop, and to become a great substrate, an effective data utility, that you can look at.
What that allows is for everyone else, the whales, to go monetize on top through analytics, through integration, and through lots of other ways, the things that they've done well, and to work with that substrate, work with that platform. And now, as the market has become more mature, you're starting to see those companies entering the market with different solutions. You saw many of them, Microsoft, Red Hat, and others, right on stage talking about it in the last couple of days. Herb, you know, we talked before we came on, we love to smile and talk about the competition, the FUD, and all this conversation. But at the end of the day, there's a lot of money on the table, and there's a lot for everybody. You know, MapR has been criticized, oh, those guys, they got 500 customers, and they're spending their employee base. You guys are doing extremely well. You got Intel working with Cloudera. The swim lanes are being established. How do you talk to people when they ask you that question about the big three, Hortonworks, MapR, and Cloudera? How do you talk about those swim lanes? Because there's a lot of discussion. Obviously, you guys announced your acquisition in the security space. Cloudera followed that with a move. They partnered with Intel. You guys have your partnerships. How do you talk about the competition? Yeah, so it's a great question. Customers ask that all the time because they want to understand what the difference is. And what we say is, we've been pretty consistent about what our strategy and approach have been and what the business model is, which is: do everything in open source, make Apache Hadoop enterprise viable, go work with the ecosystem to make Hadoop a viable component, and don't compete with the ecosystem.
And what we're seeing is that this is actually bringing the ecosystem, many of the whales that you described, together onto a common strategy where they want to start contributing development resources into that model. Whether it's Microsoft or SAP or IBM, they say, I'm going to help contribute to open source. And they may even want to go resell this as part of their platforms. You're seeing that with a number of partners. That's what's driving it. And as customers ask that question, what we're seeing is that a year or two ago this was a harder question, because we all looked very similar. We did. Now we're actually diverging greatly. We are very different in approaches and strategies in terms of how we're going after this market, whether it's work with the ecosystem or compete with the ecosystem, or whether it's do it all in open source or try to monetize other components around the top. I think we've been pretty consistent on that. And it's actually become easier for customers to say, I can now see a bigger difference. Now it's, which choice do I want? Yeah, at some point people had to hang their shingle out and be who they are, put a stake in the ground and say, okay, we're going to pick a strategy and we're going to ride that straight and narrow. And we keep hearing this over and over, because we do a lot of open source shows, right? Invariably it's the little community that gets going, then eventually they get some traction, then the big whales come in, and there's always this conversation: can we maintain the open source, and can we maintain the innovation? And on purity, are we going to get polluted when HP and IBM and those types of folks come in? So is it getting easier to manage that, or are people getting more used to that kind of dichotomy, being able to straddle both sides of the thing?
I think the larger companies are getting more comfortable working in that open source world and contributing to it. I mean, you can see it with many of the companies that were on stage in the last couple of days, right? Microsoft talked about contributing tens of thousands of lines of code to Apache Hive to make it more performant, so the entire ecosystem of Hadoop users can prosper. Now, for Microsoft that makes all their tooling run faster on top. It's great for them, but it's great for the entire market from the take-through expertise. They're far more comfortable, and many companies are far more comfortable, doing that. We're seeing this model, and if you think of an analogy, I liken it to the peloton, the bicycle peloton. The peloton keeps getting bigger and bigger, right? And it's moving in a certain direction. Yes, somebody's going to try to break away, but what happens inevitably is they're climbing the hill in the Tour de France, somebody breaks away from the peloton, and it catches up. It's the same thing. If you've got enough people working together, you're going to catch any breakaway. The proven innovation model of open source is just fascinating when you get an engaged, enthusiastic community actively solving problems together at a really rapid pace. So I've got to ask you the open source question. That brings up the same conversation we had at OpenStack Summit in Atlanta, which is more cloud focused, obviously, but big data will certainly sneak its way in there when the data virtualization conversation starts kicking in next year. So I've got to ask you, as an executive, it's all about metrics. We asked Merv the same question, and certainly Tony Baer, Dave Vellante, and Jeff Kelly the same question: at what point do the metrics start to show themselves so you can say that's a real industry? Is it total addressable market? Is it market share revenue?
What do you view as the metrics that start showing up on the leaderboard when you say this is a real growing industry? I'd say it's a couple. So there's the total addressable market. You go by different analysts, IDC and others, and they say big data is a $100 billion market, Hadoop is $50 billion of it, et cetera. So that's out there today, right? I think what separates it is customer adoption, partner adoption, and real use cases. And that's what's different at this show, I think: you're starting to see a lot of real use cases, right? You've had some of the folks on here, like TrueCar and others, who talk through what they're doing and the opportunity for how they can go grow their business. We're seeing those real use cases happen. Let's talk about TrueCar, because that was a fascinating conversation. Obviously they went public, and they're a greenfield opportunity for big data. Take us through what they did, because I think the 2.0 comment you mentioned is interesting. They had an interesting trajectory. Can you share that story about how they really hit that tipping point in terms of value and what that did for their business? TrueCar is a great example of a company that's come out more recently and actually just went public, very successfully. But they built their entire infrastructure on Apache Hadoop, right? We worked very closely with them from early on as they built this out. They didn't have legacy involved, right? They got to start from scratch. They are the perfect example of a company that monetizes data, right? Data around what's happening with car prices, car inventory, and other things. They're able to go monetize that at scale, create an entire business model around it that's highly successful, and provide a great service to the market. Very different from what you've seen in other places before. So we got some comments here. What is the direction of Hadoop development going forward? The direction of Hadoop development?
It's whether we can continue to make Hadoop a unification platform. With YARN, and now with all of the components around it, can we make it a place to unify the ways you want to access all that data, and make it simple and easy? So that as somebody else comes out with a great new way to analyze data and they want to work with what's running inside the Hadoop platform, it's simple to access, and as soon as you plug one of those engines in, it takes advantage of all the core services of the platform: security, governance, operations, deployability, being able to run on-premise or off-premise, and all of that happens automatically. That's where it's going. So I'm getting some text here. It says trending now: Hadoop Summit, Hive, Apache, YARN, HBase, Tez, and Storm. Make sense of that for folks out there. What does all that stuff mean? So all of those are great ways of accessing the data. If you've got Hadoop, you've got HDFS, an elastic, scalable storage layer, and now you've got YARN as the operating system. Now I want to access the data through an interactive query, so I'm going to use Hive for the query, on top of Tez. Now I want to stream data into the platform, continuous data just streaming in, through Apache Storm. And I want to use YARN to go make sense of it. Those are different ways to access that data: all the data in one place, with different ways to access it based on your business model. There are a couple of last questions I want to get to quickly, because I know you've got to go. SQL has been a gateway drug for the enterprise, as Tony mentioned this morning. It makes things easy to use. You're starting to see adoption. You mentioned a few of them. Where are you seeing resource management become a big issue? And on YARN in particular, I want to ask you the question: what have you seen with YARN that has been really surprising to you? It's been a great success.
This idea of a data platform has enabled new construction, as with TrueCar. What's impressed you the most about YARN's success? What's impressed me the most is how the rest of the ecosystem is responding and saying, I want to be able to plug into that now. So can you help make it easy for me to go plug into that substrate? Whether it's through Tez for interactive types of workloads, or through Apache Slider, which was announced here, as a way to run always-on workloads like streaming, make it easy. Give me the easy way to plug in. I want the easy button, the big, bright easy button. I want to hit it. I just want to plug right in and make sure that I can now leverage that platform. And we're seeing that actually work and take off. You saw yesterday that AT&T announced they're working with Continuuity and putting out jetStream, which is their streaming platform, and they're putting it all out into open source and it all plugs into YARN. So what you're basically saying, to answer that other question from earlier, is that ease of use is the direction for Hadoop. It's huge. The easy button. Big time, absolutely. Okay, final question. What's going on at Hortonworks for you guys? Employees, headcount, funding, are you going to do a monster zillion-dollar private equity round? Tell us your M&A strategy now. I'm pulling it. So we've been doing very well as a business, right? In the last year we gained over 250 customers. We're adding 50 to 80 customers a quarter and doing really well growing the business, right? It's built on a very sustainable model, built on renewing customers. At the end of the day, what we have to do is keep them happy and successful, because ultimately we're just trying to monetize customer satisfaction and make sure they're successful. That model has worked very well. The company's grown, and we're close to 400 people. We're continuing to scale out across all aspects of the company, and it's a rosy future.
We're looking forward to it. Jeff, I was talking last night to the MapR guys about the whole metrics thing. We're really watching the metrics. So Jeff Kelly, Dave Vellante, and Jeff and I are looking at the metrics, trying to work out which are the good metrics to clear out the FUD that's been flying around. To me it's about employee headcount and revenue, because those are two numbers that show healthy growth. And you guys have done extremely well. Congratulations. And this industry is still young. So final question: how big do you really think it's going to be? What will be the preferred outcome for this industry? I mean, if we're in the early days, like Merv said, and you agree, we all agree, what's it going to look like on the 10-year horizon in your mind? Paint the picture 10 years out, what this is going to look like, and compare it to other industries. If you go back to something TrueCar said in their presentation, if you take the economics of being able to run workloads and going from $19 down to 23 cents, that has an impact that acts as a center of gravity, driving workloads to go run in this environment. And we're seeing that. So that's only going to continue to grow, and the pace of data volume is only going to continue to grow. You've got all the right trend lines moving in the direction of this being the right platform, more data is going to come onboard, and more and more companies are going to use it. So what I really believe is that this market will spawn one or more billion-dollar companies that are going to support that market. Clearly we aim to be that. It will disrupt others too. People who don't react, right? There's, I mean, every market, yes. So that's what makes these markets fun. I love this market, it's very disruptive. This is theCUBE, and we're here with the president of Hortonworks, breaking down the show here. The ecosystem, it's open, it's friendly, but it's all serious, all business.
It's about business outcomes and growth. This is theCUBE, extracting the signal from the noise. We'll be right back with our next guest after the short break.