Our next guest is Arun Murthy, co-founder and chief architect of Hortonworks. Come on into theCUBE. All right, Arun, how are you doing? Fantastic, great to see you again. I would love to see you online, but I don't know what to say. Great to be here. Thanks for having me over. Yeah, well, thank you for taking some time. We've been talking about you guys all day and asking all the hard questions. Hopefully good stuff. So congratulations, guys, on a couple of things. One, post Yahoo Summit, the big announcement, big funding, caused a few ripples in the water. You guys are now established. You've got a new COO on board. You guys are out doing a ton of activity, huge outreach to the marketplace, so congratulations. So what's the feedback? I mean, you guys have very tight messaging, pure Apache all the time. What are people saying? It's great. I mean, like we say, we're in the business of making it bigger and more popular and more widely used. And our message about doing all of this in open source is actually getting a lot of traction in the market. And we're excited to be an independent company and doing it outside. One way of looking at it is, we've done a lot of work at Yahoo. We've made a lot of mistakes. We've fixed them up. And now we're outside and we have an opportunity to do a lot more work, talk to a lot more people. And that's exciting. So we had some guests on early in the day. I forget who mentioned it. My brain gets a little mushy around this time. But we had a guest on earlier who said everyone knows each other in the open source community. So it's not like, I mean, on competition, Cloudera is saying, hey, we don't really compete, we're just moving forward, growing the market. I'm sure you guys probably say the same things, but you guys have to work together. The communities are tight. What's the dynamic in the community right now? So, you know, that's a great point, right?
I mean, we have, you know, we have our PR engines and we have marketing and so on. But at the end of the day, you know, we are both in the business of making Hadoop better, right? And to make Hadoop better, we have to cooperate. We have to coexist. And, you know, we have a great relationship with all the engineers. You know, we work together. I mean, we have a lot of talks. For example, at Hadoop World itself, you can come by the talks and learn about some of the stuff we're doing with the next release of Hadoop, which is 0.23. And, you know, we work with a lot of people. It's not just us and Cloudera; there's Facebook, there's LinkedIn. It's really exciting. And if you look at it, we've got more than a few hundred active contributors to Hadoop at this point, which is phenomenal for an open source project. You guys are the two leaders in terms of numbers when you go and put the numbers on contributors. Clearly, open source has got a lot of traction. Still a small community. My question to you, on more of what's going on, is this: high availability is a huge deal, right? You're hearing it in some of the talks here, on the NameNode and some other ones. You guys are doing your own talk. You guys are looking at splitting up the resource manager and the job scheduling. Can you just elaborate on what that means and share with the folks out there what it means to the Apache code base and community and for customers? Yeah. So first let's talk about HA on the NameNode, right? It's been a hot-button topic. Lots of people are interested. What name? The high availability on the HDFS NameNode, yeah. So we're doing a lot of work, jointly with the Cloudera folks. In fact, Suresh is one of the co-founders. He's talking with Adam from Cloudera, doing a joint talk on HA. And then there's MapReduce itself. We're talking about the next generation of Hadoop.
In effect, we're taking Hadoop and making it a much more general data processing system. So far, MapReduce was the only option you had when it came to processing data on Hadoop, right? So that's great for a lot of things. I mean, I've done MapReduce for six years now. I love the framework. I love the idea. I love the paradigm. But there's also a lot of need in the enterprise for different things other than MapReduce. I mean, there's HPC, where there are simple MPI, master-worker kinds of paradigms which are really, really important for putting out a data application end-to-end. So the idea with MapReduce next-gen is that you can now process data in ways other than MapReduce. We still expect MapReduce to be the very high majority of the applications. But we also see a lot of other applications. What about the MPI thing? Because there's been a lot of, it's kind of more of a directional thing, but is that market ready, is the HPC market ready for this kind of parallelism? In a lot of ways, like I said, for example, one of the cases that we had at Yahoo was that MPI is still the right way to do a small set of the applications, right? But it was hard to get a Hadoop cluster and an MPI cluster, right? So what people used to do was take MapReduce applications and kind of retrofit MPI on top of the MapReduce paradigm, right? So the idea with next-gen is you can now do it as an independent project, as a project which is sitting in the same Hadoop cluster. You don't go buy another MPI cluster. You buy the same hardware. You manage it in the same way. You deploy it in the same way. You operate it in the same way. But then you can process data in different ways. And we want to give people the option to process it in the best way possible, right? That's the idea with next-gen MapReduce. So the benefit there is flexibility. Yeah.
Simplifies things. Exactly. So it's one compute framework, right? So you don't have to manage two different sets of frameworks. You don't have two operations teams. And the overall cost of ownership comes down significantly because it's just one framework. And that's actually very exciting for us. So high availability is a path to that? Yeah, absolutely. It's the first step. I mean, you know, one node goes down. Yeah. So in next-gen, we're also doing high availability for the resource manager. So when you deploy the next-gen resource manager, the resource manager itself is going to be highly available. So if something happens, it'll quickly come back up and the applications continue to run and, as the application, you don't see a difference when there's an issue with the resource manager. We heard at Oracle OpenWorld, we had theCUBE there. Obviously, theCUBE goes to the most important tech events in the world, as you know. But Larry Ellison just couldn't stop using the word parallelism. Yeah. So this is kind of where it's going. I mean, at the end of the day, it's system software. It's software. You guys are putting together that software. Yeah. Using commodity machines around here. Absolutely. What's your vision on that? How far away are we on that? I mean, is it happening now? To what levels? So, I mean, in a lot of ways, Hadoop is a fairly young technology, right? So what we are focused on, at Hortonworks and as the developer community, is to take Hadoop to its full vision. Our stated goal as a company is to have half the world's data on Hadoop, right? Now, having data is great, but you also want to process it. You want to get insights from it. And you want to do it in a simple manner, an efficient manner, and a cheap manner, right?
So we, as a company, think, you know, it's going to take some time. Hadoop is still young, there's a lot of new ground to cover, but we want to get to a point where half, if not more, of the... 2015, I think Eric says. Yeah. It's a good number. It's a good number. It's a good shot, right? It's a good shot. So how about your support experiences? I mean, obviously that's a key value proposition you're offering your customers. Yeah. How many customers do you have? Can you talk about that? We've got at least one, right? More than one, like Yahoo, right? That's a pretty big one. One big one. Are they an outlier, though? I mean, that's a pretty big deployment. I mean, you have a management contract as part of the deal, right? Yeah, yeah. You guys moved out. And do you have more than one? Yes. I mean, something less than 10,000. But again, our idea is that we can get a lot of other people. I mean, I think I went to the JPMorgan talk where people talked about how, using Hadoop, they're now spending more on legacy infrastructure, on ETL and so on, because with Hadoop they finally have an opportunity to exploit data they didn't have, right? So we see this as growing the whole pie for the whole market. And as a result, we hope to see a lot more Yahoos, a lot more Facebooks. Look at eBay, right? eBay a couple of years ago was nowhere on the map. At this point, they're one of the largest Hadoop installs in the world. I mean, they're not as big as Yahoo or Facebook, but they're getting close, right? That's exciting. We hope to see more and more. I mean, we're seeing a lot more, actually. I mean, we heard from Kirk Dunn, who mentioned that you know it's a disruptive marketplace when the people that you need to hire don't exist, right? So, like the data science movement, we're going to have Hammerbacher on later today.
Hopefully. You know, when you go out to those, like the JPMorgans, because the clear thing that everyone's agreeing on is that it's moved out of the web, okay? It's now mainstream. You're seeing financial services, government, healthcare, blah, blah, blah. The list goes on and on. When you go out to those environments, they don't have the guy to support it, right? So it's like, they've got maybe someone who's a thought leader, geek, alpha geek, who goes in there, spins it up, runs five nodes for free, gets in, plays with it and goes, boss, we can do some shit with this. Excuse my language. And then they go, okay, here's some cash. Right. And he can't hire anyone. So you guys get the call or Cloudera gets the call. How do you support that environment? So it's, so we're living- And then how do you support the JPMorgans of the world? So we're leaning, we have to lean on our experience, right? So for example, I was the guy who was responsible for all of the MapReduce service at Yahoo, right? Which meant 50,000 machines running MapReduce. If anything went down, we had level one, level two, level three. But at the end of the day, if you finally had one person to call at three in the morning, that would be me, right? So it's great experience. It's great learning. What that means is we take that learning and we translate it into the framework, right? We translate it into the framework so the framework itself becomes better. And, I can't unfortunately share stats, but if you look at the number of support tickets we have for Yahoo, it's dramatically different at this point from what it was a couple of years ago, right? Which means what has happened is we've gotten better at taking the software and making it much more reliable, much more multi-tenant. That's a big deal. I mean, JPMorgan talked about this too, right?
Like I said, the way this starts off is you have two or three alpha geeks running five-node and 10-node clusters. At the end of the day, you don't want to be running lots of 10-node clusters. You want to be running a thousand-node cluster. And if you run a thousand-node cluster, the challenges are very different, particularly in terms of multi-tenancy, right? Because lots of people sharing one cluster is very different from one or two people sharing the cluster, right? And that's one of the biggest learning curves we've had as a team in the last couple of years. And we've translated a lot of that into the release, which is 0.20.205, which has come out recently, and 0.20.203; the whole 0.20.20x series is the work we've distilled from the learnings we've made on multi-tenancy, which is something we'll see more and more of. And we're excited to see more and more. And JPMorgan was a great example of that. So, Arun, I mean, you've obviously got a lot of experience. You've got experience with the largest Hadoop instance in the world. You've made tremendous contributions to the open source community. You've created a lot of value. Great, I get that. Why does the world, if you could explain to us, why does the world need Hortonworks and the Hortonworks Data Platform? So, the primary thing is, as with any open source software, at the end of the day, you want to be able to call on the people with the most expertise in the system. And, as with any open source project, you gain credibility or street cred by being in the system, contributing patches, whether bug fixes or new features or whatever. And we feel like we have a lot of experience and a rich history doing that. Not just in Hadoop, but in lots of other projects like Pig, and we're starting with Hive and HBase and so on, right?
So, as a result, that plus the experience we've had of being, you know, I mean, we talk to a lot of customers and they're excited to be talking with us because they know that we've been there and we've supported the five-, six-, 10,000-node clusters. And they see themselves going from 200 to 1,000 to 5,000. They get a free product. Exactly. And they pay for support, unlike Cloudera, where they pay for the management piece. Yeah, and our story has been all open source. We're working on open source management software. We've got some of our best people working on it. I mean, Owen is probably one of the leading contributors to Hadoop. He's been working on the systems. So Cloudera can have their own proprietary thing all day long, as long as it's different from the management piece that's going to be free. Yeah, and again, they're only as good as what the alternative is, which is free, right? And also, we're very focused on making sure we work with them on the core, right? I mean, none of us are going to be in business for very long if the core doesn't improve at a fast enough clip, with fast enough features and fast enough bug fixes and so on. So as a result, we're in this co-opetition thing, if you will, but we're focused on making the core significantly better, not just for, you know. Yeah, I mean, it's clear to me that there's no war in the open source community. There's total love going on, because a rising tide is floating all boats. Yeah, exactly. It's just a difference of philosophy. Right. Right, the business. Sure, you're going to come out, you're doing Ambari, and it's going to be competitive with Cloudera's management suite, where they make all their money. So essentially, your answer is, the world needs to be more open. Right, right, yeah. The world needs to be more open.
Well, it's actually a free management console, and Cloudera has to keep their unique differentiator valuable enough that people would pay for it. And also, the other thing we're focused on is making sure, at every layer of the stack, that it's not just that the software is free, but that there are open APIs, right? So different system integrators can integrate with the APIs at every level in the best possible manner. And that's something we're focused on in terms of the API story. So you can actually take your existing tools and get them to work with Hadoop at every layer in the stack. That's something. What's the status of the management console right now? So, we have an alpha-ish product. It's right now in the Apache Incubator. I mean, you can go to the Incubator at this point and download the code, play with it. We've got an alpha version of it, a pre-alpha version, if you will, as part of HDP 1, the Hortonworks Data Platform 1. So you can use it to install Hadoop and the whole stack. It does RPMs and all of that. We're working on monitoring and so on. We hope to have that by the time we get to GA early next year. All right, Arun, congratulations on all your success. It's fun to watch. I mean, the press, I think GigaOM kind of went over the top with the whole Civil War post. And I think we had some similar headlines ourselves. It's totally great on the PR. You guys are pumping up the volume. Turned on the megaphone, which is kind of the Benchmark kind of strategy. We've kind of watched what they do over the years. Good backers. Congratulations. And we've been trying to get in and see you guys. So, we've been super busy with the events last week and so on. So, again, thanks for having us here. We're excited to be part of Hadoop, and we've been there for a while and we hope to be here for a long while. Final question before you leave.
Now that you're out of the big company, Yahoo. Okay, you're the founder, co-founder of the company. What's it like? Share with the other entrepreneurs out there. What kind of roller coaster has it been? Oh, it's been completely, I mean, we thought being part of the Hadoop team at Yahoo was crazy, because we took it from a prototype project to running at 50,000 nodes and pretty much lots of the dollars Yahoo made were running on Hadoop. And we thought that was bad enough, and this is completely insane. It's worse. It's much, it's fun. It's a roller coaster. Oh, absolutely. Would you agree it's a roller coaster, competing? Highs and lows. Yeah, I mean, we've talked to a lot of people outside, you know, the ability to be outside, talking to people not just at Yahoo but lots of other folks, on Wall Street and so on. It's exciting, and it helps us take our vision for Hadoop and make it real, and we kind of see it becoming real, and as a technology geek, that's really exciting for me. Do you find that, as a startup, it's like at Yahoo little elves took care of things and now it's like, wait a minute, who's getting the lunch? Wait a minute, where's the cafeteria? It was funny, right? You're doing it yourself, right? The first week we were trying to break down cubes, trying to figure out what is our bug tracking system, where do we store our code base? It was completely different, right? You guys are doing great, and congratulations. Keep up the progress and we'll be watching. Thank you, thanks for coming on theCUBE. Thanks, guys. Nice meeting you. Okay.