theCUBE, we are live in Silicon Valley at the San Jose Convention Center. This is SiliconANGLE and Wikibon's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE. I'm joined by my co-host.
I'm Dave Vellante at wikibon.org. Arun Murthy is here, a CUBE alum, founder and architect over at Hortonworks. Arun, welcome back to theCUBE.
Thanks a lot. It's a pleasure to be back. Always fun to be with you guys again.
Okay, so I got to ask you, what the hell is a resource negotiator and why does the world need yet another one?
That's a great question. Frankly, as I look back, it's phenomenally stupid in some ways. I mean, we're engineers too, right? Obviously the tech is important, and the naming is important, but we were sitting there thinking, what is the lamest name we could come up with? Right? And as you guys can validate, we succeeded.
Yeah. So obviously we had Rob on, the CEO, talking about some of the business stuff and the metrics: the subscription, recurring revenues, nice metrics, a 70/30 mix. Obviously training is the low-hanging fruit and the marketplace wants training, but there's pressure right now in the market, just the sheer demand in the marketplace for enterprise grade. Dave and I call it the modern era of computing, where you've got convergence truly happening with SDN, software-defined everything. In fact, we were going to come on theCUBE and say the theme here is software-defined Hadoop. It's already software-defined. So the world is going in that direction, where you have infrastructure that's programmable, DevOps. This is what people and developer communities want. So there's real pressure for meat on the bone.
You mentioned Yahoo has 30,000 nodes on YARN. A tweet I just saw on Twitter said YARN makes an elephant into a cheetah, kind of a play on the Hadoop elephant. But this is the pressure. There's pressure for a viable platform for enterprises, and they might not care about all the inner workings of Hadoop, because they have Red Hat, they have IBM services deploying stuff, they have a mishmash of legacy stuff. They want to bring Hadoop in and make it a really viable platform. So my first question is, what do you see as stakes in the ground that are already there that make it viable for enterprise grade, what is being announced here, and what's coming?
So as you know, John and David, we've been working on YARN for a long time. Even before Hortonworks, while we were still at Yahoo, it was pretty clear to us as we saw Hadoop take off, both internally and in the community, that Hadoop and MapReduce were going to be a big, big piece of the puzzle, right? There was a tremendous amount of value in the ecosystem and the software, and there was going to be an opportunity to monetize it and commercialize it. But having said that, if you don't disrupt yourself, somebody else is going to do it, right? That's a mantra we all live by, because the fact of the matter is that being part of the community means we see all of these things on a daily basis and we get that quick feedback from the developers. You talked about the developers, right? We get quick feedback from the developer community: this works, this doesn't work, I want to do this, this and that. So if you add all of these things up, YARN's a really big piece of the puzzle, in the sense that it allows you to interact with data in ways that were not possible before.
And obviously we as a company and we as a community are making a big bet that all your data is going to land up in Hadoop. Our motto as a company is that we want half the world's data on Hadoop. If that happens, then it's not enough to just do one thing or the other, right? This year you've seen a lot of things on SQL; next year you'll probably see event processing, you'll see machine learning. So we've always been very aware of the fact that we need to be a platform and not just a solution, because one solution is going to solve one set of use cases, but a platform, something like YARN, can help the ecosystem build out a set of solutions, and we don't have to be doing all of the innovation ourselves, right? That's the really key part of being in the community: we allow people to come in and make their enhancements and their innovations, and you're already seeing that with things like Falcon, for example, coming from InMobi, right? InMobi is a startup. They don't really care about being in the software business, but they do care about solving their problem, and about solving it in a way where the community embraces it and the community can sustain that innovation. That's a really key piece of where we are going with YARN.
So talk a little bit more about YARN. Everybody knows Hadoop, but maybe not everybody does, so maybe we can get a little didactic for a minute. So Hadoop is this batch system and MapReduce allows you to run one job at a time.
Absolutely.
YARN changes that. Talk about specifically what YARN is and what it enables.
So, a simple way of looking at it, and I talk to a lot of customers and users, and some of them are more savvy than others, right?
A simple way of looking at it is, if you go back three, four, five years, well, I go back seven years, to when I started with Hadoop, you had all the data in Hadoop. You could put any data you wanted on HDFS. HDFS is great, right? It's really economically feasible to store data there. But then MapReduce was the only algorithm you could use to process that data. As any engineer will tell you, that sucks, right? Because there's no silver bullet, there's no one big hammer. Now, we understood this a long while ago, and what YARN is, is sort of a re-imagination or re-architecture of Hadoop itself. That's why it's really nice for us to call it Hadoop 2.0, right? It's not just a version number. It's also the second generation of the architecture. So YARN allows you to run multiple applications on the same platform while providing core services like resource management, security, multi-tenancy, performance. All of these things are handled by the platform itself. Now, you can ask, what do I get as a user? What you get now is the ability to run not just MapReduce on your data, but seven other algorithms, right? Seven other implementations. I was at this great keynote by Yahoo today, where they talked about how they now have 300 nodes of real-time event processing on top of YARN. They have 40 or 50 nodes' worth of machine learning on top of YARN. And these were not just use cases, but capabilities that you couldn't get with MapReduce. This is making a meaningful difference for the business at Yahoo, and it's showing what you can get when you actually put a more generic system like YARN below what is-
Yeah, you tweeted out, I picked this up: 30,000 nodes, 400,000 jobs per day, 10 million compute hours per day in YARN, twice as much work on the exact same hardware.
It's amazing, isn't it?
I mean, we can spend another hour talking about what it means and how we did it. But the key thing to understand is that you get significantly more value out of your existing investment, right? That's a really key message of what you're getting with YARN. And by the way, we're just getting started. The stats you saw from Yahoo were just existing MapReduce applications. That doesn't include all of the new stuff they're already doing on top, like Storm or Spark or any of these things.
So there are people who would say, well, look, Hadoop was meant to be batch and it should stay batch. But I'm reminded that the mainframe used to be batch too, and then it became transactional. So was there a bit of a tug of war inside the community, or does everybody sort of agree this is the direction we need to go?
Absolutely. I mean, in some ways, one of the strengths of the Hadoop community is that it's very pragmatic, right? It's also run by people who have not only developed the software, but also supported it at scale. Yours truly, for example: not only was I developing the software, but I was also on the hook to actually make sure it works, right? And that gives you a perspective, a discipline, and more importantly, a sense of what is important for the customer.
A little skin in the game. More than a little skin in the game.
Arun, I got to ask you, staying on the theme of enterprise grade: security is a big deal. So identity and authentication are needed. You guys have the support there in the community for that. But perimeter security is a big deal with the REST API. So what's the answer for that? What are people doing? What should people know about what's being announced here? Are there new security things? There's some talk about Knox out there. What's out there right now?
So Apache Knox is, again, our attempt at taking the use cases and the requirements we're seeing in the enterprise and putting them in the core platform, right? Apache Knox gives you a single sign-on facility where you can come in at one place, authenticate and authorize yourself, and once you get through that, you're done, right?
Is that replacing the other projects?
I mean, it's an add-on. You've got Hadoop, which has security, you've got Hive, you've got HBase. All of these things have slightly different takes on security. What Knox does is come in and provide a uniform umbrella under which you, as the end user, don't have to think about all the vagaries of security on three different platforms, right? You can now do it in one place, and Knox will take care of talking to Hadoop or Hive or HBase or HCatalog.
And that's a security focus for you guys?
Absolutely. And again, true to what we do, it's in Apache. It's an Apache Software Foundation incubation project. I'm sure you've heard from Rob about this.
So we talked to a lot of folks out there. We mentioned it a little bit last night. Compliance is a big issue. A lot of big IT enterprise shops have Red Hat in there. They've got a bunch of other open source stuff. Now they want Hadoop. There are a lot of POCs going on at large scale. They have to have a certain level of compliance. What are you guys doing in that area? Is there a certification involved? What's the update on that?
So, absolutely. We definitely work with really key and important partners like Microsoft, Teradata, Rackspace, and Red Hat. You'll also see us launch the YARN certification program. This allows end users, and more importantly ISVs, to come in and certify that their application or tool works on top of YARN.
And then it's easy for us to go tell the customer, look, we've certified it, and we know that this stuff is going to work on Hortonworks Data Platform and Hadoop 2. So you can be sure that you'll get good support and feedback from the vendor.
So, how do you see this evolving? Merv today said that security is the weakest link. You called that out. And we've talked to a lot of practitioners within IT. They'll say, well, we have our security edicts, and then some line of business will go out and do their thing, and we go and ask them, are you compliant? And they say, oh, no. So it's almost like they've got to start over. Can you talk about real-world examples, or maybe not examples, but generalize what's happening in the customer base and how they are going to deal with that problem?
So, you start all the way at the bottom, right? What we see a lot is, again, you go back to the developers. You have a bunch of developers who think Hadoop is the right solution, and it's bubbling up, and they say, okay, now we have Hadoop. So what you suddenly see is not one, but seven or eight implementations of Hadoop in the enterprise. That's when a vendor like us gets called in, and we go in and help them understand. It's not just about getting an application on top. It's all about compliance and security and auditing and all the IT concerns that are really important for the enterprise. So that's one of the main reasons why something like Knox was born, right? As we saw this happen over and over again, it was pretty evident to us that we had to take that enterprise requirement and solve it in the core of the platform once and for all, in some sense. I'm always right. I mean, Hadoop has had Kerberos authentication for, I would say, about three years now.
It's not really young, but it's not really old either, right? I remember the time we went through it, when we were still at Yahoo, and I wouldn't call it a nightmare, but it was pretty close in some sense. How do you get everybody to understand what security is and what Hadoop is? That's what we now see play out in the enterprise. And once we get Knox going with HDP 2.0, that's where our roadmap is, that'll make it easier and easier.
And I think a lot of it is business value. It's like the cloud. The business value outweighs the risk. So people go forward. They charge forward, damn the torpedoes, and then they work like crazy to try to get it back up.
How do you spend your time these days? Obviously, Rob was talking about how a nice 70% of your business mix is recurring subscription, which means you've got some validation with customers, and 30% is training and consultancy, which means education, whether it's for service providers or others. It makes a lot of sense. But how do you spend your day when you talk to customers? I know you and I were talking last night, and you're talking to a lot of your top customers. What are they saying? What are their top pain points? What's their escalation in terms of solutions and POCs moving to production? What are some of the trends? Can you share some anecdotal stories or data with the audience?
I mean, it's all across the board, which makes it really fascinating, right? So sometimes I go to a customer, let's say, take Chicago, right? And you're talking to this customer and...
You're still Boston. We don't like Chicago.
Even I know. I'm only kidding.
We're still in mourning.
All right, let's say Boston.
Blackhawks did a good job with the permits.
Not the Yankees, you know, Yankees.
Red Sox, Red Sox.
Okay, same with Chicago, I'm only kidding.
Right, so it's fascinating. You go talk to them, and I recently had this conversation, and unfortunately, it was Chicago. I kind of like about it. And this customer is like, we love YARN, we understand all these things, and this is what we're doing today. And as we see all these use cases, we'd love to move them onto YARN. And I'm like, awesome, right? Frankly, some of it is an eye-opening thing for me personally. On the other hand, you also go to another set of customers, probably 200 miles north or south, and you still have to help them understand what Hadoop is. You have to help them understand that Hadoop is not a silver bullet. You basically go through the steps of crawling before you walk. We're not trying to tell them to install YARN and do seven different things on day one. What we're trying to tell them is: get your data on Hadoop, make it secure, run your basic jobs, prove value to your business, right? Whether it's cost savings or better insights or value from data. And then you can go take the next steps.
Yeah, I mean, it's the classic pattern: early adopters have specific use cases and know where they are, and the ones just across the bridge, if you know the chasm, are still trying to figure it out. So give me examples, specifically, of customers who actually know, I want it for this product, this project, and what those use cases look like. And then the folks who are kind of kicking the tires and need things explained, what's their orientation? Be specific about some of the things they're trying to understand, or the inhibitors or the objections.
Yeah, so I think one of the interesting things that I've seen recently is real-time event processing.
You saw the Yahoo use cases today, where they want an insight, where they want a click to get into your ETL system. They want to price the ad, in the real-time media exchange and so on. They want to price the ad in response to a click, right? So those sorts of things are coming up increasingly. I wouldn't say I see it all the time, but people are definitely interested. They're saying, oh, we know that Yahoo is running Storm on YARN, for example, or something else on YARN, and they're like, we want to do this, we love this idea, we love the fact that we can have one platform on which you can run MapReduce and batch, interactive queries with Hive, and also real-time processing.
Talk about Storm. You mentioned Storm. It's getting a lot of buzz here. There are some tweets going around that you're finally seeing Yahoo talk up Storm. What's the latest on Storm?
The really big, cool thing for me is that Yahoo's investing a lot, and that augurs really well for the community. Obviously you saw a bunch of things come out in GigaOM today, a couple of other new startups with real-time event processing engines built, again, completely on top of YARN. That's really interesting. But obviously, going back to my open source roots, we strongly believe the community always wins, and the more people you can attract as a community, the better off you are as a project.
Great, we've got a break here. Arun, co-founder of Hortonworks. Final comment, I'll give you the last word here. What should the developers, the folks in the community, walk away with from this year's Hadoop world? From your perspective, obviously, the community having solidarity is key.
Debates and arguments here and there, it's all good and healthy, it's all fun. We're all professionals, we all know each other. You know that. So what's the message you want to share with the folks out there who are contributing, coding, looking for navigation on what's hot and what's not? What's the key message?
So the key message is, let's try and make the core of the system better and better and better, right? Whether you work for a vendor or whether you're a heavy end user or a power user, that makes a really big difference. So if you can work with us and make the core of the platform better and better, and YARN's a good example. It's been out there for a while now, but I'm sure things can be better than where they are. That's my key point. If you can help us make the platform and the core better, then everybody benefits. A rising tide lifts all boats.
Excellent. David Richards is up next, John.
We'll be right back with our next guest here inside theCUBE. Arun, the co-founder of Hortonworks, here at Hadoop Summit. We'll be right back with our next guest after this short break.