Okay, we're back live at Strata Conference in Silicon Valley. This is where the big data show and conference is changing the world. Big data is really making an impact, not only at a technology level, but also at a society level. And siliconangle.com has all the coverage. Go to that site and you can get all the news from the event as well as from all around the tech industry: cloud, mobile and social, where computer science meets social science. That's our motto. I'm John Furrier, the founder of SiliconANGLE.com, and I'm joined here with my co-host. I'm Dave Vellante of Wikibon.org. This is day three at Strata and we're here with Cube alum Arun Murthy from Hortonworks, co-founder and one of the lead architects there. So welcome back. Thanks a lot for having me. It's, you know, a pleasure as always. You guys are doing an awesome job covering Strata. Thank you very much. So we met on theCUBE last fall. You guys had, you know, just come out. When we were here a year ago, right, John, we had some folks on. We said, geez, there's not a lot of competition at the platform level. And all of a sudden, all this competition came out, which is great for the user community. You guys have taken a really unique strategy and we've been covering that like crazy. What's changed since we talked at Hadoop World? In a lot of ways, you know, we've learned a lot. We've been in the market now for, you know, another four or five months. We've talked to a lot of customers. We've got a lot of great feedback. As you guys know, Hortonworks Data Platform, which is HDP 1, has now been in tech preview for a few months and got great feedback. And we're really, really looking forward to, you know, having a GA product in a matter of weeks at this point. And also it's really exciting to, you know, get a lot of validation from the market. You've seen the partnerships we've had with Microsoft and Teradata.
It's really great to get validation of, you know, both our business strategy and our open source strategy from, you know, really key partners like Microsoft and Teradata and, you know, also Talend. Arun, I want to get into some of the geek talk. What's it like now at Hortonworks? You guys are a fantastic startup story. Obviously, you guys were at Yahoo. And, you know, Yahoo's been in the news, with all the news organizations trashing Yahoo because of, you know, the whole turmoil at Yahoo. But, you know, Yahoo's still a massive company, a massive pioneer in Hadoop. You were there. All your co-founders were there. You guys came out of Yahoo. Cloudera was the only game in town. You guys came out, both contributing to Hadoop in a big way. The dust has settled. The community is vibrant and growing strong. You guys have funding from Benchmark, which is no stranger to open source and to competing and doing well. And, you know, they're not, you know, a hack of a VC. They're pretty solid. What's going on with the company right now? Tell us as a founder, a co-founder, you know, what's it like and how do you feel and what's the team look like and just what's the vibe at Hortonworks right now? You know, like I said, it's been a really amazing learning curve. I mean, we were at Yahoo for six years and we did, you know, a lot of work. We learned a lot of stuff in terms of technology. But since we came out, it's actually been great to learn about how people use Hadoop outside. And for us, it's helped us to actually improve our understanding of Hadoop itself in some sense. And we've been focusing on improving Hadoop, not just for, you know, the average enterprise, but also for the end customer, which is, you know, you see a lot more consumer-facing startups using Hadoop now. You see a lot more, you know, social guys using Hadoop now, like you guys pointed out. It's been great to learn from these guys and help them use Hadoop better and better.
You know, the fact that now, you know, people use Hadoop to actually do geo-research, to predict earthquakes, is amazing, right? As an engineer, you feel you're actually making a real difference to the world. You know, it's not just about getting, you know, better ads in some sense. It's about making a huge difference to the world. And that's actually exciting. You know, I was talking with Eric, your co-founder, who was the CEO. Now he's back doing the product stuff and, you know, Rob's taking over. You know, he's an operating CEO. You know, he's been a success. I'm really excited to say that, you know, I want to say congratulations, and on some of the things that I was speculating about, that Hortonworks may or may not make it, you know, I was wrong, and you guys have done well and the market's changing. So, you know, it looked like Cloudera was the only game in town, that they were the leader. They're still doing great on the revenue side and stuff with Hadoop. But you guys have really played above the fray in terms of high integrity. He kept the messaging really within the community. He kind of kept the peace. You know, to quote The Godfather, he kind of kept the families in order. But more importantly, that aside, congratulations on that. The market's changing. Here at Strata, the big message that we're sharing with folks is that there's a huge diversity of applications. So, you know, depending on what view of the elephant in the room you're taking, there's a different upside, right? So, we're hearing Abhi Mehta talk about applications. You're hearing people talking about business warehouse, business intelligence, pure-play Hadoop, as Jeff Kelly pointed out. So, given that, there's a lot more room for you guys to operate, vis-à-vis Cloudera, or anybody else, Vertica. So, what's the strategy technically? You guys are coming out and building your code base.
Where are you guys settling in on your go-to-market from a tech standpoint, and what's that going to do for the community? So, you know, we've been extremely clear about this since day one, right? I mean, if you looked at, you know, like the Hadoop Summit, where we actually came out last year, right? We've been very clear that anybody who, you know, collaborates with us or with the community, effectively, right, on Hadoop is a friend of ours, right? It doesn't matter if it's Cloudera or anybody else. Our end goal is to make sure that Hadoop progresses because at the end of the day, to be honest, if Hadoop doesn't progress fast enough, nobody's going to be in business, right? So, we are very focused on making sure Apache Hadoop and the whole ecosystem, right, moves forward. And as a result, anybody who, you know, wants to collaborate with us is a friend. And, you know, it's something we've been very clear about since day one. And we're glad to see that, you know, the community and the market have taken that message well and, you know, the community's made a lot of progress in that space. So, okay, I totally buy that and endorse that 110%. Honestly, more Hadoop, better Hadoop, everyone wins. Jeff Hammerbacher from Cloudera even said that, hey, you know what, competition's a great thing and more people doing stuff with Hadoop is better for everybody. Okay, that being said, open source is great, we love it. But there's always enabling opportunities on top of Hadoop. We're seeing an application surge. What are you guys doing with Hortonworks? Because we're seeing Cloudera's revenue numbers, as Jeff Kelly did in the market sizing report on big data, which you can go to wikibon.org slash big data and look at the numbers. The revenues are really strong. And so, they had a very successful add-on with their proprietary enterprise thing that's not necessarily part of Hadoop.
You guys are going to do services, you've been clear from day one, but there seem to be other opportunities out there with the tech where it's not so much forking the code, it's just an enablement. What are you guys looking at there and what's the story? And that's a great question. And an example of that is the work we're doing with both Microsoft and Teradata. We're helping these guys work with Hadoop in some sense, and as a result that's actually a great revenue stream, to work with Microsoft and Teradata, and the things we're doing with Talend, for example, will make sure that Hadoop is easier and easier to consume for the average enterprise. So, in terms of technology, at this point our strategy hasn't changed. We're still going to be all open source all the time. We think that partnering with folks like Microsoft and Teradata and Talend will actually help build a nice revenue stream on the side. That's fair. So what's your core focus right now when digging into some of the projects you're working on, obviously making the code better? What specifically are you working on that you can share with the folks out there? So primarily, for the last 18 months now actually, my focus personally and that of my team has been on next-gen MapReduce, as we talked about at Hadoop World. So we see this as an opportunity to make Hadoop a really, really generic data processing platform, to move it beyond just MapReduce, because there's so much demand out there. I mean, the bottom line is there's going to be a lot of data on Hadoop. There's going to be a lot of data, and once you have that variety of data, you're going to have a variety of applications which need to use that data, and those applications can't be just MapReduce.
So as a result, we've been focusing on next-gen MapReduce and making it really stable and solid. We put out a blog post a few months ago which talked about how at this point we're actually better than Hadoop 1 in performance on pretty much every single dimension. That's pretty scary given that we rewrote all the code in the last 18 months, right? And it's been a lot of work. As an engineer, though, it's a lot of code, it's a lot of work. We're actually excited to be near the end of the tunnel at this point. Hortonworks is a pretty amazing phenomenon when you think about it. It's this company that hasn't even shipped a 1.0 product yet and you've got all this buzz in the market. I guess part of that is because you guys are the good guys, right? I mean, you're giving back to the community, and the other part is you've got this unbelievable pedigree. But you've super glued yourself to Apache Hadoop and that is another, I think, interesting phenomenon. I have not seen this kind of buzz around a company that, again, hasn't shipped a product. I mean, I've got to go back to, in my brain, I was thinking NeXT, when Steve Jobs started NeXT. You probably don't have that much buzz, but it's quite an amazing phenomenon, as they say. And because you have super glued yourself to Apache Hadoop, people I think associate things like next-gen MapReduce very much with your activities. Yeah, I mean, there are two ways of looking at it, right? I mean, one is we have this pedigree because we've contributed in the community for six years now. I mean, Hadoop started in 2006 and pretty much most of this team has been on Hadoop since day one full time, right? So we have built up a lot of history with the community. And in some sense, Hadoop's available, right? You can download it today, you can run it; you've been able to download it for the last six years now and run it. So what we've come out and said is, look, if you can use Apache Hadoop and you have a need for either training or support, we can help, right?
So in some sense, of course it's Apache Hadoop and it belongs to the ASF, but we've had a product out there. But again, like you pointed out, HDP has been in tech preview for about four months now and we've got some amazing feedback from a lot of customers, and some of them are actually running it in semi-production effectively at this point. So we've got great feedback and we're working continually hard to actually put that back into the ASF and get Hadoop better and better as we ship our GA product. I mean, at least my sense is you've got some pretty patient capital, because you guys have been pretty clear that you're going after the services side of the equation, which is not the fastest way to flip a company, obviously. And then of course everybody talks about the Red Hat model, and Red Hat was left alone for a long time to do its own thing and bake. So I mean, is that right? I mean, is it a correct assertion that capital is patient, that there's a long roadmap ahead here? I mean, absolutely. I mean, you know, we've got a great set of investors, you know, Benchmark and Yahoo and Index, right? And everybody has been very clear that this is a long-term play, right? We see that Hadoop is going to change, you know, the way you process data and the way you store data, and big data is changing the industry. So if you think of it as a long-term play, then, you know, it's been great that we've had investors who've been willing to be patient with us because they know that, you know, this is something you can do for a long time, right? Yeah, now, of course, it's a long play. Of course, in this world, I'm not sure how we define long-term. Exactly. It used to be in decades, but maybe now it's not defined that way anymore. So let's get a little into next-generation MapReduce.
You know, everybody who comes on theCUBE and has a new idea and is trying to sell, you know, an enterprise version or an enterprise edition is saying, oh, you know, Hadoop's not enterprise ready. We're going to try to make it better, et cetera, et cetera. It's not real-time yet. What's next-gen MapReduce going to do for all this? So that's, you know, exactly been our focus, right? One of the two parts is, you know, so far Hadoop's been only MapReduce, which means it's only been batch, right? Now, clearly, like I said, once you have so much data on the system, right, which is the file system, you have to be able to process it in different kinds of ways, whether it's batch or real-time or MapReduce or, you know, more stream processing or whatever it is, right? So it's something we've seen for a very long time. And, you know, we've invested in this for over 18 months. So if you think about it, it was, you know, well before we came out; we saw these problems even at Yahoo, you know, 18 months ago in some sense, right? So we feel really, really confident and really excited about the fact that next-gen will actually change how people perceive Hadoop, right? So far, the perception of Hadoop has been, you know, something you do at the back end, right? But, you know, as HDFS is getting better and better, HBase is getting better and better, and MapReduce itself, you know, another piece of the core, is getting better and better, we are really excited to see, you know, the different kinds of applications that come on Hadoop. I mean, we talk to a lot of ISVs and they're excited about taking their existing applications and porting them to run within Hadoop as part of next-gen. And that is something, you know, where we feel that we'll see like an explosion of, you know, innovation and applications in MapReduce. And we're already seeing it, you know, among the customers we talk to, among the, you know, folks we talk to.
And we're confident that we'll see more and more of it in the next, you know, 12 to 18 months. That's an exciting thing. So when you look toward the future of Hadoop, I mean, we talk a lot about the nitty-gritty and the infrastructure; do you see that conversation shifting to applications? I mean, look down the road three or four years. What do you think the conversations are gonna be like? Are they gonna be much different than they are today? Absolutely. I mean, today not many people talk about Linux, right? Because it's taken as a given, it's a platform. What people talk about is, you know, the applications on top, which are the databases or, you know, what have you, you know, the applications or the processing engines put on top. So we feel that as long as we can make Hadoop, you know, stable and solid and reliable and performant, we'll see more and more of the applications. You know, folks like Prasad are a great example, right? They're taking something which doesn't exist, a banking application, and they're making it run completely on Hadoop, right? As a result, you know, the end user doesn't really know or care that he's running on Hadoop. Our vision is to make sure that Hadoop can be that, you know, next-gen, you know, enterprise backend effectively, if you will, right? If we can do that, I think, you know, we'll be in great shape, not just us, but also the whole Hadoop community. Arun, we're getting the hook, Mark wants to give us the hook, but I don't want to give up the microphone because I have a million questions for you, but I have two questions. HBase: big. We believe in HBase, we were talking about it before you came on, and we're using HBase in a big way for SiliconANGLE's backend data project. But here at the show, it's exploding in terms of, wow, it's becoming kind of center stage, and you're no stranger to HBase. Are you surprised by that? Not surprised?
We're hearing it's becoming the de facto on top of HDFS, that it's going to be the play. Why is HBase so popular? Are you surprised? Can you share with us your perspective? And then I want to talk to you about Hadoop Summit. Yeah. So, I mean, in some sense, it's not really surprising because HBase is, you know, pretty much going through exactly what we saw with Hadoop, right? I mean, I was just saying somewhere that at the first Hadoop Summit we had in 2008, we had 300 people show up, right? Today you do a Hadoop Summit and you'll have at least 3000 people show up, right? So, in some sense, we're really excited to be supporting the HBase community as they take HBase and make it a real game changer. I mean, HDFS is a file system, but the ability to do something more real-time on top of it, you know, the fact that you can draw a dashboard, you can draw monitoring dashboards or you can draw, you know, the Twitter feed dashboard, you know, what's trending on Twitter or whatever it is, is something you see a lot. That's a real application where you see HBase really shine, right? So, HBase doesn't surprise me at all, and we're very, you know, honored in some sense to work with the HBase community to actually help them get HBase. Where is HBase going to go from a product standpoint? Obviously, there's a lot more work to do. Is it product ease of use? Is it more tightly coupled with UI work? Or what's the forecast in your team's eyes around HBase? You know, with HBase, there's definitely a lot more performance work. There's a lot of innovation happening in terms of features. I mean, with HBase 0.94, I think, you get coprocessors, which is a really big deal if you're writing new applications, right? Coprocessors are big. And also the HBase community at this point is getting more and more focused on multi-tenancy. Again, that's massive if you want to be in the enterprise.
You want to make sure a single user or a single application can't, you know, harm your HBase cluster and so on. So there's a lot more work on HBase, and I'm sure the community is, you know, focused on it. What was the second part of your question? Summit. Hadoop Summit. So obviously you guys are running Hadoop Summit and we'll be there with theCUBE. Absolutely. Thanks for your support. What's it going to be like, more deep dives? Because there have been some technical people here that I've talked to in the hallways who said, you know, it's too many suits and not enough deep dives on tech. Is it going to be more techy? Is it going to be like Strata? Hadoop World was rolled into Strata from Cloudera. So obviously O'Reilly is taking that over and it seems to be much more of an application-focused conference, not tech geek drill-downs. They've done some, Ed talked about that earlier, that they are doing some drill-downs in key proven areas. But it's still an emerging community. What's the strategy for Hadoop Summit? So the strategy for Hadoop Summit is basically, it's going to be run by the community. I mean, if you look at, you know, any of the program committees or the chairs or whatever, it's going to be drawn from a wide variety of the ecosystem. You know, definitely Hortonworks and Yahoo, you know, do host it, but we definitely want to make sure that it's more of a community event and everybody feels like they're a big part of it. Again, you know, this is going to be in Silicon Valley, so there's going to be a lot of focus on tech. But again, one of the interesting things is, I don't see a lot of focus on the operational aspects of Hadoop. And we're trying really hard to make sure we get a lot of focus on the operational aspects of Hadoop. And there's also, you know, data science and business intelligence and so on.
There are going to be tracks on each of them. So it's going to be interesting and fun. It's going to be a good mix of both tech and, you know, operations and also, you know, business. I actually heard some good feedback from people in the community saying that they're looking forward to Hadoop Summit, you know, to learn more, not so much use cases, but to really get into the operational and tech details. Yeah, we're looking forward to it. It's at the San Jose Convention Center, which I think could be a, I mean, this is a great conference, right? But as John said yesterday, he's giving it an A except for the venue. You know, the San Jose Convention Center is going to have a little bit more space. So we're really excited to have theCUBE there. And we'd love to have you guys, you know, film all of it. Thank you very much for, yeah, right, we're talking about that. And where there's a will, there's a way. So we'll try to make that happen. So Arun, thanks very much for coming on theCUBE. It was a great pleasure having you on again. Great. Arun is the co-founder of Hortonworks, the leader in Hadoop. He's got a background in Hadoop, really pioneering it with this whole team at Yahoo. Between Hortonworks and Cloudera, those are the core tenets in terms of vendors out there, funded startups really driving the commercialization of Hadoop in a good way. Great community. Obviously this show, to me, is a testament to the success of the Hadoop community. And you know, that is really all a testament to the work that you guys and your team are doing with Cloudera. So congratulations, and I want to just.