Strategy-wise, I think there's more to it. I've worked with Eric for six years now, almost seven, and I've learned a lot from him personally. He's a great technologist, first and foremost, and as a result he's a great role model. Six years ago I was a starry-eyed kid doing stuff, and there are lots of people like that. So it's great to work with somebody like Eric, because you learn a lot by working with people like him.

He understands the chessboard.

Exactly. And he's playing the poker match at the same time.

So this is what's going on. It's a rapidly growing market right now on all levels of the infrastructure, so it's challenging. There's no question you guys have rapidly moved up from a number two, far behind Cloudera, coming out of Yahoo. But the expertise was a deep bench. That's well documented; we've talked about it. You guys also kind of changed your tune a little bit on the mojo: hey, we're not fighting Cloudera, we're collaborating. It's always kind of been that way, but the press tried to create something there.

Yeah, it's understandable. That stuff sells.

Well, that's what the press wants.

Yeah, I don't blame people for that. But even if you go back to Eric's keynote last year, essentially his message was: if you want to work on Apache Hadoop, you're going to be a friend of Hortonworks. That's the bottom line. We all understand that if this platform doesn't go anywhere and it's all stuck in infighting or whatever it is, nobody wins. At the end of the day, we all survive when Apache Hadoop does well. So as a result, I think everybody understands that we've got to work together.
We've got to take this platform and make it better and better over time. Because frankly, Hadoop has been around for six years. We'd like to see it around for 60 or 100 or whatever it is. So it's going to take time to make sure.

Well, it's clear that everyone wanted to see you guys come out with your GA product, Hortonworks Data Platform, and see what your move is. And the move is pretty clear. It's pretty obvious. 100% all in on the technology. 100% open source. All money goes towards development, period. Hadoop, Apache, 100%. On the business side, it's pretty specific: create bulletproof, enterprise-grade, quality OEM deals. Stable, reliable. OEM is the right word. Business deals with the big guys, and let that be the pull.

Yeah, and we also talk to a whole bunch of individual customers in the Fortune 5, 10, whatever you name it. But we feel our deals with the big partners are the easiest way for enterprises as a whole. We feel that's the easiest way for Hadoop to actually get real use cases, real pull in the enterprise, instead of knocking on doors one by one. We've been knocking on doors one by one, but we feel that by partnering with people like Microsoft and Teradata, we can take this technology and make it much more prevalent in a significantly smaller time frame.

Yeah, I wrote in my blog post that you guys are putting everyone on notice with this disruptive technology. But at the end of the day, the big guys need help. I don't want to say you guys are small potatoes compared to what they're doing, but from a dollar standpoint, the revenue that EMC, IBM, and HP are doing is off the charts. You're a rounding error in terms of what they could do in revenue.

We want to change that.

But the disruption you're providing in their market is significant.
So, I think you guys are creating such a cost reduction in data storage and data warehousing that they're eventually going to need to reposition their products. Microsoft, for example, has huge risk on the IT side. You've got Office, you've got their whole enterprise IT marketplace. It's so big. So for Azure to bundle in with you guys is a smart move, but you guys have got to be ready. So, talk about the product. As you launched it, why those features? The provisioning you announced, I noticed, was interesting, that's a key thing, and HCatalog. Can you talk about those two products? Why specifically those two?

Let's take them one by one. On Ambari, the first thing anybody needs to do is install Hadoop, manage it, monitor it. We've talked about it in the past: our game plan has always been that we want to do stuff in the open. We want to put all the code in the open so that people are not worried about proprietary bits and so on. More importantly, we also want to have open interfaces so that you can interact with the whole Hadoop stack. And we want to use the open technologies that are available. We don't want to be reinventing the wheel. Of all the things we can do, I'd rather spend time improving HDFS and MapReduce and HBase than writing custom monitoring technology. That's not a useful way to spend our time.

That's the ecosystem picked out, actually.

Exactly, and we want to be able to integrate with an IBM Tivoli or an HP OpenView, because frankly, at the end of the day, enterprises have invested millions of dollars in their existing technology.
Our message is not, take Hadoop and throw away everything else you have. That just doesn't work. So what I'd like to do is keep Hadoop open, and keep Hadoop useful enough that people can also integrate it with their existing stuff. And as they see value, they'll...

You want to work on the core, basically, is what you're saying.

The core, and also make sure that we integrate with your existing tech, whether it's databases or anything else. That's what we did with Teradata and Aster for HCatalog. This allows us to integrate Hadoop in a very nice, clean way. And it's a very simple way, which is what we're all about: integrate in a simple way with your existing stuff and existing enterprise technology, so you continue to get value out of your investments while you get a good force multiplier, if you will, out of Hadoop.

The metadata is a big message, too, with HCatalog. I think a lot of people are kind of overlooking the importance of the metadata services.

Absolutely.

And we talked with Doug Cutting about Avro. We just found out that Sam, the ex-Cloudera labs guy, so this company of Twitter, is working on some open source stuff in Avro. That whole notion of cataloging metadata, where do you see that going? Because that's something that developers want. They don't want to have to recode everything over again and embed code in code.

Absolutely. It's not just the developers, but also your tools. Your tools don't want to understand where the data is in HDFS, for example, because they've been built with certain assumptions in mind. And at the end of the day, like I said, the developer wants to think in terms of data sets. He doesn't want to think in terms of directories or files. You want to think, oh, I have a data set which represents my monthly revenue for July. Not, this is /foo/bar, whatever.
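The data-set-versus-path idea described here can be illustrated with a small sketch. This is not the HCatalog API; `ToyCatalog`, `Location`, and the data set and path names are all hypothetical, chosen only to show an application asking for a data set by name while a metadata layer tracks where the data physically lives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where a data set physically lives (illustrative only)."""
    store: str  # e.g. "hdfs", "hbase", "database"
    path: str   # storage-specific address


class ToyCatalog:
    """Hypothetical in-memory metadata catalog; not the HCatalog API."""

    def __init__(self):
        self._tables = {}

    def register(self, name, partition, location):
        # Record (or overwrite) where this data set partition is stored.
        self._tables[(name, partition)] = location

    def resolve(self, name, partition):
        # Applications look data up by name, never by directory or file.
        return self._tables[(name, partition)]


catalog = ToyCatalog()
catalog.register("monthly_revenue", "july", Location("hdfs", "/foo/bar"))

# The application asks for "monthly revenue for July", not for a path.
before = catalog.resolve("monthly_revenue", "july")

# The data moves to another store; only the catalog entry changes,
# and the application's lookup stays exactly the same.
catalog.register("monthly_revenue", "july", Location("hbase", "revenue_july"))
after = catalog.resolve("monthly_revenue", "july")

print(before.store, "->", after.store)
```

The point of the sketch is only that the lookup key stays stable when the storage location changes, which is the property the speaker attributes to table-level metadata.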
So this way, we want to make it easy for you to take your existing knowledge and existing experience and apply that to Hadoop, to get much more benefit from Hadoop. Otherwise, it's going to take us the next six, seven, eight years before people understand this stuff, because frankly, when a technology is new, it takes time. Every technology adoption curve is tough. So we want to make it simple and easy for people to take their existing knowledge and experience and use Hadoop in that context without having to learn a whole lot of new stuff. With HCatalog, like I said, one of the interesting things is the way you can now think of tables. The actual data could be in HBase, it could be in a database, it could be in HDFS itself, but by allowing this notion of a table at a metadata level, you can integrate all the existing apps without worrying about what you do when data moves from one place to another and so on. We've got some really great validation for that in the market. We're excited about the work we did with Aster and Teradata on SQL-H. So there's a whole bunch of work we've been doing, and you can expect more of that.

Cool. Let's talk about MapReduce 2.0, or next-gen MapReduce, as it's called. We've heard it's kind of in the alpha phase at this point, but explain to our audience what it's really all about.

Right. If you look at Hadoop today, you have two essential parts. One is HDFS, which is the data storage layer, and the other is MapReduce, which is the data processing layer. Now, unfortunately, what's happened is that if you want to process data in HDFS, and you're going to have massive amounts of data in HDFS, your only option has been MapReduce.
Now MapReduce, the programming model, is great for a whole bunch of use cases. I've worked on MapReduce for almost six and a half years now, and I really like it. But along the way, it's become pretty obvious that MapReduce is not a panacea. It fits the right set of use cases, and not all use cases can be broken down into the MapReduce paradigm. So our effort on MR2, or YARN as we call it at this point, is essentially a way for us to take Hadoop and generalize it so that you now have the option to run different data processing applications in the same Hadoop cluster. It doesn't have to be limited to MapReduce. Today it could be Open MPI, tomorrow it could be real-time data processing, the day after it could be graph processing or some other kind of processing. What we're trying to do is generalize the infrastructure of Hadoop to the point where you get much more return on the investment. People are not going to be comfortable buying a $5 million Hadoop cluster just to do MapReduce and another $2 million cluster to do something else. If you can allow them to run both apps in the same cluster, it's not only easier from a CapEx or OpEx perspective, it's also easier from an operational perspective. You don't have to have two different sets of people or two different sets of tools monitoring a cluster. So we think that allowing both of these to run in the same Hadoop cluster is a big win for the whole ecosystem. You'll see some of the work we've been doing with our partners show up as really cool applications running on Hadoop itself, and that's exciting from where I stand.

Right, so what are some of those applications? What are some of the possibilities that are going to be available to enterprises once MapReduce 2.0 really comes to fruition?
So one really good example is MPI. I want to give a shout out to the folks at Greenplum who've been doing this stuff. If you look at Open MPI, it's one of the classic, well-known MPI programming APIs out there. Today, Open MPI can work with SLURM or Torque or other existing resource managers. What this now means is that you can take an Open MPI application and, instead of pointing it to Torque or SLURM, point it to YARN, or MR2. And suddenly, without your knowledge as a developer, you start using Hadoop for MPI processing. So it's a really useful way for people to start getting more value out of their Hadoop clusters without investing much. Frankly, if we do our job well, it'll be almost zero investment. You have your Open MPI app, you're using it with something else, just point it to Hadoop and it starts working. That's a pretty strong feature set, if you will. So that's MPI. We hope that should be available fairly soon. Then we have things like real-time data processing needs. There are a bunch of people trying to solve the real-time stuff. MapReduce is great at batch processing, not particularly so at real-time, but you guys probably saw the Storm talks, or S4 from Apache, and so on. We're talking to a lot of people there. There's interest in taking their applications and porting them so that they work on MR2. And suddenly, two quarters from now, you're looking at seven or eight different apps running on the same Hadoop cluster along with MapReduce. That's significantly cool from where I stand, just as a technologist, but also great, you know, just for you.

Talk about some of the other projects.
So for the folks out here watching, Arun is the co-founder of Hortonworks. He's a tech geek. He's a developer. He's part of the core team over there. Talk about the event here for the folks who are watching. They're on Twitter, following the Twitter stream and watching our live stream. But outside of that media, they want to know what's going on. Could you quickly share the vibe here, a couple of the sessions? What's the audience? You mentioned there's some business talk, but it's mostly developer focused all the way.

Yeah. That's one of the really cool parts about Hadoop Summit. Given its genesis in Silicon Valley, there's so much technology here. It's really nice to get to a place where, just behind us, you see a whole bunch of people showing the stuff they've been integrating with Hadoop, building for Hadoop, or using Hadoop for. There's also a whole bunch of people talking about how they're going to take Hadoop and improve it. And a whole bunch of other people talking about how they're already using Hadoop in their ecosystem or in their production warehouse or whatever it is. It's really fun to look at these use cases. It kind of blows my mind. When we started on this six years ago, we never saw this explosion coming.

What's the most mind-blowing thing you've seen here at the event?

At the event, I unfortunately haven't had much time yet.

Or just within the community. Yeah, giving you more range.

I've talked to a whole bunch of people who are using things like Hadoop and MapReduce to do genetic sequencing. What they're trying to do is come up with personalized medicine based on you. They're not trying to say you're a 35-year-old male; they're looking at you, at your DNA sequence.
If you look at the opportunities it provides, it's mind-blowing and awesome. The fact that we can make such a difference to humans as a whole is actually insane from where I stand as a technologist.

How about the most mind-blowing thing you've seen within the community, in the code base? Is it NameNode? Is it some of the Avro stuff? Is it Storm? What are some of the cool projects where you go, wow, I love that one, that's got a lot of potential?

Definitely, I'm going to plug YARN. That's not a surprise, but...

Which one?

MapReduce 2.

Okay, got it.

But of course, HBase is great. We see HBase everywhere. Big kudos to the community. There's HCatalog; we see a whole bunch of people starting to pick it up. There are also things like S4 and Storm and all these things which are coming into the ecosystem. It's a really exciting time to be in this ecosystem and see all these technologies being developed. I'm not smart enough to build all of them, but it's really cool to be able to learn from all these guys.

But in the developer community, let's talk about the global developer community, from the super low level, like we saw with HBase with Todd, the machine code, to developer frameworks. In between, across that range of what I would call developer IQ, there are opportunities. What are you seeing in the developer landscape coming out of the Hadoop community? Tools, frameworks... we saw Spring, for example, had a lot of success selling to, say, VMware. Is there a framework around Hadoop that is going to be adopted, that has the most traction? And on the development side, what tools and whatnot are the most popular?

If you look at the easiest way for things like MapReduce to be used in the enterprise, I would say things like Hive and Pig are the big ones. SQL is something everybody's familiar with.
SQL is something that a whole lot of tools are familiar with. A lot of tools are built with SQL in mind. A lot of tools talk to SQL on the back end. So I would say things like Hive and Pig are some of the things we see all the time. And it's really exciting to be in this market and see what we can do. From where I stand, I look at myself as an assembly guy at this point, if you look at Hive or MapReduce. I'm trying to figure out what tools I can provide to Pig or Hive to make them better, faster, and more efficient. And HCatalog is definitely in the same space.

So you're an assembly guy?

Yeah.

Hexadecimal is like your second language. Debugging registers in the old days. We actually had COBOL referenced on theCUBE earlier. That was a first, right? What was that describing, Hive or Pig? I forget what it was.

Hive scripts. It's like writing COBOL, yeah.

I wonder if you could talk a little bit about your partnering strategy. We see all the vendors out here that you're partnering with, some open source, some not. How important is that partner ecosystem, building on top of Hadoop? How important is that not just to Hadoop's success but to Hortonworks' success?

Frankly, we think of Hadoop's success as being very closely tied to Hortonworks' success. On the other hand, like I said, a whole lot of enterprises have a whole lot invested in existing vendors. One of the things that really excites me about our partnership with Microsoft is that it opens up Hadoop to a whole new ecosystem. It's not just Windows as the operating system; it's the whole ecosystem you get with Microsoft, whether it's Excel or PowerPoint or SharePoint.
And by opening that up, the use cases we get, the whole new set of users we get, is incredibly exciting. Same with our partnership with Teradata. Teradata is prevalent in so many data warehouses. The fact that we can take HDP and HCatalog and all these components and integrate them well with Teradata means that not only do people get value out of it, but we also learn a whole lot from our partners about how we can do better with Hadoop. So our partnerships are an exciting thing for us.

So you talked about getting more value out of your existing IT investments through Hadoop, through Hortonworks. But what about the potential for overlap? When you go into a company like Teradata, a long-time data warehousing vendor in the more traditional model, you come in and say, well, we've got this new way to process tons of data that's inexpensive and scale-out. You're partnering with them now, but is there some tension there? How do those conversations go?

At the end of the day, there are great technologists everywhere. Teradata has been in this business for a long time. They understand the use cases that Hadoop is great at. We understand the use cases that Teradata is great at. So we're fortunate to be working with them, and at the end of the day, it's a win-win situation for Hadoop, for Teradata, and for the customer. As a result, we're very happy working with them.

So, again, in terms of this event, we're seeing a good crowd, 2,500 I think, or close to it. Fantastic. Looking ahead, let's say we're here next year, sitting here on theCUBE. What can we expect to see? What's on your roadmap, on your priority list, for the next six to 12 months?

First and foremost, we want to make sure that Hadoop, like I said, is enterprise ready. It has to be stable and reliable.
So we're spending a whole lot of time adding enterprise features like HA and so on. We've also invested a whole lot of time in the future of Hadoop, which is Hadoop 2. There's the work on YARN. You can expect multiple programming interfaces to Hadoop: MPI or graph processing or whatever it is. You guys were here last year, or at least at Hadoop World, so you've seen a lot of development in the last eight months or so. You can definitely expect that to increase. It's exciting that people like Microsoft are investing, so you can expect much more engineering investment going into Hadoop itself. For me, that's the most exciting part. Wearing my Apache hat, it's really exciting to see all these people invest in Hadoop and help make Hadoop better and better for everybody.

So my question is, what should we ask Eric tomorrow? He's going to be on theCUBE. He's always shy. What should we ask him?

Ask him about me.

Okay, we'll do a personal evaluation on you. How's his code? Have you reviewed his code lately? He's going to say, you know what? You don't document. It's just that you're an assembly guy, that's why. Arun, thanks for coming on. Congratulations on all your success.

Thanks a lot.

Really proud to watch you guys grow like you have, and to watch you change from a scrappy startup, really mature, and take the high road all the way. It's great to see the good citizenship. And on the product side, you guys are now not a far number two. I think you're really right behind Cloudera. I love the approach, and we'll see how this shakes out. It's pretty clear what you guys want to do. Keep us updated. Thanks for coming on theCUBE.

Again, thanks for having me on theCUBE. From all of us at Hortonworks, it's a pleasure.
Hortonworks is putting on the event. Soon this event's going to be so big they'll have to get rid of it, like Cloudera got rid of Hadoop World. Obviously a sellout; the demand is strong. The Apache community is on fire right now. HBase, HDFS, MapReduce, MapReduce 2. Great innovations, and all of that is going to be empowered with big mainstream applications. We're expecting big ecosystem growth. Arun, thank you for coming on theCUBE. We'll be right back with our next guest after this short break.