Welcome back, everybody. This is theCUBE, SiliconANGLE's flagship video production. We are live at .conf 2012, Splunk's annual user conference, here in Las Vegas at the Cosmopolitan Hotel, where we've now had a day and a half or so of wall-to-wall coverage of all things Splunk and big data. My name is Jeff Kelly with wikibon.org, and I'm joined by my co-host Jeff Frick from SiliconANGLE.
Thank you, Jeff. Welcome back, everyone. I think we've had a tremendous day of guests today. Really insightful, and we're learning a lot. Hopefully you're learning a lot. Again, the hashtag is data journey. We hope that you'll join us on this journey through Twitter. And so now I'm happy to welcome our next guest, Arun Murthy, who is the founder and architect of Hortonworks. Welcome, Arun.
Thanks a lot.
A CUBE alum.
And also a CUBE alum, that's right.
We have not had too many CUBE alums today. So how have you been finding the show? Have you enjoyed yourself?
I mean, of course it's Vegas. That's a known story in itself. But it's fun to be back on theCUBE. Thanks a lot for having me over.
Great. So did you see any great surprises in the last couple of days? Anything that really stuck with you?
It's really fun to see how the two user communities, the Hadoop user community and the Splunk user community, are beginning to interact with each other. I'm also here to learn: learn from people in the Splunk community about what we can do better and how we can work together well. So it's been all good.
So yeah, I think we definitely want to understand from you a little bit more about how what Splunk does and what Hortonworks does complement one another. I know there's a little bit of news today as well. HDP, the Hortonworks Data Platform, 1.1 release was today. Tell us a little bit about that. What were some of the key additions, I should say?
Yeah, so as you guys know, we released our flagship 1.0 release of HDP in June at the Hadoop Summit.
We've got some extremely awesome feedback from the customers. So this is more of an outstanding incremental release where we take feedback and put it in. We've done a lot of work on things like availability for both the HDFS file system and the MapReduce system, for the JobTracker and so on. So it's been really great to actually get feedback. We launched it in June. We've got feedback. As a startup, we run hard and long. You want to be fast.
You guys are either all on or off, right? It's like a four-year-old: either on or off.
Exactly. So you've got to be on all the time. It's been really good being in the market and getting all this great feedback. We've hopefully got more things coming down the pipe in the next couple of months, but this was a way for us to take this feedback from the market, put it back as a release, and hopefully iterate faster from now on. So it's been all good.
Fantastic. A large part of our audience, obviously, is going to be familiar with Hortonworks and Hadoop, but maybe you could lay out a little bit, kind of compare side by side, what Splunk does and what Hadoop does. One of the issues around big data, of course, is that there are a lot of different startups and technologies and approaches, and there is some confusion. So maybe you could help us understand, really, where does Hadoop play, where does Splunk play, and how do they work together?
Yeah, I mean, I'm a big fan of Splunk, right? I've read about the history of the company, how the founders started looking around and saying there's a lot of machine-generated data, not just today but even 10 years ago, and how they've gone ahead and taken very simple things but also made them really intuitive for the end user. That's a really big deal. I mean, Hadoop has a lot to learn from Splunk in that fashion, right? Splunk makes it really easy for anybody to come in, on-board, get data, get reports, all that stuff.
And that's really very impressive as a product person, not necessarily as a technology person. That's something we can learn a lot from in terms of ease of use, how you onboard customers, and so on, right? Now, Hadoop, of course, is at kind of a different end of the big data spectrum. As you guys know, a lot of us at Hortonworks have experience with people trying to glean insights out of multi-terabyte, if not petabyte, data sets. So that's kind of the other end of the spectrum. As a result, we see a lot of synergies between Splunk and Hadoop. There's definitely a lot of stuff we can learn from each other. I'm excited about the stuff that the Splunk guys are working on, things like Hadoop Connect, with bi-directional data movement between Hadoop and Splunk, and so on. Those are some of the interesting aspects. And also, like I said, I'm here to learn. I'm attending a lot of talks on things like Hadoop Connect, Hadoop Ops, and so on, and I'm really excited about the fact that the two user communities can learn a lot from each other.
Yeah, it's interesting that you mention that. The communities actually do remind me of one another, because they're both very excited, very passionate communities, very involved, much more so than I've seen in other IT or technology areas. So talk a little bit about your community, the Hadoop community: the role the community plays, and how Hortonworks is working to really engage that community and work with them to kind of bring the best out of them to build your product and your services.
Yeah, I mean, like I said, Hortonworks is pretty much all about open source and the Apache Hadoop community, right?
I mean, although we're just a 12-month-old company, as individuals we have a lot of experience, being in the community for about five, six, seven years now, right? And we've learned a lot from our user base. Not just the open source community, but also the end users. Right now we've got a bunch of customers using Hortonworks, and we've got a bunch of people at Yahoo, for example, who have been using Hadoop for a very, very long time. So as a result, we've got great feedback from the community, and that's key. That's key because Hadoop is still very young, right? It's only four or five years old in that sense. And as with any young product, you've got to ship fast, kind of unfortunately break things, but also learn fast from it. Having a good, vocal, passionate, engaged user community is a really big part of it, because that helps build the ecosystem and gets us feedback from the users of our software, which makes it critical for us to listen to their voices and help. That's primarily the way for us to actually improve the project and the product in that sense. So that's been great for us.
Yeah, but then of course you've got to balance their feature and functionality requests, "I need to solve a problem quickly, and I can't do it the way things are set up," against the more long-term vision of where you want to go.
Yeah, and that's one of the things that I and the other founders of Hortonworks, and also the rest of the community, are really passionate about. You've got the short term, where you've got the customer on the phone trying to tell you things don't work.
But you've also got to measure it against what this means to the project five, six, seven years from now, right? So it's all about balancing the long term and the short term. It's a tough high-wire act to follow, but you've got to walk the trapeze wire in some sense and do that. That's something we focus a lot on at Hortonworks. It's something we're very proud of, actually.
I know also that you guys are really focusing on getting out into the community, doing training sessions, helping people learn more about not just Hortonworks but Hadoop as well.
Absolutely.
So tell us, I know you guys have, I think, a road show, really, where you go around the country. Tell us a little bit about that, what your goals really are there, and maybe where you'll be next. I'm sure some of our viewers will be interested in attending.
Absolutely. So we've got this road show series going on now, and that's going to hit places like DC, Chicago, Austin, and so on, right? Like I said, at the end of the day, Hadoop has got a lot of buzz, which is really great both for the open source community and for us as Hortonworks. But at the end of the day, you still have so many people who haven't heard of Hadoop, right? Or even if they've heard of it, they've not used it. I mean, it's hard in some sense. It's still kind of hard compared to your existing enterprise software, right? So it's going to take us a lot of time and education to get there. Also, one of the problems you have with a disruptive technology like Hadoop is that you've got to actually build up a community of people who can understand the technology and effectively use the technology. And that takes years, not months, right? I mean, it took a lot of time for people to understand SQL and databases and so on.
I mean, I'm talking from a 1970s perspective, right? So Hadoop's going to take a long time. First of all, you need to have qualified people, and they're in really short supply. If you're an expert at Hadoop at this point, you're probably working for one of the vendors. And, you know, we're still hiring. But beyond that, there are a lot of people who really, really need qualified talent, right? And that's at a real premium in the market today. So for us to be able to take Hadoop much beyond its current Web 2.0 kind of companies in the Bay Area and so on, it's going to take a lot of education and a lot of training, which is what we have focused on in terms of spreading the message. It's not just about Hortonworks; it's about the whole Hadoop ecosystem.
So are you finding some ways to accelerate that process, especially on the training, and to get more resources out there?
Yeah, I mean, we're also partnering with a lot of people, right? Hortonworks is partnering with really important companies like Microsoft and Teradata. As a result, it'll help us spread the message of Hadoop faster through all our partners, right? At the end of the day, you've got to walk before you run. So we are still in the early stages of crawling and walking, and it's going to take time, as with any new technology. But we're confident we'll get there, given that Hadoop is turning out to be such a key component of the whole big data strategy for the entire industry. Yeah, so we're confident.
So one of the themes we're seeing here at the Splunk show is that a lot of their customers are starting with one use case and then moving on to further use cases that maybe they hadn't anticipated.
And one of the things that Splunk does to kind of accelerate that is make their product available for free to download. Hortonworks does something similar, with a different model in that the whole platform is free to download. Everyone can go to the Hortonworks website, download it, and start playing with it. But I wonder, have you seen a similar type of pattern, where customers maybe download the platform to do one thing and then realize, now we can really expand to other areas?
Absolutely, and it also comes back to the whole education and training aspect, right? What happens is typically you have one use case in mind. With Hadoop you've got things like advertising, analytics, and so on, right? People have this one use case in mind, and they download Hadoop. It's interesting: what we've seen with Hadoop is both bottom up and top down, right? Bottom up, what's happening is that you've got a couple of engineers in your company who are passionate about technology. So they go get HDP, the Hortonworks Data Platform, they install it, run an app, and then they go to their management and say, this is what the value of Hadoop is, this is how we're getting returns from our investment here, right? And over time you'll see not just these two engineers here, but two engineers in another organization, and another organization, within the same enterprise. You'll see five, six, seven different teams, and at some point it kind of bubbles all the way up, and then people realize that you've got to have an enterprise-level strategy, where you can't have every single organization doing it on their own, right?
And then you come to this point where you probably talk to Hortonworks, and you have a shared Hadoop cluster, because that's how you exploit your economies of scale and also your people investments, right? You don't want 10 people managing 10 small Hadoop clusters; you want your central IT organization to adopt Hadoop and manage it. So that's exactly, to your point, what we're seeing a lot of. It's going bottom up and also top down. People at the CXO level are also realizing that Hadoop is a key driver of value, and as well they have to look down and say, okay, how are we going to use Hadoop? And as you get this bottom up and top down, at some point they meet, and that's kind of the sweet spot for Hadoop. We see that over and over again, not just at the Web 2.0 companies but at insurance firms, banks, all these places. They're seeing pretty much the same story repeat over and over again. But the whole hype about Hadoop also helps, right? It means that a lot more people know about Hadoop than even two years ago, right? When I started working on Hadoop six years ago, I didn't know about Hadoop. Right? So it's definitely getting better and better. So this attention, and the coverage that you guys give at theCUBE and so on, is actually great, because it helps spread the message of Hadoop faster and helps us get adoption cycles shorter and shorter.
Yeah. And again, just to dig down one layer on that initial adoption, because you said it's a pattern you've seen over and over, which is great, right? That's what you want.
So, the top two applications for that first guy, more often than not, or maybe not more often than not, but in the cases where you've seen it grow in the enterprise: what exactly are those projects?
The top two would definitely be things like ETL and analytics, right? People are trying to find a needle in a haystack. They've got data from existing data marts, they've got data from social, from their websites, and so on. They're trying to find that needle, that one insight which they can use to drive business value. That's something we see over and over again, not just in the Web 2.0 companies but also in places like, as I said, banks. Right. Everybody's got a website now, right? Any bank, they've got a website. They'll probably also have a mortgage division, right? And trying to find that one view of the customer is something we see a lot.
And is it usually a view that they've got in their mind that they're trying to validate, to get the return on that investigation? Or is it, I'm just walking down the street and I trip over the golden penny?
It's really both, right? That's why you have this notion of data scientists becoming so popular these days. The data scientist, in my view, is somebody who can form a hypothesis based on existing data. But then you want to be able to test that hypothesis, not just on a subset, but on your entire enterprise dataset, right? That's kind of the key driver we're seeing. What Hadoop allows you to do is do that not just on a small subset but on the entire dataset. And that could be terabytes, if not petabytes, right? So we see that you need to be able to have that insight in most places.
But then once you have that, it's kind of like training a brain, right? Once you have that, you continue to do better and better and better. And that's the pattern we see over and over again.
So we've only got time for one more question, so I'd love to get an update. What do you guys have on the horizon? 2012 is coming to a close pretty quickly, and it seems like it was just...
Exactly, yesterday.
Yeah, we were just kind of starting the year. We were writing our predictions for 2012, and now here we are almost to 2013. But what's on the horizon for you guys, both in the short term and a little longer term into next year?
Our primary thing is that, as much as we are a product company, we're also a technology company. Without technology, you can't drive a product, and vice versa, right? So we spend a lot of time and effort on things like Hadoop 2, right? Hadoop 2 is on the horizon. It was in alpha for a while. It's close to a point where you're seeing customers; one thing we can share is that we'll probably have a 2,000 to 2,500-node install of Hadoop 2 in the next few weeks. So that's a really exciting time, right? And that means that my cell phone is going to be ringing fairly often, which is not so...
Which is good news and bad news, right?
It's good news and bad news, right? It's a good news story, actually. I'd rather be yelled at than ignored.
Much better than the alternative, the quiet cell phone.
Exactly. I'd rather be yelled at than ignored, right? So as a result, that's exciting. As a technologist, that's really fun. I remember the stage when we did this with Hadoop 1.0, which was in a similar phase about three years ago, right? It was insane. The last two or three months were completely crazy. So we're kind of getting up to the...
And as you guys know, we're working a lot on YARN. You guys have covered it on SiliconANGLE. YARN is going to be at a really large install base in the next few weeks, actually. That's really exciting.
And for people who might not understand, YARN is kind of the next generation of MapReduce.
Absolutely.
And bringing kind of new ways of processing data.
Yeah, so far Hadoop's just been about HDFS for storage, which is a raw-bytes file system, and MapReduce for processing. With YARN, what we're doing is taking Hadoop much beyond MapReduce. MapReduce is essentially one algorithm, right? There are tens, if not hundreds, of algorithms you want to run on a dataset. So with YARN, we allow you to do not just MapReduce but MPI, graph processing, bulk synchronous processing, and so on, which are really, really interesting from a lot of customers' perspective. A lot of it is use-case driven. With this, we'll be able to take Hadoop into significantly more use cases and help people solve more of these on Hadoop, with all the data that you have on HDFS. That's really exciting. So with YARN getting outside this beta stage, it's actually one of the most exciting things I've seen personally for a while.
That's great. Well, thanks for stopping by.
Absolutely.
We love to have the CUBE alums come by. We look forward to seeing you next time. So, again, we've got more guests lined up. Thanks to Arun Murthy, the founder and architect of Hortonworks, for stopping by. We've got another guest queued up, just getting all mic'd up. So we'll be back in just a minute.