We're with a famous celebrity and Cube alumnus. One of the early Cube alums, Doug Cutting, founder of Hadoop, who coined the word Hadoop. We've been through that. Go look at the archives. We've had many videos on that. We don't need to drill down on where you got the name, all that stuff. Doug, welcome back.

Thank you very much.

We are an institution, you are on the Cube, and you always have an invitation. And we expect you to be on the Cube every time we do Hadoop World. So we love talking to you. So last time we talked, you were working on some cool stuff. Tell us, what's up right now? What are you working on?

Sort of a continuation of the same stuff. I don't remember what I said last time, so maybe I'll repeat myself. But I still spend some portion of my time writing code, a portion of my time working at Apache, trying to keep the foundation running smoothly, and some of my time out talking with customers, you know, trying to stay on top of what's going on in the Hadoop world.

We just had the new president from Hortonworks on here, who was talking about open source 2.0. And we don't want to get into the whole Hortonworks-Cloudera thing, but we really want to talk more about open source and how it's evolved. And, you know, we're old enough to remember the first generation of open source. There's been evolutions. Give us your view now of where open source is. Apparently the word 100% is a very key word right now. So 100% open source seems to be the flavor. The communities are growing. More computer science students are native open source guys. So all this is happening. What's your perspective on the open source community as it's evolving, and on some of the new things that are happening, the new dynamics, good and bad?

Well, I think in part there's a technical component to the success of open source in this space, which is simply, if you need things to scale, you don't want to pay per CPU you're running on.
That's a hindrance to scalability. And the Hadoop world is all about scalability. We want to scale to arbitrary amounts of hardware. We want to use the most economical hardware we can. And so open source really fits that very naturally. It is a scalable technology at root. I think also people are more concerned about getting locked in to something from a vendor. And so this notion of having a platform that everyone can share and build on, that is open source, is a real advantage. You don't have to have that concern. It means the vendors, like Cloudera, have to earn their paychecks. We can't just keep sending a bill every year and not do anything. We've got to provide real value to our customers. And we think we're doing pretty well at that so far.

You are, he says. So two things you said there, one is commodity components and the other is lock-in. We were at Oracle OpenWorld a few weeks ago and listened to Larry Ellison. We love Larry because he just gives us so much fodder. And he said, you know, big data meet big iron. And he showed us this basic demo with a million-and-a-half-dollar infrastructure. So you think of big iron, expensive infrastructure. You think of Oracle. You think of lock-in. Is that sort of antithetical to your vision of Hadoop? Or does it fit in because it's bringing that capability to the legacy world?

Well, I mean, Oracle has adopted Hadoop as its big data solution. So, you know, it does fit in, obviously, somehow. You know, if people find value in big iron, then good for them. We're really designing for a more commodity price point. But if you look at Oracle's Big Data Appliance, it's not priced the same. The hardware is not priced in the same realm as their Exadata hardware appliance. It really is priced more like a commodity component, as it ought to be.

So one thing people talk about is performance. And that's with Impala, your new platform that Jeff was on earlier.
We went into great detail about the vision there and the prospects of real time, et cetera, et cetera. But at the end of the day, there were some skeptics out there who didn't think the performance could be there around HDFS and some other things. You guys have checked those boxes. Take us through the view of the passion, but also more the technical things, why that performance is getting there, especially with HDFS.

Well, I think a few years ago, we thought there were things that didn't really fit in the big data platform. Certain capabilities were kind of outside the scope. Joins and certain kinds of interactive workloads and transactions and things like that. And I think one by one, we're knocking those off. And I think it's really proving to be a very general purpose data platform. And it will be even more so in the long haul. So this is just one more of those. It's been a big project, a big effort to figure out how to get better interactive performance. And we're not through with that journey. There's going to be more improvements down the pike for Impala. But there's a lot of low-level tricks: code generation, being aware of the CPU cache, being aware of how busy different disk spindles are. There's a lot of tricks going on in Impala to get the performance that we want out of it.

The point is you're engineering. It's not just a pipe dream. There's some engineering involved.

A lot of engineering.

Speaking of engineering, I want to talk about HBase. Because we love HBase. And you've seen our little demo app we built on HBase with Danny and our team. But I want to talk about the guys you just hired. Michael Stack, who's a great guy. Met him at HBaseCon. And a few other guys. Two other folks?

Yeah, so from double to five.

Explain that whole dynamic now. I mean, you've got some aviators on the HBase team.

You know, we try to have people who are involved in just about all the projects that we include in the distribution, in CDH. And there's, I think, 15-odd projects in CDH.
And it's really helpful when we're supporting customers on those projects to have the people who are some of the primary developers on the projects in-house to directly answer those questions and respond to feature requests and bug reports. And so HBase has proven hugely popular. It's a complicated product. And it's moving quickly. So having more people in-house who can really help us with that, and help make sure that the project is moving in directions that our customers are interested in, is critical.

So I wonder if we could tap your brain for a minute. You were talking about your involvement in Apache before. We had the CEO of Sqrrl on earlier. And they're behind, of course, the Accumulo project. So I wonder if you could comment on that. And since we're talking about HBase, will HBase evolve into something like that? Or is what Accumulo is doing distinct enough to create its own space?

I can't answer that definitively. I mean, it really depends on the HBase community and the Accumulo community and whether they want to come together or whether they think they can go it alone.

Yes, not up to you.

Apache doesn't have to feel compelled to have only one solution in one area, whereas a commercial vendor would probably want to focus its customers on one particular solution. We're really all about fostering communities that are collaborating in a healthy manner at Apache. And Accumulo and HBase both seem to be doing that. They seem to have slightly different user bases. Whether over time one will come to subsume the other, whether they'll merge, we'll see. I don't think they're getting along well. I think the competition is healthy. I think they're driving one another.

Technically, they're quite distinct. Is that correct?

I wouldn't say quite distinct. I think they're distinct. There's some features in each that the other doesn't have, and that some of their users depend on critically.
So I think people who've adopted one or the other would have a hard time; they can't switch easily. But they're also very similar in many ways. They're both inspired by the Bigtable paper from Google.

What about visualization tools? You're hearing a lot of activity. We've heard a lot about it since our first Hadoop World. I know Tableau's made a bunch of announcements this week. People are lining up to do business with them. And in fact, however, Tableau sort of predated Hadoop. What's your thought on that? How will visualization tools evolve to really take advantage of some of the concepts that you were talking about earlier, the distributed nature of Hadoop?

I think as this platform gets greater adoption and becomes a mainstay of IT, we'll start to see visualization tools developed directly for the platform. Currently, I think the bulk of those kinds of tools are things that either pre-existed or are developed outside and work with other technologies as well. And I think over time, that's going to shift. At Cloudera, we really want to develop a level playing field for vendors to compete building those kinds of solutions. So we're not doing anything in that space at this point.

Doug, let's take a little mental break and not go into some of the technology stuff, but take a step back and talk about the Apache community. Something that you're very passionate about, obviously. And take us through, for the folks out there, a day in the life of keeping the peace in Apache, or holding the fort down and managing the projects, because Apache is growing very fast. You have diversity in there. It's pumping on all cylinders and it's like an engine. It's got to run. You've got to lube it up sometimes. So take us through what goes on. A lot of people don't have the inside baseball, and it is changing. So share with us your color commentary on Apache.

It's not as sexy as you might imagine.

Sounds really good. What are they doing in there, Apache? I want to join.
I've been on the board of directors for, you know, I guess three years now, or more than that maybe.

It's a man cave for geeks. It's like, you know, something's going on in there.

I've been the chair for the last couple of years. And, you know, we have little power. I mean, we have a lot of power, but we exercise it very little on the board. Primarily because, you know, we're a volunteer organization. We can't really tell people what to do. They're all people who come there because they want to be there. And we need to make it a nice place for them to be. So mostly what we look for are places where people aren't behaving nicely, aren't listening to other people, aren't creating a level playing field for the creation of high quality software. And then sometimes we have to step in as a board and sort of say, you know, stop doing that. But that's very rare, you know, on the order of once a year, and we've got a hundred projects involved. There's a lot of self-governance. We try to push everything on. Mostly what we're doing is monitoring. We have a meeting every month. We read about 40 reports for every meeting. Every director has to read those.

What kind of reports are those? What happened in the project that month? Are they more like status updates?

There's status updates from each of the projects. Well, a third of the projects report every month to the board and submit a written report.

So there's some process, but again, recognizing that this is volunteers, it's not like, you know, people are grinding away, but people are working hard. You know, they don't submit a report. We come down on them, take it to the woodshed. Whatever that is. Proud meeting is the status we get. Shame them into. So is it more just process oriented, you know, communication, or is it, you know, two-way, you guys advising, giving direction?

Definitely trying to advise.
We see, you know, signs of things that we know in the past have worked poorly, have led to bad community structures, and we'll say, you know, that's probably not the best way to go. Here's some examples of how that might go wrong. So perhaps you ought to think of doing things this way. But, you know, we try to have a soft hand generally and just limit it to advice. Mostly, and I'd say the bulk of the time, it's just keeping that watchful eye.

So my final question, I know you're super busy and you are always shaking hands, kissing babies, stealing lollipops, you know, what you do in your business. I know you've got to get back to your other stuff. Thanks for coming on theCUBE, but my final question is, what do you think of the show this year? I mean, obviously, you're a proud papa for Hadoop, and Mike Olson was awesome this morning saying, you know, he remembered the first Hadoop World, when 500 people showed up and they were blown away that 500 people actually knew what Hadoop is. What's your feeling of the show in terms of content and vibe? And then just talk about how you feel.

I'm astonished at how big it is. I mean, it just keeps growing and growing. And, you know, I think it deserves to be that big, you know, if I think about it rationally, but I'm still surprised, you know, and viscerally, I'm like, whoa, this is really, really happening. It shouldn't surprise me, but it does every time. When I saw the ballroom this morning, you know, there wasn't an empty seat in the house, the balcony was full, and you know, so that's exciting to see.

And what about the show content? Looking at the evolution of Hadoop, what are some of the key themes that are orbiting around the ecosystem right now that are hot?

I think we're seeing a lot of reports from real use cases that, you know, it's out of the pilots in more and more places. We're seeing it spread into a lot of new application areas, so, I don't know, it's just all exciting.

It's all good, yeah.
Okay, Doug Cutting, founder of Hadoop, one of the original guys working on the key project to get it off the ground. Great work, great success, always fun to talk to you. It's a great accomplishment, and we're proud to know you and interview you. Appreciate it.

Thanks very much for having me.

We'll be right back with theCUBE on SiliconANGLE TV right after the short break with our next guest.