Okay, we're back, towards the end of the day here in San Francisco, live for the HBase Conference, the first inaugural conference for HBase. I'm John Furrier, the founder of SiliconANGLE.com. This is SiliconANGLE.tv's exclusive coverage of the Hadoop ecosystem, and we're proud to be here. Jon Hsieh, is that right? Yep, Jon Hsieh. Hsieh, I've butchered it a few times. Welcome back. So you work for Cloudera, I know that because we worked together in the same office, although I didn't work for Cloudera. What employee number were you? I'm number 15. Yeah, you're close, okay. So you're under 20 in the early employees? Yeah, definitely. Todd didn't know whether he was a 10 or 11 or 12. I guess, you know. He's like a 10, I think. Like a 10, yeah. So you've been there for a while, so tell me, what do you think of what's going on right now? Just on a personal level, the ecosystem, you're heavily involved in the community, you oversaw a lot of the development early on, and now you guys have hundreds and hundreds of employees. What's going on? I think, you know, we're here at HBaseCon, and HBase is kind of like the next system that's built on top of Hadoop. We started off with Hadoop, focused on the core storage and processing platform. And now we're focused here today on HBase, which is this system that gives you real-time, random reading and writing, which basically is going to open up a whole new set of applications that you can deal with at this big data level. And I think that one thing that we're seeing a lot of at Cloudera is, all the Cloudera customers are using HDFS and MapReduce. But I would say a very large percentage of our customers are looking to HBase to solve that next set of problems. Now that we can store all the data, now that we can process all of it, some of these jobs will take hours or 15 minutes or a large amount of time before you get an answer.
And the next kind of level is, how do we get answers in milliseconds? So someone asked me, coming back to my Catholic roots, they asked me, John, what's the holy trinity of big data relative to the ecosystem? Is it HDFS, MapReduce, and HBase now? Or are there other elements? Those seem to be the core coming together. Is it four or is it three? I mean, I think most people who are users of the ecosystem are definitely using HDFS, definitely using MapReduce. I think more people will probably come to Hive or Pig before they come to HBase. But, you know. HBase is scalable. HBase is scalable. I mean, there's certain things where you're gonna want to just bang MapReduce directly against HDFS. But if you're thinking about serving real live traffic, HBase is potentially an option for doing that, whereas you would never consider using HDFS for that. And if you want to potentially get up-to-the-second counters and statistics, people like Facebook and people like OpenTSDB use HBase so that they can get up to the second or up to the... What's OpenTSDB? OpenTSDB is an open time series database. And it's a project by one of the guys at StumbleUpon, Benoit. He goes by the nickname tsuna, T-S-U-N-A. And they basically use it to run all their metrics, application-level metrics. That's an open source project? It's an open source project. I believe it's GPL licensed, but it runs on top of HBase. So it's built to take advantage of what HBase can do with real-time reads and writes, and it efficiently stores metric information, be it host level or application level, and presents visualizations so that you can actually see what's going on. Low latency? Low latency. You will see data from basically hundreds or tens of milliseconds ago, as opposed to having to wait for a Hive job to come back after 15 minutes. So I mean, that's definitely kind of a use case.
The Facebook guys have talked about something very similar. They have their own internal version of a similar system for keeping track of internal Facebook application metrics, and even the health management for their internal clusters. So what are you working on right now these days? Because I know you had multiple roles and you've been doing a lot of different stuff, project-wise, within Cloudera. What's your core thing now that you're working on? So these days I primarily focus on HBase, and up until recently I've been primarily focused on a combination of two things. A lot of what has happened is, because we have this large uptake of customers and users, and HBase is a little bit more immature than HDFS or MapReduce, there are problems that come up. And because this is a production system and because it has essentially these high availability requirements and low SLA requirements, we need to be able to bring a system that goes down back up as quickly as we can. So I've been working on a lot of repair tools to make it so that the amount of downtime... Within HBase, not HDFS? Within HBase itself. Got it. We want to minimize the amount of downtime that's necessary due to, let's say, a bug or something funny happening in HBase. They still happen, rarely, but when they do happen we want to be able to recover from them quickly. So that's been a good chunk of my time. Another part of it, going over these customer escalations, is trying to find out what some of these problems are so that we can solve those things more quickly. And then I guess the other major thing is Cloudera has been growing. It's been growing pretty quickly. This year in particular. This year in particular. Throttle up, throttle that hard. Yeah, I mean, what we've really been able to do this year is bring on a lot of new engineers. A lot of new support engineers. A lot of new people who are gaining expertise in the system.
So actually a lot of my time is spent helping these folks come up to speed, so, doing mentoring. And training, right? Training. Ramp up. Making tools so that they don't have to be as expert as a committer on the particular project would be. And it takes time, it takes effort. But it also makes it so that we can better serve our growing number of customers. How is the HBase community right now in terms of actual committers? How's the size? Is it still a small kernel, and is it growing pretty rapidly? You have your big whales out there, Facebook and StumbleUpon and Riot Games and whatnot, and they're doing a lot of good work there. So yeah, right now the core community is 18 committers and PMC members. So that basically means that this group of 18 people essentially are the folks that are controlling the direction, the releases, and where HBase is going. I think two are at StumbleUpon, three are at Cloudera and one of them is myself. There's five or six at Facebook, one at Twitter, one at Trend Micro, one at eBay. So basically a lot of the companies I'm mentioning are obviously major users of the system or providing support for the system. As I've seen for our SiliconANGLE project, we're investing in HBase and a variety of other projects. We're totally into it and Hadoop, thanks to you guys and Mike and Amr and the whole team over there. If it wasn't for you guys, we wouldn't be doing it, but we're going to start hiring engineers now. So this is kind of a use case that's not just my challenge, other people want to contribute too. So I want to hire guys that can be contributing to the open source in an authentic way and really get behind it. What's the process for that? So I mean, legitimate, not a throwaway, not a token gesture if you're a big company. We've seen that with other failed attempts. A legitimate startup or mid-sized company or maybe even a large company.
How do they ingratiate themselves into the community? Yeah, so I mean, I think the easiest and most obvious way is to sign up for the mailing list and start helping other people, and use the experience that you've gained from using the system and, you know, ask questions, answer questions. And we've definitely given the commit bit to people who are just helpful in the community and try to answer difficult questions based on their experience, right? The next level is to get involved with development. Just as an example, if we could give a better error message, you can write a patch that will fix that error message so it'll tell me something actionable to fix whatever symptom I'm seeing. And we've seen a bunch of these kinds of things where the symptom that we see doesn't really tell you what the root cause is. And, you know, if you've dealt with this problem multiple times and you're just like, oh, I wish that error message was better, just write it. It's going to be a relatively quick patch to say, you know, if we just had this little extra information, it would give us a better thread to pull on to find out what the heck is actually the root cause. That's good citizenship right there. That's absolutely good citizenship. It is a great way to figure out how to do patches. So it's a typical earn-your-way-in, in a very collaborative way. We're a friendly bunch. And, you know, we'll help you through the process of getting a patch in, and in all honesty, starting off with a technically trivial patch is probably a really good way just to get started. There's all these tools you have to learn, like Maven and the Java compiler and Git and SVN, there's a whole bunch of tools, and that actually is probably the biggest hurdle to doing a contribution, generating a patch.
But once you have that process down, then, you know, you can focus more on getting deeper into it. We use a lot of sports analogies here in theCUBE because we call it the ESPN of tech. Yeah, so this is kind of the hazing. Well, no, it's more of a combine workout. Hazing is once you get into the club, I guess, or the fraternity, but really it's more of show your skills and be a professional and, you know, just be honorable and show your muscle, right? I mean, what better way to demonstrate than to just write code, right? Yeah, I mean, you know, and like... And not be a jerk, right? I mean, pretty much any time, the people we look for in committers are making good technical contributions, are helping people in the community. And they're also, you know, agreeable people and people that we can trust, you know? And when I say trust, it's like, I have the commit bit and I actually have the right to modify anything in there. But I'm only gonna modify and commit code changes to areas that I understand well, unless I get a second opinion from the person that I know knows that section well. And, you know, there's these guardrails and you know what your limitations are, and maybe you'll grow into some of these other areas over time, but, you know. A three-strikes-you're-out philosophy almost, right? Yeah, I don't know about that, but, you know, some of us have learned the hard way, some of us kind of see this and, I mean, it's overall healthy, right? As long as the net effect is a positive. Okay, having the heavy, heavy commit bit and being one of the 18, the power 18, the keiretsu there, what's going on in terms of timetable? What's the mindset of the group? Obviously, be cautious, do the right thing, but also, there's some pressure to, you know, ramp up and get this thing going. The demand is obviously high.
Here at HBaseCon, it's sold out and there's demand. I mean, you know, everybody who comes to the table, everybody who's contributing code to the project, they have things that they care about. You could say it's their own agenda. You know, from my point of view, I'm at Cloudera. I support a wide variety of customers, and many customers. So from my point of view, I'm going to spend my time focused on correctness and focused on making it so that it's easier to support and operate, because that is what relieves stress from my shoulders. Yeah, yeah, yeah. You know, different folks, like maybe the Facebook guys, they'll just avoid certain features altogether and focus on the performance of one particular part because that's what makes their application hum. And other people, like the Salesforce guys, they really care about their disaster recovery story. So they're going to invest there. And overall, you know, everybody in the community can provide feedback. And there's going to be some general, like... The specialization is based on their interest, right? Yeah, yeah, yeah. Your agenda. I mean... They've got to pay the bills. Basically, this HBase project, you know, the system is giving you these capabilities that you couldn't have before. And, you know... What have you learned over the past two years in particular with the HBase evolution? Because, you know, projects can get traction and become a lightning strike and grow in a mega way like this one is. What have you learned personally and also within the group? Yeah, so I mean, I've been involved in other projects in the past. And I think I like the way the community in the HBase project acts. You know, in general, we're pretty good at getting to a consensus. We like each other as people, for the most part. Most of the days, right? I don't mean that in any bad way. Yeah, I know. You know, I'll go have a beer with most of the guys who are in San Francisco.
I was just being crazy, just no problem, just joking. But I mean, we all kind of know that, you know, everybody has their particular use cases that they're trying to get out. And sometimes they overlap and sometimes they don't. And, you know, sometimes we have these existential questions. What is the goal of HBase? What is HBase gonna be in a year? What's it gonna be in two years? And what are certain priorities? And, you know, Cloudera's gonna push on certain things. Facebook is gonna push on certain things. StumbleUpon's gonna push on other things. At the end of the day, it's an open source project. It's out in the open, and I think at the end of the day, the things that StumbleUpon and Facebook and Salesforce and eBay are trying to solve with it are the things that Cloudera has to care about, because some of these folks are our customers and all these folks are in the community with us. So, you know, if there are certain things that they're less interested in pushing on that are important to us, we'll push on them. As an example, one thing that's gonna come out in the 0.96 version of HBase, which is probably six months from now, I'm guessing, is that we've done a big push on wire compatibility so we can get the upgrade story better. And if you went to some of the earlier talks today, this is a pain point that Facebook has mentioned. This is a pain point that we care about because a lot of our customers are concerned about it. It's a pain point that everybody has. But at the end of the day, you know, it's a lot of work to get this done. And one of our guys, Jimmy Xiang, has kind of taken the lead and pushed that and helped coordinate a lot of the other guys who are interested in working on it.
You know, it's great because we're working with the StumbleUpon guys, we're working with the Hortonworks guys even, we're working with all kinds of folks to get this set of functionality pushed through so that, you know, next year when we start upgrading HBase, it becomes a much easier path without having to deal with downtime. Well, congratulations on all your success. It's great to know you, and I wish I could stay in your office now that you have more space. But I have an office space now. It's been a great year and a half to get to know you on a personal level and watch you guys grow. As a company and as individuals, it's been fantastic. Cloudera is a rocket ship and you guys are doing some great work. And I think you guys have managed the balance of the integrity of the business and also the performance of pushing the code out in a good way. So congratulations. High availability tools, making HBase recover faster. It's great stuff. Thanks for coming on theCUBE. Appreciate it. And we'll see you again soon. So we'll be right back with our next guest right after this short break.