Okay, we're back inside theCUBE, SiliconANGLE.tv's exclusive coverage. I get excited when I have Todd Lipcon in the house here: CUBE alumni, Cloudera employee. What's your employee number?

I think 10 or 11, something like that.

You're early, you're sub-20. If you're 10, that's good, you're in the top 10. Glory days; now you're in the hundreds of employees. Welcome back.

Thanks. This is a little bit different setup. Normally we do the little table, but it's nice to be here at this first-ever HBase conference.

So you're all over the email trail. All the stuff online, people go to SlideShare, they're either looking at some of the work you've done or influenced, or at the code you've written. Take us through why, in your opinion, HBase is so popular right now. It started just a few short years ago as part of Hadoop, and you've had a community that's growing, but why is HBase popular right now?

I think there's been a lot of this trend around real-time in the news and the blogosphere and all that recently. And HBase basically brings this real-time idea on top of Hadoop. So Hadoop is great for batch analytics, back-end data-warehousing kinds of workloads. But when you actually want to run a website off of it, it kind of falls apart if you're just using HDFS and MapReduce. The minimum MapReduce job takes about 12 seconds in the newest versions, 24 seconds in older versions, whereas the minimum HBase query might take like two or three milliseconds. So it's a very different time scale it can work with.

So batch is very attractive with HDFS and MapReduce.

Don't get me wrong, batch is still attractive for a lot of workloads.

Well, 26 seconds, 20 seconds, is not like days.

Right, yeah, I mean, it's not weeks. Compared to a lot of the existing database solutions that's really fast, right?

Okay, so let's go back to the analytics. So that became popular. Why not other solutions? Why not other databases?
I think HBase is the one that, for one, scales really well. It actually lives up to the claims of scalability. People are running it at 800-node, 1,000-node scale, petabyte scale. And it also integrates best with Hadoop because it's part of the whole Hadoop ecosystem. Our security infrastructure is the same security infrastructure. All our storage is on HDFS, which everyone trusts to store their data really efficiently.

And a lot of it's there. It's a nice operating model from an integration standpoint.

Yeah, everything kind of fits together, it works together, it's meant to work together. Whereas certainly there are a lot of other databases in the NoSQL space that say, okay, we can integrate with Hadoop. But they're more like an export-import kind of "integrate with." They're not a built-together kind of "integrate with."

Okay, so now I want to ask you a personal question, for the folks that know I've interviewed Todd before. We always talk about computer science programs, and the computer science degree has been in the news lately: the Yahoo CEO claims he got a computer science degree. I actually have a computer science degree, and he has one as well. But you went to Brown, right? So what are some of the schools that you see up there? I mean, you guys are hiring new people. Obviously there are the known names, but just give us the rundown, in your mind's eye and your experience, of the schools that are graduating the savviest computer scientists relative to this world of big data, which is Hadoop, operating systems, et cetera.

Yeah. So I think it's a lot of the traditional systems schools.

I know Brown's number one, so we'll stay back.

Brown, of course, is number one. We've got, I think, eight people from Brown right now at Cloudera, out of about 200 or something employees. And then other top ones that stand out are probably Stanford, Berkeley; MIT is pretty good. SUNY Stony Brook actually has a very strong databases program.
There's no one at Cloudera from there currently, but there are some other pretty interesting startups out of SUNY Stony Brook. I'd mention CMU as another top school in the database systems kind of area.

Anything outside of the North, that's interesting. So SUNY, that's New York, right?

Yeah, that's the State University of New York.

Any other programs, international, that you've seen?

There's a couple of international schools, like EPFL, which has a pretty good systems program. It's in, I wanna say, France or Switzerland. They're French speaking, but I can't remember which side of the border it's on.

Someone was telling me Canada has a massive one. Is it Toronto?

Oh yeah, Waterloo is a good CS program.

Waterloo, I heard Waterloo. All right, there you have it. Those are the top schools, and soon to be a blog post on SiliconANGLE; I have my list. Pretty close to all those. The next question is, what do you think about this event? What are you hearing in the hallways? It's sold out, so there's obviously demand for the HBase conference. Not a surprise, but what are you hearing in the hallways? I mean, we talked earlier about the keynote from Facebook, very impressive. Sharing the data, very articulate, fast-paced, very high quality. What are the things that you've seen that are like, wow, this is cool?

I've definitely spent a lot of the morning talking with the Facebook guys and a lot of the community members who I mostly interact with online, but it's rare to get actual face-to-face time with them. So, like, Michael Stack from StumbleUpon, and Karthik, who gave the keynote; I talked to him for a while. In terms of other attendees, there are definitely a lot of people who are already pretty advanced HBase users. I was surprised by that. I tend to think of a lot of people being very new and excited by HBase, but not really deploying yet.
There are some pretty big deployers, like multi-hundred-node clusters, that I hadn't heard about before, and that's just pretty cool.

Cool, how about application support within the HBase architecture? Have you seen any developments around there?

Yeah, so there are a couple of other startups that have been founded recently. So WibiData, I don't know if you've had anybody from there on.

Christophe Bisciglia, Aaron Kimball, at Hadoop World.

Okay, yeah, so that's one example of a startup that's basically building an application platform on top of HBase to make it easier to build normal apps on the infrastructure.

We had VDP Finder on earlier. That's a stealth startup in Palo Alto. I found that one on the front of VDP Finder. Who else have you seen?

So Continuuity is another example. We had Jonathan Gray over there. I don't know if you've spoken about that. So there are a few of these startups that are kind of starting to emerge. I imagine there are a few more that I haven't heard about yet that'll come up in the next six months or a year.

Any other concepts and trends, topics of interest that are popping out, that are surprising to you here, or not surprising? What are the top trending items? If we were Twitter and we were harvesting these verbal tweets in the hallways, what would be the trending items?

I think Eric Sammer, I don't know if he's been on here before, he had a pretty great tweet earlier. He said the atmosphere is more like a hackathon than a conference in a lot of ways. A lot of people, pretty high engineer population, pretty low business-person population. So I've heard a lot of talks about what exactly is the semantics of the consistency when you do both put, all these very detailed things, which is rare for a tech conference.

This is all developers. And alpha, I would say, I'd categorize it as alpha.

Yeah, a lot of the same as Cisco started. But there's people here from Europe and East Coast and some of the U.S.
It's the leaders you think about: Facebook, Apple, Google, Netflix, Amazon. There's also Fidelity Investments. We have media companies here, so it's getting diverse.

Yeah.

It's pharma.

Yeah, and Gap gave a presentation earlier today. They're storing their apparel catalog in HBase, and Gap is not what you think of as a leading-edge technology company, but it's pretty cool.

Well, I think it's really fantastic how far you guys have come. It's been great to be part of the Cloudera ecosystem, being in your office for a year and a half and watching you guys grow. So congratulations on all your work with HBase. What are you working on now? You mentioned HDFS. Could you share with the folks your current passion and focus?

Yeah, so currently I've been working on the HDFS High Availability project, probably for the last six months or so in earnest. That's basically eliminating the single point of failure which used to be in HDFS. So there's a single master node, and now we have a failover for it. So if that should crash or the network gets unplugged or something, it can automatically fail over to another node, which takes over seamlessly.

This is something that we've heard in the DevOps community, where DevOps is much more of a conversation now with Cloudera's movement into more production-focused environments. And the thing that I'm hearing, not only here but also consistent with what's been going on in the past few months outside of this ecosystem, is that the production environment numbers are up, right? eBay was on talking about it. People care more about metrics now, and more people care about monitoring and uptime.

It's uptime. I mean, exactly, yeah.

Shit, you can't be down. Some businesses just can't be down.

A lot of that's an artifact of real time versus batch, right? If you're in a batch environment and your job runs once an hour, you can take a one-hour maintenance window, no problem.
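As a rough illustration of the failover idea described above, here is a minimal sketch of an active/standby pair in which a standby is promoted when the active master fails a health check. The names `MasterNode` and `FailoverController` are hypothetical and are not part of HDFS; a real deployment also needs shared state between the two masters and fencing of the failed node, which this toy omits.

```python
from dataclasses import dataclass


@dataclass
class MasterNode:
    """Hypothetical stand-in for a master process (not an HDFS class)."""
    name: str
    healthy: bool = True


class FailoverController:
    """Toy controller that tracks which of two masters is currently active."""

    def __init__(self, active: MasterNode, standby: MasterNode) -> None:
        self.active = active
        self.standby = standby

    def check(self) -> str:
        """Run one health check and return the name of the active master.

        If the active master is unhealthy and the standby is healthy,
        the two roles are swapped before returning.
        """
        if not self.active.healthy and self.standby.healthy:
            self.active, self.standby = self.standby, self.active
        return self.active.name


# Simulate a crash of the first master and the resulting failover.
nn1 = MasterNode("nn1")
nn2 = MasterNode("nn2")
controller = FailoverController(active=nn1, standby=nn2)
before = controller.check()   # nn1 is healthy, so it stays active
nn1.healthy = False           # the crash or unplugged cable from the interview
after = controller.check()    # nn2 is promoted
print(before, after)
```

Note that a recovered node does not automatically take its old role back in this sketch; it simply becomes the new standby.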
But if all of your website visitors are actually hitting HDFS, you can't really do that; even a 10-minute window is expensive.

Karthik had a good discussion about how he's handling his replication. I think he said what? Backup and recovery within the cluster, within the data center, and then outside the data center. They want to do the fastest backup first, and then if the whole data center explodes they can go to that level. I thought that was interesting, and I was just talking with Michelle Bailey, who's here from BDP Finder. The next Data Domain is out there in this marketplace, because of backup and recovery, data protection.

Yeah, that's come up a lot recently.

It's going to be a hot issue. But I think high availability is something that we've heard about. The DevOps conversation is really interesting around cloud, because with DevOps you think of Node.js, you think of Hadoop, developers doing operational things. You talk to some purists, they're ops devs, where it's pure ops, no dev, because they have such strict high-availability requirements that they won't even let any open source into these projects. So we're seeing those barriers come down in these verticals. So I think that's a really important area.

That's one thing Cloudera has been focusing on. DevOps is great, and it's pretty exciting as a developer to be part of it. But we've found a lot of companies out there that don't have a DevOps culture. They're slow to move into that kind of thing. People are still operators or developers.

In the classic enterprise out there, there's a consumerization of IT going on. But what's happening there is they're definitely decoupled. Ops and developers, they're shooting arrows at each other every day.

That's what we see in our customer base in a lot of cases. We want to build tooling that makes it easier for a traditional ops person to actually run Hadoop.
And not have to feel like, oh crap, we have to find these DevOps unicorns that maybe only exist in San Francisco. How do we hire them? It's really hard to find them.

Exactly. Well, there's a whole personnel conversation, the training and requirements to get in. With that, let me ask you about the community. So the community of Hadoop has grown. There's a lot of inbound migration of new personnel, new people, new personalities. And with that there's always more politics and more eccentric behavior. Share with the folks out there, what's the state of the current community? In your mind's eye, how do you see it right now? Describe what it's like and try to compare it with what it was like, what's changed about it, good, bad, or whatever.

Yeah, actually right now it's pretty healthy, I'd say. The Hadoop community in particular is gearing up for the Hadoop 2.0 release. We're in the process of voting on a beta release of that this week. So fingers crossed, if no one vetoes the release for any kind of bugs or anything, we should have a release this week. All that feels pretty healthy. Our release momentum has picked up a lot. We're collaborating pretty well on the open source, even though there are maybe competitors who are actually contributing to the same open source project. That all feels pretty good. And in terms of the overall feel, the growth is definitely evident. There are many more companies that are now contributing code than there used to be. Even these big companies like IBM are starting to put patches in. It's cool to see all these new people showing up and bringing code to the project.

Well, I was just at EMC World. We're actually broadcasting live this week, so we're here and there. And EMC loves their messaging: cloud meets big data. And they're increasing the demand and educating the enterprises around big data. Do you see that evolution around consumerization of IT affecting cloud deployments around Hadoop and the architecture on the operational side?
Because the big debate is private cloud, public cloud, now this new hybrid cloud, which, you know, cloud is just a code word for outsourcing, right? So...

I don't know what cloud is even a code word for.

You know, it's just, okay, I'm going to outsource something, right? But there are requirements involved, right? So what are the challenges, from your perspective, in getting Hadoop to the point where it's fully extended out into the enterprise? Right now, pretty much all the big vendors are saying Hadoop is it, and they're already connecting to it. So we're seeing that. So what are those challenges for you to make it over?

So I think the off-site cloud thing hasn't really caught on with Hadoop yet because data has so much inertia. So it's great to say I can run 1,000 machines in EC2 pretty quickly. But if all my data is in a database in my data center, then the network bandwidth between my data center and EC2 is just not fast enough to make that a reasonable proposition. Similarly, the nice thing about these public clouds is you can spin up instances and then shut them down, bring them back up. But if you have a bunch of data sitting there, you can't spin them down, because you lose your data. So you end up having to keep your data elsewhere, spin them up, copy a bunch of data, copy it back, spin it down.

Yeah, so you get charged.

You get charged for all the copies; that's pretty inefficient. You end up spending half your time copying instead of processing. So we're finding private cloud deployments, like within a data center, much more popular among our customers. Certainly some of the newer startups are kind of all cloud, and they find it more useful.

Yeah, I think that's a good observation. Thank you for sharing that. The other thing I wanted to ask you about is the vendors. When you guys started, you had no competition, right? And then the big hubbub was Hortonworks, right? And I was there at that event. It was kind of fun to watch Yahoo spin that out.
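The "data has inertia" point made earlier in this exchange comes down to simple arithmetic. The sketch below estimates how long a one-way copy takes over a wide-area link; the function name `transfer_hours`, the 100 TB data size, and the 1 Gbit/s link speed are illustrative assumptions, not figures from the interview, and the calculation ignores protocol overhead and assumes the link is fully saturated the whole time.

```python
def transfer_hours(data_terabytes: float, link_gbps: float) -> float:
    """Hours needed to move the data over a link at full, sustained utilisation.

    Uses decimal units: 1 TB = 10**12 bytes, 1 Gbit/s = 10**9 bits per second.
    """
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600


# Hypothetical example: 100 TB sitting in a data center, 1 Gbit/s link out.
hours = transfer_hours(100, 1.0)
print(f"{hours:.1f} hours, about {hours / 24:.1f} days, one way")
```

Under these assumed numbers a single copy takes more than nine days, and the copy back doubles it, which is the "half your time copying instead of processing" problem in concrete form.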
That was last year. But a lot has mellowed out because everyone else came in. HP bought Vertica. You've got EMC with Greenplum. You've got SAP now with HANA. You have all these competing architectures competing for a big data position, technically. So what have you been seeing? I've noticed that they've been kind of yielding to the fact that...

Yeah, I mean, IBM has Netezza, but IBM is also doing Hadoop, right? And EMC has Greenplum, but they're also doing Hadoop. So all these companies...

Yeah, they have the... EMC backed down from MapR.

Right, so they've yielded to Hadoop.

The genie's out of the bottle. It's unstoppable. The train has left the station.

Hadoop has kind of been accepted as the platform. And the people with the other platforms, they're more specialized. Maybe they can do data warehousing workloads better than we can for specific workloads. But as a general-purpose big data thing, I think Hadoop is kind of it.

Yeah, so what do you think? What's their technical strategy, in your mind? The connectors are what? To connect with Hadoop? Is that more of just an embrace-and-extend philosophy? Or is it legitimate work that they're doing? I'm generalizing now about IBM, HP, and the big guys.

I think currently those other data warehouses are pretty complementary, in that Hive is great. It can do a lot of workloads. It does a lot of workloads faster if they're really big workloads. But it doesn't really do advanced analytics as well. It doesn't do windowing functions and other kinds of advanced SQL. So those other data warehouses have a 30-year lead on Hive in some of these aspects. So I don't think they're going away next week.

So I love talking to you because you're a wealth of knowledge. Let's talk about the network problems. So in big data, network latency seems to be a bottleneck with virtualization and compute. I mean, whether you're doing some work in the cloud or you've got a data center with a lot of machines and multiple cores, it's pretty heavy duty.
Network seems to be the bandwidth and latency problem.

Yeah, so in bigger clusters network becomes an issue. On smaller, like single-rack clusters, we've found it's usually not. Because you can get a single top-of-rack switch that does 10 gigabit Ethernet pretty cheap. There are companies like Arista that have come up in the last, I guess, five or 10 years that make 10 gigabit Ethernet pretty inexpensive. It gets more expensive when you start getting to like a 400-node cluster, and all of a sudden you've got 10 racks' worth of machines all trying to go to some giant core switch, and that gets pretty pricey. So it's these kinds of bigger rollouts where the network gets more expensive and becomes a bottleneck.

How do you handle that? Wait and see?

Some of the work there is compression on the wire. For example, the MapReduce shuffle, when it transfers data between machines, will compress it, just to try to reduce the total amount of data going through those core switches. HDFS's replication policies are all built around this idea of trying to minimize the total amount of data going through the core switch. So we definitely built that into the software, this sort of expectation that cross-rack transfer is expensive and kind of always will be. Because whenever we get that cheap core switch for 10 gig, all of a sudden we're going to have something faster on the single racks. It's always that ratio.

Yeah, and there's all this software-defined networking, Nicira doing some work in OpenStack, so it's like a whole other generation of networking is evolving.

I think a lot of that has to do more with reconfigurability and elasticity rather than just strict performance. So like I said before, it's hard to move a Hadoop cluster around your data center, because it's got all this data sitting on those machines, and you're not going to move them somewhere else; it would be too expensive, right?

What's the conversation like around application developers?
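To make the point about rack-aware replication concrete, here is a minimal sketch of a placement rule in the spirit of the HDFS default for three replicas: the first copy stays on the writer's node, the second goes to a node on a different rack, and the third goes to another node on that same remote rack. The function names `rack_of`, `place_replicas`, and `cross_rack_hops` and the example topology are hypothetical; this is not the HDFS implementation, and it assumes at least one other rack with two or more nodes.

```python
import random


def rack_of(node, topology):
    """Return the rack that contains the given node."""
    for rack, nodes in topology.items():
        if node in nodes:
            return rack
    raise ValueError(f"unknown node: {node}")


def place_replicas(writer, topology, rng=None):
    """Choose three nodes for one block, keeping cross-rack traffic low.

    topology maps a rack name to the list of node names in that rack.
    """
    if rng is None:
        rng = random.Random()
    local_rack = rack_of(writer, topology)
    # Only racks other than the writer's, with room for two distinct replicas.
    candidates = [rack for rack, nodes in topology.items()
                  if rack != local_rack and len(nodes) >= 2]
    if not candidates:
        raise ValueError("need another rack with at least two nodes")
    remote_rack = rng.choice(candidates)
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer, second, third]


def cross_rack_hops(pipeline, topology):
    """Count how many consecutive transfers in the pipeline change racks."""
    racks = [rack_of(node, topology) for node in pipeline]
    return sum(1 for a, b in zip(racks, racks[1:]) if a != b)


# Hypothetical three-rack cluster.
topology = {
    "rack1": ["n1", "n2", "n3"],
    "rack2": ["n4", "n5", "n6"],
    "rack3": ["n7", "n8"],
}
pipeline = place_replicas("n2", topology, random.Random(0))
print(pipeline, "cross-rack hops:", cross_rack_hops(pipeline, topology))
```

With this rule each block crosses the core switch once on write, yet the data still survives the loss of an entire rack, which is the trade-off the replication policy is built around.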
Because again, the conference really starts teasing out this notion that the application integration into HBase is critical to get that real-time. And we heard Mike Olson, the CEO, talk at Hadoop World about a tsunami of applications, and there's at least $100 million in funding. The data is a very important component of the application. It used to be app and infrastructure, and then that got decoupled. That worked great. Now you've got data bound into the application. You've got SSDs and memory, so you've got the capability of a large amount of memory now. So what is the data? Is it bundled into the app, is the infrastructure decoupled, all of the above? So it's like an architecture re-engineering.

The lines are definitely blurring. So there's this new feature in HBase called coprocessors, which allow you to take some processing that used to be in your application and actually push it down to the database. So you get this sort of blurry line where, well, is HBase a database or an application server, or maybe kind of both, right? And that hasn't really been resolved yet. The HBase committers have had these discussions at meetups before, and we haven't really decided exactly where it is. I think for now we're keeping it as the game.

It's almost a religious war at some level, but...

Yeah, where do you want to focus the project, focus our efforts?

Yeah, it's like an operating system conversation. What do you want the function to be? But it's almost as if, if you can keep it flexible, you can provision that wherever you want.

Right, so that was our idea with coprocessors. Basically it's a plug-in point, and we'll kind of put that plug-in point out there and see what people do with it. It's pretty new, so we haven't seen much yet.

Yeah, I think the use cases will drive that. So I think the best decision is no decision. That's the point. That's what open source is all about, right?
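The idea of pushing processing down to the database, as described above for coprocessors, can be sketched with a toy store. This is not the HBase coprocessor API: `Region`, `ToyTable`, and both sum methods are hypothetical names, and the "plug-in point" is just a Python callable. The sketch only compares how many values have to leave the storage layer in each approach.

```python
class Region:
    """Hypothetical shard of a table, holding key -> integer value."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def scan(self):
        """Ship every row to the caller (what a plain client-side job does)."""
        return list(self.rows.items())

    def run(self, plugin):
        """Plug-in point: run caller-supplied code next to the data."""
        return plugin(self.rows)


class ToyTable:
    """Hypothetical table made of several regions."""

    def __init__(self, regions):
        self.regions = list(regions)

    def client_side_sum(self):
        """Pull all rows to the application and add them up there."""
        total = 0
        shipped = 0
        for region in self.regions:
            for _key, value in region.scan():
                total += value
                shipped += 1
        return total, shipped

    def pushed_down_sum(self):
        """Ask each region for a partial sum, then combine the partials."""
        partials = [region.run(lambda rows: sum(rows.values()))
                    for region in self.regions]
        return sum(partials), len(partials)


table = ToyTable([
    Region({"a": 1, "b": 2}),
    Region({"c": 3, "d": 4, "e": 5}),
])
print(table.client_side_sum())   # total plus number of rows shipped
print(table.pushed_down_sum())   # total plus number of partial results shipped
```

Both calls compute the same total, but the pushed-down version moves one number per region instead of one per row, which is why the line between database and application server starts to blur.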
You give people the source, and if they contribute patches then, all right, now it does something different.

Question about DevOps, okay. So DevOps is emerging as a bona fide position in this new era of developer meets, operating system developer meets operations guy. I was talking to Jonathan Gray, and he's like, hey, you know, at my startup before Facebook, I did everything. I ran the ops, I did the development.

Yeah.

That seems to be the profile of what I call new-profile developers. Someone who has chops in code, almost an operating systems kind of guy, and who also can run basic operations.

Yeah.

What's your view on that? I mean, do you see that as just an evolution, because this is a new industry? Do you think it's an actual job title? And what skill sets do you look for in your peers to say, you know, that guy's a DevOps guy?

Interesting. I think it's actually what I always look for in the people I try to hire in general. So I think the more you understand the stacks below you and kind of above you, the better you can do at whatever stack you're supposed to be working on. So I also, previous to Cloudera, worked at a startup and did everything too. So I definitely identify with that. And that was before the term DevOps was coined. So whether it's like a new...

So you're saying if you're a young gun out there in school, learn up and down the stack.

Yeah, so in terms of hiring from Brown in particular, I know the courses there, and the people I look for are the people who did really well in the operating systems course. Because that's the one where you write your code, then you also have to kind of understand what's below the code.

Yeah, yeah.

And if you aren't comfortable jumping layers like that, then it becomes difficult later in your career. Maybe you're not working on a kernel anymore, trying to understand hardware, but you're working on Java and you want to understand what the SDK does.
So about once a week, I find myself looking at the Sun Java source code that's freely available. I kind of want to know the layer below me, right? And similarly, working on HBase, I kind of look at HDFS, or as an application developer, you might want to look at HBase. I think that's the key differentiator. It's not exactly DevOps, but it's a familiarity with the layers adjacent to you in the stack.

Okay, Todd, thanks for joining us inside theCUBE. We'll talk later over a cocktail hour. Todd Lipcon, the senior engineer. I don't know, your official title's going to say senior engineer.

We're all just software engineers.

Software engineers.

We don't have the titles though.

Employee number 10 at Cloudera. He's been here from the beginning. One of the smartest guys I know. Young gun at Cloudera. We've had him on theCUBE multiple times. From Brown University. Big plug for you there, and congratulations on your success.

Any Brown students out there? We're hiring, email me.

Yeah, if you're Brown, you've got to go to Cloudera and nowhere else.

Actually, Stanford and MIT, you can go everywhere else. We'll take all of you.

Okay, we'll be right back after our next guest. Thanks a lot.