Okay, we're back at the Cube. Charles Zedlewski, perfect, perfect. I should know, I've interviewed you 20 zillion times. We're here at the HBase conference with Charles, VP of product at Cloudera. Cloudera is the sponsor of the event, running the event, doing what good community citizens do: they put on events. They started Hadoop World and have passed it off now; it got so big, they passed it on to O'Reilly Media. And now with the HBase conference, you guys are doing the same thing, really bringing the community together. I was talking with Mike Olson, the CEO of Cloudera, earlier, and I didn't get to say this to him because he had to rush off. We were talking about some other things, but you guys are doing a good job of really downplaying the suit factor, you know, the business and vendor hype. Mike stepped off stage very quickly. I noticed during your keynote, you talked for a bit, got some great messaging and positioning out there about what the view is. Very humble, stepped away and let the conversation go technical. So, talk about HBase at a high level, then we'll talk about some of the technical innovations here. What's happening here?

Yeah, sure. Well, thanks for mentioning that, that's kind of you to say. I think that was our aim for our support of this conference, but I think Stack also kept us honest, and that was part of why this conference has a tone of sort of by developers, for developers. And I hope that we keep that true to its core as it gets bigger. HBase was something that, and I can only tell you from my point of view as someone at Cloudera, predates us. A lot of the work was originally done way back by a company called Powerset, which later on got acquired by Microsoft. And essentially it was the idea of: how do I take all of the great properties that I have with Apache Hadoop? How do I take that flexibility and scalability and that great economics?
But how do I also make it so that I can work with data that updates frequently? How do I make it data that I can work with in real time? How do I allow people to bring different aspects of schema back into their applications? And HBase has been going through a maturation phase. One of the big steps in this evolution was when Facebook decided to make it their platform of choice for a whole class of real-time applications. And I like to think that the other big milestone was when Cloudera decided to put a large investment in it. We integrated it, we made it a first-class citizen in our open-source distribution. That's been the case for over a year now. And that started to expose this system to a lot of enterprise workloads that probably would have taken a lot longer for the technology to see if it was just a plaything of the web community.

What are the key things about HBase that make it such an exciting environment? Again, a lot of entrepreneurs are here, King was just talking about that. The room was packed with people. Why HBase right now? Is it real-time? Is it the fact that Hadoop does such a good job with batch and commodity hardware out there in the hyperscale environments? Or is it all of the above?

So I think that it really depends on what you're comparing it to and where you came from last. I think you find there are three different classes of people that are getting enthused for different reasons. On the one hand you have the people that were already part of the Hadoop community, and they're saying: what if you could have all the scalability properties and all the good economic properties of HDFS, but also do random reads and writes with millisecond response times? That just sounds like a good deal all around. So I think you've got the existing Hadoop community that's just saying, wow, this just extends the utility of the platform.
I think another crowd you have is the people that have been toying around with all these different kinds of NoSQL databases. So HBase will get compared to whatever: Mongo, Riak, Cassandra, and so on.

Membase is going for it, Jay.

Yeah, exactly. There's a whole litany of these guys. And a lot of those other NoSQL technologies were way more aggressive about promoting themselves early on. I think that as a lot of people got further down the path, they realized that there were some pretty fundamental limitations in the design, and that HBase actually made a much more mature set of design trade-offs. And so a lot of the NoSQL crowd is realizing that HBase is actually one of the best options they can choose. And then the third crowd, really represented by guys like me: I've always been excited about Apache Hadoop and the ecosystem as a potential future architecture for data management in the enterprise. That's not going to be the case unless you have apps. And HBase represents the platform that I think the most commercial applications are going to wind up getting developed against in the Hadoop ecosystem.

Why is that versus the other approaches?

So the simplest way to put it, and I described this a little bit in my portion of the keynote, is this: if you think about the world that came before Hadoop, how many applications do you know that are written against file systems? Not really very many, right? Most people need to work with concepts like objects and entities and schema. That's just the basic level of abstraction that developers need to develop all the commercial applications that ultimately wound up making things like the Oracle database the commercial success that it is today. So it's the same thing today.
If you tell a developer, go build me, you know, whatever, a mobile user application, and then you say go figure out a bunch of file offsets and go invent your own schema and reimplement all that in your application, very few developers are going to want to make that trip. And instead, with HBase, you're going to get a lot of those utilities for free, and you're at a starting point as a developer where it's feasible to start considering developing some of these commercial apps. One example, I think: if I look at the ISVs that certify on our open-source distribution, there's probably half a dozen different kinds of ISVs that certify against the MapReduce API part of core Hadoop. There's probably 40 or 50 that'll certify against HBase, right? And that's just in the near term. So that's about the same as what you'd see, I think, with a traditional commercial database.

In software development methodologies and practices, the debate is always: is the application bound to the infrastructure, or do you decouple it? Now we're hearing the conversation about where the data fits. Is the data part of the application? So there are different architectures emerging. With the emergence of, say, flash technology and SSDs, you now have really interesting memory capabilities that are really fast. So what's happening relative to that world, the app world? Is it just going to be more complicated? Is it going to be decoupled from the data?

I think we're learning as we go. What I observe right now is that the more of the management of data that you foist onto the application developer, the smarter your app developer has to be, the higher the bar is, the more brains that person has to have, and the narrower the range of applications that are feasible to do.
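The contrast drawn here, between computing file offsets by hand and getting keyed rows from the platform, can be illustrated with a small toy sketch in Python. This is not HBase's API and says nothing about its internals: `RECORD`, `write_record`, `read_record`, and `ToyTable` are hypothetical names made up for illustration, and the only assumption borrowed from HBase is its general data model of rows kept sorted by key, each holding named columns.

```python
import bisect
import struct

# Approach 1: a raw byte buffer standing in for a file. The application
# must invent its own record layout and compute every byte offset itself.
RECORD = struct.Struct("<16s32s")  # hypothetical layout: 16-byte id, 32-byte email


def write_record(buf: bytearray, index: int, user_id: str, email: str) -> None:
    """Write one fixed-width record into slot `index`, growing the buffer if needed."""
    offset = index * RECORD.size
    end = offset + RECORD.size
    if len(buf) < end:
        buf.extend(b"\x00" * (end - len(buf)))
    # struct pads short values with zero bytes and silently truncates long ones.
    RECORD.pack_into(buf, offset, user_id.encode(), email.encode())


def read_record(buf: bytearray, index: int) -> tuple:
    """Read slot `index` back and strip the zero padding."""
    raw_id, raw_email = RECORD.unpack_from(buf, index * RECORD.size)
    return raw_id.rstrip(b"\x00").decode(), raw_email.rstrip(b"\x00").decode()


# Approach 2: a toy keyed table. Rows stay sorted by key, so the caller gets
# random reads, in-place updates, and range scans without touching offsets.
class ToyTable:
    def __init__(self) -> None:
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {column name: value}

    def put(self, row: str, column: str, value: str) -> None:
        """Insert or overwrite a single cell."""
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row: str) -> dict:
        """Return a copy of all columns for one row (empty if the row is absent)."""
        return dict(self._rows.get(row, {}))

    def scan(self, start: str, stop: str) -> list:
        """Return (key, columns) pairs for keys in [start, stop), in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, dict(self._rows[k])) for k in self._keys[lo:hi]]


table = ToyTable()
table.put("user#2", "info:email", "b@example.org")
table.put("user#1", "info:email", "a@example.org")
table.put("user#1", "info:email", "a2@example.org")  # update in place
```

In the first approach every schema change means revisiting the offset arithmetic in application code; in the second, the lookup and ordering logic lives in one reusable layer, which is the kind of work the speaker describes the platform taking over.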
So like I said, there's only a handful of applications where someone can realistically figure out how to overlay all the application concepts they need on a file system. There's a larger set that seems to be able to work on top of HBase, and then you find other vendors, WibiData is one of the sponsors of this conference, that are actually adding another layer still, which is flexible schema management on top of HBase. And I imagine that will open up the platform to an even larger population. So I think each time, as the platform takes on more and more of this kind of work, the bar for skill is lowered, and more developers are able to build interesting applications.

We're here with Michelle Bailey, who's our data scientist, and she's doing a startup with us around HBase. We've found HBase to be a very, very strong candidate for some of the data work in real-time analytics, but Cloudera Manager saved us months of time, so we use Cloudera Manager; full disclosure there. So you guys take advantage of this from a business-to-product perspective. You've always had that open-source contribution with Cloudera, and then you differentiate with the products. So is that continuing to be the strategy for the products on the Cloudera side now? How do you continue to enable the growth of HBase on a community basis and, at the same time, continue to differentiate your product?

Yes, so a couple of things about that. What we've always believed is that we're a platform company, and we've always believed that the platform needs to be open source. So from the day that we were founded until now and forevermore, Cloudera's distribution is an open-source artifact. We think that most of the great platform technologies of the past 15 years have been open source, so it's just smart business. And then I would actually argue that even our open-source distribution is in and of itself differentiated.
We think it is, in terms of the feature set, the performance properties, the security properties. If you contrast that with the alternatives in the market today, it compares very well. I think the only place where it's not differentiated is that we now have lots of organizations that are embedding CDH, OEMing it into other solutions, whether that be Oracle, or Dell, or NetApp, or SGI. And they all have their own differentiation as companies, which they layer on.

Yeah, and Dell actually stopped by. I just saw Craig Irons, who runs the hyperscale group out here. He's going to come by later and stop by the Cube, but they just announced a laptop that's going to be developer-focused, a Linux laptop. Very cool. He's going to bring that. But these are the other vendors. They come out of hardware, so they have differentiated software. So how does that work for you guys? Do you say, take our product? Are you doing any integration with the hardware guys?

Sure.

Or are you just going to use them as commodity?

No, no, no. I think every one of those guys, if they were here talking to you now, would have a very cogent explanation for how they're differentiated in their own way. If you look at someone like Dell and the work they do with Crowbar, if you look at Oracle and what they're doing with engineered systems, not to mention integration with the rest of the Oracle stack, if you look at NetApp and some of the ways they're trying to make it easier to save on power, space, and cooling with the way they've architected the storage aspect of Hadoop, every one of them has had some aspect of innovation to add to the exercise. And that's always what we hope for with these kinds of relationships. So no, we're not interested in thinking of these as commodities.

Yeah, I think they have to do their work to step up their game a little bit. Let's talk about some of the companies here. So WibiData was up there.

Sure.
I've interviewed Aaron, all those Cloudera guys. As you guys go public, there'll be a Cloudera mafia, like they talk about the PayPal mafia now, and the Facebook mafia. Hopefully you guys will be that successful; I think you will. But Christophe spun out WibiData. What is their value proposition? Are they just a set of libraries on top of Hadoop? What's their story? How are they positioning themselves in this ecosystem?

Well, obviously Christophe is going to do a better job of giving his own company's elevator pitch. But what I perceive makes WibiData interesting is that it's giving you that next higher level of abstraction on top of HBase. It's giving developers and users flexible schema management tools on top of HBase, which otherwise looks like a big distributed hash table. So I think that's going to open up HBase to more and more kinds of applications. And that's really the main value prop. Their first repeatable use case...

They're an application layer?

They're like an application developer.

Like Spring?

Yeah, exactly. You could say a little bit like that. Like some aspects of Spring. Imagine carrying some aspects of Spring over to the HBase world, right? And that's roughly where they fit in the stack. But the first repeatable use case that they've had some traction with is around recommendation engines. Recommendation engines have been a really popular use case for a while with Hadoop: retailers, ad serving, content recommendations, what have you. And what they're doing is saying, well, what if I could give you a semi-packaged framework for building these kinds of recommendation engines, so you don't just start with a bunch of piece parts of technology? And what if you could do real-time recommendations as well as batch?

So what are some of the exciting things that you're hearing so far in the hallways? You obviously gave a keynote.
You heard the keynotes from Facebook and folks out there. Some of them are guy. What are the exciting startups and companies and conversations in the hallway? What do you hear?

Yeah, so I think that what I'm seeing right now is that people are much like where Hadoop was two years ago. People are going past that first or second use case and starting to think about how they can make HBase a standardized platform in the organization. People are going from maybe one interactive read-write workload to maybe a time-series-oriented workload, maybe a user-facing workload, maybe a new kind of MapReduce workload. So I think it's just about diversifying the number of use cases that people are applying HBase to in the enterprise.

Charles, final parting question. What do you think about this year as the application tsunami comes on? There's a lot of work to do. What do you think is going to happen this year? What are the areas that you want to share with folks out there, the areas to develop on? As people start prototyping, obviously there are a lot of hackathons going on these days. People are just playing with data. They're partying with data. They're hacking around. Give some guidance. Give some navigation. What's going to happen this year, and what areas should people start really jamming on to start pounding some products out there?

Yes, so a couple of things I'd say. One is that right now anything that makes development against this new application platform easier is going to be really popular. You've heard people talk about things like, where is the ORM for HBase? Where are some of the classic tools that you have at your disposal in the database world? We need all those to exist in HBase land. So it'd be great to see more of those appear.
In terms of enterprise applications, my sense is that there's a whole class of applications that work with semi-structured and unstructured data where it used to be that people built up all the infrastructure to run those applications from scratch. No one needs to do that anymore. You can wind up re-platforming those on the Hadoop stack and save yourself a lot of time and trouble.

And of course performance and capabilities: increase performance, increase capabilities. You can do good by that.

Yes, so we're going to try to do our part on the performance side, and on all the classic platform things. We want to make it more secure, more reliable, more recoverable, more stable. It's a long list, but our goal as a platform company is to some extent to be boring. We want to be this really stable, sturdy, predictable, open thing. So we're going to try to do as much of that work as we can, and we want other people to do a lot of the offbeat, harebrained stuff. Because in our role as a platform company, "offbeat," "harebrained," and "platform" are not really things that are supposed to be said in the same sentence.

Charles Zedlewski, the VP of product at Cloudera. We've had a lot of action here in the hallways at the HBase conference. Packed house, sold out. We'll be right back after this break. Thank you.