Okay, we're back in San Francisco at the HBase conference. This is theCUBE, our flagship telecast. We go out to the events and find the stories. Today we're also covering EMC World in Las Vegas on channel one, and this is our first channel two. We're here in San Francisco for the inaugural, historic, first-ever HBase conference, where the alpha leaders and geeks are talking about the innovations around HBase on top of Hadoop. I'm here with Christophe Bisciglia, who's the co-founder of WibiData. He's also the co-founder of Cloudera, the pioneer in the space, and you worked at Google before that. Christophe, you're a player in the space. You're known, a celebrity I guess, in what were small circles, and it's now rising up to be a massive trend. Welcome to theCUBE. We had Aaron on at Hadoop World. Thank you. You're quite the character, and you guys have a great reputation. You're funded now, with some seed angel funding and some early venture funding, and WibiData is really being talked about. You gave the keynote speech on stage today, kicking off this technical conference, which is sold out with huge demand. Tell us what is going on with HBase. Why is HBase so important, and why is it so important right now? So I think there are a couple of things contributing to this. Hadoop has been around for a few years now, and it's enabled us to do a lot of new things that are great. But a lot of these have been fairly offline, batch-oriented workloads. What HBase is doing is providing a bridge that allows us to build and develop real-time applications in a way that simply wasn't possible before, when you were looking at HDFS and the batch-oriented MapReduce processing that runs over it. So it's an exciting time: we can take advantage of many of the opportunities and technical advantages that systems like Hadoop bring to the table, but have a bridge for real-time interaction in a way that is really necessary for these large-scale applications.
So the phenomenon in today's world: if we go back to 2007, the interest in the iPhone created this whole notion of mobile, and cloud started to grow behind that. But now everyone wants real-time data. There have been databases out there, and people in New York City say, hey, we've been doing big data for a long time; it's called financial trading systems. But that's an old, legacy kind of mindset. Today the world's a little bit different, right? The methodologies are changing, the heuristics are changing, the outcomes are changing. What's different about the environment today relative to those legacy environments, whose people think they may have something relevant? Or, quite frankly, they might have to retool. There's a lot going on there. So what's your view on that whole old-school market? Yeah, so with the caveat that these old-school folks have been doing big data since maybe myself, my co-founder, and some of our engineers were in diapers, what I think is really fundamentally different is this. When you design a traditional application built on top of a relational system, going through the process of designing the schema up front is a big part of the process, and it's really putting a stake in the ground about what the application is gonna do and what its capabilities are. Updating and changing that is often really challenging. To turn that upside down, when you build applications on top of Hadoop and HBase, you almost assume from the beginning that the application requirements, the data sources, and the way you consume and export the data are gonna evolve over time. Systems like Hadoop give you the flexibility to go beyond a traditional, rigid schema-modeling system to one that is flexible and diverse and can evolve as you start to do more things with the data.
For example, if you look at systems like recommendation engines, it's not enough to just take all the data you have, develop profiles, and develop recommendations. To really make this compelling, you need to track how all those recommendations perform and then incorporate that performance back into your models, so you really get an iterative loop. That's creating a whole new class of responsibilities for people, and, forgive me if this is an overused and ambiguous word, this emerging class of data scientists really has a role now in application development, whether that's guiding the technology or guiding the roadmap. That is very new and exciting. So the application development market, and applications in general, have changed over the years. We are actually broadcasting live at EMC World, and they're talking about applications. We were at SAP Sapphire yesterday, and they're all talking about mobile, analytics, and applications. The whole consumerization trend is changing that world, so the need to build apps faster is more and more critical. I don't think there's any debate about that. The issue now is the role of data: proprietary data within the application, and external data that you might want to bring into that application, using it or not at any given time. That's a fundamental philosophy. How do you see that world evolving? Do you see more of that, less of that, more purpose-built applications? I think that organizations or applications that don't inherently acknowledge data as one of their core assets and features are gonna be at a disadvantage. One of the examples I use a lot is that, as consumers in this digitally connected age, we have a fire hose of information coming at us all the time.
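The iterative loop described here (serve recommendations, track how they perform, fold that performance back in) can be sketched in Python. This is a minimal sketch assuming a simple epsilon-greedy strategy over per-algorithm click-through rates; the class name, the epsilon value, and the simulated click rates are all hypothetical and are not WibiData's method.

```python
import random
from collections import defaultdict


class FeedbackLoop:
    """Hypothetical sketch: serve recommendations from several algorithms and
    shift traffic toward whichever one users actually click on."""

    def __init__(self, algorithms, epsilon=0.1, seed=0):
        self.algorithms = list(algorithms)
        self.epsilon = epsilon            # fraction of requests used for exploration
        self.rng = random.Random(seed)    # seeded so runs are reproducible
        self.shown = defaultdict(int)     # times each algorithm's pick was served
        self.clicked = defaultdict(int)   # times that pick was clicked

    def ctr(self, algo):
        # Never-served algorithms get an optimistic score so each is tried once.
        if self.shown[algo] == 0:
            return 1.0
        return self.clicked[algo] / self.shown[algo]

    def choose(self):
        # Explore with probability epsilon; otherwise exploit the best observed CTR.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.algorithms)
        return max(self.algorithms, key=self.ctr)

    def record(self, algo, was_clicked):
        # Fold the observed outcome back into the statistics choose() relies on.
        self.shown[algo] += 1
        if was_clicked:
            self.clicked[algo] += 1


# Simulated traffic; these click probabilities are invented for illustration.
true_rates = {"A": 0.02, "B": 0.05, "C": 0.03}
loop = FeedbackLoop(true_rates)
for _ in range(10_000):
    algo = loop.choose()
    loop.record(algo, loop.rng.random() < true_rates[algo])
```

Epsilon-greedy is just one simple way to close the loop; the point is that every served recommendation produces an outcome that changes what gets served next.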
If I tried to read every tweet from everyone I'm following, or look at every status update, I would never get anything done. So what's really gonna distinguish the applications I enjoy using are the ones that get to know me. They use the data they have about the way I've interacted, maybe on other services, maybe with their own service in the past, and use it to deliver a much more personalized and relevant experience. Why I'm so excited about this is that systems like HBase really allow us to do this effectively on Hadoop. If I rewind back to our days at Google, we'd been writing MapReduces and doing things with distributed file systems since even before Bigtable existed. What was so cool at Google is that when Bigtable came out, that was when we went beyond use cases like "let's download the entire internet and index it." That works really well with just a distributed file system and MapReduce. But then we started asking things like, okay, how do we give me and you different search results based on how we've searched in the past? We couldn't even really fathom that until we had Bigtable. And it's also the computation model. Bigtable, or the analogous HBase, allows us to address information in much more manageable chunks and access it quickly.
So you can have one row for every single user or consumer of your product, and then you can have thousands of columns representing facts you've observed about them, queries they've given, searches they've done on your site, profiles you've derived from them, and recommendations you've made for them using algorithms A, B, and C. Then you can serve those differently and store which one they clicked on, so you can see which algorithm is performing better, and you get these living, breathing applications that fundamentally evolve over time. It's really exciting. We used to talk about activity streams, streaming of data. No pun intended, but it's a fluid environment relative to the data: you not only have massive volume and velocity of data coming in, the corpus itself keeps changing as new data arrives. Folks I talk to in the CS world are constantly excited by the challenge of running data-mining-style algorithms in real time while handling a massive ingestion of data at the same time. In a way, Karteck was talking about that today in his speech: hey, we have replication, but stuff's still changing so fast. HBase seems to be a good fit for these kinds of environments. Do you agree with that statement, and can you add some color if you do? I definitely think HBase is going to allow us to build applications with a real-time interaction model in a way that we fundamentally couldn't before. And I also agree with you that when you look at some of the more exciting things HBase enables, it's the application of machine learning models, and that's a really, really broad term that could mean a lot of different things to a lot of different people.
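To make the one-row-per-user layout concrete, here is a minimal Python sketch, assuming an in-memory dictionary stands in for an HBase table. The class and the column names (`info:city`, `rec:algo_A`, `click:algo_B`) are hypothetical; a real application would go through an HBase client rather than this class.

```python
from collections import defaultdict


class WideRowTable:
    """Hypothetical in-memory stand-in for an HBase-style table:
    row key -> {"family:qualifier": value}.

    Real HBase also versions each cell by timestamp, keeps rows sorted by key,
    and requires column families (the part before ':') to be declared when the
    table is created; only the qualifiers after ':' are free-form.
    """

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # A new qualifier simply appears in the row; no schema migration needed.
        self.rows[row_key][column] = value

    def get_row(self, row_key):
        # "Everything we know about this one user" is a single keyed lookup.
        return dict(self.rows.get(row_key, {}))

    def get_family(self, row_key, family):
        # Restrict the lookup to one column family, e.g. just the recommendations.
        prefix = family + ":"
        return {col: val for col, val in self.rows.get(row_key, {}).items()
                if col.startswith(prefix)}


users = WideRowTable()
users.put("user123", "info:city", "San Francisco")
users.put("user123", "search:q1", "hbase conference")
users.put("user123", "rec:algo_A", "item42")    # what algorithm A recommended
users.put("user123", "rec:algo_B", "item17")    # what algorithm B recommended
users.put("user123", "click:algo_B", "item17")  # the user clicked B's suggestion
```

Comparing the `rec` and `click` families across many user rows is what lets you see which algorithm is performing better, and adding a new algorithm's column requires no up-front schema change.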
But it very often takes the form of experimentation, personalization, targeting more relevant content or ads, and systems like this. That is the bread and butter of the kind of stuff we did at Google with it, and it is very much in line with what we see thought leaders doing with HBase today. So I'm really excited and very optimistic. When you look at the big users of and investors in Hadoop, and I just want to echo something Mike said earlier, many, many of them are using HBase. The biggest investments in Hadoop are also bringing HBase along, and that's exciting. Is that because it's a clean sheet of paper for them, or because it's a better use case than some other approaches? I think there are a couple of reasons. One, a highly distributed, large-scale key-value store is a necessary piece of infrastructure for this next generation of applications. Whether you use HBase or Mongo or Riak or Cassandra, insert your favorite distributed key-value store here, what makes HBase unique is its very deep integration with Hadoop. It's not just HBase: you get MapReduce, you get HDFS, you get Hive, you get Pig; you get this whole ecosystem around it. And it is, to be honest, the one that most closely mimics the data model Google developed with Bigtable, so at least we have more data on how that story goes over time. A lot of times I see new things come out in the HBase ecosystem and I think, oh, I remember when we got that. Therefore I can make a guess about what the next two or three things we're going to get are, and it tracks pretty well. And the community is solid too. You have some real high-end examples, Facebook among others, large-scale, web-scale companies using it, and the community is adding new innovation. And just a shout out to Aaron Kimball, who was on theCUBE at Hadoop World last year.
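Part of what makes HBase a large-scale key-value store is that each table is split into regions, each holding a contiguous, sorted range of row keys and served by one region server at a time. This Python sketch shows only the lookup step, assuming string keys (HBase actually compares row keys as raw bytes) and made-up server names; it is not HBase's client code.

```python
import bisect


class RegionMap:
    """Hypothetical sketch of range partitioning: each region owns a contiguous,
    sorted range of row keys beginning at its start key."""

    def __init__(self, regions):
        # regions: iterable of (start_key, server); one region must start at "".
        ordered = sorted(regions)
        self.start_keys = [start for start, _ in ordered]
        self.servers = [server for _, server in ordered]
        if not self.start_keys or self.start_keys[0] != "":
            raise ValueError("the first region must start at the empty key")

    def locate(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        idx = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.servers[idx]


regions = RegionMap([("", "server1"), ("g", "server2"), ("p", "server3")])
```

Because keys are sorted, rows that share a prefix, such as one user's ID, sit next to each other and can be read with a single range scan. The flip side is that monotonically increasing keys, like raw timestamps, send all new writes to the last region.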
He was on, and it was fantastic. We talked a lot about that, and at the time you guys got a lot of good props from the community for doing hard work on top of HBase, libraries, et cetera. So we were talking about that application developer model, where it's kind of hard. It's not trivial to slap on an API; I've got to program a job, I've got to do a lot of things in there; it's complex. So are you feeling that now, where developers are working with you or you're sharing some of your platform? Is that continuing to be the case, and what's the update on that? Well, I don't know if I have a specific customer update to share. What I will say is... But the value of your... you're sharing your code? Yeah, we license our technology to our early customers. I would describe it as a closed beta right now, because we're still learning a lot. We've gained a lot of perspective through these experiences on what it means to build a really solid application framework on top of HBase, and we've really focused ourselves on people looking at personalization and recommendation-style use cases, because they have a lot of common technical themes. That allows us to develop technology that provides real value, that is scalable, and that is appropriate to the size of team we are right now. We don't want to come in and say, okay, we're going to turn the entire world upside down all at once with a dozen people. No, we want an iterative process where we understand the real challenges customers are facing and develop technology to address them. Yeah, and that's an opportunity. I really think that's a good strategy, and the people I talk to tell me, hey, I want it to be as easy to program as possible, without having to be a DevOps guru doing all the internal tweaks.
That brings up my next question, about DevOps. We had Mr. Gray on earlier, Jonathan, talking about DevOps, obviously at Facebook. You have an application track here at the HBase conference and you have an operations track. So really that's ops, right? So, DevOps. The whole fact that it's called DevOps now is kind of interesting. It used to be that you had your sysadmins and you had your developers, but when you look at systems like Hadoop, you really need either operators with some development experience or developers with some operations experience. Anyway, that's what I think. Well, that's consistent with what we're seeing in large enterprises. We've been doing a lot of coverage on SiliconANGLE around the consumerization of IT, where cloud is essentially a code word for outsourcing for these large companies. What they're doing is going into hybrid cloud environments, which are essentially data center extensions of their existing proprietary infrastructure, and that is actually outsourced. So they have to deal with ops being in someone else's hands, and there's some application development involved. It's a really growing area, and it's interesting because in the DevOps mindset, the developer's mindset is driving operations. Yet you've got another class of people who say no, it's OpsDev, the other way around. So there's always been that conflict. What else can you share with folks out there about WibiData right now? Can you talk a little bit about the company, where you're at, what types of engagements you're looking for, and what you're working on? Sure. What we're focused on right now is helping people compress the time it takes to build a personalization- or recommendation-style application on top of Hadoop.
Right now, if you were to start with raw Hadoop and HBase and look at the best-of-breed applications here, we're talking about development cycles that range between nine and 18 months, depending on what your team looks like. What we're doing for our customers right now is dramatically compressing that. We're saying, okay, if what you're trying to do is build something like a recommendation engine, we might get you 90% of the way there to start. Now, the reality is you're gonna know things about your data and have domain expertise that it would be silly of us to presume we can bring to the table. You've been working with your business and your data; you're gonna know more about your problem than we ever will. But turning that around, we're very lucky to have a team of people with a lot of experience building recommendation engines. Our first engineer was the tech lead for Google's personalization and recommendation team. My co-founder, Aaron Kimball, was Cloudera's very first engineer. We've got a unique perspective on the operational realities of doing this. And you're the co-founder of Cloudera. Yeah, but I don't know why anyone keeps me around. I feel very privileged to have an incredible technical team to back me up. But really, it's about learning what we can do to make developing applications on top of Hadoop and HBase more like developing applications on top of traditional application development frameworks, while being cognizant of the fact that we can't do things exactly the same way, but we also don't wanna throw the baby out with the bathwater.
There are many really good paradigms and best practices in existing application development frameworks that might use relational databases as backend stores. So how do we take what makes sense from that, combine it with what we've learned in our past lives doing this at huge scale, look at what the enterprise is currently getting out of systems like Hadoop and HBase and what they need to get out of them going forward, and try really hard to come up with something sane in the middle? Well, you're being a little humble. I think you have a lot of chops, and that perspective is needed out there right now. Michelle Bailey, who's our data scientist, and I were talking about what's going on in the organic community of HBase and Hadoop around entrepreneurship and the new startups coming out. You're now one of them, and growing fast, compared to the legacy big guys out there, IBM, HP, and the EMCs, who don't really have that product built yet and are just putting their toe in the water. They're wrapping a lot of messaging around it, but in the HBase and Hadoop community, performance and capability is job one. Everyone's pushing the envelope on performance and capability. So the community's growing, and you're a big figurehead in it. The question is, what would you share with entrepreneurs and developers out there who want to get into the community and do some good work? In what areas could they really turn up the heat and add more performance, more capabilities? Whether it's an app, infrastructure, or DevOps, can you share your perspective on that? Yeah, I think it's easier to talk about it in terms of: if someone wants to contribute to HBase, what is the best way to do that?
Whether you can build a viable business around that is a separate question, so let me address the first one. Some of the places where it would be really awesome to have increased functionality are things like a generalizable secondary indexing system. A lot of the difference between a system like Hadoop and HBase and a relational database system is that we don't really have indexes. We can't do queries like "show me all users who exhibited this trait"; that's something you need an index for. Now, you can say "show me everything I know about this one user," and that's really easy. That works really well for a lot of workloads. So that's one point. On the operability side, I would really encourage developers to do a better job of exposing internal metrics from HBase, so that existing management and alerting technology can make sense of them, whether that's things like Cloudera Enterprise or the other enterprise solutions from Hadoop vendors, or products for the big boys like IBM and EMC that wanna hook into things like Tivoli, so you can manage HBase more like you would anything else. And I would say auto-tuning would be really helpful as well: automatically setting region sizes, automatically managing read and write caches, doing more sane things around compaction and flushing. Yeah, I could go on, but there's a... Point is, there's a ton of room to innovate. Entrepreneurs out there, if you're contributing to the community, just go play there. But if you wanna go do something, kick the tires and see if you can make a business. There's plenty of automation, configuration management, things like that. Yeah, and ultimately, the other thing that's undeniable about what we're seeing here today is that HBase is maturing as a platform. I'm cautious with that word; I'm choosing "maturing" somewhat purposefully, in that it's a great platform today for the people who are using it.
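The difference between the two queries contrasted here can be sketched in Python, assuming in-memory dictionaries stand in for two HBase tables: a primary table keyed by user and a hand-maintained index keyed by trait. The class and trait names are hypothetical. The application has to update the index on every write, and since HBase only guarantees atomicity within a single row, a real implementation must also cope with the primary write and the index write getting out of sync, which is part of why a general-purpose secondary indexing system is hard to build.

```python
from collections import defaultdict


class IndexedUserStore:
    """Hypothetical sketch of an application-maintained secondary index:
    a primary table keyed by user plus an index table keyed by (trait, value)."""

    def __init__(self):
        self.users = defaultdict(dict)    # primary: user -> {trait: value}
        self.by_trait = defaultdict(set)  # index: (trait, value) -> {user, ...}

    def put(self, user, trait, value):
        old = self.users[user].get(trait)
        if old is not None:
            # Drop the stale index entry so the index stays consistent.
            self.by_trait[(trait, old)].discard(user)
        self.users[user][trait] = value
        self.by_trait[(trait, value)].add(user)

    def get_user(self, user):
        # "Everything I know about this one user": one keyed lookup.
        return dict(self.users.get(user, {}))

    def users_with(self, trait, value):
        # "All users who exhibited this trait": one index lookup.
        return sorted(self.by_trait.get((trait, value), set()))

    def users_with_scan(self, trait, value):
        # The same query without an index has to touch every row.
        return sorted(u for u, row in self.users.items() if row.get(trait) == value)
```

Both query methods return the same answer; the index trades extra work on every write for a lookup that no longer grows with the number of users.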
And I think that with further investment, it's gonna be a better platform for a wider group of people. As adoption penetrates, it's gonna become easier and easier to develop holistic applications that offer end-to-end solutions and make more assumptions about Hadoop and HBase being in the enterprise. Today, what, maybe there are hundreds of enterprises that have Hadoop and HBase stood up. In a year or three, that number will probably be thousands, and it won't be long after that before it's tens of thousands. So I feel reasonably confident, as an entrepreneur, betting on this ecosystem continuing to expand. Now, if that turns out to be a bad bet, well, we'll see. But it's... It's looking good. You know, when we decided to start WibiData, one of the things we had to decide was whether we wanted to double down on this Hadoop thing or diversify and maybe go do something completely different. And we doubled down, because it's hard to see anything else. It's a resource issue too, right? You've got brain capacity, people constraints, and dollars, investment and all that stuff. We're here with Christophe, the co-founder of WibiData and co-founder of Cloudera, who worked at Google on some really seminal work in the big data space. Great to have you on theCUBE on SiliconANGLE.tv. We're expanding our coverage with more writing. We've done the original Hadoop World, Hadoop World 1 and 2, and Strata, and Hadoop Summit is coming up; hope to see you guys there. My final parting question to you: look out over the next couple of years, shoot the arrow forward in your mind's eye. We're not going to hold you to anything; it's just your personal perspective. How is this going to evolve, in your mind? If the stars line up and things go the way they're going, what's going to happen in this industry in the next three to five years? Take a shot.
I really look forward to a world where people with a diverse set of capabilities are all engaging and collaborating around the same data. Right now, a lot of the people interacting with this data are Java programmers or things like that. What gets me really excited is: sure, you can use Java to interact with the data, or you could use Python, or R, or Excel, and all of this is collaborating around the same data assets. As we look at systems like HBase that give us real-time access to small subsets of data, and as we think harder about what it means to model and represent that data in a way that is flexible and can evolve over time, I think a huge opportunity exists in enabling people who maybe aren't programmers to sit down with a programmer and say, hey, look what I learned. That gets really exciting. It's a whole other application: data science, et cetera. We're here inside theCUBE at the HBase Conference, the inaugural HBase Conference in San Francisco. I'm John Furrier. We'll be right back with our next guest after this break.