Live from San Jose, California, in the heart of Silicon Valley, it's theCUBE. Covering Hadoop Summit 2016, brought to you by Hortonworks. Here's your host, George Gilbert.

Good afternoon. This is George Gilbert. We're back at Hadoop Summit in the San Jose Convention Center. We are very pleased to have with us George Chow, who's CTO of Simba. And I am privileged to call him one of my guru friends because he has such perspective on so many things going on in anything related to databases, Hadoop, and for that matter, data. Anyway, George, so good to have you. Let's talk about SQL on Hadoop. You were sort of present at the creation, I think it was four years ago at Strata in New York, where the Impala database was first introduced. Tell us how the conversation about managing data shifted after that.

Yes. For those of you who remember Cloudera back in 2012, Impala was announced at, I think, the fall Strata. Those in the know obviously were aware that there was Impala, but I think the announcement of Impala to the marketplace was very important because it demonstrated that SQL was very relevant. What I saw was that from that moment, with the shift into 2013, the whole marketplace was much more open to SQL. Basically, after the announcement of Impala, it seemed like the entire marketplace decided that SQL was good again.

You mean because for several years before that we'd been on this NoSQL, I don't want to call it a tangent, but that was the way we were going. At least for scale, we were sort of retreating from SQL.

Yes, for a number of years. I mean, with NoSQL's rise, with HBase, with Hadoop, a lot of pundits and a lot of vendors were maybe just thinking that we were over that. And for a while, when we were trying to sell the SDK technology and trying to interest people in building connectors, what we found was that many people were just saying, not needed.
And so the launch of Cloudera's Impala was a great thing because it turned the message back onto SQL.

So Oracle's used to telling everyone that it takes decades to build a highly performant, tuned, and stable database, and for the most part, it's been true between them, IBM, and Microsoft, with Postgres now coming along. What changed, that we have so many, at least, MPP decision support databases, first on their own and now on Hadoop?

I think what's changed really is the interest in doing it in the open source model, I would say. Because if you look at it, when you go down the list of all the various engines that we have today, whether you're talking about Hive, Impala, Drill, Phoenix, you can go down the list. The one hallmark that everyone is going for is that they want to build these in the open so that it's available and it's an open core tech that everybody has access to, whether you're a vendor, whether you're a customer, whether you're a partner.

But if doing a really good query optimizer takes 10 years, is it the fact that all of these vendors have access to a common pool of intellectual property? Is that sort of why the innovation has accelerated?

Oh, I think it's just a matter of time. I don't recall whether anybody's done any measurement of the pace of innovation in the open versus in the proprietary marketplace, but I think the number of decades to get an optimizer right just reflects literally the amount of time that it takes to build something, get it out into production, and test and validate it. And in some sense, I don't see that as significantly different between the proprietary and the open source marketplace. I don't recall a technology where there's been that A/B comparison, where people have done it once in the proprietary and done it once in the open source and were able to measure and say, hey, we did it faster here by whatever, X percent.
So, okay, starting in, I guess, 2012, we saw an acceleration of these decision support SQL engines, and it sort of revalidated that SQL was goodness for getting at data. But now we've seen a resurgence in transactional databases built on SQL that also scale out. So it's like we sort of took a detour and now we're back to the very heart of demanding systems. Why did that happen?

I think it's pretty obvious that at the end of the day, whether your enterprise is large or small, you need that transactional system. And in this era, you want it to be a good scale-out technology that you can use and count on. The failing, I think, of the last generation was that at a certain point you actually had to give up, turn around, and build an analytical system. You had to build a secondary system that you used for analytical purposes. Today, I think what everyone's trying to do is to marry the two workloads and say, you don't have to operate two systems. You can actually run one system which suffices for your day-to-day transactional work as well as being able to deliver analytics.

And who is farthest along on doing that, to marry the best of both? Because my impression is it's easier to do the scale-out decision support analytical workloads than it is to do the transactional, and then to actually combine them is even harder. So who's farthest along and why are we making such fast progress?

Well, it's actually hard to say who's fastest right now. The ones that stick in my mind as really taking this tack, I think I would count Spanner as one of them, Presto as another.

Spanner from Google, Presto from Facebook.

Facebook, Teradata.

Yeah, Teradata.

And then the one that comes to mind, because I was just talking to John, is Splice Machine.
These are the three that I think have come out very clearly to declare that transactional and analytical is the target.

Okay. I guess, if we come back in three years, what do you think the database market's going to look like?

It's going to look very, very interesting. I think that converged workload is still relatively early. If you take a look at it, none of them, if you want to count those three, let's say, have really made it big in terms of a lot of mindshare and a lot of customers they can talk about. I mean, even in the case of Spanner, Google hasn't made much of it in terms of...

Opening it up.

Yeah, opening it up just yet. And certainly we haven't seen much, so...

All right, so, well, that's a placeholder for our next conversation on theCUBE, to do a benchmark and see how much we've moved. A lot has transpired in the last few years.

Yes, it's going to be exciting in the next couple of years, so...

All right, and you'll be ringside with us in evaluating how we're doing. This is George Gilbert. We're at Hadoop Summit in San Jose at the convention center. We'll be back shortly after this. Thanks.