Live from the Wigwam in Phoenix, Arizona, it's theCUBE, covering Data Platforms 2017, brought to you by Qubole. Hey, welcome back everybody. We're at the historic Wigwam just outside of Phoenix, Arizona, 99 years young, and we're happy to be here. It's 108 degrees outside, so we're happy we're inside as well. We're at Data Platforms 2017, joined all day by George Gilbert, and we're excited to have the two co-founders of Qubole, who are sponsoring this event: Ashish Thusoo, co-founder and CEO, and Joydeep Sen Sarma, co-founder and head of Qubole India. So gentlemen, welcome. Thank you, glad to be here. Pretty good turnout, in the middle of the busiest conference season of the year, to get a full house. I think there are 200-some-odd people here listening to the way you guys have approached the big data problem, which is very, very different from what we've seen in the past. Yeah, this is a great conference. As David said, we wanted to have the hottest conference on the planet; that's why we are in Arizona. But the turnout has been great, the engagement has been great, and I've met a bunch of people here who have come and said, hey, you are opening up our eyes to a very different perspective on what big data in the cloud means, as opposed to what we're used to. So it's all great. So it's interesting to sum up your guys' vision: it's DataOps, kind of a new term. I haven't heard it before, maybe it's been around, but to really have this data-centric idea. You guys both come from Facebook, which had a data-centric model, thinking of data not as a service provided to a few groups, but really as an enabler for everyone in the company to innovate and transform the company. Very different strategy. Yeah, we are very different, and I think once you start embracing that type of architecture for your enterprise, a lot of things happen from there.
Essentially, we discovered it at Facebook, and we continue to discover it at Qubole for a lot of our clients. Very often our clients say, hey, we deployed one particular project, but since we opened up data to so many users, there are so many other innovations that happened. Right, serendipitously, because data was available to a lot of people. And then the other thing that you guys have done that's different is that right out of the gate, you're cloud native. So it's pretty interesting: you've got Amazon here, you've got Google Cloud here, you've got Azure here, and you even have Oracle here talking about building a new data center in Seattle, a cloud in Seattle, which is where all clouds live. So being cloud first, and now that hindsight's 20/20, enterprises are jumping into public cloud with both feet, which really positioned you in a unique way to provide a data solution that's native to the cloud, not just cloud-washed. I think one of the key differences for us as a company is that we really have expertise in both big data and cloud. If you look at the spectrum of vendors and providers out there, some of them are cloud specialists, some of them are big data specialists, and I think we stand out in the market as being specialists in both. And that allowed us to dive into projects like Hadoop and Spark and say, how can we make these projects run better in the cloud? We have done that from day one; we've gone ahead, made changes, and offered unique functionality that's embedded deep within these open source software projects. And that's the secret sauce that we have, right? That's what really makes things simple, scalable, easy to use, and provides a lot of value to customers.
So we were talking earlier about how, about a year ago when we were at the Hortonworks data summit, we got the sense that mainstream enterprises, or at least the large ones, were throwing in the towel on trying to do Hadoop or big data on-prem. Now, you guys targeted the cloud, so you had programmable infrastructure, and you had big data services, not software. Tell us: for someone like Cloudera, who could say we had a many-years head start and a big R&D team, what are the constraints on them in moving to the cloud and offering this programmable, elastic service? Yeah, there are a lot of constraints. These are the constraints that typically face any incumbent in an industry when a disruptive change happens in that industry. For our generation, the disruptive change is cloud computing versus data center computing. The way you used to build big data platforms, we built big data platforms on-prem at Facebook for a long time. Then we looked at the cloud world, and it looked very, very different. All the automation that you can do there is completely different. So re-architecting those platforms for the cloud is a very, very hard thing to do; especially once you are invested in a particular stack, there is DNA that resists change in that stack. So that is one. The second thing is that in the cloud, the approach has been SaaS from day one, and SaaS and automation are essentially the approaches they now have to retool around. And the third thing is that in the cloud, the way you sell software is also very different. There's a business model change that happens in going from fixed subscription prices and such to much more of a usage-based model. These are all reasons why companies find it hard to fight disruption, for the same reason that back in the day, Siebel Systems was not able to win against Salesforce.
It was a similar kind of thing: on-prem software versus cloud software, SaaS software. And I think there are only a handful of companies you can count on one hand who have made that transition; maybe Microsoft is the only one who's made the transition from on-prem to cloud. So I think that's why it's going to be a big, big challenge for them. Have you been able to measure a change in the consideration period, where customers are, well, one, opening it up to you where they couldn't on-prem, and then actually a change in the requirements they're evaluating products by? Certainly. When the infrastructure changes, the way of solving the problem changes. Many of our customers have actually gone from on-prem deployments of Hadoop, using one of the distros, to us. The driver has been cloud adoption. They say, okay, we are going to use the cloud because it's much more agile, much more in tune with the speed with which we want to do business, and when they make that decision, they realize that this cloud infrastructure looks very different from on-prem infrastructure, so lift and shift doesn't work. It doesn't work that, you know, I was running some sort of distro on-prem, and I run it the same way in the cloud. So it goes very quickly from, okay, we are going to move this over, to, let's consider what is the best of breed in the cloud, and that opens up the consideration window. You want to add something there? Yeah, I think for both this question and the previous one, sometimes expectation changes can be very subtle. Let me give you an example. Companies like Cloudera and Hortonworks have releases, right? And if you look at their release cadence, they probably do releases every six months or so, right? Or something like that.
Big ones, I think, once a year; Hortonworks does a big one once a year and then a few small ones a few times a year, right? Whereas we're pushing software almost every week. So these are the intangibles. One is what you see in the product. The other one is, if I have a problem as a customer today and I want to get it fixed by my vendor, Qubole can make a patch, fix it, and push it almost immediately, right? And that change in the software development life cycle, for a fast-moving SaaS business, is so different from what you do for regular distributions. It impacts everything. It impacts the way you maintain your software branches. It impacts the speed at which you can do qualification, because we are an enterprise software company. It's very, very important for us to ship highly reliable software, but how do we do that week after week after week and not break our customers' workloads or release any bugs? I'll be frank, this is not a trivial problem, but that's what we solve. Another question: we've talked about the big data platform, and we've talked about DataOps and autonomous management, and that being possible with a cloud-first architecture. Let's talk about the class of applications. When Hadoop, or its progenitor, came out of Google and then Yahoo, most applications were these analytic pipelines. And HBase gave us a little bit of a push towards some operational applications, but the distinction between the analytical and the operational limited the class of apps. How do you see that evolving?
So if you look at the more expansive term, big data, which includes both operational systems and analytical systems, there's a clear distinction between the technology that applies to operational systems and to analytical systems. Hadoop, Spark, Hive, Presto: all of these are squarely on the analytical systems side, and they will stay there. The architecture needed on the operational side is very, very different, and you have the other NoSQL engines, like Cassandra and Mongo and so on, attacking that. I think that distinction will continue to stay. Combining the two is not technically feasible, because the architectures are different, the requirements are different, and so on. But what is happening for the Hadoop class of systems is that they are going deeper and deeper into the analytical areas, so that analytical applications can be built on top of them. Not the usual operational applications, but analytical applications are certainly going to be built on these systems. In that case, I'm guessing the way you're saying it is, rather than join them in a database, you plug the output of the analytical application, in the form of models, into systems of record. And so there's a pipeline there. Correct, correct. There are new projects also emerging all the time. For example, Druid. Is it an Apache project? I'm not sure, it's an open source project. It's fast emerging as something in the middle, something that is used primarily in operational domains but also integrates well with the Hadoop and Spark stacks that he was talking about.
So you've got streams of data, operational databases like Druid sitting in the middle, and then you've got these deep analytics systems like Hadoop and Spark, and they all work with each other, to different extents, to help operational use cases as well. Though the key thing there, just to go back to explaining why operational stuff is different: usually operational stuff cares about relatively recent data, like how are my systems doing in the last six hours, 24 hours, one week, right? And if you get things off a little bit here and there, it's okay; you can lose a little bit of data and that's just fine. Whereas many of the applications built on Hadoop and Spark, for example, are more like financial stuff or advertising reporting, where you can't afford to get things wrong. You can't bill your clients incorrectly because Hadoop dropped a few bytes of data, right? So the business requirements are very different, and that drives the differences in the software. Right, right. And then of course the other thing is the whole pipeline, data pipelines now in real time. We heard a bunch of examples this morning, and real time, I always joke, what's real time? Real time is in time to do something about it, and depending on what it is, that could be minutes, days, hours, or milliseconds. But to your point, it's a very different way of working, and if you just step back a minute and go, wow, you used to make decisions based on a sample of old stuff, and now we're making decisions based on all the stuff, in real time. It's a very different way to look at the world, but a much more significant problem, technically. It's a significant problem.
The systems, again, the systems that do real, real time are different from the systems that do analytics, which are different from the systems that do operational stuff. The solutions, though, can encompass all of them. For example, a very simple machine learning thing that happens everywhere is recommendations. Recommendations does real-time scoring. As a user comes in, for example, I go to a website and I get scored against a model in real time. That's a real-time application: the scoring happens in real time and gets served in real time. But the models themselves could be created in a batch-oriented way, and they could be near real time. I'm not saying every day, but they could lag by five minutes or whatever, right? Right, right. So the combination of them creates very compelling data-driven applications, but the systems themselves look a little disparate. Those different classes of apps, do you see your autonomous operations and DataOps being applicable to all of those, but in different forms, just because the application shapes are different? Yes, I think autonomous data management can apply to all of those domains. The analytical domain is especially difficult, because the heterogeneity of the workloads is much larger than the heterogeneity of workloads in online systems. Online systems are usually the same workload happening many, many times, and the variation is in how much of it is happening. In the case of analytical workloads, the heterogeneity is much larger, so autonomous data management for analytical workloads is a much, much harder problem. But I also think that with the cloud, a lot of things get automated away, and each of these systems, specializing in each of these areas, is essentially doing that. And we at Qubole are essentially doing it for the analytical systems, the Hadoop and Spark and Hive and Presto related systems.
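The recommendation pattern described here, real-time scoring against a model that is retrained in batch on a lag, can be sketched roughly as follows. Everything in this sketch is an illustrative assumption, not a Qubole API: the names (`train_model`, `Recommender`), the five-minute refresh window, and the simple popularity-count "model" standing in for a real trained model.

```python
import time

def train_model(click_log):
    """Batch step: derive per-item popularity scores from a click log."""
    counts = {}
    for item in click_log:
        counts[item] = counts.get(item, 0) + 1
    total = sum(counts.values()) or 1
    return {item: n / total for item, n in counts.items()}

class Recommender:
    """Serves real-time scores against whichever model version is current."""

    def __init__(self, click_log, refresh_seconds=300):
        self.refresh_seconds = refresh_seconds  # model may lag ~5 minutes
        self.model = train_model(click_log)     # batch-trained, not per-request
        self.trained_at = time.time()

    def maybe_refresh(self, click_log):
        # Near-real-time step: rebuild the model only when it has gone stale,
        # so serving never waits on training.
        if time.time() - self.trained_at >= self.refresh_seconds:
            self.model = train_model(click_log)
            self.trained_at = time.time()

    def score(self, candidates):
        # Real-time step: rank candidate items by the current model's scores.
        return max(candidates, key=lambda c: self.model.get(c, 0.0))

log = ["jackets", "shoes", "shoes", "hats", "shoes"]
rec = Recommender(log)
print(rec.score(["hats", "shoes", "jackets"]))  # prints "shoes"
```

The point of the split is exactly the one made above: the scoring path is a cheap in-memory lookup that can run on every request, while the expensive model build runs in batch and is allowed to lag by minutes.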
And along those lines, will you be leveraging a baseline analysis of a customer's operation over time, or can you anonymize the operations of many customers, since you're effectively running them in the cloud, and get much, much smarter on a much bigger data set about what's within baseline and what's an anomaly? Yeah. You want to take this? So in terms of model generation, what we are doing, we are doing all within the customer; we're not going across customers and things like that. But the models that you learn, that knowledge can be aggregated. It's like, okay, if you see workloads with these features, then, you know. There are a few things that can work across accounts. For example, we buy machines on the Amazon spot market. To the extent that we have a lot of customers and they are all buying machines from that market, we may be able to pool together the knowledge of the bids and the auctions happening across our entire customer base, and we can use that collective knowledge to help the next customer, because we may know that he or she is better off buying X versus Y, based on what we have observed in the rest of the market. Also, one of the benefits of having a lot of internal data is that we may be able to understand, like Ashish said, the costs of certain kinds of queries or workloads. Even though every customer's data is different, the systems they are using, whether it's Spark or Hive, have runtimes that are kind of similar. So we can characterize those runtimes by looking at the aggregated performance across all of our customers and use that. I'll give a simple example. Let's say you come in and submit a workload to Qubole and say, hey, I'm waiting, how long is it going to take? Right?
Now I may be able to use the global knowledge that I've seen across all of my accounts to help answer that question. Because at the end of the day, you are running a Hive query, and it doesn't matter whether you are customer A or B or C; it's the nature of the query you have written and the amount of data it's going to process that determines how quickly you get a response. So yes, most of the things we do are within the customer, within a single account, but there will be some cases where we will leverage things across accounts. George, I'm going to cut you off. I know we could go for a while; George is never going to let you guys leave the table. So first off, thanks for stopping by. We could go and go, but since we don't have time: you guys just authored a book on DataOps, so we're going to have you sign it before you leave the set. Thank you. And congratulations on this show. All the buzz in the hall is very, very positive. I'm hearing sales are going up, up, up, though Marcy's not telling me the real numbers. Anyway, congratulations on your success and a great event. Thank you, pleasure to be here. Absolutely. All right, it's Ashish and Joydeep, the founders of Qubole. He's George Gilbert, I'm Jeff Frick, and you're watching theCUBE from Data Platforms 2017. We'll be right back after this short break. Thanks for watching.