Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. We are wrapping up day one of coverage of DataWorks here in San Jose, California. On theCUBE, I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have two guests for this last segment of the day. We have Sudhir Hasbe, who is the Director of Product Management at Google, and Ram Venkatesh, who is VP of Engineering at Hortonworks. Ram, Sudhir, thanks so much for coming on the show. Thank you for inviting us. So I want to start off by asking you about a joint announcement that was made earlier this morning about some Hortonworks technology deployed onto Google Cloud. Tell our viewers more. Sure, so basically what we announced is support for the Hortonworks Data Platform and Hortonworks DataFlow, HDP and HDF, running on top of the Google Cloud Platform. So this includes deep integration with Google's Cloud Storage connector layer, as well as a certified distribution of HDP to run on the Google Cloud Platform. I think the key thing is a lot of our customers have been telling us they like the familiar environment of the Hortonworks distribution that they've been using on-premises. And as they look at moving to cloud, like GCP, Google Cloud, they want a similar, familiar environment. They want a choice to deploy on-premises or go to cloud, but they want the familiarity of what they have already been using with Hortonworks products. So this announcement actually helps customers pick and choose whether they want to run the Hortonworks distribution on-premises, do it in cloud, or build these hybrid solutions where the data can reside on-premises or move to cloud, and build these common hybrid architectures. So that's what this does. So, HDP customers can store data in the Google Cloud. They can execute ephemeral workloads, analytic workloads, machine learning in the Google Cloud. 
And there is some tie-in between Hortonworks's real-time, low-latency, streaming capabilities from HDF in the Google Cloud. Can you describe, at a fairly close-to-detailed level, the degree of technical integration between your two offerings here? Can I undertake that? Sure, I can do that. So essentially, deep in the heart of HDP there is the HDFS layer, and that includes a Hadoop-compatible file system API, which is a pluggable file system layer. So what Google has done is they've provided an implementation of this API for Google Cloud Storage. So this is the GCS connector. We have taken that connector and we've actually continued to refine it to work with our workloads, and now Hortonworks is actually bundling, packaging, and making this connector available as part of HDP. So bilateral data movement between them? Bilateral workload movement? No, think of this as being very efficient when our workloads are running on top of GCP: when they need to get at data, they can get at data that is in the Google Cloud Storage buckets in a very, very efficient manner. So since we have fairly deep expertise in workloads like Apache Hive and Apache Spark, we've actually done work in these workloads to make sure that they can run efficiently, not just on HDFS, but also on the Cloud Storage connector. This is a critical part of making sure that the architecture is actually optimized for the cloud. So at our scale, right, when our customers are moving their workloads from on-premise to the cloud, it's not just functional parity, but they also need sort of the operational and the cost efficiencies that they're looking for when they move to the cloud. So to do that, we need to enable this fundamental disaggregated storage pattern. See, on-prem, the big win with Hadoop was we could bring the processing to where the data was. 
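As a rough illustration of how that pluggable file system layer gets wired up, the open-source GCS connector is typically registered in Hadoop's core-site.xml along the lines below. The property names come from the public connector documentation; the exact defaults bundled with HDP's certified distribution may differ, so treat this as a sketch rather than the shipped configuration:

```xml
<!-- core-site.xml: registering the GCS connector as a
     Hadoop-compatible file system (property names from the
     open-source connector; HDP's bundled defaults may differ) -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>
<property>
  <name>google.cloud.auth.service.account.enable</name>
  <value>true</value>
</property>
```

With something like this in place, Hive and Spark jobs can address buckets directly with gs:// URIs instead of hdfs:// paths, which is what makes the disaggregated storage pattern discussed here possible.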
In the cloud, we need to make sure that we work well when storage and compute are disaggregated and they're scaled elastically, independent of each other. So this is a fairly fundamental architectural change, and we want to make sure that we enable this in a first-class manner. So that's what this storage connector looks like. I think that's a key point, right? I think what cloud allows you to do is scale the storage and compute independently. And so with storing data in Google Cloud Storage, you can scale that horizontally and then just leverage that as your storage layer, and the compute can independently scale by itself. And what this is allowing customers of HDP and HDF to do is store the data on GCP, on Cloud Storage, and then just scale the compute side of it with HDP and HDF. So if you'll indulge me to name another Hortonworks partner for a hypothetical: let's say one of your customers is using IBM Data Science Experience to do TensorFlow modeling and training. Can they then, inside of GCP, with HDP 3.0 on GCP, use the compute infrastructure inside of GCP to do the actual modeling, which is more compute-intensive, and the separate, decoupled storage infrastructure for the training data, which is more storage-intensive? Is that a capability that would be available to your customers through this integration with Google? Yeah, so where we are going with this is, as you're saying, IBM DSX and other solutions that are built on top of HDP can transparently take advantage of the fact that they have HDP compute infrastructure to run against. So you can run your machine learning training jobs, you can run your scoring jobs, and you can have the same unmodified DSX experience whether you're running against an on-premise HDP environment or an in-cloud HDP environment. Further, that's sort of the benefit for partners and partner solutions. 
From a customer standpoint, the big value prop here is that customers are used to securing and governing their data on-prem in a particular way with HDP, with Apache Ranger, Atlas, and so forth. So when they move to the cloud, we want this experience to be seamless from a management standpoint. So from a data management standpoint, we want all of their learning from a security and governance perspective to apply when they're running in Google Cloud as well. So we've had this capability on Azure and on AWS, and with this partnership, we're announcing the same type of deep integration with GCP as well. So Hortonworks is that one pane of glass across all your cloud partners for all manner of jobs. Well, I just wanted to ask about, we've talked about the reason, the impetus for this: it's more familiar for customers, it offers a seamless experience. But can you dive a little bit into the business problems that you're solving for customers here? A lot of times, our customers are at various points in their cloud journey. For some of them, it's very simple. They're like, there's a broom coming by and the data center is going away in 12 months, and I need to be in the cloud. So this is where there's a wholesale movement of infrastructure from on-premise to the cloud. Others are exploring individual business use cases. So for example, one of our large customers, a travel partner, they are exploring a new pricing model and they want to roll out this pricing model in the cloud. They have on-premise infrastructure. They know they'll have that for a while. They're spinning up new use cases in the cloud, typically for reasons of agility. Typically, many of our customers operate large multi-tenant clusters on-prem. That's nice for sort of very scalable compute for running large jobs, but if you want to run, for example, a new version of Spark, you have to upgrade the entire cluster before you could do that. 
Whereas within this sort of model, they can bring up a new workload that just has the specific versions and dependencies it needs, independent of all of their other infrastructure. So this gives them agility, where they can move as fast as the business needs. Through the containerization of the Spark jobs, or whatever. Correct. Okay, and so containerization, as well as even spinning up an entire new environment. Yes. Because in the cloud, given that you have access to elastic compute resources, they can come and go. So your workloads are much more independent of the underlying cluster than they are on-premise. This is where sort of the core business benefits around agility, speed of deployment, things like that come into play. And also if you look at the total cost of ownership, take an example where customers are collecting all this information throughout the month, and at month end you want to do the closing of the books. That's a great example where you want ephemeral workloads. This is like: do it once a month, finish the books, close the books. That is a great scenario for cloud, where you don't have to create infrastructure on-premises and keep it ready. So that's one example where, now with the new partnership, you can collect all the data on-premises throughout the month if you want, but then move that and leverage cloud to go ahead and scale, do this workload, and finish the books. That's one. The second example I can give: a lot of customers run their e-commerce platforms and all on-premises, let's say. They can still collect all these events through HDP that may be running on-premises, with Kafka. And then what you can do is, in cloud, in GCP, you can deploy HDP and HDF, and you can use HDF from there for real-time stream processing. 
So collect all these click-stream events and use them to make decisions: hey, which products are selling better? How many people are looking at a product, and how many people have bought it? That kind of aggregation, in real time and at scale, you can now do in cloud, and build these hybrid architectures that enable scenarios where, in the past, to do that kind of stuff you would have to procure hardware, deploy hardware, all of that, which all goes away. In cloud you can do that much more flexibly and just use whatever capacity you have. If ephemeral workloads are at the heart of what many enterprise data scientists do, real-world experiments, ad hoc experiments with certain data sets, where you build a TensorFlow model, or maybe a model in Caffe or whatever, and you deploy it out to a cluster, you know, the life of a data scientist is often nothing but a stream of new tasks that are all ephemeral in their own right but are part of a long-running or ongoing experimentation program. They're building and testing assets that may or may not ultimately be deployed into production applications, so I can see a clear need for the capability in this announcement in lots of working data science shops in the business world. Absolutely. And I think, coming down to it, if you really look at the partnership, there are two or three key areas where it's going to have a huge advantage for our customers. One is analytics at scale at a lower cost, like reducing the total cost of ownership while running analytics at scale; this is one of the big things. Again, as I said, the hybrid scenarios: most enterprise customers have huge deployments of infrastructure on-premises, and that's not going to go away. 
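The kind of real-time aggregation described above, counting views and purchases per product over short time windows, can be sketched in plain Python. This is a toy stand-in for what Kafka plus HDF stream processing would do continuously at scale; the event fields and window size here are made up for illustration:

```python
from collections import defaultdict

def aggregate_clickstream(events, window_seconds=60):
    """Toy windowed aggregation: count views and purchases per product.

    Each event is a dict like {"ts": 12, "product": "p1", "action": "view"}.
    A real deployment would do this continuously over a Kafka stream;
    this just buckets a finished batch of events into time windows.
    """
    windows = defaultdict(
        lambda: defaultdict(lambda: {"views": 0, "purchases": 0})
    )
    for event in events:
        # Which fixed-size window the event's timestamp falls in.
        window = int(event["ts"] // window_seconds)
        counts = windows[window][event["product"]]
        if event["action"] == "view":
            counts["views"] += 1
        elif event["action"] == "purchase":
            counts["purchases"] += 1
    return {w: dict(products) for w, products in windows.items()}

events = [
    {"ts": 5,  "product": "p1", "action": "view"},
    {"ts": 12, "product": "p1", "action": "purchase"},
    {"ts": 20, "product": "p2", "action": "view"},
    {"ts": 65, "product": "p1", "action": "view"},
]
result = aggregate_clickstream(events)
print(result)
# window 0: p1 has one view and one purchase, p2 has one view;
# window 1: p1 has one view
```

The same grouping logic, applied incrementally as events arrive rather than over a finished batch, is what a streaming engine provides, along with the scale-out and fault tolerance the speakers are describing.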
Over a period of time, leveraging cloud is a priority for a lot of customers, but they will be in these hybrid scenarios, and what this partnership allows them to do is build scenarios that span across the cloud and the on-premises infrastructure they have, and get business value out of all of it. And then finally, we at Google believe that the world will be more and more real-time over time, right? We already are seeing a lot of these real-time scenarios, with IoT events coming in and people making real-time decisions, and this is only going to grow. This partnership also provides the whole streaming analytics capability in cloud, at scale, for customers to build these hybrid and real-time streaming scenarios. It's clear what the Hortonworks partnership gives Google in this competitive, multi-cloud space: it gives you that ability to support hybrid cloud scenarios. You're one of the premier public cloud providers that we all know about, and clearly, now that you have the Hortonworks partnership, you have the ability to support those kinds of highly hybridized deployments for your customers, many of whom I'm sure have those requirements. That's perfect, yeah, exactly right. Well, a great note to end on. Thank you so much for coming on theCUBE. Sudhir, Ram, thank you so much. Thank you, thanks a lot. Thanks a lot, good to have you. I'm Rebecca Knight for James Kobielus. We will have more tomorrow from DataWorks. We will see you tomorrow. Yes. This is theCUBE signing off from sunny San Jose. That's right.