from San Jose in the heart of Silicon Valley. It's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks.

Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Pandit Prasad. He is in analytics product strategy and management at IBM Analytics. Thanks so much for coming on the show.

Thanks, Rebecca. Glad to be here.

Why don't you just start off by telling our viewers a little bit about what you do in terms of the Hortonworks relationship, and then the other parts of your job?

Sure. As you said, I am in offering management, which is also known as product management, for IBM. I manage the big data portfolio from an IBM perspective. I'm also working with Hortonworks on developing this relationship and growing that relationship. So it's been a year since we announced this partnership. We announced this partnership exactly last year at the same conference, and now it's been a year. So this year has been a journey in aligning the two portfolios together, right? So Hortonworks had HDP and HDF. IBM also had similar products. So we have, for example, Big SQL; Hortonworks has Hive. So how do Hive and Big SQL align together? IBM has Data Science Experience; where does that come into the picture on top of HDP? Before this partnership, if you look into the market, it has been like, you sell Hadoop, you sell a SQL engine, you sell data science. So what this year has given us is more of a solution sell. Now with this partnership, we go to the customers and say, here's an end-to-end experience for you. You start with Hadoop, you put more analytics on top of it. You then bring Big SQL for complex queries and federation and virtualization stories. And then finally, you put data science on top of it.
So it gives you a complete end-to-end solution, an end-to-end experience for getting the value out of the data lake.

Now, IBM, a few years back, released the Watson Data Platform for team data science, with DSX, Data Science Experience, as one of the tools for data scientists. Is Watson Data Platform still the core, I would call it DevOps for data science, and maybe that's the wrong term, that IBM provides to market? Or is there a broader sort of DevOps framework within which IBM goes to market with these tools?

Sure. So Watson Data Platform one year ago was more of a cloud platform, and it had many components to it. Now we are bringing a lot of those components to the on-prem world, and Data Science Experience is one part of it.

So Data Science Experience. There's Watson Analytics as well, for subject matter experts, and so forth.

Yeah, and again, Watson has a whole suite of services-based offerings. Data Science Experience is a particular aspect focused specifically on data science, and that's now available on-prem, and now we are building out this on-prem stack. So we have HDP, HDF, Big SQL, Data Science Experience, and we are working towards adding more and more to that portfolio.

Well, you have a broader reference architecture and a stack of solutions around AI on Power and so forth, for more of the deep learning development. So in your relationship with Hortonworks, are they reselling more of those tools into their customer base, to supplement what they already resell with DSX? Or is that outside the scope of the relationship?

No, it is all part of the relationship. So these three have been the core of what we announced last year, and then there are other solutions, like the whole governance solution, right? So again, it goes back to the partnership: HDP brings with it Atlas. IBM has a whole suite of governance portfolio, including the governance catalog.
So how do you expand the story from being a Hadoop-centric story to an enterprise data lake story, right? And now we are taking that to the cloud, and that's what Truata is all about. So Rob Thomas came out with a blog this morning talking about Truata, and if you look at it, it is nothing but a governed data lake hosted offering, if you want to simplify it. So that's one way to look at it, and it caters to the GDPR requirements as well.

For GDPR, for the IBM-Hortonworks partnership, what is the lead solution for GDPR compliance? Is it Hortonworks Data Steward Studio, or is it any number of solutions that IBM already has for data governance and curation? Or is it a combination of all of that, in terms of what you as partners have proposed to customers for GDPR compliance? Give me a sense for...

Yeah, it is a combination of all of the above. So it has HDP, it has HDF, it has Big SQL, it has Data Science Experience, it has the IBM governance catalog, it has IBM data quality, and it has a bunch of security products like Guardium, and it has some new IBM proprietary components that are very specific to data anonymization and how you deal with personal data and sensitive personal data as classified by GDPR, right? So I'm supposed to be able to query some higher-level information, but I'm not allowed to query deep into the personal information. So how do you block those queries? How do you understand those?

You know, these are not necessarily part of Data Steward Studio.

Yes, yes. These are some of the proprietary components that are thrown into the mix by IBM.

You know, one of the requirements that's not often talked about under GDPR, and one of the Hortonworks speakers got into it a little bit in his presentation.
There's a notion, or a requirement, that if you are using an EU citizen's PII to drive algorithmic outcomes, they, the PII subject, have the right to full transparency into the algorithmic decision paths that were taken, whatever those might be. Now, I remember IBM had a tool under the Watson brand that wraps up a narrative of that sort; it was called Watson Curator a few years back. Is that a solution IBM still offers? Because I'm not getting a sense right now that Hortonworks has a specific solution, not to say they may not be working on it, that addresses that side of GDPR. Do you know what I'm referring to there?

I'm not aware of something from the Hortonworks side beyond Data Steward Studio, which basically offers identification of some of the data lineage, as opposed to model lineage. It can identify some of the personal information and maybe provide a way to tag it and hence mask it. But the Truata offering is the one that is bringing some new research assets. After the GDPR guidelines became clear, we got into the effort of figuring out how we cater to those requirements. So these are relatively new proprietary components. They are not even productized yet. That's why I'm calling them proprietary components that are going into this hosted service.

IBM's got a big portfolio, so I'll understand if you guys are still working out exactly what the positioning is. So, Rebecca, go ahead.

I just wanted to ask you about this new era of GDPR. So the last Hortonworks conference was before it came into effect, and now we're in this new era. How would you say companies are reacting? Are they in the right space for it, in the sense that they're really still understanding the ripple effects and how it's all going to play out? How would you describe your interactions with companies in terms of how they're dealing with these new requirements?
They are still trying to understand the requirements, interpret the requirements, and come to terms with what they really mean. So for example, I met with a customer. They are a multinational company with data centers across different geos. And they asked me: I have somebody from Asia trying to query the data, so the query should go to Europe, but the query processing should not happen in Asia. The query processing should all happen in Europe, and only the output of the query should be sent back to Asia. You wouldn't have thought in these terms before the GDPR guidelines got issued.

It's exceedingly complicated. But the decoupling of storage from processing enables those kinds of fairly complex scenarios for compliance purposes.

Yeah, so it's not just about access to data. Now we are getting into even where the processing happens and where the results get displayed.

There may be severe penalties for not doing that, so your customers need to keep up. There was an announcement at this show, at DataWorks 2018, of an IBM-Hortonworks solution: IBM Hosted Analytics with Hortonworks. I wonder if you can speak a little bit about that, Pandit, in terms of what's provided as a subscription service. Could you tell us what subset of IBM's analytics portfolio is hosted for Hortonworks customers?

Sure. As you said, it is a hosted offering. Initially we are starting that off as a base offering with three products. It will have HDP; Big SQL, that is, IBM Db2 Big SQL; and DSX, Data Science Experience. Those are the three solutions. Again, as I said, it's hosted on IBM Cloud, so customers have a choice of different configurations, whether in VMs or on bare metal. And I should say, this is probably the only offering as of today that offers a bare-metal configuration in the cloud.
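The data-residency scenario described above, where a query from Asia must be processed entirely in Europe with only the output sent back, boils down to routing execution to wherever the data lives. Here is a minimal sketch of that idea; the region map, dataset names, and the `execute_in_region` stand-in are all hypothetical illustrations, not any actual IBM or Hortonworks API:

```python
# Hypothetical sketch: route query execution to the region where the
# data resides, so processing stays with the data and only results
# cross regional boundaries.

DATA_REGION = {"customers_eu": "eu-de", "orders_eu": "eu-de"}  # dataset -> residency

def execute_in_region(region, query):
    # Stand-in for dispatching the query to a compute cluster in `region`;
    # a real system would call a regional endpoint here.
    return {"processed_in": region, "rows": [("total", 42)]}

def run_query(dataset, query, caller_region):
    region = DATA_REGION[dataset]           # processing must stay with the data
    result = execute_in_region(region, query)
    assert result["processed_in"] == region  # never processed in the caller's geo
    return result["rows"]                    # only the output travels back

# A caller in Asia gets results computed in Europe.
rows = run_query("customers_eu", "SELECT count(*) FROM customers_eu",
                 caller_region="ap-sg")
print(rows)
```

The key design point matching the conversation: the caller's region never influences where computation runs, only where the (non-personal) result set is delivered.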
So it's geared to data scientists and developers, and the machine learning models will be built and trained in IBM Cloud, but on a hosted HDP in IBM Cloud, is that correct?

Yeah, I would refine that a little bit, right? There are several different offerings on the cloud today, and we can think about them, as you said, as geared for ad hoc or ephemeral workloads, and also towards low cost. Think about this offering instead as taking your on-prem data center experience directly onto the cloud. It is geared towards very high performance: the hardware and the software are all configured and optimized for high performance, not necessarily for ad hoc or ephemeral workloads. It is capable of handling massive, sticky workloads, right? So it's not meant for turning on massive computing power for a couple of hours and then switching it off, but rather for running massive workloads as if they were located in my own data center. That's number one. It comes with the complete set of HDP. If you think about what's currently in the cloud, you have Hive and HBase as engines with the storage separate, the security is optional, governance is optional. This comes with the whole enchilada, right? It has security and governance all baked in. It provides the option to use Big SQL, because once you get onto Hadoop, the next experience is: I want to run complex workloads, I want to run federated queries across Hadoop as well as other data stores. How do I handle those? And then it comes with Data Science Experience, also configured for best performance and integrated together as part of this partnership. So, I mentioned earlier that we have progressed towards providing the story of an end-to-end solution.
So the next step of that is: yes, I can say it's an end-to-end solution, but do the products look and feel as if they are one solution? That's what we are getting into, and I'll feature some of those integrations. So for example, Big SQL, the IBM product: we have been working on baking it very closely into HDP, right? So it can be deployed through Ambari, it is integrated with Atlas and Ranger for security, and we are improving the integrations with Atlas for governance.

And so, say you're building a Spark machine learning model inside of DSX on HDP, within IHAH, IBM Hosted Analytics with Hortonworks. On HDP, HDP 3.0, can you then containerize that model...

That's where I was going next.

...and then deploy it into an edge scenario?

Sure. So that's what I was going to say next. First was Big SQL. The next one is DSX. So DSX is integrated with HDP as well. We could run DSX workloads on HDP before, but consider what that took: if I want to run a DSX workload, like a Python workload, I need to have the Python libraries on all the nodes I want to deploy to. So suppose you are running a big cluster, a 500-node cluster: I need to have the Python libraries on all 500 nodes, and I need to maintain the versioning of them. If I upgrade the versions, then I need to go and make sure all of them are perfectly aligned.

In this first version, will you be able to build, say, a Spark model and a TensorFlow model and so forth, and containerize them, and deploy them across a multi-cloud, and orchestrate them with Kubernetes, and do all that magic? Is that a capability now, or in the future, within this portfolio?

Yeah, we have that capability demonstrated at the pedestal today. So that is the new integration that we are announcing. We can run what we call virtual Python environments, right? So DSX can containerize them and run them as workloads in the HDP cluster.
So now you are making use of both the data in the cluster and the infrastructure of the cluster itself for running the workloads.

In terms of the layered stack, is it also incorporating the IBM distributed deep learning technology that you've recently announced? I think it's highly differentiated, because deep learning has increasingly become a set of capabilities running across a distributed mesh, playing together as if they were one unified application. So is that a capability now in this solution, or will it be in the near future?

Which one? DDL, distributed deep learning? No, we have not gotten to that yet.

I know that's on the AI on Power platform.

That's what we'll be talking about at next year's conference. Yes, that's definitely on the roadmap. We are starting with the base bare-metal and VM configurations. The next step depends on how customers react to it. Definitely we are thinking about, okay, bare metal that is optimized with GPUs, optimized for TensorFlow workloads, right?

Exciting. We'll stay tuned in the coming months and years.

Indeed, we will. I'm sure you guys will have that in there. Thank you so much for coming on theCUBE. We appreciate it.

Great, very good.

I'm Rebecca Knight, for James Kobielus. We will have more from theCUBE's live coverage of DataWorks just after this.