From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018, brought to you by Hortonworks.

Well hello, I'm James Kobielus and welcome to theCUBE. We are here at DataWorks Summit 2018 in Berlin, Germany. It's a great event. Hortonworks is the host. They made some great announcements. They've had partners doing the keynotes and the sessions and breakouts. IBM is one of their big partners. Speaking of IBM, we have a program manager from IBM, Piotr, and I'll get this right, Piotr Mierzejewski. Your focus is on data science, machine learning, and Data Science Experience, which is one of the IBM products for working data scientists to build and train models in team data science, enterprise operational environments. So, Piotr, welcome to theCUBE. I don't think we've had you before. You're a program manager. I'd like you to discuss what you do for IBM. I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience. I'd like you to discuss the partnership going forward and how you and Hortonworks are serving your customers, data scientists, and others in those teams who are building and training and deploying machine learning and deep learning AI into operational applications. So, Piotr, I give it to you now.

Thank you. Thank you for having me here. Very excited. This is a very loaded question. And before I actually get to why the partnership makes sense, I would like to begin with two things. First, there is no machine learning without data. And second, machine learning is not easy, especially when you... well, there's this kind of perception that you can have a data scientist working on their Mac, working on some machine learning algorithms, and they can create a recommendation engine in, let's say, two or three days' time. This is because of the explosion of open source in that space.
You have thousands of libraries from Python, from R, from Scala, and you have access to Spark. All these various open source offerings are enabling data scientists to actually do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, data access, and actual model deployments, which are not that trivial when you have to expose them in a uniform fashion to various business units. Now, all this has to actually work in private cloud and public cloud environments, on a variety of hardware and a variety of different operating systems. That is not trivial. Now, when you deploy a model, as the data scientist is going to deploy the model, he needs to be able to actually explain how the model was created. He has to be able to explain what data was used. He needs to ensure...

Explicable AI or explicable machine learning, yeah. That's a hot focus or concern of enterprises everywhere, especially in a world of governance, where governance and tracking and lineage and GDPR are so hot.

Yes, you've mentioned all the right things. Now, given those two things, there's no ML without data and ML is not easy, why does the partnership between Hortonworks and IBM make sense? Well, you're looking at the number one industry-leading big data platform in Hortonworks. Then you look at DSX Local, which I'm proud to say I've been with since the first line of code, and I'm very passionate about the product, and there is the merge between the two. The ability to integrate them tightly together gives your data scientists secure access to data. The ability to leverage the Spark that runs inside a Hortonworks cluster. The ability to actually work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies.
The ability to actually work on not only Spark...

By technologies here, you're referring to frameworks like TensorFlow?

Precisely, very good. Now, that part I'm going to get into very shortly. So please don't steal my thunder.

Okay.

Now, what I'm saying is that not only are DSX and Hortonworks integrated to the point that you can actually manage your Hadoop clusters, your Hadoop environments, within DSX; you can also work on your Python models and your analytics within DSX and then push them remotely to be executed where your data is. Now, why is this important? If you work with data that is megabytes or gigabytes, maybe you can pull it in. But truly, when you move to terabytes and petabytes of data, you actually have to push the analytics to where your data is and leverage, for example, YARN, the resource manager, to distribute your workloads and actually train your models on your HDP cluster. That's one of the huge value propositions. Now, mind you, this is all done in a secure fashion, with the ability to actually install DSX on the edge nodes of the HDP clusters. As of HDP 2.6.4, DSX has been certified to work with HDP. Now, we embarked on this partnership about 10 months ago. It often happens that there are announcements but not much materializes after such an announcement. This is not true in the case of DSX and HDP. Just recently we've had a release of DSX 1.2, which I'm super excited about. Now, let's talk about those open source toolings and the various platforms. You don't want to force your data scientists to work with just one environment. Some of them prefer to work on Spark. Some of them like RStudio; they're statisticians, they like R. Others like Python with Zeppelin or, say, Jupyter notebooks. Now, how about TensorFlow?
What are you going to do when you have deep learning workloads, when you want to use neural nets? Well, DSX does support the ability to bring in GPU nodes and do the TensorFlow training as a sidecar approach. You can append the node, you can scale the platform horizontally and vertically, train your deep learning workloads, and then take the sidecar out. So you can attach it to a cluster and remove it at will. Now, DSX not only satisfies the needs of your programmer data scientists who code in Python, Scala, or R, but also allows your business analysts to work and create models in a visual fashion. As of DSX 1.2, we have embedded and integrated SPSS Modeler, redesigned and rebranded. This is an amazing technology from IBM that's been around for a while, very well established. But now, with the new interface, embedded inside the DSX platform, it allows your business analysts to train and create a model in a visual fashion. And it's beautiful.

Business analysts, not traditional data scientists.

Not traditional data scientists.

That sounds equivalent to how IBM a few years back was able to bring more of a visual experience to SPSS proper, to enable the business analysts of the world to build and do data mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. I see it's the same phenomenon: you bring the same capability to greatly expand the range of data professionals who can, in this case, do machine learning, hopefully as well as a professional, dedicated data scientist.

Certainly. Now, what we also have to understand is that data science is actually a team sport. It involves various stakeholders from your organization, from the executive who gives you the business use case to your data engineers who understand where your data is and can grant you access.
They manage the Hadoop clusters, many of them.

Precisely. Yes, so they manage the Hadoop clusters. They also manage your relational databases, because we have to realize that not all the data is in the data lakes yet. You have legacy systems, which DSX allows you to connect to and integrate data from. It also allows you to consume data from streaming sources. So if you have a Kafka message hub that is streaming data from your applications or IoT devices, you can integrate all those various data sources and federate them within DSX to use for training machine learning models. Now, this is all around predictive analytics. But what if I tell you that right now, with DSX, you can actually do prescriptive analytics as well? With 1.2, again, I'm going to keep coming back to this DSX 1.2, with the most recent release we have added Decision Optimization, an industry-leading solution from IBM.

Prescriptive analytics.

Yes, for prescriptive analytics. So now, if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in, let's say, a utility company, whether it be for power or, let's say, for water, you can create and train prescriptive models within DSX and deploy them in the same fashion as you would deploy and manage your SPSS streams as well as the machine learning models from Spark, from Python. So we've come to XGBoost, TensorFlow, Keras, all those various aspects. Now, what's going to get really exciting is that in the next two months, DSX will actually bring in natural language processing, text analysis, and sentiment analysis via WEX. So Watson Explorer, it's another offering from IBM.

It's called, what is the name of it?

Watson Explorer. Watson Explorer, yes.
So now you're going to have this collaborative platform, an extensible collaborative platform that can actually install and run in your data centers without the need to access the internet. That's actually critical. Yes, we can deploy on AWS. Yes, we can deploy on Azure, on Google Cloud. Definitely, we can deploy in SoftLayer, and we're very good at that. However, in the majority of cases, we find that customers have challenges bringing their data out to cloud environments. Hence, we designed DSX to deploy and run and scale everywhere. Now, how have we done it? We've embraced open source. This was a huge shift within IBM, to realize that, yes, we do have 350,000 employees. Yes, we could develop container technologies, but why? Why not embrace what are actually industry standards, Docker and Kubernetes, which became industry standards? Bring in RStudio, Jupyter, the Zeppelin notebooks. Bring in the ability for your data scientists to choose the environments they want to work with and actually extend them, and make the deployments of web services, applications, and models, and those are actually full releases. I'm not only talking about the model; I'm talking about the scripts that can go with it. The ability to pull the data in and allow the models to be retrained, evaluated, and redeployed without taking them down. Now, that is the true differentiator when it comes to DSX, and it is all done in either your public or private cloud environments.

So that's coming in the next version of DSX?

Oh, outside of the way.

We're almost out of time.

Oh, I'm so sorry, yes.

No, no, no, no, that's my job as the host. So if you could summarize where DSX is going in 30 seconds or less as a product, what the next version is, what is it?

It's going to be 1.2.1, and we're expecting to release at the end of June.
What's going to be unique in 1.2.1 is infusing text and sentiment analysis, so natural language processing, with predictive and prescriptive analysis for both the developers and your business analysts.

So essentially a platform not only for your data scientists, but for pretty much every single persona inside your organization.

Including your marketing professionals who are baking sentiment analysis into what they do.

Thank you very much. This has been Piotr Mierzejewski of IBM. He's a program manager for DSX and for ML, AI, and data science solutions, and of course there's a strong partnership with Hortonworks. We're here at DataWorks Summit in Berlin. We've had two excellent days of conversations with industry experts, including Piotr. We want to thank everyone. We want to thank the host of this event, Hortonworks, for having us here. We want to thank all of our guests, all these experts, for sharing their time out of their busy schedules. We want to thank everybody at this event for all the fascinating conversations. The breakouts have been great. The whole buzz here is exciting. GDPR is coming down and everybody's gearing up, getting ready for that. But everybody's also focused on innovative and disruptive uses of AI and machine learning in business, and on using tools like DSX. I'm James Kobielus, and for the entire CUBE team and SiliconANGLE Media, wishing you all, wherever you are, whenever you watch us, a good day, and thank you for watching theCUBE.