Live from New York, it's theCUBE, covering the IBM Machine Learning Launch Event, brought to you by IBM. Now, here are your hosts, Dave Vellante and Stu Miniman.

Welcome back to New York City, everybody. This is theCUBE. We're here at the IBM Machine Learning Launch Event. Rob Thomas is here. He's General Manager of the IBM Analytics Group. Rob, good to see you again.

Dave, great to see you.

Thanks for being here.

Yeah, it's our pleasure. So, two years ago, IBM announced the Z platform, and a big theme was bringing analytics and transactions together. You're extending that today with machine learning. The news just hit three minutes ago?

Yep.

Yeah, take us through what you announced.

This is a big day for us. The announcement is that we're bringing machine learning to private clouds. My observation is this: over 90% of the data in the world cannot be Googled. Why is that? Because it's behind corporate firewalls. As we've worked with clients over the last few years, we've found they sometimes don't want to move their most sensitive data to the public cloud yet. So we've taken the machine learning from IBM Watson, extracted it, and we're enabling it on private clouds. We're telling clients they can get the power of machine learning across any type of data, whether it's data in a warehouse, a database, unstructured content, email, you name it. We're bringing machine learning everywhere.

To your point, when we thought about where to start, we asked: what is the world's most valuable data? It's the data on the mainframe, the transactional data that runs the retailers, the banks, the insurance companies, and the airlines of the world. So we said we're going to start there, because we can show clients how to use machine learning to unlock value in their most valuable data.
And when you say private cloud, we're talking about the original private cloud, which is the mainframe, right? And I presume you'll extend that to other platforms over time, is that right?

Yeah. We're going to look at every place data is managed behind a firewall and enable machine learning as an ingredient there. This is the first step, and we'll be delivering every quarter, starting next quarter, bringing it to other platforms and other repositories. Once clients get a taste of automating analytics with machine learning, what we call continuous intelligence, it changes the way they do analytics. Demand will be off the charts.

So it's essentially Watson ML, extracted and placed on Z, is that right? Describe how people are going to use this and who's going to be using it.

Sure. Watson on the cloud today is IBM's cloud platform for artificial intelligence, cognitive computing, augmented intelligence. A component of that is machine learning. We're bringing that out as IBM Machine Learning, which will run today on the mainframe and, in the future, on other platforms.

But let's talk about what it does. It's a single place for unified model management, so you can manage all your models from one place. And we've got really interesting technology we pulled out of IBM Research called CADS, which stands for Cognitive Assistant for Data Scientists. The idea behind CADS is that you don't have to know which algorithm to choose; we choose the algorithm for you. You build your model, and based on all the algorithms available, in open source, what you've built yourself, what IBM has provided, we decide the best way to run it. Our focus here is the productivity of data scientists. No company has as many data scientists as it wants, so we've got to make the ones they do have vastly more productive.
So with technology like CADS, we're helping them do their job more efficiently and better.

Yeah, CADS, we've talked about this on theCUBE before. It's like an algorithm to choose an algorithm and find the best fit. Okay, and you addressed some of the collaboration issues at your Watson Data Platform announcement last October. So talk about the personas who are asking you to give them access to mainframe data, and tooling that actually resides on this private cloud.

It's definitely the data science persona, but we also see an emerging market of business analysts saying, I'd really like to get at that data, but I haven't been able to do that easily in the past. So we give them a single pane of glass, if you will, with something like Data Science Experience, where they can manage their models, using CADS to make them more productive. And then we have something called a feedback loop built in. You build a model running on Z. These are the largest transactional systems in the world, so there's new data coming in every second. As that data arrives, the model is constantly updating; it learns from the incoming data and becomes smarter. That's the whole idea behind machine learning in the first place, and that's what we've enabled here.

Now, you and I have talked through the years, Dave, about IBM's investment in Spark. This is one of the first world-class applications of Spark. We announced Spark on the mainframe last year, and what we're bringing with IBM Machine Learning leverages Spark as an execution engine on the mainframe. So I see this as Spark finally coming into the mainstream, when you talk about Spark accessing the world's greatest transactional data.

Yeah, Rob, I wonder if you can help our audience squint through, compare and contrast, public cloud versus what you're offering today.
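CADS itself is proprietary, but the idea described above, trying several candidate algorithms against held-out data and keeping the winner, can be sketched in plain Python. Everything below (the two toy models, the sample data, the `pick_best` helper) is an illustrative stand-in, not IBM's implementation:

```python
# Toy sketch of automated algorithm selection (the idea behind CADS):
# fit every candidate on training data, score it on held-out data,
# and keep the best performer. Candidates and data are illustrative.

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_linear(xs, ys):
    """Candidate: one-variable least-squares regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)
    alpha = my - beta * mx
    return lambda x: alpha + beta * x

def mse(model, xs, ys):
    """Mean squared error of a fitted model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def pick_best(candidates, train, valid):
    """Fit each candidate on `train`; return the name and model with the
    lowest validation error -- the algorithm chooses the algorithm."""
    scored = []
    for name, fit in candidates.items():
        model = fit(*train)
        scored.append((mse(model, *valid), name, model))
    best_err, best_name, best_model = min(scored)
    return best_name, best_model

# Strongly linear data: the linear candidate should beat the baseline.
train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 8.0])
valid = ([5, 6], [10.1, 11.8])
name, model = pick_best({"mean": fit_mean, "linear": fit_linear}, train, valid)
print(name)  # prints "linear" on this data
```

The practitioner only supplies data and a set of candidates; the selection step is automated, which is the productivity gain Thomas describes.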
Because in the public cloud, providers keep adding new services, and machine learning seemed like one of those areas, as IBM has done with its machine learning platform. Streaming and mobile applications absolutely happened in the public cloud. Is the cost similar in private cloud? Can I get all the services? How will IBM and your customer base keep up with the pace of innovation we've seen from IBM and others in the public cloud, but on-prem?

Yeah, so look, my view is it's not either-or. When you look at this valuable data, clients want to do some of it in public cloud, and they want to keep a lot of it in the systems they've built on-premise. So our job is to bridge that gap. I see machine learning, like we've talked about, becoming much more of a hybrid capability over time. The data they want to move to the cloud, they should move; the economics are great. But the economics of doing it on private cloud are tremendous as well. So we're delivering an elastic infrastructure on private cloud that can scale like the public cloud. To me, it's not either-or. Everybody wants cloud features: the elasticity, a queryable interface, the economics of cloud. Our job is to deliver that in both places, whether on the public cloud, which we're doing, or on the private cloud.

Yeah, one of the thought exercises I've gone through is that if you follow the data and follow the applications, it shows you where customers are going to do things. If you look at IoT, if you look at healthcare, there are lots of uses that are going to stay on-prem, that are going to be on the edge. I got to interview Walmart a couple of years ago at the IBM Edge show, and they leverage Z globally for their sales and enablement, and obviously they're not going to use AWS as their platform. What are the trends? What do you hear from your customers?
How much of the data has reasons why it needs to stay at the edge? It's not just compliance and governance; it's that that's where the data is. And I think you were saying there's just so much data on the Z Series itself compared to other environments.

Yeah, and it's not just the mainframe, right? Let's be honest, there are massive amounts of data that still sit behind corporate firewalls. And while I believe the end destination is that a lot of it will be on public cloud, what do you do now? You can't wait until that future arrives. The biggest change I've seen in the market in the last year is that clients are building private clouds. These aren't traditional on-premise deployments; they're building an elastic infrastructure behind their firewall. You see it a lot in heavily regulated industries: financial services dealing with things like GDPR, any retailer dealing with things like PCI compliance. Heavily regulated industries are saying, we want to move there, but we have challenges to solve right now. Our mission is to make data simple and accessible wherever it is, on private cloud or public cloud, and to help clients on that journey.

Okay, so carrying through on that. You're now unlocking my access to mainframe data. Great. Let's say I have a retail example, I've got some data scientists building models, and I'm accessing the mainframe data. If I have data that's elsewhere, in the cloud, how specifically, with regard to this announcement, will a practitioner execute on that?

Yeah, so one option is to decide on one place where you want to land your data and have it be resident. You can do that. But we also have scenarios where clients are using Data Science Experience on the cloud while leaving the data behind the firewall. We don't require them to move the data.
So our model is one of flexibility in how clients manage their data assets, which I think is unique to IBM's approach. Others in the market say, if you want to use our tools, you have to move your data to our cloud. Some of them even say, as you click through the terms, that now they own your data and your insights. That's not our approach. Our view is: it's your data. If you want to run the applications in the cloud and leave the data where it is, that's fine. If you want to move both to the cloud, that's fine. If you want to leave both on private cloud, that's fine. We have capabilities like Big SQL, where we can federate data across public and private clouds. So we're trying to provide choice and flexibility.

And Rob, in the context of this announcement, that example you gave would be done through APIs that give me access to that cloud data, is that right?

Yeah, exactly, yes. Last year we announced something called Data Connect, which is, think of it as a bus between private and public cloud. You can leverage Data Connect to move data seamlessly and easily. It's very high speed; under the covers it uses our Aspera technology, a recent acquisition.

Rob, IBM's been very active in open source, engaging and trying to help the industry sort out some of the challenges out there. Where do you see the state of the machine learning frameworks? Google, of course, has TensorFlow; we've seen Amazon pushing MXNet. Is IBM supporting all of them? Are there certain horses you have a strong feeling for? What are your customers telling you?

I believe in openness and choice. So with IBM Machine Learning, you can choose your language: you can use Scala, you can use Java, you can use Python, with more to come. You can choose your framework. We're starting with Spark ML, because that's where we have our competency and where we see a lot of client demand.
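The federation idea mentioned above, one query spanning data that stays where it lives, can be illustrated conceptually in a few lines. This is a toy stand-in, not Big SQL or Data Connect: the in-memory "sources" and the join helper are hypothetical, and a real federated engine would push work down to each source rather than pull rows into one process.

```python
# Conceptual sketch of data federation: one query joins two sources
# without either dataset being copied wholesale. Illustrative only.

# Stand-in for a table behind the firewall (e.g., mainframe transactions).
on_prem_transactions = [
    {"cust_id": 1, "amount": 250.0},
    {"cust_id": 2, "amount": 75.5},
    {"cust_id": 1, "amount": 40.0},
]

# Stand-in for a table in the public cloud (e.g., marketing profiles).
cloud_profiles = [
    {"cust_id": 1, "segment": "loyal"},
    {"cust_id": 2, "segment": "new"},
]

def federated_spend_by_segment(transactions, profiles):
    """Join the two sources on cust_id and aggregate spend per segment.
    Each dataset stays where it lives; only the result is materialized."""
    segment_of = {p["cust_id"]: p["segment"] for p in profiles}
    totals = {}
    for t in transactions:
        seg = segment_of.get(t["cust_id"], "unknown")
        totals[seg] = totals.get(seg, 0.0) + t["amount"]
    return totals

print(federated_spend_by_segment(on_prem_transactions, cloud_profiles))
# {'loyal': 290.0, 'new': 75.5}
```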
But I'm open to clients using other frameworks over time as well, so we'll start to bring those in. The IT industry always wants to put people in a box: this is the model you should use. That's not our approach. Our approach is that you can use the language and the framework you want, and through things like IBM Machine Learning, we give you the ability to tap your most valuable data.

Yeah, the box today has become this mosaic, and you have to provide access to all the pieces of the mosaic. One of the things practitioners tell us, and I wonder if you could weigh in on this, is that they struggle to decide whether to invest in improving the model or in capturing more data. They have a limited budget. I've had people tell me you're way better off getting more data in; I've had others say no, with machine learning we can advance the models. What are you seeing? What are you advising customers in that regard?

So compute has become relatively cheap, which is good, and data acquisition has become relatively cheap. So my view is, go full speed ahead on both. The value comes from the right algorithms and the right models; that's where the value is. I even encourage clients to consider separating their teams: one focused on data acquisition and how to do that, another focused on model development and algorithm development. Because if you give somebody both jobs, both typically get done halfway. The value is in the right models and the right algorithms, so that's where we stress the focus.

And models to date have been OK, but there's a lot of room for improvement. The two examples I like to use are ad retargeting, which, as we all know as consumers, is not great: you buy something and you keep getting targeted for another week.
And then fraud detection, which over the last 10 years has actually gotten quite good, but there are still a lot of false positives. Where do you see IBM Machine Learning taking those practical use cases, in terms of improving the models?

Yes, so why are there false positives? The issue typically comes down to the quality and the amount of data that you have. Let me give you an example. One of the clients speaking at our event this afternoon is Argus, which is focused on the healthcare system.

Yeah, we're going to have them on here as well.

So Argus collects data across the healthcare system: payers, providers, pharmacy benefit managers. Their whole mission is, how do we cost-effectively serve different scenarios or different diseases, in this case diabetes, and how do we make sure patients are getting the right care at the right time? They've got all that data on the mainframe, and they're constantly getting new data in. It could be about blood sugar levels, it could be about glucose, it could be about changes in blood pressure. Their models will get smarter over time because they've built them with IBM Machine Learning, so that what's cost-effective today may not be the most effective or cost-effective solution tomorrow; we're giving them that continuous intelligence as data comes in. That is the value of machine learning. I think people sometimes miss that point. They think, oh, it's just about making the data scientist's job easier. Productivity is part of it, but it's really about the veracity of the data and constantly updating your models.

And the patient outcome there, I read through some of the notes earlier, is that I can essentially opt in to allow the system to adjudicate the medication or the claim. And if I do so, I can get that instantaneously, or in near real time, as opposed to having to wait through weeks of phone calls and haggling. Is that right?
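The "feedback loop" Thomas describes, a model whose parameters update with every new observation, can be sketched minimally with running statistics. The model below (Welford's online mean/variance with a z-score flag) and its thresholds are illustrative only, not Argus's or IBM's actual scoring logic:

```python
# Minimal sketch of a feedback loop: every new observation updates the
# model's parameters, so tomorrow's decisions reflect today's data.
# Running-statistics model and thresholds are illustrative assumptions.

class OnlineAnomalyModel:
    """Tracks a running mean/variance (Welford's algorithm) and flags
    readings far from the mean. Each update makes the model 'smarter'."""

    def __init__(self, z_threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations
        self.z_threshold = z_threshold

    def update(self, x):
        """Fold one new reading into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_anomaly(self, x):
        """Flag x if it sits more than z_threshold std devs from the mean."""
        if self.n < 2:
            return False       # not enough history to judge
        std = (self.m2 / (self.n - 1)) ** 0.5
        if std == 0:
            return x != self.mean
        return abs(x - self.mean) / std > self.z_threshold

model = OnlineAnomalyModel()
for reading in [98, 101, 99, 100, 102, 97, 100]:  # e.g., routine readings
    model.update(reading)

print(model.is_anomaly(180))  # True: far outside the learned range
print(model.is_anomaly(100))  # False: consistent with history
```

The same shape applies to the fraud discussion above: as more legitimate history accumulates, the learned range tightens and false positives drop.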
Did I get that right?

That's right. And look, there are two dimensions: the cost of treatment, which you want to optimize, and the effectiveness. Which one is more important? They're both critically important. So what we're doing with Argus is helping them build and deploy models that optimize both.

Right. And in that case, back to the personas, and you stressed this at your announcement last October, it's the data scientist, the data engineer, and I guess even the application developer involved in that type of collaboration?

My hope over time, when I talk about machine learning as an ingredient everywhere that data is, is that you embed machine learning into any application that's built. At that point you no longer need a data scientist per se for that case; the app developer can incorporate it. For other tough challenges, like the one we just discussed, that's where you need data scientists. So think about dividing and conquering the machine learning problem, where the data scientists can play, the business analysts can play, the app developers can play, and the data engineers can play. That's what we're enabling.

And how does streaming fit in? We talked earlier about batch, interactive, and now this continuous kind of workload. How does streaming fit?

So we use streaming in a few ways. One is very high-speed data ingest; it's a good way to get data into the cloud. We can also do analytics on the fly. A lot of our streaming use cases involve building analytical models into the streaming engine, so you're doing analytics on the fly. I view it as a different side of the same coin.
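"Analytics on the fly" means the model runs inside the stream, scoring each event as it arrives rather than after a batch lands. The sketch below uses a Python generator as a stand-in for a streaming engine; the event source and the rule-based fraud scorer are hypothetical, not IBM's streaming products:

```python
# Sketch of in-stream analytics: each event is scored as it arrives.
# The event source and the toy scorer are illustrative stand-ins for a
# real streaming engine with an embedded analytical model.

def transaction_stream():
    """Stand-in for a high-speed ingest source (e.g., card transactions)."""
    events = [
        {"id": 1, "amount": 42.0,   "country": "US"},
        {"id": 2, "amount": 9800.0, "country": "US"},
        {"id": 3, "amount": 12.5,   "country": "FR"},
        {"id": 4, "amount": 7500.0, "country": "BR"},
    ]
    yield from events

def score(event):
    """Toy fraud score: large amounts and unusual countries raise it."""
    s = 0.0
    if event["amount"] > 5000:
        s += 0.6
    if event["country"] not in {"US", "FR"}:
        s += 0.3
    return s

def streaming_alerts(stream, threshold=0.5):
    """Score each event in-flight; emit only the IDs over threshold."""
    for event in stream:
        if score(event) >= threshold:
            yield event["id"]

print(list(streaming_alerts(transaction_stream())))  # [2, 4]
```

Because scoring happens per event, an alert fires the moment a suspicious transaction enters the pipeline, which is the sub-second responsiveness the conversation turns to next.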
It depends on your use case: how fast you're ingesting data, whether you need sub-second response times with data constantly coming in. That's when you need something like a streaming engine.

And you can essentially consolidate that data pipeline, as you describe, which is big in terms of simplifying the complexity, this mosaic of Hadoop, for example. That's a big value proposition of Spark. All right, we'll give you the last word. You've got an audience outside, a big announcement today. Final thoughts?

You know, we've talked about machine learning for a long time. I'll give you an analogy. In 1896, Charles Brady King was the first person to drive an automobile down the street in Detroit. It was 20 years later that Henry Ford turned it from a novelty into mass appeal: a roughly 20-year incubation period before you could automate it, make it more cost-effective, make it simpler and easier. I feel like we're in the same place here. The data era, in my mind, began around the turn of the century, when companies came onto the internet and started collecting a lot more data. It's taken us a while to get to the point where we can make this really easy and do it at scale. People have been wanting to do machine learning for years, and it starts today. So we're excited about that.

Yeah, and you saw the same thing with the steam engine: it was decades before it was perfected. Now the timeframe in our industry is compressed to years, sometimes months.

Exactly.

All right, Rob, thanks very much for coming on theCUBE. Good luck with the announcement today.

Thank you. Good to see you again.

Thank you, guys.

All right, keep it right there, everybody. We'll be back with our next guest. We're live from the Waldorf Astoria at the IBM Machine Learning launch event. We'll be right back.