From New York City, it's theCUBE. Covering IBM Data Science for All, brought to you by IBM.

Welcome back to Data Science for All. It's a whole new game here at IBM's two-day event. At six o'clock tonight, the big keynote presentation streams live on ibmgo.com, so be sure to join the festivities there. Right now, we're live here on theCUBE. Along with Dave Vellante, I'm John Walls, and we are joined by John Thomas, who is a Distinguished Engineer and Director at IBM. John, thank you for your time. Good to see you.

Same here, John. Yeah, pleasure.

Thanks for being with us. I know you just wrote this morning about machine learning, so that's obviously very near and dear to you. So let's talk first off about machine learning. Not a new concept by any means, but what is new with regard to machine learning and your work?

Yeah, that's a good question, John. I actually get that question a lot. Machine learning itself is not new; companies have been doing it for decades. So what exactly is new? I wrote about this in a blog this morning. It's really three different things. I call them democratizing machine learning, operationalizing machine learning, and hybrid machine learning, and we can talk through each of these if you like. But I would say hybrid machine learning is probably closest to my heart, so let me explain what that is, because it sounds fancy, right?

Just what we need, another hybrid something, right?

No, but in reality, what it means is: let data gravity decide where your data stays, and let your performance requirements, your SLAs, dictate where your machine learning models go. What do I mean by that? You might have sensitive data, customer data, which you want to keep on a certain platform. Instead of moving data off of that platform to do machine learning, bring machine learning to that platform.
Whether that be the mainframe or specialized appliances or Hadoop clusters, you name it, bring machine learning to where the data is. Do the training and building of the model where the data is, but then have complete flexibility in terms of where you deploy that model. As an example, you might choose to build and train your model on premises, behind the firewall, using very sensitive data, but the model that has been built you may choose to deploy into a cloud environment, because you have other applications that need to consume it. That flexibility is what I mean by hybrid.

Another example: especially when you get into some of the more complex machine learning and deep learning domains, you need acceleration, and there is hardware that provides that acceleration. GPUs, for example, provide acceleration. You need the flexibility to train and build the models on hardware that provides that kind of acceleration, but then the model that has been built might go inside a CICS mainframe transaction, for sub-second scoring of a credit card transaction as to whether it's fraudulent or not. This flexibility, off-prem, on-prem, different platforms, is what I mean by hybrid.

What is the technical enabler that allows that to happen? Is it just a modern software architecture, microservices, containers, blah, blah, blah? Explain that.

Yeah, that's a good question, and it's a couple of different things. One is bringing native machine learning to these platforms themselves. You need native machine learning on the mainframe, in the cloud, in a Hadoop cluster environment, in an appliance. You need the runtimes, the libraries, the frameworks running native on those platforms, and that is not easy to do. We've got machine learning running native on z/OS, not even Linux on Z; it's native to z/OS on the mainframe.
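A minimal sketch of the train-here, deploy-there pattern John describes, using scikit-learn and joblib as stand-ins (the interview doesn't name specific libraries, and the file name and data here are hypothetical):

```python
# Train behind the firewall, serialize only the model artifact,
# and score it in a different environment that never sees the raw data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
import joblib

# Stand-in for sensitive on-prem training data (never leaves the platform).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Export the trained model, not the data.
joblib.dump(model, "fraud_model.joblib")

# Elsewhere (e.g., a cloud scoring service), load and score.
deployed = joblib.load("fraud_model.joblib")
print(deployed.predict(X[:3]))
```

The artifact that crosses the boundary is the fitted model, which is orders of magnitude smaller than the training data and contains only learned parameters.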
That's a very primitive level you're talking about.

Yeah, and you get the performance that you need. You have the runtime environments there, and then what you need is a seamless experience across all of these platforms. You need ways to export models, repositories into which you can save models, and the same APIs to save models into a different repository and then consume them from there. So there's a bit of engineering that IBM is doing to enable this: native capabilities on the platforms, and the same APIs to talk to the repositories and consume from the repositories.

So the other piece of that architecture is you're talking about a lot of tooling that's integrated and native.

Yes.

And the tooling, as you know, changes. I feel like daily there's a new tool out there and everybody gloms onto it, so the architecture has to be able to absorb those. What is the enabler there?

Yeah, you're bringing up a very good point. There is a new language, a new framework every day. We all know that in the world of machine learning, Python and R and Scala, frameworks like Spark and TensorFlow, these are table stakes now. You have to support all of these, scikit-learn, you name it. So obviously you need a way to support all these frameworks on the platforms that you want to enable, and then you need an environment which lets you work with the tools of your choice, a workbench which allows you to work in the language and framework that you are most comfortable with. That's what we are doing with Data Science Experience. I don't know if you have heard of this, but Data Science Experience is our enterprise ML platform. It runs on the cloud, on-prem, on x86 machines; you can have it on a PowerAI box. The idea here is support for a variety of open languages and frameworks, enabled through a collaborative workbench kind of interface.
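The "same APIs everywhere" idea can be sketched with a toy repository wrapper. This is purely illustrative, assuming joblib for serialization; it is not the actual Data Science Experience or Watson Machine Learning API:

```python
# A thin, uniform save/load interface, so code on any platform
# talks to a model repository the same way; only the backing
# store (local disk here, object storage or a mainframe elsewhere) differs.
import os
import joblib

class ModelRepository:
    """Same save/load calls regardless of where the store lives."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, name, version):
        return os.path.join(self.root, f"{name}-v{version}.joblib")

    def save(self, model, name, version):
        path = self._path(name, version)
        joblib.dump(model, path)
        return path

    def load(self, name, version):
        return joblib.load(self._path(name, version))
```

The design point is that a notebook on the mainframe, a Hadoop cluster, or the cloud would all call the same `save` and `load`; swapping the deployment target means changing the repository's backing store, not the calling code.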
And the decision to move, whether it's on-prem or in the cloud, is a function of many things, so let's talk about those. Data volume is one. You can't just move your whole business into the cloud; it's not going to work that well, and it's too expensive. But then there are others. There are governance edicts and security edicts: not that the security of the cloud is any worse, it might just be different than what your organization requires, and the cloud supplier might not support that. It's different clouds, it's location, et cetera. So when you talked about the data staying on-prem, maybe training a model there, and then that model moving to the cloud: obviously the model is lighter weight, it's not as much as the entire data set. But I have a concern, and I wonder if clients ask you about this. Okay, it's my data, and my data I'm going to keep behind my firewall, but that data trained that model, and I'm really worried that the model is now my IP that's going to seep out into the industry. What do you tell a client?

Yeah, it's a fair point. You still need your security mechanisms and your access control mechanisms. You need governance whether you are on the cloud or on-prem, and your encryption mechanisms, your version control mechanisms, your governance mechanisms all need to be in place regardless of where you deploy. And to your question of how do you decide where the model should go: as I said earlier to John, data gravity, SLAs, performance, and security requirements dictate where the model should go.

We're talking so much about concepts and theories that you have. So let's roll up our sleeves and get to the nitty-gritty a little bit here and talk about what people are really doing out there. Use cases: just give us an idea of some of the latest and greatest that you're seeing.
Lots of very interesting use cases out there. I'm part of what IBM calls the Data Science Elite Team; we go out and engage with customers on very interesting use cases, and we see a lot of these hybrid discussions happen as well. One end of the spectrum is understanding customers better. I call this reading the customer's mind: can you understand what is in the customer's mind and have an interaction with the client without asking them a bunch of questions? Can you look at his historical data, his browsing behavior, his purchasing behavior, and make an offer that he will really love? Can you really understand him and give him a celebrity experience? That's one class of use cases.

Another class of use cases is around improving operations, improving your own internal processes. One example is fraud detection, which is a hot topic these days. As a credit card is swiped, it's just a few milliseconds before that transaction travels through a network and hits your back-end mainframe, where scoring is done as to whether it should be approved or not. You need a prediction of how likely this is to be fraudulent within the span of that transaction.

Here's another one. I don't know if you call help desks; I sometimes call them helpless desks.

Yeah, helpless desks.

For pretty much every enterprise that I am talking to, there is a goal to optimize their help desks, their call centers, and call center optimization is huge. As a customer calls in, can you understand the intent of the customer? He may start off talking about something, but as the call progresses, the intent might change. Can you understand that?
And in fact, not just understand it, but predict it, and intercept with something that the client will love before the conversation takes a bad turn.

Please apply that to my calls. I game the system: I just get really mad and go, let me get an operator.

Right, right. Give me a supervisor, right?

Give me a supervisor, right? You two guys, your data is a special case: this guy's pissed, we're red-flagging him right off the top, we're not even analyzing his data. Dave, John, forget about it, you know.

What about the edge? Because we're moving so far out to the edge now, with mobile and that explosion there, and sensor data being what it is, all this tremendous growth is tough to manage.

It is, it really is.

And I guess maybe tougher to make sense of. So how are you helping people make sense of this, so they can really filter through and find the data that matters?

Yeah, there are a lot of different things rolled up into that question. One is just managing those devices, those endpoints: thousands, tens of thousands, millions of these devices. How do you manage them? Then, are you doing the processing of the data and applying ML and DL right at the edge, or are you bringing the data back behind the firewall or into the cloud and processing it there? If you are doing image detection in a self-driving car, can you afford the latency of an image of a pedestrian jumping in front of the car being shipped across the network to the cloud for a deep learning network to process it and give you an answer: oh, that's a pedestrian? You may not have that latency budget, so you may want to do some processing at the edge. That is another interesting discussion, and you need acceleration there as well. Then another aspect is, as you said, separating the signal from the noise.
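The latency argument for edge inference can be made concrete with back-of-the-envelope numbers. All figures below are assumptions for illustration, not measurements:

```python
# Why a self-driving car can't wait on the cloud: a braking decision
# has a hard latency budget, and a WAN round trip alone can blow
# through it before any inference even happens.
import time

LATENCY_BUDGET_MS = 100    # assumed budget for a braking decision
CLOUD_ROUND_TRIP_MS = 150  # assumed WAN round trip (varies widely)

def on_device_infer(frame):
    # Stand-in for an accelerated local model; returns a label.
    time.sleep(0.005)      # ~5 ms on local hardware (assumed)
    return "pedestrian"

start = time.perf_counter()
label = on_device_infer("frame-0001")
elapsed_ms = (time.perf_counter() - start) * 1000

print(label, f"{elapsed_ms:.1f} ms")
# Local inference fits inside the budget; shipping the frame does not.
```

Under these assumed numbers, the cloud round trip exceeds the entire decision budget before the model has done any work, which is the case John is making for processing at the edge.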
It really comes down to, in the different industries that we go into, what are the signals that we understand? Can we build on them, and can we reuse them? That is an interesting discussion as well. But yeah, you're right: in the world of exploding data that we are in, with all these devices, it's very important to have a systematic approach to managing your data, cataloging it, and understanding where to apply ML, where to apply acceleration, where to apply governance. All of these things become important.

I want to come back to the use cases for a moment. Celebrity experiences I'd put in sort of a marketing category. Fraud detection's always been one of the favorite big data use cases, along with help desks, recommendation engines, and so forth. Let's start with fraud detection. First of all, fraud detection in the last six, seven years has just gotten immensely better. No question, and it's great. However, about a year ago the number of false positives was just too many. We're a small company; we buy a lot of equipment and lights and cameras and stuff, and the number of false positives that I personally got was overwhelming. They've gone down dramatically in the last 12 months. Is that just coincidence, happenstance, or is it getting better?

It's not that the bad guys have gone down in numbers, not at all.

No, that I know.

I think there is a lot of sophistication now in terms of the algorithms that are available. If you have tens of thousands of features that you're looking at, how do you collapse that space, and how do you do that efficiently? There are techniques evolving to handle that kind of information, and in terms of the actual algorithms, there are different types of neural net innovations happening in that space.
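One standard way to collapse a large feature space, as John describes, is dimensionality reduction; PCA is the classic example (the interview doesn't name a specific technique, so this is just one illustration, with synthetic data):

```python
# Collapse a wide feature matrix down to the directions of highest
# variance, so downstream models train on 20 columns instead of 1000.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))  # 500 transactions, 1000 raw features

pca = PCA(n_components=20)        # keep 20 principal components
X_small = pca.fit_transform(X)

print(X.shape, "->", X_small.shape)  # (500, 1000) -> (500, 20)
```

In a real fraud pipeline the reduction technique and component count would be chosen by validating downstream scoring quality, not fixed up front as they are here.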
But I think perhaps the most important one is that things that used to take weeks or days to train and test can now be done in days or minutes. The acceleration that comes from GPUs, for example, allows you to test out different algorithms, different models, and say, okay, this performs well enough for me to roll it out and try it. It gives you a very quick cycle of innovation; the time to value is really compressed.

Okay, now let's take one that's not so good: ad recommendations. The Google ads that pop up, one in a hundred are maybe relevant, if that, right? And they pop up on the screen and they're annoying. I worry that Siri's listening somehow. I talk to my wife about Israel, and the next thing I know, I'm getting ads for going to Israel. Is that a coincidence? Are they listening? What's happening there?

I don't know what Google is doing; I can't comment on that.

Maybe just from a technology perspective.

From a technology perspective, this notion of understanding what is in the customer's mind and really getting to a customer segment of one, this is of top interest for many, many organizations. Regardless of which industry you are in, insurance or banking or retail, it doesn't matter. It all comes down to the fundamental principles, but how efficiently can you do it? Can you identify the features that have the most predictive power? There's a level of sophistication in terms of the feature engineering, in terms of collapsing that space of features that I just talked about. And then, how do I actually go through the data science of this? How do I do the exploratory analysis? How do I build and test my machine learning models quickly? Do the tools allow me to be very productive, or do I spend weeks and weeks coding in lower-level languages? Or do I get help?
Do I get guided interfaces, visual builders, which guide me through the process? And then there's the topic of acceleration we talked about. These things come together. And then couple that with cognitive APIs. For example, speech-to-text: the word error rates have gone down dramatically now, so as you talk on the phone, with very high accuracy we can understand what is being talked about. Image recognition: the accuracy has gone up dramatically, and there are great custom classifiers for industry-specific objects that you want to identify in pictures. Natural language processing, natural language understanding: all of these have evolved in the last few years, and they all come together. Machine learning is not an island. All these things coming together is what makes these dramatic advancements possible.

Well, John, if you've figured out anything in the past 20 minutes or so, it's that Dave and I want ads delivered that matter, and we want our help desk questions answered right away. So if you can help us with that, you're welcome back on theCUBE anytime, okay?

We will try, John.

That's all we want. That's all we have. You guys, your calls are still being screened. All right, John Thomas, thank you for joining us. We appreciate it.

Thank you.

Our panel discussion is coming up at four o'clock Eastern time, live here on theCUBE in New York City. We'll be back in a bit.