Live from New York, it's theCUBE. Covering the IBM Machine Learning Launch Event, brought to you by IBM. Now, here are your hosts, Dave Vellante and Stu Miniman.

All right, we're back. Jean-François Puget is here. He's the Distinguished Engineer for Machine Learning and Optimization at IBM Analytics, and a CUBE alum. Good to see you again.

Yes. Thanks very much for coming on.

Big day for you guys. It's like giving birth every time you get one of these products. So, we saw you a little bit in the analyst meeting, which was pretty well attended. Give us the highlights from your standpoint. What are the key things we should be focused on in this announcement?

For most people, machine learning equals machine learning algorithms. When you look at newspapers, blogs, or social media, it's all about algorithms. Our view is that, sure, you need algorithms for machine learning, but you also need steps before you run the algorithms and after. Before, you need to get data and transform it to make it usable for machine learning. Then you run algorithms, which produce models, and then you need to move your models into a production environment. So, for instance, you use an algorithm to learn from past credit card fraud. You learn models, patterns that correspond to fraud. Then you want to use those models, those patterns, in your payment system. And moving from where you run the algorithm to the operational system is a nightmare today. So, our value is to automate what you do before you run algorithms and what you do after. That's our differentiator.

Yeah, I've heard some folks on theCUBE in the past, years ago actually, say, you know what, algorithms are plentiful. I think it was my friend Abhi Mehta who made the statement: algorithms are free, it's what you do with them that matters. Do you agree with that?

Exactly. That's what I believe, and not only me: open source won for machine learning algorithms.
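Those before-and-after steps, prepare the data, run the algorithm, then move the resulting model into the production system, can be sketched minimally. Everything below (the field names, the toy frequency "model", the JSON hand-off) is illustrative, not IBM's implementation:

```python
import json

# Hypothetical end-to-end flow: prepare data, run a (trivial) learning
# step, then hand the resulting model to a separate production scorer.

raw = [
    {"merchant": "A", "amount": "120.0", "fraud": "1"},
    {"merchant": "A", "amount": "15.0",  "fraud": "0"},
    {"merchant": "B", "amount": "99.0",  "fraud": "0"},
    {"merchant": "B", "amount": "500.0", "fraud": "1"},
    {"merchant": "B", "amount": "20.0",  "fraud": "0"},
]

# 1. Before the algorithm: transform raw records into a usable form.
data = [(r["merchant"], float(r["amount"]), r["fraud"] == "1") for r in raw]

# 2. The algorithm: learn a per-merchant fraud rate (a stand-in for a
#    real ML algorithm such as one from Spark ML).
def train(rows):
    totals, frauds = {}, {}
    for merchant, _amount, is_fraud in rows:
        totals[merchant] = totals.get(merchant, 0) + 1
        frauds[merchant] = frauds.get(merchant, 0) + int(is_fraud)
    return {m: frauds[m] / totals[m] for m in totals}

model = train(data)

# 3. After the algorithm: serialize the model so the payment system,
#    a different environment, can load and apply it.
exported = json.dumps(model)

def score(serialized_model, merchant):
    return json.loads(serialized_model).get(merchant, 0.0)

print(score(exported, "A"))  # fraud rate learned for merchant A: 0.5
```

The nightmare Puget describes is step 3 in real life: the training environment and the operational system rarely share a runtime, so the model must cross that boundary in some portable form.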
Now, the future is with open source, clearly. But it solves only part of the problem you face if you want to put machine learning into action. So, exactly as you said, what you do with the results of the algorithms is key. And open source people don't care much about that, for good reasons: they are focused on producing the best algorithms. We are focused on creating value for our customers. It's different.

So, you've mentioned open source a couple of times. In terms of customer choice, what's your philosophy with regard to the various open source tooling and platforms? How do you go about selecting which to support?

So, machine learning is fascinating. It's over-hyped, maybe, but it's also moving very quickly. Every year there is new cool stuff. Five years ago, nobody spoke about deep learning; now it's everywhere. Who knows what will happen next year? So, our take is to support the top open source packages. We don't know which one will win in the future. We don't even know if one will be enough for all needs; we believe one size does not fit all. So, our take is to support a curated list of major open source packages. We start with Spark ML, for many reasons, but we won't stop at Spark ML.

Okay, I wonder if we can talk use cases. Two of my favorites, well, let's just start with fraud. Fraud detection has become much, much better over the past 10 years, certainly, but it's still not perfect. Now, I don't know if perfection is achievable, but there are a lot of false positives. How will machine learning affect that? Can we expect, as consumers, even better fraud detection in more real time?

Yeah. If we think of the full life cycle, going from data to value, we can provide a better answer. We still use machine learning algorithms to create models. But a model does not tell you what to do. It will tell you, okay, this incoming credit card transaction has a high probability of being fraud, or this one has a lower probability.
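That point, that a model hands back a probability and nothing more, can be made concrete with a tiny sketch. The logistic weights below are invented for illustration, not taken from any real model:

```python
import math

# Hypothetical logistic model for card transactions. The weights are
# invented; a real model would be learned from past fraud data.
# Note that it returns a probability, not a decision.
WEIGHTS = {"amount": 0.004, "foreign": 1.5, "night": 0.8}
BIAS = -3.0

def fraud_probability(txn):
    z = BIAS
    z += WEIGHTS["amount"] * txn["amount"]
    z += WEIGHTS["foreign"] * txn["foreign"]
    z += WEIGHTS["night"] * txn["night"]
    return 1.0 / (1.0 + math.exp(-z))

# A large foreign nighttime transaction scores high; what to do with
# that number is left entirely to the surrounding application.
p = fraud_probability({"amount": 900.0, "foreign": 1, "night": 1})
print(round(p, 3))
```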
But then it's up to the designer of the overall application to make decisions. So, what we recommend is to use machine learning predictions, but not only predictions; use business rules as well. So, for instance, if your machine learning model tells you this is fraud with a high probability, say 90%, and this is a long-standing customer you know very well, then you can be confident it's fraud. But if the next prediction tells you, oh, this is a 70% probability, and it's a customer of only one week, well, after a week we don't really know the customer. So the confidence we can place in the machine learning should be low, and there you will not reject the transaction immediately. You don't approve it automatically either; maybe you send a one-time passcode, or you route it to a secondary system, but you don't reject it outright. So, really, the idea is to use machine learning predictions as yet another input for making decisions. You're making decisions informed by what you could learn from your past, but it's not replacing human decision making. That's our approach at IBM. You don't see IBM speak much about artificial intelligence in general, because we don't believe we're here to replace humans; we're here to assist humans. So we say augmented intelligence, or assistance. And that's the role we see for machine learning: it gives you additional data so that you make better decisions.

Right, it's not the concept that you object to, it's the term, artificial intelligence. It's really machine intelligence; it's not fake.

You know, I started my career with a PhD in artificial intelligence. I won't say when, but long enough ago. At that time, there were already promises that we would have Terminators within the next decade, and so on. And the same had happened in the '60s, and I came after the '60s.
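Stepping back to the fraud example for a moment, the combination of model score and business rules that Puget describes could look like the following sketch. The thresholds and the tenure rule are invented for illustration, not IBM's:

```python
def decide(fraud_prob, customer_tenure_days):
    """Combine a model's fraud probability with business rules.

    Returns one of "approve", "step_up" (e.g. send a one-time
    passcode or route to a secondary system), or "reject".
    Thresholds are illustrative only.
    """
    well_known = customer_tenure_days >= 365  # long-standing customer
    if fraud_prob >= 0.9 and well_known:
        # High probability and we know the customer's normal behavior:
        # we can be confident this is fraud.
        return "reject"
    if fraud_prob >= 0.7 and not well_known:
        # A week-old customer: the model had little history to learn
        # from, so confidence is low. Don't reject outright.
        return "step_up"
    if fraud_prob < 0.1:
        return "approve"
    return "step_up"

print(decide(0.92, customer_tenure_days=3650))  # reject
print(decide(0.70, customer_tenure_days=7))     # step_up
print(decide(0.05, customer_tenure_days=7))     # approve
```

The prediction is one input among several; the rules encode what the business knows that the model does not.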
Then there was an AI winter, and we have a risk of another AI winter here, because some people are raising red flags that are not substantiated, I believe. I don't think the technology is there to replace human decision making altogether any time soon. But we can help. We can certainly make some professions more efficient, more productive, with machine learning.

Well, having said that, there are a lot of cognitive functions that are getting replaced, maybe not by so-called artificial intelligence, but certainly by machines and automation.

Yes, yes, yes. We are automating a number of things, and maybe we won't need people to do quality checks; we'll just have an automated vision system detect defects, sure. So we're automating more and more, but this is not new. It has been going on for centuries.

Right, but does the list evolve? What can humans do that machines can't? And how would you expect that to change?

Yeah, we're moving away from IBM Machine Learning, but it's interesting. You know, each time there is a capability that we automate in a machine, we basically redefine intelligence to exclude it. So, you know, that's what I foresee.

Yeah, well, robots a while ago, Stu, couldn't climb stairs, and now look at that.

Yeah, and do we feel threatened because a robot can climb a stair faster than us? Not necessarily. It doesn't bother us, right? Okay, question?

Yeah, so I guess bringing it back down to the solution we're talking about today. Now I'm doing the analytics and machine learning on the mainframe. How do we make sure that we don't overrun, you know, blow out all our MIPS?

Yeah, so we are not using the mainframe's base compute engines; we recommend using zIIPs, the additional specialty processors, so you don't overload the system. It's a very important point. We claim, okay, if you do everything on the mainframe, you can learn from operational data, and you don't want to disturb your operational system.
"Don't disturb" takes a lot of different meanings. One is what you just said: you don't want to slow down your operational processing, because you would hurt your business. But you also want to be careful in other ways. Say we have a payment system where a machine learning model predicting fraud probability is part of the system. You don't want a young, bright data scientist to decide that he has a great idea, a great model, and push his model into production without asking anyone. You want to control that. That's why we insist: we are providing governance that includes a lot of things, like keeping track of how models were created and from which data sets, the lineage. We also want access control, so that not just anyone can deploy a new model, because we make deployment easy. So we want role-based access, and only someone with a certain role, well, it depends on the customer, but not everybody can update the production system, and we want to support that. And that's something that differentiates us from open source. Open source developers don't care about governance. It's not their problem, but it is our customers' problem. So this solution will come with all the governance and integrity constraints you can expect from us.

Can you speak to, the first solution is going to be on z/OS, what the roadmap looks like, and some of the challenges of rolling this out to other private cloud solutions?

So we are going to ship, this quarter, IBM Machine Learning for z/OS. It starts with Spark ML as its base open source. This is interesting, but it's not all there is for machine learning, so that's just how we start. We're going to add more in the future. Last week we announced we would ship Anaconda, which is a major distribution for the Python ecosystem and includes a number of machine learning open source packages. We announced it for next quarter.

I believe the press release said that down the road things like TensorFlow and H2O are coming.
Yeah, and Anaconda we announced for next quarter, so we will leverage that when it's out. Then indeed we have a roadmap to include the major open source packages. The major ones are those from Anaconda, plus the key deep learning packages, so TensorFlow, and probably one or two additional ones we're still discussing. One that I'm very keen on is called XGBoost, in one word. People don't speak about it in newspapers, but this is what wins all Kaggle competitions. Kaggle is a machine learning competition site. When I say all, I mean all that are not image recognition competitions, and those were won with XGBoost.

XGBoost?

XGBoost, one word: X-G-Boost.

Okay. Yeah, it's really a package. When I say we don't know which package will win: XGBoost was introduced a year ago or so, maybe a bit more, but not so long ago, and now, if you have structured data, it is the best choice today. So it's really fast moving, but we will support the major deep learning packages and the major classical machine learning packages, like the ones from Anaconda, or XGBoost. The other thing is, we start with z/OS. We announced in the analyst session that we will have a Power version, and a private cloud, meaning x86 Linux, version as well. I can't tell you when, because it's not firm, but it will come.

And then public cloud as well, I guess. You've got components in the public cloud today, like the Watson Data Platform, that you've extracted.

Yeah, we have extracted part of Data Science Experience. We've extracted notebooks and a graphical tool called Model Builder from DSX as part of IBM Machine Learning now, and we're going to add more of DSX as we go. But the goal is to really share code and function across private cloud and public cloud. As Rob Thomas defined it, we want private cloud to offer all the features and functionality of public cloud, except that it runs inside the firewall. So we are really developing IBM Machine Learning and Watson Machine Learning on a common code base.
It's an internal open source project: we share code, and then we ship on different platforms.

You haven't used the word hybrid just now, but every now and then IBM does. Do you see that so-called hybrid use case as viable, or do you see it more as some workloads should run on-prem, some should run in the cloud, and maybe they'll never come together?

Yeah, so in machine learning you basically have two phases: one is training, and the other is scoring.

Right.

Training, I see people moving to the cloud quite easily, unless there is some regulation about data privacy. Training is a good fit for cloud because usually you need a large computing system, but only for a limited time, so elasticity is great. But then deployment: if you want to score a transaction inside a CICS transaction, it has to run beside CICS, not in the cloud. If you want to score data on an IoT gateway, you want to score at the gateway, not in a data center. So I would say, and it may not be what people think of first, that what will really drive the split between public cloud, private cloud, and on-prem is where you want to apply your machine learning models, where you want to score. For instance, smartwatches are turning into health and fitness measurement systems; you want to score your health data on the watch, not somewhere on the internet.

Right, and in that CICS example you gave, you'd essentially be bringing the model to the CICS data, is that right?

Yes, yes, that's what we do. That's the value of Machine Learning for z/OS: if you want to score transactions happening on z, you need to be running on z.

Right, good. So it's clear, mainframe people don't want to hear about public cloud, so they will be the last ones moving. They have their reasons: they like the mainframe because the data is really, really secure and private.

Public cloud is a dirty word.

Yes, yes, for z users. At least that's what I was told, and I've checked with many people.
But we know that, in general, the move is toward public cloud, so we want to help people wherever they are on their journey to the cloud.

You've got one of those too. So, Jean-François, thanks very much for coming on theCUBE. It was really a pleasure having you back.

Thank you. You're welcome.

All right, keep it right there, we'll be back with our next guest. This is theCUBE, live from the Waldorf Astoria at IBM's machine learning announcement. We're right back.