Announcer: Live from downtown San Francisco, it's theCUBE, covering the IBM Chief Data Officer Strategy Summit 2018. Brought to you by IBM.

Dave Vellante: We're back in San Francisco, here at the Parc 55 at the IBM Chief Data Officer Strategy Summit. You're watching theCUBE, the leader in live tech coverage. My name is Dave Vellante. IBM holds these Chief Data Officer Strategy Summits on both coasts, one in Boston and one in San Francisco, a couple of times each year. About 150 chief data officers come in to learn how to apply their craft, learn what IBM is doing, and share ideas. Great peer networking, really senior audience. John Thomas is here; he's a distinguished engineer and director at IBM. Good to see you again, John. Thanks for coming back on theCUBE.

John Thomas: Thank you, thank you.

Dave Vellante: So we'll start with your role. Distinguished engineer: we've had this conversation before, but it just doesn't happen overnight. You've got to be accomplished. So congratulations on achieving that milestone. What is your role?

John Thomas: The road to distinguished engineer is a long one, but these days I spend a lot of my time working on data science. In fact, I'm part of what is called the Data Science Elite Team. We work with clients on data science engagements. This is not consulting, this is not services. This is where a team of data scientists works collaboratively with a client on a specific use case, and we build it out together. We bring data science expertise, machine learning and deep learning expertise, work with the business, and build out a set of tangible assets that are relevant to that particular client. So this is not a for-pay service. This is, hey, you're a great client of ours, we're going to bring together some resources; you'll learn, we'll learn, we'll grow together. It's an investment IBM is making, a major investment, in our top clients, working with them on their use cases.

Dave Vellante: And is it global?

John Thomas: This is global, yes. Yes, it is global.

Dave Vellante: We're talking about, what, hundreds of clients? Thousands of clients?

John Thomas: Well, eventually thousands, but we're starting small. We are trying to scale now. Obviously, once you get into these engagements, you find out it's not just about building some models. There are a lot of challenges you've got to deal with in an enterprise setting.

Dave Vellante: Like what? What are some of the challenges?

John Thomas: Well, in any data science engagement, the first thing is to have clarity on the use case you're engaging in. You don't want to build models for models' sake. Just because TensorFlow or scikit-learn is great for building models, that doesn't serve a purpose by itself. So first: do we have clarity on the business use case? Then comes data. I cannot stress this enough, Dave: there is no data science without data. You might think this is the most obvious thing; of course there has to be data. But when I say data, I'm talking about access to the right data. Do we have governance over that data? Do we know who touched that data? Do we have lineage on that data? Because garbage in, garbage out; you know this, right? So do we have access to the right data, in the right controlled setting, for the machine learning models we build? These are challenges. And then there is yet another challenge: okay, I've built my models, but how do I operationalize them? How do I weave those models into the fabric of my business? These are all challenges that we have to deal with.
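As an aside, John's governance point can be made concrete. Below is a minimal, hypothetical sketch (not an IBM tool; the column names, file, and thresholds are assumptions) of the kind of data-readiness check a team might run before any model building, so that "garbage in, garbage out" is caught up front.

```python
import pandas as pd

# Columns the use case requires (hypothetical names)
REQUIRED_COLUMNS = {"customer_id", "txn_amount", "txn_timestamp", "label"}

def data_readiness_report(df: pd.DataFrame) -> dict:
    """Basic pre-modeling checks: schema, nulls, duplicates."""
    missing_cols = REQUIRED_COLUMNS - set(df.columns)
    null_rates = {
        col: float(df[col].isna().mean())
        for col in REQUIRED_COLUMNS & set(df.columns)
    }
    report = {
        "row_count": len(df),
        "missing_columns": sorted(missing_cols),
        "null_rates": null_rates,
        "duplicate_rows": int(df.duplicated().sum()),
    }
    # Arbitrary illustrative bar: all columns present, under 5% nulls each
    report["ready"] = not missing_cols and all(r < 0.05 for r in null_rates.values())
    return report

# In practice the extract would come from a governed, access-controlled source
transactions = pd.read_csv("credit_card_transactions.csv")  # hypothetical file
print(data_readiness_report(transactions))
```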
Dave Vellante: It's interesting what you're saying about the data. It does sound obvious, but it's also about having the right data model. I think about when I interact with Netflix: I don't talk to their customer service department or their marketing department or their sales department or their billing department. It's just one experience.

John Thomas: Just one experience, exactly.

Dave Vellante: And, you know, Ginni used this notion of incumbent disruptors. Is that a logical starting point for these guys, getting to the point where they have a single data model?

John Thomas: Yeah, so, I mean, a single data model... what does that mean, right?

Dave Vellante: Yeah. At least from an experience standpoint.

John Thomas: So once we know the kind of experience we want to target, what are the relevant data sets and data pieces necessary to make that experience come together? Sometimes it's core enterprise data that you have. In many cases it has to be augmented with external data. So do you have a strategy for handling your internal and external data, your structured transactional data, your semi-structured data, your news feeds? All of these need to come together in a consistent fashion for that experience to be true. It is not just, okay, I've got my credit card transaction data; what else is augmenting that data? You need a model, you need a strategy, around that.

Dave Vellante: I talk to a lot of organizations and they say, you know, we have a good back-end reporting system, maybe we have Cognos, we can build cubes, we have all kinds of financial data. But then it doesn't get down to the front line, and we haven't instrumented the front line. You've talked about IoT, and that portends a change there, but there's a lot of data that either isn't persisted, isn't stored, or just doesn't exist. Is that one of the challenges you see enterprises dealing with?

John Thomas: It is a challenge. Do I have access to the right data, whether that is data at rest or in motion? Am I persisting it in a way I can consume later? Or am I just moving big chunks of data around because, well, analytics is over there, or machine learning is over there, so I have to move data out of my core systems into that area? That is just wasted time, complexity, and cost, often hidden cost, because people don't usually think about the hidden cost of moving large volumes of data around. Instead, can I bring analytics, machine learning, data science itself to where my data is, and not necessarily move it around all the time? So whether you're dealing with streaming data, or large volumes of data in your Hadoop environment, or mainframes, or whatever: can I do ML in place and get the most value out of the data that is there?
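To make "ML in place" concrete, here is a minimal PySpark sketch of the pattern John describes: data prep and model training pushed down into the Spark cluster that already holds the data, rather than exporting it to a separate analytics environment. The table and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# Attach to the existing cluster; Hive support reads tables where they live
spark = (SparkSession.builder
         .appName("ml-in-place")
         .enableHiveSupport()
         .getOrCreate())

# Data prep runs inside the cluster: no bulk export of raw records
txns = (spark.table("warehouse.card_transactions")  # hypothetical table
        .where(F.col("txn_amount") > 0)
        .withColumn("is_foreign", (F.col("country") != "US").cast("int")))

features = VectorAssembler(
    inputCols=["txn_amount", "is_foreign", "merchant_risk_score"],
    outputCol="features",
).transform(txns)

# Model training also happens next to the data
model = LogisticRegression(labelCol="is_fraud", featuresCol="features").fit(features)
print("training AUC:", model.summary.areaUnderROC)
```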
Dave Vellante: Well, what's happening with all that Hadoop? Nobody talks about Hadoop anymore, right? Hadoop largely became a way to store data for less, but there's all this data now sitting in a data lake. How are customers dealing with that?

John Thomas: This is such an interesting thing. It's funny, right? People used to talk about big data, and then we jumped from there to cognitive. But it's not like that: without the data, there is no cognition, there is no AI, there is no ML. So in terms of existing investments in Hadoop, you absolutely have to be able to tap into and leverage those investments. For example, many large clients have investments in big Cloudera or Hortonworks environments for Hadoop. If you're doing data science, how do you push down into those environments and leverage them for scale? How do you access the data using the same access control mechanisms that are already in place? Maybe you have Kerberos as your mechanism; how do you work with that? How do you avoid moving data off of that environment? How do you push data prep down into the Spark cluster? How do you do model training in that Spark cluster? All of these become important in terms of leveraging your existing investments. It is not just about accessing data where it is; it's also about leveraging the scale the company has already invested in. You have 100-node, 500-node Hadoop clusters; make the most of them in terms of scaling your data science operations. So push down and access data as much as possible in those environments.

Dave Vellante: Beth Smith talked today about Watson's Law. She made a little joke about that, but to me it's poignant, because we are entering a new era. For decades this industry marched to the cadence of Moore's Law, and then of course Metcalfe's Law in the Internet era. I want to make an observation and see if it resonates. It seems like innovation is no longer going to come from doubling microprocessor speed, and the network is there, it's built out, the Internet is built. It seems like innovation comes from applying AI to data to get insights, and then being able to scale. So it's cloud economics: marginal costs go to zero, and massive network effects and scale will attract innovation. That seems to be the innovation equation. But how do you operationalize that?

John Thomas: To your point, Dave, when we say cloud scale, we want the flexibility to do that in an off-prem public cloud, in an on-prem private cloud, or in between, in a hybrid cloud environment. And when you talk about operationalizing, there are a couple of different things. People think, oh, I've got a super Python programmer, he's great with TensorFlow or scikit-learn, and he builds these models. Great. Well, what happens next? How do you actually operationalize those models? You need to be able to deploy those models easily, and you need to be able to consume them easily. For example, you have a chatbot. The chatbot is dumb until it actually calls these machine learning models in real time to make decisions about which way the conversation should go. How do you make that chatbot intelligent? It is when it consumes the ML models that have been built. So, deploying models and consuming models: you create a model, you deploy it, and you push it through the development, test, staging, and production phases with the same rigor you would apply to any application you deploy. Then another thing: a model is great on day one. Let's say I built a fraud detection model, and it predicts well on day one. A week later, a month later, it's useless, because the data it was trained on is not what the fraudsters are using now. Patterns have changed, and the model needs to be retrained. How do I confirm the performance of the model stays good over time? How do I do monitoring? How do I retrain the models? How do I do lifecycle management of the models?
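The fraud-model point lends itself to a concrete example. Below is a minimal, hypothetical sketch of the monitoring step John describes: score the model on a recent window of labeled transactions, compare against the performance measured at deployment, and flag it for retraining when it degrades. The baseline, threshold, and downstream retrain step are assumptions, not a prescribed IBM workflow.

```python
from sklearn.metrics import roc_auc_score

BASELINE_AUC = 0.92   # AUC measured when the model was deployed (hypothetical)
ALERT_DROP = 0.05     # flag for retraining if AUC falls more than 5 points

def needs_retraining(model, recent_X, recent_y) -> bool:
    """Score the deployed model on fresh labeled data and check for drift."""
    scores = model.predict_proba(recent_X)[:, 1]   # fraud-class probabilities
    current_auc = roc_auc_score(recent_y, scores)
    degraded = current_auc < BASELINE_AUC - ALERT_DROP
    print(f"baseline={BASELINE_AUC:.3f} current={current_auc:.3f} "
          f"retrain={'yes' if degraded else 'no'}")
    return degraded

# Run on a schedule against, say, last week's labeled transactions;
# fraud patterns shift even when the model does not.
# if needs_retraining(model, last_week_X, last_week_y):
#     retrain_and_redeploy()   # hypothetical downstream step
```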
John Thomas: And then scale: okay, I've deployed this model, it's great, every application is calling it, maybe I have partners calling these models. How do I automatically scale? Do I need to be aware of the Kubernetes environment running behind the scenes, or use external clusters for scale? Technologies like Spectrum Conductor, from our HPC background, are very interesting counterparts to this. How do I scale? How do I burst? How do I go from an on-prem to an off-prem environment? How do I build something behind the firewall but deploy it into the cloud, where a chatbot or some other cloud-native application can consume it? All of these things become interesting in operationalizing.

Dave Vellante: So all these conversations you're having with these global elite clients, and the challenges you're unpacking: how do they get back into innovation for IBM? What's that process like?

John Thomas: It's an interesting place to be in, because I am hearing and experiencing real enterprise challenges firsthand. Where we see that our product doesn't handle a particular thing, that means immediately circling back with offering management and development: hey guys, we need this particular function, because I'm seeing this happening again and again in client engagements. So that helps us shape our products and our data science offerings. Instead of just running with the flow of what everyone else is doing, we look at what our clients want and where they're headed, and shape the products that way. That's what I'm doing.

Dave Vellante: All right, John, well, thanks very much for coming back on theCUBE. It's a pleasure to see you again.

John Thomas: Appreciate the time. Thank you, Dave.

Dave Vellante: All right, good to see you. Keep it right there, buddy, we'll be back with our next guest. We're live from the IBM CDO Strategy Summit in San Francisco. You're watching theCUBE.