Live from Las Vegas, Nevada, it's theCUBE. Covering IBM World of Watson 2016, brought to you by IBM. Now, here are your hosts, John Furrier and Dave Vellante. Okay, welcome back everyone. We're here live in Las Vegas for IBM's World of Watson at the Mandalay Bay. This is theCUBE, SiliconANGLE Media's flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier with SiliconANGLE. I'm here with Dave Vellante, my co-host, chief researcher at wikibon.com, and our next guest is Inderpal Bhandari, who's the global chief data officer for IBM. Welcome to theCUBE, welcome back. Thank you, thank you for having me. Nice to meet you. It's a pleasure to be here. You had a great conversation with Dave at the last event. Ten years, Bob Picciano was just on with us. We were just talking about the 10-year anniversary of IOD, Information on Demand. And Dave's joke, which I thought was telling, we'll set it up this way, is that 10 years ago it was a different data conversation: How do I get rid of it? I don't want the compliance and the liability. Now it's shifted to something much more organic, innovative. Exciting. Yeah, I mean, value add. What's the shift? What's the big change in 10 years? Besides the obvious of the Watson vision, why did it move so fast, or too slow? What's your take on this? Yeah, no, so data used to be viewed as exhaust, right? Detritus, something to get rid of, like you pointed out. And now it's much more of an asset. And in fact, you know, people are even talking about quantifying it as an asset, so that you can reflect it on the balance sheet and stuff like that. So it's certainly moved a long, long way. And I think part of it has to do with the fact that we are inundated with data, and data does contain valuable information. And to the extent that you're able to glean it and act on it efficiently and quickly and accurately, it leads to a competitive advantage. What's the landscape for architects out there?
Because a lot of what we hear is, okay, I buy the data, I've got a digital transformation, okay, but now I've got to put the data to work. So I need to have it all categorized. What's the setup? Is there a general architecture philosophy that you could share with companies that are trying to set themselves up with some baseline foundational set of building blocks? I mean, I think they buy the Watson dream, that's a little headroom. They just want to start in kindergarten, or literally whatever metaphor you want to use. They need a baseline. What's the building blocks approach? The building blocks approach, I mean, if you're talking about a pure technical, architectural kind of approach, that's one thing. If you're really going after a methodology that's going to allow you to create value from data, I would back you up further. I would say that you want to start with the business itself and gain an understanding of how the business is going to go about monetizing itself. Not its data, but you know, what is the business's monetization strategy? How does the business plan to make money over the next few years? Not how it makes money today, but over the next few years, how it plans to make money. That's the right starting point. Once you've understood that, then it's basically reflecting on how data is best used in service of that. And then that leads you down to the architecture, the technologies, the people you need, the skills mix, the processes. That's counter to the way it used to be. It used to be the ivory tower, where it would convene and dictate policy and schemas on databases and say, this is how you do it. You're saying the opposite. Business units have got to go in and own the roadmap, if you will, for the business. It's a business roadmap. And then figure it out, then go back. Then go backwards. That's really the better way to address it, in my view.
So the framework that we talked about in Boston, and just, you're like the professor, I'm a student. So, and I've been out speaking to other chief data officers about it. It's spot on, this framework. So let me briefly summarize it, and then we can drill into it. I hope you're attributing it to me, Dave. I'm saying this is Inderpal's framework. I've stolen it, with no shame. No, I'm kidding. So, we're doing live TV. It's, you know. He can source you. I will give him credit. And so, you have said there are two parallel and three sequential activities that have to take place for a chief data officer. The two parallel are partnership with the line of business and getting the skill sets right. The three sequential are the thing you just mentioned, how are you going to monetize data, then access to data, data sources, and then trust. You've got to trust the data. Okay, so great framework. And I'd say I've tested it. Some CDOs have said to me, well, geez, that's actually better than the framework I had. So they've sort of evolved it. I said, you're welcome. Okay, but now, so let's drill into that a little bit. Maybe starting with the monetization piece. In the early days, John, when people were talking about big data, the mistake people made was, I've got to sell the data, monetize the data itself. Not necessarily, is what you're saying. Yes, yes. I mean, that's the common pitfall with that. When you start thinking about monetization and you're the chief data officer, your brain naturally goes to, well, how do I monetize the data? That's the wrong question. The question really is, how is the business planning to monetize itself? What is the monetization strategy for the overall business? And once you understand that, then you kind of back into what data is needed to support it. And that's really kind of the strategy in place.
And then the next two steps of, well, then how do you govern that data so it's fit for the purpose of that business need that you just identified. And finally, what data is so critical that you want to centralize it and make sure that it's completely trusted? So you back into those three steps. So thinking about data sources, people always say, well, should you start with internal? Should you start with external? And the answer presumably is it depends. It depends on the business. So how do you actually go through that decision tree? What's that process like? Yeah, I mean, you know, you start with the monetization strategy of the company. So for example, I'll use IBM. I mean, in the case of IBM, took me the first few months to understand that our monetization strategy was around cognitive business, specifically making enterprises into cognitive businesses. And so then the strategy that we have internally for IBM's data is to enable cognition within IBM, the enterprise, and move forward with that. And then that becomes a showcase for our customers because it is after all such a good example of a complex enterprise. And so backing in from that strategy, it becomes clear what are some of the critical data elements that you need to master, that you need to trust, that you need to centralize, and you need to govern very, very rigorously. So that's basically how I approach it. Did I answer your question, Dave, or do you? Yeah, so you touched on the second part. I want to drill into the third sequential activity, which is- The sources, right? You mentioned some of them. Well, so we've just talked about this. Well, the sources, I mean, if you have something to add to that. Yes, in terms of, I think you mentioned the internal versus the external. So one thing I'll mention, especially if you kind of take that 10-year outlook that we were talking about. 10 years ago, CDOs had very internal outlooks in terms of the data was all internal business data. 
Today, it's much more external as well. There's a lot more exogenous data that we have to handle and drill into. And that's because we're making use of a lot more unstructured data. So things like news feeds, press releases, articles that have just been written are all fair game to amplify the view that you have about some entity. So for example, if we're dealing with a new supplier, previously we might gather some information by talking with them. Now we'd also be able to look at essentially everything that's out there about them and factor that in. So there's an element of the exogenous data that's brought to bear, and then that obviously becomes part of the realm of the CDO as well, to make sure that that data is available and usable by the business. Jeff Jonas said something. Go ahead, sorry. Oh, Jeff Jonas would say that's the observation space that you want to have. The news feeds are extra metadata that could change the alchemy, if you will, of whatever the mix of the data is. Is that kind of what you're getting at? Yeah, I would say, I'd even go further than just metadata. I would say that in some sense it's part of your intrinsic data set. Because it gives you additional information about the entities that you're collecting data on and that you're measuring. But John Kelly in the keynote this morning, he made two statements. He said one is in three to five years, every healthcare practitioner is going to want to consult Watson. And then he also said, same thing for M&A, because Watson's going to know every public piece of data about every single company. So it would seem that within the three to five year timeframe, the shift is going to be increasingly toward external data sources. Not necessarily the value in the lever points, but in terms of the volume of data, certainly. Is that a fair assumption? I think it's a fair statement.
I mean, if you think of it in the healthcare context, if a patient comes in and there's a doctor or a practitioner that's examining the patient right there, they're generating some data based on their interaction. But then if you think about the exogenous data that's relevant and pertinent to that case, that could involve thousands of journals and articles. And so your example of essentially saying that the external data could be far greater than the internal data, I would say we are already there. Okay, and then the third sequential piece is trust. You've got to be able to trust the data. We talk a lot about, we were down at Big Data NYC, the same week you guys made your big announcement of DataWorks. Everybody talks about data lakes. We joke it's a data swamp and you can't really trust the data. You're further away from a single version of the truth than we ever were. So how are you dealing with that problem internally at IBM? And what's the focus? Is it more on reporting? Is it more on supporting lines of business? The product? Yeah, the focus internally within IBM is in terms of driving cognition at, the way I would describe it is at points where today we have significant human judgment being exercised to make decisions. And that's thousands of points in a complicated enterprise like IBM's. And each of those decision points is actually an opportunity to put cognitive technology in play and then bring to bear an augmented intelligence on those decisions. That, you know, A, factors in the exogenous data, so leading to a much better informed decision, but also then a much more accurate decision. Okay, the two parallel activities. Let's start with the first one, line of business, you know, relationships. Sounds like a bromide. Why is it not just sort of a trite throwaway statement? Where's the detail behind that?
So the detail behind that, if you go back to the very first and the most important step in this whole thing with regard to the monetization strategy of the company, understanding that, if you don't have those deep relationships with the lines of business, there's no way that you'll be able to understand the monetization strategy of the business. So that's why that's a concurrent activity that has to start on day one. Otherwise, you won't even get past, you know, that very first base in terms of understanding what the monetization strategies are for the business. And that can only really come by working directly with the business units, meeting with their leadership, understanding their business. So you have to do that due diligence, and that's where that partnership becomes critical. Then as you move on, as you progress through that sequence, you need them again. So for instance, once you've understood the strategy and now you've understood what data you need to further that strategy and to govern it, you need their help in governing the data. Because in many cases, the businesses may be the ones collecting the data, or at least controlling the source systems for that data. So that partnership then just gets deeper and deeper and deeper as you move forward in that progression. I loved the conversation on monetizing earlier, and there are some tweets going around, you know, what's holding it back, cost of building it obviously, and manageability. But I want to bring that back and bring a developer perspective here, because a lot of emphasis is on developing apps where the data is now part of the development process. I wrote a blog post in 2008 saying that data is the new development kit. It was radical at the time, but in reality, it came out to be true in that they're looking at data as a library of value to tap into.
So stuff's in a data lake, it could be sitting there for years, but I could pull something out and it could be very relevant in context in real time and change the game on some insight, in the insight economy, as Bob was saying. So what is your strategy for IBM? One, to onboard more developer goodness, and two, how do you talk to customers who are really trying to figure out a developer strategy so they can build apps and not go back and rewrite them, and make them certainly mobile first, et cetera. But how does a data-first app get built, and how should developers be programming with data? I'll give you a way to think about it, right? I mean, going back again to that 10-year paradigm shift, right? So 10 years ago, if somebody wanted to write an application and put it on the internet and it was based on data, the hardest part was getting hold of the data, because it was just very, very difficult for them to get hold of the data, to access the data. And those who did manage to get hold of the data were very successful in being able to utilize it. So the paradigm shift that's happened now is that you make the data available to developers, so they don't have to go through that work, both in terms of accessing, collecting, finding that data, then cleaning it. It's all so significant and so time consuming that it could set back their whole process of eventually getting to the app. So to the extent that you have large stores of data that are ready to go and you can then make that available to a body of developers, it just unleashes them. It's like having a library of code available, because all the hard work is done. I think that's a good way to look at it. I think that's a very good way to look at it, because you've also got technologies like the deep learning technologies where you can essentially train them with data. So you don't need to write the code, they get trained through the data. So I think that's a very good way.
It's DevOps of data, I mean it's like Agile meets, I mean you're right, a lot of it is the cleaning, and this is where more noise, we all know that problem, more data creates more noise, you need better cleaning tools, so however you can automate that seems to be the secret differentiator. It's an accelerator, it's a major accelerator for development if you have good sets of data that are available for them to use. So I want to round out my little framework here, your framework and my learnings, with the fifth one being skills. Yes. So this is complicated because it involves organization, skills changes, there's Pepper going through the lobby here. We're trying to get her on theCUBE. Dave has a thing for Pepper, okay, should I take over? You want to go see Pepper? I want to see Pepper on theCUBE, hey sorry, I digress. But so a lot of issues there, there's reporting structures, so what do you mean when you talk about sort of the skill sets and re-skilling? So I'll describe to you a little bit about the organization that I have at IBM as an example. Some of that carries over and some of that doesn't. The reason I say that is again, I mean the skills piece, there are some generic skill sets that you need to be a Chief Data Officer, to be a successful Chief Data Officer in an enterprise. One pillar that I have in my organization is around data science, data engineering, DevOps, deep learning, and these are the folks who are adept at those technologies and approaches and methodologies, and they can take those and apply them to the enterprise. So in a sense, these are the more technical people. Then another pillar that's again pretty generic, and you have to have it, is the information and data governance pillar. So that anything that's flowing, any data that's flowing through the data platform that I spoke of in the first pillar, that data is governed and fit for purpose. So they have to worry about that.
As soon as any data is, you even think of introducing that into the platform, these folks have to be on that, and they're essentially governing it, making sure that people have the right access, security, the quality is good, it's improving, there's a path to improving it and so forth. I think those are some fairly generic skill sets that you have to get. In the case of the first pillar, what's difficult is that there aren't that many people with those skills, and so it's hard to find that talent, and so the sooner you get on it, the better. That's the biggest barrier. In the case of the second pillar, the most difficult piece there is you need people who can walk the balance between monetization and governance. Too much governance and you essentially slow everything down and nothing moves. Handcuffed, right? And you're handcuffed. And then if it's too much monetization, you might run aground because you've ignored some major regulation. So walking that line is really- Awesome market value. Yes, that's exactly right. You can really get ahead of your skis, as they say, and do a face plant. If you try too hard to- And you might be skiing backwards down the slope. Well, we see this with web startups like Twitter, that's a big conversation with Twitter, Facebook. If you try to monetize too early, you lose the flywheel effect of the value. Absolutely. So walking that balance is critical. So that's really finding the skill set to be able to do that. That's what's at play in that second part. The third one is if you are applying it to an enterprise, you have to integrate this platform into the workflow of the enterprise itself. Otherwise, you're not going to create any impact, because that's where the impact gets created, right? That's basically where the data is at the tip of the spear, so to speak. So it's going to create value.
And in a large enterprise which has legacy systems, which has silos, which has acquired companies, and so on and so forth, that's in and of itself a significant job. And that skill set is also interesting. That's a handicap, because if you have that kind of siloed mentality, you don't get the benefits of the data sharing, right? So what's that? What's that? How much effort would it take? I mean, just kind of painting that picture kind of like out there, like massively hard. Yeah, no, that's, you know, a lot of people think that data mining is all about my data. You know, this is my data, I'm not going to give it to you. One of the functions of the Chief Data Office is to change that mindset. And to start making use of the data in a broader context than just a departmental, siloed type of approach. Now, some data can legitimately be used only departmentally, but the moment two or more departments start using that data, I mean, it's essentially corporate data. So are those roles a shared service? What do you see that works? It maybe varies, but is it a shared service that reports into the Chief Data Officer, or is it embedded into the business? Those skill sets that you talked about? I think those skill sets are definitely part of the Chief Data Officer, you know, organization. Now, it's interesting you mentioned that about embedding them in the business units. In a large enterprise, a complicated enterprise like IBM, the different business units potentially have different business objectives and so forth, so, you know, you do need a Chief Data Officer role for each of these business units. And that's something that I've been advocating. That's my fourth pillar. And we're setting that up within the context of IBM. So that they serve the business unit, but they essentially report in to me, so that they can make use of the overall corporate structure that you want. You do their performance review? The performance review is done by the business unit.
It is, okay. But the functional direction is given by me. Okay. So I'll get back to this data. And that can go either way, I'm sure. That can go, yes, there's a balance there that you have to specify. So we're getting low on time, but I'll get back to this data mining thing, because you bring up a good point. We can maybe continue it the next time we talk, but data mining is where all the cutting-edge best practices are, or were, they're still there technically, but the dynamic of data mining is that you're assuming no new data. So if you have a lot of data coming in, most of the best data mining techniques treat it like a corpus, you attack it, but if the pile of data is getting bigger faster than you can data mine it, what good is it? Again, we're in this circular hole. I'm going to again, you know, just take you back to 10 years ago versus now, right? And the differences between the two. So it's a very interesting point that you bring up. I'll give you an example from 10 years ago, this data mining example, not 10 years ago, actually my first go-around at IBM, so it's like '94. One of the things I'd done was we had a program, a computer program that every team in the National Basketball Association started using. And this was a classic data mining program. It would look at the data and find insights and present them. And one of the insights that it came up with, and this was for a critical playoff game, it told the coach, you've got to play your backup point guard and your backup forward. Now think about that. Which sane coach would actually go with that? So it's very hard for them to believe that. They don't know if it's right or wrong. It's a moment of truth. And the way we got around that was we essentially pointed back to the snippets of video where those circumstances occurred. And now the coach could see what was going on and make an informed decision. Flash forward to now.
The systems we have now can actually look at all that context, all at once. What's happening in the video, what's happening in the audio. Also the data can piece together the context. So data mining is very different today than what it was then. Now it's all about weaving the context and the story together and serving it up. Yeah, what happened, what's happening, and what's going to happen are kind of the theaters of insight, right? I mean, what happened is easy. Just look at the data and spit out some insight. What's happening now is a little bit harder, in memory. I think that's the difference between cognition as it is today versus data mining as we understood it a few years ago. Great conversation, we could go for another hour, but we're getting a lot of love to follow up on some of the deep learning. Maybe we come down to Armonk next time we're in this. Certainly on the sports data, we have a whole program on sports data. So we love the sports. We're the ESPN of tech, bringing you all the action here. Yes, I did that work before Moneyball. My mistake was I didn't write the book. Yeah, you could have been. You were too busy writing the next algorithm. But that's okay. We'll put a little footnote in theCUBE notes for that. Thank you very much. Thank you. Okay, live in Mandalay Bay. We'll be right back with more of our live coverage. I'm John Furrier, with Dave Vellante. We'll be right back.