Live from Las Vegas, it's theCUBE, covering InterConnect 2017, brought to you by IBM.
Okay, welcome back, everyone. We are here live in Las Vegas at Mandalay Bay for IBM InterConnect 2017. This is theCUBE's three-day coverage of IBM InterConnect. I'm John Furrier with my co-host Dave Vellante. Our next guest is Seth Dobrin, Vice President and Chief Data Officer for IBM Analytics. Welcome to theCUBE, welcome back.
Yeah, thanks for having me again. I love sitting down and chatting with you guys.
You're a CDO, Chief Data Officer, and that's a really pivotal role because, as a chief, you've got to look overall at a lot of data within IBM Analytics. Also, you have customers, you're delivering a lot of solutions to them, and it's cutting edge. I liked the keynote on day one here. You had Chris Moody of Twitter, he's a data guy. I mean, you guys have a deal with Twitter, so you've got more data. You've got The Weather Company, you've got that data set. You have IBM customer data. You guys are full of data right now.
We're bursting at the seams with data, and that's a good thing.
And so what's the strategy, what are you guys working on, and what are the key points that you guys are honing in on? Obviously, cognitive to the core is Rometty's theme. How are you guys making data work for IBM and your customers?
So if you think about IBM Analytics, right, we're really focusing on five key areas, five things that we think, if we get them right, will help our clients learn how to drive their business and data strategies, right? One is around, you know, how do I manage data across hybrid environments, right? So what's my hybrid data management strategy? It used to be, you know, how do I get to public cloud? But really it's a conversation about, you know, every enterprise has its business-critical assets. They're what people call legacy, right? We call them business critical, and we think about these as how companies got here today.
This is what they make their money on today. The real challenge is how do we help them tie those business-critical assets to their future-state cloud, whether it's public cloud, private cloud, or something in between, a hybrid cloud. So one of the key strategies for us is hybrid data management. Another one is around unified governance, right? So if you look at governance in the past, governance was an inhibitor. It was something where people went, ooh, governance, do I have to do it? Barbed wire, you know? It's, you know, when I've been at companies before and thought about building a data strategy, we spent the first six months building a data strategy, trying to figure out how to avoid data governance or the word data governance. And really, you need to embrace data governance as an enabler. If you do it right, if you do it upfront, if you wrap in things that include model management, how do I make sure that my data scientists can get to the data they need upfront by classifying data ahead of time, understanding entitlements, understanding what the intent was when people gave consent. You also take out of the developer's hands the need to worry about governance, because now in a unified governance platform, right, it's all API driven. Just like our applications are all API driven, how do we make our governance platform API driven? So if I'm an application developer, by the way, I'm not, you know, I can now call an API to manage governance for me. So I don't need to worry about, am I giving away the shop? Am I going to get the company sued? Am I going to get fired, right? Now I'm calling an API. So that's only two of them, right? The third one is really around data science and machine learning, right? So how do we make machine learning pervasive across enterprises, with things like Data Science Experience, Watson, IBM Machine Learning? We're now bringing that machine learning capability to the private cloud, right?
Because 90% of data that exists can't be Googled, so it's behind firewalls. How do we bring machine learning to that?
One more.
One more, that's around, God, I gave you quite a list.
Hybrid data management, unified governance, data science and machine learning.
Oh, the other one is open source, it's our commitment to open source, right? And so our commitment to open source, like Hadoop, Spark, and as we think about unified governance, a truly unified governance platform needs to be built on top of open source. So IBM is doubling down on our commitment to Apache Atlas as a framework, a backbone, a metadata framework for our unified governance platform.
What's the biggest paradigm shift?
Wait, did we miss one? Hybrid data management, unified governance, data science and machine learning, pervasive open source. That's four.
That's four, I thought it was five.
No. There's machine learning and data science, that's two. So technically five.
If I said five, there's only four.
Okay, so on the data governance thing, because this unification is interesting to me, one of the things that we see in the marketplace is people hungry for data ops, like what DevOps was for cloud. There's a whole application developer model developing where there's a new developer persona emerging where it's like, I want to code and I just want to tap data handled by brilliant people or cognitive engines that just serve me up what I need, like a routine or a procedure or a subroutine, whatever you want to call it. That's a data DevOps model kind of thing. How are you guys doing? Do you agree with that, and how does that play out?
So that's a combination, in my mind, that's a combination of an enterprise creating data assets, right? So treating data as the asset it is and not a digital dropping of applications, right? And it's that combined with metadata, right? It gets back to the Apache Atlas conversation. If you want to understand your data and know where it is, right? It's a metadata problem, right?
What's the data? What's the lineage? Where is it? Where does it live? How do I get to it? What can I, can't I do with it? And so that just reinforces the need for an open source, ubiquitous metadata catalog, a single catalog, and then a single catalog of policies associated with that, all driven in a composable way through APIs.
That's a fundamental cultural thinking shift, because you're saying I don't want to just take exhaust from apps, which is just how people have been dealing with data. You're saying get holistic and say you need to create an asset class or layer or something that is designed.
So I mean, if enterprises are going to be successful with data, right? Now we're getting to five things, right? So there's five things. They need to treat data as an asset, right? So it's got to be a first-class citizen, not a digital dropping, and they need a strategy around it. So what are, conceptually, the pieces of data that I care about? My customers, my products, right? My talent, my finances, right? What are the limited number of things, right? What is my data science strategy, right? How do I build deployable data science assets? I can't be developing machine learning models and deploying them in Excel spreadsheets, right? They have to be integrated in my processes, right? I have to have a cloud strategy. So am I going to be on-premise? Am I going to be off-premise? Am I going to be something in between? You know, it gets back to unified governance. How do I govern it, right? Governing a single place is hard enough, let alone multiple places. And then my talent is a piece of it.
Could you peg a progress bar for the industry vis-a-vis what you just said, because I think-
Again, we only got through four.
No, talent was the last one.
No, talent, sorry, I missed it.
The progress bar of where the enterprises are right now, because obviously the big conversation on the cloud side is enterprise readiness, enterprise grade, that's kind of an ongoing conversation. But now, if you take your premise, which I think is accurate, that I've got to have a centralized data strategy and platform, not just a data lake, more than that, software, et cetera. Where's the progress bar? Where are people? Peg an inning or?
Boy, you know, I think they're all over the map. I mean, I've only been with IBM for four months and I've been spending much of that time literally traveling around the world talking to clients. And clients are all over the map, right? Last week, I spent the week in South America with a media company, a cable company down there, and when I was first setting up the meeting, the guy's like, well, you know, we're not that far along down this journey. And I was like, oh my God, you guys are so far ahead of everyone else, it's not even funny, right? And then I'm sitting down with big banks that think they're way out there and they haven't even started on the journey, right? And so it's literally all over the place, and it's even within industries, right? There's financial companies that are also way out there. I mean, there's another bank in Brazil that uses biometrics to access ATMs. You don't need a PIN anymore, right? And they have analytics that drive all that, right? That's crazy. We don't have anything like that here.
Are you meeting with CDOs?
Yeah, mostly CDOs or kind of de facto CDOs, like we talked about before the show, but mostly CDOs.
So you may be unique in the sense that you're working for a technology company, so a lot of your time is outward focused, but when you travel around and meet with the CDOs, how much of their time is inward focused versus outward focused?
Well, so my time actually is split between inward and outward focused, 50-50, roughly.
Because part of my time is transforming our own business using data and analytics, right? Because IBM is a company and we've got to figure out how to do that.
Is it correct that yours is probably a higher percentage outward than...
Mine's probably a higher percentage outward than most CDOs, yeah. So I think most CDOs are 75, 80% inward focused and 20% outward focused. And a lot of that outward focus is just trying to understand what other people are doing.
And I guess it's okay for now, but will that change over time?
I think that's about right. I think it gets back to the other conversation we had before the show about your monetization strategy. I think if a company progresses to where it's no longer about how do I change my process and use data to monetize my internal process, if I'm going to start figuring out how I sell data, then CDOs need to get a more external focus.
But you're supporting the business in that role, and that's largely going to be an internal function: data quality, governance, and like you say, the data science strategy.
Yeah, and I think it's important, when I talk about data governance, I think the things that we used to talk about as data management are all part of data governance, right? Data governance is not just controlling who has access, it's all of that. How do I understand my data? How do I provide access to my data? It's all those things you need to enable your business to thrive on data.
My question for you is a personal one. How did you get to be a CDO? I mean, do you go to a class, I'm going to be a CDO someday? Is there a CDO school? I stayed in a Holiday Inn Express last night. You know, tongue in cheek aside, I mean, people are getting into CDO roles from interesting vectors, right? Anthropology, science, art. I mean, it's really interesting. Math geeks certainly love it and thrive there, but I haven't yet seen one sweet spot. So take us through how you got into it.
So I'm not going to fit any preconceived notion of what a CDO is, right? I mean, especially in a technology company. My background is in molecular and statistical genetics, right? So I'm a geneticist, right?
Data has properties that could be kind of biological.
And actually, if you think about the roots of big data and data science, or big data at least, right? Two of the probably fundamental drivers of the concept of big data were genetics and astrophysics, right? So 20 years ago when I was getting my PhD, we were dealing with, you know, tens- and hundreds-of-gigabyte-sized files, right? We were trying to figure out how do we get stuff out of 15 Excel files, because they weren't big enough, into a single CSV file, right? So, you know, millions of rows, millions of...
Crude by today's standards.
Yeah, crude by today's standards. But it was still, how do we do this? And so, you know, 20 years ago, I was learning to be a data scientist. I didn't know it. I stopped working in that field and I started managing labs for a while. And then in my last role, you know, we kind of transformed how the research group within that company in the agricultural space handled and managed data. And I was simultaneously the biggest critic and biggest advocate for IT. And they said, hey, come over and help us figure out how to transform the company the way we've transformed this group.
It's like, we're talking about your PhD experience. It's almost like you were so stuck in the mud with not having the compute power or some of the tooling. It's like a hungry man: oh, it's unlimited abundance of compute. Oh, I love what's going on. So you almost get gravitated, pulled into that, right?
Yeah, and it's funny, I was at a demo upstairs today, one of the sales guys was doing a demo with some clients. In one line of code, they had expressed what was part of my dissertation, right? It was, you know, a single line of code in a script.
And it was like, that was someone's, you know, entire four-year career 20 years ago.
That's a great story. And I think that's consistent with just people who are attracted to it, and they're going to end up being captains of industry. This is a hot field. You guys have a CDO event happening in San Francisco. We'll be doing some live streaming there. What's the agenda? Because this is a very accelerating field. You mentioned now dealing practically with compliance and governance, which was, you'd run the other direction in the old days. Now, it's embracing that. It's got to get process and discipline management. What's going to go on at the CDO summit? Are you doing it?
Yeah, at the CDO summit next week, I think we're going to focus on three key areas, right? What does a cloud journey look like? Or maybe four key areas, right? So a cloud journey. How do you monetize data and what does that even mean? Right? And talent. So the IBM CDO summits have been going on for three or four years now, and every one of them is a talent conversation. And then governance, right? I think those are four key concepts. And not surprisingly, they were four of the five on my list, right? So I think that's what we're really going to talk about.
The unified governance. Talk about how that happens in your vision, because that's something that you hear. Unified, unified identity. We hear blockchain looking at a whole new disruptive way of dealing with value digitally. How do you see the data governance thing unifying?
Well, I think again, it's around, you know, IBM did a great job of figuring out how to take an open source product that was Spark, right? And make it the heart of our products, right? It's going to be the same thing with governance, where you're going to see, you know, Apache Atlas is in its infancy right now. Having that open backbone so that people can get in and out of it easily, right?
If you're going to have a unified governance platform, it's got to be open, right? By definition, because I need to get other people's products on there. I can't go to an enterprise and say, we're going to sell you a unified governance platform but you've got to buy all IBM. Or you've got to spend two years doing development work to get it on there, right? So open is the framework, and composable, API driven, and proactive, I think those are kind of the key pieces for it.
So we all remember the sort of client-server days where it took like a decade and a half to realize, oh my gosh, this is out of control, we need to bring it back in. And in the wild west days of big data, it feels like enterprises have nipped that governance issue in the bud. At least maybe they don't have it under control yet, but they understand the need to get it under control. Is that a fair statement?
I think they understand the need, but data is so big and grows so fast that another component that I didn't mention, that maybe is implied a little bit, maybe isn't, is automation, right? You need to be able to capture metadata in an automated fashion. We were talking to a client earlier who has 400 terabytes a day of data that changes. Not even talking about what new data they're ingesting. How do they keep track of that? It's got to be automated, right? And so this unified governance needs to capture this metadata in as automated a fashion as possible. Master data needs to be automated. We need to think about applying...
And make it available in real time, low latency, because otherwise it becomes a data swamp.
Right, it's got to be proactive, real time, on demand.
The other thing I wanted to ask you, Seth, to get your opinion on, is sort of the mid-2000s when the Federal Rules of Civil Procedure changed and electronic documents and records became admissible. It was always about how do I get rid of data? And that's changed. Everybody wants to keep data now and analyze it and so forth.
So what about that balance? And one of the challenges back then was data classification. I can't scale my governance, I can't eliminate and defensibly delete data unless I can classify it. Is the analog true, where with data as an opportunity I can't do a good enough job analyzing my data and keeping my data under control without some kind of automated classification? And has the industry solved that?
Well, I don't think the industry has completely solved it yet. But I think with cognitive tools, there's tools out there that we have, that other people have, that can automatically, if you give them parameters and train them, classify the data for you. And I think classification is one of the keys. You need to understand how the data is classified so you understand who can access it, how long you should keep it, right? And so it's key, and that's got to be automated also. I think we've done a fair job as an industry of doing that. There's still a whole lot of work, especially as you get into the kind of specialized sectors. And so I think that's a key, and we've got to do a better job of helping companies train those things so that they work. And I'm a big proponent of don't give your data away to IT companies, right? It's your asset. Don't let them train their models with your data and sell it to other people. But there are some caveats. There are some core areas where industries need to get together and let IT companies, whether it's IBM or someone else, train models for things just like that, for classifications, because if someone gets it wrong, it can bring the whole industry down.
It's almost, it takes the open source paradigm almost. It's like open source software, share some data, but, you know.
Right, and there's some key things that aren't differentiating that, as an industry, you should get together and share.
But you guys are making, IBM's making a big deal out of this, and I think it's super important.
I think it's probably the top thing that CDOs and CIOs need to think about right now: if I really own my data and that data is needed to train my big data models, who owns the models and how do I protect my IP?
And are you selling it to my competitors?
Well, that's really...
Right, you're going down the street and taking away my IP, my differentiating IP, and giving it to my competitor.
So do I own the models? Because the data and the models are coming together, right? And that's what IBM's telling me, that I own the data and the models that it informs. Is that correct?
Yeah, that's absolutely correct. And I mean, you guys made the point earlier about IBM bursting at the seams on data, right? That's really the driver for it. We need to do a key set of training. We need to train our models with content for industries, bring those trained models to companies, and let them train specific versions for their company with their data, which, unless there's a reason they tell us to do it, is never going to leave their company.
I think one of the things, that's a great point, is you being full of data, because a lot of people who are building solutions and scaffolding for data, aka software solutions, were never data full. They're a typical, I'm going to be a software company, and they build something that they don't have a problem for. You guys are data full, so you know the problem. You're living it every day. It's opportunity.
And that's why when a startup comes to you and says, hey, we have this great AI algorithm, give us your data, they want to resell that model, right? Because they don't have access to the content. And if you look at what IBM's done with Watson, right? That's why there are specialized verticals that we're focusing Watson on, Watson Health, Watson Financial, right? Because we're investing in data in those areas. You can look at the acquisitions we've done, right? We're investing in data to train those models.
We should follow up on this because this brings up the whole scale point. If you look at all the innovators of the past decade, even two decades, Yahoo, Google, Facebook, these are companies that were web scalers before there was anything that they could buy. They built their own.
Because they had their own problem at scale.
At scale. And data at scale, it's a whole nother mind-blowing issue.
Absolutely.
You agree, right? Okay, so we're going to put that on the agenda for the CDO summit in San Francisco next week. Seth, thanks so much for joining us on theCUBE. Appreciate it. Thank you, data officer. This is going to be a hot field. The CDO is going to be a very important opportunity for anyone watching in the data field. There are going to be new opportunities. Get that data, get it in control, taming the data, making it valuable. This is theCUBE, taming all the content here at InterConnect. I'm John Furrier with Dave Vellante. More content coming. Stay with us. Day two coverage continues.