Live from Las Vegas, it's theCUBE. Covering IBM Think 2018, brought to you by IBM. Welcome back to theCUBE. We are live at the inaugural IBM Think 2018 event. I'm Lisa Martin with my co-host Dave Vellante, and we're excited to be joined by one of the keynotes at this inaugural event. Ferd Scheepers, the Chief Information Architect from ING Group, welcome to theCUBE. Thank you very much, pleasure to be here. So, you already mentioned you're doing, you said, six sessions, and I know at least one of them is a keynote. So, you've been to IBM events before; you're going to be talking in the cloud and data campus, as they call it. Tell us, though, about what you have been doing as really one of the leaders for the last five years of ING becoming a data-driven company. And also tell us, what does data-driven mean to ING? Sure. So, let's start with the latter. What does data-driven mean for ING? There may be different opinions within ING even, but for me, it's very much that we use data and make it accessible for everybody in the company to help them drive their decision-making. And at the same time, we use that same data also to help our customers get more understanding of what they actually do with ING, and maybe even outside of ING, and use that data to help them get better services from ING at the right point in time, with the right quality they can expect, to really elevate our service level to our customers but also drive decision-making internally. So how do we do that? 
Well, very much by driving a data architecture and information architecture that started about six years ago where we worked together with IBM to create something that we now call the ING data lake architecture, which was very much about making it possible for us to bring all those data sources that we have in the company together, qualify them with business terms so that people could actually understand what they were, making sure that we came up with a common language across the bank so that across all those different lines of business, all those countries, we actually had a common understanding of what we meant with, say, customer. I mean, that sounds very natural for a bank, right? To understand what a customer is, but you might have very different definitions based on where you come from and which country. Okay, so I have to ask you about sort of that data model and that data journey. Because the financial services business, it's always been a data business, but a lot of years ago, maybe even still today, many organizations' data exists in silos, and so you talked about making data and data sources accessible to everybody in the company so that they could utilize it, but I'm very curious as to how you went about basically busting down the silos of data. What did you have to go through to do that? And do you feel like your employees and your customers actually now do have access to that data as you envisioned? I would say we're not there yet. We're on a journey, and that journey has been ongoing for about five years, but a journey very much started by actually creating the architecture, which was the easy part, but then selling the architecture. And selling the architecture actually means that you need to go to the different stakeholders with very different stories. So what's in it for them? What's in it for your CIOs? 
Well, you know, an easier landscape, a lot of automation where in the past they had to do manual things, being in control, meaning all the risk items go down. What's in it for the business side? Well, that well-articulated business meaning around data, that empowerment of the business side to actually own the data and to be able to say who has access to it and what they can do with it. So it was really about selling this architecture, with many different presentations to many different stakeholders, and then actually building it. And the most important thing that I've always said to anybody who asked me why this is successful at ING: we planned something six years ago, and we've been driving this journey continuously for the last six years in that same direction. And that is really the key to it. If you believe you can do this journey and have value after a year and then you're done, it doesn't work that way. It's a long journey. It takes a lot of investment, and it pays off after you've done that investment over many years. So the joke, of course, is that we always hear the data lake turns into a data swamp. How do you, so you went into this thinking about getting value, obviously, out of the data. How'd you make the data not stagnate? What kind of challenges do you have in that regard? So I think one of the main things that we did when we came up with this whole architecture was to say from day one, you know, it is a data lake that is governed. Even though we didn't use the word that much, because a few years ago governance may not have been the most popular term to use. But in essence, it's what we did. Everything that we have in our data lake is identified. It is governed, with different levels of governance. When you talk about customer data, you want to know all the different details about, you know, what is a salary, does an account include the accrued interest, all these kinds of things. 
When you start talking about maybe log data, it is a lesser level of governance. But for every asset we have in our data lake, we know what it is, who owns it, and more or less at a high level what it means. And for a lot of assets, you know, the more key assets of the bank, we know in all the details what's there. And that actually makes sure you don't get into a data swamp, because a data swamp is pretty much what a lot of companies got when they said, you know, data lake equals Hadoop, equals put in bits and bytes, and then later you can't find it anymore. And those data sets are categorized? They are. You've auto-categorized them at the point of creation or use, is that right, that's automated? We still have a lot of manual activities, but we're actually more and more trying to automate this. So taking a lot of data discovery tools where we look at the data the moment we ingest it into the data lake, we try to auto-classify what it means, and actually even tie it into business terms that we've defined. But it's still partly also a manual thing, because as a bank, you probably have thousands of things that you could describe at a business-term level, and we're still going through that process of actually classifying everything. But what about the policies associated with that? Presumably that's automated, for sort of retention or deletion or movement or archiving, et cetera. That's automated, right? Absolutely, yes. So that ties into the business term, so we do everything at the business-term level. The moment we talk about customer, we have a policy that's on customer, or customer name or whatever, and no matter where that physical asset is, and even which kind of technology it is, it is all driven from that policy at the business-term level. So you have published quite a bit with IBM on data lakes. I mentioned that you're speaking at this event. What are some of the key learnings as you are now in, what, the fifth or sixth year of this journey? 
To Dave's question earlier, what can you share about how not to turn a data lake into a data swamp while maintaining quality and meeting those internal stakeholder needs and expectations? So one of the complexities that you see in all major organizations is that we have, like, every technology out there. I mean, even though we're a good friend of IBM, we don't only have IBM technology. And one of the challenges that you have is, the moment you go into the different organizational units within your company, they all use different technologies and nobody wants to give them up. But you don't have a choice, because at this moment (that might change over the next few years) the only way to be in control of your entire data landscape is to limit yourself in the technologies that you use, and actually to make sure that you drive the governance from a central perspective and use a technology stack, framework, whatever you want to call it, that actually ties governance directly into the technology, into the way that you handle the data. If you think you can do that with every technology out there and it magically all works together, or you want to do the integration, I would advise against that. I think it's way too much of a challenge. And one of the things I'll actually be presenting on here at this conference is open metadata. So, a way for us to actually start opening this up and bringing metadata, which in essence means governance, to a more heterogeneous landscape, which is one of the major drivers why we're investing in this ourselves. Even though we like the IBM technology, we still now and then want to play with tools from other vendors, or maybe with open-source technologies, and it needs to add up, it needs to be governed as well. So this is a major investment for us, and I think this is something that everybody should have a look at. 
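The governance mechanics described above (every asset identified and owned, classified against a business-term glossary, policies attached at the term level rather than to physical assets, and metadata exported in an open, vendor-neutral form) can be sketched roughly as follows. This is a minimal illustration with hypothetical names and values, not ING's or IBM's actual implementation:

```python
import json
from dataclasses import dataclass

# Business glossary: one shared definition per term, across countries and
# lines of business (the "common language" idea).
GLOSSARY = {
    "customer": "A natural or legal person holding at least one active product with the bank.",
    "app_log": "Technical application log records; no direct customer meaning.",
}

# Policies hang off business terms, never off physical assets.
@dataclass
class Policy:
    retention_days: int
    governance_level: str  # "full" for key bank data, "light" for e.g. log data

TERM_POLICIES = {
    "customer": Policy(retention_days=2555, governance_level="full"),
    "app_log": Policy(retention_days=90, governance_level="light"),
}

# Catalog: every asset in the lake is identified, owned, and classified.
CATALOG = [
    {"asset": "hdfs:///lake/raw/crm/customers.parquet", "owner": "retail-nl", "term": "customer"},
    {"asset": "db2://core/CUST.MASTER", "owner": "retail-be", "term": "customer"},
    {"asset": "s3://lake/logs/app.jsonl", "owner": "platform", "term": "app_log"},
]

def policy_for(asset_name: str) -> Policy:
    """Resolve the governing policy via the asset's business term,
    regardless of which technology physically holds the data."""
    entry = next(e for e in CATALOG if e["asset"] == asset_name)
    return TERM_POLICIES[entry["term"]]

# Both physical copies of customer data inherit the same term-level policy.
assert policy_for("hdfs:///lake/raw/crm/customers.parquet") == policy_for("db2://core/CUST.MASTER")

def export_open_metadata() -> str:
    """Serialize glossary and catalog in a plain, vendor-neutral format so
    tools from other vendors, or open-source tools, can read the same view."""
    return json.dumps({"glossary": GLOSSARY, "catalog": CATALOG}, sort_keys=True)
```

The key design choice mirrored here is indirection: nothing about retention or access lives on the HDFS, Db2, or S3 copy itself, so moving data between technologies never changes its governance.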
I want to ask you about innovation and governance, because they're kind of counterposed in a lot of people's minds, but you were hinting earlier that it used to be a bad word, and maybe we can start getting value out of our governance framework. And I've got a great studio audience here. I'm going to be like a broken record to these guys. I've been saying all morning that innovation is going to come from data. You've got a data lake, machine intelligence or artificial intelligence, and cloud at scale, whether it's private or public cloud. So first question is governance and innovation. Are they at odds, and how do you address that? So I would say they're not at odds, but I do think that the moment you start looking at innovation, you need to take governance as something that is always top of mind. Actually, I think what we've done so far by investing heavily into a governed data lake has helped us with being innovative, because that data foundation is there. The moment you want to look at the data that you have within the company, if it's well qualified, if it's known, if you know the quality of it and you know where it is, it actually makes it way easier to use a lot of innovative technologies to work with that data, because you don't have that problem of trying to find where everything is. And I think that's been one of the biggest problems with all the innovation projects that I've seen. You start with this great idea, then you bring it into a company and everybody says, whoa, whoa, not with my data. We have all the data together. We know where it is. We know what to use it for. And we can actually say, the moment people start playing with the data, but within a very well-defined set of rules, that's great. The moment we start bringing that innovation to production, we go through the steps to actually see whether that makes sense, whether we want to change the technology, or whether we need to bring a next level of governance in there. 
But because we have everything under control, we can much more easily actually play with innovation. So governance brings data quality, data quality brings conviction in your decision-making. Okay, got that. What about the cloud piece? We talked off camera, public cloud, not so much. How do you get scale economies, network effects, et cetera? So one of the challenges that we've been facing is that the moment you start bringing a lot of technology into your own company, you have to deploy all of that, and it's the issue of bringing all that life-cycle management into your organization. It's just a challenge. We've got literally, well, I don't know how many teams, we'll say five, six, seven teams that do nothing else but bring life-cycle-management-related updates to our data lake. I love the cloud idea that actually all that stuff is taken away by somebody else. They do the updates, they do the life-cycle management. I have a clear separation of my compute versus my storage. That's all the good stuff that cloud brings to me. The scalability, the elasticity, all of that stuff. I can't do that all in public cloud. I mean, we have a lot of customer data. We are very, very sensitive about, you know, being from Europe and especially being in the Netherlands, all the privacy of our customers. So we don't want to bring everything to public cloud, but private cloud as it is today, especially with things like what's now being announced as IBM Cloud Private for Data, brings a lot of those containerized ways of delivering new technologies into our organization. We did a POC with that: from three months to a few hours. That's the kind of stuff I'm looking for. And you know, now also the metering comes in there, and we can start paying for it in a different way, not by just having a license for a product with a number of cores, but actually have that dynamic scaling even in what we pay for. That really enables us to do a lot of new things, and that brings a lot of value. 
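The pricing shift he points at, from a fixed per-core license to metered, elastic consumption, comes down to simple arithmetic. The sketch below uses entirely invented numbers for a bursty workload (heavy only a few hours a month) purely to illustrate why metering can pay off; it is not ING's or IBM's pricing:

```python
# Hypothetical figures: fixed per-core licensing vs. metered pay-per-use
# for a workload that peaks briefly and idles near a small baseline.

FIXED_CORES = 64                # cores you must license for peak load
LICENSE_PER_CORE_MONTH = 100.0  # fixed license cost per core per month

METERED_RATE_CORE_HOUR = 0.50   # metered price per core-hour
PEAK_HOURS = 40                 # hours per month actually run at full 64 cores
BASELINE_CORES = 8              # cores needed the rest of the month
HOURS_PER_MONTH = 730

# Fixed licensing: you pay for peak capacity all month long.
fixed_cost = FIXED_CORES * LICENSE_PER_CORE_MONTH

# Metering: you pay only for the core-hours you actually consume.
metered_cost = METERED_RATE_CORE_HOUR * (
    PEAK_HOURS * FIXED_CORES
    + (HOURS_PER_MONTH - PEAK_HOURS) * BASELINE_CORES
)

print(f"fixed license: {fixed_cost:.0f}, metered: {metered_cost:.0f}")
```

With these made-up numbers the metered bill is lower; the general point is that the gap grows the spikier the workload is, which is exactly the "dynamic scaling even in what we pay for" he describes.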
Can you touch on that business outcome that you just mentioned a minute ago, from three months to three hours? Give it a little bit more context there; that was with IBM Cloud Private for Data. So what we did, actually, we did a proof of concept together with IBM where we looked at a product, in this case just to try it out, which was DataStage. So in the past, when we have a new version of DataStage, it will take us literally months to get that new version into production. Even if it's a small fix, because in all honesty, the way that the different fixes depend upon each other, the complexity of playing through that, it just takes like forever, and it never goes right from day one. What we did is we brought the DataStage containers into our own private cloud, which happens to be called IPC instead of ICP, which led to a lot of confusion during the whole POC. And we managed to show that we could actually bring the containers from IBM into our own cloud environments, and literally we could show that we could do an update in hours. That same update going through the normal process of installing it, doing all the different patches after each other, with some of them conflicting, testing it, making sure it all works: literally months. It's a huge success for us. So thinking about the data journey that you went on, if you had a mulligan, I don't know, does mulligan translate into your native tongue? A do-over, a mulligan, golf term, right? A bad shot, you take another one? If you had a do-over, what would you do differently? What kind of advice would you give to your colleagues? I think I wouldn't change a fundamental step in what we did. I think the journey was okay. What I probably would have done differently is actually two things. One is that there was, and still is, quite some focus on creating this ING language, which we call the ING Esperanto, which is something we need. 
We need to have a definition that is cross-country, cross-lines-of-business, just a common understanding, but it has also translated quite a bit into becoming kind of an attempt at a canonical data model. I think we should have steered away from that a little bit and kept it at a definition level a bit more. The second thing that I probably would have done differently is that instead of trying to do a lot of the work together only with IBM, I would have probably invited a second partner from day one, just to make sure this is even more of an industry-standard thing. I mean, we've tried to publish together, we've done a lot of work together, but actually I think that everything we've built shouldn't be an ING-proprietary thing. It should be something that's open source. And we're actually doing that now, more and more. A lot of the stuff we've built, we're pushing to open source, which I think is the right way, because at the end of the day, what we've built is plumbing. A bank is not in the business of plumbing. We're in the business of helping our customers achieve great things. And all the stuff behind the scenes, all the plumbing, is something I'd rather buy and get off the shelf than build myself. So, last question. I've heard a number of things about what ING has achieved in terms of a lot of operational efficiencies. You mentioned that this is a journey, and that's probably also another key piece for people who want to learn from you, that this is something that is going to take time. Last question, though: you mentioned the word control earlier, and how you had to get buy-in from a lot of stakeholders. You probably found lines of business very tied to their data. Recommendations and advice for truly building a data-driven culture at a company that's several decades old? Yeah, I would say go to the highest level in your company and make sure your CEO puts a stake in the ground in their messages to the outside world. 
I think one of the biggest achievements we had at some point in time is that our CEO, Ralph Hamers, he talks to the world and he says, we love analytics. We want to be a technology company, and we think analytics is one of the most important things we do, because it's the best way for us to actually help our customers be a step ahead in life and in business. The moment you have that message and you explain it that way to the world, nobody within your company will actually say this is a bad idea. Because if the boss says so, even within a Dutch organization, everybody buys into it. So I think, just go to the highest manager in your company, get them on board, get them to speak on it publicly, and you're set. Well, Ferd, thanks so much for sharing what you have achieved so far at ING in your current role, and for also sharing your recommendations, advice, and lessons. We appreciate your time. Thank you very much. And good luck on your keynote and all of your other speaking sessions this week. Thank you, yeah. And for Dave Vellante, I'm Lisa Martin. You're watching theCUBE, live on day one of the inaugural IBM Think 2018. Stick around, Dave and I will be right back after a short break.