Live from San Francisco, it's theCUBE, covering Informatica World 2016. Brought to you by Informatica. Now, here are your hosts, John Furrier and Peter Burris. Okay, welcome back everyone. We are here live in San Francisco for Informatica World 2016. Exclusive coverage from SiliconANGLE Media's theCUBE. This is our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, with my co-host Peter Burris. We have Sri Kanthadai, Global Head of Data Management, and Steve Jones, Global Vice President of Big Data, from Capgemini Insights and Data. Steve, good to see you again. Welcome back. Welcome to theCUBE. Thank you, and you got my name right. It was a tongue twister. We were talking about big data before we started rolling, and about where we've come. We were talking about open source, but really big data: look back only a few years. Go back five years, to the Hadoop movement, and compare it to where it is now. The modernization message is certainly loud and clear, but it's not just about Hadoop anymore. There are a lot of operational challenges, and also total cost of ownership. I want to get your thoughts. What are the trends? What do you guys see as the big trends now in this modernization, in taking big data to the next level? Well, I think part of it is that we're about to publish a report we've done with Informatica on exactly that question, particularly around governance and how people are making it operational. We also did a report recently with our Capgemini Consulting division on operational analytics. The really fascinating thing it found was around governance. The age-old complaint about governance has been that the business doesn't engage.
Well, guess what we found when you look at big data programs: when they start to deliver value, guess who wants to take them over? The business. Guess who then actually starts leading the governance efforts? The business. So the history of data management has been IT saying, you really should care about quality, and the business, to be honest, saying, yeah, we don't care that much, we're still using Excel. Once you get to the stage at which you're delivering real analytical value, that changes. It's something we've been on a long journey with. I mean, we talked the other day: 2011 was the first time at Cap we published a white paper on our learnings around big data and governance. It's amazing to think that five years ago we were already talking about how you do governance in big data, because of some of our more forward-looking clients. But that shift, and what we're finding in the report, is that people are really looking to replace the substrate. It's absolutely not just about Hadoop, but that's the foundation, right? Historically there hasn't really been a data foundation; there have been lots of data silos, but not a data foundation. Companies are now looking to move towards firm data foundations across their entire business. That's a huge leap for IT organizations to make. And its impact on MDM, data quality, and pace of delivery is huge. So also, talk about the trends outside the US, for instance in the UK, because your clients have a global footprint. Governance crosses boundaries; the blurring, if you will, is virtual, but you still have physical locations. Well, I am based out of the UK, out of London, so I see that side of the pond more often than this side, but the trends are pretty similar.
And on what Steve said, in fact, we were joking about it yesterday, and we said it's not for a tweet, but maybe for something with a little bit more room: big data doesn't need data quality. And my other favorite statement is: MDM is dead, long live MDM. Both of them are relevant. Big data doesn't need data quality in the sense that you cleanse all your data and put it into an EDW or a data lake, because you can't. Only part of it is data owned by you. The rest comes from external sources. Where it needs quality is in building the context on top, when the end user, the analyst, has a view. And there, if you build the context, even good data can turn bad, because in a particular context that data is no longer relevant. But bad data can turn good because you're bringing in the context. There was this example we were talking about: you run a marketing campaign and you have all these likes and tweets and everybody loved it. Somebody then asks, okay, how good is this campaign? It's great. We need more. How good is it in the context of sales? Guess what? When the campaign ran, there was no difference to your sales. So this good data that you had on the marketing campaign has turned bad. To the company, that was a wasted effort on marketing. So you need contextual quality, not pure data quality. If you look at ETL, you transform, you do data quality before you load. Now you're talking about ELT, and that's where you need quality. You need the linkage of the references. So data changes the data, and real time has been the conversation so far today, but the context defines the quality. Quality. Exactly. It could be a, you know, clean data environment. I mean, he's shaking his head. One of the things we presented, in the presentation I did on Monday, was on avoiding a data swamp. So we absolutely think about it, but what we say is, you've already got one. The myth is that you don't have a data swamp.
Right, today, which is, oh, we've got our perfect data warehouse and it's got a perfect schema. Really? And what does your business use? Excel spreadsheets. Where do they get the data from? Well, they get it from SAP. They download it, and they've got a macro somebody wrote in 1998, which means we can't upgrade that desktop from Office 97, right? That desktop stays on Office 97 because it's the only one that has the supply chain spreadsheet on it. So the reality is you have a data swamp today. To the point you made about country differences, one of the culture differences we've seen between Europe and the US is that the US has been very much the technology pioneer, right? The Hadoop stuff, the Spark stuff, all that technology push. European companies have taken quite a while to get into the Hadoop marketplace, but the larger manufacturers and, I'd say, the more robust ones like the pharmaceuticals and these large-scale organizations are now going all in, after thinking about it. What I mean is that four or five years ago we saw lots of POCs here in North America. They were very technically centric, and then people were like, okay, I've now got... A cluster, setting up a cluster, doing some basic stuff. Exactly, right. Whereas now in Europe, we've seen more people saying, okay, we know where we want to get to, because we've seen that the technology now works. We're going to start by thinking about the governance. What's the right way to go about this? So from a timing perspective, the interesting thing was that at the beginning of last year we felt we'd begin to see some early-stage larger programs in Europe, maybe towards the end of the year. The reality was that by the middle of the year we were seeing very, very large programs. It was almost as if a switch had been flipped.
I want to return to this notion of data governance because it's really important, and we've said it here today about 20 times. The rules of data governance have been written piecemeal over the past few decades. It started with which application owns what data, and is the data quality good enough that the application runs or not? Then compliance kicked in, and we used compliance-related rules to write the new rules of data governance. What is data governance in the context of big data? The reason I ask the question specifically, and maybe put some bounds on it, is that we're trying to get to a point where the business puts a value on data, treats data as an asset that has a value, and the only way we're going to be able to do that is through governance rules that support it. So what does data governance mean in a big data context? I'll start on that. Do you want to start? Yeah, so the value is really the impact. And I go back to a very simple analogy. When you didn't have computers, you had your ledgers; you locked them up in a safe and took the key home. So you protected who had access to your data. You then put it on PCs, and you gave people access with logins. Then you said, well, I'll tell you what you can do with my data. That was the era of BI: you had reports, and all they could do was print a report. Now you've given them access to do whatever they want with data. So the first governance question is, what are they doing with the data? Where did they get the data they used to come up with that? What is the exposure to your organization if somebody has, you know, traded around with LIBOR rates or fixed them or done something else? And the LIBOR rate, you know, you're from the UK, you don't talk about that. And then you work backwards on lineage. So now I need to know not just who accesses my data; I need to know what they are doing with it.
I need to know where they got the data for it. Well, I think this is the lineage. But you don't know when they're going to access it and what they're going to do with it at any given time. But I think this is where the contention comes in, right? To be honest, from a data management and data governance perspective, those things are all true, right? We need to know those pieces. The other reality is, how do you get the business to actually value those pieces today? Ultimately that's about the outcome. One of the things the research we're about to publish with Informatica is finding is the answer to: when do you get the business to care about governance? And the answer is, when you demonstrate an outcome that relies on having good governance. So you do a set of analytics and you prove that this is going to improve the effectiveness, the bottom line, the top line, or whatever, of the firm. Particularly in operational analytics, as I've put it, customer analytics, there are real measurable numbers: we can save you 6% on your global supply chain costs. But in order to do that, you need a single view of product and parts, which means you need to do product MDM. Well, that's a very easy way to get the business engaging in governance, as opposed to: we need to do product MDM. So we get a 360 view of the customer? So what? So we're still pricing the value of data based on the outcome. Oh, absolutely. And then presumably at some point I'm going to sum across all those different utilizations, and that'll become the true value of the data. Is that what we're talking about here? I'd say that if we sum it up, it becomes a challenge, because ultimately the business pays.
So one of the things I like about the big data stuff and the programs we're doing with large-scale companies is the ability to deliver value to an area. It's what we call insight at the point of action. And that's the bit where I pay. So yes, I could sum it up theoretically, and the CIO could say, well, I'm delivering this much value, but it's at those points of action. If you say to somebody, I deliver you $2 million and it costs you $100,000, that's much better than saying, in totality this delivers you $2 billion and it costs you $20 million or $200 million. That's an abstract piece, whereas... Except when I'm thinking about investment, because I need to be able to appropriate the right set of resources, financial and otherwise, to the data based not just on individual exploitations, but across an entire range of applications, an entire range of utilizations, right? I think so, but then again, in terms of the ability to bill and charge, my total is the summation of the individuals. I worked with a CFO once; the CIO was in the room presenting the business case for one of their programs, and the CFO said, well, if I took all your business cases and added them together, this company would be twice the size and cost nothing to run, right? So there's been a history in IT of theoretical use cases. What we're seeing on the data and outcome side, particularly in operational analytics, is that there are absolutely quantifiable outcomes. So while you can say, yes, if you add this up, we need to make an investment in this base platform, the two things we're finding are these: because you can use these much more agile technologies, these projects don't take 12 months to deliver first value; and because the incremental cost of working in a lake environment is so much less, you know, I don't have a 12-month schema change problem.
So one of the things we're seeing is the ability to say, yes, as a strategy you're going to spend 20 million or whatever over the next five years on this, but every three months I'm going to prove to you that I've delivered value back. Because the one thing I've seen on strategic data governance programs historically is, 18 months in: what have you delivered? What have you done for me? Prove that it has value, right? We're cutting your team in half. You're forgetting the question. And I think what we're also seeing with big data initiatives is the fail-fast methodology, like drug trials in pharma. So what's your ROI on a project? It's actually the sum of all the programs you've run. And you were talking about apportioning the budgets. Whose budget? Because it's now being done by the individual business. That's how it's also integrated. In their own area. So there's no CFO sitting there saying, well, this is the budget I give IT, and this is how you apportion it. It's all at the point of the business, and they find, well, I do all these fail-fast programs and then I hit one that makes me my big bucks, and it's done. I love this concept because you're essentially talking about horizontal disruption, which is what cloud and data do. Absolutely. Which is fantastic. And I'm sure this is driving a lot of client engagements for you guys. So I've got to ask a question on that thread, about what Jerry Held talked about earlier today. He made a comment, but I'll turn it into a question to you guys. He said most CFOs know where their assets are. When you ask them, they go to the ledger and they go, oh yeah, all the assets are here. What about data? Where are the data assets? The question is, when you talk to your clients, what do they look at when they say data assets? Because you're bringing up the notion of not an inventory of data.
Not data sitting around, whether it's dirty or clean, which you can argue about, but data when it gets put to use for a purpose. Peter says data with a purpose; that's what he keeps on about. Is there a chief data officer, like a CFO role, who actually knows what's going on? Probably no, but how do you talk to clients there? I mean, just share some color, because this is now a new concept of, okay, who's tracking the asset value? I think there are two bits in it. I'll start with the asset, and then you can talk specifically about PostNL, which I think is a great example of what happens with data when it becomes an asset. The ability to understand the totality of data within any non-trivial organization is basically zero, because it's not just inside your firewalls. I'd also question the idea that CFOs know where all the assets are. I'm working with a very large manufacturer, and after they've sold a product, they need to service it, and they can't tell you where every asset is because that information now lives with a client. So as for actually knowing where all of the assets they need to service are: they might know where their physical plants and factories are, but some of these assets are pretty big things, and they don't necessarily know where they are on planet Earth. And with data it's the same, because it's also external data, right? So really the piece for me about governance is, do I understand the relationships between these pieces? As for whether I value data as an individual piece: it's because of what I can do with it. Sometimes the data itself is the value, but most of the time, when people describe value, it's tied to the outcome that it's based upon, and that's much easier to define than, how much is my product master worth? Well, I can't really say that, but you know what?
I can absolutely say that a 6% reduction in my supply chain cost was because I have a product master. But I think PostNL is a great example of what happens when you go the next step on data. Yeah, because you're looking at address data, and actually it's not just PostNL. We were talking to another mail company, a postal company, where they said, okay, my addresses are data assets, but I have multiple addresses for one person. What they wanted to offer was based on the value of the packages that you get delivered. They wanted to give a priority or a qualification to the addresses. They'd say this is a more trustworthy address, because anything above 50 pounds this person gets delivered there. This is a lot of mail. So do you consider the insurance or the value of the packages that you get delivered to be a data asset? Most people wouldn't. They would say, yeah, the address is an asset. That's a data asset, but there's a second part to it which you don't even know. So the answer really is yes and no, and it's all contextual, because in a particular context you could say, yep, I know where everybody lives, I know where every building is, and I have all the addresses. So you almost have to do a look back after the outcome and reverse track the data and say, okay, that stream of data. Well, I would say that we've had 30 years of trying to say it's the data object that has the value, and it's never, ever happened. As soon as we start talking about the outcome and then backtracking, saying, in order to get this outcome we needed addresses, which historically, as you said, would have been the value; but actually it was that plus the analytics of prioritizing them for risk, and suddenly that's a lot more valuable. That outcome of, you know what? This person tends to be here. This area, people seem to see as lower risk. This is where I can therefore look at a work office for those people. It gives you more information.
I like that notion of the data swamp turning into data quality, because the context, as Sri says, is really key. If you can move data to context in real time, data in motion, as people call it these days, the buzzword, that's the value when you stumble upon it. That's where you say, whoa, I thought I had bad data. No, actually it's hanging around waiting to be used. That's potential energy, as he always says. It's the same thing with PostNL. They're moving from being a postal supplier to delivering packages. Now, you know, they have a very short window to deliver packages in. So just how do you get to a building? Do you have to go through the back gate? Do you have to call somebody to get in? Now that data becomes valuable, because otherwise all their deliveries go off the radar screen, right? Because they've just shot the schedule. I was going to say, about the quality: one great example is that we spend a lot of time on, say, process data in manufacturing. We'll clean it up before it goes into the reporting structure, which is great, and that gives you really great operational reports. There's now a hand-to-hand business of people doing the digital discovery of processes, so they can use the bad data to discover what your processes are and where your operational processes are currently breaking down. You create new processes. If I'd cleaned up the data, they wouldn't be able to do their jobs. And it's this fascinating stuff we're finding a lot with the data science pieces: the ability to get different value out of data. Chemical reactions, alchemy, it's all the interactions of the data. Now, this is interesting. I know we have a minute left, and I want to have you guys take a minute to explain Capgemini to the audience, and how you engage with the customer in the context of their progress. Where are your customers on the progress bar of these kinds of conversations? Because we have a nice conversation.
I'd love to go an hour on this; we could geek out. But the reality is they've got to run a business, right? And the tier one system integrators like Capgemini all have different kinds of differentiation. What do you guys do differently in this area of your practice? How are you engaging with your customers? And where are they on the progress bar? Are they like, whoa, you're talking gibberish to me, or are they on board? Where are they? I think we've got a bit of an attitude that we've been on this journey a lot longer than most. Like I say, in 2011 we were actually talking about data governance and big data. You don't talk about that if you haven't been doing it for a while. We were the first systems integrator to announce with Cloudera and with Pivotal, and we're a massive partner with Hortonworks. So what's interesting is that when people talk about data lakes, and some people think that stuff's new, most of our clients are now looking at the problem of having multiple data lakes, for PII reasons, for operational efficiency reasons, for budget reasons, whatever it may be. We're looking at how you collaborate beyond the firewall. So obviously we've got a continuum of customers, but a lot of our customers are going beyond the stage at which they're worrying about big data within their four walls to the stage of, how do I collaborate beyond my four walls? And this for us is the switch on governance and data. The difference between Capgemini and some of the others is that when Sri says he's the global MDM guy and the old data management guy, his team is actually in all of the countries. So he has P&L responsibility for that. When I have it for big data, my guys are in all the countries. You guys are well on the path. You're out implementing the value extraction. Oh, we're in the... I mean, it's really at the stage... Not like kicking tires.
We're not at the kicking-tires stage; that was a long way back. Yeah, 2011. Yeah, exactly. We were kicking tires in 2011. By now it's sort of... Now you're driving the Ferrari on the Autobahn, you know, about 90 miles an hour, straight and narrow. Still a lot more work to do, right? Oh, absolutely. There's always a lot more work. Things keep changing, and that's the best part of our job. It's what we do next. And that's the point for us: the reason we're in this is that it's about what's next. And I think the reason governance is changing fundamentally is this move towards global collaboration. The more you look at health exchanges and all of these things, the more people collaborate outside their four walls. That for us is the problem we want to solve next, which is why we're working on industrializing what we now consider the boring stuff, which is building a data lake and doing the internals and the ingestion and those pieces. We're not interested in putting bodies on that. It's about how you solve the next set of problems. Steve and Sri, thank you so much for joining theCUBE. Really appreciate it. Good to see you again. And welcome to theCUBE alumni club. Thank you. You made it. Great to have you. We'd love to do this again and again. I love the context. I love that you guys are on this, you know, data quality at the right time, really the right message. Certainly, we think, very relevant. So thanks for sharing your insights and the data here on theCUBE. Live streaming from San Francisco, you're watching theCUBE. We'll be right back. It's always fun to come back to theCUBE because...