From Midtown Manhattan, it's theCUBE. Covering Big Data, New York City, 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors.

Okay, welcome back everyone. Live here in New York City, it's theCUBE's presentation of Big Data NYC, our fifth year doing this event, in conjunction with Strata Data, formerly Strata + Hadoop, formerly Strata Conference, formerly Hadoop World. We've been there from the beginning, eight years covering the Hadoop ecosystem, now Big Data. This is theCUBE. I'm John Furrier. Our next guest is Santhosh Mahendiran, who's the Global Head of Technology, Analytics at Standard Chartered Bank. A practitioner in the field here, getting the data, checking out the scene, giving a presentation on your journey with data at a bank, a big financial institution, which is obviously an adopter. Welcome to theCUBE. Thank you very much. Thank you.

So we always want to know what the practitioners are doing, because at the end of the day there are a lot of vendors selling stuff here, and everyone's got their story. At the end of the day, you've got to implement. That's right. One of the themes is data democratization, which sounds warm and fuzzy: collaborating with data. It's all good stuff, you feel good, it's moving to the future. But at the end of the day, it's got to have business value. That's right. And as you look at that, how do you look at the business value? Because you want to be on the bleeding edge. You want to provide value and get that edge operationally. That's right. Where's the value in data democratization? How did you guys roll this out? Share your story.

Okay, so let me start with the journey first, before I come to the value part of it, right? Data democratization is an outcome, but the journey is something that we started three years back. So what did we do? We had some guiding principles to start the journey. The first was to say that we believed in the three S's, which is speed and scale, and it should be really, really flexible and super fast. One of the challenges we had was that our historical data warehouses were becoming entirely redundant. And why was that? Because they were RDBMS-centric and extremely disparate, so we weren't able to scale up to meet the demands of managing huge chunks of data. So the first step we took was to re-pivot and say, okay, let's embrace Hadoop. And what do we mean by embracing? It's not just putting in a data lake; we said that all our data will land in the data lake. This journey started in 2015, and we now have close to 80% of the bank's data in the lake, as end-of-day data right now. The data flows in on a daily basis and we have consumers who feed off it. Now coming to your question about...

So the data lake is working. The data lake is working, up and running. Up and running. So it's basically a batch model, everything lands in the lake? Yeah, so it's not real time, it is end-of-day. There is some data that is real time, but the data lake is not entirely real time, that I have to tell you. But one part is that the data lake is working.
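As a rough illustration of the landing pattern described here (all data lands in the lake as end-of-day batches, and consumers feed off it), below is a minimal PySpark sketch. It is a hedged example only: the source system, table name, JDBC URL, paths, and credentials are hypothetical, and this is not Standard Chartered's actual pipeline.

```python
# Minimal sketch of an end-of-day batch landing job: a source system's daily
# extract is pulled from an upstream RDBMS and written into the lake as
# Parquet, partitioned by business date, for downstream consumers to feed off.
# All names, URLs, and credentials below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("eod-landing").getOrCreate()

business_date = "2017-09-27"  # normally injected by the batch scheduler

# Read the end-of-day extract from a hypothetical core-banking database.
trades = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//core-banking:1521/PROD")
          .option("dbtable", "TRADES_EOD")
          .option("user", "etl_user")
          .option("password", "***")
          .load())

# Land it immutably in the lake, partitioned by date so any day's load can be
# reprocessed cheaply and consumers can read "as of" a business date.
(trades.withColumn("business_date", F.lit(business_date))
       .write.mode("append")
       .partitionBy("business_date")
       .parquet("hdfs:///lake/raw/core_banking/trades"))
```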
The second part of your question is, how do you actually monetize it? Are you actually getting some value out of it? And I think that's where tools like Paxata have enabled us to accelerate this journey. So we call it data democratization. The best part is, it's not about having the data; we want the business users to actually use the data.

Typically, data has always been either delayed or denied to end users in most cases. We had end users waiting for the data, but they didn't get access to it, primarily because the size of the data was too huge and it wasn't flexible enough to be shared. So how do tools like Paxata and the data lake help us? What we did with data democratization is basically to say, hey, we'll get end users access to the data first, in a fast manner, in a self-service manner, and in a way that gives operational assurance around the data. So you don't hold the data and say, you're going to get a subset of data to play with. We'll give you the entire set of data, and we'll give you the right tools to play with it. Most importantly, from an IT perspective, we'll be able to govern it. That's the key to democratization. It's not about just giving them a tool and all the data and saying, go figure it out. It's about ensuring that, okay, you've got the tools, you've got the data, but we'll also govern it, so that we obviously have control over what they're doing.

So now... You govern it; they don't have to get involved in the governance. No, they don't need to, yeah, they have access. So governance works both ways. We establish the boundaries. Look at it as a referee saying, okay, there are guidelines that you don't cross, and within the data sets that people have access to, you can further set rules.
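The "referee" model lends itself to a small sketch. The one below is a hedged illustration in plain Python of governed self-service access, with column-level boundaries plus row-level rules applied before a user sees any data. The roles, dataset names, and policy shape are assumptions for illustration, not Standard Chartered's or Paxata's actual governance model.

```python
# A minimal sketch of the "referee" model: IT sets the boundaries (who may
# see which dataset, which columns, which rows), and every self-service read
# passes through those rules before a user sees any data.
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

@dataclass
class Policy:
    allowed_columns: Set[str]                               # column boundary
    row_filter: Callable[[dict], bool] = lambda row: True   # row-level rule

# Hypothetical policies keyed by (role, dataset).
POLICIES: Dict[Tuple[str, str], Policy] = {
    # AML analysts see flagged transactions only, and no customer names.
    ("aml-analyst", "transactions"): Policy(
        allowed_columns={"txn_id", "amount", "country", "flagged"},
        row_filter=lambda row: row["flagged"],
    ),
}

def governed_read(role: str, dataset: str, rows: List[dict]) -> List[dict]:
    """Return only the rows and columns this role is entitled to see."""
    policy = POLICIES.get((role, dataset))
    if policy is None:
        raise PermissionError(f"{role!r} has no access to {dataset!r}")
    return [
        {col: val for col, val in row.items() if col in policy.allowed_columns}
        for row in rows
        if policy.row_filter(row)
    ]

rows = [
    {"txn_id": 1, "amount": 9500, "country": "SG", "flagged": True,  "name": "A"},
    {"txn_id": 2, "amount": 120,  "country": "US", "flagged": False, "name": "B"},
]
print(governed_read("aml-analyst", "transactions", rows))
# -> [{'txn_id': 1, 'amount': 9500, 'country': 'SG', 'flagged': True}]
```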
Now, coming back to specific use cases, I can talk about two that actually helped us move the needle. The first is stress testing. Being a financial institution, we typically have to report various numbers to our regulators, et cetera, and the turnaround time was extremely long. These kinds of stress tests typically involved taking huge amounts... What were the turnaround times? Normally two to three weeks, in some cases a month. We were able to narrow it down to days. What we essentially did: as with any stress testing or reporting, it involved taking huge amounts of data, crunching them, running some models, and then showing the output. Basically, a number of transformations were involved. Earlier, you couldn't even access the entire data set. So that we solved first. Check, that was a good step one. That was step one. But was there automation involved in that, the Paxata piece? Yeah, so I won't say it was fully automated end to end, but there was definitely automation, given the fact that now you've got Paxata to work off the data, rather than someone extracting the data and then going off and figuring out what needs to be done. The ability to work off the entire data set was a big plus. So yeah, stress testing: bringing down the cycle time.

The second use case I can talk about was anti-money laundering, in our financial crime compliance space. We had processes that took time to report, given the clunkiness of the various handoffs we needed to do. But again, empowering the users, giving the tool to them and saying, hey... How about know your user? Because for anti-money laundering, you need to have the know-your-user piece. That's all set there too? Yeah, yeah, yeah. So the good part is, know the user and know your customer, the KYC part, is all set.

But the key part is making sure that the end users are able to access the data much earlier in the life cycle and are able to play with it. So in the case of anti-money laundering, again, a matter of three to four weeks was shortened down to a matter of days, by giving them tools like Paxata, again in a structured manner that we are able to govern. You control this, so you knew what you were doing, but you let their tools do the job. Correct.

So look at it this way. Typically, the data journey has always been IT-led; it has never been business-led. If you look at the generations, what happens is: you source the data, which is IT-led; then you model the data, which is IT-led; then you prepare and massage the data, which is again IT-led; and then you have tools on top of it, which is again IT-led. So the end users get it only after the fourth stage. Now look at how the generations have changed. Of all these life cycle stages, apart from sourcing the data, which is typically an IT concern, the rest need to be done by the actual business users. And that's what we did. That's the progression of the generations, and we are now in the third generation, as I call it, where our role is just to source the data and say, yeah, we'll govern it, and the preparation is yours.

It really is an operating system. And we were talking with Aaron from Alation, the co-founder. We were using the analogy of a car, how this show was like a car show, an engine show: what's in the engine, the technology? And then it evolved every year. First we talked about the engines, then the cars, and now we're talking about the driver experience. That's right. So at the end of the day, you just want to drive. You don't really care what's under the hood. You do, but you don't. But yeah, there are people who do care what's under the hood, so you can have the best of both worlds. You've got the engine, which is the infrastructure, but ultimately you, on the business side, just want to drive, right? That's kind of what you're getting at here. That's right. So it's time to market and the speed to empower the users to play around with the data, rather than IT churning the data and confining access to it. That's a thing of the past. We want more users to have faster access to data, but at the same time we govern it in a seamless manner. The word governance is still important, because it's not about just seeing the data. And seamless is key. Seamless is key. Because if you have democratization of data, you're implying that it's community-oriented, meaning it's available. That's right. With the access privileges all transparent, or abstracted away from the user. Absolutely.

Okay, so here's the question I want to ask you. There's been talk, and I've been saying it for years, going back to 2012, that an abstraction layer, a data layer, will evolve, and that'll be the real key. And here at this show I heard things like "intelligent information fabric that is business-consumer-friendly." It's a mouthful, but intelligent information fabric, in essence, is talking about an abstraction layer. That's right. One that doesn't really compromise anything, but gives some enablement, creates some enabling value for software. How do you see that?
So as the word suggests, the earlier model tried to build something for the end users, but it wasn't end-user-friendly. Meaning to say, let me just give you a simple example. You had a data model that existed. Historically, the way we have approached using data is to say, hey, I've got a model, let's fit our data into this model, without actually asking whether the model serves the purpose. You kind of abstracted the model to a higher level. The whole point about intelligent data is this; I'll give you a very simple analogy. Take zip codes. A zip code in the US is very different from a zip code in India, which is very different from a zip code in Singapore. So what if I had the ability, as my data comes in, to say: hey, I know it's a zip code, but this zip code belongs to the US, this zip code belongs to Singapore, and this zip code belongs to India. And more importantly, if I can rev it up a notch further and say that this one belongs to India and this zip code is valid, look at where I'm going with intelligence then. Whereas if you look at the earlier model, it would just say, yeah, this is a placeholder for a zip code. No, that makes sense. But what do I do with it? Yeah, in a relational database model that's just a field in a schema; you're taking it and enriching it, creating value out of it. Precisely. So what I'm doing is accelerating that adoption. I'm making it simpler for users to understand what the data is. As a user, I don't need to figure out: I've got a zip code, now is it a Singapore zip code, an India zip code, or what?
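The zip code analogy lends itself to a concrete sketch. The snippet below is not Paxata's implementation, just a hedged illustration in plain Python of what that semantic typing means: infer which country's format a value matches and whether it is structurally valid. The regex patterns are simplified assumptions; real validation would use postal reference data.

```python
# A minimal sketch of "intelligent zip codes": instead of a schema that only
# says "this column is a zip code", the pipeline infers which country's format
# a value matches and whether it is structurally valid. Patterns simplified.
import re

POSTAL_PATTERNS = {
    "US":        re.compile(r"^\d{5}(-\d{4})?$"),  # 12345 or 12345-6789
    "India":     re.compile(r"^[1-9]\d{5}$"),      # 6 digits, not starting 0
    "Singapore": re.compile(r"^\d{6}$"),           # 6 digits
}

def classify_postal_code(value: str) -> dict:
    """Return every country whose postal-code format the value matches."""
    value = value.strip()
    matches = [c for c, pat in POSTAL_PATTERNS.items() if pat.match(value)]
    return {"value": value, "candidates": matches, "valid": bool(matches)}

for code in ["94105", "560001", "018956", "9410"]:
    print(classify_postal_code(code))
# 94105  -> US                          560001 -> India, Singapore (format
# 018956 -> Singapore                             alone can be ambiguous)
# 9410   -> no match, flagged invalid
```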
Okay, so all this automation, and Paxata's got a role in that; we'll come back to the Paxata question in a second, I do want to drill down on that. But the big thing that I've been seeing at the show, and again, Dave Vellante, my partner and co-CEO of SiliconANGLE, we talk about this all the time. He's less bullish on Hadoop than I am, although I love Hadoop. I think it's great, but it's not the end-all be-all; it's a great use case. We were critical early on, and the thing we were critical of was that too much time was being spent on the engine and how things were built, not on the business value, so there was a low period in the business where it was just too costly. The total cost of ownership was a huge, huge problem. So now, today, how did you deal with that? And are you measuring the TCO, the total cost of ownership? Because at the end of the day it's time to value: can you be up and running in 90 days with value, and can you continue to do that? And then, what's the overall cost to get there? Thoughts?

So look, I think TCO always underpins any technology investment. If someone said they were making a technology investment without thinking about TCO, I don't think they'd be a good technology leader. So TCO is obviously a driving factor, but TCO has multiple components. One is the TCO of the solution. The other aspect is the TCO relative to the value I'm going to get out of the system. From an implementation perspective, what I look at as TCO is my whole ecosystem: my hardware, my software. You spoke about Hadoop, you spoke about RDBMS; is Hadoop cheaper, et cetera? I don't want to get into the debate of cheaper or not, but what I know is that the ecosystem is becoming much, much cheaper than before. And when I talk about the ecosystem, I'm talking about RDBMS tools, I'm talking about Hadoop, I'm talking about BI tools, I'm talking about governance, I'm talking about this whole framework becoming cheaper. And it's underpinned by the fact that hardware is also becoming cheaper. So the reality is that all components in the whole ecosystem are becoming cheaper. And given the fact that software is also becoming more open source, and people are open to using open source software, the whole question of TCO becomes much more pertinent.

Now coming to your point, do we measure it regularly? I think the honest answer is that we aren't doing a good job of measuring it that well, but we do have it as one of the criteria for measuring the success of a project. The way we do it is through our implementation costs. At the time of writing our PEDs, as we call them, the project execution documents, we talk about costs. We ask: what's the implementation cost? What are the business cases that will be an outcome of this? I'll give you the example of our anti-money laundering work. I told you we reduced our cycle time from a few weeks to a few days. That in turn means the number of people involved in the whole process goes down; you're reducing the overheads and the operational folks involved in it. So that itself tells you how much we're able to save.
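To make that operational-savings point concrete, here is a back-of-envelope calculation in the spirit of what's described: shorter cycle times free up analyst-days each reporting cycle. Every number in it is an illustrative assumption, not a figure from the bank.

```python
# Back-of-envelope sketch of the cycle-time saving cited for AML reporting
# (roughly three weeks down to days). Headcount, cost, and day counts are
# illustrative assumptions only.
analysts = 6                  # people tied up per reporting cycle (assumed)
daily_cost = 800              # fully loaded cost per analyst-day, USD (assumed)
cycles_per_year = 12          # monthly reporting cycles (assumed)

before_days, after_days = 15, 3
saving = (before_days - after_days) * analysts * daily_cost * cycles_per_year
print(f"Indicative annual saving: ${saving:,}")  # -> $691,200
```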
So yeah, TCO is definitely there. And you're mindful of it, this is what you look at, it's key, right? TCO is something that's on your radar, 100%. You evaluate it in your deals, right? Yes, we do.

Paxata: what's so great about Paxata? Obviously you've had success with them; you're a customer. I mean, what's the deal? Was it the tech? Was it the automation? The team? What was the key thing that got you engaged with them? Specifically, why Paxata? Yeah, so look, I think with a partnership, there can't be just one ingredient that makes it successful; there are multiple ingredients. We were one of the earliest adopters of Paxata. Given that we are a bank, with multiple different systems and a lot of manual processing involved, we saw Paxata as a good fit to govern these processes while ensuring that users don't lose their experience. So the first thing we liked about Paxata was the simplicity, the look and feel of the tool. Simplicity was a big point. The second is scale: the fact that it can take in millions of rows. It's not about working off a sample of the data; it can work on the entire dataset. That's very key for us. The third is that it leverages our ecosystem. It's not about saying, okay, give me this data and let me go figure out what to do with it; Paxata works off the data lake. So the fact that it can leverage the lake that we built, and the fact that it's a simple self-service preparation tool that doesn't require a lot of time to bootstrap or to induct new users, makes it usable. It's extremely user-friendly and usable in a very short period of time. And that helped with the journey. That really helped with the journey.

Santhosh, thanks so much for sharing. Santhosh Mahendiran, who's the Global Head of Technology, Analytics at Standard Chartered Bank. Again, financial services, always a great early adopter, and you've got success under your belt. Congratulations. Data democratization is huge, and again, it's the ecosystem. You've got all that anti-money laundering to figure out. You've got to get those reports out. A lot of heavy lifting. That's right. That's right. So thanks so much for sharing your story. Thank you very much. This is theCUBE, more coverage after this short break. I'm John Furrier, stay tuned. More live coverage in New York City. This is theCUBE.