From the SiliconANGLE Media Office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Stu Miniman. I'm Stu Miniman and this is a CUBE conversation from our Boston area studio. Got to dig in to discuss the data catalog, and to help me do that, I want to welcome to the program a first-time guest, Joe DosSantos, who's the Global Head of Data Management Strategy at Qlik. Joe, thank you so much for joining us. Good to be here, Stu. All right, so the data catalog, let's start there. People in general know what a catalog is. Well, maybe some of the millennials might not know as much as those of us that have been in the industry a little bit longer might. So start there and level-set us. So our thinking is that there are lots of data assets around and people can't get at them. And just like you might be able to go to Amazon and shop for something and go through a catalog, or go to the library and see what's available, we're trying to approximate that same kind of shopping experience for data. You should be able to see what you have. You should be able to look for things that you need. You should be able to find things you didn't even know were available to you. And then you should be able to put them into your cart in a secure way. Okay, so Joe, step one is, I've gathered the data, whether it's my data lake or whatever other oil or water analogy we want to use for gathering it. And I've usually got analytic tools and lots of things there, but this is a piece of that overall puzzle. Do I have that right? That's exactly right. If you think about what the obstacles to analytics are, there are studies out there that say that less than 1% of analytics data is actually being analyzed. So we're having trouble with the pipelines to get data into the hands of people who can do something meaningful with it. So what is meaningful? It could be data science. It could be natural language.
Maybe you have an Alexa at home and you just ask it a question, and that information is provided right back to you. So somebody wants to do something meaningful with data, but they can't get it. So step one is go retrieve it. Our Attunity solution is really about how we start to effectively build pipelines to go retrieve data from the source. The next step, though, is how do I understand that data? Cataloging isn't about just having a whole bunch of boxes on a shelf. It's being able to describe the contents of those shelves. It's being able to know that I need that thing. So if you were to go into an amazon.com experience and say, I'm going on a fishing trip, and you're looking for a canoe, it'll offer you a paddle. It'll offer you life jackets. It guides you through that experience. We want data to be the same way: a guided trip through the data that's available to you in that environment. Yeah, so metadata is something we often talk about, but this seems like even more than that. It really is. So metadata is a broad term. If you want to know about your data, you want to know where it came from. I often joke that there are three things you want to know about data: What is it? Where did it come from? And who can have access to it under what circumstances? Now those are really simple concepts, but they're really complex under the covers. What is the data? Well, is this private information? Is this personally identifiable information? Is it a tax ID? Is it a credit card? I come from TD Bank, and we were very preoccupied with the idea of someone getting data that they shouldn't. You don't want everyone running around with credit cards. How do I recognize a credit card? How do I protect a credit card? So the idea of cataloging is not just about availability for everything, it's security. So I'm going to give you an example of what happens when you walk into a pharmacy.
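Joe's question of "how do I recognize a credit card?" is worth making concrete. As an editor's sketch only, and nothing Qlik-specific, here is one minimal way a catalog could flag a column as holding card numbers: sample the values, match a digit pattern, and apply the Luhn checksum that real card numbers satisfy. All function and variable names here are hypothetical.

```python
import re

def luhn_valid(number: str) -> bool:
    """Luhn checksum check: keeps random 16-digit IDs from being flagged as cards."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 13 and checksum % 10 == 0

CARD_PATTERN = re.compile(r"^\d{13,19}$")

def classify_column(sample_values, threshold=0.8):
    """Tag a column as a credit card field if most sampled values look like one."""
    cleaned = [v.replace(" ", "").replace("-", "") for v in sample_values]
    hits = sum(1 for v in cleaned if CARD_PATTERN.match(v) and luhn_valid(v))
    return "credit_card" if cleaned and hits / len(cleaned) >= threshold else "unclassified"
```

Real catalogs use far richer detectors (tax IDs, names, addresses), but the shape is the same: classify first, so the security rules Joe describes next have something to attach to.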
If you walk into a pharmacy and you want a pack of gum or shampoo, you walk up to the shelf and you grab it. It's carefully marked in the aisles, it's described, but it's public, it's easy to get, there aren't any restrictions. If you wanted chewing tobacco or cigarettes, you would need to present somebody with an ID, you would need to show that you were of age, you would need to validate that you were authorized to have that. And if you wanted OxyContin, you'd best have a prescription. Why isn't data like that? Why don't we have rules that stipulate what kind of data belongs in what kind of category and who can have access to it? We believe that you can. So a lot of the impediments to that are about availability and visibility, but also about security. And we believe that once you've provisioned that data to a place, the next step is understanding clearly what it is and who can have access to it, so that you can provision it downstream to all the different analytic consumers that need it. Okay, yeah, data security absolutely is front and center; it's a conversation at board levels today. So the catalog, is it a security tool, or does it work with your overall policies and procedures? So you need to have a policy. One of the fascinating things at a lot of companies is that if you ask people, please give me the titles of the columns that constitute personally identifiable information, you'll get blank stares. So if you don't have a policy and you don't have a construct, you're hopelessly lost. But as soon as you write that down, you can start building rules around it. You can know who can have access to what under what circumstances. So when I was at TD, we took great care to try and figure out what the circumstances were that allowed people to do their job. If you're in marketing, you need to understand demographic information. You need to be able to distribute a marketing list that actually has people's names and addresses on it.
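Joe's pharmacy analogy maps cleanly onto tiered access rules: gum is public, cigarettes need an entitlement check, OxyContin needs an explicit approval. As a hedged sketch of that idea, with all dataset names and roles invented for illustration:

```python
from enum import Enum

class Tier(Enum):
    PUBLIC = 1       # gum and shampoo: anyone can grab it
    RESTRICTED = 2   # cigarettes: must prove an entitlement, e.g. a role
    CONTROLLED = 3   # OxyContin: needs an explicit, auditable approval

# Hypothetical catalog: each data set is tagged with a tier when it is cataloged.
CATALOG = {
    "store_locations": Tier.PUBLIC,
    "customer_demographics": Tier.RESTRICTED,
    "card_numbers": Tier.CONTROLLED,
}

def can_access(user_roles, dataset, approvals=()):
    """Enforce the tier at 'shopping' time, the way Joe describes the catalog doing."""
    tier = CATALOG[dataset]
    if tier is Tier.PUBLIC:
        return True
    if tier is Tier.RESTRICTED:
        return "analyst" in user_roles
    return dataset in approvals  # CONTROLLED: the prescription
```

The point is that the rules live in one place, so access decisions stop being one-off conversations, which is exactly the friction Joe says the policy removes.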
Do you need their credit card number? Probably not. So we started to work through these scenarios of understanding what the nature of the data was on a must-have basis. And then you don't have to ask for approval every single time. If you go to Amazon, you don't ask for approval to buy the canoe. You just know whether it's in stock, if it's available, and if it's in your area. Same thing with data. We want to remove all of the friction associated with that, because the rules are already in place. Okay, so now that I have the data, what do I do with it? Well, this is actually a really important part of our Qlik story. Qlik is not trying to lock people into a Qlik visualization scenario. Once you have data, what we're trying to say is that discovery might happen across lots of different platforms. So maybe you're a Tableau user. I don't know why, but there are Tableau users. In fact, we did use Tableau at TD. But if you want to provision data and discover things in comparable BI tools, no problem. Maybe you want to move that into a machine learning type of environment. You have TensorFlow. You have H2O libraries doing predictive modeling. You have R and Python. All of those are things that you might want to do. In fact, these days, a lot of the time people don't want analytics and visualizations. They want to ask questions. Stu, do you have an Amazon Alexa at your house? I have an Alexa and a Google Home. That's right. So you don't want a fancy visualization. You want the answer to a question. So a catalog enables that. A catalog helps you figure out where the data is that answers a question. So when you ask Alexa, what's the capital of Kansas, it's going through the databases that it has, which are neatly tagged and cataloged and organized. And it comes back with Topeka. Yeah. I didn't want to stump you there. Thank you, Joe. Boy, I think back, there are people that do ontological studies as to how to put these things together.
As a user, I'm guessing, using a tool like that, I don't want to have to figure out how to set all this up. There have got to be way better tools and things like that. Just like in the discussion of metadata, most systems today do that for me, or at least a lot of it. But how much do I as a customer customize things, and how much does it do for me? So when you and I have a conversation, we share a language. If I say, where do you live, you know that living implies a house, which implies an address, and you've made that connection. Effectively, all businesses have their own terminology and ontology in how they speak. And what we do is, if we have that ontology described to us, we will enforce those rules. We are able to then discover the data that fits into that categorization. So we need the business to define that for us. And again, a lot of this is about process and procedure. Anyone who works in technology knows that very few of the technological problems are actually about technology. They're about process and people and psychology. So what we're doing is, if someone says, I care deeply and passionately about customers, and customers have addresses, and these are the rules around them, we can then apply those rules. Imagine that governance tools are there to make laws. We're like the police: we enforce those laws at the time of shopping, in that catalog metaphor. Wow, Joe, my mind's spinning a little bit, because one of the problems, if you work for a big company, is you'd have different parts of the company that would all want the same answer, but they'd ask it in very different ways and they don't speak the same language. So does the catalog help with that? Well, it does and it doesn't. I think that we are moving to a world in which, for a lot of questions, truth is in the eye of the beholder. So if you think about a business that wants to close the books, you can't have revenue that was maybe three million, maybe four million.
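Joe's point that "the business defines the ontology and we enforce it" can be sketched very simply: the business agrees on terms and the naming patterns that map to them, and the catalog applies those rules to raw column names at catalog time. This is an editor's illustration only; the glossary terms and patterns below are hypothetical.

```python
import re

# Hypothetical business glossary: term -> column-name patterns the business agreed on.
GLOSSARY = {
    "customer_address": [r".*addr(ess)?.*", r".*street.*", r".*postal.*"],
    "customer_name":    [r".*first_?name.*", r".*last_?name.*"],
}

def tag_columns(column_names):
    """Apply the business-defined ontology to raw column names at catalog time."""
    tags = {}
    for col in column_names:
        for term, patterns in GLOSSARY.items():
            if any(re.fullmatch(p, col.lower()) for p in patterns):
                tags[col] = term  # first matching term wins in this sketch
                break
    return tags
```

Production tools combine naming rules with value profiling (like the card detector earlier), but in either case the catalog is only the police: the glossary itself has to come from the business, exactly as Joe says.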
But if you want to say, what was the effectiveness of the campaign that we ran last night? Was it more effective with women or men? Why? Anytime somebody asks a question like why, or I wonder if, these are questions that invite investigation and analysis, and we can come to the table with different representations of that data. It's not about truth. It's about how we interpret that data. So one of the peculiar and difficult things for people to wrap their arms around is that in the modern data world, with data democratization, two people can go in search of the same question and get wildly different answers. That's not bad. That's life, right? So what's the best movie that's out right now? There's no truth. It's a question of your tastes, and what you need to be able to do as we move to the democratized world is ask: what were the criteria that were used? What was the data that was used? So we need those things to be cited, but the catalog is effectively the thing that puts you in touch with the data that's available. Think about your college research project. You wrote a thesis or a paper. You were meant to draw a conclusion, and you had to go to the library and get the books that you needed. And maybe, hopefully, no one had ever combined all those ideas from those books to create the conclusion that you did. That's what we're trying to do every single day in the businesses of the world in 2019. Yeah, it's a little scary. We know in the world of science that most things don't come down to a binary answer. There's the data to prove it, and what we understand today might change once we add new data to it. I would love to bring in some customer examples as to what they're doing and how this impacts it, and boy, I wish it brought more certainty into our world. Absolutely. So I come from TD Bank, where I was the Vice President of Information Management Technology, and we used Data Catalyst to catalog a very large data lake.
So we had a Hadoop data lake that was about six petabytes, with about 200 different applications in it, and what we were able to do was allow self-service to the data assets in that lake. So imagine you're just looking for data, and instead of having to call somebody or get a pipeline built and spend the next six months getting the data, you go to a portal and you grab it. So what we were able to do was make that very simple and reduce that time. We usually reckon that it takes about 50% of your time in an analysis context to find the data and make it useful. What if that was all done for you? So we created a shopping experience for that at an enterprise level. What was the goal? Well, at TD, we were all about legendary customer experience. So what we found very important were customer interactions and experiences: their transactions, their web clicks, their behavioral patterns. And if you think about it, what any company is looking to do is catch a customer in the act of deciding. And what are those critical things that people decide? In a bank, it might be when to buy a house, when you need mortgages and potentially loans and insurance. For a healthcare company, it might be when people change jobs. For a hospital, it might be when the weather changes. And everybody's looking for an advantage there. And you can only get that advantage if you're creative about recognizing those moments through analytics and then acting in real time, with streaming, to do something about that moment. All right, so Joe, one of the questions I have is, is there an aspect of time when you go into this? Because I understand if I ask questions based on the data that I have available today, but if I'd asked two weeks before that, it would be somewhat different data. And if I kept watching it, it would keep changing. So I've got certain apps I use; it's like, okay, when's the best time to buy a ticket? When's the best time to do that? How does that play in?
So there are two different dimensions to this. The first is what we call algorithmic decay. If you're going to try to develop an algorithm, you don't want the data shifting under your feet as you do it, because all of a sudden your results will change and you won't know if you're right. And the sad reality is that most humans are not very original. So if I look at your behavior for the past 10 years, and if I look at it for the past 20, it won't necessarily be different from somebody else's. So what we're looking to do is catch massive patterns. What's the power of big data? To look at a lot of patterns and figure out the repeatability in those patterns. At that point, you're not really looking for the data to change. Then you go to score it. And this is where the data changes all the time. So think about big data as looking at a billion rows and figuring out what's going on. The next thing we traditionally call fast data, which is: now, based on that algorithm, this event just happened; what should I do? That data is changing under your feet regularly. You're looking to stream that data, maybe with a change data capture tool like Attunity. You're looking to get it into the hands of people and applications to make decisions really quickly. Now, what happens over time is that people's behaviors change. Only old people are on Facebook now; you know this, right? So the demographics change, and the things that used to be very predictive fail to be. And there has to be a capability in an industry and in an enterprise to deal with those algorithms as they start to decay, and to replace them with something fresher. All right, Joe, how do things like governance and compliance fit into this? So governance is really at the core of the catalog. You really need to understand what the rules are if you want to have an effective catalog. We don't believe that every single person in a data-democratized world should have access to every single data element.
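The algorithmic decay Joe describes — a model that was predictive when it was trained failing as behavior shifts — is usually caught by monitoring scored results against a baseline. Here is a deliberately minimal, hypothetical sketch of that check; real monitoring would track more than a single accuracy number.

```python
def decayed(baseline_accuracy, recent_accuracies, tolerance=0.05):
    """Flag a model for retraining when its recent accuracy drifts below baseline.

    baseline_accuracy: accuracy measured when the model was deployed.
    recent_accuracies: rolling window of scored-vs-actual accuracy figures.
    tolerance: how much drift we accept before calling the model stale.
    """
    if not recent_accuracies:
        return False  # nothing scored yet, nothing to judge
    recent = sum(recent_accuracies) / len(recent_accuracies)
    return (baseline_accuracy - recent) > tolerance
```

When the flag trips, the enterprise capability Joe calls for kicks in: retrain on fresher data and replace the decayed algorithm, rather than letting it quietly mispredict.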
So you need to understand: what is this data? How should I protect it? How should I think about the overall protection and use of this data? And this is a really important governance principle: figuring out who can have access to these data sets under what circumstances. Again, nothing to do with technology, but the catalog should really enforce your policy. A really good catalog should help enforce the policies that you're coming up with about who should have access to that data under what circumstances. Okay, so Joe, that's a pretty powerful tool. How do customers measure that they're getting adoption, that they're getting the results they were hoping for when they rolled this out? So no one ever woke up one day and said, boy, would it be great if I stockpiled petabytes of data, right? At the end of the day, we're looking... I know some storage companies that wish customers would say that. But at the end of the day, you have data for its analytics value. And what is analytics value? Maybe it's about a predictive algorithm. Maybe it's about a visualization. Maybe it's about a KPI for your executive suite. If you don't know, you shouldn't start. So what we want to do is think about use cases that make a difference to an enterprise. At TD, that was fundamentally about legendary customer experience: offering the next best action to really delight the customer. At Sun Life, it was about making sure they had an understanding of their consumers from a customer support perspective. At one of our healthcare customers, it was about faster discovery of drugs. So if you understand what those are, you work from the analytical outcome back to the data that supports it, and that's how you get started. How can I get the data sets that I'm pretty sure are going to move the needle, and then build from there to be able to answer more and more complex questions? Well, great.
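Joe's advice to work backward from the analytical outcome to the data that supports it can be expressed as a simple gap check: name the use case, name the data sets it needs, and compare that against what the catalog already holds. The use cases and data set names below are invented for illustration; this is a sketch of the reasoning, not any vendor's feature.

```python
# Hypothetical mapping from analytic use cases to the data sets they require.
USE_CASES = {
    "next_best_action":   {"transactions", "web_clicks", "demographics"},
    "customer_attrition": {"transactions", "support_tickets"},
}

def missing_data(use_case, cataloged):
    """Work backward from the outcome: which required data sets aren't cataloged yet?"""
    return USE_CASES[use_case] - set(cataloged)
```

The answer tells you exactly which pipelines to build first, so you grow the lake use case by use case instead of stockpiling petabytes.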
Those are some pretty powerful use cases. I remember back in the early Hadoop days, it was like, let's not have the best minds of our time figuring out how we can get better ad clicks, right? That's right. Yeah, it's much better these days. Well, effectively, what Hadoop really allows you to do, what big data really allows you to do, is answer questions more comprehensively. There was a time when cost would prevent you from being able to look at 10 years' worth of history. Those cost impediments are gone. So your analytics can be much better as a result. You're looking at a broader section of data, and you can do much richer what-if analysis. And I think the real secret of any good analytics is encouraging the what-if kinds of questions. So in a data-democratized world, you want to encourage people to say, I wonder if this is true, I wonder if this happened, and to have the data to support their question. And people talk a lot about failing fast, glibly. What does that mean? Well, I wonder if, right now, women in Montana in summertime buy more sunglasses. Where's the data that can answer that question? I want it to come to me quickly. And I want to be able to say in five minutes, boy, Joe, that was really stupid, and I failed, and I failed fast. But it wasn't because I spent the next six weeks looking for the data assets; it's because I had the data, got to the analysis really quickly, and then moved on to something else. And the people that can churn through those questions fastest will be the ones that win. Very cool. I'm one of those people. I love swimming in the data, always seeing what you can learn. Customers that want to get started, what do you recommend? What are some of the first steps? So the first thing is really about critical use case identification. Again, no one wants to stockpile data. So we need to start to think about how the data is going to affect an outcome. And think about that user outcome.
Is it someone asking a question of an application in natural language to drive a certain behavior? Is it a real-time decision? What is the thing that you want to get good at? I've mentioned that TD wanted to be good at customer experience and offer development. If you think about what Target did, there's a notorious story about them being able to predict pregnancy, because they recognized that there was an important moment, a behavioral change in consumers that would change how they buy overall. What's important to you? What data might be relevant for that? Anchor it there. Start small. Start to operationalize the pipes that get you the data you need, and encourage a lot of experimentation with the data assets you've got. You don't need to create petabytes of data. Create the data sets that matter and then grow from use case to use case. One of our customers, Sun Life, did a wonderful job of articulating seven or eight key use cases that would matter, and built their lake accordingly. First, it was about customer behavior. Then it was about employee behavior. If you start to think about your customers and what they care about: there's a person out there that cares about customer attrition. There's a person out there that cares about employee attrition. There's a person out there that cares about the cost of delivery of goods. Let's figure out what they need and how to use analytics to drive that, and then we can start to get smart about the data assets that can really make that analytics explode. All right, well, Joe, I really appreciate all the updates on catalogs there. Data is at the center of digital transformation for so many customers, and you illuminated some key points there. Happy to be here. All right, thank you so much for watching theCUBE. I'm Stu Miniman.