From Cambridge, Massachusetts, it's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019, brought to you by SiliconANGLE Media.

Welcome back to Cambridge, Massachusetts everybody. We're here with theCUBE at the MIT Chief Data Officer Information Quality Conference. I'm Dave Vellante with my co-host Paul Gillin. Susan Wilson is here. She's the Vice President and Data Governance Leader at Informatica. Blake Andrews is the Corporate Vice President of Data Governance at New York Life. Folks, welcome to theCUBE. Thanks for coming on.

Thank you.

So Susan, interesting title, VP, Data Governance Leader, Informatica. So what are you leading at Informatica?

We're helping our customers realize their business outcomes and objectives. Prior to joining Informatica about seven years ago, I was actually a customer myself. And so oftentimes I'm working with our customers to understand where they are, where they are going and how to best help them, because we recognize data governance is more than just a tool. It's a capability that represents people, the processes, the culture, as well as the technology.

Yeah, so you walk the walk and you can empathize with what your customers are going to do. And Blake, your role as the corporate VP, but more specifically the data governance lead.

Right, so I lead the data governance capabilities and execution group at New York Life. We're focused on providing skills and tools that enable governance activities across the enterprise at the company.

How long has that function been in place?

We've been in place for about two and a half years now.

So I don't know if you guys heard Mark Ramsey this morning, the keynote, but basically he said, okay, we started with enterprise data warehouse, we went to master data management, then we kind of did this top-down enterprise data model. That all failed. So we said, all right, let's punt to governance. Here you go, guys, you fix our corporate data problem.
Now, right tool for the right job, but, and so we were kind of joking, did data governance fail? No, you always have to have data governance. It's like brushing your teeth. But so, I don't know if you heard that, but what are your thoughts on that sort of evolution that he described, the sort of failures of things like EDW to live up to expectations and then, okay guys, over to you. Is that a common theme?

It is a common theme, and what we're finding with many of our customers is that they had tried many of, if you will, the methodologies around data governance, around policies and structures. We describe this as the data 1.0 journey, which was more application-centric reporting, to data 2.0, to data warehousing, and a lot of the failed attempts, if you will, at centralizing, if you will, all of your data, to now data 3.0, where we look at the explosion of data, the volumes of data, the number of data consumers, the expectations of the chief data officer to solve the business outcomes, crashing into the scale of, I can't fit all of this into a centralized data repository, I need something that will help me scale and become more agile. And so, that message does resonate with us, but we're not saying that data warehouses don't exist, they absolutely do, for trusted data sources, but the ability to be agile, and to address many of your organization's needs, and to be able to service multiple consumers is top of mind for many of our customers.

And the mindset from 1.0 to 2.0 to 3.0 has changed, from data as a liability to now data as this massive asset.

It's value, yeah.

And the pendulum has swung, it's almost like the seesaw, where, and I'm not sure it's ever going to flip back, but it is to a certain extent, people are starting to realize, wow, we have to be careful about what we do with our data, but still, go, go, go, but what's the experience at New York Life?
I mean, you know, company that's been around for a long time, conservative, wants to make sure, you know, risk averse, obviously, but at the same time, wants to keep moving as the market moves.

Right, and we look at data governance as really an enabler and a value-add activity. We're not a governance practice for the sake of governance, we're not there to create a lot of policies and restrictions, we're there to add value and to enable innovation in our business, and really drive that execution, that efficiency.

So how do you do that? Square that circle for me, because a lot of people think, you know, when people hear security and governance and compliance, they think, oh, that stifles innovation. How do you make governance an engine of innovation?

You provide transparency around your data, so it's transparency around what does the data mean? What data assets do we have? Where can I find that? Where are my most trusted sources of data? What does the quality of that data look like? So all those things together really enable your data consumers to take that information and create new value for the company. So it's really about enabling your value creators throughout the organization.

So data is an ingredient, I can tell you where it is, I can give you some kind of rating as to the quality of that data and its usefulness, and then you can take it and do what you need to do with your specific line of business.

You said you've been at this two and a half years, so what stages have you gone through since you first began the data governance initiative?

Sure, so our first year, year and a half was really focused on building the foundations, establishing the playbook for data governance and building our processes and understanding how data governance needed to be implemented to fit New York Life and the culture of the company. The last 12 months or so has really been focused on operationalizing governance.
So we've got the foundations in place, now it's about implementing tools to further augment those capabilities and help assist our data stewards and give them a better skill set and a better tool set to do their jobs.

Are you sort of crowdsourcing the process? I mean, do you have a defined set of people who are responsible for governance or is everyone taking a role?

So it is a two-pronged approach. We do have dedicated data stewards, there are approximately 15 across various lines of business throughout the company, but we are building towards a data democratization aspect. So we want people to be self-sufficient in finding the data that they need and understanding the data, and then when they have questions, relying on our stewards as a network of subject matter experts who also have some authorizations to make changes and adapt the data as needed.

Susan, one of the challenges that we see is that a lot of the chief data officers oftentimes are not involved in some of these skunkworks AI projects. They're sort of either hidden, maybe not even hidden, but they're in the line of business, they're moving, you know, this mentality of move fast and break things. The challenge with AI is if you start operationalizing AI and you're breaking things without data quality, without data governance, you can really affect lives, we've seen it. These unintended consequences, I mean, Facebook is the obvious example and there are many, many others, but are you seeing that, how are you seeing organizations deal with that problem?
Yeah, yeah, as Blake was mentioning, oftentimes what it is about, you've got to start with transparency and you've got to start with collaborating across your lines of business, including the data scientists and including in terms of what they are doing, and actually provide that level of transparency, provide a level of collaboration. A lot of that is through the use of our technology enablers to basically go out and find where the data is and what people are using, and to be able to provide a mechanism for them to collaborate in terms of, hey, how do I get access to that? I didn't realize you were the SME for that particular component. And then also, did you realize that there is a policy associated with the data that you're managing and it can't be shared externally or with certain consumer data sets? So the objective really is around how to create a platform to ensure that anyone in your organization, whether I'm in the line of business and don't have a technical background or I'm someone who does have a technical background, can come and access and understand that information and connect with their peers.

So you're helping them discover the data. What do you do at that stage?

Oh, well, what we do at that stage is creating insights for anyone in the organization to understand it from an impact analysis perspective. So for example, if I'm going to make changes, as well as discovery, where exactly is my information? And so we have-

How do you help your customers discover that data?

Through the machine learning and artificial intelligence capabilities of, specifically, our data catalog, which allow us to do that. So we use such things as similarity-based matching, which helps us to identify it. The column doesn't have to be named for what it contains, it could be named miscellaneous text one, but with our ability to scan and discover, we can identify that that column is potentially a social security number.
It might have resided there over years of having this data, but you may not realize that it's still stored there. Our ability to identify that and report that out to the data stewards, as well as the data analysts, as well as to the privacy individuals is critical. So with that being said, then they can actually identify the appropriate policies that need to be adhered to alongside it in terms of quality, in terms of, is there something that we need to archive? So that's where we're helping our customers in that aspect.

So you can infer from the data, the metadata, and then with a fair degree of accuracy, categorize it.

Exactly. We automate that. Yeah, we've got a customer that actually ran this and they said, you know, it took three people three months to actually physically tag where all of this information existed across something like 7,000 critical data elements. And basically, after the setup and the scanning procedures, within seconds, we were able to get within 90% precision. Because again, we've dealt a lot with metadata. It's core to our artificial intelligence and machine learning, and it's core to how we built our platforms to share that metadata, to do something with that metadata. It's not just about sharing the glossary and the definition information. We also want to automate and reduce the manual burden because we recognize with that scale, manual documentation, manual cataloging and tagging, just, it doesn't work, it doesn't scale. And so that's, they're horrible at it.

So I presume you have a Chief Data Officer at New York Life, is that correct?

We have a Chief Data and Analytics Officer, yes.

Okay, and you work within that group?

Yes, that's correct. Yes, so that's in the lines of business. Originally our data governance office sat in technology and then in early 2018, we actually re-orged into the business under the Chief Data and Analytics Officer when that role was formed.
So we sit under that group along with a data solutions and governance team that includes several of our data stewards and also some others, some data engineer type roles. And then our Center for Data Science and Analytics as well, which contains a lot of our data science teams of that type.

So when thinking about some of these, what I was describing to Susan as these skunkworks projects, is the data team, the Chief Data Officer's team, involved in those projects or is it sort of a go run water through the pipes, get an MVP and then you guys come in, how does that all work?

We're working to try to centralize that function as much as we can because we do believe that there's value in the left hand knowing what the right hand is doing and those types of things. So we're trying to build those communication channels and build that network of data consumers across the organization.

It's hard, right? Because the line of business wants to move fast and you're saying, hey, we can help and they think you're going to slow them down. But in fact, you got to make the case and show the success because you're actually not going to slow them down in terms of the ultimate outcome.

Right.

And I think that's the case that you're trying to make.

And that's one of the things that we really focus on, and I think that's one of the advantages to us being embedded in the business under the CDAO role, is that we can then say our objectives are your objectives. We are here to add value and to align with what you're working on. We're not trying to slow you down or hinder you. We're really trying to bring more to the table and augment what you're already trying to achieve.

Sometimes getting that organization right means everything.

Absolutely. Absolutely.

How are you applying the governance discipline to unstructured data?

That's actually something that's a little bit further down our roadmap.
But one of the things that we have started doing is looking at our taxonomies for structured data and aligning those with the taxonomies that we're using to classify unstructured data. So that's something we're in the early stages with, so that when we get to that process of looking at more of our unstructured content, we already have a good feel that there's alignment between the way that we think about and organize those concepts.

Have you identified automation tools that can help to bring structure to that unstructured data?

Yes, we have. And there are several tools out there that we're continuing to investigate and look at. But that's one of the key things that we're trying to achieve through this process, bringing structure to unstructured content.

So the conference, first year at the conference.

Yes.

Kind of key takeaways, things that have been interesting to you, learnings.

Oh, yes. Well, there are the number of CDOs that are here and what's top of mind for them. I mean, it ranges from how do I stand up my operating model? We just had a session just about 30 minutes ago. A lot of questions around how do I set up my organization structure? How do I stand up my operating model so that I could be flexible enough to support the data scientists as well as the folks that are more traditional, with structured and trusted data? So still these things are top of mind. And because they're recognizing, the market is also changing, too. And the growing amount of expectations, not only solving business outcomes, but also regulatory compliance. Privacy is also top of mind for a lot of customers in terms of how would I get started and what's the appropriate structure and mechanism for doing so. So we're getting a lot of those types of questions as well. So the good thing is many of us have had years of experience in this space.
And the convergence of us being able to support our customers, not only in our principles around how we implement the framework, but also the technology, is really coming together very nicely.

Anything to add?

I think it's really impressive to see the level of engagement with thought leaders and decision makers in the data space. As Susan mentioned, we just got out of our session. And really by the end of it, it turned into more of an open discussion. There was just kind of this back and forth between the participants. And so it's really engaging to see that level of passion from such a distinguished group of individuals who are all kind of here to share thoughts and ideas.

Well, anytime you come to a conference and it's sort of an open forum like this, you learn a lot. When you're at MIT, it's like supercharged.

Exactly.

You feel it when you come on the campus. You feel smarter when you walk out of here.

Exactly.

Guys, thanks so much for coming on theCUBE. It was great to have you.

Thank you for having us. We appreciate it.

All right, keep it right there, everybody. Paul and I will be back with our next guest. You're watching theCUBE from MIT in Cambridge. We'll be right back.