From the campus of MIT in Cambridge, Massachusetts, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium. Now, here are your hosts, Stu Miniman and Paul Gillin. We're back. This is theCUBE, live at the MIT CDOIQ Symposium here in Cambridge, Massachusetts, the first of two days of live streaming. We're joined now by Barbara Latulippe, who is the Chief Data Governance Officer at EMC. I can't say I've run into a lot of people with the Chief Data Governance Officer title. How does that relate to the Chief Data Officer, or are they one and the same? I think they're virtually one and the same. By default, we cover master data management, data quality, and data governance, and try to drive the value of the data for the business. I'd say the reason the title isn't Chief Data Officer is that analytics is still managed by a lot of the analytics officers, and the same goes for security: we have a global security office. We have a strong partnership with them, but I really view us as an enabler to those teams, and we have to have that strong partnership. That's why our scope right now is data governance. I think we're maturing toward building out a formal Chief Data Officer role and bringing some of that together. I should point out that you're here as a practitioner. You're not on the product side; you're actually working with data within EMC, with the strategic use of data within EMC. So where do you sit in the org chart? Well, this is an interesting question. I started at EMC in IT. I worked with the Office of Innovation and Architecture, so more of a traditional enterprise architecture role, laying out a roadmap and driving data governance and information management practices. And really, I'm a strong advocate that data governance belongs in the business. They are the closest to how data is used in the business process.
So I think IT organizations really see the need up front, especially on data integration and data migration projects, to stand up a governance function if one doesn't already exist in the organization. But at my last three companies, I was very successful in handing it off to the business. And I feel it has much more success when it's business-driven, because then you can tie it to business value cases, drive it to performance metrics, and align it with the strategy the business is trying to achieve. Yeah, so Barbara, we've heard over the last few years of this show: is the CDO a separate role? Does it report to the CEO? Is it a threat to the CIO? Can IT really innovate? I'm curious about your thoughts on that whole dynamic. I think ideally the CDO role should be at an executive level, right? Because we've all heard it at this conference, and I've personally experienced it: unless you have leadership buy-in and executive sponsorship, it's really hard to drive data governance initiatives. It's really a culture change, right? You're trying to drive ownership around critical data elements. You're trying to drive accountability. And more importantly for us, you're trying to drive metrics, right? We also went live with the data lake, so we're running quite a few big data value cases on the lake. I think you really need to have that at the business executive level, with full leadership buy-in from the top down. Yeah, could you help us unpack that? You talked about the data lake, traditional data analytics, and big data. How do those interact with the whole governance discussion you're having? So on the journey at EMC, we started with traditional data governance. I'd say it was more focused around SAP or ERP implementations. Then we matured through data quality initiatives and then MDM hubs with what we call data services.
And then the executive sponsor in the business, who's the president of Global Services, asked me: can you drive governance for the lake? I think the biggest challenge, certainly for a data science team, is: how can I find my data in the lake? So we put together a framework and operating model to enable the business to understand what data is available to use in the lake. I think the big part is putting business semantics around the metadata and providing transparency so the organization can consume the data and drive those big data value cases. Now, in the days of structured data, we used a data dictionary to help people understand what data was there and what it was. But these days data is pouring in from all kinds of different places, structured and unstructured. So how do you provide that cataloging function when you don't really know what data is coming in? It's been very interesting, and I'm very passionate about metadata. Metadata is also a generic term. I can tell you that on the IT side, when we talk metadata, they're talking very much about the technical characteristics of a table. On the business side, it's what we're trying to drive with our catalog: we brought up what's called an information marketplace. That is a formal catalog where we're crowdsourcing from our business users, asking them to put in information about how the business uses that data. So now we're truly able to marry up the business metadata with what we consider the technical metadata. Upon ingestion of data into the data lake, users follow our formal process, and, as I mentioned, we have brought up several tools. We use a Collibra catalog integrated with Informatica Data Quality, which gives us the portal users go to when they request ingestion of a new data set. We ensure it's governed, and in addition, IT starts the provisioning process.
I like to call it my eBay experience: if you have a good catalog, you can search and find the data you need to consume. The delivery of that data is then done by a different organization. Together we classify the data and make sure the right security and authorization access is in place around it. Just liberating the data in the lake doesn't mean everyone has access to it, either, right? Yeah, well, one of the comments in the morning keynote that had everybody chuckling was: it's 9 a.m., do you know where your data is? Do you know who owns it? So who owns the data inside the organization now? The business definitely owns the data. What I always find interesting is that I don't think anyone in an organization is going to put their hand up and say, I'm a data steward, right? And with governance, I'm always asked, is this about risk and compliance? I like to say no, we're focused on data quality and ownership, right? The business semantics. So right now we are mobilizing the data stewards; I like to call them knowledge engineers. I'm shifting away from, I'd say, data governance toward information value, even trying to rebrand us as an information value office. I'm also reaching out to what I call the business trustees, who are really the data owners: the people you can trust who know the most about the information. And we're enabling and mobilizing that data stewardship workforce in the business. We're very much a federated model. We have a data governance office, and it reports to Carolyn Muse, Vice President of Total Customer Experience. But the actual tactical work and operations, I'd say, still reside in the business. We connect the right people together to resolve issues. You're talking today, and your session name mentions the concept of value, of putting a value on data. How do you do that? Yeah, great question.
We're also working with our CTO office, with Steve Todd, and we're looking at algorithms for how you can weight how important the data is and the payback it has to the business. Internally, we're also scanning to see which data sets are most commonly used, so we can govern those, make sure they're available, and even suggest to new data science projects that they might want access to those high-value data sets. So we're tackling it from two perspectives: algorithms for what the data can pay back to the business, and deciding what to govern. We can't govern everything in a data lake, so we want to focus on what we call the critical data elements, or what we consider high-value data sets, to enable the business. Now, when you say valuation, are you talking about an actual dollar value, or is this a valuation relative to other types of data? Right, I think we're really trying to drive an actual dollar valuation. We're really trying to understand whether you can place an actual dollar value on data and how you can relate that back to the business strategy. All right, so I'm curious how you sort through what can be counterbalancing needs. You've got security and you've got accessibility. When you look at data overall, how do you balance those opposing needs? That has been challenging for every group, right? Especially when you go to a data lake, and certainly with more focus on big data. We have a lot of big data sets coming in, for instance dial-home data, which is unstructured. I know marketing certainly looks at Twitter data, and from a customer experience standpoint, we want to understand what our customers are saying about EMC and try to make that experience better, right? So our governance councils, I think, have a nice blend of, I'm going to say, privacy and federal regulations, as well as legal and security.
So as we move forward, we can tag those assets in a relevant, meaningful way to make sure we understand how they can be used across the business processes. All right, and what about data lifecycle? How does that fit into the whole discussion? Certainly we want to keep the data relevant, right? With our data lake, we're trying to go through that whole process: how much is it being consumed? What is the timeliness of the data coming into the lake? And then certainly offloading data that's not needed. Interestingly, we started up an analytics innovation leadership team, as well as more specific data lake governance councils, that help set some of these policies: when do you onboard data, with what we call data sharing agreements, and when do you offload data from the lake? You talked about owning information. It's hard to get people to take ownership of information. It can be a dirty job. How do you convince people that the information is theirs to take care of? Yeah, and interestingly enough, I think there are two sides to that coin, right? If you're a business owner, you also run into the issue of people not wanting to share their data, right? The data lake doesn't, I'd say, break down some of those functional silos. If you're a business owner, ultimately you have the ability to say who can have access to the data. That's why we're trying to move more toward the term trustee. Within EMC, we say information is a corporate asset, right? It's owned by EMC and should be exposed as relevant, with access aligned to the business projects, I would say. What we do is leverage our executive steering committee. I'll go to the vice president level and ask them to support naming, say, a process owner. Right now we're going through a large initiative to identify who owns quote-to-cash, and then trying to align the business functions with that to drive data ownership.
Within our information marketplace catalog, we actually assign a data steward's name and a business trustee's name, tagging them to each of the critical data elements. So Barbara, you've been involved in this through a few companies and a few roles, and you're familiar with the community. I want to give you the last word: what are some of the big challenges you're seeing? What progress that we've made are you excited about? I think the challenge certainly isn't the technology, right? It's all around, I would say, engaging people and changing the culture, which does need strong executive sponsorship. It's about the ability to bring several groups together and drive to consensus, right? That also takes a different skill set. And what's really key to keep your project going, I think, is metrics, right? You have to measure and monitor. We've had a lot of success by showing good data quality, and we always drive process improvement, so we partner with the Lean Six Sigma team. We're able to show tangible value to the business, both in cost and in opportunity, by measuring good data quality. Well, you're clearly having success. Barbara was recognized as one of Computerworld's Premier 100 Technology Leaders for 2016. That's quite an honor. Thank you, thank you. Thanks so much for joining us here today. It's been a pleasure. We'll be right back.