And here we go. Hello and welcome, my name is Shannon Kemp and I'm the Chief Digital Manager of DATAVERSITY. We'd like to thank you for joining the current installment of the monthly DATAVERSITY webinar series, Real-World Data Governance with Bob Seiner. Today Bob will be discussing how glossaries, dictionaries, and catalogs result in data governance. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so. Just click the chat icon in the bottom middle of your screen for that feature. And for questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag RWDG. And if you'd like to engage more with Bob and continue the conversation after the webinar, you can go to dataversitycommunity.dataversity.net. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now, let me introduce to you our speaker for the series, Bob Seiner. Bob is president and principal of KIK Consulting and Educational Services and the publisher of The Data Administration Newsletter, TDAN.com. Bob has been a recipient of the DAMA Professional Award for significant and demonstrable contributions to the data management industry. Bob specializes in non-invasive data governance, data stewardship, and metadata management solutions. And with that, I will give the floor to Bob to get today's webinar started. Hello and welcome. Hi, Shannon. Hi, everybody. Thank you so much, everybody, for taking time out of your schedule to sit with us on this webinar today. 
It's an important topic that I know a lot of people are interested in, and I'm looking forward to sharing my thoughts with you regarding how building business glossaries, data dictionaries, and data catalogs can actually result in data governance. A lot of organizations think of those as data governance activities, and the fact that you may have already built glossaries, dictionaries, and catalogs may mean that there's already some semblance of data governance taking place in your organization. So let's get started. I've got a lot of stuff to share with you today. Before I get started, I just wanna briefly touch on a few of the things that are coming up and a couple of other ways that you can learn more about the data governance industry. Of course, there's this Real-World Data Governance webinar series, once a month, on the third Thursday at 2 p.m. Eastern time. Next month, I'm gonna be talking about achieving data quality with data governance, and I have a special guest for that webinar: Anthony Oglin. Anthony and I will also be speaking at Enterprise Data World, coming up almost a month from now. I also allude to the idea of non-invasive data governance quite often, and if you're interested in more information about the Non-Invasive Data Governance book, there is information about how you can track that down. I'll be speaking at a couple of DATAVERSITY events coming up in the very near future: the one that I already mentioned, Enterprise Data World in San Diego in March, and the DGIQ Data Governance and Information Quality Conference, also in San Diego, in June of 2020, a couple of months after the EDW event. I have two online learning plans available through the DATAVERSITY Training Center. One is on non-invasive data governance, and the other is on non-invasive metadata governance, and I'll talk a little bit about that today. We certainly know the data is not going to govern itself. 
Well, the same thing holds true for the metadata, or the documentation for your data. It's not going to govern itself. There needs to be accountability, and people have to have responsibility for capturing, collecting, and distributing the metadata. As Shannon mentioned, I'm also the publisher of The Data Administration Newsletter. A new issue was just published yesterday, and you can find it at tdan.com. All of the content since the beginning of the publication is still available out there, so please check the archives. And last but not least, there's KIK Consulting and Educational Services, my consulting company, found at kikconsulting.com; it is the home of non-invasive data governance. In today's webinar, I want to talk about three specific assets that are important to your organization: business glossaries, data dictionaries, and data catalogs. They have different meanings at different companies, and I'll share the definitions that I use or follow. We'll talk about how glossaries, dictionaries, and catalogs can add value to your organization. Then we'll talk about a question I get a lot: what should we include in each of these assets, and who are the people in the organization who have responsibility for, and might actually be the stewards of, the metadata that you're putting into these resources? We'll talk about when these assets become valuable to your organization. And the last subject for today will be where the discipline associated with these assets results in data governance, that is, where you can look for existing levels of data governance where you are already building these assets for the different types of data resources in your organization. Before I get started, I always like to share a couple of definitions, so let me go over those quickly. 
The first one is my definition of data governance. I define data governance as the execution and enforcement of authority over the management of data and data-related resources. Now, a lot of people think those words are quite strong, and they try to temper them a little bit; they say that "execution and enforcement of authority" is too strong for their organizations. Actually, I think you have to word it strongly. At the end of the day, no matter what approach you take, whether a non-invasive approach or another approach, the goal is to execute and enforce authority over that data. The definition I use for data stewardship, and in turn for the data steward, is that data stewardship is the formalization of accountability over the management of data and data-related resources. You might have heard me say before that potentially everybody in the organization is a steward of the data, if they're being held formally accountable for how they define, produce, and use data across the organization. For example, you're not gonna point at half of a room and say, you folks need to protect this data that's classified as confidential, and you folks on the other side of the room don't have to. No, the fact is that anybody who uses that data is a data steward, and they can't really opt out of it. They have to follow the rules. Our job as data practitioners is to share the rules with these people so that they're well versed in how they handle the data, how they define the data, and how they produce the data. So I like those two definitions: the execution and enforcement of authority, and the formalization of accountability. In a nutshell, I think those two define what the non-invasive data governance approach is all about. My definition of non-invasive data governance certainly follows suit. 
It is basically the practice of applying formal accountability and behavior using non-invasive roles and responsibilities, and I'll be talking about that at Enterprise Data World. Typically, in the non-invasive approach, you're gonna be applying governance to process first. If you don't have the processes defined, then by all means define and formalize them. But to take a non-invasive approach, I suggest that we don't go out and create all-new processes when processes already exist. Instead, we apply governance to the existing processes so that the definition, production, and usage of the data support whatever you're trying to achieve with data governance: regulatory compliance, security, privacy, and those types of things. So "non-invasive" really describes the approach, or how we're applying data governance to the organization. Again, the goal is always to execute and enforce authority, but it really depends on how you want to go about doing that in your organization. People are already busy; people have day jobs. If we throw a lot of extra work at them, they're certainly gonna feel like it's very invasive. So my idea is that we want to be as transparent, as supportive, and as collaborative as we can be in our approach to governing data within our organization. I know that's a lot of definitions so far, but there are three more that I have to hit with you during this session. You might have other definitions for these things, but I want to share definitions that you can actually find at dataversity.net. DATAVERSITY has a lot of resources, including its "What Is" articles: what is this and what is that. You can find a lot of this information there. 
But if you look at the definitions I'm providing for the resources I'm gonna talk about today, the words I've underlined really focus on the goal of each of these different assets. For a business glossary, we're looking to share internal vocabulary within an organization: making certain that people call the same thing by the same name and define things the same way. A data dictionary is typically related to a specific data resource in your organization, whether it's a data warehouse, a data lake, an application, or just a database. It's a description of the data in business terms, and it also includes information about the data such as data types, details of the structure, and the security restrictions on that data. Now, "data catalog" means a lot of different things to a lot of different people. In fact, somebody suggested to me the other day that a data catalog is the same thing as a metadata repository. We're gonna talk a little later in the session today about how these things differ and how they're the same. There are a lot of similarities, but I wouldn't necessarily call them the same thing. A data catalog basically informs people within your organization about what data is available to them and what metadata is available about that data or that resource. It differs from a data dictionary in that you can use it to search for and retrieve the specific information you need in the organization. And as I stated before, DATAVERSITY's "What Is" series of articles and online resources is an excellent resource for people who are looking for firm definitions of each of these different types of assets. 
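To make the "search for and retrieve" idea behind a data catalog concrete, here is a minimal sketch in Python. The `CatalogEntry` fields, the sample entries, and the keyword-matching `search` function are all hypothetical illustrations, not the schema or search behavior of any particular catalog product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One hypothetical catalog entry: a data asset plus a little metadata about it."""
    name: str
    resource: str      # where the data lives, e.g. a warehouse or a lake
    description: str
    steward: str       # who is accountable for the asset
    tags: list[str] = field(default_factory=list)


def search(catalog: list[CatalogEntry], keyword: str) -> list[CatalogEntry]:
    """Return entries whose name, description, or tags mention the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [
        e for e in catalog
        if kw in e.name.lower()
        or kw in e.description.lower()
        or any(kw in t.lower() for t in e.tags)
    ]


catalog = [
    CatalogEntry("customer_master", "data warehouse",
                 "One row per active customer", "Customer Data Steward", ["customer", "CDE"]),
    CatalogEntry("monthly_sales_report", "reporting server",
                 "Sales by product and region, sent to regional managers", "Sales Ops", ["report", "sales"]),
    CatalogEntry("web_clickstream", "data lake",
                 "Raw page-view events", "Digital Team", ["web", "events"]),
]

for entry in search(catalog, "customer"):
    print(entry.name, "->", entry.resource, "| steward:", entry.steward)
```

Simple substring matching keeps the sketch short; a real catalog would typically add ranking, filters by resource or classification, and access checks.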
One thing that you'll find I speak about a lot, and I'm gonna speak about it a lot in today's webinar, is that there are basically three actions people can take with data: they can define data, they can produce data, and they can use data. They can do one of the three, two of the three, or all three, and potentially everybody in the organization does at least one of these things. Think back to what I said a little earlier about the idea that the data will not govern itself. Well, that's true. There's no magic that's going to make your data consistent and higher quality. It's going to take work and effort, and a lot depends on how we apply this to our organization. How do we get people engaged? How do we tell them that they're already doing this and that they can do it better? The idea is that building these assets (the glossary, the dictionary, and the catalog) will actually enable a preferred method of defining, producing, and using these important data assets and the data resources people go to for their data. It helps to improve the understanding of the data and its value, and we'll talk about that more as we move through the session. So the first topic I want to address is how these tools (the glossary, the dictionary, and the catalog) add value to the organization. I'm gonna start with a series of five thoughts about how these tools add value by improving the things you see on your screen, starting with people's understanding of the data and information. I don't want to talk too much about the difference between data and information; some organizations use those terms interchangeably. 
My idea is that data plus metadata, meaning data with context added, becomes information. When you see a number, you don't know what that number is until somebody tells you what it represents. So data plus metadata equals information. These tools also improve the value of data in your organization by improving people's knowledge of what data is available and who's responsible for it: who defines, produces, and uses that data across the organization. I often share a tool I call the common data matrix, which helps you start to collect and inventory your data assets. Next, we're looking at improving people's confidence in the data. One of my clients recently defined the goal of their data governance program as improving the confidence people have in the strategic data of the organization, so that's certainly one of the ways these resources add value. They also help the organization by improving the discipline associated with the three actions people can take with data. And then we'll talk about the ability to leverage the data to improve business outcomes, because that is the end game we're all focused on. So let's quickly go through each of these. I'm gonna provide a whole bunch of bullets in this webinar today, and I hope you'll use them as a reference and go back to see what I said about how each of these improves the organization. Let's first focus on the understanding of data and information. The terminology we provide in the business glossary gives the organization semantic consistency in what we call things. So what is a customer? How many customers do we have? Well, a lot of that depends on how you define customer. 
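The "data plus metadata equals information" idea can be illustrated with a tiny sketch: a bare number means little until its context travels with it. The `Metadata` and `Information` classes and the sample element name, definition, and date are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    """Hypothetical context that turns a bare value into information."""
    element: str
    definition: str
    unit: str
    as_of: str


@dataclass(frozen=True)
class Information:
    """Data (the value) plus metadata (its context)."""
    value: float
    metadata: Metadata

    def describe(self) -> str:
        m = self.metadata
        return f"{m.element} = {self.value:,.0f} {m.unit} as of {m.as_of} ({m.definition})"


raw = 1250  # data alone: 1250 what?
info = Information(raw, Metadata(
    element="active_customer_count",
    definition="customers with at least one purchase in the last 12 months",
    unit="customers",
    as_of="2020-01-31",
))
print(info.describe())
```

Note that the same value would mean something different under a different definition of "customer", which is exactly why the glossary definition matters.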
You know, how you define product, how you define vendor, or how you define student, depending on what your business is. Then the naming of the data elements themselves within the data dictionaries helps people reference the data. It certainly adds value in that people know what specific data exists within that data resource. It explains specifically what the data is that people are accessing. It provides the attributes and characteristics that are really important when people are required to use the data. When people are trying to merge data together or break data down, it's very important to have a record of what the characteristics of the data are. So the people who are going to use the data will learn more and have an improved understanding of it. It inventories and lists the available data. It provides things like lineage, for people who want to know where the data came from and what you did to it when you moved it from its native sources into the data warehouse, the data lake, or the ERP system you're using. Also, in the catalog, companies and organizations will oftentimes compile report lists: lists of the reports that are available. Who's going to receive them? Where are they going to get them from? When are they going to be done, and how can people get access to information that's already being created? A lot of organizations go crazy creating new reports ad hoc when there might be reports already out there that would add value to people. So having a list of the reports, who gets them, what data is on each report, and the purpose of the report are all important things we can share within the data catalog. So how do these assets add value? 
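Lineage, as described above, answers "where did this data come from, and what was done to it along the way?" Here is a minimal sketch, assuming lineage is recorded as a simple hypothetical mapping from each target to its sources plus a short note on the transformation; the data set names and notes are invented, and real catalog tools store and harvest lineage in their own ways.

```python
from __future__ import annotations

# Hypothetical lineage records: each target lists the sources it is loaded from
# and a short note on what was done to the data along the way.
LINEAGE: dict[str, list[tuple[str, str]]] = {
    "warehouse.customer_dim": [
        ("crm.customers", "deduplicated on email"),
        ("erp.accounts", "joined on account number"),
    ],
    "lake.customer_360": [
        ("warehouse.customer_dim", "copied nightly"),
        ("web.clickstream", "aggregated to one row per customer"),
    ],
    "report.churn_dashboard": [
        ("lake.customer_360", "filtered to active customers"),
    ],
}


def trace_upstream(target: str, depth: int = 0, seen: set[str] | None = None) -> list[str]:
    """Walk lineage back toward native sources, returning one indented line per hop."""
    seen = set() if seen is None else seen
    lines = []
    for source, transform in LINEAGE.get(target, []):
        lines.append(f"{'  ' * depth}{target} <- {source} ({transform})")
        if source not in seen:  # expand each source once; also stops cycles
            seen.add(source)
            lines.extend(trace_upstream(source, depth + 1, seen))
    return lines


for line in trace_upstream("report.churn_dashboard"):
    print(line)
```

A source with no entry in the mapping (such as `crm.customers` here) is treated as a native source, so the walk stops there.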
Additionally, they improve value through the knowledge of what data is available and who's responsible for that data. When people go to the databases, they want to know what data in those databases is available to query, how it can be shared, and how data from several sources can be integrated. A lot of that information resides within one of these three resources; whether it's in the dictionary or the catalog really depends on you and your organization. These assets also improve value by covering unstructured data, such as documents, diagrams, audio, video, and basically any other content I would consider unstructured. Certainly there's structure to it, but it's unstructured relative to the structured data we talk about so often: data in databases, tables, columns, and so on. They also improve value by providing information about who is using this data, who's defining it, and who's responsible for producing it. So they associate people's names and departments with the different data resources your organization is investing in. They provide a method for impact analysis by keeping track of who defines, produces, and uses the data: if we're gonna make changes to a specific data set, who are we going to impact? That's very important as organizations move forward in the digital age and look to improve the quality and value of those assets. As the data changes, we need to know whom to get involved in those conversations. These tools also record the rules associated with the data: business rules, regulatory rules, privacy rules, and quality rules. If you don't collect this information, it obviously isn't available to people across the organization, and making it available is one thing we want to do. 
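The impact-analysis idea (tracking who defines, produces, and uses each data set) can be sketched as a simple lookup. The people, departments, data set names, and the `impact_of_change` helper are hypothetical; the point is only that recording the three actions per data set lets you answer "whom do we involve if this changes?"

```python
from __future__ import annotations

from collections import defaultdict

# The three actions people can take with data.
ACTIONS = ("define", "produce", "use")

# Hypothetical stewardship records: (person, department, action, data set).
records = [
    ("Maria", "Finance", "define", "customer_master"),
    ("Dev", "IT Operations", "produce", "customer_master"),
    ("Lee", "Marketing", "use", "customer_master"),
    ("Sam", "Sales", "use", "customer_master"),
    ("Lee", "Marketing", "use", "campaign_results"),
]


def impact_of_change(data_set: str) -> dict[str, list[str]]:
    """Group everyone tied to a data set by action, so they can be brought into a change discussion."""
    impacted: dict[str, list[str]] = defaultdict(list)
    for person, dept, action, ds in records:
        if action not in ACTIONS:
            raise ValueError(f"unknown action {action!r}")
        if ds == data_set:
            impacted[action].append(f"{person} ({dept})")
    return dict(impacted)


print(impact_of_change("customer_master"))
```

Keeping the action as an explicit field, rather than a free-text role, is what makes the "define / produce / use" question answerable for any data set.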
We want to make this information available to people across the organization so they're using the data appropriately, protecting it appropriately, and producing it according to the rules that have been defined in the organization. These assets also improve knowledge of the available data by keeping track of where the data is located, the tools people can use to get to it, and the permissions, authority, and technology associated with using it. So there are lots of ways to improve knowledge of the data in the organization by tracking the things that are important to people, helping them understand, value, and use the data the way it needs to be used. Another way these tools add value is by improving the confidence people have in the data. If you're a reader of TDAN, and I certainly hope you are, you'll see that I recently published an article called "The Antidote Is Facts and Facts Require Data." When we make a decision, we're hoping to base it on facts. Obviously there are different components that go into making a good decision, but we require good data to provide good facts to our leadership for the decisions that are important to the organization. So the antidote is facts, and the facts certainly require data. That was a follow-up to an earlier article I wrote about there being no facts if there's no data to support them. These assets also improve people's confidence by giving them reason to rely on the data being accurately defined and produced for their use. They help improve trust: trust improves with knowledge about anything in the organization, and that includes data. 
So collecting information in glossaries, dictionaries, and catalogs will certainly help improve the way people trust and have confidence in the data. It reduces the time spent tracking down the data I need to do my job, and it reduces the time spent manipulating the data. We hear all the time that analysts spend 80% of their time manipulating data and 20% doing what they're well versed in, which is analyzing it. So we want to decrease the time they spend tracking down the appropriate data and manipulating it, giving analysts and data scientists more time to actually use the data, analyze it, and make decisions from it. So we improve people's confidence by improving the value of the metadata collected within these assets. It also improves the chances that data and reports will match. It drives management and leadership crazy when they ask a question and get different answers depending on people's understanding of the data or where they went to get it. We don't want that to happen; we want to avoid it at all costs. Improving the documentation in these assets helps build confidence by improving the chances that reports will match when people are working from the same data with the same understanding of it. And the truth is that confidence in the data breeds additional confidence. So we want to provide the organization with assets like the glossary, dictionary, and catalog that help people understand the data (in a minute I'm gonna talk about what goes into each of these assets, or at least provide a starter list of those items). What we want to do is build confidence, and we want that confidence to breed additional confidence. 
We want people to efficiently and effectively use the data being defined in the organization. These assets also improve value by focusing on the discipline associated with defining, producing, and using data. By having these assets, and by making certain that we have stewards of the data in them, we're creating a formal structure for collecting the metadata associated with any aspect of the data. I specifically focus here on the CDEs, or critical data elements, the ones that are most important to the organization. Most of you probably know the Meat Loaf song "Two Out of Three Ain't Bad." Well, I'll take it one step further and say it's really good to have two out of the three things. Defining the data helps people understand how that data should be produced and how it can be used. When we produce the data, we want to make certain we produce it based on its definition and how it's defined for use in the organization. And using the data then obviously depends on both the definition and the production of that data. The application of accountability for the metadata is extremely important. As I stated earlier, the metadata will not magically appear; the metadata will not govern itself. We need people to have responsibility for doing these things. If there are already people in the organization who have these responsibilities, let's recognize who they are and help them be the best stewards of the data and the metadata they can be within the organization. The last thing I'm going to share here is that these tools add value by improving the ability to leverage the data to improve business outcomes. Business outcomes are critical. 
So we need to follow something like what I often share, called the data governance bill of rights. It's not about the rights of people; it's about doing the right thing: getting the right people involved, using the right data, in the right way, for the right reason, resulting in the right business outcome, at least a good portion of the time. Oftentimes the business outcomes result from the quality and the value of the data, and it actually works both ways. If you've got good data, chances are you're going to be able to get good results and make good decisions. If you have bad data, I honestly can't tell you I've ever seen somebody get good results based on it. So good data equals good results, and bad data equals bad results. We want to make certain we're looking to improve the results in the organization. There are no facts without data, and we need to make certain we're getting the data we need to provide the facts that the people in our organization need to do their jobs and carry out the operations of the organization. Making decisions based on assumptions is always a risk. So we want to make certain we're providing assets that are valuable to the people of the organization, and that includes the data assets and the metadata assets I'm talking about here. We know we need to improve the value of the data, and we can do that by improving the value of the metadata we collect in the glossaries, dictionaries, and catalogs. The next subject for today's webinar is to briefly walk through some of the things we should include within a business glossary, a data dictionary, a data catalog, a metadata repository, or even something I've found organizations refer to as a data management inventory. What data do we have? Who's responsible for it? How is it classified? How should it be handled? 
A lot of those things might go into the data management inventory, but I'm getting a little ahead of myself. Let's first talk about the metadata that should be included in the business glossary. If you've got different ideas, or you've found specific things that you're storing in these tools, please feel free to share them in the chat, or go to the DATAVERSITY community and talk about what you're including in your glossary and where it's adding value for your organization. So, a quick list of the items in a business glossary. First, the business term itself: oftentimes customer, product, or vendor, as I mentioned before, but also terms that are more specific to those subjects or domains. Those are important. Oftentimes the business terminology included in the glossary is separate from the data. Yes, it's great to be able to link the business glossary to the data dictionary. I've done that in several webinars in this series where I talk about the three tiers of metadata that are important to the organization, and I show linkages between the glossary, the dictionary, the catalog, and those types of things. But I want to make certain that when you're defining your business glossary, it is truly that. It is not the definition of specific data; it is the business terminology used in the organization. I've got lots of ideas if you want to talk about where to find your business terminology. You can get it from employee handbooks, user guides, and things like that, which already have the business terminology defined. It's just a matter of making that information available to people in the organization. Next come the business definition of the term, the category of the term (what this term relates to), and its relationships to other terms in the organization. How does customer relate to transaction? Where is the source of the term? 
On my slide I have, "Where is the data located and where did it come from?" Well, maybe we should take that away. The real question is where we got the term from. It came from the employee handbook; it came from this guide or that guide. At some point we want to be able to link the terminology to the data domains, the subject areas of data within the organization, and eventually give people the ability to start with a business term like customer, or even something like customer address, and let them know what data is associated with that term. That's the linkage to the data. Who's responsible for this term? And, when we get into the dictionary and the catalog, who's responsible for the data that is associated with the specific terms in our business glossary? So what metadata should we include in the data dictionary? We talked about terminology being in the business glossary, but the data dictionary is typically associated with a specific database, system, data warehouse, data lake, or platform. It says, this is how the data is defined for this specific data resource, and it can go element by element. What are the elements, the pieces of data, included within that database, system, warehouse, and so on? What are the attributes of each piece of data? What type of data is it? Is it an integer? Is it a character field? How many positions is it? Can it be null? Basically, what are the characteristics of the data housed within that data resource? And then there's the definition: how is this data defined in this specific resource? We all know that we have data that exists in multiple applications; people call it siloed data. So it's specifically defined for a resource, and within the data dictionary we record the definition of the data within that specific resource. 
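The glossary-to-dictionary linkage described above can be sketched as two record types, with dictionary elements pointing back to a glossary term. The field names, sample values, and the `elements_for_term` helper are hypothetical; they simply mirror the items listed in this section (term, definition, category, source, steward, related terms; element, type, length, nullability, definition).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GlossaryTerm:
    """Business terminology, kept apart from any one system's data."""
    term: str
    definition: str
    category: str
    source: str                        # e.g. the employee handbook
    steward: str
    related_terms: list[str] = field(default_factory=list)


@dataclass
class DictionaryElement:
    """How one data element is defined in one specific data resource."""
    resource: str                      # database, warehouse, lake, application...
    element: str
    data_type: str                     # e.g. "integer", "char"
    length: int                        # number of positions
    nullable: bool
    definition: str
    glossary_term: Optional[str] = None  # link back to the business glossary


def elements_for_term(term: str, elements: list[DictionaryElement]) -> list[str]:
    """Start from a business term and list the data elements linked to it, across resources."""
    return [f"{e.resource}.{e.element}" for e in elements if e.glossary_term == term]


customer = GlossaryTerm(
    term="Customer",
    definition="A person or organization that has purchased at least one product",
    category="Party",
    source="Employee handbook",
    steward="Customer Data Steward",
    related_terms=["Transaction", "Product"],
)

elements = [
    DictionaryElement("crm", "customer_number", "char", 10, False,
                      "Identifier assigned when the CRM record is created", "Customer"),
    DictionaryElement("erp", "customer_id", "integer", 9, False,
                      "Identifier assigned by the ERP billing module", "Customer"),
    DictionaryElement("erp", "invoice_total", "decimal", 12, True,
                      "Invoice amount including tax"),
]

print(customer.term, "->", elements_for_term(customer.term, elements))
```

The glossary term carries no types or lengths on purpose: those belong to each resource's dictionary entry, while the link lets a business user move from the term to the data.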
Certainly, within a data dictionary you might define things like the standard name you want people to use and the standard definition, and you may link data elements in different data dictionaries as aliases of the standard data. We certainly want the definition of the data and the linkages between the glossary terms and the data. And linkages within the data: we call it customer number over here, but we call it customer ID over there, and here are the different attributes that make up that data. If you really want to combine that data, you're not gonna be able to do it if the attributes are different, and if people don't understand that, they're forever gonna run into problems trying to link data together. As I mentioned before, there are the relationships. Where else does the data exist? Who's responsible for that data? If I want to find that data in the ERP, what's it called there? How do we get to it? How does the data get into this specific resource? The lineage is very important. If you ask end users what they need to know to improve their confidence in the data, it's knowing where the data came from and what you did to get the data into this form. Then there are all of the rules I mentioned before, and we need a place to record information about them. The quality rules, business rules, classification rules, sharing rules: all of those are very important. Either we expect people to automatically follow these rules without them being documented anywhere, or, which is my better idea, we document them somewhere. Oftentimes the dictionary, the glossary, and the catalog are the places where we collect that information. All right, so let's talk about the data catalog. Like I said before, there are a lot of different definitions out there for it. 
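The customer number versus customer ID example can be made concrete with a small sketch that compares the recorded attributes of two aliases of the same standard element before anyone tries to combine them. The alias map, attribute values, and the `attribute_mismatches` helper are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class ElementAttributes(NamedTuple):
    data_type: str
    length: int
    nullable: bool


# Hypothetical alias map: one standard element, the names it goes by in each
# resource, and the attributes recorded for it in each resource's data dictionary.
ALIASES = {
    "customer_identifier": {
        "crm.customer_number": ElementAttributes("char", 10, False),
        "erp.customer_id": ElementAttributes("integer", 9, False),
        "warehouse.customer_key": ElementAttributes("char", 10, False),
    }
}


def attribute_mismatches(standard: str, left: str, right: str) -> list[str]:
    """List the attributes that differ between two aliases of the same standard element."""
    a, b = ALIASES[standard][left], ALIASES[standard][right]
    return [
        f"{attr}: {left}={getattr(a, attr)!r}, {right}={getattr(b, attr)!r}"
        for attr in ElementAttributes._fields
        if getattr(a, attr) != getattr(b, attr)
    ]


for issue in attribute_mismatches("customer_identifier", "crm.customer_number", "erp.customer_id"):
    print(issue)
```

An empty result means the recorded attributes agree, which is a precondition for joining on the element, though not a guarantee that the values themselves line up.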
Oftentimes people think of it as the metadata repository, but a data catalog usually has a narrower focus than a metadata repository, which I'll talk about in a minute. Catalogs tend to focus on specific things like the inventory of the data, who's responsible for that data, where it comes from and where it goes. Those are really important things to capture when you inventory your data resources. What were they in the past? What are they in the present? Where are you going with this data? So inventorying the data and defining information about its past, present and future is really important, as is inventorying the accountability and stewardship of the data. I'm not going to share the common data matrix in today's webinar; I share it in a lot of webinars, so please go back and look for it or reach out to me if you want to talk about it. It becomes a very valuable tool because you can see how data differs across the organization, how it's the same, and who's responsible for it. When someone asks why the results are different when they go to one resource versus another, you can answer by looking at the inventory in the common data matrix, and you might include that information within your data catalog. If you're familiar with the card catalog at your local library, a data catalog is a lot like it. People go to the card catalog first to figure out where they need to go in the library to get the information they need. A data catalog is the same thing for data: it helps people understand what data and reports are available. That information needs to be recorded somewhere, and it's not going to record itself.
The classification categories, the business and protection rules, and the ownership of these assets can all be collected within the data catalog. Now, the metadata repository, in my mind, is different from the data catalog. Growing up in the industry I was a metadata repository administrator, and I got to focus on all aspects of metadata. The metadata repository was basically the asset people could go to for all things metadata, and it can include metadata from the glossaries, dictionaries and catalogs. There's business metadata and there's technical metadata. Business metadata is focused on improving business understanding and the value people get from the data. Technical metadata is basically already in the tools you're using in your environment, and the problem is that it's only in those tools. We want to get it out of those tools and into the hands of people who can use that information. Business models, frameworks, predictive models, processes: there's space for all of these in some of the metadata repository tools. So take a look at the metadata models your repository tool vendor provides, and you'll see there's lots of information you might not be collecting yet that you should be, or that might add value to people's understanding and use of the data. And then there's the DMI, the data management inventory. This often becomes the focus of audits and reviews of the data in the organization. It keeps track of the data, the information and the records, both active and passive resources, along with their classification and more. I don't want to read through the complete list, because we'd run out of time quickly, but there's a lot of information you can define in your data management inventory, and that is really up to you. There is no specific definition of a data management inventory that I know of.
It is exactly what it says: an inventory of the data assets and resources that are available to your organization, and it can cover past, present and future. So now we've talked about the value we can add and some of the things we can include in these tools. I know I ran through a lot of that quickly, which is why I'm saying to go back and refer to these slides, because there's a lot of information that can be collected in each of these assets and I don't have time in this hour-long webinar to go through all of it. But let's talk about the people who are responsible for the value that comes from these things: the glossary, the dictionary, the catalog and so on. You're going to see that for each of them, the data governance administrator, the data governance office or the data governance lead is often the person responsible for the data or metadata assets within these tools. Somebody has to facilitate and steer the ship. Starting with the business glossary, business leadership is often the one that defines the requirements and says why the business glossary is necessary. They're not going to build it themselves, though; they're going to count on the data governance administrator or the office. We need business analysts to get involved. Oftentimes we need a project manager to crack the whip and get people to do what they need to do. There are the technicians supporting the tools and the data modelers. And oftentimes the data governance council, at the strategic level, is responsible for approving the results we're trying to achieve by implementing glossaries, dictionaries and catalogs within our organization. For the data dictionary, again, you can see that the data governance administrator is the primary person responsible.
I highlighted a few things in red on the slides where they change from one asset to the next. Business and tactical leadership often have the responsibility for defining the requirements for what information we keep in each of these assets. Then there are the business analysts, who play the liaison role between the business and IT; the project manager, as I mentioned before; and the technicians. Oftentimes the application development folks are the people who can provide the resource element list that goes into these data dictionaries, because, as I mentioned, data dictionaries are usually specific to a particular data resource: an application, a database, a warehouse, a lake and so on. Now let's talk about the data catalog. Again, the DGA is the person responsible for it. Business leadership can say we need to include these things, or we need to educate them on why those things are important and at least get their buy-in on the requirements we define as we build out the data catalog. The business analysts are often responsible for inventorying the data. The project manager, the technicians and the council continue to play similar roles across these different assets. So, as I said before, the data won't govern itself and the metadata won't govern itself. It takes a community: people within your organization pushing in the same direction, with somebody guiding them to catalog and collect this information and make it available so that people improve their understanding of the data, the value they get from it and their confidence in it. The metadata repository, as I mentioned, is a more wide-ranging metadata asset for the organization.
But I know from my own time as a metadata repository administrator that nothing would be collected and nothing would be kept up to date if there weren't a person with responsibility for administering the tool. Then there's business and technical leadership, and as you look through the list, the business analysts, the project manager, the technicians and the data modelers can also be involved with the metadata repository. Organizations often start with the conceptual and logical models of their data. Who works on those? Most often the data modelers, who then link or forward-engineer those models to the databases and keep those relationships intact, so you can go from the conceptual to the logical down to the physical definition of the data. The data itself then becomes an important aspect of using the metadata repository. The application developers are involved because they know the systems, the programs and the file layouts. So again, it takes a community; it takes more than one person to do this, and we need to get the appropriate people involved at the right time. To go back to the data governance bill of rights, getting the right people involved with the right data, in the right way, at the right time is eventually going to lead to the right business outcomes within our organization. None of these things happen on their own. We need people within the organization who have formal responsibility for them, or they're not going to get built. At least that's what I've seen; if you know something different, please share it with everybody. Now let's talk about the DMI, the data management inventory. Many organizations have been doing records management for years. In fact, one organization I'm working with calls it information governance.
They have a liaison, somebody who is moving records management toward information governance, and the information includes the structured data, the unstructured data, the content and the records within the organization. But again, the DMI is not going to be created unless somebody has the responsibility for creating it, with business leadership defining the requirements. The rest of the list is very similar from one asset to the next, but here you also include partners such as legal, audit and counsel, and anybody who can help ensure that the information collected in the data management inventory is available to people, that it's auditable and that we're following all the legal requirements associated with the data in our organization. So across these five lists for these five data assets, there's a lot of similarity in who does what, but it's not exactly the same from one to the next, and we want to make certain that we get the right people involved at the right time. All right, I've covered some of the things we can collect in these assets and the value they bring. Now let's talk about where these assets will be valuable to your organization. I'm going to come back to where I started this webinar: how data is being defined, how it's being produced and how it's being used. But I'm also adding how data is shared, within the organization or even with external parties, and how data is protected. With all the privacy laws being passed in different states, which are going to hit a state near you, if not your own state, in the near future, we need to focus on more than just definition, production and usage. We need to focus on the sharing and the protection of data.
That becomes very important when we're developing and delivering data assets to the organization. Where are people going to go to find out how they can share the data, how it's classified and how it needs to be protected? They're going to go to the assets I talked about to get the information that educates them and helps them understand what they need to do. So let's look at how these assets add value when data is being defined. Without these assets, or at least some formality around what we collect about the data, it's simply not going to happen. They add value by formalizing the process of collecting data definitions. I often refer to the business definitions I've seen as cheeseburger definitions. What the heck is a cheeseburger definition? It's defining a cheeseburger as a burger with cheese. The definition of a student account is "an account for a student." It doesn't tell you anything more than the name of the element itself. So these assets help us add discipline where data is being defined. If you review and validate the information going into them, you'll add value to your organization immediately just by formalizing the process and getting the right people involved in collecting and recording data definitions. They provide a data definition resource that can be validated, as I mentioned, and that can be communicated and shared. They add value by improving people's understanding of the data and the consistency of their use of it: their knowledge of what data is available, what that data is, where it came from and how it's defined. And they provide clear roles and rules associated with the data definition.
Basically, by creating these assets, you formalize the data definition process, the process by which your organization defines data. How do they add value when data is being produced? They formalize the process of producing quality data. If people put 99999 or 0000 into somebody's birth date, that's not going to help us understand our customers. We want to formalize the process people go through as they're collecting information so that it's quality information. With a clear definition of the data, data producers have the clear definition and the expectation of how the data needs to be produced to follow that definition. These assets also give data producers a clear understanding of how the data is used, by formalizing requirements for how the data must be produced and by providing the rules. Basically, they formalize your data production process. If you already have a formal data production process, that's great. If you don't, think about how these tools can enable you to build out data governance within your organization, even just by focusing on the metadata, the definition of the data, as you get started. They add value when data is used by formalizing the process of using data appropriately: sharing the handling rules associated with the data's classification, giving data producers a clear understanding of how the data is used, and giving users the knowledge of how to share and protect it. Basically, they formalize the data usage process. And one of the things we know we need to focus on is formalizing how people use the data.
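A birth-date check like the one described above could be written as a simple rule applied when the data is produced. This is a minimal sketch, assuming dates arrive as `YYYY-MM-DD` text; the placeholder list, the 120-year plausibility limit and the function name `check_birth_date` are hypothetical choices an organization would define for itself.

```python
from datetime import date
from typing import Optional

# Illustrative placeholder values people type when they don't know the real date
PLACEHOLDERS = {"99999", "9999", "0000", "00000", "99/99/9999", "00/00/0000"}


def check_birth_date(raw: str, today: Optional[date] = None) -> list[str]:
    """Apply a hypothetical production rule to a birth date given as YYYY-MM-DD text.

    Returns a list of problems; an empty list means the value passes.
    """
    today = today or date.today()
    value = raw.strip()
    if not value:
        return ["missing"]
    if value in PLACEHOLDERS:
        return ["placeholder value"]
    try:
        born = date.fromisoformat(value)
    except ValueError:
        return ["not a valid YYYY-MM-DD date"]
    problems = []
    if born > today:
        problems.append("in the future")
    if today.year - born.year > 120:
        problems.append("implausibly old")
    return problems
```

Recording the rule alongside the element's definition in the dictionary is what lets data producers see the expectation before they enter the data, rather than discovering the problem downstream.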
To do that, we need to make certain the metadata adds value for the people who are accessing the data, and we need to give them access to the glossaries, dictionaries and catalogs to build on that. When it comes to sharing data, it's the same type of thing, all the way down to the last bullet in red: formalizing the data sharing process. We need to provide people with clear roles and rules for sharing the data, knowledge of the data's classification and how it needs to be handled, and formalized requirements for how data can be shared. In fact, a lot of organizations create service level agreements (SLAs) around data sharing as a way to measure how well data is being shared across the organization. This makes the rules for sharing data actionable for everybody in the organization who accesses the data. Basically, it puts formality around how data will be shared within your organization. Otherwise, people are going to be asking around different parts of the organization, using their time to figure out what data they can share, what data they can't share, how they can share it and so on. Or is there a point in time when the data goes from being confidential to sensitive to public, so that it can be shared with everybody? How are people going to know that unless we take the time to document that information and make it available? What about when data is being protected? We certainly need to protect the data; I know I want my data protected. We add value by formalizing the process we follow to protect the data, focusing on providing people with clear rules (that's R-U-L-E-S) for how the data needs to be protected.
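The idea that data can move from confidential to sensitive to public over time, with different sharing and handling rules at each level, can be sketched as a dated classification schedule plus a rules table. The three levels come from the talk; the specific rules per level (internal sharing, external sharing, encryption in transit) and every name in the code are hypothetical, for illustration only.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical handling rules per classification level
HANDLING_RULES = {
    "confidential": {"share_internally": False, "share_externally": False, "encrypt_in_transit": True},
    "sensitive": {"share_internally": True, "share_externally": False, "encrypt_in_transit": True},
    "public": {"share_internally": True, "share_externally": True, "encrypt_in_transit": False},
}


@dataclass
class ClassifiedAsset:
    name: str
    # (effective date, classification) pairs, oldest first
    schedule: list[tuple[date, str]]

    def classification_on(self, day: date) -> str:
        """Return the classification in force on a given day.

        Days before the first effective date fall back to the first level.
        """
        current = self.schedule[0][1]
        for effective, level in self.schedule:
            if effective <= day:
                current = level
        return current


def can_share(asset: ClassifiedAsset, day: date, external: bool) -> bool:
    """Look up whether the asset may be shared internally or externally on that day."""
    rules = HANDLING_RULES[asset.classification_on(day)]
    return rules["share_externally"] if external else rules["share_internally"]


def must_encrypt(asset: ClassifiedAsset, day: date) -> bool:
    """Look up whether the asset must be encrypted when transmitted on that day."""
    return HANDLING_RULES[asset.classification_on(day)]["encrypt_in_transit"]
```

For example, a hypothetical quarterly-results asset scheduled as confidential from January 1, sensitive from April 15 and public from April 30 could not be shared at all in March, could be shared internally but not externally on April 20, and could be shared with everybody in May. Writing the schedule down is what answers the "when does it become public?" question without people having to ask around.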
We provide detailed knowledge of how the data is classified and what you can do with it. How can we transmit that data? Does it need to be encrypted? How can we print it? Do we need secure printing within our organization? These assets help with all of that. They also help formalize the requirements for how the data must be protected. When your auditors or regulators come in, they're going to want to see that you have specific rules for how this data needs to be protected and that you're communicating that information appropriately to people in the organization. So by formalizing the data protection process, we add value to the organization, and it all results in more formal data governance because you're now focusing on these five actions. I mentioned definition, production and usage, but sharing and protection of data are critical to the success of many organizations. Otherwise, you could be putting your organization at risk without even knowing it. Providing these assets helps eliminate, or at least mitigate, some of that risk. The last topic I want to share with you is how these disciplines can result in data governance. Again, I'm going to focus on the definition, production, usage, sharing and protection of data, so let me jump into these quickly as well. Starting with data definition, these disciplines result in data governance by formalizing accountability for data definition, and you'll see the same for data production, usage, sharing and protection.
They do it by executing and enforcing authority over data definition (which, again, is my definition of data governance); by providing a place where data definitions are collected and made available to people across the organization; by providing an instrument that encourages interaction between people in the organization; and by providing documentation that improves people's understanding of how data is defined, produced, used, shared and protected across the organization. I want to leave time for questions, so I'm going to shoot through a couple of these slides quickly. You'll see that they're relatively the same from one to the next. We're formalizing accountability, executing and enforcing authority over these actions, and formalizing accountability for the metadata recorded in the glossaries, dictionaries and catalogs. Data usage is the same: formalizing accountability for data usage. Everybody is a data steward if they're using data that has to be protected. I know people don't like the idea of everybody being a data steward, but in order to achieve coverage of the entire organization, potentially everybody who defines, produces and uses data might be held formally accountable for how they define, produce, use, share and protect it. That's certainly one way of making sure you cover the entire organization. The same holds true for sharing the data and for protecting the data. These lists are very similar, but this is how focusing on managing these assets results in data governance within your organization. My final thoughts before I turn this over to Shannon are about metadata governance. I mentioned that there's a learning plan on non-invasive metadata governance available through the Dataversity Training Center.
The fact is that metadata governance is real, and it's necessary, because the metadata will not govern itself. The metadata in these assets is only going to be as good as the discipline you have around it, so focusing on the accountability people have for the definition, production and usage of both the data and the metadata becomes important. My suggestion, as always, is to start and stay non-invasive in your approach to formalizing accountability for the management of data. So in today's webinar, I spoke quickly about how these assets add value, what should be included in them, who should be responsible for them (at least from my perspective), when they will be valuable to the organization, and how getting people engaged in the appropriate way results in discipline that will actually result in data governance within your organization. With that, I'm going to turn it back over to Shannon, because I know we have a few minutes left. Shannon, were there any questions today? Hey, Bob, thank you so much for another great presentation, and there are a lot of questions coming in through the Q&A portion of the screen. Just to answer the most commonly asked questions: as a reminder, I will send a follow-up email by end of day Monday for this webinar with links to the slides, the recording and anything else requested. So Bob, do you have a definition for data inventory? For data inventory? Well, I would say it's a list of the data resources available within your organization. An inventory of your books is a list of all the books that you have. An inventory of your data is a list of all the different data assets: where they're located, who to go to, and who's responsible for them.
So basically, the data inventory is a list of the data resources that are available, who they're available to and who they're not available to, how they can be shared and all of those things. It's basically the definition of an inventory as it applies to your data. When you referenced the glossary, why didn't you discuss information versus data? Do you feel the glossary contains information terms or data terms? You know what, I think it includes both, actually. If there are documents and records and other things you want people to refer to in relation to the terminology in your business glossary, you can link the terms in the glossary to that specific information and those specific records. So yes, we want to refer to the information, the unstructured data, as well. I know a lot of organizations start by focusing on the physical, structured data, so to speak, but you can certainly link the glossary to other information assets that might not be in a database or a specific system. And how do you document data lineage? Well, there are a couple of different ways. Some organizations use ETL (extraction, transformation and loading) tools that have those smarts built into them. What we need to do is pull that information out of the ETL tools and make it available to people in an understandable way. Some organizations just map data from one place to another using a spreadsheet, which is better than using a Word or other word-processing document; at least it's more structured. And if you're going to move toward a metadata repository tool or a data catalog tool, a spreadsheet is going to be easier to import than a Word document.
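Here is a minimal sketch of the spreadsheet approach to lineage, assuming the mapping is exported as CSV with hypothetical `source`, `target` and `transformation` columns. It walks back from an element to answer the two questions end users ask: where did the data come from, and what was done to it? The table and element names are invented for illustration.

```python
import csv
import io

# A source-to-target mapping as it might be kept in a spreadsheet and exported to CSV
# (hypothetical column names and elements)
MAPPING_CSV = """source,target,transformation
crm.customer.cust_no,staging.customer.customer_id,cast to character
staging.customer.customer_id,warehouse.dim_customer.customer_key,surrogate key lookup
erp.orders.cust_id,staging.orders.customer_id,trim whitespace
"""


def load_mapping(text: str) -> dict[str, list[tuple[str, str]]]:
    """Index each target element by the sources (and transformations) that feed it."""
    upstream: dict[str, list[tuple[str, str]]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        upstream.setdefault(row["target"], []).append((row["source"], row["transformation"]))
    return upstream


def trace(element: str, upstream: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Walk back from an element to its original sources, recording each hop."""
    steps = []
    to_visit = [element]
    seen = set()
    while to_visit:
        target = to_visit.pop()
        if target in seen:  # guard against cycles in a hand-maintained mapping
            continue
        seen.add(target)
        for source, transformation in upstream.get(target, []):
            steps.append(f"{source} -> {target} ({transformation})")
            to_visit.append(source)
    return steps
```

Tracing `warehouse.dim_customer.customer_key` yields the surrogate-key hop from staging followed by the cast from the CRM source. Because the mapping is kept as structured rows rather than prose, the same file could later be imported into a catalog or repository tool.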
Or should I say, the effort required to pull lineage from a mapping spreadsheet or an ETL tool is going to be a lot less than pulling it from a Word document. So yes, data lineage is where the data came from. If I'm going to be making decisions based on that data, I want to know where it came from, what you did to it, how it compares to the native source and all of those types of things. We have just a few minutes left; I think we have time for a couple more questions. But keep the questions coming: for this particular webinar, Bob will go through all the unanswered questions and we'll get answers written up for you, to be included in the follow-up email. So Bob, are there benchmark industry studies that show specific companies that excel in this topic area and how they did so? Ooh, I honestly don't know. You can search the internet for best practices associated with each of these types of tools, or with metadata management in general, but I'm not sure there's a resource that says one company is doing a better job of this than another. Certainly, before you acquire a tool from a vendor, reach out to the vendor and talk to some of their customers about how the tool is adding value within their organization, how easy it is to use and all of those types of questions. Talking to people who have done it before and asking them about the value that comes from what they're doing is going to give you a much more straightforward answer than reading what somebody writes up about the same subject. Yeah, and I can't think of anything off the top of my head either, although I'm sure we have some examples. There's a section midway down on dataversity.net called case studies; if you click that, you can find different case studies of how companies are using different things.
Is a data catalog the same as data lineage? No, a data catalog is more of an inventory. Data lineage tells you where the data came from. Certainly, if you're going to provide information about the data in the data catalog and people start to access and reference that data, one of the first things they're going to want to know is where it came from. Was it hand-entered, or did it come from another application or an outside source? So a data catalog can include the data inventory, but a data catalog, at least in my experience, is not the data inventory itself; the inventory is just a component of a data catalog. Like I mentioned before, you can keep report catalogs in there, and you can keep a lot of other information in a data catalog. The term is more general, and people can define what they need most to improve their end-user community's understanding and usage of the data. Perfect. Well, I'm afraid that is all the time we have for today. Again, if you have additional questions for Bob, feel free to submit them and I will get them over to him to answer in writing, to be included in the follow-up email, which will go out by end of day Monday along with links to the slides, the recording, and all of his matrices, the Bill of Rights and so on. Thanks, Bob, and thanks, everybody; I really appreciate all the engagement. And again, if you'd like to continue the conversation, as Bob said, you can go to dataversitycommunity.dataversity.net. I hope everybody has a great day. Thanks, Bob. Thanks, everybody, and thanks, Shannon. We'll talk to you again soon, I hope. Bye. Great.