Here we go. Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for Dataversity. We want to thank you for joining the latest in the monthly webinar series, Data Architecture Strategies with Donna Burbank. Today, Donna will discuss best practices in metadata management, sponsored today by TopQuadrant. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DAStrategies. And we very much encourage you to chat with us and with each other throughout the webinar. To do so, just click the chat icon in the bottom middle of your screen for that feature. And if you'd like to continue the conversation after the webinar or follow Donna further, you may do so at community.dataversity.net. As always, we will send a follow-up email within two business days, containing links to the slides and the recording of the session, and any additional information requested throughout the webinar. Now, let me turn it over to Jesse from TopQuadrant for a word from our sponsor. Jesse, hello and welcome.
Thank you, Shannon. And hello, everyone. And thank you for joining us today. I'm here to talk to you about knowledge graphs and how they support metadata management best practices. TopQuadrant has been committed to and using knowledge graph technologies since its beginning in 2001. We offer knowledge graph technology for metadata management and governance to provide you with the most expansive view of the most important things in your enterprise. In this lightning talk, I will present knowledge graphs as a smart approach to strategic metadata management, including the ability to connect anything, preserve meaning, and enrich business value.
One big problem that we have is that with all the different approaches and products available to us, we end up with disconnected metadata silos, or I should say more disconnected metadata silos. These silos don't help make data more valuable, usable, searchable, discoverable, or any of the other good "-able" words. Gartner is now saying that knowledge graphs are important for metadata management. They mention that metadata management must span catalogs, glossaries, lineage, rules, semantic connections, and more. Spanning these things requires connecting these things. And that is what a knowledge graph is for, especially if you keep flexibility and evolvability in mind. So you may be asking yourself, what is a knowledge graph? First, it's not a black box. It's yours, because it's based on open standards, semantic graph standards. This allows it to represent any knowledge domain. And by represent, I mean the facts, the models, and the rules are all in the graph, all together. No more hard coding of business logic in code. The visualization on the right-hand side is trying to say this. A person exists, and a person has two parents, and a person can have an eye color. Then sitting right in there with it is a rule that says, if both parents have blue eyes, that person will have blue eyes as well. And then we have James, for whom I can infer blue eye color because of his parents. This information or knowledge all resides together. Let's take a look at another example. For governance, version control, access, and many other reasons, you may want to maintain different knowledge graphs. That makes them much more approachable. Here we have three different knowledge graphs named KG1, KG2, and KG3. In the first, we're doing data asset management: data element mappings, mapping the glossary terms, element to element, flagging PII, assigning stewardship, and all of those great things. In KG2, we're doing technical asset management. And in the third, maybe enterprise asset management.
Separate, these are very manageable and quick to create. But together, we can self-compose information that never existed before. So in this case, I may have a data set with a PII data element and a policy restricting the physical location of PII storage. And this "located in" rule is able to enforce that policy. Pretty cool. Further, you can look at knowledge graphs with different lenses on. Here we see a high-level lineage that is dynamically generated by metadata and model. The model allows us to approach this information as if it was a message that we're free to drill into when we need to get the deeper how and what. Here we see how complex a simple patient discharge actually is. The paths of lineage are interactable, and you can follow your nose, really, as you see fit. And if I were to engage the data flow icon here, I'd be presented with that derivation. So the information is there when you need it, exactly where you need it. I like what Gartner says, that a knowledge graph grows and becomes more comprehensive as more metadata sources are ingested and business users enrich it with context and semantics. I really hope that you enjoyed this quick exploration of the flexibility and evolvability of knowledge graphs. And please remember that a knowledge graph is a flexible, standards-based approach to seamless metadata management. Check out our website for more information and let us know if you're interested in our TopBraid EDG knowledge graph system. Thank you all once again, and we're back to Shannon.
Jesse, thank you so much for this great presentation. And if you have questions for Jesse, I see one coming in already. Or if you have questions about TopQuadrant, you may submit them in the bottom right-hand corner of your screen. And he will be joining Donna in the Q&A at the end of the presentation today. So now let me introduce the speaker of the monthly series, Donna Burbank.
Donna is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She currently is the Managing Director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. And with that, let me give the floor to Donna to get her webinar presentation started. Donna, hello and welcome.
Hello. Thank you. Always a pleasure. So as Shannon mentioned, this is part of a monthly series on data architecture. Any of the previous ones are on demand. You can catch them all on Dataversity.net. And then we hope you can join some of the upcoming ones. The one next month is on data quality, with my colleague Nigel Turner as well.
Before I go any further, Shannon, can I just do a quick sound check? Is it okay on your side? Are you getting any background noise? Because I am. I just want to make sure you're good. Sorry, I'm having mute button issues. You're sounding a little fuzzy, but you're coming in fine. There's a little bit of static. Yeah. Is that any better? Much better. Yep. Okay, we'll go with this then. Gotta love technology.
Okay, so the topic of the hour, as was already covered, is metadata management. The good news about metadata is it is hotter than ever. Dataversity does a lot of surveys and a lot of webinars on metadata, and those are always some of the more popular ones, which I don't need to convince the folks on this call of. But even though metadata has been around for a long time, it is still critical for driving business value, and it has evolved, because technology evolves. And metadata not only supports business value, but technology as well. And we'll talk a little bit about that. I mean, it's a short webinar, but we'll cover as much as we can, because we could spend all day on some of those metadata strategies and technologies.
And just a note, as I mentioned, metadata is hot, and every time Dataversity has one of these events, they're well attended. So just last week, if anyone attended the data architecture online conference, we also did a metadata topic there. Those are all available online as well. I think, as Shannon said, if you register for next year's event, you can get this year's recordings. And for those of you who have joined both, I tried to mix it up. You'll see a little bit of overlap. I also tried to cover something new. I could talk all day about metadata. So I tried to be conscious of that, because I already recognized some names that were on both, but I tried to be true to the content. So apologies if there is some overlap as well. Okay. So without further ado, why are we talking about metadata? Because, as I said, it is hotter than ever. And it's a little dated now, but not terribly so: we did a dedicated Emerging Trends in Metadata Management report a few years ago. And one of the reasons I use it is that we had a particular survey there. Over 80% of folks said metadata was not only as important as it ever was, but even more so. And we've done similar surveys. I think a lot of you have downloaded them; we have data architecture and data management surveys in which we include metadata. And the basic survey result has been the same. But since this survey focused only on metadata, we thought we'd cover that. So there'll be a link at the end of this, for those of you who may ask. It's out on the Dataversity site. It's also on our Global Data Strategy site, where you can download and see the full thing. Because if you're interested in metadata, that's a great paper to look at. So one of the reasons metadata is so hot is that, again, if you've joined any of my monthly webinars, you've seen this chart. This is sort of our data strategy framework, which takes a lot of things you may be familiar with, like the DAMA DMBOK, and sort of adds that business layer on top.
And you'll see that metadata management is a core piece of this. And you could argue metadata could be a circle around this whole thing, right? Because metadata touches everything. If you're doing data integration, you need metadata. If you're doing governance, the core of data governance is metadata. If you're doing a business strategy, you want to understand what data means, how it's used, how you're calculating total sales. That's metadata. If you're doing a bottom-up inventory, you want to understand the structure of your databases or unstructured sources and how they link. That's metadata, right? So, yes, it is one building block of an entire data strategy or data management approach, but it's a critical one. And again, I guess each one of these could have its own lens, which is sort of what we go through in each of these monthly series. So, I just wanted to put that out there, that we'll touch a lot of these in this conversation, but metadata is everywhere, thus the popularity. If you've heard me speak on metadata, you may have heard that I'm known to rant from time to time. One of my rants is what not to do. So, when I define what metadata is, there are a couple of definitions I like to use. One of them is the idea of data in context, right? How are we using data in the context of analytics, or in the context of master data management rules? Please don't say it's data about data. I think often the business loves to kind of say, what's metadata, it's data about data, but that doesn't do a whole lot, especially for a business user. It just obfuscates things a bit more. It's a clever play on words, but metadata is already complicated enough as a term. So, if you're trying to sell it, it's probably not the best way. So, little mini rant over. I like to say it's data in both business and technical context. Another graphic I use, and it was popular last week, so I'll show it again, is sort of the Zachman framework of metadata, right?
What is the who, what, where, why, when, and how of data? And that's where it starts to make a little more sense. How do we know who created this data? Who's the data steward of this data? Who's using it? Who, quote, owns it, right? Then the what. I think that's often where we're most familiar with metadata, especially those who kind of use data dictionaries and things like that. What's the data structure? What is the business definition of this data element? What are the rules that Jesse mentioned previously? Where is the data stored? That's data lineage. I don't need to read all these to you, but as you go through each one, you'll see this is really about getting the full context of any data you're looking at. So, it's kind of a good framework as you're looking through metadata. Am I catching all of these? Sometimes we do a whole lot on the how, right? I'm working with a large international customer right now. And they started sort of trying to do a full inventory of all the data types over in this how column: how data is formatted and the data sources. And they initially kind of forgot the why. They were going for volume, but not business impact. So, we kind of mixed things up a bit and started with the why. We could talk about a lot of different types of data. Let's ask the business what's most critical and start there. Otherwise, you're boiling the ocean. A lot of the tools out there can do kind of a broad scan. And that's really nice. But again, don't forget the why. And then the when: do we need to look at the old data? Do we need to limit it to the past five years? When do we purge it? Do we put that in the metadata so we can keep track of that? So, again, kind of a holistic look at data. Metadata, sorry. One person's data is another person's metadata, which leads me to the next slide. Sometimes a picture is worth a thousand words.
So, everyone understands spreadsheets, and in one of the other Dataversity surveys they were sort of the leading data management platform, which maybe we don't like, but there are a lot of spreadsheets out there. And everyone kind of understands them. So, if this is a typical spreadsheet of customer data, we have customer first name, last name, company, et cetera. Think of the column headings as your metadata. What is in that first column? If we just saw Joe Smith, maybe that's obvious, maybe it isn't. If we see City, what is New York? Is that where it was purchased? Is that where, et cetera. So, it's kind of adding that little bit of context. So, if you look at the next example, yes, those column headings, STR-01, TXT-123, that's metadata as well. Again, if you just sort of scan, if this were your database, you'd get this as kind of your data dictionary. So, yes, it's metadata, but is it full? Again, if you're going back to that who, what, where, why, and when, do I know what is even in that column, or why we're storing it, or what it means, or who's using it? No, it's kind of, well, yeah, there's a string and text, I guess, and that's a date field, I guess. It doesn't give a whole lot of rich metadata. So, there's metadata, and then there's metadata. Is it valuable? I forget who said it in the industry, but someone said metadata is sort of a gift to the future, even to yourself. I had a grandmother who said, for every picture we took, back in the old days when they were printed pictures, put the date on it and put who those people are, because you're going to forget in 20 years. And we found some pictures, right? I have no idea who those people in the pictures are. There was no metadata, right? So, my grandmother was prescient. She knew her metadata way back when.
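To make the contrast between "metadata" and rich metadata concrete, here is a minimal Python sketch of a data dictionary entry that records the who, what, where, why, and when alongside the column name and type. The ColumnMetadata class, its field names, and all sample values are illustrative assumptions, not taken from the slide or from any particular metadata tool.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ColumnMetadata:
    """Hypothetical data dictionary entry: technical plus business context."""
    name: str                          # physical column name
    data_type: str                     # how: storage format
    definition: Optional[str] = None   # what: business meaning
    steward: Optional[str] = None      # who: person accountable for the data
    source: Optional[str] = None       # where: system the value comes from
    purpose: Optional[str] = None      # why: reason the data is stored
    retention: Optional[str] = None    # when: how long the data is kept


def missing_context(col: ColumnMetadata) -> List[str]:
    """Return the names of the context fields that are still undocumented."""
    return [f.name for f in fields(col) if getattr(col, f.name) is None]


# What a bare database scan gives you: a name and a type, nothing else.
bare = ColumnMetadata(name="STR-01", data_type="string")

# The same kind of column after someone has documented it (made-up values).
documented = ColumnMetadata(
    name="city",
    data_type="string",
    definition="City of the customer's billing address",
    steward="Customer data steward",
    source="CRM",
    purpose="Regional sales reporting",
    retention="7 years after last purchase",
)

print(missing_context(bare))
print(missing_context(documented))
```

The first call lists every context field as missing, while the second returns an empty list. Even a spreadsheet with these few extra columns would answer most of the questions raised above.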
Another example I put together: I was doing one of these webinars on self-service data analytics, and the point I was trying to make was how easy it is to use some of these self-service data prep or self-service BI tools, and that there are open data sets, and kind of my point was how much rich information you can get within half an hour. Well, I was sort of thwarted in this. I won't name the exact name, but it was an open data set from the UK, which I thought would be a fun example, which was Road Safety by Vehicle Make and Model, and I thought, well, that'll be fun. Let's pick on those Mercedes drivers or the Porsche owners that, you know, go too fast and have vehicle issues or whatever the answer is. But I downloaded this, and the tool that I built this in had some really nice visualizations. Like over here, you can see that F-13 is amazing, followed by F-14, and 205 blah blah blah is 10, and F-4 has nine of them, and there's 2,115, or is that a date? I bet that's a date field. So again, there might have been some really great information in here, but there's no metadata. So it was really a smack-you-in-the-face example of what data looks like with no metadata, and I was so frustrated I didn't even try to look any further. I'm dying to know what F-13 is. Is that your Porsches or is that your Volkswagens, right? I don't know. So someone did a whole lot of work to curate this data, and I can't use it, right? So that's almost a great example of metadata. Way back, if folks follow Bob Seiner and his TDAN newsletter, I put an article in there saying that metadata is marketing, right? Imagine writing a book, and someone asks you what it's about, and you don't tell them, or there's no title, or there's no abstract in the book. No one's going to buy your book. So this is the same thing. You have all this great data. Publicize it. Let us know what it is and why we need to use it.
There are some great data sets out there where, you know, there's a lot of great context on how they were created, especially some scientific open data. We did a webinar last year with the Environment Agency of England, and they were using some data models to publish open source metadata for citizen data scientists who are collecting bird samples and things. You need to know where you saw that bird, how it was collected, when it was, you know, all of that information is the metadata that makes that data useful. So I'm probably preaching to the converted here on this call, but with these examples, you can't use enough of them. Metadata is one of those things. I love to use the word because it's kind of a funny word, but it almost does itself a disservice. It's so obvious once you explain it, and once you show that when it doesn't exist, things don't make sense. Some more examples. I mean, there's technical metadata, there's business metadata. So, yes, if I'm looking back at my customer spreadsheet or customer mini database, you know, even with something like last name and first name: is last name the surname or the family name? In some cultures, you know, the quote-unquote Western last name is someone else's first name. Or company: is that the company the person's working for, or the company they bought from? City: is that where the person lives? I mean, anything gets complicated. And so you need to understand the context. Even something as simple as a year: is that the year it was purchased, the year the person was born, all of that, right? As well as, what's the format? Is this a VARCHAR field? Is it a text field? I mean, I should write a memoir of metadata, the metadata lives I've lived or something. Even this year, massive Fortune 100 companies have had major business impact over metadata issues.
One of them was, even in today's day and age, an online business platform where someone changed the part number field from character 12 to character 10 and brought down the online system, and they couldn't sell for a day and a half. So that was a metadata problem. There were other things. There was a governance problem. There were a lot of other things that caused that. But at its core, we brought down a major retail organization over a metadata issue. I think, no, I won't talk about it on this one, I've talked about it on others, but if you're not familiar with the famous Mars rover issue that basically cost NASA millions of dollars, or billions, it was a metadata issue. So again, this is pretty critical. The good news, although I've just, again, preached to the converted about six times now, is that the benefit of metadata, once explained, is something the business gets. Business could mean a lot of different things in different industries, but, you know, kind of your non-tech person gets it. In fact, I would argue it's often the business that gets it more than IT does. Often with IT, I feel that I need to explain that, yes, you have to define what that field means. Well, I know what it is, so why should I define it? Everybody should know. Well, maybe not, right? Or when you leave, will other folks know? And please don't work on the false assumption that this is going to be job security: if nobody knows what I do and I don't document it, they're going to keep me. I think companies have wised up to that. So it would probably have the opposite effect. And again, think of it as marketing. You've done great stuff, tell people about it. So in this famous metadata survey that I'm referencing, when they kind of asked who was using metadata, 80% were from the business, right? Because again, they're often the ones consuming the data and maybe consuming the reports, and maybe they don't have all the lineage or the definitions.
And one of the quotes from the survey respondents was that metadata is really what helps business and IT understand what they're working with. If not, the organization is at risk. So I tend to be kind of an opportunity-driven rather than risk-driven person. But it works on both sides. If I don't know how this total sales number was calculated, I can't understand how to grow my business. On the glass-half-empty side, I'm also at risk of getting sued or having the auditor come in and give me a fine. So it's both, but both can be solved by metadata. So again, it's the business meaning and context of data. The automotive F-4 example was the extreme one, but it happens all the time, even with something as simple as the business person saying, show me all customers by region. And maybe they don't understand how hard that is. Can't you have it by this afternoon? Well, there's some complexity to that. A good data architect or metadata manager or data governance lead or whatever, their brain should be spinning with that question. What do you mean by customer? Is it current customers, lapsed customers, customers on maintenance, customers who may be prospects? There are a lot of different questions. And again, in my metadata memoir, I will list the stories of how many, again, Fortune 100 companies or even small nonprofits have made embarrassing, horrible business mistakes on something as simple as what do we mean by a customer? Sending renewal notices to prospects, et cetera, et cetera. Or how do you define a region? I am not exaggerating. I'm whining now, but I've been on the phone since 5 a.m. my time, and I've had three different calls with three different clients, and all three today came up with a different version of what a region is. Big differences. One was, what is a market region versus a sales region versus an ethnic region, in terms of Latin America. Is Latin America South America? Is it North America with Mexico?
Is it a regional preference? Another one was a region where it was a geographic region, but it also had to do with their market segment. Is it power versus infrastructure, et cetera? And two out of the three talked about customer, and what we mean by customer versus client, and the different types. This isn't an academic example, and if you're not thinking of these things, don't say, well, those companies are crazy. How do they not know what a customer is or what a region is? I would argue those customers are really advanced in their data usage and they're really understanding the nuance of their data, and these were both business and IT people talking about that. What are the different flavors of region? How do we store it? How do we populate it? How do we display it? These are the conversations you should be having in something like a data governance council, and you need metadata to support that. We'll talk about the types of metadata. It's the lineage of how we calculate a region. It's the business definition and the glossary, et cetera, et cetera. A super simple example: in any organization, show me our people by region sounds pretty simple, but even that has complexity. Again, it might seem simple. Even when I was early in my career, the first thing I thought at one of my first Dataversity conferences was, oh my gosh, these weirdos. They talk all day about what is a customer? How hard is that? That was my year one in data management. 30 years later, I know what they mean. It can seem simple, but again, what's obvious to you is probably not obvious to somebody else. Do avoid that dreaded "Well, I just know." I mean, gosh, really, I have to define what a date is? Well, anyone who's been in data management more than 15 minutes knows to ask: is it fiscal year? Is it calendar year? Is it et cetera, et cetera, et cetera? Which holidays do we use, and in which part of the world?
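As a small illustration of why even a date needs a documented definition, here is a minimal Python sketch showing the same sale date landing in different reporting years. The July start month, and the convention of naming a fiscal year after the calendar year in which it ends, are assumptions chosen for the example; organizations differ on both, which is exactly the point.

```python
from datetime import date


def calendar_year(d: date) -> int:
    """Reporting year under a plain calendar-year definition."""
    return d.year


def fiscal_year(d: date, start_month: int = 7) -> int:
    """Reporting year under an assumed fiscal-year definition.

    Assumption: the fiscal year starts on the first day of `start_month`
    and is labeled with the calendar year in which it ends.
    """
    if start_month == 1:
        return d.year  # fiscal year coincides with the calendar year
    return d.year + 1 if d.month >= start_month else d.year


sale_date = date(2020, 11, 15)
print(calendar_year(sale_date))  # 2020
print(fiscal_year(sale_date))    # 2021 under the assumed July start
```

A report that silently mixes these two definitions will put the same November sale in two different years, so the definition in use belongs in the glossary next to the field.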
Even with something as simple as a date, please document it. And again, I've seen reporting errors from thinking it was calendar year when it should have been fiscal year, all of that. Or, here's a gentleman who's about to retire, and he says, well, yeah, part number. We used to call that component number before the acquisition. That's the same thing. He knows that, but that's a pretty big difference. Is that the same thing or not? Right? So, please document that. That's where your data stewards come in. Put it in a glossary, put it in the metadata repository, or a catalog, or a data model, or a collaboration tool, or a SharePoint, or anything. I have my preference. You may too, as well. But again, even a spreadsheet, I know we love to hate spreadsheets, but that might even be a great way to start if you don't have anything else. But at least document them and make it clear, because otherwise nothing is clear. So again, I love to have data management cartoons, because who doesn't? So anyway, you've probably seen this before, and maybe it's not even that funny, but I continue to put it up because we can all relate to it, right? Hey, we're almost done with acceptance testing, and everything's great for this marketing application. But how are we defining customer? And again, that might not seem funny until you've done this. And then it's probably not funny. It's probably post-traumatic stress over what you have to do to fix that later, right? So again, not a small question at all. And again, if you've heard me present, you don't want to be overly academic. You don't want to spend six months defining what a customer is. If you're doing that, you've done something wrong. But even that five-minute question: guys, when we say customer, what do we mean? I'm thinking of one of my metadata memoir stories, where it would have been a simple question.
Someone in marketing pulling a list of customers, just asking that clarifying question: do we mean prospects, do we mean existing customers, and do they have to be on maintenance? That would have been a five-minute question that could have solved a big problem. So none of this stuff needs to take years and months. Even a quick whiteboard or a quick data model or something will do, but do start asking those questions and documenting the answers when you have them. Because metadata is both produced and used by a lot of different roles in the organization. So again, this is a subset of probably a list of thousands. In fact, in several of the reports we've done on metadata with Dataversity, in some of the surveys, we ask kind of, who's using this? And I find a lot of the write-in answers the most interesting, you know, race car drivers, scientists, physicians. I mean, when we talk about the business user, as data becomes more prolific and used everywhere, it's almost everybody in the organization. So I always find the write-in answers the most fun. But just to keep it simple, you know, a developer. Think of that developer who brought down the sales system by changing the length of a field. If he or she could have done impact analysis, if I change this field, what would be affected? That might have been a good idea, right? What's the definition of regional sales? I'm a business person and I'm looking at a report. One of our Latin American clients just did a whole big effort on that very thing. And again, this has probably happened to you in a different iteration. Every month they would do sales by the different regions around the globe. And it took them, out of a month, two weeks to put that data together. And it was always right. And half of the sales meeting talked about how we calculated the numbers.
So they were putting in data governance and data warehousing and lineage so that they could spend the meeting talking about how to improve sales, not how we calculated sales, right? I mean, that's huge, and that's 2020. This isn't kind of old-fashioned stuff. This is happening. You know, yes, we've had the technology to solve this, but this needs organizational rigor to really make sure metadata is used and governed throughout. A data architect, right? Maybe I'm building a new system. If there's an approved data structure, can I leverage some of the standards? Is there master data? Are there other standards I could use? So I mentioned the business person trying to find the definition of regional sales. And maybe that person did it on their own, but, you know, maybe the auditor kind of prompted that question. You better show me how you calculated that, Wendy. You know, if you're reporting to the street, you can't sort of make up numbers. Maybe that then prompted the data architect to ask, well, how is the source-to-target mapping done on that? How do I understand it? Or, I mean, the one that happens a lot and maybe is not often thought of is just something as simple as a business glossary. Who hasn't started at a new organization? I generally, when I go to a new client, create my own. I have my own little data glossary of what all the acronyms mean, right? Or what all their terms are. You know, often as a consultant, you're thrown into a brand new industry, maybe one you've never learned, and you just kind of have to learn by osmosis really quickly. I mean, you've probably had to do that yourself at your company. So if there were a nice data model or business glossary, that would have made everyone's life a lot easier. So to sort of maintain these, that's where data governance comes in. Huge, big, massive caveat: this is an example of some data governance roles. You need to make sure this makes sense for your organization.
There are a lot of different ways to do governance, and these are fairly standard roles that probably shouldn't raise too many eyebrows, but don't just take these and implement them. But as an example, maybe your business data owner is probably going to look at things like, how do we define a KPI? I want to know even what KPIs are important. What are the metrics I'm looking at? What regulations do we need to understand? Then maybe down to the business data steward, who is maybe more the person on the front lines. Maybe they're going to define the business rules of how to calculate that KPI, or maybe more of the nuance of what that means. The data architect, I find, is a massively important role and kind of one of those unicorn people that can live on both sides of the world. So if we go to the other side, kind of these orange and yellow colors, that's kind of the tech, right? So maybe your data engineer needs to understand the physical structures and the data type standards, and maybe, whatever you call this person, the system data steward understands how SAP is running and how all the systems work. This data architect can kind of go and talk to the business person and understand their world, and then kind of turn their head and also understand the tech, which is really what metadata helps with. So if anybody should understand metadata, often it's that data architect who gets the full scope of both business and tech. Another thing I did cover on last week's call, and you may have heard me mention before, but I think it's critical: just like with this data governance slide, don't apply these roles in your org if that's not what makes sense.
When you are looking at tools and technology, or at how to manage metadata, do think of your use case. Is it more opportunity-based, where we're just trying to brainstorm ideas and get that superset of people's thoughts? Or is it more rigid, where I'm really trying to vet a single source of truth, maybe for regulatory reasons? The analogy I always use is an encyclopedia versus Wikipedia, and you'll probably have both in the organization. If I'm talking about master data, or the enterprise data warehouse where I am reporting the figures up to finance, I hope that's vetted, I hope it doesn't change a whole lot, and I hope there are some very standard enterprise data sets for it. I don't want to mix that with exploratory analytics on customer behavior, where I have a group of data scientists and we're just brainstorming. Don't lock them down so much that they can't do anything new, and at the same time don't have your warehouse be "I don't know, what do you think?" So I say the encyclopedia is the bearded old guys in their office defining it and then publishing it, and Wikipedia is more of that eventual consistency. One can mock Wikipedia: is it really official? But it's often right, because there are enough eyes looking at it. Eventually we get there, and there are a lot of voices that got us there. Often that's the goal of a data-driven organization: everyone's looking at the data and everyone's contributing. I would argue you may not be able to get there without this encyclopedia foundation, but don't pick the wrong tool for the job, because really there's probably a balance, so just make sure you're striking it. As I think I mentioned on last week's call as well: the right tool for the right job.
I've seen companies that were really trying to be more collaborative but had a very standards-based metadata tool where people couldn't even give feedback or anything like that. On the other hand, I've seen a company that was trying to really enforce GDPR compliance but was using more of a collaborative tool, where, well, sure, you can define it another way, or just give me your ideas, and they couldn't lock it down enough. Both were good tools, and both failed because they were solving the wrong use case. I know that's obvious, but there are so many good tools out there now. Now that you've got choices, do a really good requirements analysis. Maybe it's multiple tools for different use cases, but give that some thought. The other side, when you're looking to implement a metadata management tool, and again these are all obvious, but checklists are always helpful when we're all busy: think of all the different types of metadata you need to store. We often help companies with their metadata repository, data catalog, data dictionary, whatever you want to call it these days, and often where it breaks down is that the tool doesn't support one of the data sources. Maybe the demo was great and they showed us all the SQL Server databases and everything else, but you're storing stuff in AWS and in COBOL copybooks from way back, and they don't support what you're trying to do. So do a checklist and make sure you understand several things. What are the sources you're using? Do you have legacy systems? Do you have some metadata already, in things like data models? I'm a huge fan of those. If you've got a low budget and want to do a whole lot of metadata with a low-end tool (well, low-end is relative, probably), a lot of that business and technical metadata already lives in your data models. Publish them out, put them in spreadsheets; it's better than nothing, right? Or is your information sourced in
spreadsheets and in people's heads? Are you doing multimedia? I've had some companies that were trying to get the metadata from pictures: who took it, when it was taken, and so on. What about metadata from social media? If you're doing data science, you're probably looking at customer sentiment. Are you either publishing open data, or do you need to take data in from open data? Are you sharing this outside the organization, maybe with some XML out to a different set? Anyway, understand what you're storing both within the org and outside the org. What I found interesting, and again this is a couple of years old, but if you look at some of the newer surveys the findings are similar, is that in the past, and it's still one of the major use cases, when we think of metadata it's really heavy on the relational world: data warehouses, data models, glossaries. I would argue that's not going away. In terms of eyes on things, your enterprise data warehouse and the figures you're reporting out to finance are always going to be important, so that's a great place to start. When you look at the future plans, those don't go away, but they're augmented by a lot more. How do we get media metadata? How do we get metadata from Twitter? So, a little bit of a plug, but hopefully a helpful one: I think Shannon usually sends out a link afterward. There's a metadata course on DataVersity that I taught, and one of the segments goes into detail on what we mean by metadata for each one of these, for a video or a photo or social media. I know when that was new I had to do my own learning: well, what metadata can you get from Twitter? And it's pretty fun, it's pretty interesting. So you might do some exploration yourself on what the new sources are. It's not only relational databases, which are still really important; there are a lot of other sources out there. Again,
I'm a big fan of checklists because, like you, I'm really busy; everyone is these days. Just like going to the grocery store: what do I need? Sometimes these simple checklists can help a lot. So just list what your sources are and who's consuming them. It's almost like your old-fashioned CRUD matrix, if people are familiar with that, which shows where data is created, read, updated, and deleted. So who's using it, both as a consumer and as a producer? You can do a heat map: well, let's maybe start with Oracle, everyone's using that. Or, hey, we're not dumb: if exec leadership is looking at this open data source, maybe we start with that. At least you'll understand. And when you're looking at tools, please do a proof of concept, and please throw some of these use cases at the tools, because it can be a wonderful tool, but every tool has its strengths, and its strength might not be the data source you're looking at. Again, I've seen things go wrong with great tools that were just applied to the wrong use case. You don't have to use a fancy tool for metadata, though if you can afford one, they're great. And you can use metadata for a lot of things. Again, I'm a fan of data models; they do a lot of good in an organization, from both a business and a technical perspective. You can do some of that top-down: I'm going to define my roles and implement them in a system. That's a great way to do things. But there are almost always things that already exist, and you want to do that bottom-up discovery. Most tools have something, whether they call it a scanner, a crawler, or whatever, that does automated metadata discovery. Depending on the robustness of the tool, if that's the word, it can get a lot, starting with just: what are the data structures? Maybe I inherited an old DB2 database. I've never logged into DB2, I don't really know how to do it, but it's the sales database and I need to
understand it. The tool can scan it and understand that data structure. Or your famous COBOL copybooks, right? They were old when I was learning. Who was it, the New Jersey governor in the US? They were looking for folks in this new pandemic to code COBOL, because some of the old systems needed to be updated. So it's a great skill to have, but some of these tools can help with that. What we'll talk about in a bit is not only the structures themselves but the interrelationships. Especially, as Jesse was saying, with things like graph databases, it's not only the metadata for a source but the metadata between the systems that can be really interesting. Which leads to the types of metadata. I would say we go back to who, what, where, why, when. Structural metadata is what we were just describing, almost your data dictionary: where is it, how is it stored? I have it in the database, it's a character field of length 12, it's in these columns. Super important. You also want to get the descriptive metadata, which is more about the usage and the context. Okay, it's a name field, it's character 12, it's used in these six sources, and when we say name we mean the family name of the adult in the family, or whatever. There's a lot of context around that and where it's used. But just as important is what I call relationship metadata: the data lineage, the impact analysis. We'll go through some of these examples, because that's often where the massive aha moments come from, the relationships between things. Your classic one, and classic means it's important, it doesn't mean it's gone away; it's been around for a long time but it's still absolutely critical, is your data lineage. Maybe I have that sales report where the business user wanted to see total sales by region. Okay, so we could say, what do
you mean by sales, what do you mean by region, whatever; say we've gotten that out of the way. Some of that's in your BI tool. If you use your classic BI ecosystem, that probably came from a number of systems on the back end. There was probably some transformation, and there are business rules hidden in the ETL tools or the scripts. You've probably got some physical data models. Maybe, if you've got a great team, you have logical and dimensional models. There are glossaries. There's a lot of different metadata that you probably want in a good metadata management tool, like a repository or a catalog, that can see all of that, so you can drill into each table. Say I have a field called, I don't know, customer name, and I want to understand it. I want to see where it was sourced, and maybe I want to drill into this table and see more about it, or even see the data itself. Is that PII? All of that. That seems like an amazing task, but it's needed, right? If you're reporting on it, you need to understand it. The good news is a lot of the tools have automated scanners for this. But again, when you're looking at that, make sure it fits your use case. Most good tools can do this picture. Vet them on it if you have this as your use case. Maybe you don't, so don't pick a tool just because it does this. If it's in some sort of structured or semi-structured format, the tool should be able to read from your ETL tool, your databases, even your BI structures and your glossaries, and mush that together. It's complicated to mush it, and there may be some rules around mushing it, but the tool should be able to do that. But make sure it fits all your use cases. For one of my big media clients, the tool passed this test but it didn't pass this one: a lot of their stuff was out on S3, and they were doing a lot in the Amazon platform, and that tool was a great tool but not for this use case. I also wanted to stress this: AWS isn't blazingly new, it's been around for
a while, but it's certainly not legacy technology. And I have heard, and I don't know what planet they're coming from, when I show a slide like this, folks say, "Isn't that old school? We don't do that anymore." Show me a company that really doesn't need to report on its sales. I would be surprised, right? So yes, this is one of many use cases now, but it's still a valid use case. There are similar use cases for source-to-target mapping. Maybe I'm moving to the cloud, maybe I'm doing some advanced analytics in the cloud on S3 buckets, but I still need to understand lineage. That use case doesn't go away. There is some new technology that helps. In the old days, a lot of this mapping was boring and banal stuff, or exciting if you're a nerd like me. Take something like this: it's CUSTOMER in the staging area (let me change colors), maybe it's CUSTOMER on your database table, but in the source system it's CUST, C-U-S-T, or maybe it's CBL_C1. Back in the day, and there's still a case for it, the question was how you do that mapping. Is there a rule? For example, in the data models you can create a mapping so that when it says CUST, I know it's customer. Maybe it's not; maybe you're an ice cream company and that means custard, so you might want to be careful with those mappings. But for some of the obvious ones, a lot of this can be automated with machine learning and AI. It can look at the pattern of the table names: gee, CUS probably means customer, or client without the vowels might mean client, and it can make a best guess. Or it can look at the data itself. Some of the security software especially is good at that: I see a pattern here, it's probably a Social Security number. So look at where you can automate. I do think, though, don't over-automate. Sometimes you do want to override. Again, maybe it is custard and not customer, and I don't want it to over-AI to
begin with. Those tools exist; you shouldn't have to do it manually where that doesn't make sense, so use the manual override where it truly does. A couple of examples on lineage, because especially when we're talking about things like graph, lineage and interactions make a lot of sense. The one I mentioned was impact analysis: what's going to break if I change something? Say I want to change the length of the brand field because we changed the name of our company, and now it's 30 characters. Let me see what else needs to be changed across the organization, from databases to XML to sales applications and so on. A lot of the good tools can do that and save a lot of headaches. That would have saved the company that lost sales because they changed their part number. They could have used something like this, the where-used or impact analysis; it's called different things. One that maybe isn't talked about in all circles, but that I am a big fan of, is semantic mapping. Maybe I've decided at the conceptual level that we're calling that client across the organization: we don't call them customers, we call them clients. Well, maybe in a lot of systems it's called customer, and we should have allowed for that too. And maybe in the physical systems it's all of these different versions on the different platforms. That's important to know. Data modeling tools are good at that, and a lot of the metadata repositories are good at that. Again, maybe you want automated discovery to discover some of it. Maybe some of those are embedded rules in something like your data modeling tool that can do conceptual to logical to physical, or at least one to the other; logical to physical might be a common one. As Jesse was talking about earlier with the power of graph, in some cases the graph, the relationships themselves, are the metadata. Think of fraud detection. Yes, you can use graph to link some of these systems together, but you can
also look at a graph and discover a lot. Say I'm doing fraud detection and there's a whole lot of financial transactions on different accounts from one laptop. Gee, that seems strange. Or I'm a criminal and I called six people before I robbed the bank; maybe we look into those six people. So you can determine a lot from the relationships themselves, and that is a type of metadata. In fact, when metadata became hot and we were talking about phone records and things, often it was this kind of metadata people were talking about: what phone patterns were made. So it's the usage of the data itself. That's another one to look at. I won't go deep into this; we covered it last week if you want to see it again. But just know, as I talked about before, there are different tools for different jobs. There's no one size fits all. If you can afford it and you want to go that route, get the full-on metadata repository or data catalog that can scan all of the systems and can do both business and technical metadata. In the long term it'll pay off. But if you can't, and you're like most companies and budget constrained, maybe just start with a glossary on SharePoint, or start with your data modeling tool and publish from it, or a simple data dictionary. Don't not do it because you don't have a massive budget. And don't forget outside your organization. Often the metadata standards are driven by export, so when you're thinking of consumers, don't forget that you may be publishing metadata, or you could leverage it from outside your organization as well. Before I wrap up, because I do want to give time for questions: I'm not going to go into detail on each one of these areas, but hopefully it's something you can take back and digest. When you're looking at metadata, don't forget the strategy. We talk about data strategy a lot; do you have a metadata strategy? Back to that why. With the good tools you can scan a lot. What's
the priority? What do we focus on, and how do we make that consumable? As we're looking to make it consumable: how do we capture it, how do we store it, how are people going to get it into the source and then get it back out? Which leads to this idea of integration and publication. Maybe it's in a repository but it's not very user-friendly to get out; maybe we need a different tool. And don't forget metadata management and governance. We could talk all day about what's data governance and what's metadata, right? But in general: do you have the right roles tracking that? Do we keep data quality statistics on metadata, such as how many definitions have been defined? What's the life cycle of metadata? Maybe that was defined, but it was defined 16 years ago. Is that still how we define a customer? I've gone into clients who say, "Oh yeah, we've got a big picture, don't worry, we did that back in the 90s, it's still good." Probably not, right? I hope your business has changed since then. So take almost everything you would do with a data strategy, tone it down a bit, and look just at your metadata. Hopefully that's a helpful thought as you go and try to implement. So, again, metadata: please don't say data about data. Think of it as data in context: who, what, where, why. Treat it like a first-order asset in terms of governing it. Understand data governance for both the business and technical metadata and relate them together. There are awesome tools out there now, so as you get your structure together, think of the right tool for the right job. And make sure your metadata is part of a wider data strategy. Don't do this in a vacuum, because that's going to defeat the purpose; you really have a lot of users for it. Quick plug for the white paper if you want more of that survey material: it's on DataVersity as well as the data strategy website. If this interested you, we would be happy to help; plug, we do this
for a living and love this stuff. Please join us next month, when we talk about data quality; my colleague Nigel Turner from Cardiff will be joining as well. So without further ado, I do want to open it up to questions, and I'm going to pass it over to Shannon, who will read the Q&A. Donna, thank you so much for another fantastic presentation. Just a reminder, I will send a follow-up email to all registrants by end of day Monday for this presentation, with links to the slides and the recording. If you have questions for Donna or Jesse, feel free to submit them in the bottom right-hand corner of your screen. So, to dive in here, Jesse, the first couple of questions are from your presentation. With the use of knowledge graphs, can I perform impact analysis across the metadata platforms? Yes. I mean, it is as simple as a yes, because impact analysis is naturally supported by the connectedness of the semantic graph representation of the metadata. But impact analysis can be promoted even further, because you can actually bake some of that into your model. So you could traverse and do impact analysis of the raw metadata in its proper context, but you could also add characteristics specifically for impact, and even rules that help infer more impact, getting a lot more out of your content that way. So really, the simple answer is yes. I love it. And another question for you: in one of the slides you were showing, the quote-unquote rule shown on the knowledge graph seems to be a rule for the individual instance rather than a business rule from the enterprise perspective. Is that correct? No. If I understand the question correctly, I would say always think of the rule for the context first. A rule would apply to all instances of people, not just one person. A rule would apply to all instances of data set or PII, not just a given instance. When you do need to write a rule specifically about an instance, you can, but primarily
your rules are something you want to be part of your context, part of the meaning, part of the model, more so than the instance data. You want to be able to check the instance data against those rules and the model that sit above it. All right. And to kick it off with you, Donna, on this question: we purchased a metadata tool, so what kind of metadata would you recommend cataloging first? I will give the great consultant answer: it depends. That's terrible, I know. But going back to earlier in the presentation, I would answer that question with the following: who are the stakeholders, and what are the business use cases for that data? Even as simple as: is it for a business user or a technical one? The question said it's one of the leading metadata tools, and what I find funny is that those tools sit on both ends of the spectrum. Some are really aimed at business users, with a great workflow and great glossaries, and some are really aimed at tech. In general, and I think I've told this story before, I once wanted to do a great technical implementation of metadata, and my boss wanted me to start with a glossary. I disagreed with him and found out he was right. Glossaries tend to be a really user-friendly way to start, because everyone can relate to them, both tech and the business, and they're a really great way to set that context and to reach out to a lot of folks. So glossaries are a nice, I don't want to say easy, first step, because there's always complexity, but they're very approachable. I've also seen success with scanning. Don't overdo it with the scanning in of systems, because that's where you get a lot of volume and not a lot of value. But sometimes you can look like a magician, if this is one of your use cases, by pointing one of the automatic scanners at the system and saying: you were wondering what's in this system? Here's a really nice visual way of showing all the tables and
columns, or even some of the lineage. Those are two of the quick hits I've used on both ends of the spectrum. There are a lot more, and you would need to do some analysis, but those are two ways I know I've started quickly, both on the tech end and on the business front end, that at least get something that shows the value. Then don't go too far down that road without showing it to someone, and build from there. Don't wait a year to populate it; wait a month, and then start showing it, getting the word out, and getting a lot of people involved. Jesse, anything? No, that was great. Awesome. All right, so how can we enforce data quality over metadata? Can you name a few metrics and examples? Yeah, there are a couple. Even the data quality metrics are metadata in themselves. But often it's just: do we have a definition for each one of these fields? That could be a great one. We have all this great information in our business intelligence report and only three of the fields have a definition; that might be one. It could be at the system level: do we understand the structures of these systems, or the integration between the systems? It could be: do we have a data steward, which is a type of metadata, for all of the systems? Do we have PII coverage on the different information? So it could be about the quality of the metadata: if we define that these are the metadata areas we want to cover, whether that's definitions or security, are those covered? And then you bridge into, is it data quality or is it metadata quality? Do we have things like definitions and security around it? Yeah, and in addition to that, there's using a standard that helps focus on data quality. Something like SHACL, the Shapes Constraint Language from the W3C, allows you to target exactly the idea of constraining and checking the
validation and quality of metadata. And then, really opening the door to metadata management in the big picture (vocabulary management, glossary management, reference data management) lets you do things like permissible value mapping, and realize that once you have really good control over your terminologies and vocabularies, you can guarantee a lot of things about your metadata instead of just worrying about its quality. Changing the subject here, but regarding machine learning pattern matching: how do you recommend those patterns be determined when there are potentially hundreds of different applications that need to be cataloged? Again, there are a couple of things. I would probably focus on your business value. Something like PII is a good one. Social Security number in the US is what I mentioned, because that fits a lot of the different criteria, and then you can extrapolate from there: high value, high business need. I'm a bank, I need someone's social insurance number in Canada, or Social Security number in the US. They tend to follow a pattern, so they're easy to capture, they're ubiquitous, and they're high risk. So that would be a great example of high business value, high need, high risk of getting it wrong, and probably ease of tracking. If I take just the US one, which I'm more familiar with: you know, the four numbers and the two numbers and the... I got that wrong, didn't I? Anyway, that's a fairly standard pattern to match. Some things are a little more complicated, but those would be some criteria, and then you can extrapolate to ones that wouldn't fit in that category. Where there are fuzzy areas, maybe with AI, you do need a human in the loop to look at that. So don't over-AI either, you know what I mean? I don't know.
Jesse, a few thoughts? No, you nailed it. Don't overdo it, and for what you are doing, make sure that you're documenting it. Yeah, great point. AI isn't magic. Exactly, AI is not magic. Perfect, well said, I love it. And that's a perfect tagline to end on, because that does bring us to the top of the hour here. Thank you both so much for these great presentations, and thanks to Top Quadrant for sponsoring today and helping to make these webinars happen. We really appreciate it. Thanks to all our attendees for being so engaged in everything we do; I love all the questions. Just a reminder again, I will send a follow-up email by end of day Monday with links to the slides and the recording, and I'll include all the additional information requested, with links to past recordings and the white paper and all the other good stuff. Thanks, everybody. I hope you all have a fabulous day, and stay safe out there. Thanks, Donna. Thanks, Jesse. Thank you.
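A minimal sketch of the pattern-matching idea from the Q&A, assuming the conventional layout of a formatted US Social Security number (three digits, two digits, four digits, separated by hyphens). The function names, the threshold values, and the three outcome labels are hypothetical illustrations and are not taken from any tool mentioned in the webinar. Columns that match strongly are flagged automatically, and middling matches are routed to a person, in the spirit of "don't over-AI."

```python
import re

# Formatted US Social Security numbers are conventionally written as
# three digits, two digits, four digits (for example 123-45-6789).
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def ssn_match_rate(values):
    """Return the fraction of non-empty values that look like a formatted SSN."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return 0.0
    hits = sum(1 for v in non_empty if SSN_PATTERN.match(v))
    return hits / len(non_empty)


def classify_column(values, auto_threshold=0.9, review_threshold=0.3):
    """Classify a column sample using hypothetical thresholds.

    High match rates are flagged automatically, middling rates are sent
    to a human reviewer, and low rates are left unflagged.
    """
    rate = ssn_match_rate(values)
    if rate >= auto_threshold:
        return "pii_ssn"
    if rate >= review_threshold:
        return "needs_human_review"
    return "not_flagged"


if __name__ == "__main__":
    sample = ["123-45-6789", "987-65-4321", ""]
    print(classify_column(sample))
```

The middle band is the design choice that matters here: rather than forcing every column into a yes or no, ambiguous samples are handed to a data steward, and the thresholds themselves would need to be tuned and documented for each organization.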