Hello and welcome. My name is Shannon Kemp and I am the Chief Digital Manager for DATAVERSITY. Thank you for joining the latest installment of the monthly webinar series, Lessons in Data Modeling with Donna Burbank. Today Donna will discuss data modeling and metadata management. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the top right-hand corner for that feature. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #LessonsDM. As always, we will send a follow-up email within two business days containing links to the recording of this session and additional information requested throughout the webinar. Now let me introduce to you our speaker for today, Donna Burbank. She is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She is currently the managing director of Global Data Strategy, Limited, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa, and speaks regularly at industry conferences. And so with that, I will turn the webinar over to Donna to get us started. Donna, hello and welcome. Hello, Shannon. Always a pleasure to do these, and thanks to everyone who joined. As Shannon mentioned, today we'll be talking about data modeling and that ever-favorite topic, metadata management, which always tends to draw a crowd here at DATAVERSITY. Shannon already mentioned a little bit about me, so I won't talk too much about that.
Just a few notes: as Shannon mentioned, we do have a hashtag, #LessonsDM, so if you're a Twitter person and want to continue the conversation online, please do. I am on Twitter at Donna Burbank, which is easy to remember. And a little bit more about me in terms of metadata: I'm one of those few metadata nerds (said proudly) who have been doing this for over 20 years. If anyone remembers Platinum Repository way back (showing my age), I was a metadata consultant on it, working across the US, Europe, Africa, and Asia and helping some of the larger companies in the world manage their metadata. I then went on to become the product manager of that product, both the mainframe and distributed versions. After that I spent some time with some of the data modeling tools on the market you may be familiar with, ER/Studio and erwin, and really fleshed out their metadata and modeling capabilities as well. I was also involved with the Object Management Group, if anyone is familiar with it, which had the acronym OMG before online Twittering and that sort of thing, so it's sort of funny now. I worked on their metadata strategy and on things like the Information Management Metamodel. I'm also a big fan of DAMA and have been involved in it for many years; if anyone is part of DAMA, the Data Management Association, it's a great group to be in. So that's a little bit more about me. Now a little bit more about the series. Some of you join us regularly, and thank you very much; we do see some of the same folks joining every month to be part of the series. Other folks join in as a topic piques their interest, and we try to set it up that way. Just last month we talked about the evolving role of the data architect. That one was very popular, partly because, as you can see from the different topics, it really is an evolving market out there and it's not just plain old ER modeling anymore.
There's so much more to it: the role and the practice of data modeling is so broad that it's really hard to limit it. So we broadened out the topics this year, and you'll see a broad range, from Agile to data wrangling next month and what that means in the world of data modeling. A quick call-out, and I think Shannon will be putting it in the follow-up email, which usually includes some of the questions and topics we talk about: we're doing a survey on data architecture, and we're as curious as you are about the results. It was actually a hard survey to write; I'm doing it with DATAVERSITY and some of their writers as well. There is so much changing, and it's hard to know what's going to stick. Who's using blockchain? Who's using Agile? Who's using traditional legacy technology? So I want to hear from you as well. If you have some time, do take it. It's a bit long, but it needed to be because there's so much out there. So that's a plug for the series. If you missed any past sessions, they are all on demand and you can catch them there; I know everybody has busy day jobs, and I personally hardly ever manage to catch a webinar live. I'm always watching them at weird hours of the evening, so I get that. Okay, so that's the past and the future. Today we'll be talking about metadata: how data modeling and metadata are nice cousins, how they fit together, and what that means for your organization and beyond. Speaking of surveys, we did a survey last year on emerging trends in metadata management. It was hugely popular, and it's still available for download; that'll be in the links as well. It was no surprise to me, and hopefully it's not a surprise to anyone on the call, that in the survey over 80% of respondents (statistically speaking, actually a little higher than that) said metadata is more important now than it was in the past and that its importance is growing. That doesn't surprise me, because we see a lot of requests.
Most of my customers are asking for metadata either as part of a larger data strategy, as part of a governance strategy, or as a pure play: how do I manage metadata? So it was nice to see that confirmed in the survey, I guess, is what I'm saying. It's a growing trend, and hopefully this webinar will help demystify some of it for you. In terms of what metadata is, if it's new to anybody here: I know this is simple, but I think we often overcomplicate metadata and say things like "metadata is data about data." Of course I had to say that once in this webinar, but I won't say it again, because I find it a frustrating definition. Really, metadata is just data in context: the business and technical context around your information. To be a little more specific, one way I like to talk about it is the who, what, where, why, when, and how of data, a framework for documenting data, in a way. We won't go through each one of these, but as you look through them, if you're not familiar with metadata, it starts to make sense, right? Who created this data? Who owns it? Is there stewardship in place? What is, I think, often what we think of when we think of metadata: what's the business definition? What's the technical structure? Where is it stored? Why is one we don't always think about, and I always start with it on any project I do: what's the business purpose? Why are we managing this anyway? There's so much information across the organization that you really have to prioritize and focus on the things that will have the highest value. When: is it current? And how: how is it formatted, and that sort of thing. You can read through these, but I find this framework helpful, especially with business stakeholders; you tend to get nodding heads: "Okay, now I see what you mean."
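As a rough illustration of the who/what/where/why/when/how framing, here is a minimal Python sketch, assuming you want to record one dataset's context as a simple record. The class name `DatasetMetadata`, its fields, and the sample values are hypothetical, not a standard schema or any tool's format; the `missing()` helper just flags which questions are still unanswered.

```python
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical record for the who/what/where/why/when/how of one dataset."""
    name: str
    who_owner: str        # who owns or stewards the data
    what_definition: str  # business definition
    where_stored: str     # system or location
    why_purpose: str      # business purpose
    when_refreshed: str   # currency / refresh cadence
    how_format: str       # technical format

    def missing(self) -> list[str]:
        """Return the names of any fields that were left blank."""
        return [k for k, v in asdict(self).items() if not v.strip()]


# Illustrative values only; the definition of "customer" is an assumption.
customer = DatasetMetadata(
    name="Customer",
    who_owner="Sales Operations",
    what_definition="A person or organization that has purchased at least one product",
    where_stored="CRM database",
    why_purpose="Renewal notices and sales reporting",
    when_refreshed="",
    how_format="Relational table",
)
print(customer.missing())  # ['when_refreshed']
```

Keeping the six questions as explicit fields makes the gaps visible, which is often the first useful output when business stakeholders review a dataset.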
Metadata, I'd hate to say it, is sort of a funny word, so we don't want to overdo it. The business already thinks a lot of us techie folks are too techie anyway. But the key thing about metadata is that it's part of a larger landscape, and we don't do metadata just for the fun of it, although maybe it is fun. This is a framework we use in our practice all the time, and we've gotten good feedback on it, because it sums up the challenge we have as data management professionals. We always start at the top: why are we doing this anyway? I alluded to it at the beginning of the call, but I'll talk to it now. In our consulting practice, we used to get a lot of requests like "we're going to do governance, so we need better metadata to govern." There's always been metadata, right? But it's sort of been hidden. Now I have companies coming to me not only for data strategies (how do I make my company more data-driven?) but also saying, "I have metadata; what's my metadata strategy, and how can I become more competitive using metadata?" I think people are getting that metadata is the glue and the magic dust that makes everything happen. I see that as a positive thing, and these are business people; they understand metadata and its importance. Metadata also helps manage the layers at the bottom, because you can have a great strategy, but whether it's unstructured data, structured data, big data, or documents, there's metadata in all of it. We won't go through all of those today; we'll focus on the relational side of things. And metadata really is the glue that fits all those other things in the middle together. Master data, data warehousing, BI, data quality: you can't get any of that without metadata.
And then governance is the people, process, and policy. We have a couple of other presentations online on governance that we've done in the past, if you're interested, covering how metadata makes governance actionable. It's great to have policies about data, such as what's PII and what our information policies are, but unless they're actionable through the metadata and the lineage, they're not so helpful. That's another way a lot of my customers come to me: "I've got governance, but how do I make these policies real and actually link them to the physical systems at the bottom?" So hopefully that's a helpful framework to put what we're talking about today in a larger context. The other larger context is that metadata exists not only within your organization but also beyond it, when we think about open data. The first thing I look at in open data is the metadata around it. Who created it? Why? What was the purpose of this data? A lot of open data may have come from scientific research, so why was it collected? How were they tracking it, and what was it used for? So we'll be talking about data models, but there are a lot of other pieces of metadata that exist within your organization and beyond it, and that's really the crux of getting the full context of your information. Here's another slide from the emerging trends in metadata survey we talked about earlier, which I found really interesting. Whenever I do these surveys with DATAVERSITY, I'm a big old nerd, and I'm curious myself how some of this will come out. So we asked: what are people using metadata for now? There were probably no huge surprises: things like data models, relational databases, data warehousing, and glossaries, which are the traditional ones. But I was curious: what are people going to be using it for in the future?
Again, there weren't too many surprises there, with things like big data becoming bigger and NoSQL. One of the fun things I found interesting is that one of the most popular answers for the future was legacy systems, things like COBOL and JCL. That's probably not what people are building their new platforms on, but think of it this way: as people retire, metadata is the documentation we wish we had when Joe retires, capturing what was in Joe's head, right? So I think metadata is the magic glue that connects not only different types of technologies but also the past and the future. When you're moving to new technologies, it gives you metadata that stays constant, especially the definitions around your data. So I found that interesting in terms of what people are using metadata for and what they're planning on using it for. Data models are a good source of metadata, but I want to put this in context: they're not the only source. The focus of this series is data modeling and how all these other areas relate back to it, so I don't want to limit metadata to data models. Still, when I was a metadata consultant years ago, we would go into some of the largest organizations on the planet and inventory their entire landscape, and they would literally spend millions of dollars on a robust system. When it came down to it, some of them were really just using it for relational databases and glossaries, which is great, but a lot of that was already in a data modeling tool. So when you do your inventory (we'll talk about that later, but it's one of the first things you want to do), ask what type of metadata you are tracking: the who, what, where, when, and why. That's a key thing to think about.
If the majority of what you're doing already is in relational data models, most of the main tools on the market today have wised up to that and have some sort of metadata component, what I like to call a metadata repository "light." If you're building these models, by definition you're tracking your technical metadata, the structure of those systems, and, if you're doing modeling correctly, the business metadata too: what do I mean by customer? The beauty of it is that you're already doing this as part of your day job if you're a modeler, so leverage that and publish it out to others. That's the crux of what we'll be talking about today. There's nothing wrong with a full metadata repository; in fact, many times you will need something broader. But as a starting point, or if the majority of what you're working with is relational and you're already doing modeling, leverage what you have. That's the point. So a little bit more on what we mean by metadata, if this is a new concept to you: data versus metadata versus meta-metadata (we could keep going). Think of a spreadsheet, which is a rough approximation of a database if we want to think of it that way. The rows are the data: the fact that Joe Smith is in New York and purchased something in 1970, that's the data. The metadata is the column headings: the fact that Joe is the first name, Smith is the last name, and New York is the city. That puts some additional context around it. You might feel sorry for my friends, because at parties I actually talk about things like metadata. I was trying to explain this to a friend, and they said, "Well, yeah, but that's pretty obvious. Why would you need a whole discipline for that? City is pretty clear; we just know this stuff." Well, maybe not.
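To make the spreadsheet analogy concrete, this small sketch uses Python's standard `csv` module on a made-up extract to separate the header row (the metadata) from the remaining rows (the data). It only illustrates the distinction; it is not how any metadata tool stores information.

```python
import csv
import io

# A tiny, made-up spreadsheet extract.
sheet = io.StringIO(
    "first_name,last_name,city,purchase_year\n"
    "Joe,Smith,New York,1970\n"
)

reader = csv.reader(sheet)
metadata = next(reader)  # header row: names that describe the data
data = list(reader)      # remaining rows: the data itself

# Pairing each value with its heading is what gives the value context.
record = dict(zip(metadata, data[0]))
print(metadata)  # ['first_name', 'last_name', 'city', 'purchase_year']
print(record)    # {'first_name': 'Joe', 'last_name': 'Smith', 'city': 'New York', 'purchase_year': '1970'}
```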
What if your column headings were something like this, right? I'm sure we've all seen this. Somebody had some sort of logic behind it: string1, string2, text123, text127, the date field. Yes, that's metadata, and you can probably infer some of it, but if you didn't see the data, you'd probably have no idea what those fields meant. The names don't always carry that additional context, which is why these other pieces of metadata can be helpful. The other part is that even if the columns were named as in the first case (first name, last name, city), if you've been doing data modeling, you know there are a lot of subtleties. Just think of city. Is that the city where the person lives? The city where they purchased the item, where the store is located, or where the billing address is? Without that extra metadata, you don't know. Or take first name and last name: there could be a business rule that in the Asian market, what we call the first name is really the last name. So there are a lot of subtleties, even in something that seems very simple, and that's where metadata adds context and definition around seemingly simple things. We'll talk more about that as we go. Here are some other examples. The beauty of data models in particular is that you have both kinds of metadata. There's the technical metadata, which describes the structure and the format; if you're familiar with databases, that's the DDL, or data definition language, on the left, for instance to create an EMPLOYEE table with first name, last name, and Social Security number. Then there's the business metadata: what do I mean by an employee versus a customer? The ever-elusive question: what do we mean by customer? There are so many subtleties to that. And then the data is the actual customer themselves.
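Here is a minimal sketch of the simplest remedy for names like `string1` or `text127`: a data dictionary that pairs each physical column with a business definition and flags the ones nobody has documented. The column names echo the slide; the definitions (including the billing-address reading of "city") are invented for illustration.

```python
# Cryptic physical column names mapped to hypothetical business definitions.
DATA_DICTIONARY = {
    "string1": "Customer first (given) name",
    "string2": "Customer last (family) name",
    "text123": "City of the customer's billing address (not the store location)",
}


def explain(columns):
    """Pair each physical column with its business definition, flagging gaps."""
    return {col: DATA_DICTIONARY.get(col, "UNDEFINED: ask the data steward") for col in columns}


report = explain(["string1", "string2", "text123", "text127"])
for col, meaning in report.items():
    print(f"{col:8} -> {meaning}")
```

The flagged `text127` entry is the "I just know" gap in miniature: the knowledge exists in someone's head but not in the dictionary.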
We can often forget that. But think of things like GDPR, the European General Data Protection Regulation, and there are many more. Think of privacy. You should be thinking of the actual person, John Smith. We sort of obfuscate people: we start thinking of them just in terms of bits and bytes of data, but there's an actual human being there. If you're doing something like master data management, you're trying to get that single view of the customer, and that data actually represents a human being out there. So it's always good to remember that. The next slide shows a few more examples of what we mean by business and technical metadata. We can keep it very simple, such as column structure, but even that gets more complicated: keys, validation rules, nullability rules, permissions, ETL, and all of that. On the business side, you can add n levels of detail, such as definitions, stewardship, or privacy and security classifications (a lot of companies are using metadata for that very reason). So those are some examples. Speaking of the business side, if you're a technical person on the call, we often think metadata is ours, right? That's techie stuff. But business people do get metadata, and they need metadata. In the survey we referred to, over 80% of the users were from the business, and there's a quote there that metadata is what helps us understand the data. My little color commentary: I think sometimes the business "gets" metadata more than IT does. Sometimes IT sees it as a burden: "Oh, well, I just know what that table means." Well, you do, but what about everybody else in the organization? If anyone reads TDAN, I wrote a column several months ago on how metadata is actually the marketing for your system.
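One way to picture technical and business metadata living side by side is a per-column record. The sketch below assumes a hypothetical `ColumnMetadata` structure with a few technical fields (type, nullability, key) and a few business fields (definition, steward, a PII flag), then answers the kind of privacy question raised around GDPR: which columns hold personal data? The catalog contents are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    # Technical metadata
    table: str
    column: str
    data_type: str
    nullable: bool
    is_key: bool
    # Business metadata
    definition: str
    steward: str
    contains_pii: bool


# Hypothetical catalog entries.
CATALOG = [
    ColumnMetadata("employee", "employee_id", "INTEGER", False, True,
                   "Internal employee identifier", "HR", False),
    ColumnMetadata("employee", "first_name", "VARCHAR(50)", False, False,
                   "Given name", "HR", True),
    ColumnMetadata("employee", "ssn", "CHAR(9)", True, False,
                   "U.S. Social Security number", "HR", True),
    ColumnMetadata("order", "order_total", "DECIMAL(10,2)", False, False,
                   "Order value in USD", "Finance", False),
]


def pii_columns(catalog):
    """List table.column names tagged as personal data, e.g. for a privacy review."""
    return [f"{c.table}.{c.column}" for c in catalog if c.contains_pii]


print(pii_columns(CATALOG))  # ['employee.first_name', 'employee.ssn']
```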
You wouldn't write a book and forget to put a title on it or tell anybody about it, right? The database is your work of art, so make sure you publish it and let people know. One of my favorite quotes from a business person came when we were trying to start a metadata project. We were explaining the benefits of understanding the technical structures, the linkage, the lineage, and the business definitions, and she looked at me and said, "You mean you're not doing that already?" She figured that's what we already had locked down. We couldn't get away with that in finance: not knowing the lineage of our money, where it came from, and how it was stored. So I think the more business users turn to self-service BI (we did a webinar on that a couple of months ago, if you want to catch it), the more people are looking at the data and want to see the metadata: if I want to write a report, I want to know what this data means. Business folks get that; if you're a business person on the call, you're probably nodding your head. IT folks, remember that, because business people can often be your best sponsors. I've had some great marketing people be my best champions on a project because they get the need for data and the need for metadata. I always throw a few data modeling cartoons into these, because I have them, and where else can you use data modeling cartoons? You've probably seen this one before, and it's not that funny, but it sort of is if you've been in the business: "Okay, we've built this application and done all the testing. We're ready to roll out. Just one small question: what do we mean by customer?" Right? And I'm sure you've all had various flavors of that.
I've worked with many Fortune 100 companies around the world, which shall remain nameless, and I don't need to name them, because it's happened to probably everybody: you send renewal notices to people who don't have your product, or purchase notices to people who already have it, et cetera. I think that seems like the "easy" stuff, so people forget it. But don't forget it, because that's really the crux of what you're doing: what do we mean by this, and why are we doing this anyway, if we think about who, what, where, why, when. The other piece is what I like to call avoiding the "I just know," which I alluded to earlier. So many of us on both the business and technical sides do this. On the business side, you could say, "Well, it's part number. How hard is that? It's the number of a part, right?" There's a lot more context to that, so avoid the "I just know." Think about someone new in training: what would they want to know? Maybe that the field used to be called component number before we were acquired, which only Joe, who wrote the COBOL program and is retiring, remembers, along with all these other pieces of information that may be obvious to you but probably not to others. So write it down. Take the extra three seconds it takes to write it down. All right, maybe 10 seconds, but you know what I mean. It's not that hard, the value can be great, and it's worth thinking about business definitions like that. Poor metadata can also be expensive; there are real-world costs. When I teach a data modeling or metadata class, we often start with an icebreaker: tell us one of your horror stories or success stories about poor metadata having real-world implications. I've never been in a class where no one had anything; there's always something. Here are some actual numbers, and you can quote these in your own projects: the U.S.
economy can lose an estimated $3.1 trillion a year to poor data quality, from things as simple as mail going undelivered because the data's wrong. And data quality really comes from metadata. Think of things like data science or self-service BI: the quote there is that people might spend 80% of their time getting the data right, when what you want to be doing is the discovery. Clean metadata really helps you get there more easily. Probably one of the more famous examples, which we love to quote because it was directly a metadata issue, is the NASA Mars Climate Orbiter, which was lost. That was a $125 million mistake, very easy to quantify, and it literally was a metadata issue. The thruster data being sent to the spacecraft was in English units rather than metric units, so it missed its mark: it went off course and was lost, which was very embarrassing. Not only did they lose a $125 million asset, it was embarrassing, so there was brand and reputational damage. Think of something similar in your company: you send the wrong mailing to the wrong person. Not only did you waste that mailing, you doubly wasted it, because now they think you're not that professional and they're probably less likely to buy from you. The thing that's easy to forget is the lost opportunity. They sent that spacecraft up there for a reason, to do great research, and we never got to do it. So that's a very popular example, and really it came down to something as simple as units. Someone probably thought, "I just know it's metric. Of course it's metric; we only do scientific stuff in metric." Well, somebody didn't know, and they did it in English units, right? So that's the dreaded "I just know" to avoid. I do want to give NASA a little credit, though, because it's always easy to pick on people.
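One defensive habit the Orbiter story suggests is never passing a bare number across a system boundary: carry the unit as metadata and convert explicitly. This is a minimal sketch, assuming an impulse value tagged with either newton seconds or pound-force seconds; the `Impulse` class and the unit strings are hypothetical, while the conversion factor follows from the definition of the pound-force (0.45359237 kg times 9.80665 m/s²).

```python
from dataclasses import dataclass

# 1 pound-force second in newton seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    value: float
    unit: str  # metadata that travels with the number: "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        """Convert explicitly; refuse to guess when the unit is unknown."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"Unknown unit: {self.unit!r}")


reading = Impulse(10.0, "lbf*s")  # hypothetical value from a producing system
print(round(reading.to_newton_seconds(), 3))  # 44.482
```

The design choice is that an unrecognized unit raises an error instead of being silently treated as metric, which is the "I just know" assumption made explicit.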
In a couple of the other presentations we give, I talk a lot about open data, and NASA actually has some awesome open data sets with great metadata. So they've learned their lesson: they generally state who published the data, why, how it's used, and summaries, and they have some great examples of good metadata for their data sets. So they're not all bad. But that was a very embarrassing example, and that sort of is the summary, right? Just like when you were a kid: if you got in trouble once at school, your parents remembered that. They didn't remember all the A's you got; they remembered you got in trouble. Metadata is like that too. Getting it right, on the other hand, can offer great efficiencies, not only in knowing your information but also in reuse. One of the biggest challenges in most organizations, if we think of the idea of a single view of the customer, is: do I even know where all my customer data is across the organization? What some companies do as a start is an inventory based on the metadata. Data modeling tools have scanners, or reverse engineering, or whatever you want to call it, and you can get that source information. Then you can start rationalizing based on matching rules. Is this the same version of customer? Is it different? And how do we get standard reference metadata? Just as you think of a golden record of customer in MDM, you can also have a golden record of standard metadata. Part of the problem is that we're all human and trying to get things done: I need to create a customer database, and if there's nothing out there that I see, I'm just going to build it myself. When you build it yourself, it's always going to be slightly different. In my company, we have a shared directory.
Just the other day I saw this. We are data people, right? So what do we do? We do data standards. We were creating a directory structure for a conference we've gone to for the past three years, and everybody, for those past eight conferences, had named it something different: London Conference X, Conference in London. It was actually hilarious how many different versions you could have of the same conference name, and we as data people did it. Had I published a template that said "this is how you name a conference," that wouldn't have happened. It's the same with databases: if someone's going to create a database and there's DDL they can reuse, or there are published standards, they're more likely to use them. That is the beauty of metadata: it helps with consistency, and it leads to more consistent data itself. Part of what can help here are metadata discovery tools, scanners, or parsers; every tool has its own name for it. The beauty of these is that they're smart enough to go through these systems and read their data dictionaries, if we're talking about relational databases, understand the structure, and then populate some sort of storage repository. I put "repository" in parentheses there. If you're using a full metadata repository, it's probably going to store this along with a lot of other sources. But if we're talking about a data modeling repository, a lot of them have broadened out, and they store what we call a metamodel. I think of a metamodel as a data model for metadata; I always talk about the meta levels. When you think about it, a table from Oracle, a table from Sybase, a table from DB2, a data model, a spreadsheet, and a COBOL copybook all have similar things: things that are like tables and things that are like columns.
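A metadata scanner, in its simplest form, reads a database's own catalog and loads it into a generic metamodel of tables and columns. The sketch below uses SQLite only because it ships with Python; real scanners read each platform's catalog (Oracle, DB2, and so on), and the `Table` and `Column` classes here are a hypothetical, deliberately tiny metamodel rather than any product's.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str


@dataclass
class Table:
    source: str
    name: str
    columns: list[Column] = field(default_factory=list)


def scan_sqlite(conn: sqlite3.Connection, source: str) -> list[Table]:
    """Read SQLite's catalog into the generic Table/Column metamodel."""
    tables = []
    names = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table_name,) in names:
        table = Table(source, table_name)
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, col_name, col_type, *_rest in conn.execute(f"PRAGMA table_info('{table_name}')"):
            table.columns.append(Column(col_name, col_type))
        tables.append(table)
    return tables


# Usage: build a throwaway database, then scan it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, first_name TEXT, city TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER)")
tables = scan_sqlite(conn, "crm_db")
for t in tables:
    print(t.source, t.name, [c.name for c in t.columns])
# crm_db customer ['cust_id', 'first_name', 'city']
# crm_db orders ['order_id', 'cust_id']
```

Because every source lands in the same `Table`/`Column` shape, a second scanner for another platform could feed the same structures, which is what makes later linking possible.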
That's where these metamodels can do the rationalization: these things look a lot alike; we're still talking about customer first name. Can we link them together or rationalize them? That's the beauty of storing it in a single place: you can link them. I do give a call-out at the end: we have a whole DATAVERSITY online course where we go into more detail on things like metamodels and how this all works. So if that tiny piece was interesting to you, we cover it in more depth in that course. For now, just think of it as storing everything in one place: just as you have your data model, there's a metadata model behind it. Once you have a model that can do the linkage, you can do things like data lineage. A very common example with data modeling tools is relational databases, warehousing, and BI. Your data modeling tool probably has all your data structures in a physical data model, or could if you want it to. Say I have a customer table in Oracle, one in SQL Server, and one in DB2, and I may have created a staging area. A lot of the tools now have metadata scanners built in. There's one popular company based in California that makes metadata scanners for many of these tools, and many vendors use the same ones, which is a benefit because everyone's populating the same way. So if you're using an ETL tool, you can use one of these scanners to populate your data modeling tool. Usually there's some sort of source-to-target mapping, and the scanner can bring in the table structures and sometimes the mappings as well, into a staging area or into your warehouse. If you have models there as well, logical and/or physical, and then perhaps a dimensional model in your warehouse, the lineage and the metadata in those relationships can be tracked pretty well.
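The rationalization step, deciding that differently named columns from different systems describe the same thing, can be sketched with simple matching rules. The inventory, the synonym table, and the `DIM_` prefix rule below are all assumptions for illustration; the output is a set of candidate groups for a person to review, not an automatic merge.

```python
from collections import defaultdict

# Hypothetical inventory gathered by scanning several systems: (system, table, column).
INVENTORY = [
    ("crm_db", "CUSTOMER", "FIRST_NAME"),
    ("billing_db", "CUST", "FNAME"),
    ("warehouse", "DIM_CUSTOMER", "CUSTOMER_FIRST_NAME"),
    ("billing_db", "INVOICE", "INVOICE_TOTAL"),
]

# Hypothetical matching rules: known abbreviations map to standard terms.
SYNONYMS = {"CUST": "CUSTOMER", "FNAME": "FIRST_NAME"}


def normalize_table(name: str) -> str:
    """Drop a dimensional-model prefix, then apply synonyms."""
    if name.startswith("DIM_"):
        name = name[len("DIM_"):]
    return SYNONYMS.get(name, name)


def normalize_column(table: str, name: str) -> str:
    """Apply synonyms, then drop a redundant table-name prefix."""
    name = SYNONYMS.get(name, name)
    prefix = table + "_"
    return name[len(prefix):] if name.startswith(prefix) else name


def rationalize(inventory):
    """Group physical columns that normalize to the same (table, column) key."""
    groups = defaultdict(list)
    for system, table, column in inventory:
        t = normalize_table(table)
        groups[(t, normalize_column(t, column))].append(f"{system}.{table}.{column}")
    return dict(groups)


for key, members in rationalize(INVENTORY).items():
    print(key, members)
```

Each group is a candidate for one standard, "golden" metadata element; the rules only propose matches, and a steward still decides whether, say, billing's customer really means the same thing as CRM's.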
A lot of the tools can also parse your BI reports and see which fields have been used, answering that elusive question: I see this field on this report; where did it come from? Some tools are better than others, but many of them can capture a lot of those pieces. Or it works the other way around: some can push metadata back out to a BI tool. I had one customer that was taking all the definitions from the logical model and pasting them into the BI tool by hand. They found out there was a bridge between the two that did that automatically, and it saved them about a month of work. So there are interrelationships between all of these things. When you think about it, these are all things that, in a very simplistic way, look like entities and attributes, or tables and columns, with different permutations above and around them, and you can link them together. I know that's an oversimplification, but not really; a lot of these things are more similar than we might think. The other piece is impact analysis, or where-used. I'm going to change a field, say the length of a column: what else is going to be affected? The DBAs on the call can probably nod their heads, right? Just don't break anything. We can't just go randomly changing columns: what's going to be affected, including on the front end? If you have all of that in some sort of repository, you can do that impact analysis and see what's going to change. Or take something like PII: if you have everything in a repository like this and something like GDPR comes up, and you need to show the lineage of your customer data, you're a step ahead because you already have that lineage. Other people are playing catch-up. So this not only helps you be more efficient; once you have it, the range of uses is wide open. The world is your oyster.
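Lineage and impact analysis are two walks over the same source-to-target mappings: lineage follows them upstream, impact analysis follows them downstream. This minimal sketch assumes the mappings have already been harvested into simple (source, target) pairs; the element names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical source-to-target mappings, e.g. harvested from ETL jobs and BI reports.
MAPPINGS = [
    ("oracle.CUST.FNAME", "staging.customer.first_name"),
    ("sqlserver.CTABLE.F_NAME", "staging.customer.first_name"),
    ("staging.customer.first_name", "warehouse.dim_customer.first_name"),
    ("warehouse.dim_customer.first_name", "bi.customer_report.First Name"),
    ("warehouse.dim_customer.first_name", "bi.marketing_list.Name"),
]


def build_graph(mappings):
    """Index the mappings in both directions."""
    downstream, upstream = defaultdict(list), defaultdict(list)
    for source, target in mappings:
        downstream[source].append(target)
        upstream[target].append(source)
    return downstream, upstream


def reachable(start, edges):
    """Breadth-first walk: every element reachable from `start`, nearest first."""
    found, queue, seen = [], deque([start]), {start}
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append(nxt)
    return found


downstream, upstream = build_graph(MAPPINGS)

# Lineage: where does "First Name" on the customer report come from?
print(reachable("bi.customer_report.First Name", upstream))
# Impact analysis: what is affected if oracle.CUST.FNAME changes length?
print(reachable("oracle.CUST.FNAME", downstream))
```

Breadth-first order lists the nearest dependents first, which is usually what you want to review before changing a column; the same upstream walk is what a GDPR-style "where does this customer data come from?" request needs.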
There are a lot of different things you can see from the metadata itself. Part of the reason you can do some of this is that folks are using different model design layers. Think of conceptual models as your business concepts, your business definitions: what do I mean by a client? What do I mean by a customer? Is it different from a client? Then there's the logical model. Again, we could debate this picture all day, and people have slightly different definitions, but hand wave: conceptual is your business concepts. The logical is still at the business level, but you're creating rules, like can a customer have more than one account? What are the attributes of a customer? And then there's the physical. In many of the tools you can model all these layers and link them together. So, that elusive question: I have this term called customer. Where does it live in the different databases? Is it CUST on Oracle? Is it C_TABLE on SQL Server? Something else again on DB2? How do I do that lineage and the mapping between them? If you have this design-layer relationship, that's yet another way you can do lineage. Again, think of something like GDPR: where's all your customer data? Well, I have that because I've been forward engineering from a single logical model, for example. Some tools can do this after the fact, or you can create mapping rules, so that I know that CUST means customer, right? Some of the more pure-play metadata repositories are doing more AI-style matching, where they can say, well, gee, this looks like an email address, so I'll link it to things called email. Some folks do it that way, but with more traditional data modeling you create those mappings yourself, through your naming rules. Or you do it proactively: I know that customer is always going to be abbreviated as CUST, and not CX96 or whatever.
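The naming-rule approach can be sketched as a lookup table of approved abbreviations. This minimal Python version assumes a hypothetical standard in which physical names are underscore-separated abbreviations (such as CUST_FNAME); the abbreviation list is illustrative, and real naming standards, and the tools that apply them, are usually richer than this.

```python
# Hypothetical abbreviation standard: physical token -> logical (business) word(s).
NAMING_RULES = {
    "CUST": "Customer",
    "FNAME": "First Name",
    "LNAME": "Last Name",
    "ADDR": "Address",
    "DOB": "Date of Birth",
    "NBR": "Number",
}


def to_logical_name(physical_name):
    """Expand an underscore-separated physical name using the abbreviation standard.
    Tokens not in the standard are title-cased and passed through unchanged."""
    tokens = [t for t in physical_name.upper().split("_") if t]
    return " ".join(NAMING_RULES.get(t, t.title()) for t in tokens)


def matches_term(physical_name, business_term):
    """True when the expanded physical name equals a glossary term, ignoring case."""
    return to_logical_name(physical_name).casefold() == business_term.casefold()


for column in ["CUST_FNAME", "CUST_DOB", "CUST_NBR", "CX96"]:
    print(f"{column:<12} -> {to_logical_name(column)}")
```

Unknown tokens such as CX96 pass through rather than being guessed at, which makes violations of the naming standard easy to spot.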
You proactively create some of these. So we've been talking a lot about relational, and I hinted at other things like BI tools. But I wanted to talk beyond relational databases, because any of us in the business know there are things beyond relational, especially nowadays. Most of the data modeling tools out there do support some of these other sources. I'm not going to list every one on the planet; that data modeling course I mentioned goes into a lot more sources, but just to keep things in scope, you'll get the idea. And I don't want to be too redundant on some of these. One of the big ones where you can get some benefit, and there are some add-on tools that help with this (some of the data modeling tools have that module), is packaged applications. When you think of a CRM or ERP system, your Salesforce, that kind of thing, those really have relational databases underneath them. When you're trying to create an MDM hub or a warehouse, a lot of your company's most important information, say about customers or employees, is in these systems. And if you've ever tried to just reverse engineer them, which I've done, it took two days and came back looking like the picture on the left: all those random tables with funny names that are very technical, or even in German, and very hard to understand. Some of the tools out there can actually translate that into more business-oriented logical metadata, and you can see that mapping. So if you want to integrate these with your other systems, you can get the metadata behind them, not only the technical metadata but some of the business layers as well. And when we think of things like NoSQL, which means a lot of different things, there isn't just one NoSQL.
Take document databases, for example. Some of these, quote, NoSQL databases have better metadata than others; that's probably my quick summary for this webinar. Document databases have a bit more structure than others, and a lot of the data modeling tools can support them. You'll see that it looks a little different, with things like collections; it's a different paradigm. But you can get some of that metadata as well. And the important thing is that you have customers. Remember that picture of Joe Smith, who's a customer? He exists. He has all this information about himself. Whether it lives in a MongoDB database or a Sybase database is beside the point, right? You want to get all the data you can about this person, wherever it is. So the more these data modeling tools can support that, the more it helps you get that bigger 360-degree view. Other kinds of NoSQL are not as friendly when it comes to metadata, things like key-value stores, kind of by definition. It's great that they're super flexible, and they can be super fast. If you're trying to capture all my session information, say clickstream data on what someone is clicking on a web page, awesome. If I'm trying to store customer names and addresses long-term, it doesn't have the more traditional metadata we're used to. If you just scan it, often the structure lives in the application code. So you can get some things, but it's not necessarily as robust as what you're used to in relational land. I'm motoring on through, apologies, but just to give the idea of COBOL metadata: for probably most of you on the call who've never had to code these, think of it as a bit like your data structure. So here's a picture of a COBOL copybook. If you've never seen one, you can probably start understanding it: okay, I see first name, last name, date of birth.
There are some data types here. I sort of understand. Well, a lot of these data modeling tools can translate that. Some are better than others. Some of the newer ones are starting to natively model some of these artifacts, though probably not going back and natively modeling COBOL. For things like COBOL, or other things they don't support natively, they often, I'll say, sort of fake it, right? They map it to a relational model, which often is good enough. I think often we worry too much about the details. I could argue all day on this: we have things like classes and we have things like tables. But at the end of the day, if we're just trying to get simple things like the fact that we have first name, last name, and date of birth across these systems, maybe we don't need it stored exactly as it looks in the native structure. Maybe we're just trying to get the high-level metadata. In some cases you do need that, and that is why you'd want a full metadata repository; many of them keep it all in the native format and can go back and forth. And the reason I put COBOL in here is not because I'm a dinosaur, but because the need is actually growing, partly because there are no people left to do this anymore. They're all on the beach with a margarita somewhere, laughing that they coded this years ago. Otherwise it comes down to "I just know," and that's sort of the beauty of metadata: it should outlive the human beings who coded this and leave a legacy for other folks who might not remember how it was coded. XML is another classic one. Again, there is some metadata in XML itself. But if we're simplifying things and just trying to see where name and address are used, a lot of the data modeling tools can bring that in now.
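To show what "mapping it to a relational model" can mean for a copybook like the one on the slide, here is a deliberately simplified Python sketch. It assumes a flat copybook whose first group item becomes the table and whose elementary items use only PIC X, 9, S, and V clauses. The field names are illustrative, and real copybooks (OCCURS, REDEFINES, COMP-3, the PICTURE keyword) need much more than this.

```python
import re

# Level number, data name, and an optional PIC clause; the trailing period is optional.
FIELD_RE = re.compile(r"^\s*(\d{2})\s+([A-Z0-9-]+)(?:\s+PIC\s+(\S+?))?\.?\s*$", re.IGNORECASE)


def expand_picture(pic):
    """Expand repeat counts, e.g. 'S9(7)V99' -> 'S9999999V99'."""
    return re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), pic.upper())


def pic_to_sql(pic):
    """Map a simple PIC clause to a generic SQL type, or None if not covered here."""
    p = expand_picture(pic).lstrip("S")  # the sign does not change the column type here
    if p and set(p) == {"X"}:
        return f"CHAR({len(p)})"
    whole, _, frac = p.partition("V")  # V marks the implied decimal point
    if whole and set(whole + frac) == {"9"}:
        return f"DECIMAL({len(whole) + len(frac)},{len(frac)})"
    return None


def copybook_to_table(copybook):
    """Return (table_name, [(column_name, sql_type), ...]) for a flat copybook."""
    table, columns = None, []
    for line in copybook.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue
        _level, name, pic = match.groups()
        sql_name = name.upper().replace("-", "_")
        if pic is None:
            table = table or sql_name  # the first group item names the table
        else:
            columns.append((sql_name, pic_to_sql(pic)))
    return table, columns


COPYBOOK = """
       01  CUSTOMER-REC.
           05  CUST-FIRST-NAME     PIC X(20).
           05  CUST-LAST-NAME      PIC X(30).
           05  CUST-DOB            PIC 9(8).
           05  CUST-BALANCE        PIC S9(7)V99.
"""

table, columns = copybook_to_table(COPYBOOK)
print(f"CREATE TABLE {table} (")
print(",\n".join(f"    {name} {sql_type}" for name, sql_type in columns))
print(");")
```

Note that the date of birth comes across as a plain eight-digit number: the structural mapping is "good enough" to find the field, but it takes business metadata to say what the digits mean.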
Again, depending on the tool, some model it more natively, as a hierarchical structure, and some map it to a data model, so everything looks like an ER model. There are pros and cons to each. But if we're doing the broad brush of where are all my customer names, it can be helpful no matter what the format. Similarly, a lot more folks are moving to the JSON world. Again, if I want to see everywhere price is used across the organization, JSON has its own structure as well. I could have spent a whole day on the different types of metadata, but just to point out quickly: your data models for things like relational databases are awesome sources of metadata, both structural and business metadata if you've been putting it in. A lot of the data modeling tools also support some of these other systems. Most of them have bridges; just take a look at them. You'd probably be surprised at the sources covered, between BI tools, ETL tools, and some of these legacy tools, and often it's free as part of your license. So look for it or ask for it. You might be pleasantly surprised. If you don't have a budget for a full metadata repository and want to use what you have, there might be some hope there for you. Okay, so I've talked a lot about tools and tech, and I'm a big fan of that. I'm also a big fan of the bigger picture. Just like you may do a data strategy to understand how to use data, metadata needs to be treated like any other kind of data and really planned out. So like anything: who's using it, right? We talked before about how some of the biggest users are business people, but techie folks are as well. It could be a business person asking, how did you define regional sales in this report? That's business metadata.
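Returning briefly to the JSON case mentioned above, "where is price used?" can be approximated by collecting the field paths in sample documents and filtering on a term. A minimal sketch, assuming you have one representative JSON sample per source; the source names and documents are made up, and an XML version would walk elements the same way.

```python
import json


def field_paths(node, prefix=""):
    """Yield a dotted path for every key in a parsed JSON value.
    List items are marked with [] so paths describe structure, not positions."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from field_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from field_paths(item, f"{prefix}[]")


def where_used(samples, term):
    """Map each source to the sorted field paths whose final key contains `term`."""
    hits = {}
    for source, text in samples.items():
        paths = {
            path for path in field_paths(json.loads(text))
            if term.lower() in path.rsplit(".", 1)[-1].lower()
        }
        if paths:
            hits[source] = sorted(paths)
    return hits


# Hypothetical samples, one per source system.
SAMPLES = {
    "orders_api": '{"orderId": 1, "items": [{"sku": "A1", "unitPrice": 9.5}, {"sku": "B2", "unitPrice": 3.0}]}',
    "catalog_feed": '{"product": {"name": "Widget", "listPrice": 12.0, "currency": "USD"}}',
    "clickstream": '{"session": "abc", "page": "/home"}',
}

for source, paths in where_used(SAMPLES, "price").items():
    print(source, paths)
```

Matching on names is only a first pass: it finds unitPrice and listPrice, but deciding whether those mean the same thing is still a business-metadata question.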
It could be a developer saying, I'm going to change this field; what am I going to break? Or someone who wants the source-to-target mappings if you're working on a data warehouse, et cetera, et cetera. The more metadata you have, the more folks are going to use it. The weird thing about metadata is that we all hate creating it and we love consuming it, right? I'm a big fan. If I haven't used the word nerd to describe myself yet, I'm going to go ahead and do that now: yes, I will spend a Friday night looking at open data sets out there. There's so much exciting information. And when you find a site with great metadata, documenting who created it, why it was created, and what the fields mean, you just want to hug that person. I've also seen stuff that's just crap: some field called X963, you have no idea what it is, and so you don't use it, right? So it's your legacy to others: if you've built a data set, please document it so other people can use it. It does not take that long. So please do it, because other people do want to use it. When we think of publishing this information out to other users, one of the common metadata artifacts, especially on the business side, is a glossary: the business terms, their abbreviations, the stewards, security levels, a lot of that. Again, I don't want to overstate it; there are beauties to a full-fledged repository, but there's also a lot you can do with what you have, and a lot of the modeling tools have figured that out. If you're doing logical modeling correctly, you probably already have definitions for things like employee number, what an interest rate is, and what first name means. All those definitions can be published out to the glossary.
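Publishing logical-model definitions to a glossary can be as simple as an export. A minimal sketch, assuming a hypothetical `LogicalAttribute` record that carries some of the glossary fields mentioned above (term, definition, steward, security level), which writes CSV and skips attributes nobody has defined yet:

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LogicalAttribute:
    """Hypothetical attribute record exported from a logical data model."""
    entity: str
    name: str
    definition: str
    steward: str = ""
    security_level: str = "Internal"


def publish_glossary(attributes):
    """Return glossary CSV text: one row per defined attribute, sorted by term."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Term", "Definition", "Entity", "Steward", "Security Level"])
    for attr in sorted(attributes, key=lambda a: a.name):
        if not attr.definition.strip():
            continue  # undefined terms are left out rather than published empty
        writer.writerow([attr.name, attr.definition, attr.entity,
                         attr.steward, attr.security_level])
    return buffer.getvalue()


model = [
    LogicalAttribute("Employee", "Employee Number",
                     "Unique identifier assigned to an employee by HR.",
                     "HR Data Steward", "Confidential"),
    LogicalAttribute("Loan", "Interest Rate",
                     "Annual nominal rate applied to the outstanding loan balance.",
                     "Finance Data Steward"),
    LogicalAttribute("Customer", "First Name", ""),
]
print(publish_glossary(model))
```

CSV is just one convenient target; the same loop could feed a web page or a glossary tool's import format.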
So if you already have some of this information, you can publish it and make sure it gets out to a wider audience. To summarize, because part of what could make anybody's head explode in this market, especially with metadata, is that there are so many different options. Since metadata tools are the topic of this conversation, here's a little chart to put it in context, right? Data modeling tools: there are some examples of names on the slide, and I'm careful not to mention them, so you can figure that part out yourself. For what they can do, I have the big X for what they do well and the small X for lesser support. Of course they do data modeling well. They can do some of that lineage. They can do some of the metadata storage, though they probably won't let you customize it fully. They have some glossary capability, but it's not a full-fledged glossary. If you want a full-fledged glossary, maybe look at a data governance tool, like a Collibra or Diaku, which is now Informatica. Or a metadata repository, which by definition is what folks like ASG, Adaptive, and Data Advantage Group provide: they do metadata, great. What they're not going to do is data modeling. That's not what they were meant to do, right? That's what a data modeling tool does. I put a spreadsheet there because it's probably the number one competitor for almost anything. Big fan. I do a lot of stuff on spreadsheets, especially when I'm testing things out; I'll do it on a spreadsheet first. So you could, but they're really not designed for that. You could have a glossary on a spreadsheet, but it probably wouldn't be enterprise scale. So again, there's no right or wrong answer. If you're already doing data modeling and you need a lightweight glossary, think of that.
Or you're doing data modeling and you want some lineage between BI tools. That's fine. If you're doing Internet of Things and a lot of application coding, and you have 17 different data sources and want true full lineage, maybe you do want a metadata repository; that's what I'm saying. So this is a helpful way to ask when what I have is good enough. If I'm doing a lot of data modeling, I can probably leverage a lot of the tools I have. So again, when we think of metadata management itself, there are several components to making it successful. We've been talking a lot about the tech, and that's only a tiny piece of it. You should have a strategy: why am I doing this anyway, right? If I talk to all my stakeholders, what are their pain points? Do they need better definitions? Is it the technical structures? What are their goals? What are they trying to do with it? How am I capturing and storing it? That's what we've been talking about already: what are my sources, and what is the best storage and publication mechanism? And when we think of publication, who are those users, the little people figures I had on the previous slide? Is it mostly the business? Then maybe something like a glossary. Is it mostly tech? Do they want it in the data model? Can I just publish DDL out to a DBA? Can I send code snippets to somebody? A lot of what these metadata tools can do is deliver it natively in somebody's day-job tool. Maybe they don't want to go out to a separate place and look it up, but if I can just import it into the tool I'm using and there are the definitions, sure, I'll use that. So give that some thought. Don't build the glossary first if it's all DBAs and all they want is the DDL: "I'm not going to go look up a glossary. I want the actual data structures." So give that a bit of thought before you start building your system. And then, how are you going to govern it? Data governance is hot.
I think every company is looking at how it can better govern its data. Well, you need to think about how you govern metadata, too. Do you have roles or responsibilities specifically for metadata? There might be overlap between your metadata governance and data governance roles, or it may be something completely different. Do you have standards for metadata? Do you have life cycle management? Do you have metadata quality statistics, right? We talk a lot about data quality and data profiling; what are you doing about that for metadata? For every data element, is there a definition? Is it a business definition? Do you have privacy tags on it? How are those tags cascaded across your organization? Is that metadata integrated with other tools, or is it standalone? Give that some thought, because metadata is a thing in and of itself. When you think of metadata strategy, one of the components to look at, and I can't stress this one enough, is business drivers and motivation. I did a big metadata strategy for a couple of customers late last year, and they were at opposite ends of the spectrum. One had one of the big metadata repositories and technically had everything. It was amazing: they had lineage, they had a glossary, they had all that. But when they started tracking user statistics, nobody was using it. They were spending a lot of time populating it, but they really hadn't done the due diligence of talking to people about what they wanted to see, publishing it, or getting the word out. The other customer had done a great job selling the need for metadata. They had talked to the governance team and the business team. Before I walked in, people were already saying the word metadata and had a lot of ideas. We really had to spend time prioritizing what we could do first, because everybody wanted it.
And even though that second company was technically at level zero on the maturity curve (they didn't even have a metadata repository), I rated their chances of success a lot higher, because they did the hard work before the tech of really figuring out the why and the how, getting buy-in, and prioritizing. In a year, when they have a smaller repository, everyone will be using it and they can show ROI. We built KPIs and all that. The other folks had had a big, expensive project and were now trying to sell it after the fact. That was, as they say, putting the cart before the horse. Awesome technically, and I don't want to knock them for that, but don't forget the business part, because that's almost more important. One of my first metadata repository successes, and I can't give myself credit for it, was thanks to my project manager at the time, at Hughes Bank in New York. We published a glossary. I, being the techie whiz kid, wanted to do so much more. And they said, no, you've got to get the buy-in. And everybody used that glossary. We did get to sell the lineage and all the stuff behind it later, but we got the buy-in first. So a bit of a tangent, but not really: that should be the number one thing. Next, assess your metadata management maturity; there's a maturity curve on the right, and I mentioned that already. Where are you? It may be that everyone in the business wants full lineage and you have nothing, so set expectations that we need to build that out. Or it could be, hey, we do have this great metadata repository; guys, we could use it for this. Or, hey, new project team, do you want to use what we have? So give that some thought. Look at your sources and technology. This seems so obvious, but when you are looking at vendors, define what you need first, and be clear about it in any evaluation. Do you support these six sources?
I don't care that you have 400 scanners; I have six sources, and I need those covered. Be a little tough there, because that is important. Then consider your stakeholders and audience: who's using it? Are you publishing the information to the right people in the right way? When we think of publishing, we tend to think literally of a web page, a newsletter, that kind of thing. But for some, like I mentioned, it might just be importing into their development tool so they see quick definitions. I had one customer that built it into their Agile life cycle, so when they were using things like JIRA, they had their metadata or data questions right there. So think about who your users are. It isn't always a business user. It isn't always tech. It could be a combination. So give that some thought. And like anything, talk to people. Get direct feedback. Find out what the pain points are. Some of these, when you read through them, are obvious: where are the data structures, right? But for some you have to do the thinking: we're spending this much money to clean up the data because there was no metadata. You have to make that connection for people: if we have metadata, it'll help solve your problem. Then when you go back and solve the problem, you can tell people: you didn't have any data structures, now you do. You didn't know where the data standards were; now here they are. That helps you sell it down the road. And because I started out in modeling, everything I do is a model; as a consultant, we have matrices for everything, but they're actually very helpful. Making complex things simple is an art, and these are really helpful for that. So here's one template we use: what are the business drivers? Write them down. Are we trying to become a digital organization?
Can we explain to management that you can't be digital without metadata behind it, because otherwise no one knows where to go? And then, why can't we get there? Well, you'd like to be all digital self-service, but we don't have our data integrated to get there, for example, or it's too expensive because the data quality is bad. Then map that to the things you're trying to do. Often I even do a heat map showing that these six problems are solved by these two things, to show why you're doing it. If anyone's ever been in one of my classes, I generally have people do an elevator pitch, right? If you go up to management and say, we're going to rationalize all your data sources with great metadata lineage, they're probably not going to jump for joy. That sounds really nerdy. But if you can say, I'm going to help with your journey to digital sales and help you sell more because we can have a single product catalog, that might be more interesting. So again, link everything you're doing to some sort of business driver. When we do talk about the tech, again, I'm a big fan of having some sort of heat map. If you're evaluating tools, you're probably doing this already, but if not, write it down: these are all the sources I found when I did the inventory, my relational databases, my BI tools, the open data I'm using. Make sure you ask the vendor whether and how they support those. And then do a heat map of who's using each one, right? Here you'll see that everybody's using Oracle. You may be looking at a tool that doesn't support one of your sources; we have one here that hardly anyone uses. They don't support the open data, but really only three people are using it. They don't support MySQL, but only marketing is using that, and we can probably live with that through some other means. Again, sort of obvious, but create that heat map sometimes. And I've done this myself.
You get so stuck on "oh my gosh, they don't support this," and you need to step back and ask, is that really important? Could we get 90% of the way there with the coverage we have? Next, metadata roles and responsibilities. Do people have specific job titles around metadata? Or, as with things like governance, is it probably a part-time role where they understand that metadata is part of their job? Do you have something like a metadata repository administrator? That's probably the full-time metadata job. But what about the consumers? Are they sometimes even compensated against that? Are you actively using the standards that are out there? We have standards; are people being held to using the glossary? If I'm creating code, am I documenting it and publishing that metadata? And do you have an executive sponsor who gets why you're doing this and helps either carry the stick or talk about the benefits for you? Then, as I mentioned, metadata quality and metrics. We often think of data quality metrics: is it complete? Is it accurate? Is it up to date? Metadata is another type of data, so keep track of that too. Yes, I documented this code last year, but is it still current, right? I know I've mentioned open data several times, but one of the first things I look at is when a data set was published. Is it new? Is it from 12 years ago? That really matters when you're doing analysis on it. So monitor that. It could be as simple as: are people using it? Are we tracking hit rates? Has it been entered? Or you can get very sophisticated. I'm working with a customer now that is being very detailed about definitions, tracking which components of a definition have been filled in. Not "part number is the number for a part," right? They actually have different metrics on how complete those definitions are.
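Metadata quality metrics can start as simple percentages. This sketch assumes a hypothetical `ElementMetadata` record and computes how many elements have a definition, have a definition that says more than the element's own name (the "part number is the number for a part" test, approximated here by counting new words), carry a privacy tag, and were reviewed within the last year. The checks, stopword list, and thresholds are illustrative, not a standard.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

STOPWORDS = {"a", "an", "the", "is", "of", "for", "to", "and", "or", "in"}


@dataclass
class ElementMetadata:
    """Hypothetical metadata record for one data element."""
    name: str
    definition: str = ""
    privacy_tag: Optional[str] = None
    last_reviewed: Optional[date] = None


def is_substantive(element, min_new_words=3):
    """Heuristic: a definition counts only if it adds words beyond the element's own name."""
    name_words = set(re.findall(r"[a-z]+", element.name.lower()))
    words = re.findall(r"[a-z]+", element.definition.lower())
    new_words = [w for w in words if w not in name_words and w not in STOPWORDS]
    return len(new_words) >= min_new_words


def metadata_quality(elements, as_of, max_age_days=365):
    """Percentage of elements passing each check, rounded to one decimal place."""
    def reviewed_recently(e):
        return e.last_reviewed is not None and (as_of - e.last_reviewed).days <= max_age_days

    checks = {
        "has_definition": lambda e: bool(e.definition.strip()),
        "substantive_definition": is_substantive,
        "privacy_tagged": lambda e: e.privacy_tag is not None,
        "recently_reviewed": reviewed_recently,
    }
    total = len(elements) or 1  # avoid dividing by zero on an empty inventory
    return {name: round(100 * sum(check(e) for e in elements) / total, 1)
            for name, check in checks.items()}


elements = [
    ElementMetadata("Part Number", "Part number is the number for a part.",
                    last_reviewed=date(2023, 6, 1)),
    ElementMetadata("Customer Email",
                    "Primary email address the customer has consented to be contacted at.",
                    privacy_tag="PII", last_reviewed=date(2023, 11, 15)),
    ElementMetadata("Region Code"),
    ElementMetadata("Regional Sales",
                    "Net invoiced revenue for a sales region, excluding returns.",
                    last_reviewed=date(2021, 3, 1)),
]
print(metadata_quality(elements, as_of=date(2024, 1, 1)))
```

The "Part Number" entry counts as having a definition but fails the substantive check, which is exactly the gap between "is it entered?" and "is it any good?"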
So you can go crazy, or you can keep it very simple, but do give some thought to the quality and the metrics around your metadata. So, in summary: metadata is for the cool kids. It's more important than ever. It's always been important, but more and more people are relying on it, with things like open data and data-driven organizations, and especially with more business people looking at data. They want to know what it means. Crazy idea. Data models, if you're using them, are a rich source of metadata, so leverage that. You can probably do a lot with what you already have in-house, and you can augment it, especially if you're using some of these other sources; they fit together nicely. And again, I'm the biggest fan of a full metadata repository. But if that's not in your scope, and a lot of what you're doing is the kind of thing we've mentioned, often the data models themselves can be just enough. And don't forget the organizational considerations: why are we doing this? Has everybody bought in? Are we tracking it? And you might laugh, but have fun. There's so much out there, and even metadata is evolving, as hopefully some of these webinars are showing, with the new technologies and how they track it. It's really a fun time to be in the biz. A little bit about us: we do this for a living, so if you need help, let us know. Here's my contact information, and Shannon will send that out. I mentioned the white paper several times. It's free to download, not only on the DataVersity site but on our Global Data Strategy site as well, and I think Shannon will put a link to it in the follow-up. And if you're so interested in metadata that you'd like more, we have a full online training course for that as well. So without further ado, I'm a terrible multitasker, but I did see a lot of questions coming through as I was speaking.
Just quickly, before we get to those, do try to join us next month when we talk about data wrangling, data munging, and all of those new terms out there. So Shannon, if we could open it up for questions; I think we've got about five minutes. Absolutely, and we've got a lot of great questions coming in. Just to answer the most commonly asked question, a reminder: I'll be sending a follow-up email by end of day Monday with links to the slides, the recording, and the additional things Donna has mentioned so far, plus anything else that comes up. So diving right in, Donna: to understand horizontal data lineage, you need vertical lineage connecting the physical to the conceptual. This seems to be a brute-force exercise and quite time-consuming. Any ideas how to shortcut that effort? It should not have to be that brute force. There were two questions in there, and maybe I misunderstood, but for the horizontal lineage, these scanners or interfaces can pick up some of it. Some can do the mapping based on certain naming standards. Some can make a best guess based on table matching. So some of that can be automated, and having good standards in place always helps. The earlier you start, the better: things like the top-down linkage the questioner mentioned can be a little hard to do after the fact, so getting it into your best practice early helps. For example, if I have a logical model and I forward engineer from it, the linkage is already there. In some cases it's enough to have a separate logical model with those definitions, and it doesn't always have to be fully linked to the physical. I'm a big fan of lineage, but sometimes search works. If I have it all in the repository and I see the definition of customer at the logical level and I see the customer table at the physical level, sometimes that works as well.
So I would say get it all in there; some of the linkage can be added as you go, and some of it needs to be human mapping, but a lot of it is the Google approach: at least get it out there. Don't be afraid to take a phased approach. The other part is, do you need full lineage for everything? I had a slide on this that I didn't show, but if it's a behemoth effort to link everything, just pick the most critical stuff. What is mission critical? Is it customer data for a CRM or MDM migration? Just pick that. The 90% that we don't care about? Don't try to boil the ocean; that would be a behemoth effort. So if it's hard, just pick the critical pieces and make sure they're the stuff people are using. Sure. So in an organization, how do metadata management and data quality affect the data model, especially in exploiting the data quality results for validity and consistency? And do you have any use cases or examples? They are very well linked. Some folks actually capture that in the data model, with a field for data quality specifics. Some folks use the metadata around the model: usually you have the structure, and you can base your checks on the structure and the definitions. And then sometimes data quality means true profiling. So I have some domain rules: the Social Security number should always be in this format, and I base my rules on that, right? Or I have domains: customer name has to have 10 characters. And once you have those rules, sometimes you can be proactive. Maybe the application front end can have a dropdown with those domain values and nip the problem in the bud to begin with. So a lot of these rules can be applied either after the fact or, even better, proactively.
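The domain rules described in this answer can be expressed once and reused both for after-the-fact checks and to drive front-end dropdowns. A minimal sketch, assuming a U.S. SSN format of NNN-NN-NNNN and a hypothetical three-value gender code list; the field names, codes, and helper functions are illustrative.

```python
import re

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")  # assumed U.S. format NNN-NN-NNNN
DOMAINS = {"gender": ("F", "M", "U")}          # hypothetical three-value code list

# One rule per field: each returns True when the value is acceptable.
DOMAIN_RULES = {
    "ssn": lambda v: isinstance(v, str) and SSN_PATTERN.fullmatch(v) is not None,
    "customer_name": lambda v: isinstance(v, str) and v.strip() != "",  # required
    "gender": lambda v: v in DOMAINS["gender"],
}


def validate(record):
    """After-the-fact check: list the fields in `record` that break their domain rule."""
    return [name for name, rule in DOMAIN_RULES.items() if not rule(record.get(name))]


def dropdown_options(field_name):
    """Proactive use: the same domain list can populate a front-end dropdown."""
    return list(DOMAINS.get(field_name, ()))


print(validate({"ssn": "123-45-6789", "customer_name": "Joe Smith", "gender": "M"}))
print(validate({"ssn": "123456789", "customer_name": "  ", "gender": "Q"}))
print(dropdown_options("gender"))
```

Keeping the code list in one place means the validation rule and the dropdown cannot drift apart, which is the "nip it in the bud" idea in practice.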
If we know that name is required, or name has to be 10 characters, or there's only a dropdown of three genders, you can put those in and enforce them proactively. The other piece of data quality, which is often the bigger one, is what the data means. Sometimes data has been populated wrongly because people didn't know, you know, what did I mean by state? It's the state someone was born in, not the state they purchased in, or whatever. So sometimes business metadata can help with quality in an even bigger way, because you can get the bits and bytes right, but if people aren't using the data correctly, that can be a bigger issue. All right. Well, I think we have time for one more question, and I'll get the rest of these questions over to you, Donna, since we've had a lot coming in. One of the biggest challenges is tying together all the metadata scattered across multiple platforms, databases, data models, and ETL mappings. What is the best way to bring this metadata together? Are there tools to help you integrate all this metadata? There are tools. Similar to one of the previous questions, there are different levels of tools. If we're just talking data modeling (the metadata repositories do this too), the data modeling tools have this idea of reverse engineering. Many of the quick wins I've seen at customer sites come from just doing that reverse engineering against some of these sources, running that scan, and seeing the structures. In some cases, say with a legacy system, nobody even knew what those structures were, and that can be a huge aha moment. That can be a quick win, just getting them all in one place. And as with that previous question, maybe just searching across them is enough for now. Then there's the rationalization part: how do I link these together?
In some cases the tools are smart enough, either because the names are similar, or because you can create matching rules, or because there's some metadata linkage since your tool supports your ETL tool. Some of that can be automated, and some may need more manual effort. In that case I would prioritize, like I said to the previous questioner, so it's probably a phased approach. You may be surprised how easy it is to get some of this in with something like reverse engineering; you may say, wow, I didn't know that. Some of the tools out there can do automated linking, and then there's always some manual cleanup after the fact, which is sometimes a valuable step in itself: looking through and seeing what you have. All right. Well, I'm afraid that is all the time we have for today. Donna, thank you for another fantastic presentation. Just love it, as always. Thanks to all of our attendees for being so engaged in everything we do and for all the great questions coming in. I'll get you the remaining open questions if you want to take a look at those. And just a reminder, again, I will send a follow-up email by end of day Monday with links to the slides, the recording of the session, and all the additional info Donna mentioned throughout, including links to the white paper, or the research paper from last year on metadata, the survey on data architecture that's open through next Friday, and her online training courses that go more in depth on metadata. So again, thanks, everyone. I hope you have a great day. Donna, thank you so much. Thank you. Always fun. Cheers. Bye.