And here we go. Hello and welcome. My name is Shannon Kempe and I'm the Chief Digital Officer for DATAVERSITY. We want to thank you for joining the latest in the monthly webinar series, Data Architecture Strategies with Donna Burbank. Today Donna will discuss best practices in metadata management, sponsored today by data.world. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A panel, or if you'd like to tweet, we encourage you to share your questions via Twitter using the hashtag #DAStrategies. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just note that the chat defaults to sending to just the panelists, but you may absolutely change that to network with everyone. To open the chat and the Q&A panels, you will find those icons in the bottom middle of your screen. As always, we will send a follow-up email within two business days containing links to the slides and the recording of the session, as well as any additional information requested throughout. Now let me turn it over to Mo for a brief word from our sponsor, data.world. Mo, hello and welcome.
Thank you so much, Shannon, and welcome to everyone here. My name is Mo Dodge. I am the senior sales engineer at data.world. I appreciate you all being here, and we certainly appreciate the opportunity to sponsor this webinar today. So prior to introducing our expert speaker, I'd love to provide a quick overview of data.world to kick things off. First of all, if you enjoy these types of webinars and enjoy learning about data each week, we here at data.world host an honest, no-BS, non-salesy live podcast on topics around data management and analytics, and we stream live on places like LinkedIn and Facebook. It is led by our awesome duo, Juan Sequeda, principal scientist, and Tim Gasper, our VP of product.
And we drink cocktails, talk about data, and have some fun in the process. It's called Catalog & Cocktails, and we would love for you to check it out on your favorite podcast platform. All right, so just to talk a little bit about data.world and why we believe an agile approach is super important to metadata management and data governance. data.world is different. We are an enterprise data catalog for modern metadata management. We see a lot of challenges in organizations across all industries and verticals around data literacy and enablement. We're relentlessly focused on driving better adoption for those organizations, and we believe data cataloging and governance are key to making a difference in that area. We were born in the cloud, and we very much focus on how we can easily integrate with your data environment to create better capabilities around data discovery, for example. We do believe a Facebook- and Amazon-like experience is really important to enable more effective and efficient collaboration. And really, our focus is on being open and flexible, no black boxes, and making sure that we are fully interoperable with the rest of your technology stack.
Now let's dive into a few of the key metadata management trends that we are seeing today. What does that really look like in the near future? We believe that it's going to follow the four key tenets you see here. First and foremost, metadata will drive action. Metadata is going to become more than just data about data. It's going to be about prescribing actions. It needs to become an action driver, right? So we believe it's going to tell you, based on the current state of your data, that you need to start a workflow, for example, to improve data quality, or it will illuminate the need for improved data privacy, and so forth. Metadata will become the central driving force for prescribing the next steps that your team will need to take to drive greater value for your business.
Secondly, your data catalog will become your system of record. We believe that it's going to be about more than just relational metadata, schema metadata, or even semi-structured or unstructured data. It will become a catalog of all of your data, right? And that involves things like eventing and reporting: dashboards, logs, clickstreams. All of that will be in the catalog, and it will become the first place anyone in the business goes to start any type of project. Third, your catalog will live in your revenue stream. What we mean by that is that many of the traditional metadata management solutions we've seen were really not intended to do that. They were intended mostly to keep your data orderly and secure in case an auditor came knocking, right? Think about GDPR and CCPA and things like that. While that's super important, we do believe that historically those types of tools were built to empower more defense-oriented activities and initiatives rather than offense-oriented things like democratization and monetization. But future data catalogs will really need to be responsible for generating revenue, liberating the data hidden within your organization to improve efficiency, accuracy, and insight.
And finally, last but not least, your catalog will need to have a contextual user interface. What we mean by that is it needs to be easy to use, easy to search, easy to understand, similar to Google, for example. Catalogs will be simple enough that everyone can use them without having to have special data or technical skills. They will make the data that you want easy to find, and the underlying engine that drives all of that will make the connections that help you discover any new information or context you didn't know existed. And it's going to show that in a user interface that encourages you to dive deeper and deeper into the data until it becomes a world of discovery.
And what's really key to that, in our opinion, is that just like Google, it needs to be powered by a knowledge graph. It's kind of hidden behind the scenes, but that knowledge graph is really what's going to be uniquely capable of mapping and linking key concepts to uncover those hidden relationships and to speed up the search and discovery process to provide unlimited insight.
All right, so next I want to talk a little bit about a big movement that we're seeing in this space, which is toward accelerators and not barriers. One of the key concepts that we're really excited about here at data.world is this notion of agile data governance. We advocate for a non-invasive, iterative approach and really focus on the collaboration between different people across your organization to ensure that your implementation of a data governance or metadata management tool is use-case driven. Because ultimately, what we don't want to do is boil the ocean, right? You don't want to take a technology-centric approach, which has been tried and has often failed in the past. We want to leverage the specific business problems that affect people across various parts of the organization who are struggling to find, understand, and trust the data. The North Star of your metadata management or governance program really should be about how we adopt solutions tailored to those specific challenges, and how we help those individuals and teams put data to better work in order to derive better insights for the organization. In order to accomplish that, what we advocate for is this flywheel approach. Take out some of the middlemen, if you will, or the complicated processes that get in the way. We want to curate, audit, govern, and document, and try to move through that flywheel as fast as possible and iterate your way toward more value and better adoption of your data.
Now, there are typically a lot of people who are involved in this process, from your program team, to the actual governance team working on it full-time, to data engineers, to data stewards. Sometimes they're full-time stewards. Sometimes, as in many companies, they are just wearing the hat part-time. You've got your analysts, you've got your decision makers. All of them are really important to this approach and to making data governance more iterative and more agile.
All right. And finally, but really importantly, there is a framework that we have come to understand as important, which is thinking about investing in your metadata management program as a data front office. If you really want to empower better use of the data and make this type of project work better in your organization, think about building that layer where folks can find the data they need, and where it interoperates with the rest of your data ops ecosystem, whether you're using a lineage tool, a quality tool, or a policy solution, so we can ensure that they work better together. Ultimately, the goal here is to help you understand your data supply chain better and provide a better self-service experience for your downstream data consumers, whether they're in BI, whether they're doing ML or AI, or have other data consumption needs.
All right, so that was a little bit about data.world and our philosophy around the modern approach to metadata management and agile data governance. And with that, I will hand it back to Shannon to get things kicked off with our speaker today. Thank you so much.
Mo, thank you so much for kicking us off. And thank you to data.world for sponsoring and helping to make these webinars happen. If you have questions for Mo or about data.world, you may submit them in the Q&A panel, as he'll be joining us in the Q&A portion at the end of the webinar today.
And now let me introduce our speaker for the monthly series, Donna Burbank. Donna is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She is currently the Managing Director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. And with that, let me give the floor to Donna to begin her presentation. Hello, and welcome.
Hello, Shannon. Thank you. Always a pleasure to do these webinars with DATAVERSITY, and welcome to everyone who joined. It's always good to see the chat and see some familiar names in the group. Welcome back to folks who have been regulars. There are a lot of you, and I really appreciate it. Some of you, I'm sure, are new to this series and maybe even new to DATAVERSITY, so I want to let you know that this is a monthly series. Every month we have a different topic on some area of data architecture. Probably the most popular question is: are these recorded, and can we get the slides? Yes, yes, and yes. So if any of the topics from earlier in the year are of interest to you, they are available on the DATAVERSITY site in perpetuity, I believe, and we also keep links to them on our Global Data Strategy site as well. If you'd like to join some of the upcoming ones, data quality is next month, and that's always a popular one. I would love to see you back on some of these other sessions.
So the topic of today is metadata, near and dear to my heart. For those of you who know me, the way I came into this field, way back in the day, was through metadata management. What's nice to see is that it is hotter than ever, and it is growing, and we'll show some of the stats. DATAVERSITY and we do a survey every year, and we continue to see the popularity of metadata, for a lot of the reasons we're here in the webinar to discuss.
And, you know, I'll probably echo a lot of the topics Mo mentioned. The driver is basically coming from business needs: not only the defensive aspect of industry regulation but also the offense, really driving value for the business, and a lot of that is coming from business users. At the same time, there's a lot of technical change and technical diversity in the market, so how do you manage that as well? That's what makes metadata both a challenge and an opportunity, and we'll talk about that in the next hour or so.
Before we go too much into the details, I do want to show that I'm not making this up: metadata is hotter than ever. This comes from a DATAVERSITY survey, Trends in Data Management, and this is a sneak peek into the 2022 results. I think that'll be published in the September timeframe. When you look at the top priorities for organizations in the coming years, 2023 and beyond, metadata management is number three, up there with the sister efforts of governance, quality, and master data. In a way they're all overlapping, and we'll talk more about that as well. Not only is it in the top three, but you'll see that it increased over 20% year over year, which is great to see. And things like data catalogs are growing in popularity, both on the vendor side and in implementations by folks like yourselves. I'm really not surprised to see that, because I've been a fan of metadata for years, but I'm really pleased to see that it is becoming a de facto standard to have in the organization.
Right. Why is this growing? I think it was already touched on. No surprise, data governance is the top initiative.
In the organization, things like data quality improvement come up as well, and again there's overlap between quality and governance. It's really hard to have good quality data if you don't understand the metadata behind it. Then BI, and we'll talk more about that. That's almost your classic use case from the old days, but it hasn't gone anywhere. We're still reporting, and you still need to know what the data on that report means and where it came from. That's not going away. Master data: how do you even get that single view of customer if we can't define what a customer is, right?
I'm really pleased to see the idea of efficiency and agility, because I know this, and you probably know it if you've done it: the more you have your metadata documented, and the more you understand the lineage and the definitions, the more agile and efficient your teams are, because everyone knows where the data came from. I don't know why that wasn't always obvious, but sometimes it seemed folks thought that was just going to take us longer, and "we don't need that darn documentation." Maybe there's some upfront development to get there, but once you have it, you certainly wonder how you could live without it, right? I'm pleased to see things like efficiency and agility topping regulation and audit, because regulation and audit are certainly a use case for metadata, but they make it seem like, well, we have to do this. With efficiency and agility, we want to do this.
Data science and big data analytics are certainly a driver as well. As folks are trying to do analytics or data science discovery on the data, knowing what that data means is an obvious need. Exchanging information with other organizations I'm pleased to see as well, and we will talk about that in this session.
If you're going to share information with others, or even publish open data, you need metadata. We're working in our practice with some government organizations that want to publish their data externally. How can you do that without metadata? You need to have that context, even just when the data was published, right? You can see some of the others, but I found it really interesting to see some of those key drivers, and hopefully you do too.
So, as I mentioned in the beginning, and if you've joined my webinars you've seen this before, this is our methodology. We do a lot of data strategy, and what that means could be, actually it is, a whole other webinar. But what is both interesting and challenging is that there's so much overlap. We can talk about metadata management, over in the lower right, as being a foundation for the rest of your strategy, but metadata management requires and is linked to data architecture. You can't do data quality without metadata management. Data governance really relies on metadata management and drives it. So this could be one big interconnected circle, and it could drive you crazy, but we always look at all these things because they are so closely related, and none of this can really work effectively without metadata management. So I'm a big fan of metadata.
The other thing about metadata is probably the name. What the heck is metadata? Maybe one of these days we'll change that, or maybe Facebook has changed that and everyone knows what "meta" means now. But the term can turn people off. It seems really technical, and what makes it worse is the definition we've given in the past, which I did not write down in this presentation. When people ask what metadata is, we say, "Oh, it's data about data." Well, that's really helpful. It just makes it sound even more complicated, so I will not use that definition here. I like to just say it's data in context.
What's the context of this information, the meaning behind it? Sometimes it's easiest just to show an example and then everyone knows what it means. An even easier way to think of it than data in context is the who, what, where, why, when, and how of data, and I'm just using some simple definitions here. I think a lot of us, especially the technical folks, think of the what: what is the business definition of the data, or the business rules, or even how that data is formatted, is it a character 12 field, et cetera. But there's also the who. Who created this? Who's the data steward moving forward? That's what Mo said about turning metadata into action. It's great to have the metadata, but what are we going to do about it? Who's going to drive this? Who, quote, owns this, or maybe who is auditing it?
Where is it stored? We talk about things like data lineage, which is a huge part of it. And I don't want to dis auditing and regulation, because that's a big part of this: where the data is stored geographically is a big issue for some organizations, and not necessarily only international ones.
I don't ever want to forget the why. Again, echoing what Mo said, I've seen catalogs go wrong, partly because the tooling is very good and, especially with a lot of the automation, you can catalog everything. Sometimes the driver is volume rather than the why: can we scan all 1,000 tables, we have 1,000 data sources, let's get as much information in there as possible. Why are we doing that? Let's focus on the high-value data and make it a usable thing. "Why are we doing this?" is always a nice question to ask. When is a huge one.
I had a kind of snarky tweet a few weeks ago. I was reviewing a metadata standards document and I realized there was no date on it and no author. Isn't that ironic, a metadata standards document that doesn't have a date? But often that matters for data that's published, or a definition that's published. Is it from 10 years ago? Is it from last week? That's a big part of it. I'm channeling my grandmother, who always said, if we had a picture, put the date on the back. We were going through the attic a few weeks ago and found some pictures from her, actually, without a date, and we wanted to know where they came from and who was in them, right? That's good old-fashioned metadata. And then the how: what are the storage and retention rules, when should this data be purged, some of those business rules around it as well. So anyway, it's the good old-fashioned who, what, where, why, and when. But as metadata expands and becomes more of a business asset, it goes beyond your good old-fashioned data dictionary with just the fields and columns. There's a lot more to it.
But what is metadata? Some of you may be joining this webinar with a range of skills, so if this is not new to you, bear with us. Some folks may be asking what we are even talking about with this funny word called metadata. So I like to differentiate between data and metadata. Let's think of a good old-fashioned table. This could be a spreadsheet, everyone's used to spreadsheets, or this could be a database with columns. The data is that Joe Smith either works for, or bought from, or is somehow associated with a company, Computers R Us, and either the company is in New York or he lives in New York. Think about that.
And then there's the year, something about 1970. That's the data, all those values. The metadata, the easiest way to think about it, is those column headings: Joe is the first name, Smith is the last name, and the company is Computers R Us. And the year purchased is the year he purchased something, not the year he was born. If you just had a year without context, you wouldn't know what that meant. So probably the easiest way to think about it is that the data is your rows and the metadata is your column headings. It's a little bit more complicated than that, but it's a simple way to think about it.
That's metadata, but we can take this further. Metadata could just be poorly named columns. We've all looked at a database at some point that has really helpful names like string1, string2, text123. Maybe the developer knew what that meant whenever he or she built it, but nobody else is going to know what that means. So there's that intelligent naming, or labeling, of things. Yes, that is technical metadata, and it is the name of the column, but it's not so helpful, right?
Even within well-named metadata, and this is why we sometimes sound like crazy folks when we are data architects or metadata managers, what do we mean by a year or a city? Take something as simple as a last name: do you really mean the surname or the family name? Not every culture or language has the last name be the family name. Or city: is it the city where the customer lives, the city where the store is where they purchased it, the city of the manufacturer of the product? I probably see some nodding heads from folks who have built these. A well-formed business rule or definition is critical.
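The rows-versus-column-headings idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ColumnMetadata` class, the column names, the stewards, and the review dates are all hypothetical and were not part of the talk; only the Joe Smith values come from the example above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ColumnMetadata:
    """Context for one column: the what, who, and when of the data."""
    name: str            # technical column name (hypothetical)
    definition: str      # business definition, the "what"
    data_type: str       # how the value is formatted
    steward: str         # who maintains the definition (made-up role)
    last_reviewed: date  # when the definition was last checked (made-up date)


# The data: one row of values, ambiguous without its headings.
ROW = ("Joe", "Smith", "Computers R Us", "New York", 1970)

# The metadata: one entry per column, in the same order as the row.
COLUMNS = [
    ColumnMetadata("first_name", "Customer's given name", "text",
                   "Sales Ops", date(2022, 1, 1)),
    ColumnMetadata("last_name", "Customer's family name", "text",
                   "Sales Ops", date(2022, 1, 1)),
    ColumnMetadata("company", "Company the customer bought from", "text",
                   "Sales Ops", date(2022, 1, 1)),
    ColumnMetadata("city", "City where the customer lives", "text",
                   "Sales Ops", date(2022, 1, 1)),
    ColumnMetadata("year_purchased", "Year of purchase, not the year of birth",
                   "integer", "Finance", date(2022, 1, 1)),
]


def describe(row, columns):
    """Pair each value with its business definition, keyed by column name."""
    if len(row) != len(columns):
        raise ValueError("row and column metadata must have the same length")
    return {col.name: (value, col.definition)
            for value, col in zip(row, columns)}


if __name__ == "__main__":
    for name, (value, definition) in describe(ROW, COLUMNS).items():
        print(f"{name} = {value!r}: {definition}")
```

Without the `COLUMNS` list, the value 1970 could be read as a birth year; the definition attached to `year_purchased` is what removes that ambiguity.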
And there's a lot more to even such a simple thing as any of these: what's a customer, what's a company? Again, is that the company that Joe works for, the company that he bought this thing from, et cetera. That's where the idea of business metadata is so critical.
On that note, this is a bit dated now, but we did do one of these surveys just on metadata. What was interesting, when we looked at the usage, was that 80% of the users of metadata at that point were from the business, and that's really not a surprise to me. This is a way for IT and the business to collaborate. Take that simple question: I'm looking at a dashboard or some analytics, how was total sales calculated? I need to know that. I'll probably hit on this again, but often we do get the question, why do we need metadata, why do we need documentation, that's just slowing me down. And I may not win some fans here, but I hear that more from IT teams. The business seems to, quote, get metadata more: what do you mean you haven't defined this?
I still remember we were doing a business case for what, back in the day, we called a metadata repository, for a big bank. We went to finance and we were explaining all of this, how we need the lineage of the data, what it means and where it comes from. The business sponsor looked at us in horror and said, "You mean you're not doing this? That's frightening. We assumed you knew where the data came from and what it meant before you gave me a report about the financials of my business." Again, there's time and cost, and yes, of course the business wants it yesterday and wants it easy, and they have to be involved in those definitions as well. But it is such an obvious thing to do that you often don't get as much pushback from the business.
So it is that business meaning and context. The business person might just say, "Show me all customers by region. Can I have that yesterday?" A good data architect, metadata manager, data analyst, insert title here, should immediately start thinking: what do you mean by customers? Is that current customers, lapsed customers? Is that retail customers, wholesale customers? How do you define a region? I have two customers right now who are still in discussions about what a region is, sales regions versus geographical regions, and again, a seemingly simple thing can be very complicated. Can a customer have a billing address in more than one region? Do we have to obfuscate personal information, or PII? There are so many questions about what is seemingly such a simple request: just show me all customers by region. That's why business people get metadata and the need for it. But we can also commonly get that snide comment from the business person: "Show me all customers by region, how hard can that be?" That seems so simple, until you actually try to do it, and there is a lot of complexity even to simple things.
Again, echoing what Mo said, one of the great things about a lot of these new data catalogs is their user friendliness, because a lot of the metadata from the business side is in people's heads. And I always like to say, avoid the "don't you just know?" Because although people more and more do understand the need for metadata, we always seem to understand the need for other people's metadata. I want someone else to have a really well-documented data source, but I'm busy. And gosh, don't you just know what a part number is?
Why do I have to create a definition for a part number? Until Joe, who's been there since 1980, says, "Oh, that used to be the component number before the acquisition, and we actually have issues aligning component number with part number." There's a lot more to it, right? A lot of these, quote, simple things have a lot of background. That's where enabling people, either through a glossary or a data catalog or a data model or just these more collaborative tools, is really about capturing what's in people's heads into these tools and having some healthy discussion. I've been doing this for ages and I'm still surprised. You would think something is so simple, why do we need to define it? Address, or anything really, probably has six versions of it or some history behind it, even something like customer ID or part number, which seems so straightforward. It's not. There's always some history.
That's why, again, more of these tools are more collaborative, and I'll talk about this in a little bit. There is a place for the highly governed, master data approach: this is the definition, it is published, and thou shalt use it. And then there's sometimes a collaborative approach for people to come up with definitions. In either case, you want to have that feedback loop. One might say, "Thou shalt call part number this," but Joe in accounting might say, "Well, yeah, but I actually have a comment on that, we don't use it this way." Even with standards you need to have that feedback loop, because a standard might not apply to everyone or there may be an issue. So collaboration is so important when it comes to these business definitions.
And of course I have to have a joke and a cartoon. This comes from one of my earlier books on data modeling. Maybe this isn't funny at all, or maybe it's not funny to you.
But we've all been there, right? We're ready with this application, we're into acceptance testing, we're going to roll it out, and there's just one more question. I still remember one of my early conferences, I think it was a DATAVERSITY conference way back, when someone told that joke about "let's try to get a single definition of customer," and everyone laughed. I thought to myself at that point, how hard is that? A customer is a customer, right? But again, is it a retail customer, a lapsed customer, a loyalty program customer? Or even if there's a clear business definition of customer, there are the joins in the back end across the data tables. There's a lot of complexity there.
And I may be bold and suggest that metadata is a contributor to world peace. I think if we all had better definitions, it would help our family lives, our social lives, and our work relationships. It's kind of a silly example, but it hits home: what if we actually had good metadata definitions in daily life? Let's go on a family vacation, and let's do the ubiquitous one, in the US at least: let's drive across the country from New York to California and see everything along the way. What can go wrong with that? You have a family in the car, probably with very different concepts of what a vacation means. Dad says, "Oh, that's great, we're going to stop at every state park along the way and learn something. Let's spend some time in the interactive session and you can all learn a new fact." Mom says, "I've been working so hard. You go to your museum or whatever, and I'm going to sit in the car and read a book and be by myself for 10 minutes. Do what you want." And Jane says, "Dad, we're in the national park, can I go out and actually hike and get some exercise? I've been studying at school and I don't want to stay in your stupid interactive session."
Bobby doesn't want to be there at all. All he wants is to be home with his friends. Ian says, "I don't know what you weird Americans think a holiday is, I'd rather be in the pub." And Donna: can I just get my laptop and design that? It's kind of a funny example, but even with something as simple as what we mean by a vacation, if it had been clarified in the beginning, we probably would have ended up with fewer fights in the car and a better time. These terms can seem so obvious to us: what a vacation is, or is it vacation versus holiday, even just the terminology. They can cause a lot of problems, and when you have 1,000 business terms with some impact on revenue and risk, think how much more important that is.
Okay, here's another example, from NASA. This is a bit dated; in fact, some of you on the call probably weren't even alive then, but you might have heard the story of the Mars Climate Orbiter. NASA actually lost $125 million on this thing, and it is clearly documented that this was a missing metadata issue. I am not a rocket scientist, so don't quote me on the details here. But when they were trying to calculate sending the orbiter up into space, they had some numbers to calculate with, but they just had the numbers. The metadata was missing: the values were in pound-seconds instead of metric units, Newton-seconds, and it went off track because they didn't have the right metadata. Is it 60 miles, 60 kilometers, 60 centimeters? So not only did they lose the orbiter, but that looks really bad, and there were lost opportunities for research they could have done. All because someone probably said, "Don't you just know? How hard can that be? Of course it's going to be in Newton-seconds," or whichever one it was. So, pretty embarrassing, though to give NASA some credit, they did clearly document it.
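The orbiter story is about a number that traveled without its unit. As a minimal sketch of keeping the unit with the value, and not a description of how any NASA system actually works, the Python below uses a hypothetical `Impulse` class and hypothetical unit labels; the conversion factor follows from the standard definition of the pound-force (1 lbf = 4.4482216152605 N).

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton-seconds (1 lbf = 4.4482216152605 N).
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A value that always carries its unit (its metadata) with it."""
    value: float
    unit: str  # hypothetical labels: "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        """Convert to newton-seconds, refusing to guess an unknown unit."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown unit: {self.unit!r}")


if __name__ == "__main__":
    reading = Impulse(60.0, "lbf*s")
    # Treating 60 lbf*s as if it were 60 N*s would be off by a factor of ~4.45.
    print(reading.to_newton_seconds())
```

A bare `60.0` forces the consumer to "just know" the unit; the tagged value makes the assumption explicit and fails loudly when the unit is missing or unrecognized.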
So as I mentioned earlier on the call, this idea of open data is a great source of, and need for, metadata. In fact, most open data sets have metadata requirements. Going back to my grandmother: put the date on it. When was this data set published? What was it used for? For those of you who are doing data science, a data set that was published for one set of research might not have been stored in a way that applies to all sets of research. And who produced it? If you're doing research, can I talk to the person who did it so we can have some collaboration, et cetera, et cetera. So much of the data science, the open data sets, and the collaboration done around the world really does depend on metadata. Earlier, when I showed that slide of the use cases for metadata, a lot of it was cross-organization, cross-university, cross-medical-facility collaboration, and generally there are metadata standards so that we can share that data. Even Amazon.com: you can't post on Amazon.com without good metadata about your product, right? So it's one of these things that's so ubiquitous, but we don't always call it out as a first-order activity in organizations. And for those doing analytics, and that's only growing, self-service data analytics is super popular, but data is only as good as its metadata. I've used this example before, so you might have heard me tell the story, but what's sort of funny about this one is that I was actually in a webinar on self-service data analytics, on how powerful the tools are and how much open data is available, and I thought it would be a fun example to just show. This is from, what do you call it, road safety accidents by vehicle make and model. Nice, okay, we can pick on those Porsche drivers, or whichever cars have gotten into the most accidents. But this is the data I got; it was a published data set on a UK data site. And we learned that F13 is huge. There are over 250,000 F13 things, and F1.
I don't know, and F2 looks like it could have been a date, 2015, and I don't know. This data actually makes no sense. So, great, someone took the time to publish this as open data. I had one of these super powerful data visualization tools, I was psyched to do some research, and nothing, because there was no metadata. I'm sure somebody said, well, it's obvious that's a date, but is it, right? And everyone around the person publishing it probably knew F13 was a Mazda something-or-other, I don't know, but I've never been able to figure out what it means, or didn't want to take the time to figure it out, because there was no good metadata. So the little lesson from that is that we should all be able to know what this is and what it means. That was just an extreme example. For a real-world example, just think of financial reporting. Again, something that sounds so simple: what is a year? Really? And you can see the person rolling their eyes: my boss wants me to go define what a year is. But think about it. This actually was an example from an international retail chain. They were trying to do some data-driven analysis on their sales across regions, which made total sense. That's the thing to aim for, having the execs looking at the data to make decisions. Typically in the fourth quarter they see a spike in revenue. That's November and December, the holiday season for many people, and people buy a lot of stuff. But they had a Latin American subsidiary, and they saw a dip in that quarter. So they started to think: should we do more marketing? Are we in the wrong market? Should we close some stores? And they did some research. It turned out the subsidiary was using a fiscal year, June to June, rather than the calendar year used by the rest of the company. So again, it was a metadata issue that caused confusion for the business and could have caused the wrong business decision, on something as simple as what we mean by a year.
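The fiscal-versus-calendar confusion is easy to show in a few lines. This is only a sketch: the function name `fiscal_quarter` and the July start month are assumptions made for illustration (the talk says only that the subsidiary's fiscal year ran roughly June to June).

```python
from datetime import date


def fiscal_quarter(d: date, fy_start_month: int = 1) -> int:
    """Return the quarter (1-4) of d for a year that starts in fy_start_month."""
    # Count whole months elapsed since the start of the fiscal year, wrapping at 12.
    months_into_year = (d.month - fy_start_month) % 12
    return months_into_year // 3 + 1


december_sale = date(2015, 12, 15)
print(fiscal_quarter(december_sale, fy_start_month=1))  # calendar year -> 4
print(fiscal_quarter(december_sale, fy_start_month=7))  # assumed July start -> 2
```

The same December sale lands in "Q4" for one group and "Q2" for the other, so a report labeled only "Q4 revenue" compares holiday sales against a different three months unless the year definition travels with the number.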
I don't want to keep beating this one to death, but the meaning of context is so important. Maybe I've been in metadata and data architecture too long, but I think it's a good thing to ask questions. Sometimes it makes me not so fun at parties: Donna, want to go out on Friday? Well, what do you mean by date? What do you mean by out? Would this be an outside event? It's weird when you over-clarify things, but sometimes asking clarifying questions is good. But I don't want to overdo the business need, even though that is a huge driver. There is also this idea of technical traceability. I had one client say recently, and I wanted to slap him, but that probably wouldn't have been good: do people actually do this anymore? Seriously, you want to know the lineage of the data on a report? And I thought, yes, if you want to use that report. This is actually a common use case, right? So I have this idea of total sales, or total sales by region. What is the lineage of that? It came from the North America sales database, the Asia Pacific one, the Latin America one. This is a super simple example; anyone who's done this knows that it gets much more complicated. And this is a great fit for the tools that have automated scanners, populators, whatever they call them. Each vendor has a different word, but use technology to your advantage there. There have been huge increases in technical capabilities, so the tool can do a lot of this for you. If you're mapping by hand, stop. There are a lot of tools that can help you with that now. Another misconception I've heard is that we don't need metadata for big data analytics or a lot of this new stuff; you just drop it in the lake and magic happens. I don't hear that so much anymore. I think folks are coming to a level of maturity, but even with big data analytics, more bad data doesn't make it more helpful, right?
So you do need good metadata. As an example, and it's loosely taken from a client example, maybe we're trying to look at IoT sensor data from smart meters. If anyone has these in their home, the energy sensors, you can even have it on your phone, and you can get a lot of great analytics about energy usage. You can get a finding like, you know, a 5% increase in usage for every change of so many degrees; you can imagine the data there, right? So that seems pretty simple, but even then: what's the source for this weather data? Was it accurate? How often were the readings taken? What was the purpose of it? Anyone doing scientific research knows you need to know how it was calculated and for what purpose. How was the reading taken? Was it a meter reading for accuracy of temperature, or was it for billing, et cetera, et cetera? Is the usage by household, by individual, by address? What if it's an apartment building and there are many apartments in it? There are a lot of different questions about even just how that data was calculated. Think back to that NASA example. A good public open data set has all of these things, that context around the data, and you really can't do good analytics without it. So even with IoT streaming, which was one of those top use cases earlier, you still need the metadata about what it means. So who uses metadata, and why is this important? Just about everybody, probably, but different types of metadata. I've been talking a lot about the business people, the finance person who wants to look at regional sales or the auditor who wants to know how that was calculated. But technical people need it too: if I change this field, what else is going to be affected? And this still happens. I was at a client, unnamed, a really big international client, that in this day and age had a database administrator.
And maybe no one else is freaking out about this, but there was a product ID, and they changed it from 12 characters to 10 characters. It brought down the website, and it took two days to fix. So it was a huge thing. Why someone thought that was a good idea, I don't know. But if they had looked at the metadata and done some lineage, they could have seen the impact analysis: if I change the length of a field, it's going to impact the website and we can't sell anything anymore, or it's going to affect something else. So impact analysis for developers is huge. There are source-to-target mappings for data warehousing, too. So metadata is critical for both business and tech. Governance, you'll remember from the beginning, was a huge driver. It is both a driver for metadata and an enabler of creating metadata. I completely agree with Mo about the distributed nature of metadata; it's in the heads of both technical people and business people, but you don't want to capture it willy-nilly. Take something like the definition of total sales or region: some terms need to be defined by committee when they're shared, some need a dedicated owner, and you need some rhyme or reason about that rather than just letting everyone at it so that the loudest person wins. So really defining these roles for both metadata creation and usage is super important, as is having the right policies and procedures around it. I won't go super deep into this. You'll see some examples of the different types of metadata these folks might create or manage, from both tech and business. That's actually a whole webinar, but I wanted to talk about the touch points there. On that point of crowdsourcing: I was, way back in the day, kind of the metadata repository person when those were first coming out, and one of the positive things I've seen since is this idea of collaboration and crowdsourcing.
And I can be an old curmudgeon sometimes, and I will say I was probably a late adopter of that idea. Think of the two ways of approaching it: the Wikipedia approach and the encyclopedia approach. And they both still have their place, before I go on my little rant, right? Some things, like your master data, or the financial reports that you report to the street: you can't just change the calculations, you can't just make up master data definitions. That does need to be highly governed and structured, right? But maybe you need feedback on those changes, or we're doing some discovery analytics and we want to be more agile and have teams brainstorming around it. Then you want to use that more Wikipedia-like approach, where content is created by many and edited by many. When Wikipedia was starting, I was very skeptical. Seriously, that's going to work? You just have people openly editing? But there's a kind of eventual consistency of information, and I use Wikipedia all the time, right? They both have their place. Yes, the encyclopedia may take a bit longer, but generally that's more vetted, and you don't want everything just openly edited. I think there are a lot of good tools on the market, but companies also buy a really good tool for the wrong use case. One of my frustrations with tools on the market, not all of them, is that the design approach tends to be either/or. You want to be able to lock things down when you need to, and not have everything be openly edited. We had one big international bank buy one of the tools that was very much this collaborative Wikipedia style. When they said, no, we want to lock down the definitions of how we define our metrics to be reported, the vendor said, oh, you shouldn't do that, everyone should have a voice. No, not everyone gets a voice in how we do our books. But they wanted both.
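One way to picture a tool that supports both the encyclopedia and the Wikipedia model is a per-term governance flag. The sketch below is purely illustrative: `Term`, `locked`, and `propose_edit` are hypothetical names and do not describe any vendor's product.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A glossary term with a per-term governance mode (hypothetical model)."""
    name: str
    definition: str
    locked: bool = False  # True: encyclopedia style, edits need steward approval
    pending: list = field(default_factory=list)  # proposed edits awaiting review


def propose_edit(term: Term, new_definition: str) -> str:
    """Apply the edit directly for open terms; queue it for locked terms."""
    if term.locked:
        term.pending.append(new_definition)
        return "queued for steward review"
    term.definition = new_definition
    return "applied"


reported_revenue = Term("Reported revenue", "As filed in the annual report", locked=True)
churn_idea = Term("Churn signal", "Draft: no purchase in 90 days")

print(propose_edit(reported_revenue, "Revenue including refunds"))
print(propose_edit(churn_idea, "Draft: no purchase in 60 days"))
```

In this shape everyone can still suggest a change to a locked term, so people keep a voice, while the reported definition only moves after review.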
And so, think about it before you buy the tool. If you need both, make sure the tool can do both. You don't want something so open that you can't lock it down, or so encyclopedia-based that nobody feels like they have a voice. There are tools that take that balanced approach, but give it some thought in terms of how your governance is done and for which data sets, because you probably have both models in your organization. Just use that wisely and find the right balance, because there are a lot of good tools. Be sure that you have the right mix: certain things are highly governed and should go through the full process, which might take a little longer, but you'll know they're governed. And some should be more open: we're doing some discovery analytics, or maybe one of the functional teams wants to do some of their own internal work without having to make it enterprise-wide. There are some really great tools for that, but think about it before you start implementing. So, there are a lot of different data sources, both within and beyond the organization. I've talked a lot about business stuff, but the technical side is challenging, and Mo mentioned that as well. It isn't all nicely structured in a relational database or even a semi-structured platform. We have a metadata course on DATAVERSITY that I gave, and we go through a lot of that: what does metadata mean for social media, what does metadata mean for a photo, what does metadata mean for a very different model, or for open data sets, et cetera, et cetera. A lot of us think about databases, and to be fair, a lot of our enterprise data is there, so that's not a bad thing. Or documents, which sort of frustrates me. You'll have the whole document management team in an organization creating a taxonomy for search, completely separate from the database team creating a hierarchy for master data, right? Why aren't they aligned? That should be similar metadata, right? There are just a lot of reasons why.
But really, metadata can be a bridge across all of these different sources. Another plug for the DATAVERSITY surveys we do: they give some numbers to that. One of the questions we always ask is what diverse sets of platforms you use. Relational databases aren't going anywhere. You'll see that both currently and in the future, both on-prem and a bit more moving to the cloud, relational databases are here to stay. Nothing wrong with that; they certainly have their place. But you will see that between current and future, the distribution gets a bit more even. More people are looking at things like social media posts, media files, non-relational, real-time. There are a lot more sources out there that you need to manage metadata for, and that's where a lot of these tools on the market allow you to scan and look at those as well. Keep that in mind when you're doing an RFP for a vendor. Something else I see customers run into: the tool has a great user interface, but we need to get a particular database in, and can you scan that in? Nope. You don't want to be doing that by hand, right? So do make a list of what you need to bring in for your organization. What I find interesting is that there is still a huge number of good old-fashioned COBOL copybooks. Can the tool that you're using scan those for you? Because you probably don't have a whole lot of new college graduates who know how to create COBOL copybooks. That's something where the tools can definitely help you. There are a lot of different architectural options for managing your metadata as well. Traditionally, almost like the data warehouse for metadata, there is the centralized enterprise metadata repository.
Another way to think of it is almost like a data warehouse for metadata, right? It certainly has some sort of data model, a metadata model, in a common storage. Think about that as well: does it have the fields that you want to capture? Is it customizable? How much do they lock down? And what sort of integration does it have with other tools? Again, think of it like a data warehouse. Can you match and merge? If you have a field across multiple data sources, can the tool rationalize that and say, yes, that's still your customer at the logical level? Don't get fooled by some of these great user interfaces, right? There's a lot of meat behind that, and you want to make sure that meat matches your use cases. This is also a good place to mention tool-specific repositories; the enterprise repository is in a way almost a catalog of catalogs. Most tools now realize that a data catalog is so important. A lot of your metadata is in your data models, especially at the business level, or even your technical field and column names, so your data modeling tool might have a great catalog; your business intelligence tool has metric definitions, et cetera, et cetera. And maybe that's enough. Back in the day I sold these big enterprise repositories, and I have seen companies literally spend millions of dollars on a tool that could have been replaced by a couple-thousand-dollar data modeling tool published to the web. So just think of your use case. You very well may need an enterprise approach, but it could also be a combination of some of these tools that are now publishing their metadata in a different way. I also don't want to forget this idea of metadata exchanges or metadata registries between organizations, or between research groups or clinical trials and things like that.
A lot of our business now, and I'm just finding it interesting, is less about your typical retail company trying to sell more widgets using its internal data, and more about cross-organization research platforms for clinical data or physical data research. A huge part of that is the metadata standards that let you share data across that work, so give that some thought as well. Data lineage is a really great use case for a lot of these tools. It's the good old-fashioned but still very much needed question: I have a sales report. How do I trace the definition of that metric from my BI tool back through the warehouse to staging and the physical data? There are so many pieces to that, and you're probably using all of these tools and more along the way, from ETL to ELT to your source systems. That's where a lot of these automated tools can do a lot of work out of the box to do some of that mapping for you. I mentioned impact analysis: I changed the name of a field. Once you have that lineage, it isn't just about audit, right? That came up in Mo's conversation as well. Yes, having this lineage helps when the auditor comes around, but you can be using it yourself. I'm going to make a change in development. What's the impact? What else do I have to change? Then there's semantic mapping, kind of a nerdy word, but it's the mapping across your conceptual, logical, and physical data models, or even just your business terms. I have this term, customer. Where is that across Teradata or DB2 or XML, whatever? They all have a fairly technical name for it. How do you link that up? Graph came up in Mo's conversation as well, either graph for displaying metadata or graph for the metadata relationships themselves. Finding that pattern across the data, across the full set of sources, is super important, and graph can be really helpful as an underlying technology for that.
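Lineage and impact analysis are the same graph walked in opposite directions, which a short sketch can show. Everything here is an assumption made for illustration: the asset names echo the sales-report example from the talk, and a plain dictionary stands in for whatever graph store a real catalog would use.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed in its value.
FEEDS = {
    "north_america_sales_db": ["staging_sales"],
    "asia_pacific_sales_db": ["staging_sales"],
    "latin_america_sales_db": ["staging_sales"],
    "staging_sales": ["warehouse_sales"],
    "warehouse_sales": ["total_sales_by_region_report"],
}


def downstream(asset, feeds=FEEDS):
    """Impact analysis: every asset affected if `asset` changes (breadth-first)."""
    seen, queue = set(), deque([asset])
    while queue:
        for target in feeds.get(queue.popleft(), []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def upstream(asset, feeds=FEEDS):
    """Lineage: every asset that `asset` is derived from."""
    # Reverse the edges, then reuse the same traversal.
    reverse = {}
    for source, targets in feeds.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return downstream(asset, reverse)


print(sorted(upstream("total_sales_by_region_report")))
print(sorted(downstream("asia_pacific_sales_db")))
```

The auditor's question ("where did this report number come from?") is `upstream`; the developer's question ("what breaks if I change this field?") is `downstream`.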
And this is one of the big improvements over the years in these metadata repositories, or catalogs, or dictionaries. It's kind of a funny cartoon there, but back in the day you had to do a lot of this manually, creating rules that said this field maps to that field. Now machine learning and pattern recognition can scan through a lot of these databases and say, this looks like a Social Security number, let's do some mapping. But again, be careful. I've seen that the both/and position can sometimes be hard. I've seen vendors go the wrong way with, we can just automate all of that mapping for you. Well, sometimes you do have a very specific business rule that cannot be inferred. You want to be able to say, this looks like a Social Security number, but for us that's our part number, so can we override it and create our own pattern? So you want that mix: sometimes you want an explicit rule, and sometimes you want that fuzzy pattern matching. But again, automate as much of this as you can. I do want to leave some time for questions, but metadata management should be treated like data management, right? Almost all the things you do for data management apply to the metadata. Do you have a metadata management strategy? Are you aligning with business goals? As I mentioned earlier, don't just try to scan everything in and get as much metadata as possible. That could be helpful, but focus on the why. Who's using it? How do we prioritize, how do we organize it and build over time? What metadata are we capturing? What's coming from human beings' heads, and do we have the right stewardship around it? What's coming from which data sources, and does our tooling support that? How do we publish that out to the larger world? And do we have metadata management and data governance in place? There's a lifecycle for metadata just as there's a data lifecycle, so think of all of that from soup to nuts.
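The mix of automated pattern matching and explicit business rules described above can be sketched as a classifier where an explicit rule always wins. The table and column names, the rule dictionary, and the label strings are all hypothetical, and a simple regular expression stands in for the machine learning a real catalog might use.

```python
import re

# Shape of a US Social Security number, e.g. 123-45-6789.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")

# Explicit business rules keyed by (table, column); assumed example:
# this organization's part numbers happen to look like SSNs.
EXPLICIT_RULES = {("parts", "part_no"): "part_number"}


def classify(table, column, samples):
    """Label a column: explicit rule first, pattern matching as the fallback."""
    explicit = EXPLICIT_RULES.get((table, column))
    if explicit:
        return explicit
    # Fuzzy fallback: only a suggestion, hence the "possible_" prefix.
    if samples and all(SSN_PATTERN.fullmatch(s) for s in samples):
        return "possible_social_security_number"
    return "unclassified"


print(classify("parts", "part_no", ["123-45-6789"]))
print(classify("employees", "tax_id", ["123-45-6789", "987-65-4321"]))
```

Checking the explicit rules before the pattern is the design choice that lets a steward override an automated guess without turning the scanner off.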
When you think of a metadata implementation, it's a first-order thing to do. So, in summary, we talked about metadata being the who, what, where, why, and when of both business and technical data. You need data governance to orchestrate it, and technology is getting more complicated, so really find the right tooling to support it as part of your wider strategy. Before I open it up for questions, just a bit of a plug. Next month is on data quality, with my special guest and coworker Nigel Turner; he's always a popular speaker with us. One more plug: we do this for a living, if you need help. And a double plug: we are hiring. So if at any time in this conversation you were nodding your head going, yeah, these are my people, we have a job opening, and we are looking for more metadata and data management nerds like myself. So with that, I want to open it up. Shannon, are there any questions? Lots of great questions coming in. Donna, thank you so much for another great presentation. And just to answer the most commonly asked question, a reminder that I will send a follow-up email for this webinar by end of day Monday with links to the slides and links to the recording, along with anything else requested. Diving in here, a question came in at the beginning for you, Mo: in your architecture solution for data.world, does metadata management harvest metadata from repository-based design and architecture tools, a CMDB, and also from data modeling tools? It would be great to conform data to the data domains and business capabilities contained in CMDB and architecture tools. Yes, obviously there are a lot of tools out there, but we certainly currently have support for modeling tools like ER/Studio or erwin, et cetera. We don't have a ton of time to go through all the tools that we support, so feel free to reach out and we'll give you some specific answers. I love it, thank you. So, diving in here for this next one.
Does metadata management include the extended use of user-defined data types or domains as a base data type for standardization? This seems to be an overlooked tool. I'm not sure I totally follow that, but I'll answer it anyway. Sorry. So, it should be able to do both. If the question is about tracking any user-defined data types as something to standardize on, yes, but I also think that in the metadata tool you should be able to create your own fields and your own customized data types. So it should work at both levels: it should be able to track and log the user-defined data types you've created in your source systems, and then you should also be able to create those in your metadata. And Mo, feel free to jump on in here on any of these questions. Yep, no, I completely agree. So, how do you deploy context-setting information to different subscribers? For example, subscribers to the core system, analysts, open data citizen analysts, and developers who need technical metadata. So, on the slide that I mentioned, I guess partly this one, that is why, before you harvest the metadata, you think of it as you would your data, right? Who are the users of it? Do we have the right security privileges? Really design both the system for how we're importing it and how you distribute it. There should be security, and normally that's security around who can and can't see metadata, but also customization around what you wish to see and what needs to be published externally. That should be part of your design, and you want to make sure the tool you choose can do that as well, so you can filter accordingly. But Mo, any thoughts you have on that? Nope, I agree with everything you said. I think it's ultimately really important to think about the use case and what value we're trying to add, but yeah, absolutely, there are a lot of different options there.
So, in your calendar example, Donna, isn't there only one year for a calendar? A year on a December-to-December basis spans two years. Yeah, the point of that example is that it may be clear in your organization, but you still need to get that clarity. In the example we gave, one group was using calendar year and not fiscal year, because often a fiscal year doesn't go January to December. So again, with metadata, I'm not saying one is right or wrong; you need to know how that person is defining it. In this case, you had two different groups using different calendars, and that wasn't called out. They just assumed they knew what calendar was meant. You could have a fiscal calendar, a regular calendar, a school calendar, and those aren't always the same. This is very true. So, does a data strategy offer a solution which includes everything, like data catalog, data lineage, metadata management, data governance, et cetera? A data strategy should look at all of that, and that's why that picture I showed earlier looks at it holistically. Let me see if I can pull it up quickly. That's what makes a data strategy so inclusive. You really need to think not only about your data and security, but also about metadata and how governance can manage it. So metadata is a key part of a full data strategy, but there are a lot of other pieces as well. I'll just double down on what Donna mentioned earlier in her presentation, right? I work for a data catalog company, obviously, but not every catalog solution or metadata management solution is right for you, so really think about what your goals are and pick accordingly. Yeah, definitely makes sense. Okay, well, I'm afraid that is all we have time for. I was going to see if I could slip in one more, but we are right at the top of the hour.
So Donna and Mo, thank you both so much. Mo, thanks for joining us, and as always, thanks to data.world for sponsoring today's webinar and helping to make these webinars happen. And thanks to our attendees for being so engaged in everything we do. We love the questions and everything coming in. Again, just a reminder, I will send a follow-up email by end of day Monday for this webinar with links to the slides and the recording. Thanks, everyone. Thank you. Thank you. Have a great day. Bye.