Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We want to thank you for joining the latest in the monthly webinar series, Data Architecture Strategies with Donna Burbank. Today Donna will discuss modern metadata strategies. Just a couple of points to get us started. Due to the large number of people attending this session, you will be muted during the webinar. We very much encourage you to chat with us and with each other throughout the webinar. To do so, just click the chat icon in the top right-hand corner of your screen to activate that feature. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you like to tweet, we encourage you to share highlights or questions via Twitter using hashtag #DAStrategies. As always, we will send a follow-up email within two business days containing links to the recording of the session and additional information requested throughout the webinar.

Now let me introduce to you our speaker for the series, Donna Burbank. She is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She currently is the Managing Director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia and Africa, and speaks regularly at industry conferences. And with that, let me give the floor to Donna to get today's webinar started. Hello and welcome.

Hello. It's always good to join these. And as Shannon mentioned, this is part of a yearly series. We have this new this year, the new Data Architecture Strategies series. If you missed January or February, they are all on demand, and that's always one of the favorite questions: will the slides be available for this presentation?
Yes, and it'll actually be recorded if you want to hear all this lovely topic again. And then you'll see there's a lineup for the rest of the year and a lot of hot topics around data architecture, from graph databases to data lakes to MDM, etc.

So jumping in just for today, today's topic is one of my favorites, which is metadata. And we'll talk a lot about this during the presentation, but metadata is hotter than ever, which is music to our ears, the folks that are interested in metadata. And what I find interesting about that is that a lot of the interest is really coming from business drivers: people are trying to get better value from data, and as we know, as soon as you try to get value from data you need metadata, and we'll talk around that. I like to sort of talk about the carrot and the stick. That's the carrot. You know, the more value you want to get from data, the more we need to understand it through metadata. There's also the stick, things like industry regulations, and GDPR might be top of a lot of folks' minds because it's coming up soon. That's sort of "we absolutely need to have this or we'll get in trouble." It's never as fun to do the second, but I've worked with several companies that are doing metadata for regulation and actually see a lot of business benefit from it as a side effect. So it's a positive trend.

The other thing that I love to wax poetic about is just how fun it is to be in data now, because there is so much new technology coming so quickly, so much innovation in the industry. But the challenge of that is how do we track metadata about some of these new technologies. And we'll talk about whether there's a way we can use these new technologies, too, to track metadata. So sort of meta, meta innovation. We'll talk about all of those today, but where a lot of folks start when we talk about metadata is: what is metadata, right? And I think by now a lot of folks know I like to keep it simple.
Metadata is a complicated enough word, but it's really data in context, right? It's not just a number. A number has meaning. Sort of the Zachman framework, if you will, for metadata: another way to think about this is that metadata is really that who, what, where, why, when, and how of data. And we won't go through each cell in this spreadsheet, but you get the idea. Who created it? Who's the data steward, if you're doing data governance? Who's using it? Who, quote, owns it? Who's regulating or auditing? The what, I think, is what a lot of folks think of when they think of metadata. What's the business definition? What's the technical definition? But think beyond that. Are there security levels, privacy requirements, etc.? Where: when we think of lineage, where is the data stored? Where did it come from? But also things like backup and recovery, or, especially when we think of things like GDPR, are there regional differences or privacy rules that police this data? Why: sometimes you might wonder, right? Why are we storing this data anyway? But that's a big part when we talk about governance and data quality, fit-for-purpose information. What is this data meant to be used for, so I don't use it incorrectly? When: when was it created? When was it last updated? Is it the freshest version? And then the how. I guess for a lot of folks, when we think of metadata, it's often the what: how's it defined? And then the how: how is it formatted? How many databases? All that sort of thing. So it really is that full spectrum of what is the context around my information.

If you've been on my presentations before, you'll be familiar with this framework, but we get a lot of positive feedback about it. The thing about metadata management, which you'll see here in the circled box: it can be seen as its own thing, but really it supports everything in this box. You can't do governance without having metadata management.
You can't really have a realistic business strategy and take action on data if you don't know what it means. You certainly can't integrate or even understand data if you don't know its format and its source and where it came from. Try to build a data warehouse without any metric definitions. You'll have trouble, right? So I would love to have metadata just be a check in the box for all of these things. But as we'll talk about today, metadata is a thing in and of itself. You need to manage it and have a certain strategy around that. So we'll talk through that today.

As I've already mentioned, and we'll keep mentioning because I love to hear it, metadata is hotter than ever. I've been doing metadata since before metadata was cool. Well, before people realized metadata was cool. And interest is growing with the interest in data. We'll refer to several surveys and research papers we've done with DATAVERSITY in the past few years, and metadata is always top of mind. So this is a statistic I like: over 80% of people said metadata is not only as important as in the past, but even more important than it has been, with the growing interest in data. And I'm sure there'll be a question. We have links to how you can download the papers we'll reference in the back, and Shannon will follow up with that as well.

Interesting in the paper is sort of why it's interesting. Some of us nerds like me just think it's interesting in its own right. But really, why is it growing? And this is an interesting statistic. It was not only the 2016 survey, but also the 2017 survey, so you can kind of see the growth. For the data statisticians, they weren't necessarily the same people answering each question, so take that with a grain of salt. But you'll see that certain things just didn't change: data governance, data quality, data warehousing, master data management.
As I mentioned, you can't do any of those without metadata, especially governance. Now that it's growing, the lineage and the definitions are key to that. 2017 saw growth in a few things. Regulation, which you could say is tied to governance: things like, as I mentioned, GDPR or Basel II, any of the industry regulations you might have. As well as master data management, which I found interesting. I know in our practice we're getting a lot of interest in that. But, you know, I keep going back to this slide. Why do I use it? To me it's almost my checklist. I'll go into a client and they'd like governance, but you can't do governance without master data. You can't do master data without metadata. You can't do that without integration. Really almost everything often has a piece of a lot of these. So there is just such a close interrelationship. I'm not surprised by seeing that, kind of getting that single view of customer, single view of product, et cetera.

So this is sort of technical in terms of the main use cases people are seeing. I always like to tell stories, as you know: what does this mean? All of the examples in this presentation sort of resemble something in real life, but then obfuscated to protect the innocent. This one came from a client we'd worked with, in finance at an international retail chain. And they were trying to compare fourth quarter sales. In North America you often see sort of a spike in sales around the November to December timeframe. There's a lot of holidays, end-of-year sales, that kind of thing. But they had just acquired a subsidiary in Latin America, and their sales were particularly low around that time. And they're in the Americas, with a similar culture to, you know, Europe and North America. So they were wondering, what's wrong? Do we have to increase marketing? Is this just maybe the wrong market?
Maybe Latin America doesn't like our... should we start closing stores? And then they went through and they said, well actually, what is in a name, right? Latin America was using a fiscal year of June to June; everyone else was using a calendar year. So something as simple as: what do we mean by a year? What do we mean by a quarter? What do we mean by a... you know, and we'll go through these examples throughout the presentation.

And this is where, until it clicks... I remember, I've told this story before, when I was early on in the industry, and I'd be at something like an Enterprise Data World conference and they'd say, you know, trying to get that single view of customer. And I just sort of looked around: how hard is that? A customer is a customer. And I know you're probably rolling your eyes at me. You know, do one project in data and you'll understand the subtleties of what we mean by all these. And that's where we can sometimes sound like a crazy nerd when we overdo it, until it clicks with people. What do you mean by a year? People can think you're crazy. What is with this person, have they not lived? Of course I know what a year is. But you may not. And that is the whole thing of metadata: what do I mean by it, and really getting that clarification.

So this example wasn't just about how we define core terms. That's a key part of metadata. But it also gets back to data lineage, which is a huge part of things like GDPR, or just any understanding of the business: where did a term come from? That's the audit and traceability. So it could be that, you know, total sales for customer this quarter are 1.5 million. That's great. How did you calculate that? So good news, especially when we're talking about modern metadata strategies: this particular use case has been around for years now. And a lot of vendors have been doing this for years, so you can get that lineage in an automated way.
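As an aside, the fiscal-year mix-up in the retail story can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `quarter_of` is made up for this sketch, and the fiscal year is assumed to start in July (the talk says only "June to June", so the exact start month is a guess). It shows how the same December sale lands in a different "quarter" depending on which definition of a year is in force.

```python
from datetime import date


def quarter_of(d: date, year_start_month: int = 1) -> int:
    """Return the quarter (1-4) that `d` falls in, for a reporting year
    that begins in `year_start_month` (1 = January, i.e. a calendar year)."""
    # Number of whole months elapsed since the start of the reporting year.
    months_into_year = (d.month - year_start_month) % 12
    return months_into_year // 3 + 1


# Hypothetical sale date in the holiday season.
sale_date = date(2017, 12, 15)

calendar_quarter = quarter_of(sale_date)                    # calendar year
fiscal_quarter = quarter_of(sale_date, year_start_month=7)  # assumed July start

print(f"Calendar quarter: Q{calendar_quarter}")
print(f"Fiscal quarter:   Q{fiscal_quarter}")
```

Under these assumptions a "fourth quarter" report from one business unit and a "fourth quarter" report from the other cover different months, which is exactly the kind of thing a shared definition of "quarter" in a glossary is meant to catch.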
So metadata repositories, and a lot of data modeling tools and collaboration tools, now have things called scanners or interfaces or whatever term they tend to use, that really can automate a lot of this and get that lineage. Because if you've built one of these, you'll know that there's sort of an ETL process that can get from your source to your target. Your data modeling tool has built-in metadata. So a lot of these tools, they have metadata about your metadata, just to keep using meta. I don't want to say this is relatively easy, as you know, with getting the matching right. But this has been done for a while and this can be automated. It's getting better, and we'll talk about that. But this is your fairly, almost classic lineage from source to target. Do you need all of this every time? Maybe not. You might not need to have the full detailed lineage. At a minimum you should know how we're calculating total sales. I think the auditors might want to know that and what the metadata is.

So moving ahead, another part of the lineage is this idea of impact analysis. So think of any software development. And this came from, again, a protect-the-innocent example that I've worked with. Something as simple as: you're developing software, and I want to change the name of the software. I want to change the brand. Our company was acquired. Now we have a new name for our company. Where is this affected? There's software that's written that says name of company. There's orders that went out to customers with the name of the company. Wouldn't it be nice just to say, where is this used, and change it once and have it cascade? One would think. But as anyone who's done this knows, that lineage doesn't always exist. So having tools that can help you automate that and see that impact of change: not only does that help you do it faster, but that reduces risk. This is a fairly innocuous, well, maybe not innocuous, your brand. That's actually your face of the company.
But it could be anything. It could be customer name. I've changed the name. How does that cascade through the system? I'm changing anything. I worked with a big retail company a few months ago. Again, name protected for the innocent. But somebody innocently changed the length of the customer identifier from 12 to 10 characters and broke the system, and orders weren't going out. And it was a major emergency. When I was young, I used to joke: there's a metadata emergency I have to get to work for. But there really are. So there were a lot of things behind that: governance, change management. But had that person asked, what happens when I change our product code? These things are going to break. That's pretty important to running the business. So metadata does matter.

The other thing we hear a lot in the industry is big data, right? And big data analytics. And I'm confident now, and pleased to see, that the fallacy of "you don't need metadata for big data" has gone away. You don't just magically put it in the lake and sort of munge it together and great insights come out. I would say almost more so with big data analytics, you need to have the metadata around it. So here's an example. This actually came from one of the clients we worked with doing sort of energy meter reading and smart meters, that I've talked about in previous webinars. So it could just say: our analysis showed the energy used with smart meters increases by, you know, 5% for each degree decrease in temperature. The colder it gets, the more energy you use. But with smart meters, you can be more efficient, right? Well, there's a lot of questions about that. How did you get that analysis? Did you do it by household, by individual? How do you define household? Is it by your residence? Is it by relationships with other people? When were these readings taken? What was the source of the weather data? Was it in Celsius or Fahrenheit?
You know, all of these questions are relevant to getting the answer to that question, right? And so unless you have that, you could have really embarrassing results. So an interesting statistic, and I'll have a lot more in this presentation: according to a Boulder-area, Colorado, research firm, Radiant Advisors, one of the biggest impediments to the success of a data lake is not having commonly understood, shared metadata and data definitions. So you can have these great numbers that came out through data science and statistics, but unless you know the context of that and the meaning of it, it's not going to be valuable at all.

So there's two types of, well, many types of metadata, but the two main categories are business metadata and technical metadata, and that should be obvious, but here's some examples. Business metadata: what do we mean by a customer? A customer is a person or an organization, B2B and B2C, that purchases a product. There's a lot of context around something as easy as a definition. Or data stewardship, privacy level, acronyms, all of that stuff that kind of makes business sense is your business metadata. The technical metadata is, you know, almost your data dictionary type of thing. What's this column structure? What's the data type and length? Are there nullability rules? How does data move? What server is it on? All that sort of stuff. And both are important if you really want to understand the context of your data.

So delving a little bit more into business metadata, and why do we start with business metadata? Because I think everything with data should relate back to the business, but also it's sort of proven by the numbers. In this DATAVERSITY survey, I found it was interesting, 80% of the users of metadata are from the business. So I just thought that was interesting; let it sink in.
I mean, it's sort of obvious that that's really what keeps that communication going. So something like "how is this total sales calculated?", they want to know that. And from my experience in consulting, I often find that business users sort of get that more than IT does. And I'll claim to be a techie IT person, but I think we're also our own worst enemy. Do I really have to document what I did? Yeah, so people know. It's not the fluff. It's actually the meat of what you're doing, the context. Try to sell what you're doing by having people understand.

One of my favorite quotes from a business user: we were sort of explaining metadata and lineage and why we need this. We were actually trying to explain sort of this picture, why we need a warehouse and all the lineage, so we can calculate total sales and know what it means. And she said, you mean you're not already doing that? Pretty scary. We just sort of assumed you did know where the data was and what it means, right? So to our credit in the tech world, it's not always that easy to just magically have things documented. But documentation is as important as the code you're writing or the database you're building. That's how people know how to use it. And I think the business gets it and we just have to support that.

So an example of that that hit home, and I used this example last year in one of the BI presentations I did. Self-serve is huge. There's open data sets. There's amazingly cool data visualization tools out there. This is the time in the world for data nerds to just be able to play around and have fun. But as you know, data is only as good as the metadata. So this is an example. I was actually trying to show the power of open data and some of these visualization tools. And there was a data set out there from one of the UK agencies that was road safety by vehicle make and model. And I thought that'd be fun.
Let's see, is it the Porsches that are getting into more accidents compared to the Volkswagens, right? But I couldn't ever get there because there was no metadata. So I did some really neat visualizations. I know that F3 is just amazing compared to, you know, F10, that's sort of weak. And F120169, I have no idea what this data means. I know there's something really interesting. It's almost teasing you, because you can see that there's some patterns. We have no idea what that means. And just the basics: I can sort of assume that F2 is probably a year, 2015, but it's sort of shown as a 2000, 0100. There's no metadata. So these awesome visualization tools and amazingly open data that somebody spent a lot of time collecting were completely wasted because there's no context, right? So that, I thought, was an interesting way to show that example.

But especially with this idea of the rise of self-service BI and analytics and data prep, more and more folks are going to that self-service model, which is a great thing, because more people are looking at the data and they want to play around with it themselves. So a good Gartner study: they predict that by 2020 over half of folks will probably be doing some self-service data prep with data integration and reporting. But I thought it was interesting that they called out, too, that curated metadata is needed, right? So they predict that by 2019, organizations that provide agile, curated (that emphasis mine) internal and external data sets will realize twice the benefits of those that don't. And I think the key is that, yeah, agile, get it out there, but don't just get it out there so that no one has any idea what it means, like in this example. It might have been really fast to get that vehicle data out there, but it's completely useless until I know what it means. So, I mean, self-service is great. Self-service is great with nice curated data sets.
And that, again, should be obvious, but often in the rush to get things out fast, we skip the most important thing, which is the metadata. So my famous quote is: avoid the "I just know." And again, you're building this, you know, something like part number. Really, I have to make a definition for a part number? A part number is the number of a part. Please don't roll your eyes and do definitions like that, because stop for a second and think: is there context around that? Could it be something like, oh, you know, there's a guy that's been with the company for 20 years who knows it used to be called component number? Really, thank you; someone who's building a script would love to have known that. Or: part number is also numeric, and the first two fields mean the customer region. And please don't do that and build intelligent keys that way. But there's often so much more information that could be put in a definition that would be critical to some of the analysis. So please do share that out. Put it in a glossary or a metadata repository or a data model or some of the new collaboration tools, or all of the above. Because that's really the meat. Take that extra time to do that. Because in this new world of agile and data science and self-service BI and data lakes, metadata matters even more than ever.

So, you know, the irony of it is, we've all heard the statistic that data science is the sexiest job of the 21st century and all of that, right? So these folks that got these fancy degrees and have all this great information are all excited to go be a data scientist. And they spend about 80% of their day just trying to do things like reformat region codes or find the metadata or get things into a common format, right? And so here's another quote: data scientists spend 50 to 90% of their time cleaning and reformatting data to make it fit for purpose, right? So again, magic doesn't happen. Numbers are numbers, data is data.
It's the data quality, context, structure, i.e. metadata, data models, glossaries, lineage; all of that stuff is what makes data science sane. So lots of waste: wasting that person's time, wasting their energy, this person that would love to be doing awesome analytics, and she kind of wants to shoot herself because she just wants to get some work done, not this other stuff. So when things are done right, here's the happier self-service user, because there's metadata.

And we'll talk more about this. I think in a lot of ways there's kind of this dichotomy with, I don't want to say, but traditional or old-school things like master data and data warehouses and glossaries and data models and metadata repositories. And I think there's a fallacy, and I will strongly say that's a fallacy, that you don't need that anymore. Of course you need that. If I'm reporting financial metrics to the street and that's audited, I need to have closely regulated metadata. I need published documentation. I need lineage. I need standards. If I have something like product codes in my company, they had better be standardized and published. And I think the self-service user would be the first one to say: great, there's a standardized data set, I would love to use that. Do I want to build it? Maybe not. That's why data architects have a great place in this world. Folks like me who actually enjoy doing that kind of stuff. But publish that out. Let's make sure folks want to use that.

But there's another kind of metadata, which is kind of, I guess, the crowdsourcing way. And that's also metadata. In a way, this is the agility and the making people's lives interesting. If I'm building a data warehouse and I want my master data, I'd love to use those, sort of here on the left. I can also use that for my own self-service data prep and analysis. But there are certain things I'm building that I want metadata about. What model, what analytical model, did I use?
What data set did I use? Hey, I tried this cool query and that really worked well. That's more that crowdsourcing: I want to see what other people have done. Oh my gosh, somebody else already built the model. I could use that. Maybe I'll go talk to Joe who built this model. So one of the ways I like to look at it is kind of the encyclopedia versus the Wikipedia. And both are good, right? I was one of the skeptics when Wikipedia came out. Are you serious? People are just going to put stuff out there? That's why we have people who write encyclopedias. But I use it all the time. You understand where it's coming from, but it's that wisdom of the crowd, that eventual consistency, its being dynamic. There's still a place for the encyclopedia. It's going to take longer. There's probably only a few people vetting it. There can be problems with that approach too, because are you getting the right information from everybody?

So I think there's a mix, right? If there's a standardized enterprise data set, these are the validated product codes we're using, please, publish them. There should be some sort of feedback. People can say, actually, that's not right, let's give some feedback into that. But the Wikipedia is more really for that self-service data prep and analytics. Hey, I found this cool query. Hey, look at this data set. Or some of the tools out there can actually do usage metrics. Well, I know this is the approved data set, but 90% of the people are using this other one. So maybe this is the right definition of total sales that's actually being used. Or we put out a data set and no one's using it. Why not? Let's evaluate. So the nice thing is that both, again, are valid. It's finding that right balance. So don't do the wrong approach for each. Don't just loosey-goosey put out core master data without it being properly governed, with proper lineage and proper committee voting and all of that. You still need that.
But also don't overgovern things like people doing exploratory analytics. They work well together. Just do the right approach for each. And that really, I think, when we're talking about modern metadata strategies, is how you have the best of both worlds, right? I still use an encyclopedia. I'm glad people can do scientific research and vet things out for years and then tell me the answer, right? But there's also someone in their backyard that really knows how to put up a swing that I bought. I don't know. I just made that up, right? And that's true of the crowdsourcing. I'm glad they can just do a video and show me, right? So both are valid. Neither one goes away. It's just using the right approach for the right method.

The other thing really that I like about metadata, and this is an acronym, FTLB. I made it up. Faster time to light bulb, right? How do we make faster decisions? And of course there's some metadata around this. It doesn't mean foot-pound energy management unit, right? So unless you knew that, you didn't have metadata and you wouldn't have known what I said. But I've seen this over and over and over in organizations. Folks love to say crazy things like, oh, we don't need data models anymore, it's all big data, or we don't need metadata. Well, of course you do, even more so with the volumes of information we have in the data lakes. So I'm a big fan of the conceptual data model, or the business-level data model, because I've seen over and over these light bulbs go off. Even just a high-level conceptual model. Again, think of that guy in the picture, aghast: I have to document this? We just know. Well, of course we have staff and support reps and engineers and customers and products. Do we really have to build a model to show that? Yes, you do. Because you start to see things when, for example, you spell it out this way in a nice clear way. Because there is so much complexity behind this, making it simple is valuable.
And you start to see the light bulbs. Wow, we have support reps that are actually talking with customers or online chatting, and there's these support logs. Wouldn't that be great if we could pass that back to engineering and they could help feed that back into product design? Yeah, I never really saw that till it was so clearly spelled out in something like a data model. Or, wow, this is interesting. When we look at this loyalty program, which has the customers that are most excited about our product, in the databases there's no link between anything. We don't know which of our customers who bought a product, from our invoices, are actually part of the loyalty program. Huh, maybe we should know. We don't actually have our loyalty program linked to product. So maybe we know what customer it is, maybe we can fix that link, but wouldn't it be great if we knew which products of ours they bought and which ones they like, and can we feed that back into engineering? Right, so this is like your whiteboard brainstorming through metadata and through things like data models.

And, yes, behind this might be a data lake. It might be a relational database. It might be Internet of Things streaming data from your product. It could be S3 buckets on Amazon, right? It doesn't mean that you don't have the coolest new tech. It means you're understanding what it means and how you can better use it. So metadata does matter, not only what we mean by acronyms, but really thinking strategically around your data. And you can't do that without metadata. So that is why it matters. It isn't just a documentation exercise. It really is helping you make better data-driven decisions. So that was sort of the business side of the equation and why we need it. But the technical part is equally as important and equally as challenging and exciting, actually.
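As an aside on the conceptual model discussed above: the "no link between loyalty program and product" insight can be sketched as a tiny graph check in Python. Everything here is hypothetical and for illustration only. The entity names and the relationship list are loosely based on the example in the talk, not taken from the actual slide, and `build_graph` and `is_connected` are names invented for this sketch.

```python
from collections import deque

# Hypothetical conceptual model: each pair is a relationship between two entities.
relationships = [
    ("SupportRep", "Customer"),
    ("SupportRep", "SupportLog"),
    ("Customer", "Invoice"),
    ("Invoice", "Product"),
    ("Engineer", "Product"),
    ("LoyaltyProgram", "LoyaltyMember"),  # note: nothing ties a member to a Customer
]


def build_graph(edges):
    """Build an undirected adjacency map from (entity, entity) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def is_connected(graph, start, goal):
    """Breadth-first search: can we get from `start` to `goal` via relationships?"""
    if start not in graph or goal not in graph:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in graph[node] - seen:
            seen.add(neighbor)
            queue.append(neighbor)
    return False


graph = build_graph(relationships)
print(is_connected(graph, "Customer", "Product"))        # customers reach products via invoices
print(is_connected(graph, "LoyaltyProgram", "Product"))  # the gap the model makes visible

# Adding the missing relationship closes the gap.
fixed = build_graph(relationships + [("LoyaltyMember", "Customer")])
print(is_connected(fixed, "LoyaltyProgram", "Product"))
```

The point is not the code itself but that once the entities and relationships are written down explicitly, a missing link becomes something you can see and ask about, which is the light-bulb moment described above.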
So this is again from one of the research papers: what are people using now in terms of metadata for technical platforms, and what will be used in the future, right? Now, I didn't find too many surprises in that. A lot of it is what we discussed in almost that classic picture. I'll keep going back to it because it just sort of resonates with a lot of people. It's this picture. It's BI tools. It's data warehousing. It's data models. It's glossaries, right? Now I have to jump back to my own picture. So ETL tools, that has sort of been the classic metadata story for many years. That doesn't go away. When you look at the future state, you'll see that all those things are still important. But what I found interesting is that now everything is important. All those lines just get bigger, with more volume. So that continues, but you'll see some other data source platforms and things like big data, machine learning and AI, semantic technologies, NoSQL.

I found this interesting, and we'll talk about this: legacy platforms, right? So when you ask, what are people using today? Yeah, we have some legacy things like COBOL and JCL. That's only going to increase in the future. My thought on that is, that guy in the picture: why do I have to document my code? Well, at some point you're going to retire and no one's going to have any idea what that means. And no one's going to have any idea what even COBOL is at some point. So you need metadata to really understand that, because they're still running. There's big old mainframes running things quite well that aren't going to go away in the near term, and we need to integrate that with other systems. And things like social media, media, video files, pictures, all of that has core metadata that tells a story about your organization. And integrating that can be a huge challenge. So that is the other side: not just from the business side, but from the technical side as well.
So some of you might think, could she have talked any faster, and is she going to speed up? Yes. Yes, it can happen. I'm one of the few nerds that actually would spend the time on this, and I found it super interesting. What does that mean? What is social media metadata? What is picture metadata? I've actually gone through and done some examples, and I am not going to spend a lot of time on each slide. I'll just tell you now that we're going to zip through them as examples. And then if you're interested, as I said, Shannon will be sending out the slides. There's enough on each slide that you can kind of digest it. I just wanted to point out, yes, here are some thoughts and examples of what COBOL metadata would be, right? So in case you run into this, or you're bored some night and want to look it up, this will just be a reference for you. So, zipping through: COBOL copybook metadata. Yes, I know the title is modern metadata strategies, but these are still around, right? And it's only going to grow. It's typically a COBOL copybook, so think of that as almost your data dictionary. It tells you the structure of that data. So if you need to go back and look at an old system, something like a COBOL copybook can tell you what it means: first name, last name, date of birth. Ah, got it. Thank you, metadata. You've made my life easier. Next on the laundry list, big data platforms. Ah, this could be a whole webinar, but we're going to spend a second on it. So when you think of big data, like Hadoop, that's a file system platform, something like HDFS. So when you get the pure metadata from that, you're going to get the metadata from the file system, almost your directory structure, right? You can also get how that data was ingested and put on there, and some business metadata: what does this file mean? Can we tag the file? I think what a lot of people mean when they say big data metadata is the stuff on the platform, right?
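To make the copybook point above concrete, here is a minimal Python sketch that reads a tiny copybook fragment and turns it into data-dictionary-style entries. Everything in it is an assumption for illustration: the record and field names are invented, `parse_copybook` is a hypothetical helper, and real copybooks use features (OCCURS, REDEFINES, packed fields) that this deliberately ignores.

```python
import re

# Hypothetical copybook fragment; the field names are invented for illustration.
COPYBOOK = """
       01  CUSTOMER-RECORD.
           05  FIRST-NAME       PIC X(20).
           05  LAST-NAME        PIC X(30).
           05  DATE-OF-BIRTH    PIC 9(8).
"""

# Matches simple elementary fields such as: 05  FIRST-NAME  PIC X(20).
FIELD = re.compile(r"^\s*(\d{2})\s+([A-Z0-9-]+)\s+PIC\s+([X9])\((\d+)\)\.\s*$")


def parse_copybook(text):
    """Return one dict per elementary field: level, name, type, length."""
    fields = []
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            level, name, kind, length = match.groups()
            fields.append({
                "level": int(level),
                "name": name,
                "type": "alphanumeric" if kind == "X" else "numeric",
                "length": int(length),
            })
    return fields


for field in parse_copybook(COPYBOOK):
    print(field)
```

The group-level line (01 CUSTOMER-RECORD) has no PIC clause, so this sketch skips it and keeps only the three fields that carry a type and length.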
So there's no structure there unless you put it there. If you have something like Hive, the Hive structure is going to look more like a relational database, but you could be putting video files out there. You could be putting anything. So when people say big data, that could mean so many things. At its core, you're going to get the file structure, or you can get the metadata about the things on the platform, right? NoSQL. Again, that could be a whole webinar. And what do you mean by NoSQL? That's just a huge swath of things, right? So again, it depends on the platform. Take, you know, key-value databases. They can be amazing. They can be fast. They can do a lot of great things. Metadata is not one of their strengths. You really just have keys and values, and often it's the application code that's adding that metadata. It's not like a relational database. One of the nice things about relational is that it does have that sort of structure, like your COBOL copybook, that can give you the structure of the data. Some NoSQL databases can do that. Think of a document database like MongoDB. They do have much better metadata in that sense. Some modeling tools support that. And you can see the different kinds of tags here: there's a country called China, and there are different things. You could have an artifact. You could have a book, et cetera, et cetera. So some NoSQL does have metadata. Some aren't as strong. This next one is actually where some folks were talking about EDW, which is coming up next month. This is a picture from it. Gorgeous venue. I'll take a little extra time to tell a story. I was at a party. I do get out sometimes. And a friend of mine who is a photographer was talking about metadata. And I practically ran across the room in a costume, like, did you say metadata? And I'm like, what do you mean by that? And she's like, you know, the tags and stuff so that people can find your photos online and know what they mean.
I'm like, yes, yes. And why do you use the term metadata? And at this point she's kind of frightened and is like, I think I need to go get a drink. She's like, that's the term they use. I don't know. Leave me alone. We tend to think that metadata belongs to our data warehousing type of world, but people use metadata all over. This is an actual picture I took. I'm a terrible photographer. It has all the settings. You might love this photo. How do I recreate Donna's artistry? Well, here's what she did. She had an Apple iPhone. These were the settings. You can add your own tags so that if someone wanted to search for EDW pictures in San Diego on Google, they can find them. Or I have a copyright. This is such a great picture. I've copyrighted it. It's licensed. Don't use it. So there's lots of metadata about something like a picture. Social media. Gosh, if you want to nerd out on all of the metadata you can get from a tweet, it's pretty darn interesting. Who tweeted, where they were, what device they used, et cetera, et cetera. Sort of interesting, or creepy, however you want to look at it. But yeah, there's a lot of metadata in social media that you can mine and integrate into your systems. This one: metadata for machine learning. Right? I mean, that's fascinating now, right? What I find interesting there is that it's not only the data that was used, but what algorithm are we using? There's a lot of controversy in the media now: is there bias in algorithms? Can we publish what algorithm is being used to make a decision about a population or a marketing campaign or whatever? Right? So it's not only the data that has metadata around it, but the algorithms themselves. So again, we're zipping through; all of these could be a webinar in themselves. Semantic Web was one that came up. Think of RDF, from the World Wide Web Consortium. You can read through this in more detail, but think of it as kind of triples, right? So a thing has a relationship to a thing, right?
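The "thing has a relationship to a thing" idea can be sketched in a few lines of Python. This is not taken from the slides and is not real RDF syntax: the identifiers, the predicate names, and the two helper functions are all made up for illustration, and plain tuples stand in for what an RDF store would manage.

```python
# Each fact is a (subject, predicate, object) triple. All names are invented.
TRIPLES = [
    ("photo:123", "takenAt", "place:conference-venue"),
    ("photo:123", "takenWith", "device:iPhone"),
    ("place:conference-venue", "locatedIn", "city:San-Diego"),
    ("event:EDW", "heldAt", "place:conference-venue"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Everything that `subject` points to through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj, triples=TRIPLES):
    """Everything that points to `obj` through `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


# Follow the links: where was the photo taken, and which event was held there?
place = objects("photo:123", "takenAt")[0]
print(place, subjects("heldAt", place))
```

Because every fact has the same three-part shape, new kinds of context (a device, a city, an event) can be added without changing any structure, which is the appeal of treating the web as data rather than documents.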
Here's an example you can look through that kind of explains that. The basic one-liner is: instead of creating a web of documents, like on the internet, have a web of data. So there are certain things like this picture that I took. That was at a place, and that place could be a certain venue in San Diego. Am I giving enough plugs for EDW here, Shannon? You still have time to register. But there was also an event at that place, right? So again, this common narrative. It's just adding some context to things like web pages. So, coming up for air, we're done with the race through technologies. And this is an actual picture of somebody who's listened to one of my presentations. No, I'm kidding. That may have made your head spin. But hopefully if you're interested in any of those, it's enough that you can go back and look through. But what's equally or perhaps more interesting is not only what type of metadata we need to store, and that's only going to increase, but how we can use some of these technical innovations to manage metadata itself. The meta, meta, meta, metadata, right? And that's changing a lot too. So one of the things that everyone has heard about now is machine learning. How can that be applied to lineage or metadata? So in that picture, oh, can I show it one more time or do you have it memorized? In that picture that we showed before of that typical data lineage, certain parts of that can be automated. And if anyone's been in the business as long as I have, you've done that. Does customer link to CUST on the database? And does TBL underscore CL link to customer, right? A lot of that in the past was done by manual mapping, and there's still a use case for that: I know SSN equals field one, SSN equals social security number. And you can create these naming rules and do matching. And there's still a very valid place for that.
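A naming-rule matcher of the kind described above might look like the following Python sketch. The abbreviation table, the glossary terms, and the function names are all hypothetical; a real tool would carry a much larger rule set and a smarter comparison than the simple substring check used here.

```python
import re

# Hypothetical abbreviation rules of the kind a team might maintain by hand.
ABBREVIATIONS = {
    "CUST": "customer",
    "SSN": "social security number",
    "TBL": "table",
    "DOB": "date of birth",
}


def expand(physical_name):
    """Split a physical name on underscores and expand known abbreviations."""
    parts = [p for p in re.split(r"_+", physical_name.upper()) if p]
    return " ".join(ABBREVIATIONS.get(p, p.lower()) for p in parts)


def match_terms(physical_names, glossary_terms):
    """Map each physical name to the first glossary term found in its expansion."""
    mapping = {}
    for name in physical_names:
        expanded = expand(name)
        mapping[name] = next(
            (term for term in glossary_terms if term.lower() in expanded), None
        )
    return mapping


print(match_terms(
    ["CUST", "TBL_CUST", "SSN", "FIELD1"],
    ["Customer", "Social Security Number"],
))
```

A name like FIELD1 comes back unmatched, since no naming rule can say what it holds; that is the kind of column that still needs a person, or some other technique, to map it.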
But for a lot of that boring stuff, kind of like this teacher was saying, none of us really joined data management to do a lot of those mappings. So a lot of those can be pattern matched through things like machine learning. If you set it up so that this pattern looks like a social security number, it can go through the data and find that for you. So, hey, this field X looks a lot like a social security number. Is it? Some of these things are just that banal, boring mapping that we don't want to have to do, and that's a great use for a machine. So a lot of this lineage and mapping can be done on steroids. There can still be a place for naming rules and mapping rules. I almost liken it to search. Sometimes I get frustrated that every search nowadays is that Google-style search: show me everything to do with fish on the internet. Well, that's fine. But often I want to say, show me a file from last week at 3 o'clock written by Donna about fish. You want to get more specific. There's still a place for that kind of thing, and for defined mapping as well. Both can learn. Both can exist. Another angle on metadata: in the news several years ago, metadata was suddenly the hot thing, with folks taking cell phone data and doing metadata pattern analysis. And I heard some folks in our industry say, that's not metadata. Well, it's not metadata like relational database metadata, but it's certainly metadata. It's just used in a different way. So think of things like fraud detection or threat detection or understanding a terrorist organization and who's talking to whom. That's done by pattern and graph database analysis. One of my favorite quotes, and I didn't save it, I'm kicking myself, came from a friend of mine from Australia. They were having a debate similar to one folks in the US might remember about metadata. You know, people were looking at metadata from phone calls and things like that. And was that valid? Was that creepy?
Was it whatever? And there was a headline in an Australian newspaper that said prime minister offended by not being invited to metadata talks. Like, yes, my career has come full circle. That is a headline: the prime minister wanted to get in the meeting and talk about metadata, because metadata was critical to some very key things: definitions and, actually, government rules. So again, metadata means a lot of different things. And sometimes it's just tagging. What's nice to see is that, you know, everyone gets metadata. Amazon S3 is fairly new technology at this point, and it's not a relational database, right? But they have this idea of things like metadata tags that can actually travel with the data. So you can say I want to have a security tag on this bucket or on this file, and as you move it, the tag will stay with it, which is pretty neat. You assign that metadata with your PUT and POST requests, and there's also some system-generated metadata. You know, what kind of storage class, when was it created, what encryption level, et cetera. That's the what, where, why, when of metadata. So what's nice is that most platforms get this. I have this: I've created a test bucket, you know, GDS test bucket. What is that? How long do we have to retain it? What does it mean? That's what makes it interesting. So, yeah, we could, again, wax poetic about all the different technology types, but to summarize: any technology, any data, needs metadata. So how do you design a metadata strategy that makes sense? There are so many different types out there. There are so many different stakeholders now looking at the data. It can make your head spin. So, I'm a consultant. I'm a big fan of templates, right? They kind of make our job easier. And for all of these you could say, well, that's sort of simple, but it's the simplicity of them that makes them helpful.
So one thing that's helpful is to just do an inventory of all of your data sources, right? What do we have? I know this slide is hard to read. Do we have relational databases? Do we have BI tools? Do we have open data sets we need to look at? Do we have a big data platform, et cetera, et cetera, et cetera? And you may want to do something like a stakeholder matrix. Who is looking at that and why? And then almost do a mapping. So here we'll see in this example, SQL Server, oh, no, it was Oracle, is used by everybody. It might be good to make sure we have good metadata about that. These open data sets, huh, a lot of folks are using these. I hope we have good definitions for what this open data means and all of that. So it's a nice heat map of priorities. What tech do we have and who's using what? We also often do a RACI and ask, you know, how important are these people? Is it one kid doing one little analysis, or is it the CEO trying to find some information? I hate to be elitist, but that does make a difference, right? If the CEO is looking at it, make sure they've got good information. And I talked about this before, but it's important enough to mention again: don't overdo it and don't underdo it. Know what to manage closely and what to leave alone. So yes, I'm doing a master data effort. Please, yes, that needs to be highly governed. There should be clear metadata around it. Core enterprise data, my enterprise data warehouse. Yes, please have lineage, please have defined metadata, defined metrics for the reports. But then if I'm doing some raw exploratory analysis, say I want to do some sentiment analysis from the last marketing campaign, and I'm going to download some data sets and see what happens, don't limit folks from doing that sort of exploratory analysis.
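The inventory and stakeholder matrix described above are simple enough to prototype before reaching for a tool. The Python sketch below is only an illustration: the sources, the stakeholder groups, and especially the weights are invented, and any real scoring would come out of your own RACI exercise rather than these numbers.

```python
from collections import Counter

# Hypothetical inventory: which stakeholder groups use which data source.
USAGE = {
    "Oracle": ["Finance", "Marketing", "Sales", "Executives"],
    "BI tool": ["Finance", "Executives"],
    "Open data sets": ["Marketing", "Sales", "Data science"],
    "Big data platform": ["Data science"],
}

# Hypothetical weights: how much each group's needs count toward priority.
WEIGHT = {"Executives": 3, "Finance": 2, "Marketing": 2, "Sales": 2, "Data science": 1}


def priorities(usage, weight):
    """Score each source by the summed weight of the groups that use it."""
    scores = Counter()
    for source, groups in usage.items():
        scores[source] = sum(weight.get(group, 1) for group in groups)
    return scores.most_common()  # highest score first


for source, score in priorities(USAGE, WEIGHT):
    print(f"{source:20} {score}")
```

The output is a ranked list rather than a colored grid, but it answers the same two questions as the heat map: what do we have, and who relies on it.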
And then there's this gray, or actually it's a blue, area in between. Maybe there are some local data marts or some local reporting that people are using, or operational reports. There should be some governance there. It doesn't have to be an entire enterprise steering committee approving everything. But there's often a continuum, right? Things may have started out as exploratory data and turned into something we should maybe track in the warehouse. We didn't realize that the weather that day for a media campaign affects sales. Let's start tracking that and vet that in the warehouse, et cetera. But just don't overdo it and don't underdo it. Really give some thought to how this data is being used. That's metadata, right? How this data is being used. And then put together a realistic roadmap, right? The challenge in all of this is that there are so many things we need to do. We need to have a realistic business strategy and know how the data relates to it. There are things like Internet of Things and open data and social media and my warehouse, and everyone wants to look at it. We'll do that analysis that we showed before of what and where and how. But do think of the why. It may take a year or two to really get that full metadata lineage. But can we do some quick wins? If we started with just customer data, how many people would be happy with that? Marketing would love it. Sales would love it. Customer support would love it. The execs, of course, would love that. So let's start with customer as a business area. And then, yes, we could do a data warehouse in a phased approach and do customers first. But we don't want to just be old school. Can we think of the new stuff, too? Could we look at some social media around customer? Can we communicate that out? Can we have governance look at customer first? Right? So just think of it holistically.
And again, not brain surgery, but it's important to think about where you are today in terms of maturity and what your business priorities are. And then deliver these things in phased quick wins. So to summarize, metadata is hotter than ever. Why? Because more people realize how valuable data is. And if you get that right and support both business and technical, and both the new technology and the old, because you can't skip any of it, that's really where you get that faster time to light bulb. Right? The reason we're doing this is better innovation, faster speed to market and reduced risk. Right? So we'll quickly open it up for questions. Just a few things. We do this for a living, so if you need help, let me know. The white papers I mentioned are all available, both on DATAVERSITY and as our Global Data Strategy white paper. If you're passionate about metadata, it is Education Month and there is a course on metadata management on the DATAVERSITY website. If you're interested in that, it goes into a little more detail than we covered today. And we hope you can join us next month to talk about graph databases. So with that, Shannon, we can open it up to questions. Donna, thank you so much for another fantastic presentation. We've got questions coming in already. And just to answer the most commonly asked question, we will be sending a follow-up email by end of day Monday for this webinar with links to the slides and links to the recording. So diving right in, Donna: common practice uses tools such as Collibra to build out data catalogs, business glossaries, and data dictionaries. What is the best way you have found to tie all these together, and how do you link the documentation to a well-documented database, for example, extended properties? I am seeing a merging of the two extremes in the market. You wanted me to bring out my favorite picture of the lineage, so I'm going to go back there.
So I think in the old days, or typically, there's almost been an either-or. There have been tools that are really good at getting the technical side, and that might be your standard metadata repository or your data modeling tool that has scanners and can automatically populate your data dictionary. Where those historically have been weak is really getting that idea of a glossary. What is this, and having a view that the business people can use? So then there have often been tools, and I don't like to mention names, but you mentioned Collibra, that have had a very good business user interface. What does this data mean? What are my glossary terms? What is the organizational structure around that? But I think historically there's been a gap between those two. You had to pick either-or. I don't think we're 100% there yet, where any one tool in the marketplace solves all of it, but I'm seeing a merging: a lot of those data modeling and metadata repository tools are moving towards adding more glossary functionality and more organizational structure and more collaboration and that business user side. And I think a lot of the business user tools are realizing that you can't have that in a vacuum, and they're adding some of that technical stuff so that you have it alongside the glossary definition. So hopefully this line here will go away and it will be solid and you'll have it all. Right now there are strengths and weaknesses, but holistically you need to see them both together. I hope that helped. Absolutely. And I'm kind of sorting through the chat as well. There are a lot of great questions that came in through there. With all the changes going at the speed of light, wouldn't it be a nightmare to document all this stuff? Are there any automated tools available? Yes, there's definitely automation. So there are several answers to that. One is don't do it all, right?
So pick just enough metadata management. I would definitely start by making sure your master and reference data and core enterprise data are documented. So that's prioritizing on the type. I think the other way to prioritize and not boil the ocean, as they say, is on the content. Maybe we're going to start with customer data first or product data first. So type, content, and then there are certain things that can be automated and certain things that cannot be. A lot of this picture can be automated. Plenty of tools out there can scan your relational databases and get the data dictionary. What can never really be automated, and it will scare me when it can be, is what's in human brains, right? So you can automate a lot of the stuff that can be automated with machine learning or scanners. And that makes the human analysis easier. And there should be a way to connect both of those, as only you can define how you define total sales. No one can define that for you. But the tools can help you with lineage to show how that was calculated. That helps. So yeah, don't do it all. Automate where you can. And let the automation give you that faster time to light bulb to really do the business side of it. Yeah, there are a lot of questions about tooling here. You know, is there any way to use machine learning to help with metadata repository information, to automate this and enhance it? Yeah, so I did touch on that briefly. Some of this can be done without it. I wouldn't call it machine learning. It's really scanning the ETL and scanning the metadata catalogs. But a lot of the metadata tools are adding that idea of machine learning, so you don't have to say, you know, this particular field maps to this, as one example. They can sort of figure it out. There are a lot of metadata patterns, and that's what machine learning is really good at. So yes, definitely use that, so humans don't have to do this machine stuff.
Humans can start adding the human metadata on top. I love it. You know, and in general, Donna, how much of the metadata management process should be on paper, quote, unquote, so to speak, before using tools, you know, such as glossary creation and extracting technical metadata from sources for later validation? I am a big believer in having things automated and in a machine wherever you can. That said, I'm also a big fan of whiteboarding. And let me give you an example from the slides. So this example of the data model, right? A lot of folks want to skip the data model because it takes too long. And, good thing I'm not with you, I would slap you for that, right? Something like this is a great way to whiteboard. Get some business people in and ask, what data do we even have? Staff, salespeople. I did this last week at an education client and it was a light bulb moment for a lot of people. Someone said, we have donor information. We have IT information. We can link them together. And it was literally a brainstorming of all the data we have and how we can use that better. But then that should be put in a data modeling tool with true metadata. Then they did do the scanning for the relational databases. So it's both. Use the technology where you can. You shouldn't have a person writing down data dictionary fields from a database. Please don't do that. There are tools that can do that for you. But using pen and paper where it makes sense to brainstorm, and then putting that in the system, also makes sense. So both. So how do you manage the many versions of the metadata? I think this usage is complex, and it means different things for different business units. Good point. So, with versioning for the technical side, you should definitely version your metadata repository if you have one. Do test, development and production, just as if it were a database, because it actually is.
For your technical metadata, make sure you map that. As you do change management for your production environment, metadata should go through that too. But I think more importantly, and probably what the person was asking, it makes sense with these collaboration tools to track, at a minimum, who changed it. I'm trying to find that slide. I guess I'm a consultant. I can't talk without slides. But with this idea of the Wikipedia versus the encyclopedia, and who made the changes, it really is a balance. In some cases you can have discussion and then the Steering Committee must approve it, but at least you see where that change came from, with the discussion. Or it could be open collaboration, but at least you can say this was changed because of these reasons. Most of these tools do come with some sort of history and discussion threading and things like that. So, you know, does the data governance Steering Committee actually do collection of metadata, or just put some strategy in place? I think generally, and there's a slide I could show that shows this, governance is on many levels. There's an operational level. You know, the architects, day to day, are the ones actually building that metadata. I wouldn't say that the Steering Committee would be doing that. What the Steering Committee should do, to the person's point, is vet those decisions. So say you are using a collaboration tool, and two groups can't agree on what total sales is. That's a great case for the Steering Committee to make that final decision on what it should be. So no, I think there's either a working group or, you know, the operational teams generating the metadata, but it should be easily consumed by that Steering Committee. Take that data model I showed: they wouldn't build that data model, but it would be a great tool for the Steering Committee to say, how do we prioritize these? Should we have a program to link loyalty with customers?
That kind of thing. So this should be consumable for them to easily make those decisions. All right. And I'll give everyone a moment to add additional questions here. There is a question here, Donna, asking if there are templates that you can share. Well, of course, we'll be sharing the slides. Yeah, we will share the slides. They're kind of small on here. We don't share templates beyond that. I will say that the course that we called out goes into a bit more detail there. You can kind of get the idea, but, you know, these are, quote, templates, and they're also easy to build yourself, which is kind of the beauty of them. It's a spreadsheet or it's a document. I love it. All righty. Well, that brings us to the end of the questions here. Donna, thank you so much for another great presentation. It's such a hot topic and such an important topic, as is clear from the questions and the attendees. So just a reminder, I will be sending a follow-up email by end of day Monday for this presentation with links to the slides and links to the recording. And I hope everyone will join us in April for the rise of the graph database. Donna, and yes, you did an amazing job promoting EDW. I just loved it. It was awesome. So hopefully we'll see everybody in San Diego next month as well. So I hope everyone has a great day, and enjoy. Thanks, Donna. Thank you.