Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Essential Metadata Strategy, sponsored today by TopQuadrant. It is the latest installment in a monthly series called Data Ed Online with Dr. Peter Aiken. Just a couple of points to get us started: due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DataEd. And if you'd like to chat with us or with each other, we certainly encourage you to do so. To open and access either the Q&A or the chat panels, you may find those icons in the bottom middle of your screen. And just to note, the Zoom chat defaults to send to just the panelists, but you may absolutely change that to chat with everyone throughout. To answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days, containing links to the slides. And yes, we are recording and will likewise send a link to the recording of the session as well as any additional information requested throughout the webinar. Now, let me turn it over to Jesse for a brief word from our sponsor, TopQuadrant. Jesse, hello and welcome.
Hi, Shannon, and thank you, everyone, for joining us today. I work for TopQuadrant as a Semantic Solutions Architect, and we're going to jump right in with a set of questions. So first, because of the title here, we need to ask: what is active metadata management? According to Gartner, it is an emerging set of capabilities resulting from continuous metadata management innovation. That continuous journey is what we're here to talk about. Next slide.
It all begins with metadata, and the best definition and explanation of that is going to be given to us by Peter and not to steal any of his thunder, but I've stolen one of his slides by screenshotting it and pasting it here because it is such a relevant description for us at Top Quadrant. It basically says the who, what, where, when, how, and why is of data are individual statements about the data, which is basically moves us to the next part of the path. Next slide. And that is to say, okay, we went from metadata to data catalogs, but what is a data catalog? And I love this description here that I'm going to give you, which is your metadata catalog or your data catalog in general are all of those metadata statements just brought together to serve a purpose. And that purpose is or are the data stakeholders, sort of a one-stop shop, if you will. And it's important that all of these facts, it's not that they're just in one place or centralized, it's that they're understood and most importantly connected. Next slide. Now, the connections are the most important, I said, so how do those connections happen? And this is where we start getting into this idea of a data or metadata model. The model describes the relationships between the data assets and the secondary components. Here we see an example that's describing a data set and it's not a specific data set. This is the model. So this is what is allowed to be said about a data set inside of your data catalog. Next slide. So if we can describe a data catalog, things like data sets and databases inside of your catalog, and we can cover the idea of controlled vocabularies, we're really getting someplace now. So the data catalog has the ability to describe your first class citizens, but also the second class citizens, those are the controlled vocabularies. And these are everything from business glossaries to simple enumerations. For example, here, a common list of formats. 
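As a rough illustration of the two ideas just described (a model that says what may be stated about a data set, and a controlled vocabulary such as a common list of formats), here is a minimal Python sketch. The property names, the format list, and the function name are hypothetical and chosen only for the example; they are not taken from the slides or from any product.

```python
# Hypothetical model: the properties that are allowed to be stated
# about a data set inside the catalog.
DATASET_MODEL = {"title", "owner", "format", "stored_in"}

# Hypothetical controlled vocabulary: a simple enumeration of formats.
ALLOWED_FORMATS = {"CSV", "JSON", "Parquet", "XML"}


def validate_statement(prop: str, value: str) -> list[str]:
    """Return the problems found in one statement about a data set."""
    problems = []
    if prop not in DATASET_MODEL:
        problems.append(f"property '{prop}' is not in the data set model")
    if prop == "format" and value not in ALLOWED_FORMATS:
        problems.append(f"format '{value}' is not in the controlled vocabulary")
    return problems


# A statement that fits the model and the vocabulary has no problems.
ok = validate_statement("format", "CSV")
# A free-text format is flagged because it is outside the vocabulary.
bad_value = validate_statement("format", "spreadsheet")
# A property the model does not know about is flagged as well.
bad_prop = validate_statement("colour", "red")
```

The point of the sketch is only that the model and the vocabularies act as guidance: a statement is checked against them before it is accepted into the catalog.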
The more we speak in common controlled vocabularies, the better. Next slide. So with the model in place that is giving us instruction, guidance and an understanding and the controlled vocabularies in place, we can start stringing real statements, actual statements together. And once you can do this, you have something ready to be automated, enriched, governed, consumed, which are really what you're ultimately doing, what you're doing for. And if you like where we're getting here, you'll enjoy the next bit, which is on the next slide, Shannon, which is to ask the question, what is a knowledge graph? And that's what Top Quadrant does. Our software is all based on knowledge graph semantic web standards. And the best way to answer what is a knowledge graph is to say that you'll want to do all of those previous steps that I used as an example, building up a data catalog. It's just you've decided to do it as a knowledge graph, which means you've decided to use semantic graph standards to formally describe your assets. And why would you want to make this decision? Next slide. Knowledge graphs align with Gartner's view that we began with, that metadata management requires continuous innovation and what I believe they also call a connected enterprise. Knowledge graphs are a flexible, standards-based, non-black box, highly-integratable metadata solution. And adaptable is another software virtue that could be added to this list. And I hope that maybe got some gears turning and gets you thinking, because the next leap in the journey, if we've gone all the way from metadata to a metadata catalog or a data catalog, we're talking about knowledge graphs here now, and we need to make that next jump, which is really adapting to the future. So if you are just getting started with metadata management, following Peter's great experience and recommendations, you want to be sure that you're on the right path to be able to sustain and evolve your metadata strategy. 
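To make "stringing statements together" a little more concrete, here is a minimal sketch that stores metadata as subject, predicate, object statements and follows the connections between them. It uses plain Python tuples rather than the semantic web standards mentioned above, and every asset name and predicate in it is a made-up assumption for illustration.

```python
# Each metadata statement is a (subject, predicate, object) triple.
# All names below are hypothetical examples.
statements = [
    ("sales_2023", "is_a", "DataSet"),
    ("sales_2023", "has_format", "CSV"),
    ("sales_2023", "stored_in", "warehouse_db"),
    ("warehouse_db", "is_a", "Database"),
    ("warehouse_db", "owned_by", "finance_team"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the parts that were given."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def owners_of_dataset(triples, dataset):
    """Follow two connected statements: data set -> database -> owner."""
    owners = []
    for _, _, database in match(triples, subject=dataset, predicate="stored_in"):
        for _, _, owner in match(triples, subject=database, predicate="owned_by"):
            owners.append(owner)
    return owners


owners = owners_of_dataset(statements, "sales_2023")
```

No single statement says who is responsible for the data set; the answer comes from the connection between two statements, which is the sense in which the facts need to be connected and not merely centralized.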
And that may just be the point that we just reached, or it may actually be that your future is what is now called a data fabric. Next slide, Shannon. Gartner defines data fabric as a design concept that serves as an integrated layer (fabric) of data and connecting processes. They liken it to a self-driving car, but for your data. So you can see it's architectural, it's design strategy, it's bigger than just a data catalog or a knowledge graph. But on the next slide, Shannon, there's a strong belief that the smarts of a knowledge-graph-based data catalog, which was the journey that I took you on, are needed in order to support this emerging architecture. Next slide, Shannon. And we at TopQuadrant believe that our TopBraid EDG enterprise data governance solution, which is based on knowledge graphs, has been designed for this exciting future for everyone in the metadata world. But we also believe that it's ready for you to start using on day one, because it can evolve and grow with you very organically as a metadata system. It allows you to build your metadata and data catalog following open semantic web standards, setting you up for that future, which may be something like Gartner's data fabric. So let us know if you're interested in TopBraid EDG for modeling, vocabulary management, metadata management, or if you just want more information. Have a great day, and we're back to you, Shannon and Peter.
Jesse, thank you so much for this great presentation. And if you have questions for Jesse or about TopQuadrant, you may submit them in the Q&A portion of your screen. Jesse will be joining us in the Q&A portion at the end of Peter's presentation. And now let me introduce the series speaker, Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide.
He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. He has written dozens of articles and now 12 books. His most recent, just released, is Data Literacy: Achieving Higher Productivity for Citizens, Knowledge Workers, and Organizations. Congratulations, Peter, on your latest book. Peter has experience with more than 500 data management practices in 20 countries and is consistently named as a top data management expert. Some of the most important and largest organizations in the world have sought out his expertise. Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, let me turn everything over to Peter to get his presentation started. Hello and welcome.
Welcome to you, Shannon. And, Jesse, thank you for a great intro on here, and you're right. There's so much happening in the area of metadata that it's just sad to see how little people actually know about what they can utilize as they're going forward on this. So when we bring Jesse back, I'm sure he'll talk a little bit more about some of the latest things that have been coming out there. But the knowledge graph area in particular is a very fruitful area, and those organizations that are not exploring it really run the risk of falling behind. So what we're going to talk about today is really defining metadata in the context of data management. Unfortunately, most people define it, I think, a little bit differently than perhaps Jesse and I will on this. And the question is, what do we really want to do with metadata? Well, the answer is leverage something. And the reason you leverage something is, of course, to add value to it. I'm going to start out by hitting you with a very quick example early on that uses iTunes, or, as Apple has renamed it, the Apple Music app, so that you can see this.
And most importantly, teach it to others as they need to learn about it, specifically your boss's boss, for when your boss's boss says, I don't understand this metadata. We're going to look at four specific strategies. The first one is that metadata is a gerund. Do not treat it as a noun. That is, metadata is a use of data rather than a type of data. The second strategy is to make metadata the language of data governance. When I work with organizations that are having trouble with their data governance, it is typically because they are not speaking the same language or working from the same understanding, as Jesse said in his little presentation on that. It is very critical that that understanding take place. Strategy number three is to treat glossaries and repositories as capabilities, not as technology. And I'm sure Jesse will agree on that one when he gets back here. Cyclical approaches tend to start out in a way that lets the organization crawl, walk, and run its way towards success on this. And finally, the fourth one is that there are a lot of good first building blocks out there. We'll go through a number of them very quickly, but I've said many, many, many in there just to make sure that point is reinforced. We'll finish up with a summary of the benefits, the applications, and the sources around this, to understand that metadata defines the essence of organizational interoperability. We'll finish up right at 3 p.m. with some takeaways and references, and look forward to the Q&A session. Let's dive right in. We start out with defining it, of course. In the history of language, when two words were pasted together to form an initial concept, we started out with a hyphen. Even our DMBOK at DAMA has it hyphenated. It's done. The passage of time says that the hyphen can be lost, and we should use the word metadata. And that's important because, again, just trying to do a search on it, we wouldn't want to have to search for all three of those terms.
Anything that's relevant that's happening is happening with the term metadata without it. Interesting also, there was a copyright granted by the US government to somebody on the term metadata, kind of like the joke that Microsoft owns all the zeros and ones now. Disregarded, it's absolutely irrelevant and has been thoroughly discredited in that area. So it used to be that if you said metadata, they were afraid somebody would come out and sue you. Let's start by talking about getting agreement. And it's very difficult in many cases because many people who are working with data see it from just one perspective. Again, the blind men and the elephant. It's a fan. It's a snake. It's a tree. It's a rope. It's a wall. And of course, they're all correct. Unfortunately, we've been working on our definition of data management around that as well. And it's also been a little bit tough. Perhaps some of you all will be able to jump in and help. That's not what our discussion is for. But if you've got better ideas on this, we're all looking. The idea is we used to say that data management is everything that happens between when data is acquired and when data is used. The problem is that's a little bit squishy. And it also doesn't include the idea that we're reusing data largely. Again, there's not much point in putting something in a repository if you don't plan to go back and get it later on. So we've been working on a different definition here, which is to say that there are things on the sourcing side, typically falling into a data engineering category, but that requires specialized skills. And that there are also things on the exploitation side. These, again, hit us in a way that's very, very useful. And now we can say, oh, you're looking at it from an evaluation perspective or from a presentation perspective, but none of these still hit on what we're trying to do with data for the most part, which is reuse it. 
And we are not really well studied as far as how to formally reuse data management. We're looking at it from a number of different perspectives. Again, the knowledge graphs area seems to indicate that we can use the metadata about the metadata, meta metadata around all of this to start figuring out access patterns and things. And again, that's something we can dive into a little bit more, but our focus here is really going to be on a preparation and delivery in these cases here. So when people are looking at data, they're looking at it from different perspectives and only seeing part of what's actually going on in the metadata world. If we go straight to the dictionary and what does meta mean, the fourth definition up here, beyond transcending, more comprehensive, and at a higher state of development are all great ways to think about this. And if you go to metadata's origins, which of course are the card catalogs of the libraries, I know that those of you that are too young don't know what a library is, much less a card catalog. So when you were doing research in the old days, you would go to the library, you would access the card catalog, you would flip through these three by five index cards and try to find things that were interesting. And these three by five index cards kept all sorts of information about the books. You can see these are just some examples that are here. And then you would write down the number of the book, the key, and go find the book in the library. So this leads us to, I think, a very good starting place for a definition of data about your data. Now, that's technically true. And if you think even more carefully about this, it means that you should manage data with the same good data management practices and techniques that you already apply to your data. So there's a very nice complementariness between the two of these. 
If you're having trouble looking about your organization and figuring out where it is, like you don't have a library, one thing almost all organizations do have is networks. And these networks have a group of people in there that keep track of where the wires go, what devices are permitted to log on, access points, all sorts of other things, and usually ends up being a named individual as in Peter Akin is the person who maintains the access list so that we can determine who can come onto our network. And this group within your networking group is practicing metadata already. They're not doing it about data, they're doing it about networking access points, but nevertheless what they are doing is still the access of metadata. I mentioned that the reason that you want to do this is because leveraging is an engineering concept. And if I'm trying to balance these two pieces here, you can see that 100 kilograms and one kilogram just don't work. But if I understand a little bit about the laws of physics, I can improve the weight on the right hand side and perhaps it will balance if I have the right length of the ruler. If I go to too many, I will get up with an unbalanced situation. And the key to this is that this holistic approach to data management works the same way. You've got some organizational data, you've got some technologies that you use. Again, many people are not very familiar with the technology, so I urge you to follow up with Jesse and the others and the events around this and learn as much as you can about these technology pieces because they are truly exciting. If we go to data and people, we also need to have our group of knowledge workers supported by data professionals. And of course we need, if we're going to put people and technology together, a process to tie them together and that process is the leveraging that we are describing. Think of this as a three-legged stool. 
You cannot sit comfortably on a two-legged stool, so a three-legged stool is necessary. Similarly here, people process and technology and that process of course is driven by strategy. So metadata should be focused on helping organizations do better with their data. And if you watch that slide carefully, you may have to go back and watch it on rerun. I also put in a little bit there that says if you've reduced the size of the organizational data, it helps to increase your data leverage we'll get into that in just a little bit. But metadata is a primary means of applying leverage or adding value to your existing sets of data. And this multi-use concepts permits organizations to manage your data within the organization but also with your various data exchange partners and most importantly in support of the mission because if you're not supporting the mission with your data, you're missing out on potential resources that can help. The leverage obtained by metadata is the idea that we're trying to get more data centric or I like to call it the data doctrine on this and focusing in on non-rot data. And I've used that acronym twice. It means data that is redundant, obsolete, or trivial. In other words, it just gets in the way. The bigger the organization, the greater potential leverage exists in here and treating data more asset like simultaneously reduces IT costs somewhere between 20 and 40 percent and increases organizational knowledge worker productivity up to a possible upward theoretical number that we've come up with about 18 percent. So imagine if you could lower your IT costs and increase your knowledge worker productivity, that's a tremendous opportunity here. This bit about rot is all about separating the wheat from the chaff. And in the past, we've had to argue that well-organized data is worth more but some people still demand some sort of proof here. 
So I'm leaving you with an example, and the purpose of this, of course, is to give you all these opportunities. You can either replay these from the YouTube videos that we put out at the end of the session here or use these slides and repurpose them yourself. But just in terms of pre-information-age metadata, we would take a book and we would number the pages. We would put the index in alphabetic order. There were all sorts of things here. And if you have more desire to learn about that, there's a wonderful series of examples here from Abby Covert, who is a non-technical information architect and has written a wonderful book called How to Make Sense of Any Mess. Now the key is, of course, if I take the cover off the book, the spine off of it, and push these things out and throw the pages away, well, the knowledge of course disappears, which means that somebody is going to have to recreate it. Which is why, in our organizations, so many are spending time messing with data that is redundant, obsolete, or trivial. And the question is, which data do I eliminate? Because most data has never been analyzed in organizations. So metadata actually yields valuable information about your data sets. Do we have this specific class of data sets? We can say yes or no. What is the quality of something? And we can say it's suitable or unsuitable. Again, these are things that you can do with the technology that Jesse was describing. Then what will it cost to improve this class of data sets? Well, if I can improve them for 35 cents apiece and I have 1,000 of them, we've got a pretty good idea of the cost. If I have 100,000 of them, we have a different idea of the cost. And can these data assets be provided more granularly? The answer typically is no, if it wasn't designed in in the first place. Metadata, as I mentioned before, we spelled incorrectly in the DMBOK in the last version of it.
We'll get that corrected in the next one, but you can see it is in the upper left-hand corner there with the yellow surrounding it. We're talking about one of 11 practice areas that comprise the discipline of data management, and metadata management is here. Again, as an input-output chart, I'm certainly not going to read you everything here, but it's a really nice little piece to print out and put up by your desk, showing the inputs, the processes, the outputs, the goals, the activities, the participants, everybody who's involved in this. Now I mentioned I was going to go straight into an example here. This package used to be called iTunes. It existed on Windows as well as on Macintosh platforms. It's now called Music, but the process is still the same. If I took a CD (again, for those of you that are younger, that was the way we used to listen to music), we would put the CD into our computer's CD reader and it would present information like this, which was not terribly useful. What did the metadata that was on each CD contain? Well, in this case it was specifically the number of tracks and how long each track was. It's not very much metadata. So how did it get to where we are? Well, it turns out a company called Gracenote put in a little bit of a query and said, you guys can come look at us, and we'll take a digital fingerprint off of each CD and we'll maintain this one collection here so that everybody else doesn't have to type all that data in each time. Again, CD name, artist, track names, genre, artwork, all of these things come from Gracenote, which happened pretty much seamlessly when you stuck a CD into this music package. Now, you can see here, I'm looking at a Miles Davis recording and I'm going to put my Miles Davis recordings together. By the way, you can see how old this example is, as my iPhone was a version four at that point in time.
But anyway, we can do a special piece using the metadata that has now been provided and augments our existing collection, and say I'd like to have a new smart playlist that contains just data from Miles Davis. So this helps us organize all of this information. Again, this is all out there. You can do this yourself, follow along if you want, et cetera, et cetera. The only problem was, of course, as you can see here on the left-hand side, there's my Miles Davis folder with my smart playlist, but I've now got twice as much Miles Davis in there as I thought I did, because I had another Miles Davis CD in my iTunes collection that I had forgotten about. I didn't get the desired results. I already had another recording and I need to fine-tune it. Now it's not just Miles Davis in this case, but I want to have Miles Davis in a particular folder in order to do this. All of this metadata, of course, allows me to make playlists that are comprised of jazz or pop or whatever it is that I'm looking to get ahold of. And most importantly, this architecture scaled. The iTunes metadata used exactly the same interface, processing, and data structures that were applied to podcasts, movies, books, PDF files, and many other things that were out there. They eventually decided this was too complex and broke it back apart again, but the nice part was that it worked really well for about 10 years, and the economies of scale when you build something like this are enormous. So once again, metadata yields valuable information about your assets. Do we have these specific Miles Davis recordings? Yes, indeed. What is my most played Miles Davis recording? Again, a particular song pops up. What will it cost to improve this class of data assets if I had low-fidelity recordings? Somebody might come along and tempt me and say, for two cents apiece for each song, you can go to high-definition-quality recordings, and that might be very nice.
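The smart playlist and the questions above can be sketched in a few lines of Python. This is only an illustration of selecting by metadata, not how the Music app is actually implemented; the `Track` fields, the sample library, the play counts, and the two-cent price are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical per-track metadata of the kind a CD lookup supplies."""
    title: str
    artist: str
    album: str
    genre: str
    play_count: int


# Illustrative sample library; the play counts are invented.
library = [
    Track("So What", "Miles Davis", "Kind of Blue", "Jazz", 42),
    Track("Blue in Green", "Miles Davis", "Kind of Blue", "Jazz", 17),
    Track("Spanish Key", "Miles Davis", "Bitches Brew", "Jazz", 5),
    Track("Hey Jude", "The Beatles", "1", "Pop", 30),
]


def smart_playlist(tracks, **rules):
    """Select the tracks whose metadata equals every rule that was given."""
    return [
        t for t in tracks
        if all(getattr(t, field) == value for field, value in rules.items())
    ]


# First attempt: everything by the artist, including the forgotten album.
all_miles = smart_playlist(library, artist="Miles Davis")

# Fine-tuned rule: the artist plus one particular album.
kind_of_blue = smart_playlist(library, artist="Miles Davis", album="Kind of Blue")

# "What is my most played Miles Davis recording?"
most_played = max(all_miles, key=lambda t: t.play_count)

# "What will it cost to improve this class of assets?" at two cents a song.
upgrade_cost_cents = 2 * len(all_miles)
```

The first rule returns more than expected for the same reason as in the story: a rule on the artist alone matches every recording by that artist, so a second metadata field is needed to narrow it.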
Where, well, can I listen to the entire album before dinner? It might be a reasonable question and not easily in this case because it's a live recording that's almost two hours long. So that's the context and the teachable example here. Let's dive into the strategies. Those of you that are not familiar with English grammar and I'm terrible at it, but I did learn enough about it to work it this way. As Jesse said, comprehension by others is critical. If others do not understand what you do, then you are perceived as a cost to the organization. Whereas if they do understand what you do, it's easier for them to understand the value. I'm going to show you the origin of that slide that Jesse used in there came from Walmart and not even for me, a colleague Brad Melton that I was working with at the time out there came up with this as Walmart was going through part of this journey. It was a wonderful piece. And he said, look at this data is, as Jesse said, a combination of any circle and any data that unlocks that value. I'm sorry, I went too fast on that piece. I'm going to go back just to make sure that you can see it when we do the recordings for it. But again, thank you, Brad, even though it's been 10 years, hope to hear from you soon around all of that. It was a great piece of insight that he had. Another one, of course, is your outlook inbox. Imagine if your outlook inbox didn't have stuff broken out by subject priority, user ID, et cetera, et cetera. It would be very, very difficult. For example, if my boss sends me something important, I need to be able to put it in a high priority boss. Imagine working with outlook and all you had was a bunch of email. It would be dreadful and complex. And we've already determined, again, it's a little factoid in the new book, but if we had better email management capabilities, we could save the typical knowledge work of 14 days each year. I believe that's the right number. I may have to look that up. Maybe it's 10. 
But that's certainly an awful lot of time that we spend because we don't have the easiest to use metadata facilities. So metadata is, of course, involved in every data management activity and it's integral to all applications. Data reflects real life. Metadata reflects the data transactions that occur in there. And those transactions that occur describe the structure and workings of an organization's use of information. Our friend David Hay has a wonderful book where he dives into that subject in a lot more detail. Jesse also mentioned a Gartner definition. I'm going to go back about, well, exactly 10 years. And one of Gartner's definitions at the time, and I think they still use it, is that metadata unlocks the value of data and therefore it requires management attention because there's a value component to it. I like that definition a lot and I use it all the time. Metadata management then is the set of processes that ensure proper creation storage, integration of all the things that we need to have. So you're, of course, all wondering, what is a gerund? Well, a gerund is the idea that it's a verb that functions as a noun. And I think that works really well in this case. It really isn't a noun. Because what happens is if you look around for metadata to store, people start pointing to things and say, is that metadata? And that's the last thing you want everybody doing because it confuses things. The answer is any piece of data can also be used as metadata depending on the circumstances. So instead of looking at things and saying, is that metadata and therefore should I put it in the tool that I'm working with, you should instead say, is this data that I'm now creating worth including in the scope of our metadata practices? Or does it fall under that rot stuff that Peter was talking about last week when I watched his webinar? Let me show you a quick use of this. Again, very old materials here, but this is really how new this stuff is. 
So we published a paper one of the journals a while back that talked specifically about using metadata when we're putting in new systems. It's a terrific paper. It's out there on the web. You can certainly get ahold of it on that, but here's just an example. This was implementing PeopleSoft which some of you know is a big ERP. And when they looked at PeopleSoft, they said we're putting in three modules develop workforce, administer workforce, and compensate employees. You can see those comprise the majority of things that are there in this module. They bought some other things that would monitor workforce to find business, blah, blah, blah. But you can look and say, okay, these are interesting. Now the question is, what are we measuring? And I'll show you that in just a second. These are the number of different steps, process steps that occur in each of the primary modules that we brought down. And what I'm showing here is I've blown up administer workforce to show that in this example, it's likely that recruiting workforce is more complex than managing competencies. And certainly both of those are more complex than planning successions around here. So this is just a very quick way of using some metadata in a new system implementation that's also very, very helpful. Data structure problems have been difficult in the past because we've had all sorts of issues. This is a old look at an old green screen system. And you can see it's an old system at VCU that we used to use in there. And the example is just fantastic here because here is the data model that I made this particular class put together. They reverse engineered the system, analyzing the metadata. And if you look at it, you can see why perhaps this particular file was called the student database master. And the answer was because we had a piece up here that was a table that was literally connected in a parent-child relationship to all the other things that are on this table. 
This was how this particular system was used. It worked really, really well for the students. And we had an interesting company come and apply for us to try and replace the system. Now we had to get off the mainframe anyway so it was no big deal. But they sent us their metadata for the same information that we were maintaining here. And I hope that with this just brief one-minute explanation here, you understand that the SDBM thing in the upper left-hand corner here has a link to each of these other pieces. One to many in this case. This was oftentimes one to one. You'd have to look at the chart here to be detailed. And when we asked the company that was proposing to replace this system with their own to send us their version of this chart here is what they sent. Now it doesn't take a rocket scientist to figure out this ain't going to work. And truly it never did. It was way too complex. They never could even tell us was this a data model or a process model or anything else. They had no idea what they're doing and they came very close to ending up in jail. Again, that's a drinking story sometime if you want to hear the rest of that. Metadata is so important that IBM and one of its relentless we're going to solve all the world's problems just like they did with Watson came up with something called AD cycle application development cycle. And these are all the various types of metadata that can exist according to IBM. It's a great place if you want to study this and look specifically at metadata. But the real question is not is this metadata but would we obtain value from this data if we include it within the scope of our metadata practices? Let's move on to strategy number two metadata must be the language of data governance. Again, that understanding is critical. Let me take you very briefly through a model of what data is. And I'm going to tell you all just up front that 42 is not Jackie Robinson's jersey number. 
It is in fact the meaning of life, the universe and everything. That will only make sense to you if you happen to have read a book called The Hitchhiker's Guide to the Galaxy, where the white mice and the dolphins turn out to be running the world, with us as the experiment, to create a giant supercomputer that runs for 300 centuries and comes back and says the answer to life, the universe and everything is 42. And you're saying, I'm sorry, why am I wasting my time here listening to somebody talk about science fiction? Well, what I've done, hopefully, is give you a fact and pair that fact with a meaning. That is the definition of data: a fact paired with a meaning. The number 42 in this case means the meaning of life. So if you've not learned anything else from this particular webinar, you have at least learned that the meaning of life is 42. On the other hand, we don't want to go around pairing every possible fact with every possible meaning. We only want to do that for useful data. And so the key is, can we find useful data and then distinguish that from our next level up, information? And the answer to that turns out to be quite objective. Yes: if somebody requests that data, it becomes information. That is literally the piece that sets the two apart. We have one more level that we always put on these charts, but we can't go forward here before we say, you can have data without information, but you can't have information without data. So it doesn't make any sense at all to try and manage them separately. Now, the top of our pyramid here is not just what information is requested, but how the information is actually used. And that's when it transforms from information into intelligence. And most importantly, this structure here that's bouncing right in front of you is also a metadata structure, showing how these things are all related. Let's take it a step further and let's look at it in the context of data governance.
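As a side note on the fact-plus-meaning definition above, here is a minimal Python sketch of it. It is not something shown in the webinar; the class name `Datum`, its fields, and the `request` method are all hypothetical names chosen for illustration. It only models the two rules stated in the talk: data is a fact paired with a meaning, and data becomes information once somebody requests it.

```python
from dataclasses import dataclass, field


@dataclass
class Datum:
    """A fact paired with a meaning (the talk's definition of data)."""
    fact: object
    meaning: str
    # Hypothetical bookkeeping: who has asked for this datum so far.
    requested_by: list = field(default_factory=list)

    def request(self, requester: str) -> None:
        """Record that someone asked for this datum."""
        self.requested_by.append(requester)

    @property
    def is_information(self) -> bool:
        """Per the talk, data becomes information once it is requested."""
        return len(self.requested_by) > 0


answer = Datum(fact=42, meaning="the meaning of life, the universe and everything")
print(answer.is_information)   # nobody has asked for it yet
answer.request("webinar attendee")
print(answer.is_information)   # now at least one person has requested it
```

Note that the structure itself (which facts carry which meanings, and who requested them) is metadata in the sense used in this session.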
And when we look at an organizational data strategy, as I mentioned before, that is what guides your process-oriented activities, or tells data governance what the data assets need to do in order to support the organization's strategy. And of course, coming back out from that, we should have some "how well is it working" information so that we can make any corrections. Remember, our data strategy is always subordinate and only exists to support the way the organization achieves its strategic goals. In Peter's world, data governance actually dictates what IT projects go forward. That's not widespread yet, but it is becoming more so. And the IT projects, of course, support the operations. We'll put in a couple of feedback loops here just to close the whole thing together. But it's really kind of complex. So again, we'll go for a little bit of a simplification. What the data assets do to support strategy needs to be expressed in terms of business goals. If we don't tie them to something specific, again, the organization will look at you as a cost rather than as a value component. And the language of data governance has to be metadata. If you do things not focused on metadata, you will end up with confusion: people not sure why they're at this meeting, or what the importance is of this topic that we've been arguing about so vociferously for a while. We, of course, take that little strategy and pull it on down to the data stewards as well, because they need to be on the same sheet of paper, literally. So the way to think about metadata is that it's really a lot like sheet music. If you're going to present a group of musicians playing something, they need to have the same sheet music to look at. Let's take this question of how governance and metadata fit together just a little bit further. I like to start this slide out by saying that governance is always based on and founded in and grounded in the IT and the systems development world.
But we're going to divide this particular piece up into four quadrants here and just say that the domain expertise is lesser on the left and greater on the right. The roles are more formally defined on the left, as opposed to less formally defined on the right. And for the things on the top and the bottom of these lines, the people below the line are going to encounter data that is governed more directly, and the people on top are going to have less chance to encounter governed data; the people below the line will also have more time dedicated to that process and the people on the top less. So let's dive in a little bit further. What are the components? Well, we've got leadership. We've got stewards. We've got participants and we've got sources, and again, the only source of coordination among them is metadata. If we draw our line around it and say that, for this organization, this is what they've decided that their data governance organization should look like, we can now start to put some roles in place. The leadership is responsible for obtaining resources, understanding the feedback that comes from the organization, making decisions, and transferring the implementation of those decisions to the stewards. Then the stewards make sure that something happens, where people take action. We make some changes. We'll probably have some additional feedback that comes in here, some additional ideas, and some guidance that comes throughout. Again, that chart's a little bit complex. So I'll give you this version of it just so that you can take it and show it to others. This is what happens and why, and the only possible way to coordinate this is through careful metadata use. I guess the term now, as Jesse will talk about, is active metadata use, which should be focused on leading techniques. Metadata then yields valuable information about your governance processes. Do we have a shared understanding of our goals? We can say yes.
Are we and IT focused on similar goals? Yes. Are we being cost effective? Well, again, we can go back and look specifically at the numbers here and say that the supply chain area is the area we'll get our most value from. First of all, glossaries and repositories should be treated as capabilities, not as technology. It's important that people understand the function and the use of these things. When you look at it, most organizations start out and say, well, let's define something. So here's a definition of a bed: a piece of furniture used as a place to sleep or relax. Kind of nice. But now let's take an example of it. Our good friend Clive, who passed recently, in these last couple of weeks, taught me the importance of the purpose statement instead of the definition, because it incorporates motivation. The purpose statement says, why is the organization maintaining this information? And in this case, we maintain the information because we're trying to understand what beds are all about and where beds exist. How many beds can be in a room? One room can have zero or multiple beds in this case. Finally, another piece of important metadata is that all of your models are in an unvalidated, or draft, status before they're validated. So it's important to write that on there, and that also becomes metadata as far as your organization goes. I've had lots of organizations kind of get saved because they put "draft" on their models, and when managers said they were out of time, they said, well, we can't be out of time because the model isn't yet validated, and management had committed to validated models. Again, that's another story we can get into. Let me tell you a story about a really interesting organization that I worked with for a number of years, Nokia. Nokia had something that they called the Nokia Term Bank.
And this was a wonderful facility that they had set up so that every employee in the organization, whenever they had a question about something, would literally turn to their notebooks or their mobile devices and type the word into the Nokia Term Bank. And if it wasn't in there, it meant they didn't have Nokia-wide agreement on this particular term, and the people in the meeting would then vote very quickly: should we submit this to the organizational term bank metadata people? They built this term bank up over many, many years. It was a tremendously valuable piece. It also emphasized the need for culture around all of these terms, because in most organizations, when they start out to have conversations, the customers are not as knowledgeable as the vendors. And we have to do some things that bridge this technology gap and help organizations actually get smarter about what they do. One of the ways that you can do this is by trying some metadata activities on your own. So you may or may not be familiar with a company called FTI, Financial Transactions Incorporated. They're one of the first business rules engines. And we took their main data model and reverse engineered it. They didn't think it was very useful, but I'll show you what use we made of it for a particular customer. By the way, this slide right here also shows you the metadata that you need to build your own metadata repository, if management hasn't yet decided that it's a good idea for your organization to go forward. You can build this. Here is the kind of thing that you might build using Microsoft Access or Power BI or something else very similar. This is a repository that we put together and delivered for a customer. You can see we've selected a table called ft_t_count, and it's showing me the columns and also, for each column, whether it is a primary or foreign key.
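For readers who want to try a "build your own repository" exercise like the one described above, here is a minimal Python sketch using the standard-library `sqlite3` module. It is an illustration under assumptions, not the repository shown in the webinar: the `customer` and `account` tables are hypothetical stand-ins (not FTI's model), and the function names `harvest_metadata` and `where_used` are made up for this example. It reads table, column, primary key, and foreign key metadata from the database's own catalog via SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`.

```python
import sqlite3


def harvest_metadata(conn):
    """Return one tuple per column: (table, column, type, is_primary_key, fk_target)."""
    rows = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # foreign_key_list rows: (id, seq, referenced_table, from_col, to_col, ...)
        fks = {fk[3]: f"{fk[2]}.{fk[4]}"
               for fk in conn.execute(f"PRAGMA foreign_key_list('{table}')")}
        # table_info rows: (cid, name, type, notnull, default_value, pk)
        for _cid, name, col_type, _notnull, _default, pk in conn.execute(
                f"PRAGMA table_info('{table}')"):
            rows.append((table, name, col_type, bool(pk), fks.get(name)))
    return rows


def where_used(catalog, column):
    """List the tables in which a column name appears."""
    return sorted(table for table, col, *_ in catalog if col == column)


# Hypothetical two-table schema, used only to have something to harvest.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    balance     REAL
);
""")

catalog = harvest_metadata(conn)
for entry in catalog:
    print(entry)
print(where_used(catalog, "customer_id"))
```

A catalog like this is enough to start answering questions such as "is this data item used somewhere else?"; a real repository would add definitions, purpose statements, and ownership alongside the structural facts.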
Now we can also click around a little bit here and look at this and say, here's a table, ft_t_abdf, a real table. And here are the column names, and here are the column details for each column. We can click on that primary key button or foreign key button and find out where they pop up. The primary key button shows you the other places where that column shows up as a foreign key, and it may also show up as a column that is neither a primary nor a foreign key. This type of repository, coupled with an executive who gains value from using it (a critical component), allows your organization to find out more about your data assets. Do we have these specific assets? Yes or no. Is this data item used somewhere else? In this case, we can say definitively, absolutely not in this instance. What will it cost to acquire more assets? Can these be shared securely? Again, not easily, because of the security classification around this. And now we'll move to our fourth component, which is the strategy: do not start from scratch. We have so many things that we have built up as a community over the years, and the vendors are all tremendously helpful about this. If you start from a blank sheet of paper, you'll spend a lot more time unnecessarily. So let's dive in and talk about architecture just very briefly. First of all, architecture is about describing things at a very high level of abstraction, as in "things" or, if you're looking at a house, perhaps doors and windows. It also talks about the function of those things. It doesn't do a whole lot of good to have doors that lead nowhere. There's actually a very famous house called the Winchester Mystery House that has doors leading to nowhere. So what are the things? What do they do, and how do they interact? Now this is a silly piece, obviously Dr.
Seuss, back and forth, but if you can answer those three questions, you've actually got a very good description of your architecture. When we look at it from a data perspective, what we're seeing is, as you might expect, that details are organized into larger components. The details mean that we're dealing with things that are intricate, and we need to make sure that that intricacy is preserved, because that intricacy is where the business rules are stored and how organizations differentiate themselves in the marketplace. The larger components then are organized into models, and the models indicate dependencies: you can't have one of these unless you have at least one of these other things, whatever the things are that we are specifying as we go forward. And finally, we organize those models into an enterprise architecture comprised of these various architectural components. The last adjective that I'll add on here is purposefulness: an architecture is organized for a particular reason. In a retail situation it's about selling stuff, and in a services organization it's about providing services. Now, we don't tend to see these things in terms of the details. We tend to see them as attributes, we tend to see them as dependencies organized into models, and then we get to the architecture. I showed you an example of the attributes up here, thing one, and the definitions here, and the dependencies that are here in this data model, although I haven't labeled anything. And remember, a data model is always incomplete if you don't have the definitions of those pieces, which again is a really good reason why you should maintain them centrally, so that when somebody reuses a term across your data modeling environment it is the same term, and again everybody is on the same sheet of paper. But the real challenge is, why aren't there any examples of enterprise architecture models? Because they're really big and they're really complex, and we have to rely on patterns.
The patterns are something that allow us to understand things even though we may not have visited them before. Probably you all have done this as well. Gosh, that's really ironic: I'm popping up a piece of Perth, which is of course Clive's hometown. I'm not sure where that came from or how it happened, but again, rest in peace, Clive, on that one. If you go to the restroom in any of these buildings in Perth, or any large building around the world, you find out that the restrooms tend to be in the same place on each floor. Why is that the case? Well, the process of moving waste out of the building is one that relies on physical tubes and gravity. So if we're going to put physical tubes and gravity together to take stuff out of buildings that we don't want in buildings, it probably makes sense to make sure that architecture is as simple as possible. Metadata will allow you to do that, and you should hopefully see that electrical wiring, HVAC, and floor plans all fall into the same category. In fact, when we look at these design patterns, we can start to see that they are transferable from instance to instance. For example, a big house is pretty much the same as a small house, except that it may have updated equipment or perhaps multiple instances of the equipment. Again, these are very good pattern reuses, and there are a bunch of these patterns out there. You will notice here that I've got a couple of authors up: Len Silverston, David Marco, and David Hay have got some books around this. I'll just give you a quick sense of this. Len is a wonderful person who I've worked with over the years, Paul as well, terrific colleagues. And Len kind of gets annoyed with me, because I will only call him up when somebody calls me and says, have you ever heard of a data model that we can use for a pharmacy that's in a healthcare situation, as opposed to a pharmacy that's in a retail situation? And I can call up Len Silverston and he'll tell me, book 2, page 17, and
literally, if you get these books, they actually have CDs in the back that have the metadata associated with them so that you can directly implement these things. It's a wonderful set of resources. Do not start from scratch on this. We look around here again at the types of metadata that can go into this. If you're trying to reverse engineer some of your systems and figure out what those relationships are, this is a great reference for it. When I send the slides out, I'll also attach a copy of the IBM Systems Journal article that describes this in greater detail, if you're interested in learning more about that. So all of these pieces allow us to get to a place where we can now start to talk about metadata as a capability rather than as a thing. That thing-versus-capability difference is even more difficult when the data is semi-structured or, as most unscrupulous vendors tell you, unstructured: "We can take your unstructured data and turn it into structured data." Well, if anybody says that to you, I would hand that individual a glass of water and say, please turn this into wine as well, because it's equally miraculous. The definition of unstructured, of course, is that there is no structure, but metadata does describe both structured and semi-structured information. A better way to describe this is tabular and non-tabular data, because then it doesn't appear that you're making anything occur miraculously, which you are not. You're adding structure to something that already has structure, and that structure that you add, of course, is metadata. Metadata for semi-structured data responds to a whole series of different requirements, and you can see out here there are lots and lots of them. I'm going to direct your attention specifically to the structural metadata, and if you haven't heard of something called Dublin Core, it is something that you absolutely should look up. Again, just a sad story: I'm the typical American, geographically challenged, and I'm in Belfast, Northern
Ireland, and I'm hinting, because I hadn't yet been to Dublin, and I'm trying to get somebody to invite me to come down to talk so I can see that wonderful city that I've heard so much about. And somebody finally put up their hand and said, well, Dr. Akin, the Dublin that you're looking for is actually in Ohio rather than in the south, in the Republic of Ireland, but nevertheless we'll still get you the invite to come on down here. So metadata for semi-structured data is even more critical, because when people say that they are structuring your unstructured data, they're adding metadata to that set. So let's look again at these pieces here. These patterns yield valuable comparisons and starting points. Almost every type of thing in the entire world has been modeled in one form or another. And if you're having trouble finding it, we actually have banks that share metadata patterns back and forth, because that does not necessarily represent collusion. We have all sorts of organizations and industries that are trying to get in there. If you work in the petroleum industry, the group PPDM has done a wonderful job producing a general map of what the oil and gas industry metadata looks like. Again, that's not to say that it is going to be exactly the thing that you have, but you do have something. So we can ask questions. Do we have to create a pharmacy building system from scratch? And the answer is no. I go out and spend a little bit of money and buy one of Len's books, import that model directly into my CASE tool, and crank it out so that we can create some DDL, and we are actually up and running with a prototype in a very, very short amount of time. Will the proposed software fit? It is now considered best practice to ask vendors to provide you with a logical model of the data inside their systems as a condition for bidding on a project. Organizations that don't provide the models don't get to bid on the project, and you can use that to help weed out poorly run metadata practices as well. Do industry
best practices exist? Yes, I just described one to you, which is the idea that organizations should be getting this in. This has actually gotten as far as the federal government: this best practice is now implemented in law, and I'll talk to you about that in just a second. Has anybody published a model implementing, I don't know, GPPR? That should be GDPR; there's an error there that I've got to catch. Maybe not yet. Okay, we haven't gone that far. So those are the four strategies that we're looking at. Let's talk about some benefits in particular. There was one of our presidents a little while ago, one that I happen to like, but he made a statement here that was a little bit unnerving. So they, according to the Electronic Frontier Foundation, may or may not know about certain things. You can see that they know that you rang a phone, or that you called from a particular location, but the president said you don't have to worry, it's just metadata. Well, no, metadata is clearly quite valuable, and we need the public to become more aware of this all the way around. I'm going to show you now a little bit of an animation describing how one company decided to make a value proposition for itself. The company's called Invera, and it had a really interesting story around this. The proposition was that they looked and said, hey, how do companies exchange information with their suppliers? These are all the bits of information that go back and forth as we're conducting a transaction between these two. And of course these transactions can be conducted by phone, fax, email, electronic data interchange, XML, whatever it is that we're doing. But you can see that that becomes pretty complex pretty quick, and when you consider not just A and B back and forth, but A to C, A to D, and all the rest of it, it gets very complex very quickly. So Invera's value proposition was to eliminate all of this complexity and instead put itself into the center of
this as a clearinghouse. They created the situation where you only needed to make one type of connection to company B, and every connection that company A had with Invera was automatically accessible to company B. Now, not from a security perspective, but from a technology integration metadata perspective. And of course this looks like a pretty good value proposition. There are lots of companies that are trying this as well; you'll see them in a number of different offerings that are popping out as organizations get clearer on this. Hopefully that gives you a little bit of the value, but what was really important here is that Invera said, I'm not just simplifying what you're doing as you're going back and forth. Instead, what we're talking about is that this metadata allows us to build up and start doing planning and all sorts of other things that enable us to really significantly add value to the organization. I mentioned before the federal law that had recently been signed into effect, again, 114 19. Here's the law; read it quick as it goes by. I'm just kidding, that's just a piece of it. But what's happened is that all federal data is now open by default. They are no longer allowed to put it in a PDF and say, good luck getting access to the data. They have to provide the metadata. It also requires agencies that work for the federal government to appoint non-political chief data officers that are completely separate from the CIO staff, and to use open data and open models when doing performance evaluation. And the interesting part about this law is that if somebody violates it, the penalties are higher than HIPAA's, and that's a really big lift. So, metadata benefits: it increases the value of your strategic information, telling you whether what you have in your warehouse is valuable or not (again, a warehouse or whatever your data collection is). It can be incredibly effective at reducing training costs and orienting people to the systems. it helps data
oriented research time by assisting the business analysts doing this, providing documentation, improving communication between the business users and the IT professionals, helping organizations gain leverage, decreasing your time to market, reducing the risk of project failure, and identifying and reducing redundant data and processes all the way around. So we've spent the better part of an hour here talking through some very, very rapid-fire topics around metadata. We've defined metadata in the context of data management; we had to define data management first, of course, and say what we mean by using data as metadata, and I hope you understand now that it is a use of existing data as opposed to a type of data. I've given you a specific example, using the music app from Apple, that you can demo to any manager anywhere in five or 10 minutes to show them how you can use this. I've given you four specific strategies. Treat metadata as a gerund, not as a noun. Enforce metadata as the language of data governance. Make sure that you treat glossaries and repositories not as technology but as capabilities; I'm sure Jesse will reinforce that point when he gets back on here in just a second. And finally, our fourth strategy is to build your metadata from the existing building blocks out there; there are lots and lots of resources around all of that. We've talked a little bit about the benefits, and we're getting ready to do the takeaways. At the top of the hour we will switch over to our Q&A. So, takeaways. "Data about data" is a good starting place, but make sure that people do get that second piece: it unlocks the value and therefore requires attention. Metadata is a lot less about what and more about how. Metadata must be the language of data governance. Metadata defines the essence of almost all business challenges. And your real value perspective question is, should we include this data item within the scope of our metadata practices? So we're getting right around the top of the hour here. I'm going to leave you all
with some references and recommended reading, and I also mentioned that I'm going to include the IBM Systems Journal article in the package that comes out to you from Shannon when she sends you all of my slides and Jesse's slides around this. And we are now up to the point where I need to remind you that we've got a conference in person coming up, DGIQ. We will be meeting for the first time in almost two years, in San Diego, very early in December. And our next webinar is coming up, here are some prerequisites, the one we call Exorcising the Seven Deadly Data Sins. Got a couple of book prices in here; again, the data literacy book is out there on Amazon, and if you look around you'll find a 25% off coupon on that. And we are back to Shannon right at the top of the hour. Peter, thank you so much for another fantastic presentation. If you have questions for Peter or for Jesse, feel free to submit them in the Q&A portion of your screen. And again, just a reminder, I will be sending a follow-up email to all registrants by end of day Thursday with links to the slides and the recording, as well as anything else requested throughout, as Peter mentioned. So, diving in here: is the, quote unquote, "which instrument created the data" question not relevant for defining metadata? Is the "which" question not relevant for defining metadata? Absolutely valid question. So what they're doing is they're looking at the six interrogatives that we've talked about, and they are who, what, where, when, why, and how. I'll pop that slide up here as soon as I can jump to it. There we go. And the question is, does "which" come into there? Well, it probably is a subset of this. If you look at journalism as a career field (and the gentleman that wrote the foreword for the new data literacy book was the editor of Time Magazine for many years and then the assistant secretary for misinformation, as I like to call it), as an editor, if you look at a subject and you've answered the questions who, how, where, when, why, and what, which
probably comes in there as well, unless you're spelling it W-I-T-C-H. Jesse, any thoughts on that? Maybe it's just a Halloween joke. Yeah, I mean, I guess I'm going right back to the root of the question, which is, is this metadata? And the answer is always yes. Any metadata is good metadata; it just depends on how valuable it is. And as you said, Peter, "which" oftentimes is a subset of one of these others, but, you know, it depends on where you're asking the question. One thing that comes to mind is that sometimes the same piece of metadata is created from different places, and knowing where your metadata came from, which I think is sometimes what the "which" is targeting, can help you weight your metadata, or determine the value of your metadata, or actually have trust in your metadata, when you're able to track the "which" dimension of the metadata. Jesse, for your products it's fairly easy: if a business user says, I'm not going to use it unless you put the "which" question in there, that's something that you guys can handle, correct? Yeah, with the modeling platform of the Semantic Web built right into the knowledge graph, you have probably the most powerful modeling capabilities available to you. So if you needed to deactivate the "how" and activate or create a "which", you absolutely would be able to do that for any flavor or type of asset that we were talking about. If you think about it, the guys that make metadata technologies, like Jesse, have to be operating at not just a metadata level but a meta-metadata level. The answer he just gave there, of course, indicated exactly that: they thought about it in advance, instead of saying, somebody wants it, so we'll program it in there. Because of course customers are always right, right, Jesse? Yes, always. All right, Shannon. So can you tell us what's the difference between master data management and metadata management? Sure do you want to try that one first Jesse?
I'm going to let you take that, Peter, because I want to hear what you have to say first, and I'm going to play off of you. So master data management is the idea that there are some things in your organization that focus on the nouns, the persons, places, or things that the organization deals with, and master data management is part of that data confusion that we talked about before. I like to call all of those challenges that organizations have the organizational data debt that they face, because it's not easy to untangle yourself from the existing mess so that you can take advantage of some of these wonderful technologies that we've been talking about. Looking around and trying to figure out what is master data and what is not is always a hard problem. So I don't like to tell organizations to get a definition of master data, but instead to say that master data is really a strategy, and the strategy is the idea that we're going to implement a bit of leverage in there, such that master data, again, the data around Peter, is going to have influence across all of the data that Peter has in the system. So again, I'm a big amazon.com customer, even though I pick on them from time to time, but I live way, way out in the country, and so I get lots and lots of Amazon orders instead of taking the gasoline to go into the city. I'm not sure it's really cost effective to get a pack of batteries delivered by a big UPS truck, but that at least seems to be the sense of it. Just quickly, Amazon is working specifically with metadata to try and group those deliveries into fewer pieces, so if you're a little bit green like me and you want to try to do that, you can say, hey, my Amazon delivery can come on Friday, you can bring me all the things on Friday. Now the idea, of course, is that master data does in some ways function as metadata, so there could be confusion between the two. We did a show, and I did a presentation on master data, one or two times ago;
if you go to the YouTube channel, you can look it up pretty easily. But think about it in those terms: master data is kind of controlling that other data. It's again saying, these things are Peter's bits that he's bought from Amazon. And by the way, I have an author's page at Amazon too, so I have a couple of different relationships with Amazon. But if you think about master data, there's something even more controlling than that, and that's reference data, and we always talk about them in the same breath because they work in exactly the same ways. I'm going to use an example here that's kind of been in the news, or at least in a lot of school board hearings, at least in my part of the world out here. If we're going to say that gender is only going to have two valid values in there, that's reference data, and that means that if somebody wants to change to some other value, they cannot. So that reference data actually acts as a control on master data, and again provides the same leverage around this. So reference and master data are both in some ways subsets of metadata and function somewhat like it, but they are very different in terms of the strategies. And the master data stuff is something that we've had a lot of organizations try to build technology around, and it hasn't worked out quite as well as it should. But nevertheless, does it make sense to get a hold of the list of vendors that you have, the list of customers that you have, the list of products that you have, and have them managed out of one place? Generally, yes. We didn't used to call it master data, but we've now moved into that because, just as Jesse was saying before, some vendors have provided us some really nice technologies that we can take advantage of. Did I leave you anything, Jesse? Well, I like how you started: it's all part of the confusion. And, you know, I think it's almost fair to
say we could combine any one of the words data, metadata, master data, and reference data and use that old saying, right? One man's data is another man's metadata. You can say that between any one of those combinations: one person's reference data is another's master, and master is another's meta, and it's just part of the confusion. Sometimes it may be because a piece of software has almost established a culture. But it will do you well to clear out some of that confusion and just leverage any of this kind of stuff that you can for value. Whether somebody's calling it master or reference or meta or data or whatnot isn't necessarily as important as the value that you're getting out of your content. And I would add one more piece that I just thought of, which is that, of course, if you're trying to implement master data, it's virtually impossible to do if you don't have good control of your metadata practices. So, certainly. Shannon, I think Jesse just invented a new game for us at EDW. What do you think? I love it. Yeah, we could do word combinations. Maybe we could do it with Twister, right? All kinds of games. I've got master data, green, over here, so that'd be right elbow, right? What is your opinion of business process models as metadata? Oh, absolutely. Once again, the existence of a business process architecture component is a wonderful piece of metadata. Even for organizations to be able to say, yes, we have some, is a wonderful thing. I can tell you that of the thousands of organizations that I have literally worked with over the past 40 years, one in 10 tries to do this, and that's from a process modeling perspective. It's sad because, again, just following up on the last question, if you try to implement master data without understanding the process architecture of your organization, you will not succeed, and I have seen so many organizations neglect that particular step. They think it's just something they can put into the system and it'll run. No, it has to be
engineered. You have to purposely try to obtain that leverage that we're talking about. Jesse, over to you. Bring it to life. Let it affect workflows, processes, policies, issues, requirements. Bring it to life, let it be part of your model, let it become part of your metadata, and then drive other aspects off of it, for sure. And of course your product handles all of that type of work, integrated seamlessly, correct? Yeah, because of the modeling. Even back to the original question, you know, without coding, without having to do some sort of deep business logic, it's part of the ontology modeling. We call it ontology; it's just a fancy word for model, or schema if you will. But with the graph nature, Peter, we have that flexibility and the tooling built right in to just drive the model and let the model drive the rest of the system, even the APIs. And again, you can see here on this particular model we have included links for the process components. Of course, if you were working in a CASE tool environment, you would have the process and the data pieces interrelated via a CRUD matrix, again something that is not taught in schools anymore but is certainly a valuable component. It sounds like you've had experience with those as well, Jesse. Yes, yeah. You know, that's where sometimes the experience matters, and definitely start by learning from others and from what's available. Absolutely. Great question, thank you. How do we make others in an organization understand and see the value in what we do with metadata? I love this question. As you know, Peter, we get variations of this question all the time in our webinars. The value is a portion that we've neglected to illustrate to our students as they're learning these topics, so they think just doing them for the sake of doing them is correct, and generally it is. But if you're in an organization, you're almost always working under some sort of resource constraint. So the first thing I would do is to go back and actually run a little demo with
them here, where you take some music, find a CD, find a CD player, put it in there, and say, this would really suck if I was trying to manage my music off of this screen. It just doesn't do anything. If I'm trying to find one particular song, am I supposed to remember that it's track 18, or the longest track? By the way, here you can also click on these things and it'll sort them in order, so you could sort them by the longest time. But you can see the value of metadata: once we've captured that input, we don't have to go back, and everybody doesn't have to link to all of these various bits and pieces. Instead, we can just stick the CD into the player and all of a sudden all of the metadata shows up along with it. If that's not business enough for them, there are lots and lots of examples in a little book that I put together called Monetizing Data Management. There are 17 examples of the use of metadata, and of value in particular. It's just absolutely critical to make sure that you do include that value component, because people look at this, and if they don't understand it, they will perceive you as a cost and you won't get anywhere. Jesse, how do you guys help customers realize the value out of that? If they come knocking with a specific use case, it's usually a matter of identifying the value that the others are going to see, right? Whenever someone comes knocking and says, look, we've got this kind of data, this is our scenario, our goal is this, the question is, are other people going to see it? That's what we have to bring to light, and it's surprising how quickly people do see it. But you have to show them something with their data; that's usually a key part, right? Not just mock data. And if it is mock data, it's got to be something like this iTunes example, something that everyone gets and that is very clear. But if you can do it with their data, in their scenario, that's what will usually get someone over the hill and able to see it.
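Peter's CD demo can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `Track` type, the helper names, and the album contents are hypothetical and do not come from the webinar. It shows the point being made: once each track carries metadata, finding a song by name or sorting by length is trivial, whereas a bare track number tells you nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """One track plus the metadata that makes it findable."""
    number: int
    title: str
    seconds: int


# Hypothetical album contents, invented for illustration only.
TRACKS = [
    Track(1, "Opening", 185),
    Track(2, "Interlude", 62),
    Track(3, "Long Jam", 544),
]


def find_by_title(tracks, text):
    """Return tracks whose title contains `text`, ignoring case."""
    needle = text.lower()
    return [t for t in tracks if needle in t.title.lower()]


def longest_first(tracks):
    """Sort tracks by duration, longest first (the 'sort by time' click)."""
    return sorted(tracks, key=lambda t: t.seconds, reverse=True)


if __name__ == "__main__":
    for t in longest_first(TRACKS):
        minutes, secs = divmod(t.seconds, 60)
        print(f"{t.number:>2}  {t.title:<10} {minutes}:{secs:02d}")
```

Without the `title` and `seconds` fields, the only available lookup would be by track number, which is the "am I supposed to remember that it's track 18" problem from the demo.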
And little things matter, you know: showing the searchability, the reusability, the referenceability. The little things tend to matter, and if you can do a practical demo like Peter said, but with their content, it's that much better. And if they need to see the which, show them the which, right? Sometimes executives are focused on these things, and that's the answer. So yes, that's the way we go. Thanks again, great question. So is it necessary that we have a software solution to maintain metadata? Well, one of the things that's fun is to introduce young students to the idea of managing a many-to-many relationship, and of course they quickly figure out that if you have more than a hundred things connected back and forth to a hundred other things, it's almost impossible to manage with any code or anything else that goes into this. So the idea, of course, is to say, within the constructs of what you're trying to do, how should we best do it? You might put something together, and I showed you an example of one that's pretty easy to put together. Again, I've literally given you everything you need in this example, including the underlying data model, to build this little repository out of Access. You can build it; it'll take you a couple of days to knock together and a couple more days to populate, and you'll start to work with it. Of course, we all know that Microsoft Access was never designed to be a real production database. And by the way, here's another interesting little piece of metadata: I had an organization that did an inventory of their Microsoft Access databases that were running in production and found the number was upwards of 400,000. Then they decided they wanted to get rid of all of them, which was probably a really good idea from a governance perspective. It took them almost 10 years, but they got rid of every single one of the 400,000-plus Microsoft Access production databases. Now again, you could produce one of these. I would suggest that the
real reason for producing something like this is to help sell the concept, and in that sense you don't necessarily need to make a functioning prototype. If you're just trying to sell the boss's boss on this, just showing them that they can go in and get access to it, and having business people say, yes, that piece of technology will be very useful to my business operations, is usually enough to make them sign the check and get started on the process. Again, remember, in all cases you're not adding new things; you're buying some technology that's going to help you reduce your data debt. That data debt comes in a number of forms, but one of them is usually insufficient attention to metadata management and metadata practices. So yes, you could build them. Again, you guys should all be able to build this thing from the information that I've given you here, but I'm suggesting that it should be done as a way of creating a model, such that you can show the people who write the checks that this will be a valuable piece. I have seen organizations run on homegrown metadata management solutions. The real problem with this is that it runs on the old relational database technology, and it's not able to incorporate the knowledge graph components that Jesse has. So you could sort of get started and build one of these things, but you'll never get to anything like what Jesse can show you if you give him a few minutes to really show you a demo and see how the knowledge graph works. It's just a fundamentally different approach to managing this. Jesse, maybe you want to elaborate on that point? I just want to say yes. Yeah, just joking. So, you know, do you need software at some point? Yes, but don't let the software just make the debt unmanageable. Get what is right for the job, for what you're trying to accomplish. Don't buy today because you needed something
today; buy tomorrow or next week, after you've evaluated the real inputs to the situation that you have today. So do your research, know what you're after, and then yes, as Peter said, there are very flexible solutions like knowledge graphs available to you that can grow with you without forcing you into a rigid approach, like something that's backed by a relational model. So we believe in the knowledge graph approach and the semantic standards, and, as in the comment I threw in earlier, I think during my talk, the non-black box: you want to be able to know what's going on, you want to be able to have the control and do things like add the which if you need to. And Jesse, can you, with this example or anything that I've shown here, give a contrasting way of how this would be more flexible using the knowledge graph? Because I have not built that example yet. Maybe that'd be something we ought to collaborate on, too. Yeah: immediately being able to inherit characteristics. An ontology modeling approach, just in the model alone, would allow me to simplify this down a lot and inherit common characteristics, common constraints on those characteristics, and a common understanding by the machine of those characteristics down through the model. So even though the word ontology is sometimes terrifying to people, it's actually a much simpler, cleaner approach. I can inherit a lot of characteristics down through this and capture these details, and then drive validation mechanisms and drive the capture of the metadata. Then we could start layering things like inferencing and rules over the top of it to make it that much smarter and that much more connected, but without you having to be involved with every step. Exactly. Again, try to do that in this type of model here. This is your basics, this is where you start, this is sort of the 101 thing, but where organizations are truly obtaining value now is to convert to the knowledge
graphs and work it from that direction. Yeah, thanks, Jesse. Great answer there. So do you know, and we could probably do a whole webinar on this, I think we've done a whole webinar, Peter, on this: any best practice metadata frameworks? Well, so again, this piece right here: if you're going to build the traditional old repository for storing metadata, information about your data, this is it. I mean, this is really the core that you're looking at right here. Again, I'll point you to that paper, and there's a little bit easier to read version that looks a little bit like this. But the best practices really revolve around organizational use of data. This is a terrible statement to make, but it's a true story, so I feel obligated to let you guys know: one of the places where my retirement is stored, and you can tell I'm a little closer to retirement perhaps than some of the rest of you, has owned a metadata repository for the past 15 years. They've never broken the shrink wrap on the product. I've been brought in by the vendor several times. I've sat with these folks and said, please, I have lots of my retirement with you; I think this would be a really good thing for you to do. Unfortunately, their attitude is: we get surveyed once a year by the banking regulators, and they want to know if we've purchased a metadata model. The answer is yes, we have purchased a metadata model, and that's it. They've never done anything with it. It just drives me crazy. So that's clearly not best practice. And the starting point on this is really not that hard. It's finding out what your knowledge workers are spending time doing. Some of your knowledge workers will be doing things that are related to IT; some of them will not. The example that I gave earlier on was the people that are doing the wiring or the networks in your organization. Find those folks and get them to tell what would happen to the organization if somebody stopped keeping people from accessing your networks and learning about
various dropping points and all the rest of the things that go into that. There are lots and lots of bits around that. But as far as codifying a best practice, the closest that we've gotten are things like the DMBOK and, in some instances, the DCAM. There are a couple of other reference publications. One of them is Controlled Unclassified Information, I forget what the acronym for that is, which has been put out by the National Archives, of all people. And once again, back to the Dublin Core: if you are storing information on documents and you don't start with the Dublin Core, shame on you, because they've got literally hundreds of years of research into that, and they have come up with the best way to do it, which, as you might imagine, is implemented in most libraries worldwide. Jesse, am I missing anything on the best practices score? Recently, what I've been noticing is that when some people come up and say, do you have a framework, they're looking for things like even just codifying the names of the stewards: business steward, data steward, application steward, subject matter expert, the RACI matrix, those kinds of things. So putting roles and parties in place, documenting their policies and procedures, making their 500-page document referenceable, you know, being able to say, because 12.3.34 says so. Being able to get a framework started: with TopBraid EDG, our knowledge graph, we call that executive governance, and it's what we call a top-down approach, putting the framework in place. With a lot of people, when I hear that language, they're actually after the how do I get started, and they're interested in what kind of roles they should have, what kind of policies, what kind of procedures, what kind of workflows, whether they have an issue tracker. Sometimes it's a single solution, sometimes it's a multi-software solution, but a lot of times when I'm hearing that recently, it's: how do I get started? I need a framework. And they're not even ready to start talking about metadata; they're
talking about it from a distance. They're stiff-arming the metadata, saying, I'm not ready for it yet, I need to start with a framework. And it's funny, because it's really just the who, what, where, how, when, why, and which of the process that you need to start capturing. Who are the whos, who are the key players, what are the ... yeah, and then you've got a framework in place, if you start using that metadata language just in a pre-metadata kind of scenario. We're not quite ready yet is what you hear a lot, but actually you'll find that most organizations are doing it to some degree. I've never failed to walk into an organization and find somebody over in the corner who says, don't tell IT that I've got the SQL Server instance under my desk, but I've actually mapped out the entire marketing area, and all the data we get from MailChimp comes into here, and we do analytics that make MailChimp look like a very primitive organization. And my answer is always twofold. One, IT always knows about your instance, and they keep it around because they understand it's valuable. They'd love for you to come in and integrate with everything else, but that's not a number one priority right now, which is why you're still allowed to be doing this. So you'll find somebody in your organization that's doing something wonderful with data in that sense, and it's almost always an application of metadata. So we are close to Halloween here, and I urge organizations to have a data horror meeting. Just say, hey, come here, tell a story, and again, make sure you're not broadcasting this live over Facebook and telling the rest of the world. But look at your specific problems and figure out what size data debt you have and whether metadata solutions are going to be in your future or not. In almost all cases, they're going to be there. I love it. There are so many puns in there for Halloween data that I just ... they're out already
saying, leave the comedy to the professionals, right? Well, I think we have time for one more question here. So you indicated that 80% of the organization's data is ROT. Given that, does that mean we eliminate the ROT, or do we stratify it as a lower priority? Lower priority, for starters. The elimination of 80% of your data, if you think about it, would result in one fifth the cost of the infrastructure that's storing the data, whether it's cloud-based or whatever. And remember, cloud only scales linearly, so you don't get the economies of scale that you used to when you had the wonderful server farms that you'd made an investment in. So that's a new reality that organizations have to think about. But the other part of this is where organizations are really trying to gain value. It's just exactly like marketing: everybody knows that half of all advertising dollars are wasted, but nobody knows which half. That's the same problem with this as well. If you don't have good metadata practices, you'll have no idea whether this is the golden source, as it would be in the master data management sense, or the source that was most recently accessed by the customer, or whatever the particular piece is that you're looking for. Metadata is integral to that process. Jesse, any thoughts on that? No. And ROT can come in a lot of different forms, and it may do you well, if you've got a flexible modeling capability, especially like a knowledge graph approach, to categorize and determine what kinds of characteristics, which of the hows, whos, whats, wheres, and whens, are actually identifying possible ROT. So any metadata is good metadata, even if it's temporarily ROT, because you may learn something from it. I don't think Peter was saying throw it away, but identify it, look at it, and then determine how to optimize. Is that the case, Peter? Absolutely, yes. Yeah. No, Peter told me to come
in here and throw away every fifth data record, so we're just going to go ... I'm sorry, four out of five data records. We'll have fun with that one. Yeah, Shannon, do we have coverage on insurance? Oh, here. Well, thank you both for this great presentation; it's really been great. And thanks to Top Quadrant for sponsoring today's webinar and helping to make these webinars happen. We really appreciate it. And just to note again, as a reminder, I will send a follow-up email to everybody by end of day Thursday with links to the slides and links to the recording of this webinar, along with the additional information. And thanks to all of our attendees for being so engaged in everything we do. We really appreciate it, and we hope to see you next month. Peter has the schedule up there, and we hope to see all of you in December or March, depending on which events you're going to come to, but we are definitely back in the saddle, if you will. I love it, it's exciting. All right. Jesse, thank you so much; we really appreciate you and Peter today. Yes, thank you, and Shannon, thank you. Thank you both. Thanks so much, everyone. Have a great day. Cheers, everybody. Bye-bye.