Hello and welcome. My name is Shannon Kemp. I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Essential Metadata Strategies. It is the latest installment in the monthly webinar series called DataEd Online with Dr. Peter Aiken. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share your questions via Twitter using hashtag DataEd. And if you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom right-hand corner for that feature. And to answer the most commonly asked questions: as always, we will send a follow-up email to our registrants within two days containing links to the slides. And yes, we are recording, and we'll likewise send a link to the recording of the session, as well as any information requested throughout the webinar. Now let me introduce our guest host for today, Tony Shaw, founder and CEO of DATAVERSITY. We're excited to have him join us. Tony, hello and welcome.

Thanks, Shannon. Good morning, Peter, and good morning or good afternoon, everybody, wherever you are in the world. I'm in Los Angeles, so it's still late morning for me. In all transparency, I am guest hosting today because we have a big conference next week, our EDW virtual conference, and Shannon thought it would be a good idea if I got some practice in before we do that. So hopefully I'm not too rusty this morning. It's certainly a pleasure to see a lot of folks on the attendee list today whom I've met in person, and thank you very much, everybody, for joining us. So Peter, welcome back. For those of you who may not have heard Peter before, he's an internationally recognized data management thought leader.
And if you've been to any DATAVERSITY conference in the past 10 years, you've probably had the opportunity to hear and meet him. Peter is a very popular speaker. He's also a widely read author; his most recent work is around the themes of data strategy, data literacy, and the role of the chief data officer. Peter is a former president of DAMA International. He is an associate director of the International Society of Chief Data Officers. He's also an associate professor at Virginia Commonwealth University. And Peter, we've not had the opportunity to talk too much about your most recent venture, but Peter is the founder of a company called Anything Awesome, which indeed sounds like something awesome.

Thank you, Tony.

I was just setting you up there for that. All right, Peter, it's over to you.

Thank you very much. Thank you, Tony. Actually, our association goes back to the year 1999, if anybody else can remember back that far. I can remember: it was the year of Columbine, and we met at the EDW conference in Atlanta. So it's been a very long 21-year association for us here, and it's been my pleasure the entire time. So thank you, everybody, for joining us. Our topic today is essential metadata strategies, and obviously that will be the focus. What we're going to do specifically is take a look at defining metadata in the context of overall data management and business activities. To do so, of course, we have to define data management. Then we'll talk about what we mean by metadata being a use of data and why this is critically important. And the shortness of that sentence, I think, says it all: when you're managing metadata, it is data. Therefore, the same good data management practices that you already apply to your data can also be successfully applied to your metadata practices. And this is critical because you don't have to learn anything new.
I'll give you a specific teachable example using an old software package called iTunes, but it's readily transferable to all of your devices, so you can look at it and use it to teach others. So the four strategies are, first of all, to recognize that metadata is mostly treated as a noun, and that's not really the way it should be treated. It's more of a gerund; that's a verb that's being used as an adjective, if I recall. Metadata is a use of data, not a type of data. And this is challenging because everybody, as soon as they discover metadata, runs around pointing at things like a three-year-old, saying, is that metadata? Is that metadata? Is that metadata? And the answer is, they can all be metadata. But what we're really talking about here is the valuation portion of it: when data becomes valuable enough for you to put to use this way, then it becomes metadata for your organization. Strategy number two: enforce metadata to be the language of data governance. I'll show you a number of examples all the way through of how metadata is key to making data governance effective. Number three: treat glossaries and repositories as capabilities, not as a technology. The cyclical approaches that we'll talk about do not start with technologies; technologies are usually the last thing that you buy. And the fourth strategy is to build from metadata building blocks. There's almost never a need to encounter a blank slate. This has been done before; there are models out there that we can access. I'll talk a little bit about the many, many types of resources that we look at. We'll finish up with some benefits and applications, sources, types of things like that. And then, of course, the part that's really the fun part in all this is the Q&A, where we can get engaged on some of these topics. So let's jump right in. First of all, interestingly enough, people use the word incorrectly.
And I'll update, I think, from Dave Eddy, if I recall, many years ago: when two words are pasted together to form a combined concept, initially we put a hyphen between them. So we started out with "meta data," then we went to "meta-data," and now the hyphen's been lost. We can now make the argument, absolutely, that the word is "metadata." It's even spelled wrong in our DMBOK; for those of you who don't know what that means, just hold tight. There was at one time an individual who had trademarked the word metadata and was keeping everybody else from using it, but it's not being enforced at this point. So our term is just metadata: no hyphen, no space. The next question that you might have is, if we're going to invest in it, how do we figure out how to do this? You've got to have IT dedicate resources. And one of the fun things about metadata is that IT is already dedicating resources to metadata management; I'll show you specifically where that is. And to make this important for everybody, not just the IT folks, you have to understand the important role that it plays and what technologies can be involved. But you can really start out at a very, very low entry cost. The most challenging aspect of all of this is that we don't really teach people well about data, much less metadata. So our challenge with data is like the blind men and the elephant. You've all seen a variant of this diagram in one form or another, which just says that they each know it from one perspective but don't realize there's more out there to the picture. And unfortunately, data management is exactly the same way. We've defined data management for years and years as being what happens between a source of data and a use of data. And while that's actually technically true, it's not helpful for us. So we could try to make these boundaries bigger, but they break pretty quickly.
And the other part of this is that "between sources and uses" doesn't really capture data at its most valuable, which is when it is reused, and that requires a different form of architecture. So let's take a quick step back into what it looks like to define data management. What you're seeing here is a number of different pieces that people tend to approach, and this is the typical way. You'll see these sort of entry paths written sideways; if you turn your head to the left, you'll be able to read them correctly. A data scientist, for example, typically finds out that data science is the only thing out there in the data world and that none of these other perspectives are available to them, which is an unfortunate situation. Again, on the uses end of it, the data scientists, the data exploitation people, as we call them, are really into how that stuff is applied, and the others are into how it's prepared so it can be applied. But again, most of these techniques do not take into account formal data reuse management. And that's critical, because governance will allow us to do things in these areas that are just beyond comprehension. So data, while it's understood in these bits and bytes, just like the blind men and the elephant, is really not well understood from a holistic perspective. So let's take a step back and look at the word "meta" just for a start. Meta means beyond, transcending, more comprehensive, at a higher state of development. And that's a really good definition here. Now, unfortunately, I have to explain to the young people: this is the way we used to find knowledge in the old days. There was a card catalog; you had to go to the library, and that's where the books were. And these wonderful people at the library would keep track of things so that you could search by subject or by catalog number. This was the most advanced form of data management that we typically had in those days.
And of course, you hopefully are all seeing that this is data about data. And that's really where most people's understanding of metadata starts; it's a great place to start. I mentioned that it's highly likely that your IT organization is already doing metadata management. They may or may not call it metadata management, but I guarantee you that somebody in your IT group who looks after the networks is responsible for knowing all of the devices that are permitted to log on to your network, the locations of those access permits, and the responsibility belonging to a named individual. There's much more metadata involved in network management. If you have a 30-person network group, I find that generally 10% are highly involved in maintaining organizational data for the network group so that they can all access what they consider to be high-quality and reliable data. If you try to take that away from them, they'll tell you that they can't keep your network secure. So it's quite obvious that what they're doing is critically important. Again, a great place to go look, talk to people, and perhaps borrow some lessons from that perspective. A wonderful book on information architecture that I recommend a lot is Abby Covert's 2014 book How to Make Sense of Any Mess. She makes the point in the book that while we can arrange things with the intent to communicate information, we can't make information; our users do that for us. So what we have to do is make it as easy for them as possible. And she points out in the book that there were a lot of examples of metadata use prior to the information age; just take the simple book. Imagine if, instead of a book with numbered pages, an alphabetically ordered index, a table of contents, and maps and diagrams all bound together, I handed you a spineless set of unnumbered pages. The structure would fall apart very quickly.
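The network-group example above is easy to make concrete. This is a minimal sketch of that kind of device inventory, not any real tool's schema; every field name here (mac, location, responsible, permitted) is an illustrative assumption:

```python
# Hypothetical device inventory a network group might maintain: metadata
# about the network itself, not the traffic crossing it. Field names are
# invented for illustration.

devices = [
    {"mac": "00:1B:44:11:3A:B7", "location": "Bldg A / Rm 210",
     "responsible": "jsmith", "permitted": True},
    {"mac": "00:1B:44:11:3A:B8", "location": "Bldg B / Rm 105",
     "responsible": "mdoe", "permitted": False},
]

def permitted_devices(inventory):
    """The network group's basic question: what may log on, and who is responsible?"""
    return [(d["mac"], d["responsible"]) for d in inventory if d["permitted"]]

print(permitted_devices(devices))  # [('00:1B:44:11:3A:B7', 'jsmith')]
```

The point is that the group's security posture depends entirely on this metadata being complete and current, which is why they defend it so fiercely.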
So metadata allows you to understand valuable information about your data assets. Do we have these or not? We can answer yes, no, or I don't know, but we can get definite answers. What is the quality? Well, perhaps suitable or not suitable, depending on what we're attempting to do. Is there a cost associated with these things? Yes; well-managed metadata can yield very specific costs that can be extremely helpful for planning and alternative evaluation. Can these data assets be provided more granularly? That might be a question asked of the metadata, and the answer would be, perhaps not easily. I mentioned we spelled it wrong in the DMBOK. This is the DMBOK version two from DAMA International. If you're seeing it for the first time, we'll give you some orientation around that. Metadata is one of the 11 key practice areas involved in it. And here's a little diagram. A lot of this material, if you haven't been on these webinars before, is reference material, so I'm not going to read everything on this slide. But what you can see is that things are grouped into inputs, activities, and outputs, with some tools and participants: very well-organized metadata about the term metadata, right? It gets that way very, very quickly. I said I was going to give you an example dealing with iTunes. iTunes is an old program; it's now called Music on the Macintosh and iOS platforms, and there's an equivalent for it on the Android platforms as well. In the old days, if I inserted a CD into my Macintosh computer, since they could play CDs back then, the Macintosh would be able to tell me some information. For example, it was able to look at the metadata that comes on the CD itself, which says: I've got this number of tracks, 25, and I can determine the length of each track. That's not a terribly useful set of information by itself, is it?
So what would happen is, if you had a slow internet connection, you might see this screen; most of the time it actually went very quickly. iTunes connected to something called a media database, which would take the fingerprint of the CD, match it against the fingerprints in its metadata collection, and copy that metadata down to you. It sure would be a pain to type all of this information in yourself. And now we can enjoy this wonderful CD that we have: it brings with it the CD name, the artist, the track names, the genre, the artwork. Occasionally you'll see two master lists of this metadata out there, but nevertheless it's still a very useful collection of information, because it isn't only these iTunes programs doing the access; all sorts of other things subscribe to it. So I've now got my new CD and I'm kind of excited about it; I'd like to do something with it. All right, you can see that in this case I'm looking specifically at trying to gather all of the Miles Davis recordings together, because I think I've got another one out there. I'm not sure, but I know I would like to have more. So I'm making a rule, a smart playlist, that is going to collect just the things that are tagged with Miles Davis. The organization of it is pretty straightforward: all I really need to do is create the smart list for Miles Davis. And it doesn't give me the results I expected, but luckily that's good news: there's more out there. I found another Miles Davis CD that I had downloaded a long time ago and forgotten completely about. So it's done better than I have. My list here, Miles Davis, is now showing many more songs than I originally had, including the ones I just got. I didn't get only the results that I wanted, and that was not a problem; I already had this other one.
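The smart-playlist idea above can be sketched in a few lines: the playlist is a rule over track metadata, not a hand-picked list of songs. The track fields and titles below are illustrative assumptions, not iTunes' actual data model:

```python
# A smart playlist as a rule over metadata: collect every track whose tags
# match the rule, including tracks the owner has forgotten about.

library = [
    {"title": "So What",       "artist": "Miles Davis",   "genre": "Jazz"},
    {"title": "Blue in Green", "artist": "Miles Davis",   "genre": "Jazz"},
    {"title": "Giant Steps",   "artist": "John Coltrane", "genre": "Jazz"},
]

def smart_playlist(tracks, **rules):
    """Return every track whose metadata matches all the given tag=value rules."""
    return [t for t in tracks if all(t.get(k) == v for k, v in rules.items())]

miles = smart_playlist(library, artist="Miles Davis")
print([t["title"] for t in miles])  # ['So What', 'Blue in Green']
```

Because the rule runs against the whole library, it surfaces anything tagged with the artist, which is exactly how the forgotten CD turned up.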
And it now contains all of these things, where I can move them around and organize my playlists. Now, the interesting thing about this model of programming, and the way we're looking at how metadata is managed, is that it provides a very nice, repeatable pattern. In iTunes in particular, the pattern was extended: the same code base running the same interface, the same processing, and the same data structures was applied to podcasts, movies, books, PDF files. The economies of scale were enormous until, as those of you who follow this know, Apple, and Apple customers of course, eventually found it to be a very confusing thing with way too much in it, and they broke it out into different pieces. But the key here is that this metadata management is extraordinarily useful. Now you have the ability to show somebody else what you're talking about when you talk about managing metadata and why it might be valuable. Again, this gives you valuable information about your iTunes assets. Do I have these specific recordings? I can look that up: yes, I do. What is my most played? It keeps track of that for you. You may not want that to happen, but it's something that happens by default. What would it cost to improve these? Could I listen to the entire album before I need to go do something else? These are very definitive answers that can be obtained from all of this. So let's move now to this gerund part. I know most of you didn't come here for an English lesson, and I am not qualified to really lecture on the subject either, but we will get through it; don't worry. As a topic, data is fairly complex and detailed. People who are outside of our community generally don't want to hear about these things, and most of them lack the requisite architecture and engineering background to really properly absorb what we're talking about.
As a subject, this is taught very inconsistently, with a large focus on technology and almost no focus on the business impact of what's actually happening, and therefore in general it's not well understood. There's a tremendous lack of standards. And most importantly, every work group in your entire organization has learned something about data management on its own, but you don't have a clue what it is. My favorite example on this one is a little bit you can Google, Wally playing piano. Good work there, Wally, but can you imagine how much time and effort your knowledge workers are spending learning this stuff on their own instead of learning from a base set? One of the most important aspects of working in an organization of any kind is to make sure that others understand what it is that you do, because there's a natural bias: if I don't understand what you do, I perceive you from a cost perspective, whereas if I understand what you do, I'm more likely to perceive you from a value perspective. And this is critical. So one of the organizations I've worked with over the years is Walmart, and they had a brilliant idea around metadata. What they really said was that metadata, for their organization, is any combination of context in the circle around the data at the center that unlocks the value of that data. And you can look here and see these; by the way, if you don't recognize them right away, those are the columns of the Zachman Framework. So that works out very, very nicely as well. Brad is the one who came up with this particular scenario, and Brad, I hope you're still doing well back out there in Bentonville. So again, that's how Walmart did it. Here's another example that's very easy to understand: if you're using Outlook or any of the mail clients that are out there, you have these same columns, the what, how, where, and why, all in your email, and you can sort on them. You can find important stuff.
And you can weed out the junk, and you can organize for future access. Can you imagine trying to do this if you did not have "who," or "from," or "to"? These would be crazy things; our inboxes are hard enough to manage as it is. So metadata really is everywhere, and it's integral to everything that happens. Metadata is to data what data is to real life: data reflects life's transactions, and metadata reflects data transactions, objects, events, et cetera. David Hay has some really good words on this, about how metadata helps describe the workings and structures of an organization's use of information, in his modeling book. Gartner has a couple here, and the second one really is the more important: metadata unlocks the value of data and therefore requires management attention. That's a great definition; it captures it well. I'm not sure everybody will understand it right at first, but nevertheless it's a good way to understand what's actually going on. So metadata management is the process of going through and managing the metadata. And we've done a little bit of work in this area. This is an ERP implementation at one point, and what we were looking at here was metadata describing a new PeopleSoft implementation. If you haven't had a chance to work with PeopleSoft, it's a very fine piece of software. But this was just a listing of the names of the modules that we were looking at. I should back up a little bit; there we go. We're just looking at the three main modules that were purchased: develop workforce, administer workforce, and compensate employees. They're about the same order of magnitude. Then there are some smaller ones: monitor the workplace, define the business, other types of things. And we took this administer workforce module and split it out in a little more detail. I said, well, what does administer workforce look like? And basically we could look at the various levels that are in there.
Again, it was obviously going to be more difficult to understand recruit workforce than it was plan successions, where we only had 5% of the functionality. These are just measures of things that are in there; in this case, counts of words, titles, structures. Because data structure problems have always been really, really difficult, here's the data structure of the system that we had at VCU back in the 90s. You can see the date, term fall of 1999, up there. Our student population was managed by a very nice system; here it is. This is a model that the students actually built. It was called the Student DataBase Master. And if you look hard here, though don't do too much staring at this thing or it will definitely mess your eyes up, you've got a lot of things connected to that master thing there: a very nice hub-and-spoke description, sort of a star schema in terms of the way they've pulled it together. This is old; it predates all of that. I'm just showing you this so that you understand it's a one-to-many relationship between the SDBM and each of the leaves on the tree coming out from it. So, with a little bit of explanation, you've now got an idea of how this thing is laid out. And if I told you that a vendor, honestly, with a straight face, proposed to replace it with a system with a data model like this, you can probably guess it didn't go very well. And I said I was going to talk about the part of speech. Again, one person's data can be another's metadata. So the gerund is this verb that functions as a noun; I think I said "adjective" before, and I'm sorry about that. Again, metadata is describing a use of the data, not a data type. And that's important because it changes your value proposition around metadata. Instead of "is this metadata?", where the answer is that anything can be metadata, the answer is likely to be yes to the right question, which is: should we include this data item within the scope of our metadata practices?
And that's a value proposition. It will allow you to determine whether or not it is worthwhile to store this information and to maintain it in a way that makes sense. So as a strategy, don't try to do everything. Don't start with the A's and say we'll get to the Z's in five years. Instead, ask: is it worthwhile? What sort of value proposition can we ascribe to managing this data item better? You will be much better off with that approach. Strategy number two: enforce metadata to be the language of data governance. Again, let's just start off by talking about what data is. And I'd like to introduce this with the number 42. No, it's not Jackie Robinson's; well, it was Jackie Robinson's jersey number, but that's not the use I'm making of it. 42 is the one from Douglas Adams' Hitchhiker's Guide to the Galaxy: 42 is the answer to life, the universe, and everything. If you don't know any more than that, Google it; it's a fun book. But I've associated a fact here, 42, with a meaning, which is the meaning of life. So if you've learned nothing else from this webinar, you've definitely learned the meaning of life. Each piece of data, then, becomes a specific combination of fact and meaning. And of those combinations, we want to find useful data, not just random data, so we're going to put the qualifier "useful" on it. If we then want to get to information, we can do that by understanding the nature of the requests that people are making of us. So this gives us another level. And at this point it's obvious: you can have data without information, but you can't have information without data. So keep that in mind. I don't like trying to manage them separately; it's more trouble than it's worth. Finally, we get to the intelligence level, although I've seen this level also labeled as wisdom or knowledge: what are we strategically using it for?
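The layering just described, a datum as a fact paired with a meaning, and information as data delivered in response to a request, can be rendered as a toy model. The class and function names here are mine, not the speaker's:

```python
# Toy model of the talk's layering: datum = fact + meaning;
# information = the data items that answer a specific request.
from dataclasses import dataclass

@dataclass(frozen=True)
class Datum:
    fact: object   # e.g. 42
    meaning: str   # e.g. "the answer to life, the universe, and everything"

def information(request, data):
    """Information is data selected in response to a request."""
    return [d for d in data if request.lower() in d.meaning.lower()]

data = [
    Datum(42, "the answer to life, the universe, and everything"),
    Datum(7, "days in a week"),
]
print(information("life", data))  # [Datum(fact=42, meaning='the answer to life, ...')]
```

Note that the model enforces the asymmetry from the talk: you can hold `Datum` objects without any request (data without information), but `information()` cannot produce anything without `data`.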
So of course they want all the data, but what are they really using? That's an important distinction. You can see we've built this up into a little taxonomy on the bottom part, and it should be useful. This structure then becomes something that's critically important to understand as we're doing this, because when we look at these things in context, data governance and data strategy are interrelated. First, data strategy speaks to what governance is focusing on; it provides governance some direction. Going back the other direction, of course, governance reports how well the data strategy is working. Data strategy is only relevant in the context of an organizational strategy, and the question there is: what can data do to support the achievement of the organizational strategy overall? That's the primary focus. We also have a relationship with IT projects and with what actually happens in the organizational operations component as well. So we'll put a couple more things up, but I definitely don't show everybody this full chart; I keep it kind of simple like this. And notice what's happening here: the data strategy has to be done in support of specific business goals, or it's very hard for it to have any tangibility at all. And, as we've said a couple of times, the language of data governance has to be metadata. When we're communicating with the stewards, they have to know what we're looking for specifically and how they can communicate it. All of them have to speak the same language, and metadata is the language we should use: it sits at a higher level of abstraction and is more formally developed. And of course we want to pull all of these things together, so let's talk for a minute about how the data community is composed in most organizations. I always put the IT foundation in place, because that's really, really critical, but I'm going to divide this up into four quadrants. On the left-hand side, the domain expertise is less.
On the right-hand side, the domain expertise is greater. The roles are more formally defined to the left and less formally defined to the right. Similarly, we have a division of top and bottom: the bottom half is more likely to encounter governed data, and the top half is less likely to. The bottom half is also more likely to have more time dedicated to the process, whereas the people in the top half will have less. So that's how I'm dividing it up. Again, it's going to be comprised of leaders, stewards, principals, and others. I said principals; it's really the SMEs that we're talking about from that perspective. And what happens here? Well, one of the things we want to make certain of is that we somehow define which groups are part of our overall data governance program and which are not; we have to draw a line around it, but everybody needs to know what's going on. How this works is that this group is responsible for bringing resources in. They're going to gain feedback from these perspectives and make decisions. The stewards are the ones trusted to implement the decisions. The decisions will require actions, and changes, on the part of some people and other people. This is going to generate some feedback, and we're hopefully going to get some new ideas circulating and provide some guidance. Again, I don't show that big version to everybody; I keep it very simple. But you can see that all of this communication means we have to be speaking absolutely the same language, between technology components as well as between people components. One of the biggest challenges around data governance is that people get confused as to exactly what we're talking about. So when you look at it from this perspective, you've now got the ability to say we have valuable information about our governance assets and practices. Do we have a shared understanding of our goals?
Are we IT-focused, or aligned perhaps? How cost-effective are we being? And what kinds of metadata are we finding more valuable, giving us increasing insight all along the way? So that's what I mean by the language of data governance: you've got to be using metadata to do it. Otherwise you are just incredibly wasting resources. Glossary capability: the big thing in governance these days is the business glossary. Everybody wants one; of course, we all want them. Let's just do a quick double-check on this, though. Most people get a glossary to come up with definitions. I like to go a step beyond that. And I want to make sure I give Clive credit for this; it was his birthday a couple of days back. Clive was the one who taught me to use a purpose statement instead of a definition. So here's a definition of a bed: a piece of furniture used as a place to sleep or relax. If we go to Wikipedia, we get something nice like that, right? Well, let's do a little bit better. The purpose statement really incorporates motivations: when I bring in something describing why the organization is maintaining information about this business concept, I can make it actionable. So, for example, this is a bed as well, but here it is a substructure within the room, itself a substructure of the facility location, that contains information about beds within rooms. And now we have a lot more understanding of what they were apparently attempting to do in pulling all of this together. I've also got sources of information: where did it come from? A partial attribute list. And there are associations with other parts of the organization. So we have lots and lots of this metadata that we can describe. And finally, there's a status associated with it. And I want to emphasize this point to you: as you're looking at your models, all of your models, your attributes, your statements, et cetera, will be unvalidated until they are validated.
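A glossary entry of the kind just described can be sketched as a small record that carries a purpose statement alongside the definition, plus sources, attributes, associations, and a status that stays "draft" until stewards validate it. All field names here are assumptions for illustration, not a product's schema:

```python
# Hypothetical shape of a business-glossary entry with a purpose statement.
from dataclasses import dataclass, field

@dataclass
class GlossaryEntry:
    term: str
    definition: str          # the dictionary-style definition
    purpose: str             # why the organization maintains data about this concept
    sources: list = field(default_factory=list)
    attributes: list = field(default_factory=list)
    associations: list = field(default_factory=list)
    status: str = "draft"    # unvalidated until validated

    def validate(self):
        self.status = "validated"

bed = GlossaryEntry(
    term="Bed",
    definition="A piece of furniture used as a place to sleep or relax.",
    purpose=("A substructure within a room, itself a substructure of a facility "
             "location, about which we keep information on beds within rooms."),
    sources=["Facilities system"],          # illustrative
    attributes=["bed_id", "room_id"],       # partial attribute list, illustrative
)
print(bed.status)  # draft
```

The `status` default is the operative detail: every entry enters the glossary as a draft and only stewards move it forward, which matches the "unvalidated until validated" point above.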
And that's where you can put the word "draft" on them and be very, very successful. Let me talk about the most successful business glossary application I've seen in 35 years. It has the wonderful name NTB, which stood for the Nokia Term Bank. Nokia was a very fine company before Microsoft got them. It was a pleasure to work with them; they really understood a lot of these concepts and applied them very, very well. Just a quick note: those of you who are listening will get to know who this was, though we won't show the part that actually describes the Nokia Term Bank itself. Let me just tell you a little bit about the culture here, because it does come down to culture. Nokia was a company that wanted to be recognized worldwide and became recognized worldwide; in fact, it was a tremendous player. And one of the things about them, first of all, is that you have to understand the Finnish are such a polite society; I mean, there's no disrespect at all. Two percent of the Finnish population speaks Swedish, so they all learn Finnish and Swedish, in case they run into the 2% of the population that speaks Swedish and want to be able to converse with them. I've never worked in an environment like that; it's a phenomenal thing. But management said to them, if we're going to play on the world stage, you all have to do all of your meetings in English as well, even when you're in Espoo, Finland. So it was very, very interesting. And what they discovered was that if they had a common set of vocabularies, both for the business and for the transition to English, it would be very helpful. So when they were in a meeting, if somebody used a term that they didn't understand, and this had been in place for five years before I got there, the culture was that they would all turn to their laptops and type the term into the Nokia Term Bank, which was a web-based application that everybody in the company could get to.
And if it wasn't there, then they knew they had a challenge, and they would have a conversation among themselves about whether this term should be included in the Nokia Term Bank or not. And this term bank got richer and richer over the years. And I saw it: it was literally part of their culture that they would move to this thing to get to the common terms. And that's what everybody needs to develop. And the challenge with this is that most people think this is about technology. And unfortunately, when you're starting out at this, your knowledge is low, while at that point the vendor's knowledge is extremely high. And so we've got to have some way of bridging this technology gap. The way I suggest organizations do this is by learning a little bit about what they're doing and building it themselves. There's no better way of doing this. It is absolutely possible. This model right here will tell you all that you need to know to put a metadata repository together for your organization. Let me show you one that we built a couple of years back. This was for something called FTI, Financial Transactions International. And the FTI account here is one of the tables that's in there. You can see it in the picture. And when you click on the entity button, it would give you certain bits of information, or you could click on the domains, or you could go to a higher level around all of this. Now, when you're looking at how this works, we could click on this table here — FT underscore T underscore ABDF as a table name — and ask: where else does this occur within the metadata? Well, for each column, we could look at it and see the column detail. I could also click on the buttons down here at the bottom for primary key table, foreign key table, and table usage overall. I'm not showing you anything miraculous here. This took a couple of hours to program in Access. So here's the foreign key table. Again, the primary key table is there.
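The build-it-yourself repository described above can be sketched with a few tables: one for tables, one for columns, one for foreign-key links, which is enough to answer "where else does this column occur?" This is an illustrative sketch only — the table and column names (`md_table`, `ACCOUNT`, `TXN`, and so on) are assumptions, not the actual FTI schema, and SQLite stands in for the Access application shown in the talk.

```python
import sqlite3

# Minimal metadata repository: tables about tables, columns, and keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE md_table  (table_name TEXT PRIMARY KEY, description TEXT);
CREATE TABLE md_column (table_name TEXT, column_name TEXT, data_type TEXT,
                        PRIMARY KEY (table_name, column_name));
CREATE TABLE md_fkey   (fk_table TEXT, fk_column TEXT,
                        pk_table TEXT, pk_column TEXT);
""")
conn.executemany("INSERT INTO md_table VALUES (?, ?)", [
    ("ACCOUNT", "Hypothetical account master table"),
    ("TXN",     "Hypothetical financial transactions table"),
])
conn.executemany("INSERT INTO md_column VALUES (?, ?, ?)", [
    ("ACCOUNT", "ACCOUNT_ID", "CHAR(10)"),
    ("TXN",     "ACCOUNT_ID", "CHAR(10)"),
    ("TXN",     "AMOUNT",     "DECIMAL(12,2)"),
])
conn.execute(
    "INSERT INTO md_fkey VALUES ('TXN','ACCOUNT_ID','ACCOUNT','ACCOUNT_ID')")

# "Where else does this column occur?" -- the table-usage question
usage = [row[0] for row in conn.execute(
    "SELECT table_name FROM md_column WHERE column_name = ? "
    "ORDER BY table_name", ("ACCOUNT_ID",))]
print(usage)
```

A couple of hours with any relational tool gets you this far, which is the point being made: the value is in learning your own metadata, not in the technology.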
And of course, where does it show up overall in the entire list? And again, when we look at this, we've got a tremendous amount of information that you can look at in your own organization and start to do some organizing. Yes, eventually it will be cheaper, but I've never seen anybody spend millions on a glossary or on software technology at this point and actually have it pay off as a productive investment, whereas I've seen dozens of corporations build their own, learn from that process, and then mature into it and have a good conversation. Again, questions like, do we have this specific asset, or this class of assets, can be asked. Yes, we do. Is the data used somewhere else? Well, not at this point, or whatever. What did it cost to acquire it? Again, all of this particular information can be easily obtained as you're going through and looking at how all of this works. So the ability to build your own is much, much less expensive than going out and purchasing a major-sized one. So treat these things as a capability; learn how to use them yourself. One of the key ingredients there is to make sure that you have a business sponsor who finds it useful. Again, I'll tell a story. In one of the organizations we worked with, we kept a repository up just to count the number of data items being considered for a migration, and it was a wobbly number, and it showed that we never really did understand the requirements thoroughly. So it was tremendously useful and cost us practically nothing. Strategy number four. There are a tremendous number of building blocks out there. Do not ever let anybody tell you you're unique. I'm sorry, it has all been done before. So let's talk about what architecture is for just a brief second. Architecture is the description of things and the function of those things — what those things do. And in our case, from an information perspective, it's looking at the sources and uses of data all the way through, and how those things interact.
And again, whatever that description is — all of this description that I'm giving you is of course metadata. And the metadata is organized in very, very strict ways. Details are organized into larger components, and that permits the introduction of intricacies into the process. The larger components are organized into models, and that sets up a series of dependencies that are permanent. That's why data models of existing software are rarely if ever updated at all — because of these dependencies. And finally, the models are organized into architectures that are composed of these various architecture components. And this gives us a purposefulness: just the same way as a system has a purpose, our architecture has a purpose, and all this metadata slides right over onto the other side — the attributes are organized in the same way, the entities into the models, and the models into the architecture. And why I'm telling you this is because it has all been done before. So there are a tremendous number of design patterns. For some reason, the clip art for this slide is the city of Perth, Australia's 12 largest buildings. And my question to you is: why are the restrooms in the same relative place on each floor in each building? And the answer, of course, is because it's cheaper to do it that way. Remember, the whole purpose of a restroom is to take that stuff out of the building using only gravity and pipes. So you've got to be accurate and good at that process. And what you do is you make the shortest length of pipe that you possibly can, just as a general rule. So the bathrooms are literally stacked one on top of the other. That's a pattern. And just as with building a house or building a car, you can make a large one with pretty much the same infrastructure — you just make it work a little bit harder all the way around. And of course, we're talking about the designs of the electrical wiring, the HVAC, the floor plans.
All of these types of activities are there, and there are a tremendous number of books out there. I'm going to recommend a couple of authors in particular. Len Silverston and Paul Agnew have a three-volume set of universal modeling patterns. I've got an XML book that's got some great things in it. David Marco has a book called Universal Meta Data Models — it's just fantastic. And David Hay kind of started the whole patterns movement off with his data model patterns. For example, here's another pattern. This is the metadata for an interface. If you were literally transferring data from one thing to another, this would be the common format that you would transfer into and out of as you were transferring back in the other direction. So everything would come into here, and you'd map it in; from here, you'd map out the other way, which is a lot cheaper than mapping every little thing to every other little thing. It's a little trick people haven't learned over the years. So I'm babbling a bunch about metadata, and that's important. Let me add another bit of babble here. This is a fun thing from IBM where they claim to have all of the different types of metadata that you would need to have on all of these things. Again, the idea here is that there's just a tremendous number of patterns out there. And if you have trouble finding them, start with Google. Google is always your friend in this instance, even if it's still a little bit of work going around trying to get it all. So it's got some bits and pieces here, but again, don't start from a blank page — it's been done, we've got it, there's accessible stuff that we can use too. And this works in the semi-structured field as well. Let's look at semi-structured data here. By the way, this business of converting unstructured data into structured data is a bunch of hooey. If somebody claims they can do that, I'd hand them a glass of water and say, please turn it into wine for me as well.
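The "little trick" in the interface pattern above is combinatorial: point-to-point integration needs a mapping for every ordered pair of systems, while a common hub format needs only one mapping in and one mapping out per system. A quick sketch of the arithmetic (function names are mine, for illustration):

```python
# Point-to-point: every system maps to every other system, in both directions.
def point_to_point_mappings(n_systems: int) -> int:
    return n_systems * (n_systems - 1)

# Hub-and-spoke: each system maps into the common format and back out.
def hub_mappings(n_systems: int) -> int:
    return 2 * n_systems

# Growth comparison: quadratic versus linear.
for n in (3, 10, 50):
    print(n, point_to_point_mappings(n), hub_mappings(n))
```

At three systems the difference is trivial; at fifty systems it is 2,450 mappings versus 100, which is why the hub is so much cheaper to maintain.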
The real key to this is to understand that we can take semi-structured data and add a little more structure to it. And that's about the best we can hope for. It's a tremendously useful process, but let's be honest about what we're actually doing. So again, a better description for it would be to call it non-tabular data rather than unstructured data. But even within here, we've got all kinds of things. And most CRMs are focused around these types of areas. There are a bunch of examples of existing metadata that are wonderful platforms you could take a look at — the Dublin Core, for example, for all sorts of reference metadata, and the tremendous XML schemas that have been developed over the years. You do not need to start at the very beginning. And that is the key. So again, look for these things. I remember we were at EDW one year and I got a call — Shannon was in one hotel room, I was in another hotel room doing this exact webinar — and somebody on the call, in one of your Q&As, asked: do you know where I can get a model for a pharmacy billing system? Len Silverston just happened to be walking by my hotel room at the time. I grabbed him, pulled him in, and he was able to tell me chapter and verse which book of his had the pharmacy billing system in it. Now, software fit. This goes to another best practice that we really should consider as a metadata practice. And that's the idea that if your organization is looking at the purchase of any software at all, it ought to be a mandatory requirement that the vendor provide a logical model of that data, so that you can see whether it will fit your billing system — or whatever system — or not. Again, are there industry best practices? Yes, we're getting there, and some people have actually started to publish models and ontologies for areas like that. So let's talk about some benefits here. And again, the public is becoming more aware.
You can have a discussion with, quote, non-data people about this. So here's a bit put together by the Electronic Frontier Foundation: they don't listen to your phone call, but they know that you called a phone sex service at 2:24 a.m. and spoke for 18 minutes — they just don't have any idea what you talked about. Or you called the suicide prevention hotline from the Golden Gate Bridge — but the topic of the call remains a secret. You can see there's a lot of information about you — about everybody — in the metadata that we maintain in our organizations. So don't let anybody fool you into thinking that it's "just" metadata. In fact, my favorite book on the subject — and I apologize for ranting on this, but I feel very strongly about it — is Shoshana Zuboff's book, which ought to be mandatory reading for all people in the data field. She's done a great job of pointing out that there is tremendous, widespread ignorance out there. In our own capitalist society, she likes to say, technology was, is, and always will be an expression of the economic objectives that direct it into action. So yes, there's always a monetary goal in there. You may or may not know this, but if you have a Roomba and you connect your Roomba up with the cloud, you're giving permission for Roomba to share your data with an unknown number of other parties, including perhaps law enforcement, who might be able to download a map of your living room to see where the ottoman is so that they don't trip over it when they execute a no-knock warrant. I'm not picking on law enforcement in particular, but that's one that's been on people's minds lately. Nest, again, didn't bother to tell people that the third generation of Nest devices — the Nest thermostats — had a microphone in them. And that nice undocumented feature brought them quite a bit of grief. And my favorite example: the Sleep Number bed.
If you connect your Sleep Number bed to the internet — as most people do, because it's free and it offers some very nice features — part of the information sent back to the Sleep Number company is the bed's estimate of the number of people who were in it, the time of day, and how much moisture was produced as a result of the bed interaction. This gets pretty bad pretty quickly. And all of this, of course, depends on having metadata, which brings us to the ultimate metadata question. As Shoshana Zuboff says: who knows, who decides, and who decides who decides? Let's go a little bit further here. Here's an example of a really interesting company that I worked with, right here in Richmond, Virginia, called Envera. And their proposition was that they were going to connect the world by connecting all the metadata — and their world was the chemical industries. So that's what they were doing here. And I'll run this little bit here. So their proposition was: company A talks to company B, and company A talks to some third parties, and they do it with all kinds of different document types and formats. But in addition to that, they also did it by phone, fax, email, and electronic data interchange as well. So given all of that exchange, it's very messy and very confusing. And again, if you map out what's actually happening, it goes absolutely bonkers crazy after a while. And this, by the way, is the day-to-day operation that most of our organizations are having to deal with. So of course the Envera proposition here was that they were going to replace all of this interaction with a metadata hub — exactly the example I gave a few minutes earlier in that data migration example — and put the Envera hub in the middle of all of this. And the Envera hub would allow these organizations to speak more confidently back and forth.
Now, one other piece to look at as well: they weren't just being nice. Envera also would be getting all the trend information from all the participants if they achieved a critical sign-up mass in this area. So they would understand the trends that were going on in the chemical industry, and that was pretty useful. They had done a good job of signing up lots and lots of customers toward that critical mass. So again, a very, very interesting example. They advertised it kind of interestingly, too. They would simply say: hey, look at all the metadata we're going to gather in the process of running all these transactions through our transaction processing engine. We'll have information about all of these bits and pieces that will enable organizations to really get the idea of what's going on, and we will become the experts — the ones in the know — as they do that. So let's change topic again, and I apologize for whipsawing, but you get all the slides, and again, you can come back and watch it when it gets put up on YouTube. But the federal government, interestingly enough, has made a really interesting forcing move in the metadata world. And most people don't think about the federal government having a role to play, but they passed something last year called the Foundations for Evidence-Based Policymaking Act. And again, it was a very, very interesting process. So I've got a couple of little bits to tell you about it. They were going to describe federal data management a little bit better, and so on. And there was some interesting opposition. For example, the list that's scrolling in front of you very quickly is the list of 100 or so data items that you provide if you sign up for the federal student aid loan program. It's quite a lot, and people worried — I think correctly — about what the government might do with that data once it's opened up.
I'm going to turn her volume down there, because I know the volume's not coming out very well. But she's a very earnest speaker. And what she says is that this will give the federal government — without the proper controls in place, without data governance — the ability to do data mining on lots of US citizens. And she's earnest. Unfortunately, her efforts didn't really make much difference. The bill passed the House, 356 to 17, and passed the Senate unanimously — although I'm pretty sure they told the Senate, hey, this is for the data geeks, give them a Christmas present, right? And President Trump signed it on the 14th of January, 2019, which was in the middle of the federal government shutdown. So why is Peter babbling about the OPEN Government Data Act? Well, first of all, all federal data is now open by default. That's a tremendous shift — going from protected by default to open by default is absolutely going to be a game changer in the federal government. In addition to that, they are requiring non-political CDOs to be appointed for every agency. In fact, it was supposed to be done by August of 2019. And most importantly, that CDO could not be the organization's CIO — that was part of the statute. And finally, they are generally requiring the use of open data and open models in policy evaluation. And let me explain what that means. If an agency decides to change from policy A to policy B, it is against the law for that agency to do so unless it publishes, in advance, the data it's going to use to evaluate whether that is a good policy decision or not, and the model it's going to use as part of that decision. These are very important pieces. And of course, you can see that metadata is absolutely crucial. People will be fingerprint-checking these organizations, making sure they're not switching data sets and all sorts of other things, which is good. That's a wonderful way for the organization to be transparent.
And it will then be against the law to make a decision that isn't supported by the model and the data. Again, it's going to be interesting to see how it shakes out policy-wise. But the penalties are higher than HIPAA's — and you know that people are scared to death of HIPAA, even though hardly anybody really understands what it is that's going on in there. The reason I'm babbling on about the federal government in this area is, first of all, that it is one-third of the economy. So we now have one-third of the economy following what we consider to be best practices. And in this case, it's probably time for industry to start following the federal government's lead, as opposed to the government looking to obtain best practices from the private sector. So let's talk about some metadata benefits. Again, this use of data as metadata can increase the value of strategic information. I can't tell you how many organizations I've gone to that have bought something wonderful like Salesforce and then polluted Salesforce with bad quality data. So everybody then blamed Salesforce for being bad — which it's not; it just happens to have bad data in it. Metadata reduces training costs and the impact of staff turnover all the way around. Metadata can be considered just documentation, but of course it's much richer than that. It reduces data-orientation time. Every business analyst or data scientist will tell you that they spend 80% of their time finding the data that they need and 20% of their time managing it and doing the algorithmic work, whatever it is. And if we can just chop their unproductive time from 80% to 60%, we can double these people's productivity. It is a tremendous lever. It can help us with the communication gap all the way around. And it can specifically improve time to market by reducing system development time and project risk.
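The productivity lever above is worth checking with arithmetic: if analysts spend 80% of their time hunting for data and 20% doing real work, trimming the hunt from 80% to 60% raises the productive fraction from 20% to 40% — a doubling. A one-liner to confirm it:

```python
# If search_fraction of the day is spent finding data,
# the rest is productive work.
def productive_fraction(search_fraction: float) -> float:
    return 1.0 - search_fraction

before = productive_fraction(0.80)   # 20% productive
after = productive_fraction(0.60)    # 40% productive
print(round(after / before, 2))      # the ratio: productivity doubled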
And finally, through very good governance practices, you can identify and eliminate redundant data practices, processes, et cetera. So we've spent a little bit of time here talking about defining metadata in the context of data management, and what we mean by using data as metadata. And hopefully now you understand that the use of metadata, where it is valuable, is a tremendous lever for you. But making everything metadata is a wonderful exercise in boiling the ocean, and certainly something that we do not need. I've given you an example using a very popular piece of software that is — I didn't mention — downloadable on the Windows platform, so you can try it with those examples if you want. And hopefully you have an idea now of what I mean by metadata being a gerund. So don't try to treat it as a noun. Again, everybody runs around and looks at things and says: metadata, metadata — if it's metadata, it should be in the repository. No, it should be in the repository if the use of that technique will provide the organization with value. And you are now the best experts at deciding what that value is. Second strategy, again: the language of data governance. Not only will you help your metadata efforts, but you'll tremendously help your data governance efforts as well — they'll be much more focused, if they haven't been. And then treat these glossaries as real capabilities that can be maintained with — I hate to say it — even a spreadsheet. Hopefully it's a publicly accessible one, and we have some controls in place, and versioning, and all sorts of other things around it, but none of them require any fancy technologies. And number four, when you start to build your metadata, you may have some building blocks that you could start from — you should have some building blocks. If you're starting off with a blank sheet of paper, something is probably incorrect.
Again, I talked about some specific benefits and things like that — which is that metadata defines the essence of organizational interoperability. If the metadata is correct, we can communicate; if it's not, we cannot. So let's look at a couple of quick takeaways as we're getting ready for the questions. Again, the idea here is, if we've got focused access on metadata — sorry, that's my benefits slide there; there we go. Most organizations think of metadata as data about data. And while that's a good way to think about it — it's great for that 30-second conversation — when you really sit people down, yes, it is data about data, but it's really about delivering value. And if we have the ability to quickly and easily evaluate the prospective value of the data, then we can start to determine where management should be involved. Management's probably not going to be interested in this — and I'll tell you guys a quick story here, but management generally doesn't understand this stuff well once you get about two levels higher than where you are in the organization. I spent the entire summer of 1993, I believe, or it was '94, in the basement of the US Pentagon, electronically shoveling data elements into the DOD data dictionary. Now, this was our version of the repository. And I remember very clearly that the reason I was doing this was because somebody had promised somebody somewhere that we would have 17,000 data elements in the DOD data repository by August 17th. Now, again, it was a miscommunication — nobody had any bad intent about it — but I was the one that was sent down there, because I remembered how to use mainframe computers in those days and could actually work on these systems. And when we finished the exercise, we had 16,995 data elements that were of questionable quality — actually, of unknown quality. So it wasn't a very useful exercise, but there were five elements in there that we considered good.
So we had the DOD data dictionary partitioned into these various segments, some pertaining only to groups, others pertaining to the entire Defense Department. As I mentioned, those five good data elements that we'd started with were the five in question. But when you look back at it — well, it was not a very expensive resource, so nobody probably cared that I didn't do something else instead of sitting in the basement of the Pentagon shoveling data elements in there. But it was nevertheless a good thing, from the perspective that yes, DOD was starting on this. We created the first data models. We had more advanced data management practice in the Department of Defense than in industry, certainly in that day, according to the averages and measures that we took. But at the same time, it was not a totally productive exercise, because management really didn't understand why it took so long, and we didn't set expectations correctly around all of that. Anyway, enough of the war stories. You probably don't want to hear them anyway, but I thought it was kind of fun to put it in there for the record. Another takeaway is that metadata is much less about what and much more about how. And if you think about it, this does make sense, because what we're really interested in is being able to put your hands on something and get quick access to it. If people can do that, they're more likely to do it. We've had this problem in the software engineering community for many years. We've been trying — and frankly failing, outside of the open source software movement — to actually achieve any kind of reusability. And the reason is that if it takes somebody as long to look a component up as it would to write it themselves, they'll just write it themselves. It just doesn't work: they won't find things, and the descriptions aren't precise. Think about this: we haven't standardized on data in 200 years.
We're certainly not going to standardize on software in an even shorter amount of time than that. So again — much more about how and much less about what. Again, the language of data governance. I've given you a bunch of examples of how you simply can't do data governance without communicating, and the language of data governance should be at the core of what you're doing. Metadata is truly at the essence of most business challenges. I mentioned earlier putting in a new CRM system — we've seen that happen to lots of organizations recently. People turn it on and it has bad data in it, and they blame the system. It's not the system's fault. The metadata is behind the business or system challenge in almost every case. And simply misapplying metadata has caused all kinds of failures — you've probably heard about the Mars lander and things like that. Those were all classic metadata problems, because somebody forgot to convert between metric and English units when they were doing it. But again, the real question that should be guiding all of us is: is it worthwhile to include this data item within the scope of our metadata practices? And at first, you won't have a clue. But after you do it for a little while, you'll start to have a pretty good idea of how long it takes, how much it's going to cost, and what value it can deliver. And then, and only then, will you be able to start assessing where you're going with respect to your metadata practices. As always, we've included some references here for you so that you can do some additional reading on the side if you'd like. And we will now head over to Tony for the Q&A. And just on the way out, as they were mentioned, there's some special event pricing on the books. So if you're interested in those, you can pick them right up — use the Anything Awesome coupon in the corner. And Tony, we're back over to you. Thanks, Peter.
I'm always amazed after your talks how much you manage to get into that specified period of time. And there were a couple of comments as we were going through about slowing down a little as we take all this in. But as you mentioned before, if there's anything you want to go back and cover again, we will be distributing the recording and the slides of this presentation within a couple of days. So, all right, we do have a number of questions, Peter. Let me take this one first. It comes from Todd. He asks: what about when a company buys an advanced data management tool first? Do you just model based on the constraints of that tool at that point, or is there another way to go about it? I think you were kind of saying in your talk that the specific technology really comes last. But as so often happens in this space, the tool is purchased first, and then everybody has to figure out: what do we do? Or sometimes it's inherited, too. So we don't always want to say it's necessarily bad practice to do so. But yes, the more that your organization — your data group — learns about its specific requirements, the better able it will be to select appropriate technologies when they're needed. But it does happen. And when it does, the question, as it sounded, is: which do we do first? Do we conform to the tool, or should we perhaps change the tool? And we'll abstract the problem up to a slightly more general piece. When you buy software from anybody and you include it in your organization, after you turn it on is the worst time to discover that it doesn't match your needs. But of course, that's generally where it occurs, because people don't have a good approach to the process of testing. Again, going back to requiring a logical data model in advance, as a condition of play — that is a very, very useful practice, and one that's considered a best practice today.
So the idea here is that, yes, you could spend time conforming the tool to your uses, but it'll be much easier and much safer for you to go in just the opposite direction. So I come down squarely on the side of: let's not bend over backwards trying to make the technology do things it wasn't designed to do. Let's find out what the tool does well, let's keep what people do well, and let's design good workflows that complement the existing strengths of both parties. And the only way you can pass a workflow back and forth between people and automated systems is to have good, consistent metadata used throughout the whole piece. So I'm very definitely an advocate of: postpone your tools, or if you've inherited your tools, find out what they do, and then figure out how to use them appropriately — but don't try to make them do things that don't work for you. All right. I'm going to ask this question myself, but I'm going to borrow George's verbiage here. He asks: what scorecard metrics would you use to show the value of metadata? It's a great piece, and probably a scorecard is the wrong way to think about it. We find that storytelling works better the higher up you get in management — so having the ability to describe how metadata is used. For example, I worked a couple of years back with a really fine bank, a financial institution, and their metadata was perhaps a little more finely tuned than most organizations'. It was such that if their metadata went down, their production went down, and if their production went down, their metadata went down. Now, that's not saying that's good or bad, but it was the way it was. And these guys understood it, and they were able to work with it in that context. Other organizations don't have that kind of insight into how they work and how best they can be served by this technology. So the more we learn about our specific requirements, the better a job we'll be able to do of describing that.
Tony, I think I got off track there. Did I answer the question? Sort of. Perhaps more fully, my question is going to be more around cost-justifying metadata management — the purchase of a metadata tool, the construction of a metadata repository — or maybe: what is the business case? There are various ways to ask a similar question that all get to the same basic idea. How do you pitch metadata value, particularly to senior executives? So it turns out that the biggest source of untapped productivity for us globally is the knowledge worker. There are more than one billion of them worldwide, and pretty soon we will be able to start putting them under additional screening for their data literacy skills, and we'll be hiring the more data literate if we have the option to do that. But you're asking specifically about the case for doing it, and I've helped a lot of organizations do it. There are many, many different ways of communicating, but the first thing is to find out: what is management concerned about? So while you may say it'd be great to have a standard vocabulary — it sounds like a really good thing, and a necessary thing for all of these organizations to work together — it's much more tangible if you can say: the cost of not communicating well has caused us to incur fines from the federal government, because we've been sending our hazardous transport shipments into cities without proper notification, and we are trying to notify them properly. We're trying to do the right thing, but our systems aren't letting us do it. And when I can add up the six-figure fines repeating on a regular basis, it becomes pretty easy to say: if we can get rid of the fines, plus the cost of the lawyers in that instance, that would be well worthwhile to solve that problem.
Now, that's not to say we have to solve every problem at that fine a grain, but if you do a couple of those, people start to trust you, and when they learn how to do this, organizations become very, very good at it. So it's absolutely the case that you have to have an economic justification. I tell this to data people all the time. If there are five of you in the group and you're each being paid $100,000 a year, just to make it very simple, you have to be able to show the organization that you have delivered at least $500,000 plus to them on an annual basis, or they very correctly can look at you as a cost and say, no, I don't think we can absorb that particular cost. So it's a lot of adding up small things in many cases, Tony. Finally, I had one group where we justified it on the basis of saving every knowledge worker in the company one hour per year, and that still turned out to be a very positive return on the investment. To my knowledge, we never followed up to see if the savings actually materialized, but I have worked with organizations that have measured this. We were doing some work with the Defense Department where we were deploying troops, and part of the provisioning of a troop deployment results in a data set being created for the troops. If that data set was a slow link in the process, we could speed up troop deployments by putting metadata around the troop deployment data they needed and making sure it was understood in that way. So there are a bunch of examples we can look at, and I get sort of passionate about this, because a lot of people say it's hard. It's hard because we don't practice. Yeah, it'd be worth us actually looking into this at some point, Peter, and trying to come up with a simple metric.
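The back-of-the-envelope justification Peter describes can be sketched in a few lines. All figures here are illustrative: the five people at $100,000 come from his example, while the worker count and hourly rate are assumptions added purely for the arithmetic.

```python
# Back-of-the-envelope ROI check for a metadata group.
# All figures are illustrative, not measured values.

def breakeven_value(team_size: int, salary: float) -> float:
    """Minimum annual value the group must deliver to cover its own cost."""
    return team_size * salary

def hours_saved_value(workers: int, hours_per_worker: float, hourly_rate: float) -> float:
    """Annual value of saving each knowledge worker some number of hours."""
    return workers * hours_per_worker * hourly_rate

# Five people at $100,000/year must return at least $500,000/year.
cost = breakeven_value(team_size=5, salary=100_000)

# Saving 20,000 knowledge workers one hour/year at an assumed $50/hour.
benefit = hours_saved_value(workers=20_000, hours_per_worker=1, hourly_rate=50)

print(f"Annual cost to cover: ${cost:,.0f}")
print(f"Annual benefit:       ${benefit:,.0f}")
print("Positive return" if benefit > cost else "Not yet justified")
```

Even the deliberately modest one-hour-per-worker assumption clears the break-even line here, which is the shape of the argument Peter makes: add up many small, defensible savings rather than one heroic number.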
I mean, I'm sure you're familiar with some of the stats around what data quality costs the typical organization in a year. Or remember when enterprise search tools first came out, there were statistics quoted around the average knowledge worker spending 70% of their time looking for stuff. I think if we could net it out to a couple of key statistics related to effective metadata management, that might help a lot. So let's put our heads together on that one. Actually, I've got a bit I can add to that, Tony. Real quickly, I grabbed this from another presentation I was working on. We've got some numbers that we're actually pretty confident we can start to talk to specifically, and since you brought it up as a topic, it's probably worth checking on this. So let me just get to that slide. This is why we have so much fun with these things, guys, because we can do these spontaneous things. There we go. We believe that organizations can likely save 20 days annually on the part of most of their knowledge workers. Literally, we can save people 10 days' time through better processing of email. That sounds incredible, but that literally is the number we've come up with. On top of this, five days from navigating between apps, and there are a lot of other things that go in here. So we're thinking a minimum of 40 days of productivity annually would be a target to shoot for, given some of those opportunities. Sorry, 40 days of what? Of increases in productivity due to better metadata management. Okay. All right, let me address a couple of other questions, and also let me just ask those in the audience: if you could put your questions into the Q&A section rather than the chat, that would make it a bit easier for me to find them. Having said that, I'm gonna go back to a question that Mary asked in the chat section. She said: you had a slide that said if my company does not understand my role in enterprise data strategy, then they may see me as a cost rather than value.
Could you say a little bit more about this, please? Sure, this is just a natural bias that we as human beings have. If we understand what a person does, we are more likely to be able to associate value with them. So again, I may understand that my colleague over here, if nothing else, always makes sure that... gosh, I'm having trouble with the example. I was gonna say something about coffee, making sure the coffee pot is full. That's not a very good example of value, so let's get a proper one. My colleague is much better at double-checking details. So I do the first draft of the report, and my colleague double-checks my English to make sure I get all the spelling correct and things like that. It's easier to associate value with that once you understand it. If, on the other side, they don't understand what you do, then yeah, people look at you as a cost. So yeah, thanks, Mary. There's a great emphasis here, because we have to do the same thing with metadata. The key is that if we don't understand strategically what we're trying to do with metadata, we'll not have the ability to focus in on that value proposition specifically. All right, let's change the pace a little bit here. There's a question around knowledge graphs: is there a relationship between knowledge graphs and metadata modeling? There's obviously a lot of talk about knowledge graphs in the industry at the moment. So how do those relate? I mean, absolutely. First of all, just take the representation of a knowledge graph visually: that by itself has to have metadata in place in order to actually be rendered across a variety of platforms, browsers, et cetera. So yes, at the very most basic level, there is absolutely a relationship. I think what the questioner is probably asking, though, is: has metadata gotten sophisticated enough to be able to capture the richness of some of these capabilities?
And I have not seen examples of that, but I'm sure it's been done; if it hasn't, I'd be extremely surprised. That's about all I can say at this time. Yeah, a lot of the graph vendors, especially the semantic graph folks, have targeted metadata management as a great application of their technology. Some of them have built tools; others are being used for essentially homemade metadata repositories. So the graph space is pretty rich in terms of its metadata capabilities, and I think that's only going to expand in the next couple of years. Let me put one more piece on that, Tony, which is a really interesting piece you reminded me of: the main source of machine learning data, training data. All right, so again, we're talking about the data that you need to train the machine learning algorithm, or whatever it is you're working on, which is really the cutting edge of artificial intelligence these days, and it has not been available. The new data literacy book that I'm hopefully just about done with has a stat that says this was the year that the machine learning algorithms ran out of data, because they don't have that training data. Therefore, the algorithms can only become so good at what they do. And of course, the pandemic ended up taking the wind out of those particular sails, but it turns out that the area in which the greatest strides are being made is by using metadata as that training data. And that is a tremendously not-well-understood piece. Most organizations, when you get the Gartner presentations on their dark data, the data that they're not really using at all: that's where they can really gain some value very, very quickly. So if you're looking in those areas, investigate that just a little bit further. There's certainly some low-hanging fruit in that area.
So, Daniel had asked: if mapping through central nodes, pardon me, if mapping through a central node, essentially a meta-metamodel, aren't you losing something on the way? Yes, if you are not careful, you will lose things on the way. So, what I've described here: there's an entire paper on this in the IBM Systems Journal, so let's not get too distracted by it. But what you're doing is, if you turn the slide you're looking at sideways, so that you're looking at it edge-on, the data that's coming into this can require much less detail in the mapping. And I don't mean detail as in detailed data. Most of the time, the way organizations do their data mapping is they hand you a spreadsheet and they say, tell us which data fields these fields are pulled from. And that works in many cases. But if it's truly an unknown-to-unknown situation, and we created this for the Defense Department as a generalized model, data is mapped onto the central model and then data comes out the other side. And yes, if you do not include filters in this, you will lose data that comes in there, but you could just as easily expand it. And this will simplify your overall mapping problem by at least an order of magnitude. I know there were a couple of questions on that. Okay, so I'm going to preface the next question with a bit of a caution. We have a request for tool recommendations, and though this type of question comes up a lot, it's something we are generally a little bit cautious about doing in the course of these webinars, for a variety of reasons. We don't want to demonstrate favoritism, we don't want to unfairly miss referencing somebody, and frankly, we don't necessarily know everything about every tool in order to make an appropriate recommendation. So it's not an easy kind of question to answer.
Having said that... Absolutely, and on top of that, Tony, we don't know enough about their specific requirements as well. So yes, having said all that, I'll ask the question anyway, just to put it out there. Spencer is asking: is there an open source or commercial off-the-shelf tool that you recommend for authoring specified metadata? Because I don't want to avoid that question altogether, I'm going to invite Spencer to contact you directly to discuss it, if that's something important to him. And I'm also going to point again to the DATAVERSITY website, and we could have a conversation afterwards about whether we have anything there that might help address that issue, because we have a wide variety of material for this. Absolutely. There are some pointers out there, I think, to some of the open source tools that you guys have referenced in some of those discussions. Yes, yes. So let me avoid your question, Spencer; I just want to put it in the right context. Okay, so there's a question from Daryl, which is a two-part question. He's asking how much metadata makes a full metadata file: things like file type, age, name, owner, accessor, modifier, a lengthy list here. A .docx file can have upwards of 60 different fields. So how would that non-common metadata be used? He's listed numerous common metadata fields there, but for the less common stuff, how would that be used? Well, first of all, consider your role as the data specialist. Nobody is more qualified than you to actually determine whether or not that stuff is useful, or potentially useful in the future. And I'll tell you a very specific story. Tony, do you remember Noreen Kendall? Very well, of course. So Noreen was the data architect for Delta Airlines, and she told this story a couple of times publicly. Delta at the time considered themselves, well, I'd say a family; it was an employee-owned airline at the time. And they considered that they were different from the other airlines.
And if you look at the papers today, Delta still does have a different approach to some ways of doing things. One of the things Noreen fought for, and won, within Delta was the use of a company ID throughout the entire system, throughout the entire infrastructure that Delta had. And you may say, and in fact people did say to her, why are we doing this? We are all Delta. Well, when the airline mergers started, the first thing Delta did was go out and buy Northwest Airlines, specifically because Delta wanted the systems that Northwest had. And again, they were, quote, better systems. Long story short on all this, they found out it didn't work out as well as they thought it would, and they had to undo some of those bits and pieces that they had pulled together. And the only way they were able to do it effectively was because the Delta ID was embedded throughout all of the various systems, and therefore throughout all the data structures they had as well. Again, long story short: you never quite know where value's gonna come from. And I certainly can't say that going around and slapping a company ID on all of your systems will actually produce the results you're trying to get, or that it would be valuable enough to do so. But the models here that you're describing, and again, I just popped this particular slide up because it's a good basis to take some of this from: you may look at these fields as they're coming across and say, I'm not sure whether they're useful or not, but if it's an XML structure, you can incorporate them in a semi-automatic fashion. And if you do that, you can then do a quick calculation to say: I've got 60 things in here, maybe 13 of them are useful, and the other ones are gonna cost me X number of dollars a year to store. Do we really wanna store all that on the possibility that some of these things will be good?
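That keep-or-drop triage can be sketched in a few lines. This is a minimal illustration only: the field names echo Daryl's list but are not taken from any real .docx schema, and the per-field cost is an invented placeholder.

```python
# Triage a list of candidate metadata fields: keep the ones tied to the
# strategy, and price out what retaining the rest "on spec" would cost.
# Field names and the per-field cost are hypothetical, for illustration.

available_fields = {
    "file_type", "age", "name", "owner", "last_accessor", "last_modifier",
    "revision_count", "template", "print_date", "char_count",
}

# Fields the data strategy actually calls for (illustrative choice).
essential = {"file_type", "name", "owner", "last_modifier"}

cost_per_field_per_year = 120.00  # assumed storage/curation cost per field

speculative = available_fields - essential
carrying_cost = len(speculative) * cost_per_field_per_year

print(f"Keep now: {sorted(essential)}")
print(f"Speculative fields: {len(speculative)}, "
      f"annual carrying cost ~ ${carrying_cost:,.2f}")
```

The point of the exercise is exactly Peter's: if the carrying cost of the speculative fields is a rounding error, keep them; if it's material, the data specialist has to make a deliberate call.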
By the way, another example: many people are discovering this with the online storage for their photos. My dad went through this back when he was getting started with the cloud and everything. If we go up to him and say, you don't have enough space in the cloud, do you wanna go through and remove all the duplicates of your photos, or do you just wanna buy more? He would say, well, no, for three bucks a month I'm gonna buy more cloud, because I never know when I'm going to need these things. Long story short, back to your question, which is a very good one: what about all the stuff we don't know? You've gotta look at the list of things that are there and determine which are essential for you to meet your strategy, which may or may not be what's specified; you have more expertise than most of the people doing the specifying. And then say: if possible future use would only cost us, again, a dollar a year going forward, of course we're gonna keep it. But if it's gonna cost us a billion dollars a year, we're probably gonna be a bit more risk-averse. Okay. So Peter, not to be too argumentative here, but the classic metadata definition is data about data. I find that the more educated you are about data, the less likely you are to use that definition. Now, you did use it, though I didn't necessarily interpret that as your most preferred definition. Where I'm leading with this is: what's your elevator pitch on metadata? Somebody asks you for an explanation; what's your preferred one-minute definition or justification? Do you have one? I do. And again, it unfortunately depends in many cases, but the typical elevator pitch is very much going to be data about data, and then hopefully a challenge to that individual to say, could I come and tell you more about it at some point when we're not riding in an elevator?
Those of you that have participated in these activities know that they come in 30-second, three-minute, 30-minute, and three-hour versions. And it's important for people to have these pitches, and for everybody on the team to be giving the same pitch when they do this. So it's very likely that the first encounter with metadata will be data about data. But I try as quickly as I can to move them back into the gerund conversation that we had earlier, which is to say that metadata is really more a use of data than a specific type of it. And the reason that's so useful is because if we say that metadata is everywhere and pervasive, as it is, and that all of it deserves to be managed, you're making an absolutely wonderful boil-the-ocean project and not really delivering business value. And I've seen many organizations that have done a very thorough job of managing, over-managing, their metadata, but it cost them an awful lot in the long run, and in fact the entire programs were not sustainable because of that scope creep, I guess is the best way to describe it. Mm-hmm. Okay, well, we're almost ready to sign off. Oh, is it time? Almost; sorry about that. There's one more question here, which goes to one of the topics that at one point in your career was very near and dear to your heart, and that's XML. The question is: XML was mentioned earlier, and if it's used to serialize data, then should the metadata be formatted in XSD, which is XML Schema Definition? Is that correct? So there's a newer piece, and I shouldn't even say new, but there's a more flexible approach beyond the XSD approach, and I'd have to go back to the presentation to look it up. But again, it comes back to exactly the same question.
Most of those XML structures end up in JSON somewhere, which is how people get the interface development pieces done very quickly and easily, and again in a very nice semi-automated fashion. So yes, you can, again, take a Word doc, right? The Word doc by itself has literally thousands and thousands of attributes. Do you need all of them to maintain the Word doc for your purposes in your repository? Likely not. But do you need to know what they are, so that you can make a good decision on each one of them as you go by? Yes, absolutely. It's very critical that you understand what is useful in there and what is not. Okay. All right. Well, I'm gonna start the process of wrapping things up here. I will give you back to Shannon in just a moment so that she can give you final advice on looking out for our follow-up. But first of all, I want to thank Peter for his presentation today. Great job, as always. I also want to re-mention the virtual version of our Enterprise Data World Conference, which is coming up next Tuesday and Wednesday. You can sign up for free at the DATAVERSITY website. And I want to thank everybody for their participation today, and hand you back to Shannon Kemp. Thanks, Shannon. Many thanks, Peter, for today's presentation, and just a reminder to everybody: be on the lookout for the follow-up email by end of day Thursday for this webinar, with links to the slides and links to the recording of the presentation. And again, as Tony mentioned, I hope you all can join us at EDW next week, and at Peter's next webinar. Hope you all have a great day and stay safe out there. Thanks, everybody. Bye, everybody. Thanks, Shannon. Thanks. Thank you, folks. Bye-bye.