Here we go. Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Metadata Strategies, sponsored today by Infogix. It is the latest installment in the monthly webinar series Data-Ed Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom middle of your screen. For questions, we will be collecting them through the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions on Twitter using the hashtag #DataEd. To answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days. Likewise, we'll send a link to the recording of the session as well as any additional information requested throughout. And if you'd like to continue the conversation after the webinar, you may go to community.dataversity.net. Now let me turn it over to Jonathan for a brief word from our sponsor for today, Infogix. Jonathan, hello and welcome.

All right, thank you very much and welcome, everyone. Looking forward to our talk and the questions afterwards. So I'm going to start off with a brief introduction to Infogix; I'll keep it to one slide. And then what I thought I'd do is really just talk about some of the metadata observations I've had over the years, as they relate to the pitfalls and some of the things Peter is going to talk about as he drills down into the details here. So first of all, about Infogix: we've been around since 1982. You know, that's a long time for a data company.
That's mainframe kind of time period, and we've still got customers chugging away on those mainframes. They're not going to give them up until they can't get spare parts, I guess. So we've got those customers, and obviously we span the gamut; you see the names across the bottom there. Really, we're focused on providing products and capabilities that span the life cycle of data management challenges, if you will. What data do I have? How does it need to be structured? How do I get at it? How do I connect to it? That's part of our Data360 platform, in the Analyze component. But once I have the data, how do I catalog it? What's the data lineage? Who's using it? Why are they using it? What are the workflows that support the governance and the collaboration and so on? That's our cataloging product, Data360 Govern. And once you have what you need in the catalog, you can begin to drive good data quality; our third component in the Data360 platform is the data quality component, DQ+. Most of our customers are large to mid-size organizations: healthcare, finance, a lot of supply chain work in that space as well, so large CPG customers and so on. So with that said, that is what Infogix is all about.

What I thought I'd do now is drill down on some of these observations over time. Number one: it's not just about the technical metadata. So many people gravitate toward the technical metadata. That's the easy metadata, right? But the metadata that preserves context, that allows you to share information, to understand it, and to create that semantic relatedness and awareness and so on, that metadata is harder, to be sure, but it's also the valuable metadata. A lot of people, and this relates to my second point, don't want to get to that level of detail. They're afraid of overwhelming their customers or their stewards and so on.
But they don't get the value unless they get to that level of detail, right? So number two is: let's not be afraid of the detail. With metadata, it's data about data, right? You've already gone to a detailed level, and then you've peeled the onion and gone one step further. Number three: don't forget the governance metadata. People forget that one quite often. In order to create scalability, accountability, transparency and so on, you need some of that governance metadata, and quite often that's the metadata about what's happening in your environment. So it's kind of metadata about metadata, which I think is one of the reasons people forget it. My fourth one, and I don't hear this so much anymore, but I did used to hear it a lot: people saying, "I can get all the relationships and all the concepts and everything embedded into the data model." The graphic you have right here is a fairly recent one. At some level, from a theoretical perspective, that may indeed be true, but in reality it makes the metadata very difficult to discover. Is it understandable? Can I get to it? Is it usable? The answer is invariably no, right? The graphic you're looking at on the right is from a large Fortune 500 or larger company. It's a good data model in the sense that you can put anything you want in there, but there's only a handful of people who understand how to get it out, and it's not easy on a good day. So in reality, it becomes very hard to maintain and very hard to build interfaces to. It's much easier to deal with that at the metadata level, especially as you move into the big data world. My last point is that metadata is critical to the future-state vision of what data should be in an organization, right?
If you have a digital transformation or an operational transformation initiative, metadata is going to be huge, quite likely much bigger than the actual data that you have, right? So get on top of it while it's small. You see people start out with these digital transformation activities and kind of miss the point that metadata is critical. And then of course you've got my favorite t-shirt right there: computers aren't going to be able to understand that unless there's metadata. I thought that was incredibly funny; that's one of the funniest data jokes I've seen, and there are a lot of them. Anyway, let me hand it off to Peter, and I'm looking forward to the Q&A afterwards.

Jonathan, thank you so much for this great introduction to Infogix. As he mentioned, if you have questions for Jonathan, he will be joining us in the Q&A section after the presentation today. And now let me introduce our regular series speaker for today, Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also the founding director of Data Blueprint. He has written dozens of articles and eleven books, the most recent of which is Your Data Strategy. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise, and Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, let me turn everything over to Peter to get today's webinar started.

Hello and welcome. Hi, Shannon. Jonathan, thank you for a great introduction.
And I would like a copy of that t-shirt if you can find one. We'll meet up at some point and grab it, because that is an excellent joke. Welcome, everybody. I'm really glad you joined us today. I'm calling in today from Santiago, Chile. We've had a great DAMA chapter event down here and just finished a couple of hours on the DMBOK. And now we're going to talk a little about metadata strategies, which are really foundational practices that we need to have.

So let's take a jump into the agenda. First of all, we're going to start off by defining metadata in the context of data management. We'll talk about using data as metadata, which is really much more important than looking at metadata as a thing. And I'll finish up that section with a specific teachable example using iTunes. Now, iTunes has actually gone away, but everybody is nevertheless familiar with it, so this is a way you can explain this stuff to other people; I find that's always very useful. Then we're going to look at three specific metadata strategies. The first is that metadata is a gerund, so do not try to treat it as a noun: it's a use of data, not a type of data. The second strategy is that metadata should be, must be, the language of data governance, to keep your data governance efforts focused. And third, we need to treat glossaries and repositories as capabilities, not as technologies. I'm sure Jonathan will have some more to say on those as we get started. We'll finish off with some metadata building blocks; there are lots and lots of places to get started where you do not have to stare at a blank piece of paper, or I should say a blank screen, since I'm obviously getting my technologies confused here. And we'll finish up with some benefits. Then, as Jonathan said before, Shannon will moderate a good Q&A session with all of us. So let's jump in and start off, first of all, with what this thing is called. We started off calling it "meta data."
In the history of language, when two words are pasted together to form a concept, we initially hyphenate them, so it then became "meta-data." And somebody actually, believe it or not, trademarked the word metadata and started to try to make problems for people using it. That has all passed now, so we are way beyond it. The word is metadata. Do not put it in two parts. Do not put a hyphen in it. That is the way we need to refer to it now. And our first task in metadata is convincing other people that this stuff is important. It sounds esoteric, something that may not be necessary for us, but as you heard from Jonathan, it is absolutely necessary.

So let's start off by talking about what data management is. Data management is how you get from the source, when data is originally captured, to when it is used. In between, some things happen. The first is that we end up with storage. I have to explain the storage symbol to the younger generation: disks used to be spinning platters, and we drew storage as a round cylinder because the disks actually spun. They don't spin anymore; they're all flash drives. This implies that we need at least three specialized disciplines in there: data engineering, data storage, and data delivery. Without those specialized skills, we simply cannot do anything in data management. And then, because it is a process, we need some governance around that process to help those specialized teams do what they need to do. Even this model, however, is poor, because it doesn't well represent what data is. Data's best use is not really its use but its reuse. Let me say it another way: data used once is good; data used and reused, again and again, is really where we as data management professionals add value. Then, of course, we need to understand how data is used in our environment, and it's going to be used in different ways by different people.
And only when we study that process as an organization will we gain enough insight to improve our data management practices. One of the things we've discovered over time, though, is that those specialized team skills and data governance areas are actually much broader than we thought. It's not a narrow discipline; it's a very broad discipline. So our definition of data management is understanding the current and future data needs of the enterprise and making that data effective and efficient in supporting business activities. There's a value proposition there. I know that Alex has already asked a question about that, so we'll come back and make sure we address your question, and all the rest of your questions, at the end.

You may have trouble, however, convincing people to do this. So one of the first things I'll urge you to do is go visit your organization's networking group. Let's pretend it has 20 people in it. They keep track of things like: are the devices in your organization permitted to log on to your network? Where are they permitted to gain access? What are the rules around that? And there is a named person who is responsible for this. In fact, if a group of 20 is doing this, you'll probably find at least two of them devoted to the task of making sure that the right devices are making the right types of connections, at the right locations, under the right circumstances, trying to access the right data. And all of that is itself metadata. So one of the things you can point out to your organization is that you're already spending money on managing metadata, because your networking group is doing all of this. Let me give you another perspective. I like to equate data management programs to HR programs, because you will need both of them for the same length of time.
Nobody ever says to management, "I think we're done with HR." And if they did say that, HR would say, no, no, no, we have other things happening here. Well, you have an HR head, and the HR head has some section heads, or whatever we're going to call them, and some group heads, and eventually we get down to managing the people, because HR manages it this way: the HR head leverages the section heads and the group heads to manage individuals. I've worked with organizations that have literally millions of employees, including the US Department of Defense, and they manage it in exactly this fashion. And if I then have to point out that your organization's data is being managed by Dave over here in the corner, the guy who chalks everything up to "human error," that's probably not the right number of people to take care of this.

So let's look at the prefix meta. Meta means beyond, transcending, more comprehensive, at a higher state of development. That's important. And let's see where this came from. I actually trace a lot of this back to a glamorous movie star named Hedy Lamarr. There are lots of things that Hedy Lamarr did, and there's a wonderful documentary about her out there on Netflix; I definitely recommend you watch it. It doesn't talk a whole lot about metadata, but one of the things she did discover was this: when people were listening to traffic on the radios, and by traffic I mean Morse code signals, so they're listening to somebody sending dots and dashes, even if you didn't know the content of a message, you could actually discover quite a bit of information about it. The Imitation Game, the other cool movie, about Alan Turing, was about decoding the contents of the message, but Hedy was looking at it from this other perspective: when you don't know the content, you might still understand the time and the duration of the message.
For example, if a message comes on every morning at 7 a.m., it might be a wake-up call, a weather report, or a change of shift; this periodicity may tell you something about the message. Then there's the location of the transmitter. Interestingly enough, I was able to point out today here in Santiago, Chile that one of the reasons the Eiffel Tower is the height it is is that they wanted to make sure a radio signal from the Eiffel Tower could reach Santiago, Chile. And yes, in fact, it can. You can also, by listening to these messages, identify the specific "fist" of an operator. Every operator forms the dots and dashes slightly differently, and gosh, it sounds like an awful job, but if you listen long enough, you can start to tell operators apart. During World War II, the Allies were able to identify the operators associated with certain generals on the bad guys' side, and if we heard that operator, we knew where those generals were. That was a real well-kept secret.

So "data about data" is a correct and good definition, but it is really insufficient. Let's look at a couple of ways we can enhance it. If you're looking at the information we've stored from that previous slide, and I ask, do I have this specific data asset or this class of data assets, metadata is what tells you yes or no. What is the quality of these data assets? That quality may be unsuitable, in which case you should enhance those data assets. What would be the cost to improve this class of data assets? Well, maybe 35 cents apiece, and somebody could then say yes or no, that is or is not a good investment. And finally, can these data assets be improved by providing more granularity? The answer may be yes or may be no. This is the way we increase insight into how our data assets are, and can be, used in support of organizational strategy.
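The four questions above, existence, quality, improvement cost, and granularity, can be sketched as queries against a toy metadata catalog. This is purely illustrative: the asset name, scores, and costs are invented, not from any real system.

```python
# Toy metadata catalog: each record describes a data asset, not the data itself.
catalog = {
    "customer_addresses": {
        "quality_score": 0.62,       # fraction of records passing validation
        "improvement_cost": 0.35,    # estimated cost per record to clean
        "record_count": 100_000,
        "granularity": "household",  # could be refined to "person"
    },
}

def have_asset(name):
    """Q1: Do we have this data asset?"""
    return name in catalog

def quality(name):
    """Q2: What is its quality?"""
    return catalog[name]["quality_score"]

def cost_to_improve(name):
    """Q3: What would it cost to improve the whole asset?"""
    meta = catalog[name]
    return meta["improvement_cost"] * meta["record_count"]

print(have_asset("customer_addresses"))       # True
print(quality("customer_addresses"))          # 0.62
print(cost_to_improve("customer_addresses"))  # 35000.0
```

The point is that none of these answers touch the data itself; they come entirely from metadata kept alongside it.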
Now, if you're seeing this for the first time, I apologize; we hopefully would have gotten to you sooner than this, but this is our body of knowledge, and you can see that metadata, even though it is spelled with a hyphen there, is absolutely part of our core body of knowledge. In fact, on the new version it looks just about the same. One of the things I always include here is the reference. I know you all take these things and use them later on, and please do; if you need other copies, let us know, and of course we're generally glad to get them to you. So this is one way to describe this. I'm not going to read it to you, but you should have a good understanding of it by the time we finish this session.

So let's get to my teachable example, which I do like, with iTunes. If you remember what a CD was, you stuck a CD in your PC. I know that this audience at least will understand that; I'll have to change the example sooner or later. The metadata on the CD itself was kind of problematic, because when I stuck the disc in there, iTunes could only read the number of tracks and the length of each track. While that's interesting, it doesn't really help if I'm trying to find a particular tune. So let's take the example a little further. When I stick in a CD and I'm connected to the Internet, iTunes automatically, in the background, reaches out to something called the Gracenote media database. Each CD has a particular signature, and that signature can be used to identify the metadata associated with that CD, pulling down for you automatically, over the Internet, the CD name, the artist, the track names, and the genre, as well as the artwork for that particular CD. Now that's a great use of metadata, because it tells you an awful lot of additional valuable information that simply could not be encoded on the CD in the first place.
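The Gracenote lookup described here amounts to computing a signature from the little metadata the disc does carry, the track count and track lengths, and using it as a key into a much richer database. A rough sketch of the idea follows; the signature scheme and the catalog entry are invented for illustration, not Gracenote's actual algorithm.

```python
import hashlib

# The only metadata physically on the disc: track lengths in seconds.
disc_tracks = [754, 614, 312, 998]

def disc_signature(track_lengths):
    """Derive a lookup key from track count and lengths (illustrative scheme)."""
    raw = f"{len(track_lengths)}:" + ",".join(map(str, track_lengths))
    return hashlib.sha1(raw.encode()).hexdigest()[:12]

# A stand-in for the remote media database, keyed by disc signature.
media_db = {
    disc_signature(disc_tracks): {
        "album": "The Complete Birth of the Cool",
        "artist": "Miles Davis",
        "genre": "Jazz",
    },
}

# "Inserting the CD": compute the signature locally, look up the rich metadata remotely.
info = media_db.get(disc_signature(disc_tracks))
print(info["artist"])  # Miles Davis
```

The design point is the same one Peter makes: a tiny amount of on-disc metadata unlocks a far larger body of metadata stored elsewhere.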
Our example here is in iTunes, so let's keep going. One of the things iTunes can do is make a smart playlist. If I say I'd like a playlist containing just Miles Davis music, I can type in "artist contains Miles Davis." If I were clicking through each of these fields, you'd see there's lots and lots of metadata you can use in the process. And when I do this, it now allows me to have a Miles Davis playlist. However, notice that I didn't get the results I was looking for. I thought I had only one Miles Davis album, but in fact I had two; I had forgotten about the other one, and it had gotten lost somewhere in there. So I need to fine-tune that metadata request by asking for not only the artist Miles Davis but also the album The Complete Birth of the Cool, which was the one I was looking for. Nicely, you don't have to worry about the mechanics, because iTunes does all that organization for you. And if I wanted to leave the request as it was, I would have all my Miles Davis music on one playlist. Let's take this example one step further. It turns out that the same interface in iTunes, the same iTunes code, the same iTunes data structures, can be applied to podcasts, movies, books, and PDF files, and the economies of scale are enormous in this sense. That's generally a good thing, but we've seen an interesting thing happen: literally in the last month, Apple got rid of iTunes, because it was becoming too complex for some people. They decided that even though they could do this, and did it successfully for many years, they could change and separate these things out. Apple is entitled to make that decision. These are the same kinds of decisions that you all will base your metadata practices on.
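A smart playlist is just a predicate evaluated over each track's metadata, and the refinement described above, adding an album condition to the artist condition, looks roughly like this. The library entries are invented for the sketch.

```python
# A tiny music library: each track is nothing but a bag of metadata fields.
library = [
    {"artist": "Miles Davis", "album": "The Complete Birth of the Cool", "name": "Move"},
    {"artist": "Miles Davis", "album": "Kind of Blue", "name": "So What"},
    {"artist": "John Coltrane", "album": "Blue Train", "name": "Blue Train"},
]

def smart_playlist(tracks, **rules):
    """Keep tracks whose metadata contains every rule's value (case-insensitive)."""
    return [
        t for t in tracks
        if all(v.lower() in t[field].lower() for field, v in rules.items())
    ]

# First attempt, artist only: pulls in BOTH Miles Davis albums.
print(len(smart_playlist(library, artist="Miles Davis")))    # 2

# Refined, artist AND album: just the one album I was looking for.
print(len(smart_playlist(library, artist="Miles Davis",
                         album="Birth of the Cool")))        # 1
```

Notice that no audio is examined anywhere; the playlist is built entirely from metadata.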
So let's finish off this little example by looking at the iTunes assets. Do I have these specific Miles Davis recordings? I can look and say yes or no. What is my most played Miles Davis recording? Well, excuse me, it's a bit impolite, but it's Bitches Brew. It's a great piece, and iTunes keeps track of that, because play counts are also metadata. What would be the cost to improve these assets? Say I had bought them in a low-quality MP3 format and wanted to upgrade them; well, maybe I can do that for two cents apiece. And can I listen to the entire album before dinner? Not easily, because it's a long album. Again, hopefully you see that you can use metadata to create additional insight about your data, in this case the songs we have.

So let's dive into our first metadata strategy. As a topic, data is complex and detailed, and outsiders don't want to talk about it. Frankly, most of them are unqualified, because they don't have the requisite architecture and engineering backgrounds. It is taught completely inconsistently, with a focus on technology and without any concern for the business impact. And it's not well understood: there's a lack of standards, privacy, literacy, all sorts of things. Most importantly, all of your individual work groups have had to learn this completely on their own. So I'm showing a little movie here; you can see it's a guy named Wally Easton, and I'm giving you the YouTube link. Wally has learned how to play the piano by throwing balls at the piano. I say that's just about as useful as learning how to play Guitar Hero instead of learning how to play guitar. Now, everybody in your organization is managing their metadata in some fashion and could become skilled at this. But gosh, if Wally really wanted to learn the piano, he'd get rid of the balls and get at a real piano. This stuff is crucial, because if people do not understand what you do, then you are perceived as a cost.
Whereas if they do understand what you do, what you do is perceived as valuable, so it's really important to make sure you know how to make your case for this, which really is a value-driven proposition. Let me show you how they did it at Walmart. We've done a lot of things with Walmart over the years; this is not telling you anything secret about them. Walmart has something they call the spark, that symbol to the right of the Walmart logo. To incorporate it and make sure everybody understood the concept, a wonderful fellow named Brad Melton put together their definition of metadata for Walmart: metadata is any data that describes who, how, what, where, when, or why. Those of you listening carefully will also recognize that those are the interrogatives across the top of the Zachman Framework, the columns we have up there. Another quick example: whether you use Outlook or whatever email client you use, the what is the subject line. The how is the priority. The where is which inbox or personal folder the message goes to. The why is in the body. The when is your sent and received timestamps. And you use different tools, different capabilities your email client has, to make sure the message from your boss goes to the top, or a message from Shannon, which is even more important than one from my boss, goes to the top of my list, and you weed out the junk you don't want. Because we organize this way, we use these metadata rules to help improve our productivity. Imagine how managing your inbox would look if you didn't have these capabilities. Email is bad enough as it is, but without them we would be in a real, real pickle. Another example here: people often say, I don't really get metadata, I don't understand why it's valuable. So let's look at how we arrange a book.
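The email triage described here, boss to the top, junk weeded out, is a ranking computed over message metadata alone; no message body ever needs to be read. A minimal sketch, with made-up addresses and an invented scoring rule:

```python
# Messages represented purely by their metadata (sender, subject, priority).
inbox = [
    {"from": "boss@example.com", "subject": "Q3 numbers", "priority": "high"},
    {"from": "newsletter@example.com", "subject": "Weekly digest", "priority": "low"},
    {"from": "shannon@example.com", "subject": "Webinar follow-up", "priority": "normal"},
]

VIPS = {"shannon@example.com", "boss@example.com"}   # senders who always float up
JUNK = {"newsletter@example.com"}                    # senders we weed out entirely

def triage(messages):
    """Drop junk, then sort so VIP senders and high-priority mail rise to the top."""
    kept = [m for m in messages if m["from"] not in JUNK]

    def rank(m):
        # Tuples sort element-wise: VIP status first, then priority flag.
        return (m["from"] in VIPS, m["priority"] == "high")

    return sorted(kept, key=rank, reverse=True)

for m in triage(inbox):
    print(m["from"], "-", m["subject"])
```

Running this drops the newsletter and puts the high-priority boss message first, exactly the productivity win the metadata rules are buying.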
A book generally will have page numbering, alphabetized indexes, a table of contents, lexicons, maps, diagrams, all sorts of things to tell us what's in it. We arrange things with the intent to communicate, but we can't actually make information; users have to do that for themselves. The metadata helps them do it, though. Imagine if, instead of the book, I ripped the spine off and handed you a bunch of disjoint pages: remove the structure, and things fall apart rapidly. So a definition of metadata that a lot of people use is "data about data." I like in particular a Gartner definition, which says that metadata unlocks the value of data and therefore requires management attention. So metadata management is the process that ensures we maintain our metadata. Another quick example, something people will not understand in 20 years: this is called a card catalog. I know most of you on the call know it. The card catalog is how you accessed the millions of books in your local library. It identifies what the books are and where they're located, and what we've done since is take that data, the metadata, and put it online, so we don't even have to visit the card catalog. But when I was growing up, this was all we had. And that, you can say, is why your parents were so ill-informed. Here's another use of metadata. Here are PeopleSoft's HR, Workforce, and Compensation modules, and we've done a little metadata analysis showing that three of the modules, Develop Workforce, Administer Workforce, and Compensate Employees, are clearly more complex than Monitor Employees, Define Business, Target System Manager, EDI, and the regular system tools; you can see those are very small pieces of this. And I've taken one of them, Administer Workforce, and expanded it out to include Recruit Workforce and Manage Competencies, which you can see are more complex processes than Plan Successions, Plan Trainings, Plan Careers, and Manage Positions.
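The PeopleSoft comparison boils down to a very simple piece of metadata analysis: count the components under each module and rank them. A sketch of that idea, where the module names and component counts are illustrative stand-ins, not the real PeopleSoft figures:

```python
# Module -> list of component names, as harvested from system metadata.
modules = {
    "Develop Workforce": [f"component_{i}" for i in range(42)],
    "Administer Workforce": [f"component_{i}" for i in range(37)],
    "Monitor Employees": [f"component_{i}" for i in range(6)],
}

def complexity_ranking(mods):
    """Rank modules by raw component count, most complex first."""
    return sorted(mods, key=lambda name: len(mods[name]), reverse=True)

for name in complexity_ranking(modules):
    print(f"{name}: {len(modules[name])} components")
```

Crude as a complexity metric, but as Peter says, even knowing nothing about the package you can tell a lot from the metadata alone.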
Now, all I'm doing here is counting the number of components in each of these systems. It's not a great representation, but if I don't know anything at all about the package, I can tell a lot just from looking at the metadata. We've used this in a number of different ways, and it's really helped us understand how to take care of these data structure problems. Here's another one, where we're looking specifically at a system. This is an old mainframe-based system, and this is the data model from it. It is called the Student Database Master, and I don't expect you to read it, but you can see in the upper left-hand corner, well, maybe not so clearly, but I promise you it is labeled SDBM. Everything in this data model is connected to that Student Database Master, and looking at the diagram from a metadata perspective, you can pretty easily see that everything revolves around the student. Now, I'm showing you this in particular because, for the replacement of this system we reverse engineered, somebody proposed the model you see here, and it's just a mess. Because it's a mess, it's really not something we can consider valuable, so we rejected that system on the basis of its metadata, in and of itself. The conclusion for this section is that metadata is really not best treated as a noun. Jonathan was talking about some very specific types of metadata, and those are correct: technical metadata, et cetera. But it is best not to have people running around your organization pointing at things and asking, is this metadata or is that metadata? Instead, it's much more important to ask how we are using the data. If you are using data to derive additional value from your data, you are using it as metadata. So the wrong question to ask is "is this metadata?" or "what type is this?" The right question to ask is "is it valuable enough that we should devote resources to including it within the scope of our metadata practices?"
That is metadata strategy number one. Metadata strategy number two: metadata has to be the language of data governance. A lot of you have seen this example before, but it's worth repeating for those who have not. If I give you a fact, 42, and tell you that it was my age 18 years ago, you don't care; that's data. If I want to make it useful, then I might present the fact that my age 18 years ago being 42 makes me eligible to buy adult beverages in the Commonwealth of Virginia, which is where I am domiciled. Combine that with the cashier asking me whether I am old enough to buy adult beverages, and now I've turned that data into information. You can have data without information, but you cannot have information without data, and one of the reasons people get into trouble is that they try to manage the two separately. Now, even this is still not enough for us, and we need to add another layer, which is how our users actually use the data, because we know the users will always say, give us all the data. We need to understand how they strategically use it, and once we understand those strategic uses, we can take it a little further. This slide represents an architectural arrangement of metadata concepts. All of these are different ways data is used as metadata, and I've just been using them to explain it to you. Data strategy, with architecture in context, means that we are using all of these pieces as we go back and forth. The data strategy tells us which metadata pieces in the data architecture we need to have, and the same goes for data governance: data governance is highly concerned with these things. Now, a data strategy is absolutely useless without the context of an organizational strategy.
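The 42-to-information chain can be made concrete: the bare datum is useless until metadata (whose age, as of when) plus a business rule (the drinking age) turn it into an answer. A toy version, with illustrative dates and the function name invented here:

```python
from datetime import date

# Bare datum: meaningless on its own.
fact = 42

# Metadata that gives the datum context.
context = {"attribute": "age_in_years", "as_of_year": date.today().year - 18}

LEGAL_DRINKING_AGE = 21  # Commonwealth of Virginia

def old_enough_now(age_then, year_then, year_now=None):
    """Turn data plus context into information: is this person 21+ today?"""
    year_now = year_now or date.today().year
    return age_then + (year_now - year_then) >= LEGAL_DRINKING_AGE

print(old_enough_now(fact, context["as_of_year"]))  # True
```

Strip away the context dictionary and the rule, and you are back to a meaningless 42: data without information.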
Our data strategy is designed to make data use better in support of organizational strategy, to leverage our data assets in support of organizational strategy, and metadata obviously is a key component of that, because we're talking about uses of data. There are some other pieces in Peter's world, subordinate things like IT projects, and we'll put in some feedback loops, but I really wouldn't show most people that; I keep the diagram kind of simple here. And again, if we look at data strategy and data architecture, or in this case data strategy relative to data governance, these data assets should be described in terms of business goals, and the language of data governance has to be metadata, because only when we're using data governance and metadata together can we tell our data stewards what the most effective investment of their valuable time and effort is, and they can report back on progress and projects.

Let's look at the components comprising our governance community. In all cases I have a couple of less-than and greater-than symbols on the slide; we don't need to worry about them right now. Governance is generally composed of some leadership, some stewardship, some subject matter experts, and everybody else. Leadership and stewards are typically what organizations use when they put together their data governance programs. Leadership is responsible for making sure that you have resources and for receiving feedback from the other sources. Leadership makes decisions and tells the stewards what to do. Those stewards then request action from the subject matter experts and the source users. This involves change, and this is why people don't like to do this sort of thing, because change is no fun. We get feedback from these folks, and we get new ideas into this process, which turn into guidance.
All of that is metadata, and if you're speaking in data governance meetings and you're not using metadata, you're going to be off track very, very quickly, because what happens in organizations is that data leadership starts out with governance, and this gives us the ability to improve data over time. But we also need to have some data improvement projects that are going to fix data that needs to be fixed more quickly than through standard maturation processes. So data governance promotes the metadata to the stewards. The stewards then propagate that metadata down to the data community participants, who eventually get it down to the data generators and users. And now good data things can start to happen. We want data things to happen, but we have not been really good about helping you as data professionals tie these data improvements to things that happen in the organization. So the symbol I'm using here is approximately equal, and the better you are able to tie those improvements to things happening in the organization, the better you will be able to maintain your data. There are several very, very bad practices here: not getting buy-in; ready-fire-aim, which is buying things too soon; trying silver bullets, that is, trying to solve all of your metadata problems at once; thinking that the middle path is going to be perfect, because you'll have everybody mad at you; overloading people on committees; failure to implement; not dealing with change; assuming that the technology alone is the answer; not building sustainable processes; and ignoring the shadow data systems that are out there. One other thing, too: I often get offered 10% of 10 people to be part of my data governance. I would much rather have one full person than 10% of 10 people, because that one person can get much better at metadata very, very fast.
By the way, this little movie on the side here you can find on the Internet; I think it's a little video called Data Governance Hell, and it's playing Hotel California in the background. We don't have enough time to go through it right now, but it's a very cute thing, certainly something you might want to introduce to your folks. Third metadata strategy: treat your glossaries and your repositories as capabilities, not as technologies. It is so important to make sure that you understand that, because these approaches do not start with technologies. Technologies will add value after you're ready to use them. Again, remember our card catalog: somebody had to, oh my goodness, maintain and build those card catalogs. Thank goodness we are now moving into an era where we can start to do better things. And again, we want valuable information about these data assets so that we can answer the questions that I asked before. Do we have them? Are they available? Can we get access to them? At what cost? Is this a good investment of my time and effort? Let's take another aspect of the governance piece, and this is that we tend to tell people simply to do definitions. Definition: bed, a place where I sleep, right? Well, that's good, but you can do better. Clive Finkelstein taught me many, many years ago that a purpose statement incorporates motivation into all of these things that we're talking about. So instead of defining something and just saying it's a bed, here's an example from a Defense Department system that I worked on at one point, where they told us that the bed was how we were going to locate the patient, because there was an RFID tag on the bed, and believe it or not, in hospitals they lose patients all the time. So where is the patient? I can pop this up on a screen. It was going to be a great thing.
Until we pointed out to them a little problem, because the purpose statement told us what we were doing, and we discovered that a room could contain zero or more beds. What else had to be treated as a room in the hospital system we were looking at? And the answer, astoundingly, was a hallway or an elevator, which are two places where people get lost all the time, and if I lose somebody in a hallway, knowing that is better information, but a hallway is certainly not a room, and the patient is not going to be discoverable there. So we were able to head off and eliminate some real big overspending problems in the Defense Department area. One final example from a governance perspective that I think is just phenomenal: I spent a number of years with Nokia, which is an absolutely phenomenal company, and Nokia had something that they called the NTB, or the Nokia Term Bank. This was a glossary that they had hand built, and any time you were sitting in a meeting, with anything going on, and somebody mentioned a term that you did not understand, they were trained, literally socialized in the organization, to immediately turn to their computer and look up the meaning of that term in the Nokia Term Bank. They had been using it for 10 years when I showed up. I was astounded; I've never seen this replicated in any other organization. I do know that other organizations use glossaries very well, but the Nokia Term Bank had the valuable addition of everybody paying attention to it, and if the term they looked up was not in the Nokia Term Bank, there was a protocol for the group to immediately stop and vote quickly on whether they should propose the item for inclusion in the next version of the Nokia Term Bank, which was produced on Friday afternoons with a little bit of alcohol involved. You see, the problem is, if you go out and buy technology too quickly, you will have very uneven conversations. Your customers simply are not as knowledgeable as your vendors are.
The vendors are very knowledgeable, and they have wonderful products, but we've got a way of addressing this technology gap and making it better, and the idea is to build one yourself. So you might build something, for example, with a student activities file that has the different activities students are doing, and you look at this and say, hey, this is not a problem, right? I can build a little student activities file, so student 150 has taken swimming for $50, and student 175 has taken squash and it cost $75. But suppose I want to decrease the price of swimming from $50 to $40. You can see that this data structure is going to be problematic, because while I might catch some of these rows, I won't necessarily catch the ones that were misspelled, because of bad use of metadata. So a better way of doing this would be to join two tables. Again, most of you recognize this, and the reason I'm showing it is that if I join the two tables, I will never have a problem with swimming being listed in the student table at the wrong price. The issue has been resolved; in fact, it's been prevented from happening in the first place. Now, I say that because here is a model of what you need in order to store a data model in your organization. Why would you want to do that? Well, here is a system that we built at one point for a customer, called the FTI metadata repository. If I click on one of the tables, FTI_T_ACCT, I can find that entity, and it describes the entity. I could click on this and look at these pieces. If I click on the other buttons on the screen, you'll see that they show different things. The table FTI_TABDF, for example, is used in this column. And I can take this column, click on the primary key, and ask where else it is used: as a primary key, as a foreign key, or just as a column in general. Each of these represents another use of data, and this is how data governance should be done.
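The student-activities fix just described can be sketched in a few lines. This is a hypothetical reconstruction (table and column names are invented for illustration): once the price is normalized into its own table, a price change is a single update that every student row picks up through the join, so the inconsistency can no longer occur.

```python
import sqlite3

# Normalized design: the price of an activity lives in exactly one place.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE activity (name TEXT PRIMARY KEY, price INTEGER)")
cur.execute("CREATE TABLE student_activity ("
            "student_id INTEGER, activity TEXT REFERENCES activity(name))")
cur.executemany("INSERT INTO activity VALUES (?, ?)",
                [("swimming", 50), ("squash", 75)])
cur.executemany("INSERT INTO student_activity VALUES (?, ?)",
                [(150, "swimming"), (175, "squash")])

# Decreasing the price of swimming from $50 to $40 is one update;
# the per-student view below picks it up automatically via the join.
cur.execute("UPDATE activity SET price = 40 WHERE name = 'swimming'")
rows = cur.execute("""
    SELECT s.student_id, a.name, a.price
    FROM student_activity s JOIN activity a ON s.activity = a.name
    ORDER BY s.student_id
""").fetchall()
print(rows)
```

In the flat single-table design, the same update would have to find and change every student row individually, and a misspelled "swiming" row would silently keep the old price.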
The third piece of this is to look at these components as data, not in fact as technology. Now, I say that because it's important to understand that most organizations try to start out with the tables, and that's really not a good way to do this. All of these building blocks allow us to start looking at things. And there are lots of good Dilbert cartoons out there about this. But these metadata building blocks are really important, because when we use them we can start to understand how things are built at the architectural level. Architecture is about things, what those things do, and how those things interact. If we understand those three pieces, we've got a pretty good understanding of what it means to look at metadata and to use metadata in these ways. So how are these metadata components expressed as architectures? Well, we start out by organizing details into larger components. You don't approach a door and say, I'm going to grab here and twist it and make it open; you say, I'm going to open the door. This means that intricate details can be pushed out of sight when appropriate. These larger components are then organized into models, and each model introduces dependencies into your architecture. Finally, the models are organized into an architecture, and the architecture signifies intent. All systems are built with some type of intent. Expressed as data structures, this means you organize attributes into entities, and that's where you handle your intricacy; you organize your entities into models, and that's where you get your dependencies; and your models are organized into architectures, which is where you get your purposefulness. The reason for this is to introduce you to the concept of design patterns. Now, one of the things that's interesting is what you find when you go into a large building; these are all the tall buildings in Perth, Australia.
One might ask the question: why are the restrooms in the same place in each building? And the answer is quite simple: it's cheaper to do it that way. Yes, we could have a restroom in each office, but that would be ugly and awful, and plumbing is dependent on gravity, and we want to make sure that gravity works for us, which means the simplest path is almost always the best path. Similarly, a large house is the same as a small house except that it has bigger rooms and sometimes more heating and air conditioning equipment. So all of these patterns of electrical wiring, HVAC, and floor plans are various types of metadata patterns, and there are some great books out there that you can use. I'm showing a couple of them here: one by David Marco and Mike Jennings, and one by myself. David Hay really got the first book out on this, which is his Data Model Patterns: Conventions of Thought. It's a very, very good book. And The Data Model Resource Book series from Len Silverston contains universal patterns for data modeling. Literally, this happens to Len and me all the time: somebody will call me up and say, hey Peter, do you have, or know where I can find, a metadata pattern for an accounting system for a dental practice? And I say, no I don't, but I know who does, and I call Len Silverston up, and he tells me it's in book 2, page 17. Now, the wonderful thing about his books, particularly if you buy the actual printed books, is that many of them still have a CD in the back with ERwin models in it that you can use as starting places, and if you've ever done this, you know it's a whole lot easier to edit something than to create something from scratch. Here's another use of metadata: a model that you can use when you're transforming between two systems. I have a source system and a target system, and I need to make sure that I rationalize all the data through this.
This data model right here will take care of that, and there's an entire article devoted to it, if you're interested, on how to use it so that you don't have to go back and reinvent the wheel. It's a much cheaper way of doing this, and this is one of the things that the DATAVERSITY community is super good for, because if you have a question like that, you can ask the community, and the community will be very, very happy to respond. David Hay will respond to you just about any time, day or night, because he loves to talk about this stuff and likes to see his work utilized. Here's the ultimate metadata model: IBM, I think it was in the early 80s, created something called the application development lifecycle information model, and they literally have models of all the models that are here. You don't need to invent this stuff. You can go out on the Internet, find it, and use it yourself. So don't start with a blank sheet of paper. Instead, look at existing models and see what you can build on, adapt, and modify. Let's do one more thing before we finish up this section and talk about semi-structured data. Now, this work is typically described as structuring unstructured data, and that is of course incorrect. You cannot convert unstructured data into something else, because by definition unstructured means it has no structure. That said, most data has some structure to it, and a better way to describe the task is converting non-tabular data into tabular data. So semi-structured data is any data that's not in a database or a data file. People think of Word documents and say Word documents have no structure. Well, if you follow Word guidance and use the internal Word structures, there is in fact a clear XML structure inside that can be parsed. It is useful. It's not perfect, but it sure is nice, because Microsoft put the format in the public domain, thank you very much, which allows us to use it and build on it.
And on the right-hand side of this diagram I've listed several examples of structural metadata, the Dublin Core in particular, which comes from the folks in Dublin, Ohio. I was corrected; I kept trying to get to Dublin, Ireland to see this, but it's Dublin, Ohio. And the folks there have done a great job of describing the metadata that you need in order to manage larger collections of information, like the card catalog in most libraries. So there are lots and lots of places you can go to find this metadata and use it in a very, very useful way. So we've been through the building blocks. Let's now drop down to applications and benefits. Let's talk about what the benefits of metadata are, why metadata matters. One of my favorite groups to work with is the Electronic Frontier Foundation, and they have a great list of examples. For example: they know that you rang a phone sex service at 2:24 a.m. and spoke for 18 minutes, but they don't know what you talked about. Or they know you called a suicide prevention hotline from the middle of the Golden Gate Bridge, but the topic of the call remains a secret. They know you spoke to a medical testing service, then your doctor, then your health insurance company in the same hour, but they have no idea what was discussed. These go on and on and on. And as you can see, metadata is absolutely invasive. It's a great list, a great way to describe how important metadata is. I can even remember the President saying at one point during his term in office: it's okay, we're not listening to your phone calls; we just have the metadata. Well, look at the types of things they have in your metadata. And the really sad thing about this is that most people have no idea what's going on. If you'd like to look at this from a sociological perspective, I highly recommend the book The Age of Surveillance Capitalism by Shoshana Zuboff. She has done a marvelous job describing a hierarchy: who knows? Who decides? And who decides
who decides? It's a great book, and it's all about metadata. I'm going to change gears a little bit and play you a little video. This is an advertisement for a company that doesn't exist anymore. The company was called Invera. Invera was an interesting company trying to connect things. Now, I'm going to play the little video and narrate how they actually described this. Let me give you the business proposition, and of course what you'll see is that it's all based on metadata. Companies talk to each other in a variety of different ways, and they do it through different means. Here's a long list of things that two companies might exchange data about, and they can do it via phone, fax, email, electronic data interchange, and all sorts of other channels. Oops, I'm sorry, I hit the wrong button there; we'll go back and replay this a little bit. So again, these are all the different types of exchanges they have, and the exchanges take place over various mediums: phone, fax, email, and electronic data interchange, which was hot at the time. Don't worry about that; the idea is still a good one. And as you do this, company A talks to company B, and company A talks to companies C and D, and they also talk to their banks and their freight company, and they put product data in their product catalogs. And of course B talks to A, and B talks to C, and so they all talk to each other. And yes, that's a mess. So Invera's business proposition was that they would, at the metadata level, connect different companies across these different means. And if they could do this, think of the amount of time and effort it would take out of all of these companies' communications. It was a brilliant piece of work. A lot of my friends worked there. The president of Virginia Commonwealth University was on the board. That's how important a product it was.
Unfortunately, the company that ended up buying them didn't understand this, and so literally all of this wonderful technology is sitting in the basement of an office somewhere in Dallas, Texas right now. This is a case of underappreciated metadata. They also had a business proposition that they liked to state this way: if we do this, we will have all this information, because we will know everybody's logistics, everybody's demand planning, and everybody's order fulfillment. Again, a really, really cool way of looking at uses of metadata. Now, we should pay attention to metadata for a couple of reasons, and one of them, especially if you work in government, is a brand new law. This is a really interesting thing. While the US government was shut down last year, as some of you remember, this bill popped out on January 14th. It was kind of interesting. What does this bill do? Well, it requires federal evaluation activities and improves federal data management, yay, and of course some other things too, right? When we talk about this, there are some really interesting things in here. For example, was there any opposition to it? Part of the opposition pointed to 108 data elements, which just flashed by on your screen, that now may be considered open data and accessible. And people generally don't like that. Here also is a little video giving you the reference to it. I'm not going to play it for you here, but it's a very professionally produced video in which a couple of people talk about the invasiveness of the federal government snooping on us. Now, I hope you understand from the example I gave you previously that they already have a lot of this information about some types of communication. This would expand the amount of metadata that they are able to access.
It's worth going out there to see what this particular group was afraid of, and they were probably correct that it is a dangerous piece. But we're now governed by a law, where in previous attempts we weren't. So, the first question everybody asks is, was the opposition successful? No. This passed the House 356 to 17. It passed the Senate unanimously and was signed by the President, as I said before, on the 14th of January of this year, which means it went into effect on the 14th of July of this year. There aren't many things that Congress does that pass with these kinds of margins, much less unanimously in the Senate. And I'm pretty sure that most people who were involved in the process did not understand what they were doing. But let's take a look at what it does. All federal agencies are required to manage data according to industry best practices, and they're using the DAMA DMBOK as those best practices. All federal agencies are required to regularly analyze their data and to use those results to inform policymaking. Well, that sounds like motherhood and apple pie, no problem. On the other hand, not a lot of people really get this. What it really means is that in the future, as people are making decisions about federal government policy, say Agency Director A decides that Policy B should be enhanced in certain ways to enhance the value of the goods and services that the government is helping the private sector produce, they have to specify in advance which open data sets the policy evaluation will consider. That is a great use of metadata. I'm going to use this data to evaluate, let's just say, school types. Well, here's the data, and here's how we're going to evaluate it. They also have to publish in advance the model that is proposed for the evaluation. And the policy can only be changed in ways that are supportable by that evidence.
You now have to be able to show that the evidence exists before you can make the policy change. You cannot simply pull it out of thin air anymore. Title II of this says that all data in the federal government is now open by default. That has already occurred. Agencies are responsible for making the data open by default; they can't just say, okay, come and get it. They actually have to provide it. And it requires that anything non-sensitive be made available in machine-readable formats by default, which means no more PDFs from the government. It starts the process of doing a data inventory and a federal catalog, developing comprehensive lists of assets, because these are public data assets and they should be used by the public in the ways that they are intended to be used. One last piece of this: Title III makes mandatory the use of chief data officers in all federal agencies. That role is distinct from the CIO role; the federal chief data officers must be non-political, and they must have objective qualifications, though there is nothing like a CPA exam to pass. And it sets out very specific responsibilities for the CDO. So there is no more arguing, at least within the federal government space. And our opinion is that they've done a good enough job that these practices are adoptable and should be adopted by the rest of the world. The last part of this is designed to improve everybody's confidence in the types of data being used. What really happens here is that they've taken a bunch of agencies, put them together, and said, you need to work even more closely together. You need to operate with the reliability of a Rolls-Royce jet engine. And I'm not sure why somebody decided to take a Rolls-Royce jet engine, put it in their backyard, and turn it on, but it makes a nice video, having nothing really to do with metadata. So these foundational practices are absolutely critical. There are seven specific benefits here.
You can increase the value of information. You can reduce training costs. You can lower the impact of staff turnover. You can reduce research time; not just business analysts but all of your knowledge workers will benefit from improvements in metadata. You can improve communication between business users and IT professionals. You can increase the speed of system development. You can reduce project failure risk. And most importantly, you can reduce the amount of data ROT that you have. Data ROT is data that is redundant, obsolete, or trivial. So as we're getting ready to finish up, we've now done the contextual piece. I've given you some definitions of metadata. I've said there are three specific strategies: do not treat metadata as a thing, but as a use of data; make metadata the language of your data governance operations; and treat glossaries and repositories as capabilities that you have, not as technology, because we all know that technology, people, and process are all required to solve problems. I've given you some metadata building blocks, and I've given you some benefits and applications. So while you're thinking about some questions for Jonathan and me, which we'll get to in a second, I'm going to do two more minutes of wrap-up, just to give you some time to start thinking about the questions. All right, so let's get to our wrap-up. And that, believe it or not, is slide number 98. So, takeaways. Again, remember that our metadata uses are exactly the same as our data uses. Metadata is data; therefore it needs to be managed with the same rigor that we use to manage our data. Data about data is not really a good definition of it, but it's a good one for an elevator speech. A better definition is that metadata unlocks the value of data. And because it is about value, management needs to pay attention to it. Management is much less about what and much more about how. Metadata must be the language of data governance.
I can't tell you how many data governance programs get off track because they don't have the same vocabulary. They are not using words the same way. They do not understand how all of this stuff goes together. And metadata really is at the essence of most business challenges. Oh, I forgot to mention one thing on the FEPA law, the Foundations for Evidence-Based Policymaking Act: the penalties for violating it are higher than HIPAA penalties. So the real question is, should we include this item within the scope of our metadata practices? And the answer is yes, if it provides more value, value that is worthwhile. With that, we're right at the top of the hour. I'm going to turn it back over to Shannon and invite Jonathan to come back on so we can start to talk to you. Peter, thank you so much for another fantastic presentation. As always, if you have questions, feel free to submit them in the Q&A section in the bottom right-hand corner of your screen. Just a reminder: I will send a follow-up email by end of day Thursday for this webinar with links to the slides, links to the recording, and anything else requested. So, Peter and Jonathan: what is the best way to present data flow to front-end users? I have explored the possibility of process diagrams, but they tend to get messy and cluttered. Can you recommend another way that is more friendly to users who are not experts in data quality? I'll go first and see what Jonathan has to say about my answer. I'm appalled, because for the last 20 years we have neglected a very important tool in teaching our students how to do data, and that is something we call a data flow diagram. Many people are simply unaware of it because it's not part of, quote, object-oriented methodology. Remember back when object orientation was going to solve all of our problems? I know most of you aren't old enough, but believe me, it was a time when we said that.
And we stopped teaching not only data flow diagrams but also the CASE tools; we don't even teach students that CASE tools exist. Oh my goodness. And we've done this for 30 years. It is absolutely appalling, and I'm on the inside of the glass house throwing rocks, but I'm still going to throw them, because it is absolutely crazy. Jonathan, what are your thoughts? The whole discussion of lineage I think is interesting, because you almost have to nail down what kind of lineage you mean. The use case should drive it. If you're trying to do source-to-target lineage, that's one perspective. If you're trying to do a control model, which might be appropriate as an auditor, or when mapping your data to regulatory requirements, for example, you're going to have a slightly different kind of lineage, and so on. I don't have a good sense of all the tools. Of course, I work for a tool company, so start with us, I guess, but a lot of the tools will have a lineage diagram of some variety. In our case we have multiple different versions of that, but the challenge then, of course, is that you're creating confusion again, so we have to watch out for that. So find the use case, and define what you want to see at the level of granularity that makes sense. If you're doing lineage and you need to validate ETL jobs, you're at a very low level. If you're just trying to identify where to put a control point, you don't have to be at such a low level. Jonathan, I'm sure your tool set does include these data flow diagrams, which are a very good way of describing how the data moves, but let's talk about the purpose of a metadata exercise. Many people think that the goal of metadata is to start with the A's and go all the way through until you've gotten to Z, and then you're done. Well, that doesn't speak at all to organizational resources and strategy and current challenges.
If I were to say, for example, I need a data model, I put one up on screen because you mentioned source-to-target mapping, and this is one of the ways to use that diagram in this context. Most organizations will try to be completely comprehensive about it, but if you're only moving certain types of data, you don't need to map all of your data right now. Yes, eventually it would be good to have it, but the most important thing is that as you're going through and doing all of your various activities, whether it's integration or modeling or creating new systems, you make sure that that metadata is kept in one of the tools Jonathan was talking about, so that you can access it later on. I keep every piece of email that I've ever written or saved since the early 1970s. It's all on my computer, and what that means is I can walk back into a program; actually, I gave a program here in Santiago, Chile in 2003, and the only reason I knew that was because I was able to go back and look at my email. In fact, just a few hours ago I met a gentleman who had been to that first session I gave down here. These are all uses of metadata, and it's important to understand that we have a number of tools, and that your students have not been well served by their university education in this area, because the word metadata typically never comes up in most undergraduate and graduate programs. And if it does: Jonathan used the term lineage; in the academic world we would call this provenance. Well, I'm sorry, that's just BS. We don't need a word called provenance; it means the same thing as lineage, but it's confusing to the users, and they have enough trouble as it is.
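Storing that source-to-target metadata so it can be queried later can start very small. The sketch below is a hypothetical illustration (every table, column, and rule name is invented): each record says where a target column comes from and what transformation produced it, which is enough to answer the basic lineage question.

```python
# A toy source-to-target mapping store: one record per target column.
mappings = [
    {"source": "crm.customer.cust_nm", "target": "dw.dim_customer.name",
     "rule": "TRIM + title-case"},
    {"source": "crm.customer.brth_dt", "target": "dw.dim_customer.birth_date",
     "rule": "parse YYYYMMDD"},
]

def lineage_for(target, mappings):
    """Answer the lineage question: which source columns feed this target?"""
    return [m["source"] for m in mappings if m["target"] == target]

print(lineage_for("dw.dim_customer.name", mappings))
```

A real repository would keep these records in the cataloging tool rather than a Python list, but the shape of the metadata, and the scope-it-to-what-you're-moving-now advice above, is the same.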
So take a look at data flow diagram techniques and see if that doesn't help you, but also don't look at applying these techniques in comprehensive form; look at doing enough to solve your problem, and then store that information so that you can come back and build upon it later. Jonathan, anything to add on that? No, I think that nails it down. Start small, think iteratively, and start with the use cases, because if you try to get the whole flow down right away, it'll be overwhelming and look like a science project. I couldn't agree with you more. So, what do you say to people who believe that artificial intelligence and machine learning will eliminate the need for metadata? Well, that's an interesting conundrum. Again, Jonathan, sorry for jumping in here; I'd be happy to let you take this one, but I've got a passion around this particular topic that maybe the rest of you are not as familiar with. It turns out that with artificial intelligence, the big problem is not developing the algorithms; the algorithms are out there. The big problem is that we have an incredible lack of data to train these algorithms on. For example, if I'm trying to train a pattern recognition system, a visual system that can recognize real objects in photographs, the dataset that we use to do all that training on contains exactly one image for the concept of a bride: a white woman with a white veil. And I think we've left some people out of that particular description. So there is not enough quality data around to train these algorithms, except in the one area of machine learning that is using metadata as its inputs to train the algorithms. There are wonderful advances being made in machine learning because of this metadata, and guess what: if you're extracting the metadata from your physical as-is systems, that metadata will be true.
We do not need to worry about the veracity of whether something is connected to something else; if I've just read it off the database catalog, we know that's what's in production. So it's a very interesting question. I don't know if somebody was trying to set me up for a rant on that one, but outside of the area of machine learning learning from metadata, the big problem with artificial intelligence is that we do not have sufficient data sets to train the algorithms, and metadata is providing the best training data for our machine learning algorithms in that particular area. Will AI replace metadata? No, but it will absolutely increase the leverage with which our metadata engineers can ply their trade on our business problems. It's interesting that that's one of your themes there, Peter, because a few years ago I was involved with a group of people doing fraud work, and the mantra of the time was: we don't need anything, the algorithms will figure it out, and then life will be perfect. And of course we get back to that theory versus practice. If you have enough data and the environment you're trying to analyze stays stable, then theoretically you can get to the point where the algorithms group things, self-organize, and essentially create their own metadata. The reality, though, and this was in the financial space, is that the data doesn't stay stable. Fraud is adaptive: the moment you figure it out, the people committing the fraud will change what they're doing, and you need such a large amount of data to train these algorithms that it really is impractical. So in many respects AI and machine learning right now relies very much on well-labeled data, or certainly works better with it, and in that sense I would say metadata is even more important going forward. In fact, simply because there's too much data to actually move around, most of it happens in the
metadata, right? You don't move the data, you just give it a different set of tags, so in that sense metadata becomes very important. Absolutely. Can you talk to the role of knowledge graphs in this landscape? The discussion about data models and metadata makes me wonder where and how you see knowledge graphs belonging. I'm not qualified to answer that question. Jonathan? The question is how I see knowledge graphs, presumably versus a relational type of data structure. There are things you can do with knowledge graphs in terms of how you query them, and I'm not super technical in this respect, but it's quite often a lot easier to deal with semantic relatedness, many-to-many type relationships, in a graph database world than in a relational world. So in that sense I just see them becoming more and more popular. I have to say that in many respects there's a hybrid kind of world evolving, where certain information is stored in a graph database and then managed or dealt with transactionally in a traditional relational database. But other than being critically important going forward, it's not something I've been involved with recently enough to have a strong opinion on. And Shannon, don't you guys have a conference in that area to cover some of those topics? Yeah, we just had a forum in Chicago in October. We did indeed. I might be getting hold of Shannon and asking her to point you to some of those upcoming events so you can learn more about them. I'm sorry we couldn't help more. Yeah, indeed, there's definitely additional information on the website, dataversity.net. Okay, so this could be a loaded question, but I think we can do it without going down the rabbit hole of politics, hopefully. Regarding open data and government: do you see it being made open to US citizens, or to the whole world? Is there a chance
that a peer competitor could use our data to garner an advantage that would ultimately work against us, or against us as a whole? We'll start on that one. Man, that's an interesting thought. Most of the open government data, if you go to opendata.gov or one of those sites, has been more scientific or demographic in nature, so I haven't really thought about it in terms of a competitive world. At this point, I guess one could ultimately have scenarios where that happens, but in practice it's been fairly benign: it's for the sociologists and the economists and the weather people and so on. I haven't seen anything sensitive get out there; NASA hasn't put the latest design for the next-generation rocket out there, for example. And that's a very good point; there are specific vetting procedures in place. However, I am privy to some of the numbers, and it turns out that while there are, as Jonathan said, lots of very good uses of the kind he described, the Chinese and the Russians individually each outweigh all the accesses to the other types of data that Jonathan described. So if you look at the uses of open source metadata on data.gov, the biggest users are the Russians and the Chinese, and that's got to scare us. I think the questioner raises a very good point, and I can tell you that people at the highest levels of government are in fact grappling with that question, because we do want an open society, and we do want people to have access to these things. But gosh, if we're doing all this work and we're making our competitors richer, that's something we need to take into account as well. Well done, guys, thank you. Can you go into more detail regarding the Nokia term bank that was discussed earlier? How did people agree as to what terms needed to be better defined? We are trying to implement a similar group in our organization. Super. So the Nokia term bank was something you
have to understand in the context of the culture of Nokia; this organization has an amazing culture. First of all, if you're Finnish, you have a 2% chance of running into somebody who is Swedish, and because of that 2% chance of running into somebody who would be speaking Swedish instead of Finnish, they train the entire country to be bilingual, which is a really good idea for societies anyway. I can't tell you how embarrassing it is that I'm here in Santiago, Chile with three years of high school Spanish, trying to ask questions and make my way around. I'm terrible at it, and I should have done more homework before coming down here. Nokia, however, said: we're not even going to stop at two languages. If you want to be involved in the business world, the language of business is English. So they mandated English for all of their hires, all of their business meetings in Finland were conducted in English, and it was against the rules to speak Finnish. This is how they embedded it into the culture. And as part of that culture, a lot of the questions were about what these words mean, because as you might imagine, words mean different things. I was at lunch today and I asked, what is this?
And they said, Coca-Cola. And I said, good. Anyway, the Nokia term bank was something they said would help them overcome the disadvantage of English being not their first or second language but in fact their third. So they made a very big, concerted effort, and they trained everybody how to do it. This was completely outside of data governance, although it is the first place you should start with your data governance efforts. What they did was build their own system. It was literally a lookup table with definitions that would come up, and it was very quick, lightweight, and easy to implement. It was probably not good enough to be called a metadata repository, which is why they called it the Nokia term bank. When they were in meetings, they would literally stop the meeting if somebody said something that somebody else did not understand, and they would all look it up. I can tell you this is the way it worked: if they did not have the term in the term bank, they had a protocol. They would sit down and ask, do we think as a group that this term is important enough to be considered for inclusion in the Nokia term bank? If it was, they popped up another form in their web browser and filled out a couple of little bits of information: who they were, when they had done this, what the information was. Very quick, literally two minutes, and off it went. Once a month the Nokia term bank committee would get together (as I mentioned, they had alcohol) and look at all the terms that had been submitted. If the same term was submitted by a bunch of different groups, obviously it would rank higher for inclusion. They would then assign people to write the definitions, publish the next version, and send a reward back to the people who had actually submitted the term. So there was also a reward system for improving the knowledge in the Nokia term bank. I can tell you that
when Microsoft got hold of it, they said, what's this? We don't understand it. And they got rid of it. But the Nokia people kept their version of it, of course, and they're reusing it now. What are some of the big changes in metadata governance strategies when you come across an organization that is slowly moving towards big data and the cloud? A lot of the old guidance was that you had to be comprehensive, that things weren't useful until you had everything in them, and I think Jonathan agrees with me on this: the use cases really have to drive what you're doing. While it's valuable to have everything written down, it takes a lot of money, resources, and time, and most organizations will not see a positive return on investment from being comprehensive. But they will see a positive return on investment by investing in the core infrastructure, some of the technologies Jonathan has talked about, as well as by training their people on how to use it. Now, it's probably not the case that most organizations will pay as much attention to culture as Nokia did, but I do know there are lots of organizations that pay attention to culture. I can remember when I went into the Army and pointed out to them that something was not governed in the Army space. They went, oh my gosh, we can't have something ungoverned, what is it? And I said, data. And they went, okay, let's fix that. There wasn't any argument or discussion at all, because the culture of the Army is also one of governance. So culture is very important there. Anything to add, Jonathan? Just the whole iterative piece. You grounded it in the use case; I would also say use cases and then capabilities, and I sort of think of those as the epics, right? There's a broad set of things that need to happen in the big data world when you land data, but be iterative in the sense that with the tools nowadays you can begin to tag things very lightly, because you know you want the data, you just don't really know how
you want it structured, who's going to use it, and so on. So you tag it pretty lightly: so you know who to call, for example, or when to roll it off. I worked in a place where there were, I think, 25 terabytes that no one had used forever, and the guy in the data center was getting so frustrated. He didn't know who to call, he didn't know why it was there, and he just got rid of it and said, I'll deal with it when someone screams. So there's this notion of tagging it very lightly, and then I think you also need to talk about the epic, the larger capability. You need to bring that data on board and tag it in the context of some larger use case or class of use cases, because you don't really know what it's going to be useful for. But you know it's marketing data, right? If you're bringing in a pile of Nielsen data, for example, then you know that's the kind of thing you're going to have a lot of; you might have overlapping versions of it, but you at least know that the product people are going to own it and use it for new product development or whatever it is. Tag it at that level, and then as time goes on you can begin to get to the structure and the business rationale for it, what kind of report it goes into, and so on. And Jonathan, you just reminded me of another thing. I hate to be touting that the federal government is doing a good job, but in these areas they really are. One of the other things the federal government has come up with is something called CUI, Controlled Unclassified Information. It's a group out of the National Archives, and they understand lineage, they understand metadata, they understand the process of tagging, and they have developed a tagging standard that is applicable to all organizations. They didn't intend to do this; they developed it for the federal government, but they made it useful enough that any organization can adapt it. And if
you're interested in this subject, I encourage you to look it up: Controlled Unclassified Information, CUI tagging. If you google that phrase you will start to see the work, and it is very, very good work. The background on that: it came right out of 9/11, when the question was how do we share information between industry and the government. And it turns out that CUI handles not only what you can do with the information but also who can see it, so it splits things up from the traditional approach. I spent some time back then dealing with some of that, Peter, and it was quite fascinating, but I didn't know that was the origin of it. So great, we're all learning; this is wonderful. Starting a data governance program and beginning to capture the organization's metadata: do you have a recommendation as to where to start capturing the abundance of metadata across the organization, beyond crawl, walk, and run? I'll turn it over to Jonathan and let him give you some guidance there. Sure, love that. I'll say it this way: companies struggle with this all the time, and I think it depends a little bit on what kind of company you are, what kind of problems you have, and so on; it's grounded in the use case and where the pain is, the make-the-pain-go-away kind of thing. But what I tell folks is this: nail down a critical business process or set of processes, back your way into what the critical data is off of that (a traditional critical data element exercise), and then really ask yourself the question, what do I need to know to manage this data across the business process and then across the data life cycle? That will land you at a point where you can define life stage taxonomies and the various reference data labels, the consistent labeling schema you might want to apply to your data to manage it across those two dimensions: supporting the business process and supporting the data life cycle. That tends to be how I do it, unless it's a big data
thing and you've got a particular problem; then it's slightly different, I think, because you're managing blocks of data, and at least initially you don't care what's inside them, right? You just need to get on top of the big old blob that came in from, I'll pick on Nielsen, but pick Bloomberg. A lot of the third-party data tends to have this characteristic, because it lands in a big pile and quite often you only need a bit of it. From Dun & Bradstreet you need a few pieces of information about the company, but they might be delivering the whole file to you because the sales guy's really good. So in that instance you have a lot of stuff you've got to weed through. That helps, absolutely. And somebody asked a question right at the very beginning, I saw: how do you value this? This is, I think, what Jonathan is heading towards as well. You could be more comprehensive about it, but it will cost you more, so the question is where you get the value and how much it will speed up the process. I showed you the little PeopleSoft example in here: we were able to demonstrate that the use of metadata for PeopleSoft meant you could implement PeopleSoft much more effectively with the metadata than without it. I know that sounds obvious to us, but believe it or not, it wasn't obvious to PeopleSoft. I literally had a catalog of all the PeopleSoft metadata that I kept trying to give them, saying, include this with all of the versions you send out to people. And they said, we just don't think our customers are interested. They gave me a challenge and said, put 25 customer names on a list that would want to see this and we'll start to include it, and the rest of the story fleshed that out. What you're seeing nowadays, and I said we did a lot of work in the ERP supply chain space, is companies buying one another, so they end up with multiple SAP systems, or JD Edwards, one of those big systems, and all of a sudden
they're now combining those systems and trying to put data into a data warehouse or doing analysis, a single view of customer. You've got salesforce.com, and that's creating interfaces those systems were just never built to handle. So all of a sudden the data is leaving. I mean, the reason PeopleSoft didn't think they needed metadata was that, if you go back far enough, no one ever took the data out of PeopleSoft; likewise with SAP. The moment you take it out, you need a lot of metadata. And of course, the moment you begin to layer in regulation, GDPR and HIPAA on the personal information side and so on, each of those is an additional layer of metadata. Yeah, go ahead. You just hit on one of the really important things: even in Chile they are getting ready to implement something very much like the California provision, very much like GDPR. This is absolutely all about metadata, and if that's an issue, just say GDPR three times and your management will get scared. You don't have to dance at the same time. It's real easy... I've got to stop laughing so I can get this last question in; we've just got a couple of minutes left. So, short answer: do you consider a data dictionary and a glossary different metadata repositories? I would guess different capabilities, but they can be combined into similar technologies. I think that's a good way of putting it. They're linked, I mean, I guess that should be obvious, so they should be linkable if they're different repositories. And fundamentally, if you think about it in a large enough sense, you're always going to have glossaries that are potentially external to your environment, right? When you sell to Walmart, you're basically consulting one of their glossaries to map down to your product offering, if you're a vendor to Walmart, for example. So at that point you've got an external glossary potentially mapping to an internal one, which is mapping down to the dictionary, the physical location of that data. So it can go both
ways. Things like concept systems, semantic concept systems, quite often will be external to your environment, and those are just larger referential repositories of glossaries in many instances. Absolutely. Jonathan, it's been an absolute pleasure to work with you again, and thanks to InfoJix for joining us on this one. Shannon? Thank you, Peter, and thank you, Jonathan, for joining us. Sorry, Jonathan, did I interrupt you there? No, that's alright. This is a favorite topic, and it is not a simple topic, so it's good to see we're exploring it, and I learn something every time I do one of these, so I appreciate the opportunity. And so do I. And thanks to a wonderful community for keeping everybody involved. You guys are taking away all my lines; it's awesome. We're just trying to get you off track a little bit. That's good, I love it. Well, again, thank you both for this great interaction and Q&A, and thanks to all of our attendees. Just a reminder: we'll send a follow-up email by end of day Thursday to all registrants with links to the slides and the recording. And as Peter mentioned, thanks to InfoJix for sponsoring today and helping to make these webinars happen; we appreciate it. Thanks, everyone. Hope you all have a great day. We'll talk soon. Thank you.