Hello and welcome. My name is Shannon Kemp and I'm the Executive Editor for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Metadata Strategies, the latest installment in a monthly series called Data Ed Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the upper right corner for that feature. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share or answer questions on Twitter using the hashtag #DataEd. And to answer the most commonly asked question: as always, we will send out a follow-up email to all registrants within two business days containing links to the slides and the recording of the session, as well as any additional information requested throughout the webinar. Now let me introduce our speaker for today, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide, including at our recent Enterprise Data World (EDW) 2016. He has more than 30 years of experience and has received many awards for outstanding contributions to the profession. Peter is also the founding director of Data Blueprint. He has written dozens of articles and eight books, the most recent being Monetizing Data Management. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise.
Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. He often appears at conferences and is constantly traveling. Peter, where are you today? So we're getting ready to go up to Philadelphia and do a day with the chapter up there, Shannon. Hopefully the screen's back up, though. Are you seeing anything? It is. Okay, now I just have to get it to advance. We had something decide to start dinging on me here, too. Anyway, hi, everybody. Give us a quick second and we'll get this thing working. As usual, it's the last-minute stuff: we had everything working perfectly, and then it decides to stop. But anyway, our topic today is metadata management. And the idea here, as you heard Shannon and me discussing a little beforehand, is that this topic is seeing a resurgence, and there's a lot of confusion around it. So one of the things we wanted to do is start to clear it up. I'm going to start you off with a little quote that I pulled from a book I'm reading right now. It says: if we're to make the best and sanest use of our resources, we must first adopt a sober view of our capabilities that permits reasonable things to be accomplished, foolish things to be abandoned, and utopian things to be forgotten. And I'm now showing you the ultimate in foolishness. This was metadata management in the late 1990s and early 2000s. I've got a whole history of metadata management that we can talk about if you catch me at an event sometime. But the idea was that IBM put together something called AD/Cycle, and it was meant to be the full, complete, perfect set of metadata.
And the problem was, everybody realized that even if you actually put this together and had perfect metadata management, about the closest you could come would be a view of the organization as it looked five years ago. So our good friend and colleague Dave Eddy put together a sort of "oh my gosh, this stuff is dead" assessment, because people really did not see achievable investments in this. And I back him up on it: what we see is about one in ten organizations actually being successful with metadata. So to try to overcome that and help all of us collectively get better, today's webinar will start with the overview of data management that I always begin with. We'll then dive into what metadata is and why it's important. We'll talk about metadata types and subject areas, benefits, and applications, and then we'll get into some strategy pieces that lead us to the building blocks we need to put in place. As many of you know from these webinars, I also give you a lot of additional reference information, including some guiding principles that we use. And I always like to finish a metadata talk with a specific, teachable example, so that if somebody really doesn't get the theory of what you're talking about, you can show them something on their computer very quickly. So let's dive in and first ask: what is data management? It's the idea that data needs to be managed between when it's captured and when it's used. We've used this definition for a while, and it involves data engineering, data storage, and data delivery, with a component of governance. Each of these is an entire topic area of its own. And understanding that these are specialized skills, we need them to build larger and larger sets of teams and capabilities organizationally. The problem, however, is that this does not represent the true power of data.
Recalling that data is our sole non-depletable, non-degrading, durable strategic asset, data's real value comes not when it is used but when it is reused. So we've jiggered this picture a little bit to try to give you a better view, which conveys the idea that these resources come together and support various things the organization is doing, but if we don't have feedback mechanisms and we don't look at how well the work is being performed, we have no ability to make it better. So our data management perspective is much, much larger. And the framework we use to improve this is something we call the capability maturity model for data management, or the data maturity model. It's the idea that if we manage data coherently, if we manage the assets professionally, if we make sure that the data is in fact fit for use with the right architecture, the right lifecycle approach, and the right supporting processes, then we can in fact get better at it. Again, each of these slides has an entire webinar behind it that we can do at some point in the future, or that you may have already seen; remember Shannon has all of this in her archives as well. A couple more pieces on data management. It's an awful lot like Maslow's hierarchy of needs, which some people remember from high school. The idea is that we have these wonderful advanced data management practices in the golden triangle, but they really represent just the tip of the iceberg, and if we don't build them on top of foundational data management practices, we have no ability to do things well. So we want to put these together and lay a good foundation so that we can then add these other pieces on top, because most of those other pieces are technologically focused, whereas notice that the bottom is capability focused. And people ask us all the time and say, well, I understand that, but I really need to get to the stuff in the golden triangle.
Can I just do that? And the answer is: yes, you can, but it will take longer, cost more, deliver less, and present greater risk than if you do it the other way around. Metadata management is clearly labeled here on our data management body of knowledge. If you're seeing this for the first time, this is actually progress: we didn't have this until recently, but we've now put it out, and it is the first guidance that we have in the data management community. The chapter on metadata management starts out with this input-output diagram. Again, this is just review and reference material: there are inputs, there are processes and activities in the middle in the teal, and there are various deliverables on the right-hand side. So this is just to give us context. Now, to get to why metadata is important, we first have to ask the question: what is strategy? Strategy really is a pattern in a stream of decisions. In other words, when people think about what they're supposed to be doing, we want them to think in terms of a pattern. One of the best examples of that is Walmart's business strategy, which every one of you listening to this call knows: every day low prices. You know that, the customers know that, their business partners know that, and all of the Walmart associates know that, because it's simple and easy to remember. When they need to decide things, they are guided by that pattern in their stream of decisions. Now, from a strategy perspective, we also need to ask: why is this word spelled funny? We started out putting two words together with a hyphen in between, but we've been using it long enough now that we write the term metadata as one word. So what is metadata? Well, let's go to the prefix meta. Definition number four: beyond; transcending; more comprehensive; at a higher state of development.
So metadata is data that is beyond, transcending, more comprehensive, and at a higher state of development than regular data. All right, so it's everywhere. I'm going to quote a couple of interesting Gartner definitions here. The first definition that we like is: data describing facets of a data asset. Hold that thought for just a minute. Their second definition is actually much more useful: metadata unlocks the value of data, and therefore it requires management attention. A wonderful Gartner definition from 2011. Therefore, metadata management is the set of processes that ensure that metadata can be used to unlock the value of data and that it receives the proper management attention. Now, this really dates me, and many people listening on the call as well: we recognize, of course, the ubiquitous card catalog from the days when we went to the library because that's where the books were. In the card catalog, what we kept were different things. For example, it identified what books were in the library and where they were located. We could search by subject area, author, or title. We could look at things like the publication date and the version history. We could determine, without grabbing the book, whether it would meet the user's requirements; without the catalog, finding things is difficult, time-consuming, and frustrating. So we take this a step further and say that the card catalog is a managed data environment, and the metadata are the descriptive tags of the managed environment. It shows the business and the technical users where to find the information in the various repositories, and it provides details on where it came from, how it got there, what it should be used for, and any other details that are necessary. Now, remember that the Gartner definition I gave you a second ago talked about how this works.
Here's a group that we worked with at one point that said metadata is any combination of the data in the center of that circle and anything on the outside describing who, how, where, when, why, or what. An OK representation that makes perfectly good sense. Brad Milton came up with that a couple of years ago, and it's been a wonderful way to think about it. If we look at this from the library perspective, then, the catalog tells us who the author is, what the title is, where the shelf location is, when it was published, et cetera. If that doesn't mean anything to you, think a little bit about how you use your inbox. Imagine if you didn't have the ability to see the what in the subject line, or the how in the priority field. In other words, a message from your boss might be more important than a message from a colleague. The where comes in the user ID, work versus personal; the body of the text is the why; the when is sent and received. You use this metadata to filter out the important stuff and weed out the junk. You use rules in Outlook, or whatever email client you use, to do this. Again, imagine trying to manage email, which is already a non-trivial problem, without the ability to make use of metadata such as to or from: it would be crazy. Now, I've given you a couple of definitions around metadata. Let's back off for just a quick second, because metadata really isn't a noun; it's more of a verb. It describes a use of data, not a type of data. We'll come back to that in a second, but let's just look at a couple of examples. Here's one, not a data-processing example, that the Electronic Frontier Foundation put out with a whole series of illustrations. You've made some phone calls, and they don't know what the call was about, but they know where you called from, what time you called, and how long you talked.
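The inbox example just described can be sketched in a few lines: routing messages purely on their metadata (who, what, how) without ever reading the bodies. This is a minimal, hypothetical sketch; the field names, rules, and sample messages are invented for illustration, not taken from any real mail system.

```python
# Filtering messages on metadata alone, the way an Outlook rule would.
# All addresses, subjects, and rules below are hypothetical.

messages = [
    {"from": "boss@corp.example",      "subject": "Budget due",   "priority": "high"},
    {"from": "newsletter@ads.example", "subject": "Weekly deals", "priority": "low"},
    {"from": "colleague@corp.example", "subject": "Lunch?",       "priority": "low"},
]

def apply_rules(msg):
    """Route a message using only its metadata: the who (from),
    the what (subject), and the how (priority)."""
    if msg["priority"] == "high" or msg["from"].startswith("boss@"):
        return "important"
    if "deals" in msg["subject"].lower():
        return "junk"
    return "inbox"

folders = [apply_rules(m) for m in messages]
print(folders)  # ['important', 'junk', 'inbox']
```

Note that the rule never touches a message body: the metadata alone is enough to separate the important from the junk, which is exactly the point being made about phone records as well.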
Remember, this is the president who said several times: we're not taking your data, we're just looking at the metadata. As you can see, the metadata is reasonably darn important. Let's look at a more data-processing-oriented example. Here's an example from a Veterans Administration system that was put together by a group I was working with in the late 1990s. It was describing a bed. Well, first of all, what is a bed? The purpose of the bed entity is that it's a substructure within a room, itself a substructure of a facility location, that contains information about beds within rooms. You can see it has a lot of descriptive information. By the way, if you change from definitions to purpose statements, you've actually strengthened your metadata capabilities, because a purpose statement adds motivation, whereas a definition is very passive. It has sources of information about it, a partial list of the attributes and characteristics, and the rule that a room can contain zero or many beds. So these are some business rules and processing details. It's critical to organize things, and we do it all the time. There are lots of different types of metadata, and I'm not going to spend a lot of time here describing the various types, but if we have process metadata, it means we have information about how processes are performed. If we have business process metadata, it's who created the documentation, the dependencies it's aware of, and these sorts of things. Again, the idea is not that we're going to focus in on this, but that we use this type of information. Sometimes the most important characteristic of a piece of business process metadata is simply whether it exists or not.
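The VA "bed" example above can be sketched as a structured metadata record. The field values here are paraphrased from the talk and partly invented; the structure is the point: a purpose statement rather than a passive definition, sources, a partial attribute list, and a business rule that can actually be checked.

```python
# A metadata record for the BED entity, sketched from the talk.
# Sources and attribute names are illustrative, not the real VA schema.

bed_metadata = {
    "entity": "BED",
    "purpose": ("A substructure within a room of a facility location "
                "that contains information about beds within rooms."),
    "sources": ["facility management system"],        # illustrative
    "attributes": ["bed_id", "room_id", "bed_type"],  # partial list
    "business_rules": ["A room can contain zero or many beds."],
}

def room_bed_count_is_valid(count):
    """Zero-or-many: any non-negative number of beds per room is valid."""
    return count >= 0

print(room_bed_count_is_valid(0), room_bed_count_is_valid(3))  # True True
```

Notice that the purpose statement carries motivation (why the entity exists) and the business rule is precise enough to enforce in code, which is part of what makes metadata like this usable rather than merely descriptive.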
We can also differentiate technical and operational metadata. Technical metadata, for example, might say which physical database table something is in, while operational metadata might describe the backup jobs and the frequency of the backups. Metadata is the language of data governance, by the way. So as you're working with stewardship activities, the language that the data stewards speak should be as close to metadata as possible as they drive different things in the organization and try to manage the resource. A final area to take a quick look at is provenance, which is the history of the metadata. If I were to tell you that some metadata came from an unreliable source, you would know to take it with a grain of salt. There are lots of these subject areas, and I know I'm going very fast, because there's no point in listing all of them for you. The real question is how you go back and put them together. Most people, when they get this, say: okay, so metadata helps me organize the data. Remember, data is our sole non-depletable, non-degrading, durable strategic asset. What are the benefits of doing that? Well, we can increase the value of this information. We can reduce training costs. We can reduce the amount of time that your knowledge workers spend hunting for data. And by the way, we're all knowledge workers, so increasing knowledge worker productivity is one of the best benefits you can apply organization-wide. It literally is the only thing that gets us on the same sheet of paper in terms of improving communication between business professionals and IT professionals. We can also speed up time to market by reducing system development lifecycle time through proper use of metadata, reduce project failure risk, and identify redundant data and processes.
After all, you wouldn't want the organization collecting two different versions of the same thing; it's like wearing two different watches. We also have metadata for unstructured, or semi-structured, data. By the way, I don't like the word unstructured: if data were truly unstructured, we couldn't work with it at all, so I always prefer to say semi-structured data. But again, if you're looking at content management applications, websites, intranet sites, archives, journal collections, and the various community resources that you have, all of these carry various types of metadata that can help to improve productivity. One of my favorite examples of all of this was something we called the Nokia Term Bank, and I'll use that as a preview of a talk that Karen Akins and I are going to give at the Data Governance Conference coming up next month on business glossaries. We'll talk more about that at the end of the program today. But even administrative metadata can become important. So let's talk about just one business problem we were working on, where we used four different sources of metadata. We had a reference model, something called ADRM, which was kind of like a Wikipedia article that we could use as a reference point. Then there was a conceptual model that had been created a couple of years prior; it was historical rather than current, but it was still good. We had an existing system that we could reverse engineer, and we had an enterprise data model, and we used all of these to come up with the answer to the particular problem. Each one of them, as you can tell, is a specific piece of metadata. So when we look at the value of metadata, what we see is a situation in which these building blocks are key.
In order to get business intelligence, we have to have knowledge of logistics, reporting, procurement, tracking, inventory, et cetera. All of these things come together. Now, let's move on to strategies, and this is where I think it gets a little bit interesting. Most of the time when we see these articles out on the web, they'll say: hey, five steps to develop a metadata strategy. Initiate and plan it, conduct interviews, assess existing sources, develop a future metadata architecture, and take a phased approach to implementing it. I don't mean to take anything away from the authors of this one, and I don't even remember where it was borrowed from, and it's good guidance, but it could be more helpful. What it really means is that if somebody is asking you to do something, they're saying it's worthwhile to invest in. And how do you convince IT and business staff to plan, budget, and apply resources for metadata management? How do you explain to them what it is, why it's important, and what technologies are involved? The answer, in an awful lot of cases, is that you already have these existing capabilities. So a metadata strategy is a statement of direction about metadata within the enterprise. It should be a part of your data management strategy, and your metadata is really the lever that helps you leverage your data. I used to have a graphic on that, a lever within a lever, but it was a little complex, so I decided to pull it back out. But you can build these metadata strategies from the components that were on that previous slide. It's also important, though, to have a real motivation here. You don't want to manage metadata for metadata's sake; you want to manage metadata because it makes a difference in the business. And that is what we're going to focus on: a couple of examples that hopefully will be really helpful for you.
What that means is you need to understand the current environment, but you also need to know the near-term future. I'll give you just one example. The government is building some large ERP-based systems; ERPs are those large enterprise resource planning packages. They handed each of three ERPs to a different contractor. The problem is nobody told the contractors that they had to speak the same language; it was not part of the formal requirement. And yet the near-term expectation, in everybody's mind, was that these systems absolutely needed to interoperate. After all, they were all the same brand of software, so you would expect them to talk to each other. But unfortunately, without shared metadata objectives, each one of these systems speaks a different dialect of the software package, and consequently, when they turn them on, they don't interoperate. And I have to say, I'm convinced that could have been avoided up front. When we look at this, we find only one in ten organizations actually keeps a focus on this. Now, metadata strategies are key, but it's hard to think of a tactical strategy, and that's sort of what I want you to think about. Underneath your data strategy, which says you're using data to support your organizational strategy, your metadata strategies become short-term wins for making sure that you actually use data correctly. Remember, metadata is what allows you to leverage the data. And what I find is that most organizations really have the wrong focus. They will hold up a piece of data and ask: is this metadata? And if it is, I should manage it, right? And the answer is no. You should manage it as metadata if and only if it is worthwhile to include in the scope of your metadata practices, because almost anything can be metadata. Remember our definition: it's not really a noun, it's a verb. It's a use of data.
Again, I'll give you a quick example. My age 15 years ago was 42. Most people say: what? That's a bizarre way to describe an age. Well, is 42 data or metadata? It's metadata if you're trying to decide whether I'm old enough to buy alcohol in the Commonwealth of Virginia, which is where I happen to live. We set that at 21; 42 is twice that, so at least I'm of age in that sense. Should I manage that as metadata? Now, we actually have a different rule in Virginia: if you have facial hair, you're old enough to drink. So we don't manage the 42 as metadata; it's not valued enough to be worth it. And where we find most organizations going wrong is that they are managing things they think are metadata because they believe they should manage their metadata. The real question is: should this piece of information be included within the scope of our metadata practices? Because if it should not, then we shouldn't put time and effort into it. This gets into another little piece we're working on. We're trying to find out how much organizational data is ROT (redundant, obsolete, or trivial), and the answer is about 80 percent. Actually, we're pretty sure the answer is at least 80 percent, which means you shouldn't spend any time and effort managing most of your data, and the metadata is the only thing that's going to help you make that differentiation. So let's move on a little here; there may be some questions on that when we come back to the Q&A at the end. If we look at how our metadata practices actually work, metadata is the thing that ties most of your data initiatives together, and not having metadata is the thing that causes these operations to fail. So I like to pick on master data management as a poster child for how these things are done.
One of the organizations I work with had spent $60 million implementing a master data management stack from a very well-known vendor, and the software worked really well. The joke was that they were bringing in three plane loads of consultants to help the organization spend the $60 million. Where they failed, because of course the project was a failure, was that they did the master data management in isolation, thinking they could do just that without connecting it to the rest of the complementary organizational goals. As you can see from the slide, these metadata practices are inextricably intertwined, in this case with data quality and knowledge management, which were the other pieces they were doing. The master data catalogs, the metadata management we did, helped them understand that the master data had to be of sufficient quality, because after all, if you fill good software up with bad-quality data, you have a case of garbage in, garbage out. And those pieces were supporting specific knowledge management practices going on within the organization. Metadata is, of course, the glue that keeps all of these pieces together; they managed it separately, and consequently the effort was $60 million that they did not need to spend. That's another example of a short-term metadata strategy. The idea here is that most of our systems have grown up through a process called entropy, which means that things tend to become larger, more complicated, and more intricate as they become older; we keep adding things to the systems. And when you look at this, a metadata strategy might be to reduce the number of systems.
We can do that through a very nice set of practices, by understanding, for example, that the only value System 4 adds in this diagram is that it takes some outputs from System 2 and transforms them so that they can be read by System 3. Now, if I have that information, I can actually program it. So this becomes a programmed data store, and I can generate programs. And when I generate the programs, I can replace lines of code, systems, and applications in our environment with something called a rule engine, which is a lot easier to maintain than a bunch of code. In this particular example, we found three systems that added practically no value whatsoever, so we could take the metadata, eliminate actual application programs, and manage strictly the metadata rather than maintaining the equivalent lines of code. Now, I have to tell you the upshot: as we eventually worked on this program, we figured out that getting the data from System 1 to System 6 was really just a whole set of transformations. And even though I've shown three systems replaced here, we actually replaced the whole thing with a series of metadata management activities. So as a strategy, it was a very good one that allowed them to show tangible benefits by eliminating programs we no longer had to maintain. Here's another metadata strategy. This is a business model that works well in the financial community; FTI, Financial Transactions International, put it out. We were implementing their rule engine, carrying on that same theme from the last slide. And one of the things we did with the metadata they gave us was reverse engineer it into a repository. Now, if you look carefully, you'll see that our quote-unquote repository was done in Microsoft Access.
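The rule-engine idea just described can be sketched as follows: instead of maintaining System 4's hand-written transformation code, the transformations themselves are captured as metadata (rules) and executed by a small generic engine. This is a hypothetical sketch; the field names and rules are invented, not the actual systems from the talk.

```python
# Metadata-driven transformation: the rules below are data, not code,
# so changing a mapping means editing metadata rather than a program.
# All field names and transforms are invented for illustration.

transform_rules = [
    # (source_field, target_field, transform function)
    ("cust_nm",   "customer_name",  str.strip),
    ("amt_cents", "amount_dollars", lambda v: v / 100),
]

def rule_engine(record, rules):
    """Apply each rule to a source record, producing the target record
    System 3 expects, with no hand-maintained transformation code."""
    out = {}
    for src, tgt, fn in rules:
        if src in record:
            out[tgt] = fn(record[src])
    return out

system2_output = {"cust_nm": "  Acme Corp ", "amt_cents": 1250}
system3_input = rule_engine(system2_output, transform_rules)
print(system3_input)  # {'customer_name': 'Acme Corp', 'amount_dollars': 12.5}
```

The maintenance argument is visible even at this toy scale: adding or changing a mapping is a one-line edit to the rules table, and the engine itself never changes.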
So this was a skunkworks project that took a couple of folks a little bit of time, but they could now look specifically at a table. The one I'm showing you is ft_t_abdf, in the upper left-hand corner of the screen. We can click on it and say: show us the columns, the primary keys, the foreign keys, where the table is used, what the names are, what the definitions are. Again, there's a lot of information here that couldn't be understood unless we managed this metadata, which meant that this organization was able to implement the repository and the rule engine in a much easier fashion. It took them a little extra time to manage the metadata, and Access is not our favorite tool for metadata management, but for this project it was superb. By the way, what you're doing here, of course, is reverse engineering the DDL schema into these structures so they can all be read. I can talk more about that at the end if we have time. Here's another example. This is a case study that was written up; you can see it was a number of years ago, 1999, 17 years ago. Wow. It's still a great illustration, though. We were doing a PeopleSoft implementation, and in order to do it, we needed to understand what was going on. What I want to show you is that this was a cycle we went through, and I'm going to show you a number of different uses we made of the metadata. What we found was that it worked in a pattern. The key to this was, as I showed you earlier, the specific skills that you need to have. Once you have a team that understands how to do this well, quickly and efficiently, you can use that team to follow this generic strategy. The question was: what type of metadata was useful for implementing specific requirements of the system?
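The reverse-engineering step mentioned above, turning DDL into a queryable repository, can be sketched roughly as follows. The talk used Microsoft Access; sqlite stands in here, the DDL parsing is deliberately naive, and the sample schema is an invented stand-in, not the real FTI tables.

```python
import re
import sqlite3

# Invented stand-in DDL; only the table name echoes the talk's ft_t_abdf.
ddl = """
CREATE TABLE ft_t_abdf (
    abdf_id   INTEGER,
    abdf_name VARCHAR(40)
);
"""

# The "repository": a metadata table describing the schema's columns.
repo = sqlite3.connect(":memory:")
repo.execute("CREATE TABLE meta_columns "
             "(table_name TEXT, column_name TEXT, data_type TEXT)")

# Naive DDL parse: pull out each table and its column definitions.
for table, body in re.findall(r"CREATE TABLE (\w+)\s*\((.*?)\);", ddl, re.S):
    for line in body.strip().splitlines():
        name, dtype = line.strip().rstrip(",").split(None, 1)
        repo.execute("INSERT INTO meta_columns VALUES (?, ?, ?)",
                     (table, name, dtype))

# "Click on a table and show us the columns":
cols = repo.execute(
    "SELECT column_name FROM meta_columns WHERE table_name = ?",
    ("ft_t_abdf",)
).fetchall()
print(cols)  # [('abdf_id',), ('abdf_name',)]
```

A real reverse-engineering pass would use a proper SQL parser and also capture primary keys, foreign keys, and usage, but the principle is the same: the schema itself becomes data you can query.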
So for example, one of the tasks in implementing PeopleSoft was to understand the workflow metadata by listing all combinations of a step name along with each associated component, business process, and home page. And when I say that, that is the structure of PeopleSoft: home pages have one or more business processes, which are associated with one or more components, and each component has multiple step names. So it's a one-to-many relationship from home pages to business processes, business processes to components, and components to step names. But which ones? To find out, we pulled together a quick query, and I have to give a shout-out to my former student, Bibiana Dway, who is still working in data down under in Australia; we've had her name on a lot of different projects. This was a complex PeopleSoft query she wrote that pulled back a 13,000-line report. What did we do with that report? Well, we exported the metadata so we could pull it into a spreadsheet, the dreaded spreadsheet for managing metadata, so that we could do complexity analysis and understand the validation and everything else around it. And here's a copy of the spreadsheet; you can see Bibiana's name right at the top. What you're looking at is the top of that 13,000-line query result. You can see the business processes labeled in pink: Administer Base Benefits. The next business process starts immediately below that, Administer Benefits Billing, and Maintain Billing Accounts is its component. But if you go back up to Administer Base Benefits, it's easy to see there are four components: Manage Benefit Enrollments for the U.S., Manage Dependent Benefits, Manage Leave Accruals, and Report Benefit Participation. Those are the four components of Administer Base Benefits, and each of them had multiple steps.
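The one-to-many chain just described (home page to business process to component to step) can be sketched as a miniature of that workflow-metadata query. This is not Bibiana's actual PeopleSoft query; it is a tiny sqlite stand-in with a couple of sample rows paraphrased from the talk.

```python
import sqlite3

# The PeopleSoft-style hierarchy as four tables, one per level.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE home_page        (hp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE business_process (bp_id INTEGER PRIMARY KEY, hp_id INTEGER, name TEXT);
CREATE TABLE component        (c_id  INTEGER PRIMARY KEY, bp_id INTEGER, name TEXT);
CREATE TABLE step             (s_id  INTEGER PRIMARY KEY, c_id  INTEGER, name TEXT);

INSERT INTO home_page        VALUES (1, 'Benefits');
INSERT INTO business_process VALUES (1, 1, 'Administer Base Benefits');
INSERT INTO component        VALUES (1, 1, 'Manage Benefit Enrollments');
INSERT INTO step VALUES (1, 1, 'Enroll Employee'), (2, 1, 'Verify Eligibility');
""")

# The miniature of the 13,000-line report: every step with its
# component, business process, and home page.
rows = db.execute("""
    SELECT hp.name, bp.name, c.name, s.name
    FROM home_page hp
    JOIN business_process bp ON bp.hp_id = hp.hp_id
    JOIN component c         ON c.bp_id  = bp.bp_id
    JOIN step s              ON s.c_id   = c.c_id
    ORDER BY s.s_id
""").fetchall()

for row in rows:
    print(row)
```

With the real data, the same four-way join fans out across every home page, business process, and component, which is why the actual report ran to 13,000 lines.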
You can see the steps listed there as well. Once I have this information, I could import it back into another one of those dreaded Access databases. We're just going to call it the repository here, but it was, again, an Access database that we put together so that it could then be queried. Now, the first time we did it, we only had to import it once. The second time we did it, we had to integrate the existing metadata with the new metadata. So the structure of the database, the metadata base in this case, became a little more complex each time. And each time we did this, the resulting richer metadata would spawn another query. And I'll show you what we did with this stuff as we went through it. But of course, the last step, again, started the cycle all over again. So the key to this strategy was to develop a team that, for the duration of the implementation of this very large, complex piece of software, knew how to go in and pull specific pieces of metadata out so that we could use them to answer some questions. And I will now show you some of those questions. One of them had to do with understanding the requirements. For example, the customer in this case was under continuous audit. Not that we might get audited; it was going to be audited, continuously, as we went through. And one of the things they said was show us that this software meets the requirements. So we used metadata to link specific functionality, and we were able to demonstrate each requirement. Requirement number 14 was demonstrated by this screen, and we could point to it. And the auditors were thoroughly, thoroughly happy. Similarly, we had some modifications, of course, that we were making to the system. And as we did this, we were able to go through and say, hmm, I could either change the business practice, or I could change the software, or I could do a little bit of both. What should I do? Well, the only way you can answer that is if you know how many.
So how many panels would be required if a given field was doubled? And we were able to go through and give the auditors, as well as the people planning the project, very concrete information that said the cost of this change is approximately x, y, and z. When we looked at the various process components and mapped them to the user activities, one of the things we noticed was that a screen in the legacy system that had all the information on it, that same information was now spread in one instance over 23 different screens. And when we looked at those 23 different screens, we said, wow, it would make sense also to reorganize the business practices that were there. And as we looked at those various business practices, we now had the opportunity to say, hey, what can be changed in the software? What can't? We could also align the various practices. So some of the things the software did gave us good functionality, and some of them kind of left us hanging. So we could compare the system's inputs and outputs with our own specific information needs. And I should tell you that the way we did this was very practical. We took every screen in this PeopleSoft implementation and we added it to the Access database that we had put together with all the rest of this metadata. So the screens for PeopleSoft themselves became metadata. Why would somebody want to do that? Well, I already mentioned one reason, which was the requirements work that people were doing. But now, well in advance of the software being implemented, we were actually able to start training. And we could do analysis to see how comprehensive the system was and how much we needed to change the various business practices involved. The training person came to us and said, I hear you guys have some stuff that might be useful for me. And we said, yes, that would be metadata. They said, I don't care what you call it. I need to know how to do training.
Well, again, I showed you the example earlier where we had the full business practice analysis in order to look at this information. And the training manager said, wow, I can use this to actually see some of the things that are going on. We grabbed some additional metadata, because we had also reverse engineered and obtained metadata from Legacy System 1 and Legacy System 2, that allowed us to do mapping between the various data elements, not just on a spreadsheet basis, but mapping into a tool that had full definition support. We also used the metadata to design the extract database so that we could redesign the physical database and also show it in logical user views. PeopleSoft at the time had a physical relational database that was very, very nice, but it wasn't really supportive of the logical views involved. We were able to derive a set of logical views that we then ran by the security people and said, hey, this is good information for how to divvy up the security. We also just ran some statistics on this as well. And I'll show you this one last chart that we put together. So here we were looking at the practice called Administer Workforce. And we just asked the number of steps that were involved in each of the sub-processes. So again, what you're looking at is Administer Workforce, composed of Recruit Workforce, Manage Competencies, Plan Successions, Administer Training, Plan Careers, and Manage Positions. And it's quite obvious to see here that within Administer Workforce, Recruit Workforce and Manage Competencies are more step-oriented. It doesn't mean they are necessarily more complex, but it's sure a good indication that those pieces are much more complex than Plan Successions and Administer Training would be.
So all of these bits that we were able to pull together, we were able to use in a number of different capacities to take a look at how metadata strategies could be used to complement the implementation of this new system. And again, if you're interested in the article, you can find it on the web. Just look up Reverse Engineering New Systems. I think there's a dozen copies or so on the web. If you have trouble finding it, let me know. We'll send you a copy of it. Let's move on to one more example here, which is just to say that as you are doing any sort of a mapping, so I'm going from a legacy system on the left-hand side to the new system on the right, that process is a many-to-many process. And the analogy that I use is: let's pretend you were desperate to solve a puzzle, and everybody was running around with a piece of the puzzle and comparing their piece to everybody else, who also had exactly one piece of the puzzle. The analysis is completely unfocused. That's the way most mapping is done. However, if you use metadata to focus your many-to-one and then one-to-many in order to go into and out of your system, the analysis now becomes focused. So the second half of this diagram takes the legacy system on the left-hand side and maps it to a standard metadata model. It could be a model of the actual application, if you have it or have reverse engineered it. If not, then you can use one of those reference models that I was talking about as well. A lot of different options that you can use. And of course, our data modeling community here will notice that this is the same thing as solving a many-to-many problem by putting an intersecting entity in the middle of it. Yes, that's exactly right. That is the value of metadata in this case. So instead of everybody just running around with a piece of the puzzle and saying, does it fit?
We can actually fit to a target and then take it from that target into the new system, a much more efficient process. I absolutely consider this best practice. And if somebody is trying to tell you that they can convert your data without using this approach, I will tell you that it's going to cost a lot more money than you probably planned on spending in the first place. Final example of a metadata strategy here: our good friends David Marco and Mike Jennings put out a terrific book a couple of years ago. The reference to it is down in the bottom right-hand corner there. What they did in their book is they included the Erwin models so that you could actually look at the metadata that came with each of the models. So one of the things we talk about is don't start off with a blank sheet of paper. There are so many sources of information that you can get in order to look and see what will likely move you ahead very, very rapidly in your environment that it is silly not to. Their book is, I'm sure, less than $100 on Amazon. It's so much more productive to do that. I would be remiss if I didn't also call out David Hay's Data Model Patterns book and Len Silverston's Universal Data Models series. These are great sources of metadata models that are out there. So you do not have to start from scratch. And again, we owe David and Mike very much thanks for doing this. Notice they spelled metadata wrong, but again, that was earlier days. So we won't pick on them for that. So those are some metadata strategies and how you implement them. Let's drop back a bit now and talk about what it is we're trying to do. And the idea here is that you're trying to provide organizational understanding in terms of how something is used and applied within the organization, to integrate all of this metadata from a set of diverse sources, to provide easy, integrated access to the metadata, and to ensure that metadata quality and security are good.
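The mapping-hub idea above, resolving a many-to-many mapping through a standard metadata model the way a data modeler resolves it with an intersecting entity, can be sketched in a few lines. All the field names here are invented for illustration:

```python
# Instead of comparing every legacy field against every target field
# (many-to-many, n*m comparisons), map each side once to a standard
# metadata model: n + m mappings, and the analysis becomes focused.

# Legacy fields -> standard model terms (many-to-one into the hub).
legacy_to_std = {
    "CUST_NM":  "Customer.Name",
    "CUSTNAME": "Customer.Name",   # two legacy spellings, one standard term
    "ADM_DT":   "Admission.Date",
}

# Standard model terms -> new-system fields (one-to-many out of the hub).
std_to_new = {
    "Customer.Name":  "PS_CUSTOMER.NAME",
    "Admission.Date": "PS_ADMIT.ADMIT_DT",
}

def map_field(legacy_field: str) -> str:
    """Legacy -> standard hub -> new system (the intersecting entity)."""
    return std_to_new[legacy_to_std[legacy_field]]

print(map_field("CUSTNAME"))  # PS_CUSTOMER.NAME
print(map_field("ADM_DT"))    # PS_ADMIT.ADMIT_DT
```

The design point: each new legacy source only has to be mapped to the hub once, rather than to every target it might feed.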
I have to tell you, this is a great piece. I was doing a joint presentation with a colleague from UPS. And she called me up and said, woo-hoo. And I said, oh, you're excited. What are you excited about? She said, I just got a call from our lawyers in Atlanta. And they told me to be very careful about what we tell in this presentation, because metadata is a strategic corporate asset. And I was like, you got that through to the lawyers. We have made progress here. Thanks for that, Pat. That was a great story and a lot of fun back in those days. Let me give you a very specific example on this. I've got a hospital system that we worked with recently that had one of these very large ERP packages in it. And the ERP package worked really well. And it had a field in there called Admit Date. Now, you might think that was pretty clear. Everybody understands what Admit Date is. But if you know anything about health care, you know that actually they don't. We did an audit. We found 11 different definitions. People were using Admit Date in 11 different ways. Problem? Yes. Because as soon as we tied those to funding sources, it made tens of millions of dollars of difference for this health care system. So one data field, one metadata term being used multiple ways, where everybody thought they were doing it correctly, was costing this health care system tens of millions of dollars. It is an amazing statistic. And it happens every time we go in and do an organizational analysis on this. So you've got to understand that Admit Date means the following, everywhere, for the entire organization. And then if you misuse it, you are incorrectly using the system in terms of how it's supposed to be used. Let me give you one more example on this. Again, I did about four years with Nokia. Nokia is a fine company. And they had something there. The group that I was working with was very much an engineering-oriented culture.
And when we would sit down in meetings and use a term that was not understood by everybody in the meeting, their culture was such that they would immediately, as a group, collectively turn to the terminal at the corner of the room or one of their laptops and say, is that term in the Nokia term bank? So if we had been using the example I gave you before of Admit Date, they would go into the Nokia term bank, which was a website that they had internally for all of their people to use. They would type in Admit Date, and they would have come up with the standard organizational term for Admit Date. And it would say, oh, Admit Date should be used in this way throughout the entire organization. And they would breathe a collective sigh of relief. If they didn't find the term in the term bank, then the Nokia team would decide very quickly at the meeting whether it should be included. And if it was, one person would be assigned to make a little recommendation that would go, in an email or a form, to the term bank users group. And the users group would meet periodically and decide whether the term was worthy of being included in the Nokia term bank and what definition should be used. Now, this only worked because Nokia had a very strong culture, and Nokia was very, very committed to making sure that they understood how to use this information all the way throughout. Again, very, very good goals and principles, and a very good example of how it's done. So what are the activities? Well, you're going to want to understand the requirements, define the architecture that you need to have, and develop some standards. If you've got the ability to leverage additional standards, it makes sense to use them. And to put this into something that can be used.
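The core of a term bank like the one in that story is very small; the governance around it is what makes it work. Here is a minimal sketch, assuming an invented term and definition (the real Nokia term bank was an internal website, not this code):

```python
# Minimal term-bank sketch: standard terms mapped to their one
# organizational definition. Entries here are invented examples.
term_bank = {
    "admit date": "Date the patient was formally admitted to inpatient care.",
}

def lookup(term: str) -> str:
    """Return the standard definition, or flag the term for the
    term-bank users group to review, as in the Nokia process."""
    definition = term_bank.get(term.strip().lower())
    if definition is None:
        return f"'{term}' not in term bank; submit to the users group."
    return definition

print(lookup("Admit Date"))      # the one standard definition
print(lookup("Discharge Date"))  # flagged for the users group
```

The lookup normalizes case and whitespace so everyone in the meeting hits the same entry; the "not found" path is the hook into the review process.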
Now, I will say here that many organizations have a big hurdle in the sense that there are a number of tools out there that are very fine metadata management systems, but organizations are not really mature enough to use them. So it doesn't do any good to spend a lot of money on a very complex, expensive software product if it's not able to be used. Again, the analogy is handing the keys to a Tesla to a 16-year-old; probably not a good outcome. All of these vendors would rather not have their products end up as shelfware. They'd actually rather have them used. So you can create your own metadata management or repository-like functionality. You can integrate the metadata. You can practice using Access. We do an awful lot of metadata management using SQL Server. And this can be used to distribute and deliver the metadata. In Nokia's instance, for example, it was simply: we go to the term bank every time we find a term that we're not sure we agree on the meaning of. And consequently, we want to use it all the way throughout. And then, of course, you want to make sure that everybody can distribute and utilize the metadata. Let's talk specifically about standards in this context. Again, there are two types, including consensus standards, and this framework here just shows how they are related and how they can be used. Standards work is some of the most thankless work that you can imagine. It's very, very problematic, but we should pay attention to it, because there are a bunch of very, very good standards that we can use out there. One that's worth watching is the Common Warehouse Metamodel. If you're doing warehousing, CWM is an evolving standard from the Object Management Group, and they're extending it out in the same way that the AD cycle was before, but I think in a little bit more practical fashion. So again, not that we're picking on them.
They've actually done some really good work. So if you're doing warehousing, it's certainly worth looking this one up and following it. So what are the deliverables out of your metadata management practice? Well, number one is certainly some repository-like functionality. Now, again, we've built a lot of these for people over the years. Many of you have shown me ones that you have built, and they are terrific. Eventually, if your organization becomes serious about it, you will want to get an industrial-strength one. But as long as your user group is measured in dozens and small amounts like that, it's actually perfectly fine to build your own. You want to increase the quality of the metadata that's around it. You want to show people how to analyze the data. So now we're introducing something called a metadata scientist. Oh, my God! Just what we needed, another term. We won't go into that one. Data lineage is going to be important again. Where did the data come from? Another word for this, as I already said, is provenance. I don't know why we need a fancy term for it. Where did it come from? Right? But that's how people understand it. We can use this for change impact. So when we make a change, just think about Y2K as an example. If we'd had all that metadata and we wanted to know where everything had to be changed from a two-digit to a four-digit year, we could have done that very, very easily, and we would have fainted early on at the size of the price tag instead of discovering it as we did. Certainly you want metadata control procedures, models, architecture, and operational analysis so that you can improve the efficiencies of it. There are a number of specific roles and responsibilities where you have participants, consumers, et cetera, et cetera. The important point here is to find out what yours are. Concentrate on a couple of them. Get them started.
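The change-impact idea above can be sketched directly against a column-level repository: given metadata about every column, a Y2K-style question ("where does a two-digit year field have to widen?") becomes a filter. All table names, column names, and sizes below are invented for illustration:

```python
# Hedged sketch of change-impact analysis from a metadata repository.
# Each entry is column-level metadata; names and sizes are invented.
columns = [
    {"table": "orders",  "column": "order_yr",  "type": "CHAR", "length": 2},
    {"table": "orders",  "column": "ship_date", "type": "DATE", "length": 8},
    {"table": "payroll", "column": "hire_yr",   "type": "CHAR", "length": 2},
]

# Every two-character CHAR column holding a year must widen to four digits.
impacted = [(c["table"], c["column"]) for c in columns
            if c["type"] == "CHAR" and c["length"] == 2
            and c["column"].endswith("_yr")]

print(impacted)  # [('orders', 'order_yr'), ('payroll', 'hire_yr')]
print(len(impacted), "columns to widen")  # the early price tag
```

The point is not the filter itself but that the question is answerable at all: without the repository, the same analysis means reading every schema by hand.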
As I mentioned several times here, again, you have all the tools already in your environment to do everything that you need to do to get started on this. And my guidance to you is to start and play with it for a bit. And after you've played with it for a year or so, then go out and have a really good conversation with the industrial-strength vendors. They will appreciate it, because you'll be able to really play stump-the-salesperson, and they'll get their sales engineers involved, and you can ask them really serious questions. But an organization just starting out probably should avoid going in too early to buy some of these things and spending too much money, because it runs the risk of not being able to produce a return on investment. I'm not going to go through these 15 principles. They are all very good, coming from the Data Management Body of Knowledge that I've mentioned a couple of other times. But I do want to sort of wrap up here at the top of the hour with an example for you to take away. And there's two parts of the takeaway. The first one is there's got to be a pressing business value for somebody to be interested in metadata. So one of the groups that we worked with said, we're trying to identify our customers. And this example is actually a DataFlux example here, but it was still a very, very useful thing. DataFlux is now part of SAS, by the way. But the example was cogent all the way through. So on the website, the software tends to put your initials in it. So the website picked up from the browser that this customer's name was J.E. Smith. By the way, carrying on the musical theme that Shannon and I do, we've been watching some G.E. Smith videos out there which are fascinating to watch. He's the guitarist who played with Bob Dylan and a number of other people. He's got a great web series that he runs. So you can check it out on YouTube. Back to metadata, okay, J.E. Smith. But now the same customer called the call center. And the call center had a crackle on the line.
So it was, you know, hard to hear. So they wrote the name down as John E. Smith. Okay, well, that's another customer; it's a different customer from J.E. Smith. Then, of course, probably like you, I tend to write my name wrong when people ask me these things so I can see who's selling my data to what other suppliers. So J.E. Smith wrote her name this way on this third-party list, and they went, whoa, that's three customers now. Great, okay. And then, of course, we've got the customer prospect database, where J.E. Smith actually shows up in the customer database. And the problem is this company thought they actually had four times as many customers as they actually did. That's an issue. And that's a terrific issue to show how metadata management can help you to get the correct number of customers that you're going after, which means then it's more likely that you will actually have better sales forecasting. That's a business example. Here's a technical example. I'm pawing through some data, and I'm using a tool here that we call a data analysis tool or a data profiling tool or a data discovery tool. They're known by all three names. And what I've discovered here is that in the blue circle in the upper left-hand corner, there's an inferred minimum value on a field called PayCode. So PayCode is a column in the database. And the inferred minimum value is an asterisk. An asterisk, what is that? I don't know. Well, I remember somewhere that somebody said to me, ah, you know, about 11% of our customers get paid by the British payroll system. So I do a frequency distribution. If you follow the red arrow down to the right-hand corner, you'll see that the asterisk corresponds to 587 values that are there. In other words, I double-click on PayCode and I get 587, which happens to correspond to 11.4918% of the data values there. And I double-click on that, and it turns out, gee, everybody in there has a payment method of UK.
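That frequency-distribution step is the heart of most profiling tools, and it is easy to sketch. The data below is synthetic (the real column had 587 asterisk rows, roughly 11.49%); the pattern of counting distinct values and their percentages is the same:

```python
from collections import Counter

# Profiling sketch mirroring the PayCode example: an unexplained '*'
# value turns out to cover a consistent slice of the rows.
pay_code = ["M"] * 70 + ["W"] * 20 + ["*"] * 10  # synthetic column values

freq = Counter(pay_code)
total = len(pay_code)
for value, count in freq.most_common():
    print(f"{value!r}: {count} ({100 * count / total:.2f}%)")

# The '*' bucket can then be cross-tabulated against payment method
# to discover that it lines up with the UK payroll system.
```

Running this prints each distinct value with its count and percentage, which is exactly the "double-click on the column" view the profiling tool gave.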
So the metadata, even though it wasn't clear, allowed me to visualize and see what was actually happening in that particular piece. Again, these metadata pieces solve specific business problems. You do it all the time. Your colleagues do it all the time. They just don't know that it's metadata. So I thought I would finish with a teachable example here, and I'm going to go to iTunes. Now, when you take iTunes and you stick a CD in it, what iTunes does is it counts the number of tracks. Okay, 25 tracks. And it determines how long each track is, because that's all the metadata that is stored on that CD. But you never notice that piece, because most of the time when you stick your CD into iTunes, you're connected to the Internet. And it connects to something; in the top corner you'll see the so-called Gracenote media database. And it takes some information, some metadata about the CD, and it has developed a way of linking it up. And notice when it connects to Gracenote, Gracenote very nicely supplies you with the CD name, the artist, the track names, the genre, and the artwork for that particular album, because it sure would be a pain to type in all that information. Now, many of you didn't know this, because it happens almost instantaneously. But this is, of course, a wonderful use of metadata in the iTunes environment. And if you don't use this metadata, it becomes a real problem for you. For example, if I now want to come up with a smart playlist for all of my Miles Davis albums, I can make a rule that says put in this smart playlist everything from the artist named Miles Davis. And when I do that, it now lets me have a Miles Davis playlist. And when I did this, and this was actually real time when I was doing it, I found out I had a second Miles Davis album in there. So I didn't get the desired results.
I thought I would have The Complete Birth of the Cool in there, but I actually had a Live at the Fillmore East as well. So I need to fine-tune this. And I need to say, okay, well, the metadata request then should not be put everything that's Miles Davis in there; if I only want a playlist for The Complete Birth of the Cool, I should really specify that the album contains The Complete Birth of the Cool. And now we can get the actual metadata piece. This is a wonderful way of illustrating to anybody the value of metadata. And I can now move Miles Davis to a folder and put it into a larger collection of Miles Davis albums. But now let's take it one final step further. Notice that that structure works exactly the same way. Interface, processing, and data structures are applied to the podcasts, the movies, the books, the PDF files that are out there, because they've applied the same metadata management procedures. The economies of scale: even though we're not real happy with the interface of iTunes, we're happy we have a place where we can manage all of this stuff at once with one set of metadata management. So hopefully that's an example that you can demonstrate to anybody else. A couple quick takeaways then as we get to the top of the hour, and I'll look forward to your questions. Again, remember our definitions. Metadata unlocks the value of data and therefore requires management attention; it requires specialized team skills and usage. Metadata is the language of data governance, and it defines the essence of any integration challenge. And metadata is what really connects the bits that we try to do in data management, whether it's metadata management and data architecture and data governance or whatever, to all of the other bits and pieces. Again, there's a really wonderful summary that you can use from the DMBOK piece.
And remember, don't try and do this, which is the boil-the-ocean approach, where we have entirely too much metadata, but it's perfect if the organization would just stop evolving for five years. So I've included a couple of other sets of references around this for you to take a look at. But now we get to the top of the hour, and it's time to ask Shannon: what sort of questions do you guys want to talk about in metadata? There's a lot of great questions coming in. Just submit your questions in the Q&A in the bottom right-hand corner, and of course, one of the most popular questions that we always get is people asking about a copy of the slides and recordings. Just a reminder, I'll be sending that out within two business days for this webinar, by end of the day Thursday, with the links to both of those things and anything else requested throughout. So we had questions coming in pretty early. Pretty exciting. How can we integrate NoSQL into this metadata repository, which is focused from a relational perspective? Great question. And we don't actually have a lot of terrific answers that are embedded. So one of the things Shannon and I were talking about early on is that we recognize that accounting has literally 7,000 years at least of tradition. If we've been doing this for 100 years, we're kind of stretching it. So just to give a little context around all of this. The relational model is what most people are familiar with. We teach lots and lots of kids in school, college, and university how to do SQL so that they can get information out of relational databases. NoSQL, remember, does not mean no SQL. It stands for not only SQL. So the question is how can we relate SQL and not-only SQL? We'll call it plus SQL, or extra things that go into it, in order to manage the metadata. Now I'm going to take it a step further, and I mentioned the, quote, unstructured data before. It's a terrible term. Just like NoSQL, it's a terrible label.
But these are the labels we're dealing with, so at least we know what we're talking about, because we have standardized them. So when we talk about unstructured data, we're really not talking about unstructured data. The definition of unstructured data is a blob of jelly that you can't nail up to a wall. It doesn't work. That's the definition of unstructured. What we're talking about more likely is semi-structured data. Now maybe it's a jellyfish that has a little bit of a skeleton, and you can't really hang it on a wall, but you can certainly capture it in a net. When we talk about NoSQL, the NoSQL community has done a fabulous job of showing how not-only-SQL uses can enhance our existing traditional relational database management technologies. But let's not stop there. Let's also extend it out to include, in addition to this, network and hierarchical databases, something that is not even taught in colleges and universities. All of the books on them have now had these chapters removed, and yet if you took non-relational processing away from the banking community, they would stop processing tomorrow. So we have to look not only at SQL, but at the big data technologies that go along with it, which means that we have some new ways of doing it. Remember, the primary motivation for the not-only-SQL community is that we are taking what previously was structured data. Our previous approach was to take data and structure it into a relational database management system. We could write a query on it, and we knew we could always get information out of it, because we've cranked out thousands of kids that understand how to write SQL. Well, now we're applying a little bit of brute force to this, and the brute force approach is a little different, in the sense that we're taking the data, and instead of putting it in a relational database management system, we're putting it on a very flat baking tray.
And the baking tray is actually not a baking tray, but a very fast flash drive. And by not putting it on one flash drive, but on multiple flash drives, we're going through it. We're processing it. We're parsing it. We're doing the data discovery that I was telling you about earlier on in a parallel fashion. Only certain types of problems can be usefully dealt with that way. Most of them are new and novel, so they give us some really tremendous opportunities and characteristics, but we can't forget the relational models, and we can't forget the non-relational models, the hierarchical and network databases that are out there as well. All of these have to be pulled together, and the only people who are qualified to do this are your data management people in conjunction with your business people. So we go back to the strategy approach, which is to say: what business problem are you trying to solve, and then can this particular set of approaches, in combination or separately, actually result in a solution that is worth the price? Now I have to tell you, when we look over the last five years of, quote, big data, unquote, it's about as successful as most IT projects, which is to say not so much. So we really do need to come back and rethink this, and one of my more popular talks this year is actually rethinking approaches to big data, where we're seeing an awful lot of people that are saying, okay, first attempt didn't work so well, now what can be realistically done? So again, just to give you an example, SQL is what's going to tell you the average payroll. You're not going to run your payroll using big data techniques, but you might want to run your fraud first-alert systems on a big data platform, because that might in fact work very, very well, so that you can combine your existing rules-based fraud approaches with a newer approach that is more of a real-time, environmentally scanning ability.
So I know that's a long, long-winded answer to the actual question, but the idea here is that these things complement your existing stuff. They don't replace it, and if we don't learn how to take what we have and build on it gradually and carefully, we have no ability to make it actually work in the real world. Did you get all that, Shannon? I think so. There'll be a place for it, right? It's a common question that's coming up in NoSQL webinars, too. It's just, how do we deal with the metadata? It's a very important thing. The next question coming in is on the metadata repository versus reverse engineering: how do you relate business terminology to the reverse engineered repository? Okay. Good. I think the question is asking, if I go from a series of SQL statements and reverse engineer them into a, quote, repository-like functionality. Again, we're not going to name any specific repository vendor or anything like that, but just simply to say that we want to pull it into somewhere where we can start to query it. How do you go about pulling in the data there? Actually, that article that I described to you, the Reverse Engineering New Systems one, gives a very specific example of how we did it in that particular case. What we were able to do was that once we had put the core metadata pieces together, we actually ran through the documentation. And so when you would click on a word, it would immediately flip into a PDF version and have all those terms highlighted. It didn't show you necessarily only the right one, but it showed you everywhere that that term or set of terms occurred in the documentation. And good discipline. Again, a company like Nokia could implement this. All you have to do there is cut and paste the thing where you find the actual definition, and add it and maintain it, and tell your teams that this is part of the documentation that they have to produce.
And slowly and gradually over time, your key set of terms that people have asked questions about will be fully defined in your repository. There are some more advanced techniques that you can use for the logic and pattern matching and things like that that we've done. We don't have time to get into them here, but let me just assure you, if you've got data that is in machine readable format, we can pull it in and I know we can do this because when we did the DOD data model back in the late 1980s, early 90s on this, we actually had the Code of the United States. And so when we looked at something that was an entity in the DOD, which became the Federal Enterprise Data Model, out there, we were able to go back in and find the relevant section of the Code of the United States, the federal statute that told us why we could do certain things. And that process, even though it involved tens of thousands of rules, did not take an inordinate amount of time. It was well worth the investment that we did. The entire project creating that was under half a million dollars in 1988 terms. So again, very, very doable process. Again, happy to converse with you on the side outside of that, but to take my word for it, it can be done. Now, several slides back. You mentioned a book. What was the name of the book? I tried to... Boop, boop, boop, boop, boop, boop. Book, book, book, book, book, book. Sorry, I'm drawing a blank. Oh, David, Marco, and Mike Jennings. Oh, yeah. Probably that one? Yes. Okay, so David and then Mike's book in there was a terrific book on metadata, what was it called, metadata practices or something like that. If you look at Marco and Jennings at Amazon, I'm pretty sure they're the only ones that pop up. The other two authors in there were Len Silverstone and David Hay, both of whom have spoken at your event, Shannon, and are just terrific people. So again, all three of those books are great starting places for it. 
And Len's and David's books, they come with the Irwin models on a CD-ROM. So it is the best under $100 purchase you can make in investment in metadata period. I'll have to add it to that. I should get David to pay me for that, yeah. The university bookstore, yeah. I will. And I'll send out a link to that as well in the follow-up email. Can metadata management strategy be implemented at a business unit, BI project level, or does it require a separate approach? Can it be justified as intangible benefit? I find that it's best applied at the granularity that you're describing. If you go much higher in the level of abstraction, it becomes difficult to distinguish from the data strategy. Actually, I have a book coming out on that later this summer, so you can learn more about it when we get that all the way together. But yes, absolutely. So talking about metadata, it is a finer level of granularity, and it's a department, a task-oriented, a project-oriented approach, as opposed to being this grandiose thing that the, oh my gosh, we're going to manage all the metadata in our organization here. If you follow this with the numbers, every piece of data, even if they only have two pieces of metadata, it means your metadata management environment is quite as complicated as your data management environment. So you definitely do not want to have a metadata strategy that says we're going to manage all of our metadata perfectly. Again, go back to this slide here. We don't want to manage it because it's metadata. We want to manage it because it's within the scope of our metadata practices, meaning it's going to provide a positive return on investment. Managing the one data item for the hospital at mid-date was literally tens of millions of dollars the first year we implemented it. That's an impressive return on investment. If they tried to manage all thousand data items that were in there, they'd have got bogged down and they'd never got the return on investment that they needed. 
By the way, they haven't stopped. They've done the first one, and they're now looking at the second one, and third and fourth and fifth, and building their way up, and eventually they'll get to diminishing returns. But as long as they're achieving more return on than they put into the investment, it's clearly a worthwhile project for the CIO to run. Awesome. Now, what's your advice in collecting metadata from unstructured source such as big data, data in the cloud, or web-based applications as opposed to the traditional way of ingesting metadata from structured data stores and ETL tools? Again, great question here. So when you're looking at the structured metadata, most people are able very easily to figure out a way of pulling it in. Just tell me what the rows and columns are, and I can set up a piece to do this. Again, remember, the guidance around big data technologies is to look at them as an adjunct, a complement, a plus, if you will, that you're doing in addition to your traditional. And again, I don't know that they're so traditional, but you're more relationally focused data management pieces in here. So it is absolutely reasonable to take. And think about the way it works on your PC. When you start to take your Windows or Mac PC and you type in a question on it, it searches everything. And it says, where do we find the word metadata? Well, it turns out everywhere in there. So what you're really trying to do, and actually I can do it right here. So let me just give you a quick example here. I'm going to change the viewing here real quick from just that. So it's going to stop sharing for a second. I'm going to share the whole desktop. Okay. So hopefully it's back up there. Hey, Ms. Rappaport, I see you out there. So here's a question, and I'm just going to type in metadata. Notice what it's done in here. Oh, look at that. This is a metadata registry. That was actually one of the things. I was just talking about why you can't make this stuff up. 
So that was one of the projects we were working on, the ladies. But notice what it's done. The top hit it's doing is from my contacts. So it's saying, first of all, I'm going to go for the structured data. But then it's going to go for the semi-structured stuff, which is in presentation. So this was an information architecture course I gave for Shannon a while back. This was a course called Practical Metadata Strategies. Notice when I'm hitting them, it's actually showing me bits and pieces. And there were some documents that it was looking at. And it's some PDF stuff. And then it eventually goes further and further down in terms of its priority. And by the way, if I don't like these priorities, I could change them in here. And notice it's even bringing in tweets. And finally, I guess it's a grudging piece to Microsoft. It's bringing in some Bing searches in here. So there's a great example for how to integrate both of those pieces. The first piece is clearly a structured query from one of the databases that's running here, running my contacts. The rest of the stuff is coming in from some more or less big data-ish stuff. And by the way, that is how the PCs and Macs are keeping track of most of their data and search capabilities on the systems out there. So again, great question. Hopefully the example was useful. All right. So I didn't hear you mention much about scanning tools like ASG or Shade and B-Cubic, Adaptive, et cetera in the build versus buy decision. Are you a fan of Bill? I know it can be a personal choice. No, I think we've got past that point. So that's, again, a great question. What I was saying was that an organization just starting out with the repository technologies should probably wait a year before they go and talk to the people that do this absolutely professionally. But that if you've got a specific business problem, who's going to be better at writing the scanners, your own technology staff, or some people who focus on it for a living? 
And so I would never steer people away from those types of technologies if there's a valid business purpose for it. We've seen it in many, many instances where people will have a specific need for a scanner in order to do this. Those scanners work and the people who do it are professional and they know what they're doing. And I would much rather rely on their knowledge, skills, and abilities than somebody else. We had one instance where we were working with a group and they had developed their own scanning technology and it turned out it was wrong. It was missing stuff. So they would say things like, yes, and here's every instance of this set of phrases within this group of legal documents. And you know what? They weren't even close. And that hurt them immensely in this particular legal action that they were working at. So a very, very big problem that they had in there. So no, absolutely the vendors that do this for a living do it really, really well and should do it really well. So if there's a valid use for it, definitely do it. So definitely I don't think it's a personal choice thing. I think it's a matter of who's more competent at it. And if there's one that's been done out there and it's been tried and proven, I think that you're going to have much better results without them building your own. That makes sense. So Ross wants to know when we can hear your talk on big data. I know we don't have one necessarily scheduled in this series, although in October we're talking about modeling in a sequel. So that certainly will be dealing with big data there. But do you have a general big data talk coming up? Don't take this anything scheduled right now. One of the things I have is a list of places that the universities can have me. So Ralph, if you've got a university nearby, just call them up and get them to invite me to come in as a guest speaker to do that talk and you can sit in on it. I'm happy to do that for you. 
And we certainly have some on-demand recordings of you talking about big data in the past. There is a LinkedIn follow-up email that certainly that takes you to all of Peter's on-demand recordings. For the last, oh gosh, how many years have we been doing this together, Peter? Well, I was going to say, Shannon, so it is appropriate to say at this point that this is the first recording of our sixth year doing this together. And Shannon is too modest to say it, but I will say it doing this. When she opened up Dataversity five years ago, the first month you had 353 visitors, right? Yes. Now they have over 100,000 a month, and I think that is a real testimony to the organization that you guys have built up here. And we're sure glad to be a part of it and have enjoyed our association over the years. Well, it's not quite the hype. It's getting the hype. But there's a lot of big data on there there. So five years' worth of content. But I know we've done the webinars in the past on big data, so there's certainly a place to go for additional information there. Is there any value to this jsoneschema.org to MDM? Are you kidding? Yes. So again, this is the build-it-your-own versus... JSON has a very proven technology for doing exactly some of this kind of work and then doing some of the bridging that we're talking about between sort of the big data and the more structured approaches on this. So yes, super stuff. In fact, Shannon, do your smart data conference come around in August? You know, we could not get the San Jose location again, and we wanted to move it out at the end of August. Anyways, announce our smart data online July 13th. There we go. And then we'll be doing our face-to-face event in the early next year. So that is a terrific event to go look at and get into detail with some of the people who actually do the coding work around this. And again, some of the examples that you see there is the draw-drop. It's a phenomenal event. 
In fact, our keynote, Jim Cobelius, is talking from big data to smart data. The evolution of all of that. So let me get back to the questions. I love all these questions coming in. By the way, if we don't have time, we have half an hour for Q&A for this webinar, but if we don't have time to get to your question, there are still so many great questions coming in. Peter and we get answers written out, and we'll get those also in the follow-up email. So keep them coming in. We'll be sure to get them answered, whether it's in this time or later. Too bad. Yeah, so we kind of talked about tools already. Universal metadata models is the book, by the way. Thank you. Lentil versus Marker. David Marker's book. There we go. Yeah, yeah, yeah. If you have a data model tool like Irwin, can you just export the metadata from tools like that? So the answer is usually yes. Irwin and Marcadero have both been working on trying to get interoperability going and things like that. And these were, at one time, considered revolutionary capabilities, but yes, those tools operate on a database. You can query that database and pull the data right out of them, and I should have said that. So most of the time you can hook Irwin up directly up to any relational database, and it'll pull out the schema and do it exactly that fashion. So yes, absolutely, and make use of those tools because they've put a lot of time and effort into making those things very, very easy to use. Sometimes they run for a while, which is okay, but it's much easier than trying to have to guess. And again, the alternative very often is a blank screen, and people say, whoa, where do I get started? Even if you can pull it back out, you at least can look at it and edit it, which is a whole lot easier. So yeah, thank you for saying that. That's a great addition. Okay. What kind of deliverables are generally associated with metadata management? Dollars. Now, most people don't think that way. 
They think, oh, look, I've got some definitions, right? But if you don't go the next step and show that people using the definitions incorrectly has cost the organization money, time, or something else, then people are not going to understand what it is that you do. So you've got to go out and connect it to something that means something to management, and that's usually dollars. So take it all the way. I mentioned this in my other seminars. There's a great book called How to Measure Anything, and I think we do the Monetizing Seminar when we do that one, Shannon. But if you have trouble trying to get to dollars, it's really worth reading the book. I can tell you I've sold a whole lot more of Doug Hubbard's books than I have my own doing these webinars in here because he's such a great writer. How to Measure Anything by Douglas Hubbard. If you have a challenge with metadata and you can't figure out how to get it to dollars, call me. I will guarantee you we'll get to something. All right. Absolutely. Yes, certainly. I've worked at a very large call center as a telecom analyst, and we analyzed over 1,000 data points and could definitely tell you how much money we were wasting in managing poor quality data. So it was a huge initiative to build a 12-point call. Aren't you glad you're doing this now? Most definitely. Yeah, you're right. It's a huge initiative. The next question is, do we get CPEs for this session? That's a great question. That's more for me. So, yes, you can get... Well, we do give certificates of completion for attending the webinar. Just send me an email, Shannon, at dataversity.net. And my assistant, Victoria, will have... we'll get you a certificate of completion. We'll just confirm that, yes, indeed, you intended. And so you can apply that to however it will be accepted for your certifications and so on and so forth. If you guys think I travel, that's what part of the world Victoria is in right now, right? Well, she's in LA right now. 
That's not nearly as exotic as the last one. Looks like, though, that we actually managed to get through all the questions. So again, just a reminder, I will be sending out a follow-up email within two business days. So for this webinar, by end of the day, Thursday, with links to the recording, the slides, and the books that you guys asked about, anything else requested throughout the webinar. And as Peter has mentioned, we've got some upcoming events. The next webinar is Data Modeling Fundamentals on June 14th. So we hope to see you there. Thank you, Peter, so much for another great presentation. It's fantastic, as always, that this is just such a hot topic and so important. And also, likewise, Peter will be speaking at our Data Governance and Information... Are you speaking at DGIQ, right, in June 27th? Oh, it says winter covers, isn't it? Yeah. And so you can meet him in person and then lie there if you haven't already. It's likewise a great face-to-face event. And thanks, as always, to our attendees for being so engaged in everything we do. We love the dialogue. We love the questions coming in. It's just so important that the dialogue keeps going. So we can always help each other to be better. Peter, thank you, and thanks to all. And I hope everyone has a great day. Thanks, Shannon. We'll talk to you soon.