Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Monetizing Data Management: Show Me the Money, the latest in the monthly webinar series Data-Ed Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the upper right-hand corner for that feature. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag #dataed. To answer the most commonly asked questions: as always, we will send a follow-up email to all registrants within two business days containing links to the slides. Yes, we are recording, and likewise we'll send a link to the recording of this session as well as any additional information requested throughout the webinar. Now let me introduce our speaker for today. Peter Aiken is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also the founding director of Data Blueprint. He has written dozens of articles and eight books. Is it now nine books, Peter? Yeah? Up to 10 now. Oh, 10. I need to update my bio. I love it. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise.
Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. He often appears at conferences and is constantly traveling. Peter, so where are you today? Getting ready to hop on a plane and go up to Boston for the annual MIT International Society of Chief Data Officers meeting. Looking forward to a couple of good days up there. Shannon, just to mention also, most of those books are available through your online web store, so we definitely want to put in a plug for DATAVERSITY there. So good afternoon, everybody. Our topic today is focused on money. I want to tell you a little bit about the motivations behind putting this together. When people hear data, they think technology, sort of a thingy, and somebody else's problem. That's really what they think about. And that makes those of us in the data management profession a little challenged, because when we're trying to get the ear of an executive, in one form or another, to say this is important and you should pay attention, it's often very hard to say, collectively, this is what's going on. So I wrote this book specifically to try to help organizations see some patterns, so that you can take this information and make it useful to your own specific organization. Now, what we're going to do today is, as usual, start out with a quick data management overview to show you where this is placed in context. I'll talk a little bit more about the motivations for the book. And then I'll talk about leveraging data and, most importantly, accounting for leveraging the data. Then we'll present six very, very brief cases that have a monetary return on investment and two cases that have a non-monetary return on investment. In other words, there is something more important than money. Yes, absolutely.
And depending on time, I think we'll get to the legal case, where we can talk specifically about how this comes into play in legal cases as well. As usual, when we finish an hour from now, we'll open the floor up for your questions and answers, where you guys actually teach me a bunch of stuff, which is really, really wonderful. So let's get started and look at data management. Most of you are aware that we have now released the DMBOK version 2 out there on the web. It's a great achievement, and we're really proud of the fact that we've had so many volunteers, so many people helping out with that process and working our way through it. And I want to shout out particularly to Laura Sebastian-Coleman for being the editor of that process, because it was an absolutely thankless job and, nevertheless, Laura, we are going to say thank you. Data is an awful lot like Maslow's hierarchy of needs. For some reason, everybody remembers Maslow from high school. Maslow's insight was that you have some basic needs that must be met in order to get to another level. At the physiological level, if you don't have food, clothing, and shelter, then you will never be safe. If you never feel safe as a person, then it's impossible to become part of a group or to be in love with somebody. And that love, that belonging to a group, a larger thing, is necessary for one to establish self-esteem. And, finally, we get to the top of the pyramid: if you don't have self-esteem, you will never get to what we call self-actualization. Another word for that is flow, which you'll see in the popular literature, but what it really means is working at your peak. And data has a parallel construct as well. We have some things in this golden triangle here: MDM, mining, big data analytics, warehousing, et cetera, et cetera.
The only thing I've changed in 33 years of using this diagram is the set of things that go into the golden triangle. And this is, of course, not the final list either. Sometime next year somebody will invent something cool and it'll go in there and something else will fall out. But what you'll notice is that thematically these are largely technology-based. And because they are technology-based, they solve one portion of what we need to do. But the portion they solve is just the tip of the iceberg. And we know the way icebergs work: if you see an iceberg, get out of the way, because there's a lot more below the surface. The five areas that we have below the surface here are governance, quality, strategy, architecture, and operations. In other words, these are capabilities that organizations need to start to put together. While everybody kind of says, yeah, you need technologies and capabilities, the way we like to think of it is people, process, and technology, the PPT. And by focusing only on the technology piece, organizations end up running into some problems. The question we get an awful lot is, can you do all this by Friday? And the answer is, sure, you can try to do it faster, but it will take longer, it will cost more, it will deliver less for your organization, and it will present greater risk than if you instead learn to crawl, walk, and run your way to the top of that pyramid. Now, if you watch this transition, I'm going to take those five areas at the bottom and move them into something that we refer to in shorthand as the DMM. Thank goodness for Carnegie Mellon University pulling this together, and the CMMI Institute, now part of ISACA, pushing out the Data Management Maturity model. You can see the same five pieces there: strategy, quality, operations, platform, and architecture. Or let's get a little more detailed. If you're going to have a strategy, you've got to have a coherent strategy.
Everybody's got to be following the same strategy. Do you have a professional class of managers that manage your data assets? Do you have the ability to understand and determine what data is fit for purpose, fit for making whatever decisions you're trying to make, using the right technology and the right processes? There are, of course, some supporting processes that you need to have as well. And most importantly about this structure: we don't tend to teach engineering concepts in college and university, and yet they are crucial to understanding it. Part of the scale here is not just to have these practices done, but to understand that the entire foundation, remember this is the bottom half of that pyramid I showed you a slide ago, is only as strong as its weakest link. Now, we rate these on a one-to-five scale. You get one point for having a pulse, two points if your practice is documented, three points if it is published somewhere other people can reference it, four points if you do some measurements around how well you're doing these things, and five points if you actually get together and ask, can we improve the process? Another piece of shorthand here is to understand that this is the same capability maturity model that has an excellent track record of producing better on-time, within-constraint project delivery. We're now applying it to data management, and the goal is to tell your managers, who are already familiar with CMM, that this is CMM for data. Now, what I'm going to do in this next little bit is rate each of these areas. I'm going to just arbitrarily say that data governance, data quality, the platform, and the architecture are at level three. However, there is a weak link in the chain: this particular example, a made-up example but nevertheless very typical, rates only a one on strategy.
They do not have a data management strategy in place. So while they have the necessary components, they don't have the directional component that they need, which means they could put more money into data quality and the entire data operation of this particular hypothetical organization would still only be as strong as the weakest link. So the goal is, in this case, to build up the strategy so that it's also at level three. In other words, let's get everything to level three before we try to turn any of the threes into fours. We care about this because data is our most powerful and yet most underutilized and poorly managed asset. It is our only non-depletable, non-degrading, durable, strategic asset. And by that measure, data really wins out when it's compared to the other assets we have in the organization. Many people have said data is the new oil. I personally hate this description, even though when I was over in Saudi Arabia earlier this year, they loved it over there. The problem is, if you think about data as oil, we don't think too much about the stuff we put into our tanks. Whether it's oil or gasoline or any other petroleum derivative, it is not designed to be reused, and it is not more valuable the second time it's used than the first. So I like to think of it a little differently. Data instead is the new soil. This brings into play two additional components that gardeners all appreciate and that data people wish everybody else appreciated. First, you do not just throw the seeds on the ground. You prepare the ground; you need a good foundation on which to build your garden. And second, you don't plant things on Tuesday and expect to eat tomatoes on Thursday. It takes time. So the gardening analogy works out really well. I don't care if you want to call data the new bacon; if that helps you sell the thing, that's just terrific.
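The weakest-link scoring just described can be sketched in a few lines of Python. This is only an illustration of the made-up example from the talk; the area names and scores are hypothetical, not a real DMM assessment.

```python
# Weakest-link maturity scoring: the effective maturity of the whole
# foundation is the minimum level across all practice areas.
def overall_maturity(scores: dict) -> int:
    """Return the effective maturity level: the lowest-rated area."""
    return min(scores.values())

# The hypothetical organization from the example: four areas at level
# three, but strategy (the weak link) at level one.
scores = {
    "data governance": 3,
    "data quality": 3,
    "platform": 3,
    "architecture": 3,
    "data management strategy": 1,
}
print(overall_maturity(scores))  # -> 1
```

Spending more on data quality raises one dictionary value but not the minimum, which is exactly the point of the example: raise the strategy score to three before pushing any area toward four.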
But data does then deserve its own strategy. Data deserves attention comparable to other organizational assets. And finally, it needs some professional administration in order to make up for past neglect. So let's get started now and talk about the motivations for the book. The book in particular came from a couple of surveys that we did early on, back in 2013. First: how does your organization define value? The largest percentage of organizations defined value as customer satisfaction; lower percentages said profit and quality of products or services. Then the next questions were: what was the most important goal for your data management project, and what percentage of your data management projects were successful? And you can see here that the success rates came down very, very much: 75% for customer satisfaction, 50% for profit, and only 25% for quality of products and services. You can see there are lots of organizations having lots of challenges with this. We also asked: for the projects that were unsuccessful, what was the problem? And the vast majority said a lack of organizational readiness. Again, they didn't prepare the ground before they threw the seeds on top of it. Or, let's just be frank, they spent thousands, hundreds of thousands, or millions of dollars on data management technologies for which the organization was ill-equipped. They also cited a lack of advocacy, so again, an understanding of what the full life cycle was. And finally, many organizations are still having trouble aligning their goals. Now, this book is ranked about the 1.5-millionth best-selling book on Amazon. That's a good humbling statistic. It's actually kind of nice to be in the top 2 million books. But of course, Amazon is very nice to all of us authors.
They said it's the 765th best-selling book in management information systems and the 641st best-selling book in computer technology, computer science, systems analysis and design, which it has nothing to do with. But thank you, Amazon, for at least making us feel good. Really, the task involved helping those of us in the data management community better articulate the importance of what we do. And while it's good to say it's important, if we aren't able to communicate meaningfully with the people in the higher offices, then they're going to think we're just a science experiment and that we have no relevance to the business. Today's business executives are smart, talented, and experienced, but they are far removed from data and have not been made sufficiently data-knowledgeable. And that leads to too many poor decisions about data. The book is divided into four parts: again, a unique perspective on the practice of leveraging data; eleven cases of specific financial results, of which I'm going to give six today; five non-monetary results, of which I'll give two today; and again, we'll talk about the lawyers, because the lawyers always come into it sooner or later. I do want to share with you a couple of the reviews of the book, though, because I think the interpretation was interesting. It's always nice when people do that for us. This particular reviewer said: my reason for purchasing the book was to learn how organizations are finding ways to monetize their data assets. By that, I mean generating income using their data assets or insights derived from those assets. This is not that book; I'm very disappointed. I do believe I advertised the book correctly by saying we're not monetizing data, we're monetizing data management, which is a different thing. Another part of it was the five-star reviews. And again, a nice comment: it's a book you can read cover to cover on any airplane trip.
Yes, absolutely designed for that. And it's a concise summary of how to put a value on data management in your organization. It's not a how-to book. It's more of a brainstorming book. It inspires people to do it on their own and contribute some cases back to us so we can write a version two of it. So let's move specifically now into leveraging data and how to account for it. And the first question is: does anybody know what the number 42 is? Now, since you're all on mute, you can't answer, so I'll give you the answer. It is the meaning of life, the universe, and everything. And you say to yourself, I'm sorry, what am I doing wasting my time talking about philosophy in a data management seminar? What I did there was pair a fact, 42, with meaning, namely the meaning of life. So 42 as data is a combination of the fact, 42, paired with, oh yes, I know that means the meaning of life. Now, for those of you who are thoroughly confused, pick up a copy of The Hitchhiker's Guide to the Galaxy, which is a wonderful work of fiction. Part of the subplot in that story is that the white mice and the dolphins are running the world, with us as the experiment. They want to find out the meaning of life, so they create a gigantic supercomputer. It runs for 300 centuries and says the answer to life, the universe, and everything is 42. So if you get nothing else from this seminar, you now understand that, at least according to one science fiction writer, the meaning of life is 42. It's also my age, 16 years ago. Does anybody care? No, I don't think so, really, but it happens to be that way. More importantly, you can see that in order to get data, we have to have a structure consisting of facts and meaning. To find out whether I'm old enough to buy beer in Boston, they can pull my driver's license out and look at the fact that my age was 42, 16 years ago. Boy, that would confuse most clerks, wouldn't it?
And they can see that I am in fact a legal purchaser of alcoholic beverages in the city of Boston tonight. In order to get from data to information, though, we now have to start adding additional context, such as what our customers are looking for in the way of information. That typically occurs in the form of requests. And then finally, to get really useful about it, we've got to find out how we can strategically use that information. So not just apply it, but really leverage the information. Now, one of the reasons I show this particular chart, a model that Dan Appleton developed in 1983, is because it shows you that data is a necessary but insufficient prerequisite for information, and information is a necessary prerequisite for business intelligence if we're going to move to the top. And yes, true story: I was in fact flown 6,000 miles one time to help out an organization. We said, hey, so what are we trying to do here? And they said, well, we tried to build a data system about 10 years ago. It didn't work, so we gave up. Then about five years ago, we tried to build an information system. It didn't work either, so we gave up on that. And now we want to build an intelligence system. Well, the problem with that is they wanted to just start at the top and build all of these things without the necessary prerequisites. At that point, you might as well get on the plane and go back home, because they have no more chance of doing that than they do of many other things that we won't talk about right at the moment. Leverage is an important concept for data. And leverage, again, is an engineering concept: a small, well-placed effort moving a much larger load. That is exactly what we want to apply to our data.
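The fact-plus-meaning idea from the 42 example can be made concrete in a small sketch. The class and field names here are mine, not the book's; it simply shows that a bare fact only becomes usable data once paired with its meaning.

```python
# Minimal sketch: data = fact + meaning (illustrative names).
from dataclasses import dataclass

@dataclass
class Datum:
    fact: object  # e.g. the bare number 42
    meaning: str  # what the fact signifies in context

# The driver's-license example: 42 is just a number until it is paired
# with the meaning "age when the license was issued, 16 years ago".
age_then = Datum(fact=42, meaning="age when the license was issued")
years_since_issue = 16

current_age = age_then.fact + years_since_issue
print(current_age)  # -> 58, comfortably past the legal purchase age
```

Information and business intelligence then stack on top of this: information adds the context of a request, and intelligence adds strategic use, which is the hierarchy the Appleton chart illustrates.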
And if we move this into our data space and say that there's our organizational data on the left-hand side, and here are our wonderful knowledge workers on the right-hand side, we now have the people part of the equation. We can then throw in some technology, in this case a lever, which we know works well. However, we also know that a lever works better with a fulcrum attached to it. So this is the little engineering concept we apply: we want our people to leverage our organizational data the same way. That process is replicated in many, many varieties and works extremely well at the workgroup level of your organization. However, when you try to get above the workgroup level, we tend to have fewer of these structures in place. Reducing data ROT helps to increase your data leverage: 80% of organizational data is redundant, obsolete, or trivial, and consequently there's a lot we can do to shrink the size of the task. Underutilized, underleveraged data means you spend more in manpower, more in money, more in methods, and more in machines in order to build all of this structure for your organization. There's a reason for this. And I'm a college and university professor, so I'm going to feel free to criticize the system. If you think about it for a minute, we teach students how to build new systems. But 80% not only of our costs but of our effort goes into evolving existing systems, and only 20% of our IT spend goes to building new systems. So when you're thinking about building a new system, and it's an important system for your organization, does every one of you run straight to the local college or university, hire a fresh grad out of those programs, and say, show us how to build this new, important system? No, it doesn't work that way.
Putting fresh graduates on important new projects makes the proposition absurd; only experienced professionals really should be doing this. And the question is: who's actually managing the data assets? So one aspect is that we teach students how to build new systems, which is wrong; we should teach them how to evolve existing systems. That is a useful skill, and everybody I've talked to says they would much rather have us concentrating on that in college and university instead of on building new stuff. The other part of it is that business thinks IT is taking care of the data assets. After all, it's called information technology, right? And IT thinks that if you can sign on to the system, their job is complete. So we've had this enormous chasm open up between business and IT, into which data has fallen. Finally, there's a third aspect of the incorrect educational focus: systems development practices are set up to support projects very well. But data is not a project. Data has no distinct beginning or end. In fact, when we look at it a little differently, IT projects are set up very well to produce good results, and we've gotten quite good at the process of delivering IT. But data can only work that way if the data does not extend beyond the boundaries of the project. We know that most data is in fact shared across the organization, and that data evolves at a different cadence, a different rhythm, a different timbre, whatever you want to call it. What this means is that data evolution needs to be separated from, made external to, and allowed to precede systems development lifecycle activities. Because in our IT-centric world, we take just the right-hand side of the diagram and say, we've got some strategy, therefore we're going to put some IT in place, and data becomes an afterthought. What happens then is that we develop project-specific data assets.
In other words, we may have a payroll project, and we'll have payroll data. Well, a lot of that payroll data is shared with finance, and a lot of it is shared with marketing, so it's not strictly payroll's. But what we do is duplicate that data project by project. And this works great until somebody says, let's tie it all together very, very neatly, and now we end up with the proverbial Gordian knot. Einstein had a word to say about this that we thought was rather profound: the significant problems we face cannot be solved at the level of thinking we were at when we created them. Our charge originally was to create data for payroll. Now we're saying we want to use the payroll data in both marketing and finance, and it's not optimized for that. Consequently, it's more difficult to use. A better way of leveraging our data occurs when we start with our strategy and immediately move to the information and data layer. That information and data layer should then drive our IT projects, and not the other way around. There's a handful of organizations practicing this right now. And let me speak just a little bit further about that. This is not something where you go back to your organization and say, Peter says we're doing it all wrong, so let's do it this way from now on. But if you've got 10 projects going next year, take three of them that aren't on the critical path and try it. Treat it as an A/B experiment and see if the kind of results that you want come from doing this. If they do, I feel certain that most other organizations will adopt these methods. So now let's talk about what it means to monetize. Really, it's a term of art. Many people think monetize means something like counterfeiting. But what it really means here is that we're trying to take something that most people perceive as not being worth much and turn it into money.
How do we go through that process? Well, our method is an interesting one that I put together from a couple of different disciplines. One is the basic old Reengineering the Corporation material. What Hammer taught us early on was what a task orientation means: I do a task. Say there are four steps in making a nail. The first person may cut the nail, the second person may flatten the head of the nail, the third person may sharpen the point, and the fourth puts it in the box. That's a task orientation: industrial work broken down into the simplest and most basic tasks. Hammer's insight was to look instead at a process orientation. That is to say, what we have is perhaps some value-added tasks and some non-value-added tasks, and as things have grown over time, they have become more or less valuable depending on how they have evolved. Reunifying the tasks into a coherent business process is a way of helping organizations achieve better value from what they're doing. This can't be done, however, without identifying and abandoning the outdated rules and assumptions that underlie the current practice. As I mentioned, this is straight out of Reengineering the Corporation, which is a book that has sold many, many more copies than all the books I've written combined. Another piece that we use here is automated or semi-automated reverse engineering of processes. I'm showing an example on a slide from a company called QPR out of Finland. Basically, think of it as: I'm not sure what's happening in there, so I'd like to get some factual information. Once I get the factual information, I can say, oh, it looks like the first part of the process always occurs here, and the second part of the process occurs two days after that.
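The kind of inference just described, deriving "step B happens about two days after step A" from raw event records, can be illustrated with a toy event log. The log below is entirely hypothetical; real process-mining tools such as QPR's work from much richer logs, but the core idea is the same.

```python
# Toy process-mining sketch: infer step ordering and average gaps
# from timestamped events (hypothetical data, for illustration only).
from collections import defaultdict
from datetime import datetime

# Each record: (case_id, activity, timestamp)
events = [
    ("case-1", "receive order", datetime(2024, 1, 1)),
    ("case-1", "approve order", datetime(2024, 1, 3)),
    ("case-2", "receive order", datetime(2024, 1, 5)),
    ("case-2", "approve order", datetime(2024, 1, 7)),
]

# Group events by case and sort each case's steps by time.
by_case = defaultdict(list)
for case, activity, ts in events:
    by_case[case].append((ts, activity))

# Record every observed activity-to-activity transition and its gap.
edges = defaultdict(list)
for steps in by_case.values():
    steps.sort()
    for (t1, a1), (t2, a2) in zip(steps, steps[1:]):
        edges[(a1, a2)].append((t2 - t1).days)

for (a1, a2), gaps in edges.items():
    print(f"{a1} -> {a2}: avg {sum(gaps) / len(gaps):.0f} days")
# -> receive order -> approve order: avg 2 days
```

As the talk notes, this is semi-automated rather than fully automated: the inferred model is a starting point for a measure, not a guaranteed-correct answer.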
So then somebody puts a measure in place and says it takes about two days to accomplish the first part of the process. This reverse engineering of processes allows us to infer process models in the same way that reverse engineering of data allows us to infer data models. It's a very powerful technique. It does not give you the right answer perfectly every time; it is what we call semi-automated, as opposed to fully automated. The third piece of our little triangle was a cartoon, actually, that I liked so much I bought the rights to it. This is what we call the Sheena tax on society. Poor Sheena is driving to the airport very slowly, causing 120 other drivers to each arrive five minutes late to their destinations. So we can add up the cost of the Sheena tax to society: 120 people times five minutes is 600 person-minutes, divided by 60 equals 10 person-hours for that part of Sheena's trip. Then, of course, she goes through TSA and forgets to take her laptop out of her bag, so she costs another 40 people three minutes apiece. Again, 40 times three minutes is 120 person-minutes, divided by 60 equals two person-hours. Notice, as part of the joke, she's still wearing her shoes, so she's going to delay people further, but that's okay, we won't get into that. And, of course, she happens to be the person at the front of the plane who can't get her one piece of luggage out quickly, so everybody else misses their connections. Let's add that on too: 360 people times two minutes apiece is 720 person-minutes, divided by 60 is 12 person-hours. So Sheena's tax on society is that she's killed an entire person-day. That's a silly example, but it really did come out of a wonderful book. This example is not in the book, but I saw the cartoon and knew it pertained to the book. Douglas Hubbard has written three of the most important books out there.
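The Sheena-tax arithmetic adds up as follows. Note the 360-passenger figure in the third leg is an assumption on my part, chosen so that the stated 12 person-hours and the one-person-day total both hold; only the highway (120 drivers) and TSA (40 travelers) counts are stated unambiguously.

```python
# The "Sheena tax": converting small per-person delays into person-hours.
def person_hours(people: int, minutes_each: int) -> float:
    """Total delay in person-hours for `people` delayed `minutes_each`."""
    return people * minutes_each / 60

highway = person_hours(120, 5)   # slow driving: 10 person-hours
tsa = person_hours(40, 3)        # security-line laptop fumble: 2 person-hours
plane = person_hours(360, 2)     # deplaning delay (assumed count): 12 person-hours

total = highway + tsa + plane
print(total)  # -> 24.0 person-hours, i.e. one full person-day
```

The point of the exercise is not precision but formalization: once the delays are written down as numbers, they can be disputed, refined, and summed.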
His The Failure of Risk Management is a must-read, and his book Pulse is the best big data book that I have read to date, and that's out of more than 100 books. The one we're going to talk about today, though, is How to Measure Anything. He makes a couple of points in it that are pretty straightforward with hindsight. Formalizing things forces clarity: by writing something down, you can now judge it and make it better. Whatever your measurement problem is, it's been done before. You have a lot more data than you think. You need a lot less data than you think. And getting more data is more economical than you think. So you probably need to think completely differently about your data in the process. Now, an example that he gives in the book is a bit that Enrico Fermi used to do with his classes in Chicago, where he would ask the question: how many piano tuners are there in the city of Chicago? And no, you can't use Google, and no, you can't go to the phone book and count them. So you say, oh my goodness, what does that mean? All right, well, let's just take some numbers. The city of Chicago had about 3 million people in 1938. The average number of people per household was two or three. That's an interesting number; I would have thought it was larger, but these are historical numbers. The number of households with regularly tuned pianos was one in three. How often does a piano need to be tuned? Once a year. How many pianos can a piano tuner tune per day? Four or five. How many days a year does a piano tuner work? 250. So the answer to the question is: the number of piano tuners in Chicago equals the population, divided by the people per household, times the fraction of households with tuned pianos, times the number of tunings per year, divided by the number of tunings per tuner per day times the number of workdays per year. And you know what? That's not the right answer either.
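The Fermi estimate above can be run through directly. The inputs are the ones stated in the passage, taking the midpoints where a range is given ("two or three" people per household, "four or five" tunings per day); as the talk says, the output is an estimate to be argued with, not the right answer.

```python
# Fermi estimate of piano tuners in 1938 Chicago, using the stated inputs.
population = 3_000_000            # people in Chicago, ~1938
people_per_household = 2.5        # "two or three"
piano_household_fraction = 1 / 3  # one in three households, regularly tuned
tunings_per_year = 1              # each piano tuned once a year
tunings_per_day = 4.5             # "four or five" per tuner per day
workdays_per_year = 250

pianos = population / people_per_household * piano_household_fraction
tunings_needed = pianos * tunings_per_year
tuner_capacity = tunings_per_day * workdays_per_year

tuners = tunings_needed / tuner_capacity
print(round(tuners))  # -> 356
```

Plug in different beliefs, a population of 3.5 million, or 10 tunings a day, and the model updates accordingly, which is exactly the "all models are wrong, some are useful" point that follows.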
One of my favorite statements about data is one by George Box: all models are wrong, but some models are useful. Well, this is hopefully a useful model, because now somebody can come along and disagree and say, oh, I think the population of Chicago in 1938 was three and a half million instead of three million. Or, I have good knowledge that a good piano tuner can tune 10 pianos a day instead of four or five. Well, now we're making our model even more useful. And what I'm going to do now is put those pieces together in a series of scenarios for you. So let's look at six specific cases that talk about how we go about the process of monetizing data management. The first case came from one of the state agencies that we have here in Virginia. And it was kind of an interesting one. This particular agency had about 300 employees who were spending a fair amount of time doing tasks that were really not value-added. Now, when I say tasks that were not value-added: what they were doing was keeping redundant sets of timekeeping records. In Virginia, one must, by law, fill out a state timecard. That state timecard data went into a statewide timekeeping system which, as with most large systems, produced bad-quality information late, so people never used it, even though they were supposed to and, in fact, were required to. The second system was one that VDOT put together. They said, well, the state system produces bad-quality information late, so let's make our own system for all the VDOT employees. And guess what? In those days, that system also produced bad data late, so it was, again, not helpful. The only system that worked, and worked well, was the one they kept at the workgroup level. Every workgroup understood when people were going to be in or out, because those were your colleagues.
Those were the people who would have to work without you if you missed work for a doctor's appointment or a sick child or something along those lines. And in the summer, let's not forget vacation as well. So when we look at this, VDOT challenged me and said, can you find $10 million? And I said, I think I can. And what we did is put together a slide on this next page here; this is the slide that you're seeing. We looked at the number of employees who were doing these functions, and we looked at them by district and according to pay grade. Now, most of the time, you know, it's really rude to run around and ask people what they make, so we didn't actually go in and ask them their grade. But we did find that there were at least 300 people spending 15 minutes a week doing time and leave tracking for multiple systems, which was clearly not value-added work. Oh, by the way, I forgot to tell you there's a fourth system in all of this. And the fourth system was the group of people who were trying, in this case, to keep consistent information between their workgroup-based system, the VDOT-based system, and the statewide system. So this is a big bear. Now, as I said, you don't have to know how much they make, because we can go to the public records, look up what an entry-level person at a grade four makes, and put this number down. So when we used the numbers, we started to calculate how much time it took for everybody to do all of this work. You can see here at the bottom line: they spent $128,000 annually keeping track of leave and $137,000 keeping track of their time. You add those two together, and that corresponds to the Lynchburg district on here. And we added it up for the other districts and came up with $10 million. This is the steady drip, drip, drip of unproductive workers doing non-value-added work.
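The drip-drip arithmetic behind that slide is easy to reproduce. Here is a minimal sketch in Python; the hourly rate and the function name are illustrative assumptions I've chosen for the example, not figures from VDOT's actual pay tables:

```python
# Sketch of the VDOT-style cost model for redundant timekeeping.
# All inputs are illustrative assumptions, not VDOT's real figures.
HOURLY_RATE = 20.0       # assumed entry-level grade-4 rate, USD/hour
MINUTES_PER_WEEK = 15    # redundant time/leave entry per employee
WEEKS_PER_YEAR = 50

def annual_cost(employees: int,
                minutes_per_week: int = MINUTES_PER_WEEK,
                hourly_rate: float = HOURLY_RATE) -> float:
    """Annual cost of non-value-added timekeeping for one group."""
    hours = employees * minutes_per_week / 60 * WEEKS_PER_YEAR
    return hours * hourly_rate

# 300 employees spending 15 minutes a week:
print(annual_cost(300))  # -> 75000.0 for these assumed inputs
```

Sum a model like this over every district and pay grade, and you get the kind of defensible bottom-line number that was taken to the General Assembly.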
And we went to the General Assembly and showed them how much the savings were, and the General Assembly is now considering eliminating the requirements to keep up with those other systems. Yes, we need to know how much leave people take and things like that, but at the same time, making them record it four times every pay period seems a little bit non-value-added. Okay, let's look at our second example. This is an international chemical company that we worked with on and off; we've done about eight different projects with these guys. And their mission is pretty straightforward. They're trying to come up with engine and machine performance enhancements that help fuels burn cleaner, engines run smoother, and machines last longer. They perform thousands of tests annually, and these tests often cost upwards of a quarter of a million dollars. Now, this was an interesting project. These are great people to work with, and the resources they had in their research division were about 100 PhDs in chemical engineering. I'm going to make it an even 100 just for simplicity's sake. So we have 100 PhDs in chemical engineering, each of them making $100,000; that's a $10 million resource. Now, I have to tell you, we showed the organization this chart, and they said, stop, you're done. And we said, wait a minute, we haven't done what you've asked us to do. And they said, but we've never had a workflow model to understand what it is that these tremendously talented chemical engineers do in order to come up with new and exciting products for us. And we said, well, that's not why we put this chart together for you. The first reason we put this chart together was to show you the piece in the circle there: a $100,000-a-year resource literally taking information off of computer A and retyping it into computer B.
Anybody on this webinar, I'm sure, could have helped them with that problem and come up with a higher-quality, more efficient process for doing that. They were also using sneakernet, handing around USB drives, and in fact, when we started, they were using floppy disks. The problem with that approach is that the data can only appear in one place at one time; there's no opportunity to introduce parallelism. They were also having to manipulate files for specific purposes by hand, because they didn't know what a macro was. They had synonym reconciliation problems; these were problematic as well. We also found that they were sometimes using macros correctly, sometimes using them incorrectly, and sometimes not using macros that had already been created. Again, their PhDs are in chemical engineering. Why would we expect them to understand what an Excel macro was? And finally, of course, the last part of this, some of you have already recognized it: they were using what we call non-sustainable technology. That's an icon for a database called FoxPro. FoxPro was never made Y2K compliant, so you can see how old it is. Why would a PhD chemical engineer know anything at all about Y2K? So let's get to the results of this. We started to add in some good data management that helped them reduce expenses and make this research group more productive. How much more productive, you might ask? Well, the customer told us that they were $25 million more productive every year thanks to this data exercise. Now, I can't tell you all of our clients are like that. I can certainly tell you it didn't cost them $25 million to achieve this $25 million in savings, but I'm willing to bet that if I could guarantee your organization would be $25 million more productive every year after we finished a one-year exercise, it would still be worthwhile to spend $25 million on that activity. Here's another one. This was for one of the military services.
It turns out that when you buy a tank, you're basically buying 3 million parts, 3 million data values that have to go forward and hopefully work in concert. Now, we've been buying tanks for years and years, and it turns out that only one of those data items actually controlled whether a tank was obsolete or not. Now, when I say obsolete: if it's an old tank, we don't want our service members driving it, maintaining it, storing it, or anything else. We should, in fact, if it's really old, give it to our enemies, because it probably won't work very well. Again, I'm not qualified to comment on that, so we'll leave that one where it is. But the point is, when you get this tank and you have 3 million data values, if you don't know which one tells you whether the tank is obsolete or not, you don't know whether to maintain that tank. And what do you do by default? You keep maintaining old tanks so that on the chance they are useful, they will be, as opposed to not maintaining a tank and then potentially needing it. Again, a very reasonable mindset from somebody who's a risk-averse officer in this particular service. Well, we applied some data quality techniques to this, in this case using an Informatica tool. I'm not going to go through it here, but we were basically mapping things back and forth. And in this particular example, we were able to show them that their inventory calculations were off by $5 billion. Now, I can tell you that is a lot of tanks. And again, they were very, very happy, because we'd like our military to put more money into serving the country and less money into maintaining things that are clearly obsolete. Next example. This is the Defense Logistics Agency. They're not all defense examples, but these two are. DLA had a project where they were dealing with millions of SKUs, stock-keeping units, items, things they were keeping track of. And the problem was that most of that data had been stored in what we call clear text, a comment field.
In other words, it was not well-structured data. The original suggestion was to sit down with people and go through each of these 2 million NSNs and figure out where they went. Well, that's a good job for somebody. It's not terribly interesting work, and it does not scale to the overall restructuring problem. Interestingly, the senior executive for the government happened to be taking one of my classes, brought me this problem outside of class, and said, could you help us with this? We said, of course we would. We developed what would now be called text analytics to help them convert the non-tabular data back into tabular data. And I was particularly proud of this: while we saved the government a minimum of $5 million here, we also literally saved our first person-century of labor. So this was a lot of fun. Now, you should apply automation to the point of diminishing returns. And when we look at that, we can look here and say: after four weeks of working on this project, we were halfway done. Well, how do you know if you've reached the point of diminishing returns? And remember, diminishing returns is where you get less out than you put in, right? If I were putting $1 in the bank and they were crediting me with $0.95, I'd find a different bank pretty quickly. Now, the only way you can do these calculations is to understand that one half of the equation has to be held fixed. In this case, we held the team fixed: two of our data engineers working 20 hours a week. So this was half-time times two, which really meant one FTE per week. If we know what that number is, we can then look and see whether we should keep going. The answer in this case was clearly yes. So we ended up with an 18-week build-out on this particular project. And if you look at the numbers here, the first week we didn't match anything. So it's also a lesson to say, let's make sure we keep expectations where they should be.
But by the end of the fourth week, we had matched 50% of the data. No problem, that was really great. We were also able to see that, at the end of the fourth week, 12% of the data was absolutely ignorable; it had no value in there. In the end, more than a fifth of their database turned out to be absolute, complete, and utter rubbish. That's, by the way, low for most of the projects that we work with, but still not a happy finding. And the unmatched portion, the remaining problem space, was about one-third of the problem. So we kept moving on this. By week 14, we had actually gotten the unmatched portion down to 9%, and we said, should we go further? And the project controller said, yes, if you can get this one more piece of data, I think we can call it done. So that one more piece of data took us an additional five weeks. Again, a very measurable, defined, specific cost. And at the end of that five-week period, we had the problem essentially solved: 70% of it was completely matched, 22% of the data was entirely ignorable, leaving us with 7.5% of the original problem size. Now, what does that mean? Well, here was the original problem, and here's the problem that we had left to solve. It was clearly not worthwhile putting any more automation time into that, so we transformed it into a manual effort. And let's look at the cost to see where that $5 million savings came from. Once again, we revert to our spreadsheet. We had 2 million NSNs; an NSN is a national stock number, essentially a SKU, a stock-keeping unit. We multiply that times five minutes per record, which gives a total amount of time. We put in some work weeks per year, come up with minutes, and then person-years needed to fix this. What you're looking at here is 92.6 person-years of effort, and we multiply that times a $60,000 fully loaded salary, and we end up with $5.5 million, which is what the original project would have cost.
However, if I now change the numbers to reflect the lower cost from adding the automation, you'll notice that instead of 2 million NSNs, I now only have to handle 150,000 of them manually. When I do that, this number changes from 10 million minutes to 750,000 minutes. We work our way on down, and we end up with the cost of cleansing that leftover 7.5% of the data at only $420,000, which means we've gotten our numbers really far down, and the savings, of course, come back here. Now, really the important part on this chart is this: does anybody out there think they can do one of these in five minutes? And the answer is no; nobody can do it in five minutes. So that's when you ask, well, what number should be there? If it's 10 minutes per record, the cost roughly doubles. If it's 15 minutes, it's nearly three person-centuries and over $15 million. If it's a half hour, we're looking at a lot more money. So these are necessary inputs that must be accounted for in order to show these very large savings. These are the kind of hard numbers that your management will pay attention to. Let me take you to one more example. And that example is British Telecom, interestingly enough, trying to educate people a little bit about their data management exercises. It's one of the best articulations I have ever seen. So let's take a look at it: a 38-second Flash animation. And that animation cost them 250 British pounds to produce. It's a wonderful articulation of the value of, in this case, a master data management initiative. And that little animation was mailed by the president of British Telecom to each of the thousands of employees of British Telecom. We could track the number of them that read the message, that played it. Sometimes they played it more than once.
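The sensitivity argument, what happens to the cost if five minutes per record is optimistic, can be sketched directly. This is an illustrative Python model; the work-year of 45 weeks at 40 hours is my assumption, chosen because it reproduces the 92.6 person-years and $5.5 million quoted in the talk:

```python
# Cost model for manually cleansing NSN records (illustrative sketch).
FULLY_LOADED_SALARY = 60_000            # USD per person-year, from the talk
MINUTES_PER_PERSON_YEAR = 60 * 40 * 45  # 45 work weeks of 40 hours (assumed)

def cleansing_cost(records: int, minutes_per_record: float):
    """Return (person-years, dollars) to cleanse `records` by hand."""
    minutes = records * minutes_per_record
    person_years = minutes / MINUTES_PER_PERSON_YEAR
    return person_years, person_years * FULLY_LOADED_SALARY

# Original scope: 2 million NSNs at 5 minutes each -> ~92.6 person-years, ~$5.5M
baseline_py, baseline_cost = cleansing_cost(2_000_000, 5)

# After automation, only ~150,000 records need manual review -> ~$420K
residual_py, residual_cost = cleansing_cost(150_000, 5)

# Sensitivity: the per-record estimate dominates everything else.
for minutes_each in (5, 10, 15, 30):
    py, cost = cleansing_cost(2_000_000, minutes_each)
    print(f"{minutes_each:>2} min/record: {py:6.1f} person-years, ${cost:,.0f}")
```

The baseline-minus-residual difference is where the roughly $5 million in savings comes from, and doubling the per-record estimate doubles everything, which is why that assumption has to be stated explicitly.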
And when we did follow-up surveys, we were able to tell that their employees could name some of the Seven Sisters, which is the alliterative name they used to describe the technology effort. Again, money well spent in this particular instance. Let's now move on to a couple of non-monetary pieces, because there are things out there more important than dollars. The first one is a very tragic example. I want you to imagine some of our troops off somewhere trying to get the bad guys. Now, one of the ways they do this is to light up a target by pointing a laser designator at it. Then they call an airplane or a drone and say, hit that thing that's being lit up by my laser beam right now. It seems like a reasonable way to do it. Well, it turned out, in one instance, the targeting device they were using ran out of batteries. So they've called in the airstrike, they've lit up the target, and then the thing runs out of batteries. They change the batteries. Here's where it gets tragic. The device didn't tell the user that changing the batteries had reset the coordinates from the target they were pointing at to the device's own location. And we dropped a bomb on our own troops. Well, again, not a very good way to do it. I am appalled that a vendor would put a combat-ready piece of equipment out there without notifying the young people who were using it that, by the way, when you change your batteries, you need to start the process over again, or that bomb will come raining down on your own head. Another military example here. Not all of our work is military; at Data Blueprint it's about a third federal sector, about a third commercial, about a third non-government organizations. We've got some good non-government stories here as well. But this one was one we were particularly proud of. We were working on the military suicide mitigation process.
And you guys are probably aware that, unfortunately, more of our soldiers, even today, are being killed by their own hand than by the bad guys. It's a tragic thing; we have to stop it. We happened to be there and got this project. We were doing data mapping, trying to find sources of data, finding the best source, and getting a look at it. And it was a little bit hard to follow. I mean, it was sort of all over the map; it was very, very difficult for us to pin these things down. We ended up with a room that we called our Council of Colonels. We had a group of colonels who had come in and were saying, we're going to use data for this purpose, and we'll be able to, you know, make progress. Well, with this room full of colonels, while they were being helpful and correct, we clearly weren't going anywhere in terms of solving the problem. So I was able to call in a chit with the Secretary of the Army. And the Secretary of the Army attended a meeting: he came into the room and put his portfolio down on the table, like that, to get everybody to pay attention. He said, Peter, I know why you've asked me into this particular meeting. Ladies and gentlemen, he said, I have an announcement to make. Anybody that wants to use our data to support our soldiers is welcome to do so. And anybody that wants to tell me why they can't use my data to save my soldiers' lives can make an appointment with me. My door is always open. Any questions? As you can imagine, there weren't. It empowered the team. It told them they could do this and that mistakes would be allowed along the way. We were able to move very quickly into a prototype. So in this case, the motivator, while not a monetary benefit, was saving soldiers' lives. And the Secretary of the Army said, I'm probably not authorized to make the statement I just made. He only said this to me afterwards, not to the room. But it was the right thing to do.
He also told me I could put this example in the book, so it's elaborated a little more there. But the point I'm telling you is that I have told this story to more than 100 private-sector CEOs, and not a single one of them would say, okay, I'm going to do that for my organization. I'm going to say that all the data belongs to Company X, whatever Company X is, and that we should all use it together. It's a very simple step, but it is not happening out there. Very, very problematic. Well, back to our story about the suicide mitigation project. These improvements allowed us to go in and determine the types of communication patterns involved. Some of the soldiers will voluntarily sign themselves up so that when their buddies see there's a problem, they will come after it and say, hey, let's intervene here before things get really, really bad. Final example on this one: yes, we do have to pay attention to lawyers, a necessary evil for our society. So I'm going to tell you a true story that I wrote up in an article; it was kind of an interesting one, but I want you to see how the data monetization piece was critically important for this case. Company X, on the left-hand side of the screen here, was told by their parent company that they had to implement, in this case, PeopleSoft HR. Now, this was a temp company, a company that rents people out; you might work an hour at one rate for one company and two hours at a different rate for a different company. It could be as finely grained as that. And they went to PeopleSoft and said, hey, PeopleSoft, we can't use your standard module for time and attendance, because the standard module assumes you get a raise once a year and that your pay rate is consistent throughout the entire year. PeopleSoft does make a special module for that. And they indicated that Company Y, in this case, was their preferred implementation specialist for that specific module.
So in July of that year, they contracted and said, okay, you can convert this stuff. You implement the software and convert the legacy data. By the way, we have to have it done by December 31 so we don't start the new year out running two sets of books. They began the implementation, and in January, Company Y said to Company X, we didn't convert your data. And Company X said, why? And Company Y said, well, your data was bad. That's not a very good explanation, but here's an important point: if you are migrating your data from place A to place B and you don't know what it looks like at place A, how are you going to tell whether it has been correctly converted when it gets to place B? So always, always take at least a small sample of your data so that you have something to compare against. They kept working, believe it or not, for another six months. Finally, Company X stopped paying them, Company Y got mad and removed the project team, and Company X filed an arbitration request, saying: you worked on our stuff for a year, you didn't fix the system, and we want to take you guys to court. Now, in this case, it was governed by an arbitration clause. Do not sign a contract that has an arbitration clause in it. It only helps the contractor; it does not help the organization. Because now we have to explain, not to a judge, but to a couple of other lawyers, who owned the risks. Believe it or not, there was disagreement as to who was the project manager, whether the data was of poor quality, whether due diligence was used, whether the method was adequate, and whether the required standards of care were followed. Now, the way this works, everybody goes off; it's not like a big courtroom drama on Law & Order, it's actually rather boring. You write a report. And luckily, our report was able to show that Company Y's conversion code introduced errors into the data. That's a big bullet on that one.
We were also able to show that the data Company Y had converted was of measurably lower quality after the conversion than it was before, and that Company Y had caused specific harm by not performing an analysis of the legacy system and by withholding specific information about the project. Now, a quick programming question. Some of you may not be programmers, but nevertheless, I think you'll understand. In any programming language: if column one has an M, then set the value of the transformed field to male. The wrong, wrong way to write the next line is: and everything else is female. Now, we actually tricked the lawyers on the opposing team into agreeing with this. They kind of said, oh, okay; after all, you're either male or female. Well, according to Facebook, there are 63 different gender definitions right now. However, according to the law we were operating under, the Canadian social security system required us to maintain nine gender codes. They were: male, female, formerly male and now female, formerly female and now male, uncertain, won't tell, doesn't know, and then, of course, male soon to be female and female soon to be male. These were required by law, so if they weren't included, there was no question that Company Y had introduced errors into the conversion process. The proper way to solve this problem is to say: if column one is an M, then set it to M. If it's an F, then set it to F. If it's not an M and not an F, put it in a third pile so we can go identify what the data problem is and figure out how to sort it. The second aspect of the case was that PeopleSoft has conversion code that it uses, and when they do conversions, they don't want to add duplicate records to the database. That would be silly. So they have code in their conversion routines to prevent that from occurring. These guys dummied that code out. They took that code out of the jobs and re-ran the jobs several times.
In fact, so many times that when we went to the site and looked, instead of the 6,000 customers they should have had in there, they had 63,000. And instead of 10,000 employees, they had 100,000 employees. There's no question that this data was of lower quality than the data originally in the system, and this was particularly distressing because there is an actual project document where they identified the quality of the conversion data as being a high risk. So they should have been paying extra attention to it. Now, if some of you out there are PMP certified, you'll know there's a way of dealing with risk, which simply says that if you have a risk, you have to have a management plan around that risk and allow that plan to be exercised on a regular basis. However, in this instance, they rewarded the company for doing it poorly: they got paid more for doing it wrong than they would have for doing it right. The bottom line for all of this was that the defendants ended up losing, and it cost them 5 million Canadian dollars. Again, a very easy, tangible process to go through and put some monetization around all of this. So what we've done here in the last hour, very quickly, is look at the reasons people need to do this. We in data management need to have arguments, articulations, ways of making sure that management pays attention to this. It has to do with leveraging our most important strategic, non-depletable, non-degrading, durable asset. And I've given you six cases of how to put some dollars on it. And sometimes the dollars aren't enough; we actually want to get down to the return on investment. Maybe dollars aren't the most important thing for you. One of our clients is a group called Feeding America. They are the umbrella organization for most of the food banks in the country.
And one of the things they've noticed is that when they get data from their food banks, they actually know where demand is going to occur. So they can do predictive analytics and forecast where peanut butter and jelly are going to be needed next week and put things in place that will get people peanut butter and jelly. But the organization, a really great organization, also said, I wonder what else we could do with that data. And they went and looked at the transit maps for this particular city, which was an interesting exercise, because those maps revealed food deserts. So they went to the city council and said, if you change this bus route by a couple of blocks, you could eliminate a food desert for literally thousands and thousands of people. The irony of all this is, I was visiting their offices a couple of weeks ago, and they sit directly across from the Trump Tower in Chicago. So that's sort of an interesting thing: the food bank has to look at Trump Tower right across the street from it. Not that there's anything wrong with that. So, to recap our examples: time and leave tracking, $10 million annually; the international chemical company, $25 million; the ERP cleanup, $5 million plus a person-century of savings; and the British Telecom rollout, 250 pounds, not much at all. Those were the monetary examples. And of course, when lawyers are in the case, it becomes really important as well. Wow, look at that. Three o'clock right on the dot, Shannon. I couldn't have timed that one any better. So it's your turn now, folks. What questions do you have about monetizing data management? Peter, thank you so much for this great presentation. If you have questions, submit them in the Q&A section in the bottom right-hand corner of your screen. And to answer the most commonly asked question: I will be sending a follow-up email by end of day Thursday with links to the slides, links to the recording of the session, and anything else requested throughout the webinar.
Oh, how about a link to the book? That we can do as well. Absolutely. Everyone's really quiet today. I don't know if the summer sun is getting to everybody, or... Well, sometimes that's the way it works, right? Yeah. We'll give everyone just a quick second here to type something in. Shall we talk about what's coming up next? Yeah. In August, we're going to take a look at data structures and how important they are relative to all of this. In September, we're going to look at the status of big data at this point. I love it. That is fantastic. And there's a request for your contact info, but I will send that as well in the follow-up email. I'll send all the contact information. And there's also a link to the books, making sure that you have all of that. I think that's it, Peter. Everyone's just kind of... Oh, we got a question. John, hi, John. Peter, you didn't mention the cost of security breaches and such, like not knowing that SSNs are in a certain file being sent externally. Huge potential cost avoidance there. Fantastic point. And the cost of data breaches, even if you think about just the wholesale cost of doing credit monitoring for a million customers per year, it's a tremendous amount of money. There are actually several good articles on monetizing, on figuring out how much a data breach costs you in terms of customer turnover and all the rest of the things out there. So I didn't want to duplicate other people's work, but that is a great point. There are all kinds of costs associated with that. Thanks, John. Absolutely. And, Peter, what have you found to be most effective in helping business executives see the need for a data strategy and get that concept? Yeah. Yeah, the real key is, first of all, you have an asset that has these unique characteristics. Think about it for a minute. Most corporations say, our employees are our most important asset, right?
There's a good Dilbert cartoon on that. We certainly agree employees are important. Yes, absolutely. But compare that with managing data as an asset: human beings get old, right? We get ill. We retire. You know, all sorts of things happen to us other than dying, right? I'm not trying to be morbid about this. But at the same time, there's no way you can expect a person to keep doing what they're doing for 100 years. It just doesn't work. Yet our data does keep working. I remember Pacific Bell a couple of years back; they were showing me telephone numbers that hadn't been validated since 1890, but they were still in their system. You know, I'm pretty sure that phone's not in service anymore, but we've never checked it, so we're definitely not going to get rid of it. And the cost of maintaining that obsolete data is fairly significant. Perfect. So, what was the worst scenario you encountered in trying to persuade an organization that wouldn't listen to or accept more data management? Well, many organizations believe that it's already being taken care of. And this really gets back to your strategy question as well, Shannon. When you look at what's happening out there, a person in an organization looks around and sees a title, Chief Information Officer. That must be the person who's taking care of the data, right? I mean, why else would they have that title? And the answer is: because it's a very poorly named title. Truly, most Chief Information Officers are information technology officers, or integration officers if they'd prefer that word. They're not information asset managers the same way your CFO is managing the fiscal assets of your organization, the same way your chief of HR is making sure that you have the right knowledge, skills, and employees to do the jobs of your organization. And so we've just done a poor job in IT of figuring all this out.
But as I said at the beginning, it's because accounting's been around for 8,000 years, and this has been around for maybe 150. It's okay for us to be presently immature. It's not okay for us to remain immature. And that's really our quest here: to help everybody get better at it. And by that I mean you, Shannon, as well, with the education you do through DataVersity. Sure, absolutely. Those are great points. And have we reached the tipping point yet where corporate boards of directors understand things like data management and governance well enough that the business case becomes easier and practices are accepted as simply a cost of doing business with data? We're getting closer to it. The real big hit there was the Target data breach, because what happened was the CIO ended up resigning, they fired the CEO, and after a bit, they came after the board of directors. And when you come after the board of directors, the directors say, what, they're suing me as a director of Target? What for? Well, they had an audit committee and a security committee, and the security committee, the plaintiffs contended, had not done its work properly. I would also say that leaving one-third of the servers with the default password on them was probably not a good practice either. These were all things that were found out in the Brian Krebs analysis of this. Brian is a great author; I think he did one of those articles on the cost of a data breach. Krebs on Security is his blog, and it's a really, really great site. Yeah. So, then, what about the use of the SSN as a primary key, which was not supposed to be allowed? How do you get organizations to stop doing that?
Well, I think one of the things that's kind of interesting, and most people haven't noticed, is that the Trump administration actually requested, as part of their attempt to prove that President Trump won the popular vote, the voting records for all 50 states, including the last four digits of the Social Security number. And 44 of the states have refused to turn over some of this information, saying that it would violate their own state laws. So there are some structures in place that are helping people understand these things, but it's going to take some time. I mean, my mom recently wanted to change where her Social Security check was deposited, and guess what ID they used? Her Social Security number, you know? Of course, it is their number, so I guess they're probably allowed to use it. Well, you know, fair enough. So, anything else? Any questions still coming in related to data breaches? No, lots of thank-yous, Peter. I think that's all of the questions for today. Well, thanks, Peter, for another great presentation. I hope you have a great time at the MIT conference. And thanks to all of our attendees for being so engaged in everything we do and participating. We hope to see you next month when we talk about data structures. I hope everyone has a great day. Thanks. Take care. Bye, everybody.