Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, The Seven Deadly Data Sins. It is the latest installment in the monthly series called DataEd Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so. Click the chat icon in the bottom middle for that feature. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DataEd. To answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording and will likewise link the recording of this session, as well as any additional information requested throughout the webinar. Now, let me introduce to you our speaker for today, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also the founding director of Data Blueprint. He has written dozens of articles and 11 books. The most recent is Your Data Strategy. Peter has experience with more than 500 data management practices in 20 countries and is consistently named as a top data management expert. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise. Peter has spent multi-year immersions with groups as diverse as the U.S.
Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, I'm going to turn it over to Peter to get the webinar started. Hello and welcome. Hi, Shannon. It's great to be with you at the end of a full year of these. And I'm looking forward to talking to everybody a little bit about these seven deadly sins. So, as Shannon mentioned, one of the things that I've had the real privilege of doing is working with literally more than 1,000 different data management practices now. We just keep the number 500 because nobody believes 1,000. And from working with all of these groups, I've sort of synthesized seven things that really prevent organizations from strategically leveraging their data. And that's what we're going to talk about today. Now, I always like to preface these things, particularly when it's fairly new thoughts, with a wonderful quote from George Box. If you haven't seen this quote, it's very famous: "All models are wrong. However, some are useful." And hopefully these will be some useful things to take away with you on this. The data strategy really is the highest level of guidance that is available to an organization surrounding data and data-type issues. It focuses data-related activities on specific goal achievements and provides directional but specific guidance when faced with a stream of decisions or uncertainties about data. Now, we don't tend to talk about this as a one-two series, but in January we're going to get into the broader picture from this book of what we're really talking about from a data strategy perspective. Right now, we're going to concentrate on an issue that's been problematic for us, which is just that IT hasn't got a great track record on this. Only one in three IT projects succeeds on the schedule, the functionality, and the cost dimensions. Now, if my dentist was that bad, I would find a new dentist.
And the reason for that is hugely problematic. It's that we teach everybody in school and university absolutely incorrectly on a number of these issues. And it's bad enough that college and university costs are going up. But then to find out that we're actually teaching you guys incorrectly is abhorrent to me. Let's just start off with a very simple example here: the way we describe systems to people. So if we're talking about IT systems, we say that systems are comprised of people, processes, hardware, software, and data. But that makes it look like all of these things are equal. And data really is not. In fact, let's think about data from a couple of different perspectives. First of all, there's a really wonderful set of charts that DOMO.com puts out every year. And for the year 2017, they had a couple of really interesting stats where every minute of every day of the entirety of 2017, Netflix managed to cram almost 70,000 hours of video onto the internet. Now, that's an astounding engineering feat. The charts also show what else happened every minute of every day. Half a million tweets every minute. 15-plus million text messages back and forth. Three and a half million Google queries. And of course, 103 million spam emails. That's every minute of the entirety of 2017. My point is that data is exploding in a way in which the other components, people, processes, hardware, and software, are simply not exploding. And if we don't start to take advantage of that and recognize that there is something different about data, we will never be able to achieve the successes that our organizations are asking us as professionals to achieve. In fact, I'd like to sort of finish up this section with a little quote from one of my favorite people in the world, a woman named Micheline Casey. And she had a really good way of winning arguments. It was very simple. There will never be any less data in the world than right now.
And if you wait 10 minutes and say it again, it will still be true. That is simply not true about those other components of systems. And therefore it is important for us to understand how and what we need to do to take care of it. What we're going to talk about today specifically is that there are a lot of different sins that are out there. We're going to talk about seven deadly data sins that I've observed from a number of different organizations. And let's just jump right in. The first one is failing to address culture and change management challenges. Now what it really comes down to, and people will take this away from an awful lot of these meetings that we do with folks, is that they will come back and say, oh, this really is a people problem, isn't it? And the answer to that is yes. If you don't resolve the people issues around this, the culture and change management issues around this, then you have no chance of solving it, because a fool with a tool is still a fool. Now let's dive into this a little bit further and see exactly what we mean. And this is, again, going to work in just about any context that we're working in. So as I mentioned, I've spent a lot of time with a lot of different organizations. And we'll see little bits. And I use this chart from Mary Lippitt, even though it's almost 20, 30 years old at this point, to say what's going on. So if I see, for example, a gradual change in an organization, then I usually see the vision, the skills, the resources, and the action plan, but there's no incentive. Same thing, if I see frustration, I see the vision, skills, incentive, and an action plan, but I don't see the resources that are there. So this is a wonderful tool where you look down the right-hand side of the diagram and say what symptoms are we experiencing, and then you can figure out what's missing by looking on the left-hand side of the diagram.
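That symptom-to-missing-element lookup can be sketched as a small table in Python. This is a minimal sketch, not the chart itself: only the incentive and resources rows come from the talk, the other three symptom labels are an assumption taken from commonly circulated versions of the chart, and the function names are hypothetical.

```python
# The five elements the chart lists as prerequisites for change.
ELEMENTS = ("vision", "skills", "incentive", "resources", "action plan")

# Symptom expected when one element is missing.
# "incentive" and "resources" are as described in the talk; the other three
# labels are ASSUMED from commonly circulated versions of the chart.
SYMPTOM_WHEN_MISSING = {
    "vision": "confusion",          # assumption
    "skills": "anxiety",            # assumption
    "incentive": "gradual change",  # from the talk
    "resources": "frustration",     # from the talk
    "action plan": "false starts",  # assumption
}


def diagnose(present):
    """Return the symptoms to expect, given the elements that are present."""
    return [SYMPTOM_WHEN_MISSING[e] for e in ELEMENTS if e not in present]


def missing_from_symptom(symptom):
    """Read a symptom off the right-hand side; return the missing element."""
    for element, expected in SYMPTOM_WHEN_MISSING.items():
        if expected == symptom:
            return element
    return None  # symptom not on the chart


print(missing_from_symptom("frustration"))
print(diagnose({"vision", "skills", "resources", "action plan"}))
```

An empty list from `diagnose` means all five elements are in place, which is the only row of the chart that ends in change.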
Now, diagrams are interesting, but the bottom line is that unless you have vision, skills, incentive, resources, and a good action plan, you are unlikely to get change. And why do we need change? Well, it turns out that culture is the biggest impediment to getting a shift in organizational thinking about data. Now, I know that sounds a little bit problematic and probably a little bit different from what you've heard in some cases, but it is nevertheless true as far as my experience goes and the experience of these thousand companies that I've worked with over the years. For example, I also write books with titles that say things like CIOs are not really CIOs. It's a terrible thing to say outside of mixed company, and I don't mean that in the sense that they are not doing the job that they want. I know lots of CIOs and work with them on a daily basis and they kick me in the butt a lot too. But really, the idea that they are chief information officers in the same way somebody is a chief finance officer or a chief risk officer is not true. CIOs are enormously talented individuals that have an enormous amount of things on their plates, which means they have an enormous amount of complexity. I said the word enormous three times there. Do you get the sense? Yes, there's a lot that they do. What we really need to do if we're going to do more with data is introduce yet another chief officer into most organizations. And Gartner has backed us up on this and said yes, more and more of these organizations are going to be bringing CDOs, chief data officers, into place. By the way, even that title is problematic because in many organizations, a CDO is a chief digital officer, which is absolutely not the same thing. So we've got this sort of tension going on. Now exactly half the CIOs I talk to say, you're right, I'm not doing enough with data and I'm really glad there's somebody else that is now here to do it.
In other words, they're looking at the new CDO and saying it's your problem now and not my problem anymore, which is probably the way it should go. On the other hand, exactly half of them are also thoroughly offended: I'm the chief information officer, what else would I be doing if I wasn't doing this? And they are responsible for the integration, the information technology, all sorts of capabilities to make it run, but they are not really paying full time and attention to data as a resource, and that's what we're saying needs to happen in here. This is why we have such big cultural changes. In fact, Mario Faria, who's been a good friend for many years, is a Gartner analyst, and if you haven't read his stuff, I've got a reference to one here on this. So when you introduce this CDO into the organization, you end up with the standard fear, uncertainty, and doubt kind of things that go into here. If you Google this report, What Chief Data Officers Need to Do to Succeed, by Mario, you will see this and the rest of his report out there. Again, a very, very good analyst. It's the idea that we're bringing in somebody who's doing something that somebody else already has at least an inkling of in their title. And how do we make major changes in organizations? Well, I don't wanna say it's easy, but there is a class of professionals out there in change management and leadership that help organizations learn more about what they're doing and try to get everybody to move in a different direction from what they have been doing in the past. Now that's incredibly important, and if you're not paying attention to those issues, you will have problems with this, because I've seen all the technologies and all the plans and all the strategies in the world, and none of them can help you if you don't address these change management issues. Now I've spent exactly five minutes talking about organizational change management and leadership changes in here.
If you'd like to learn more about this, I have a full article on this in one of the scientific journals that we paid for so you can download it for free. So just take these slides when Shannon sends them to you and either snag it with that QR code in the bottom right-hand corner or click on the links, they are active, and that will take you right to that case study so you can download it. Something else to take a look at in that case study too: if you look at the rest of the table of contents of the information and data quality journal that it's published in, this one case study has had more than 10 times the number of downloads that the typical article in that journal has had. So there is a demand for this, and it's actually quite important that the academics who are not really tied into the real world understand that this is the kind of information that you all want to have. In fact, I'll tell you guys a little secret here. I've had 12 different companies come up to me and say, wow, you wrote about our company. How did you learn so much about our company that you could write an article that was this good? Now I know when I get 12 companies coming up to me, they can't all be correct. However, they can all be pointing in the same general direction pretty clearly. So that's what I mean by failing to address culture and change management issues. The next deadly sin has to do with understanding sequencing. And sequencing in a data strategy context is very, very important. One of the main mantras that we use here at Data Blueprint is that we say you need to buy technology last. I know that sounds terrible to do, but unless you have the other things correct, it isn't going to make any sense. So that's one sequence there. Here's another one, though. I'm past president of DAMA International. I love the organization. It's been something in my life, working with it. And we put out a body of knowledge a while ago.
And my critique here is the same one that George Box gives. We now have something that is good enough to critique. And we'd like to get the rest of the data management community together to help work with this. This is our version one of this format. And as I said, we now have something that is good enough to criticize. See, it's showing that these are the data management functions that we have within here: data architecture, data development, data operations, et cetera, et cetera. But it lacks two important things from a data perspective. The first one is optionality. So people will read this DMBOK wheel. We call it the Data Management Body of Knowledge, or DMBOK for short. By the way, we can use some marketing help if anybody wants to help us rebrand these things. We would gladly take it. We obviously branded this one the DMBOK. Those of you that are familiar with the Project Management Institute's body of knowledge know they did a great job of putting their body of knowledge together. We copied it and said we'll call ours the DMBOK. But anyway, people read this and they say, oh, it says I must do data warehousing. Now actually what it says here is that data warehousing is a part of data management. However, people read it incorrectly and say I must do it. I must do document and content management. So optionality should have been shown on this diagram in here. The second is another important data concept, dependencies. I will always start out and say that you should start off with metadata and data governance before you start any of the other things on this wheel in order to get these up to speed. So this is what we mean when we talk about sequencing. Don't start off with your hardest, biggest project. Instead, let's try to bring it up to speed and crawl, walk, and run our way to get to the top of this function. Here's another dependency function on here as well. By the way, we did update this. We've added another slice on the pie and taken off some words on it.
So this is our DMBOK version two that we have in here. Here's our second aspect of dependencies, then. This is my home in Montpelier, Virginia. And I'm what's called a horse husband. For those of you that don't know what a horse husband is, it's a husband whose spouse wears a very big t-shirt. And on her t-shirt it says I love my husband, and underneath in very tiny letters it says almost as much as my horse. So it keeps us happy and keeps the horses happy as well. Now, I tell you this to tell you that I had to build a barn. And in order to build this barn, the bank gave us exactly this much money to build this much of the barn. What are we seeing? Well, obviously the foundation. And this foundation is critical. The bank was not going to give us any more money until we had proven to them that we had a sufficient foundation for construction to proceed. To prevent the mistake of building a good barn on top of a poor-quality foundation, the bank says you can have this much money, build a foundation, prove to us that it is a good-quality foundation, and then we will give you money for the rest of the barn. My point here in showing you this is that there is no IT equivalent. And that is a major problem, because if we don't have that foundational practice in place, it's very unlikely that we're going to build a good set of data practices on top of the poor-quality foundations that we have in most organizations. In fact, this foundational piece actually translates back into something that many of you would remember from the old days, Maslow's hierarchy of needs. It starts off by saying that if your food, clothing, and shelter needs, the physiological needs, are unmet, then you will never be safe. And if you are never safe as a person, then you can never be part of something that is bigger than you: love, and belonging to something that is larger than you.
If you're not part of something that's bigger than you, it's very difficult to develop your own set of self-esteem capabilities. And if you don't have any self-esteem, it's very unlikely that you're going to be self-actualizing. This is the place everybody would like to be, self-actualizing, when they go to work and hobbies and playing with friends and all the rest of the things that go into this. So self-actualization is where everybody wants to get to. There's a new term for this. Now it's being called flow, right? But they forget the little bits: if you don't have your physiological needs met, your safety needs, your love and belonging, and your esteem needs met, you are never going to flow anywhere about anything. Now we take these foundational practices and I'm gonna put them in context for data folks as well. What mostly happens in data is that everybody goes out and they want to do these really cool things with what I call the advanced data practices in the golden triangle of data. But that golden triangle really just represents a technological focus on the tip of the iceberg, and there's an enormous amount of foundational practices that need to occur underneath there. If I was gonna add some more things to the top of that pyramid, I would add Bitcoin and blockchain to it. They're just wonderful, wonderful pieces to go in there, but they do nobody any good unless you have a good foundation. By the way, I got a great thing off of Twitter the other day that says how do you explain Bitcoin to your grandfather? I want you to imagine that running your car 24/7 produced solved Sudoku puzzles that you could trade for heroin. Great way to describe it to your grandfather. And the point is that everything that's in that golden triangle there is technology based, and what we really need to do is look at the capabilities of the organization, the people and the process sides of things as well.
And those are really foundational in nature. If we don't have the foundation met, we are very unlikely to move onwards and upwards in that way. Now, it doesn't matter what we do in this area and how many times people have heard me make these talks, I still get calls here at Data Blueprint where people will say, hey, I know you take a long time to do it, but we need it done by Friday, can you do it faster? And the answer is, yeah, we can do it faster, but if we do it faster, it will take longer. If we do it faster, it will cost more. If we do it faster, it will deliver less. And if we do it faster, it will present a greater risk to the organization. Again, another sequencing thing here: let's get the capabilities down well before we address the technology pieces on this. And let's finally get to one additional component here of sequencing. This sequencing piece has to do with the way most organizations decide they'd like to try and make their analytics journey. So I'm gonna put most organizations in this V1 quadrant and I might as well label the axes here. In strategy, there are only two dimensions. The first one, you improve your existing operation. The second one, you innovate. There are no other dimensions. We've done everything we can around that to prove it. And it is rock solid that that's the way it works. Well, most organizations don't have a formalized strategy in there, so we're gonna say that's not a very good way to do it. Instead, what we'd like to do is look at what's in quadrant two over there, V2. And that may be an improve-operations piece. And I'm gonna just put Walmart in there because Walmart is generally recognized the world over as being effective and efficient. They are very, very good at that. Most people would also describe Apple as innovative. We'll keep that in there as well. If you're gonna innovate, that makes perfectly good sense. Now, my point here is, again, remember we're talking about sequencing here.
Well, take the people at Apple. I want you to imagine Jony Ive, who's the erudite British guy that comes on and gives you the voiceovers on all their new products. And I want you to imagine the beauty that goes into this, the aluminum finishes and all these sorts of things. Jony's a great, great designer. And I just want you to imagine him being told by Tim Cook that he's gotta be cheap the way Walmart is. Well, that's not gonna happen. That's not the way that stuff works. And conversely, I want you to imagine the guys at Walmart who are very, very good at efficiency and effectiveness. I've worked with these folks. They are phenomenal at what they do. And I want you to imagine them being innovative. It doesn't work. So, when you're looking at this journey, what I typically get is most organizations go, I'd like to do both of those at once. And that, again, is wrong. You need different types of people, different types of thinking in order to come up with this. And a way that works well for many organizations is let's get the efficiency and effectiveness part done so that we can save some money and use that money to fund the innovation things that occur out there. It doesn't do any good to say I wanna do digital transformation, digital innovation, if you haven't got a way of supporting that particular piece. So, use your data to help drive costs out of something and put those savings into something that you can innovate. Let me give you one example of how that worked well for one organization. Now, this, first of all, comes straight out of the book I mentioned at the beginning. In order to do something about getting rid of those seven deadly things, you're probably trying to improve your organization's data. That's important because data points to where valuable things are. Data has intrinsic value all by itself, and it has wonderful combinatorial value that we can also use to pull together.
In fact, the biggest threat to the United States of America today is the combination of the Marriott breach that lost passport data, which just occurred last week, with Ashley Madison and the OPM data breach, because we now have people who have security clearances who are being problematically targeted for perhaps registering with an adult dating site in Canada. We won't go too far into that, but I'll let your imagination take the next direction on that. Second thing, though: even if you've improved your organization's data, you still need to improve the way your people use its data, because most organizations haven't had any formal training in that area at all. If you need any check on that in your own organization, walk into a room and ask people to tell you, when they learned how to use Excel, did they also learn that Excel had an automation feature called macros? You will see one in 10 of those individuals raise their hands, which tells you the state of people having to use data in your organization. It's generally at a very poor level because we haven't done much to formally educate them, and we certainly haven't done much of it outside of the data science community. You use data to measure, manage, and motivate change, so you've got to use that to improve the way your people use data, and of course you'd like to improve your data. Only when you have improved the way your people can use the data that is now of better quality can you go and do what your boss really wants you to do, which is come up with a creative advantage from your data. I'm gonna use the example of Rolls-Royce. Rolls-Royce is a really fine organization and they had an old model that they used to sell things to airlines. They would sell jet engines. They transformed their business completely to a new model that says instead of selling things to organizations they would sell hours of powered thrust, so they've moved from a product line to a service line.
That is an enormous transformation for most organizations. The name for this is Power by the Hour, and they've been selling it around the globe for a number of years. We'll get to that in just a second, but one of the things you might notice is that if that jet engine isn't making money for American Airlines or whoever they've sold it to, it's not making money for Rolls-Royce either. They have shared interests, and because they have shared interests they can now share a common dialogue about certain things and ask other questions. And one of the questions people always want to ask is how fast can you change the engines on the planes? Well, before, when they were just selling engines to the airlines, there wasn't a whole lot of room for a dialogue there. But now that they're both on the same side of the page, they can look at a little lesson they got from NASCAR. I like the way you change the tire by hitting it with a hammer. I shortened the clip a little bit. This is the Indianapolis 500. Obviously a couple of years ago. So there's our measure, 67 seconds to change tires. Now we go to 2013 and the Melbourne Formula One races. And we now have a different measure. The different measure is four tires in four seconds, so that's easily an order of magnitude improvement. And now Rolls-Royce can have those conversations with the airlines, because instead of selling them things they are now partnering with the airlines in order to do this. When I'm doing this for groups that are live, I ask them when do you think this really innovative business process was invented? And almost nobody gets it right: 1962. So for well over 50 years Rolls-Royce has been celebrating this new way of doing business. It is absolutely phenomenal. It could not have been done without the sequencing we just spoke about. So that's our second deadly sin. Our third deadly sin is managing expectations. Very important in order to do this. And I just will point out today I thought this was kind of interesting.
A colleague of mine actually presented this diagram in Spanish on LinkedIn today. So if you're out on LinkedIn and you happen to read Spanish, then you'll be able to see a Spanish version of this particular diagram. The idea is that most organizations start out with some business needs and we take those business needs and then transform them into some sort of a solution. That makes perfectly good sense when you first think about it, but if you think about it a little bit more it makes no sense whatsoever, because it's leaving out the most important dynamic element of the equation, which is where your organization is on its current journey. And only when you have a business need that matches your existing capability to deliver it should you then make that a strategic data imperative. Again, sequencing here is critical, not just business needs. On my fool-with-a-tool statement earlier, I'll make one more point along that same line, which is to say: would you hand the keys to your brand new Tesla to a 16-year-old driver that had never driven anywhere and expect good results? The answer is of course not. I don't care how good your 16-year-old driver is, the Tesla is a beast of a vehicle and a lot of fun to drive. But you do not hand it off to a novice driver and expect good results with it. So only when we have a match between the business needs and the existing capabilities should we pick one of those as a strategic data imperative. And then our execution cycle, a roadmap if you will, that we're putting together also has to have a balance between providing some business value and building some new capabilities. Only when we have a good balance between the two of those does it work. If I focus exclusively on new capabilities, what we're doing looks to management like a science project. And if we're on the other side of the equation, if we're providing nothing but business value, then we have no ability to get better in our capabilities.
And that's going to be business value that vanishes as soon as the consultants or whatever the flavor of the month is leaves the organization. Many people have found this data implementation framework quite helpful here. Again, please take it. We'd like to get attribution if you find it useful. More importantly, if you find a better way of expressing it, send it in to us. How we've gotten things this far is by having this community that works with us. Let's talk about another way of looking at expectations here. I work with a hypothetical group here of five data people who are each getting paid $100,000. So clearly it's pure fantasy, but nevertheless, five data managers that are getting paid $100,000. I always ask them, do they feel obligated to demonstrate half a million dollars in benefits to the organization annually? It is a reasonable assumption; that's how much it costs the organization to have you on the payroll. So they should have a similar expectation that every year you can demonstrate to them that you have helped to save more than $500,000. Now I hate to give some dour news here at the end of 2018, but we are headed into a recession sometime in the next 24 months. No question about it. It's as close to a certainty as we can have. And unfortunately, if people don't understand what the data folks are doing, they are the first ones to get laid off in most organizations. So I would urge all of you that are listening here to try to figure out how you can show management at least what your salary is each year, so that they understand that you are an investment as opposed to a cost. If you have trouble coming up with that, I'll be glad to work with you on that and help you get better at it. Because one of the questions people like to ask is, when will you data people be done? Now my answer to that is we'll be done the same time HR is done. When you no longer need HR you will no longer need data people.
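The payroll arithmetic above can be written out as a short Python sketch. It assumes, as the talk does, that salary alone stands in for what the team costs (no benefits or overhead), and the function name is hypothetical.

```python
def benefit_to_demonstrate(headcount, salary, years=1):
    """Minimum benefit a data team should be able to show over `years`.

    Assumption: team cost is approximated by salary alone, as in the talk.
    """
    return headcount * salary * years


# Five data people at $100,000 each, for one year.
print(benefit_to_demonstrate(5, 100_000))           # 500000
# The same team over a five-year horizon.
print(benefit_to_demonstrate(5, 100_000, years=5))  # 2500000
```

The target grows linearly with the planning horizon, which is why it pays to start demonstrating value in the first year rather than trying to account for all of it at the end.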
That's a little bit hard of a leap for most people to make, but in this instance we'll say their CIO gave them five years. Well, now the benefit that they have to show is two and a half million dollars over this five-year period, and that is absolutely even more difficult to show. You don't wanna start at the end. You wanna start at the beginning, working your way through this, and make sure that everybody understands what it is you do. Because again, if management does not understand what you do, you're considered to be a cost. Whereas if they understand what you do, you're considered to be an investment. One last example in terms of expectations management, and this is a program that we've been running at VCU here for quite some time. Governor McAuliffe back in 2014 started this program where we have lots and lots of students and we have lots and lots of state data. And we are taking this data and matching it up with the students and helping the students to understand more about what it is to be a data professional, and in particular giving them some job experience at munging data. In addition to that, however, we're also producing some very interesting results for state government. I'll give you just two brief examples. First one is that every automobile crash that we have here in Virginia costs the state taxpayers about a million dollars, and we collect about 250 data points on them. And nobody had ever had a chance to go back and analyze that. So we had one group of students go through and take a look at it and find that there are some areas of the commercial truck driver's license that, let's just say, could be improved, which might end up reducing the number of fatal automobile crashes that we have in Virginia here. Another example of this too, we had a state agency that was doing child intervention.
You just picture somebody coming into the family situation and asking a bunch of questions, taking eight data points essentially, excuse me, 80 data points from each intervention. And it turned out that it took about an hour to complete those 80 data points as an interview, a structured interview. And again, a class of students went through and found out that fully half of those questions had no probative value whatsoever. Nothing was ever done with them, and they didn't help out in any way, shape, or form. And we could reduce the interview from one hour to 30 minutes, and that enabled the agency to go back and transfer more than a million dollars from administrative overhead into service delivery. Those are the kinds of things that will get you noticed in these organizations and that help people to understand expectations around what's happening with data. Now, our fourth piece here is not aligning data programs with IT projects. Again, this is where we get into really, really bad education pieces. See, we've thought for years and years that in support of strategy, organizations should be implementing IT projects. I don't think anybody disagrees with that basic premise. But the problem with that order, that sequencing that we have there, is that the data and information become an afterthought. We have many state agencies that are looking at ERPs. I was talking to one agency the other day that said, well, maybe we should just swap out our EMR as well at the same time. Well, that's a five-year change. Now, the data and information become the tail wagging the dog on this. It is not a good way to do it. What it does is it makes sure that the data is formed around the applications and not around the organization-wide requirements, that the processes are narrowly formed around them, and that you get very, very little data reuse under these circumstances. So how do we fix this? 
Do something we call data-centric development, which is just to say your strategy starts first, but then your data and information layer should be the next thing that is specified. The advantages of this approach are that your data assets are developed from an organization-wide perspective, that the systems support the organizational data needs and complement the organizational process flows, and that you get maximum data and information reuse around here. Again, the goal is let's just switch IT projects and data in terms of their precedence and we'll see better results. And I've got several organizations that are running empirical experiments right now. Hopefully, we'll be able to show some results of these in the near future. If we look at this in a little bit more context, your data strategy is really all about what the data assets that you have in your organization do to support strategy. And your data governance feedback from that is, how well is that data strategy working? Of course, your data strategy is useless without an organizational strategy, whatever that happens to be. So your data strategy is how the data is going to be used to support the organizational strategy. And if you're in Peter's world, you get to say that data governance then decides what IT projects move forward and at what speed, what cadence, in order to do this. And we could put some feedback loops on there and get an overall complicated picture. I would never show that picture to management. I'd keep it at this level here. And keep it very, very simple and say, what did the data assets do to support strategy? Well, they have to be expressed in terms of business goals, and the language of data governance has to be metadata in order to support this. Let's get another set of these things up here. Robust means of sharing data. 
Now, everybody has heard of the term agile, and I'll get to a little bit more on that a little bit later on, and they wanna know why can't we do data as part of an agile project? And the answer is because data is not a project. Data is a program. I've already mentioned it once. Your data program should last as long as your HR program lasts in your organization. Tie those two items together. Nobody says are we done with HR, and nobody should say are we done with data, because data is a durable asset. It is not a project. It has an asset life that is more than one year. Reasonable project deliverables might be 90 day increments. If you were doing an agile sprint, it might be two week increments, right? No problem there, but your data evolution is measured in years, and those things operate in a different cadence, a different rhythm, and they do not sync up at the project level. Data evolves. It is not created. IT projects are significantly more involved in stable issues. Sorry, I went off on another thought, but I've been working with some organizations on and off for 30 years and I can come back and show them. You should see how the lights go off in their heads when I show them data models and things that we were doing in their organization 30 years ago and things we're doing today, and they are largely unchanged. Your organizational data is much more stable in general than are your process flows that are in there. And that means that the only thing that you can do for agile development is have ready made data architecture components. If you don't have a good supply of ready made data architecture components and you're going to try and develop them as part of your agile process, the only possible outcome from that is more small piles of data, which means you'll be a Data Blueprint customer sooner or later. I say that in all jest, but I'm half serious. Again, these components are prerequisite to successful agile development. 
If you're in the middle of an agile sprint and you notice that a data requirement is imprecise or, worse still, incorrect, if you don't pull the plug on that sprint and shift over to a different one, you will have more small piles of data that result from all of this. Again, the only alternative is to create more small data silos. And there's a great agile joke here that I just can't resist throwing in. Wait, you're going to perform surgery without putting me under? Yes, says the surgeon. This is agile surgery. We need to ask you about your symptoms and complaints after we open you up. Oh yeah, we also need to know what you want us to work on this first round. Boy, doesn't that sound exciting. Now let me show you how bad things actually are. If you look up the name Winston Royce, he's a very famous software engineer from the early days. I got to know him a little bit towards the end of his life. He was a wonderful individual. He had a little story that I'm going to relate to you very, very quickly here. It's the story of how systems were developed. Now Dr. Royce, as a famous software engineer, was always asked by people, can you tell me how to do this stuff? And he'd always say, no, my job is to do it. It's not to tell you how I do it. And so a reporter from Look Magazine took him out to a steak dinner, which in those days was considered a good enough bribe, and he got him talking. He said, well, what we do is we do requirements and we get to the point of diminishing returns. We hand those requirements off to a good designer, and the good designer finds all the flaws in our requirements and throws them back in our faces. And we say thank you very much. And we work on another iteration of those same requirements. And then we hand them off to another designer, not the same one we did before because they've already seen the errors, but we hand them off to a brand new designer. And that designer, similarly, comes back and throws the incorrect ones back in our face. 
Hopefully it's the second iteration, so there are fewer in this case. So we can do the third one. We get down to more design, and we may get to the point where the design is now good enough to flow down to our coders. And the coders will start to work with the design, and eventually they'll come back and throw the design specs in the designer's face and say, I can't code this, you haven't been good enough about this. So we make another version of this down here. We get an implementation. Again, we hand it to a different coder. They do different things. You get the sense here. Sometimes we have to go back up and fix the requirements because they're not exactly right. So we're moving our way back and forth and changing these iterations and flows here as we go through this entire process. Now, the process of doing all of these things repeats and repeats until you get to the point of diminishing returns. Unfortunately, what happened when we put this into a textbook, as you'll see in a little bit, is that it kind of lost some of the robustness. But even if we're following this process right, we're still not doing it very well. And the reason we can tell that is when we look at following the money. Now when you follow the money here, you'll see that 20% of the money that we spend in IT is spent on the things above the blue line. And 80% of the money we spend in IT is on stuff we do below the line. Well, this is actually very, very problematic. In fact, we've done measurements and discovered that 50% of the problems are only detected after we've gone through this. So again, here on this chart, in the upper right-hand corner there, if it costs a penny to fix something during a requirements exercise, we know for sure it costs 50 times more to do that during the design cycles. And if we don't fix it until we get to coding, it costs 100 times more. If we don't catch it until after we've put the thing into production, it costs up to 2,000 times more. 
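To put those multipliers side by side, here is a small illustrative sketch. It only restates the figures quoted above (one penny at requirements, 50 times at design, 100 times at coding, up to 2,000 times in production); the function and dictionary names are made up for this example and are not part of the original slides.

```python
# Relative cost of fixing one problem, by the phase in which it is caught.
# Multipliers are the figures quoted in the talk; the production figure is
# an upper bound ("up to 2,000 times").
COST_MULTIPLIER = {
    "requirements": 1,
    "design": 50,
    "coding": 100,
    "production": 2000,
}


def fix_cost_cents(phase: str, base_cost_cents: int = 1) -> int:
    """Return the cost in cents of a fix that would cost base_cost_cents at requirements."""
    return base_cost_cents * COST_MULTIPLIER[phase]


if __name__ == "__main__":
    for phase in COST_MULTIPLIER:
        print(f"{phase:>12}: {fix_cost_cents(phase) / 100:.2f} dollars")
```

Read this way, a one-cent requirements fix becomes as much as a twenty-dollar production fix.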
So these are enormous numbers that we simply aren't paying enough attention to and that have not changed significantly over time. We do know that 50% of the problems are only detected after we complete them. So that's a little bit of a problem. And yet I mentioned before, this whole process with all the wonderful iteration and everything that was described by Dr. Royce in this original article, when they went to try to figure out how to do this. Now I'm gonna jump you ahead. This was, by the way, this conversation took place in the year I was born, 1959. On this, a lotier, Dr. Royce described this. It was a very, very entertaining lecture that he would give to the classes that we had. And they were fortunate to have him as an expert to come in to do this. But they, okay, some people decided around 1968. It was actually NATO decided around 1968 that they needed to make a profession out of this. And so they went to the one article that had been written, which was the article that this Look Magazine reporter had managed to extract from Dr. Royce's head over this steak dinner that I was talking about. Now, long story short, this is the way it actually worked and the way Dr. Royce described it. And here's how we simplified it to show students. First you do requirements, then you do design, then you do implementation, then you verify it, then you test it, then you maintain it, right? Well, of course, what's missing here is almost everything. I mean, the absurdity of saying that you can develop and implement your software and your data in one set of iterations through this, what we call the software development or systems development lifecycle, is so patently absurd, it's just silly. That's why I threw Xs all through everything. In fact, if you think about this, the only way this can work is if we're not sharing data outside of this application. How many times does that happen? Virtually none in today's socially location-oriented mobile society in here. 
And if we're going to fix this, I could take this project, and I can put it there, and I can have a second project, and if I want project one and project two to exchange data, I actually have to set up project three in order to exchange data between project one and project two, because of course project one and project two are both gonna kick off 10% of their budget to do a data exchange program between the two of these things here. Oh, we know that's not the way the world works, and we also don't wanna have ourselves projected to death, so this is the reason you need a data program in order to do this. Shared data requirements require programmatic development and evaluation. Again, I've said this a couple times, the programs versus projects difference: your data program must last at least as long as your HR program in order to do this. Our sixth deadly sin here is the lack of qualified data leadership, and I find this to be universally true. Data as a subject is very complex and detailed, taught inconsistently, and poorly understood, which means nobody wants to hear about what it is you're doing, which means it is very, very difficult for data decisions to get some airtime at the board level, and the board level is really where many of these subjects need to be addressed. Think about it for just a second. My definition of a knowledge worker is somebody who works with data, and what percentage of them are taught about it? Virtually none. And what percentage of them deal with it daily? Every single one of them. It is a big, big problem, and the failures in data are really problematic, but they don't tend to be gigantic dramatic failures. I mean, sure, everybody's mad at Marriott because now you've got to go get a new passport. Actually, you don't, but that's a different issue, but everybody's heard their passports were stolen in Marriott's little data breach last week. 
This bridge here, you might remember stories about the Tacoma Narrows Bridge. Just call me crazy, but I'm pretty sure it wasn't designed to be able to work that way. I have a can of Coke here in front of me, and if I take that tab at the top of the Coke can and bend it back and forth, we know that that tab is going to break off. And you know what? They knew this bridge was going to break apart as well. In fact, they knew it so well that the insurance company, SafeCo, set up this camera to watch the bridge failure so they could try and learn from what happened, what went wrong. Now, the bridge failure was dramatic. Data failures are insidious. They cost organizations between 20 and 40% of all of their IT budget. These are some numbers I've just picked up over the years. I just want you to imagine a query that's running 30 billion times a day. Okay, it's the last number, the 29,838,518,78 daily queries, that's a lot. And if this was the query that was running, I might ask the question, hey, have you guys ever heard of query optimization? Most have not. I can optimize that query from something very, very complex to something less complex. It's not easy, but it's easier in order to go forward. This process repeats hundreds, thousands, millions of times, which results in death by 1,000 cuts. That death by 1,000 cuts is really problematic. It means nobody understands where in their IT budget they're spending money on data migration, data conversion, and data improvements. These are really problematic. It's even worse, though, when you look at what we teach IT professionals. They are given one course on how to build a new Oracle database. I say an Oracle database because Oracle is the only company that still gives it away to us for free. All right, so we learn how to build one new Oracle database. 
Now, if there's a skill we do not need any more of on planet Earth, it is how to build a new database. I don't mean that it is not a useful skill, but think about who's gone through these classes. All sorts of IT professionals, including our leaders. And our leaders have learned from us in the university community that the only time you need data people is when you're going to build a new database. So if I'm going to implement a software package, I don't need data people. If I'm going to merge two databases, I'm not building a new one, so I still don't need data people. And if I'm implementing an ERP or any other piece of package software, I don't need data people because I'm not building a new database. Of course, there's the other side of the coin too: they don't think what we do is important, and they don't know that we can contribute to these other parts of the world. But if we've only taught them one thing about data, which is how to build new databases, then the Maslow quote, if the only tool you know how to use is a hammer, you tend to see every problem as a nail, gets very, very important there. Is it any wonder that people come out of colleges and universities and, when you have a problem as a business, they build you a new database? Well, again, we can hem and haw all we want, but we really need to understand that data leadership is a problem. We do not have a vast number of people that know and have these skills. I'm probably talking to a good percentage of them on this particular webinar. Imagine, again, you're on a hiring panel. Somebody's trying to hire a chief data officer in an organization, and they don't know what they don't know. How on earth are they gonna hire somebody who's competent at the top of that level? 
Now, job labels are really important and I'm very much opposed to specifically calling everybody a chief data officer because as soon as you say chief data officer, the first question that everybody else on that level comes back with is, are we sure we need another chief around here? And that's a huge problem. So I like to call them the top data job or enterprise data executive. If you need to call them chief data officer, it's somebody to be in charge of the data governance organization that is going to interface with specifically the top IT job in there. And I've got three characteristics that I wanna have specifically on this. Dedicated 100% to data asset leveraging, unconstrained by an IT mindset and reporting to the business, not into IT. Because if you're reporting into IT, you're going to be expected to report in a project management leveraging set of constructs and that simply doesn't work. In fact, for the most part, we find that CDOs worldwide are lasting about a year and then they're getting blasted out of there. So not exactly a recipe for success. It used to be we used to joke and say, CIO meant career is over, right? Well, CDO is just a little bit in that same category there. We wanna have somebody that goes in and can be successful in order to do this. Obviously, one of the things I'm recommending here is that the first thing a CDO should do is to in fact get rid of the seven deadly data sins that are out there. Let's get to the number one issue on all of this, which is really not understanding what we mean by data centric thinking. So I mentioned before the Agile Manifesto. This is a version of the Agile Manifesto that I repurposed for data, calling it specifically the data doctrine. The idea here is the same language. This is the language of the Agile Manifesto. We are uncovering better ways of developing IT systems by doing it and by helping others do it. Through this work, we have come to value. 
This is all the Agile Manifesto says, and they have four tenets. I replaced their tenets with these four, and I'm gonna walk through them very briefly here with you in the remaining eight minutes of our program. Data programs precede software development. These are two incompatible things. We talked about this a little bit with the project versus program mentality. Again, IT is very, very good at creating new things that need to be made by IT, but data operates at a different cadence, a different tempo, a different rhythm if you will, than the way this works out. Data evolves over time, and that evolution over time means we need to separate and sequence these activities. Data evolution needs to be separated from, made external to, and precede system development activities. A little bit more briefly, data management and software packages must be separated and sequenced if we're going to do this. That's what we mean by data programs preceding software development. Stable data structures need to precede stable code. Here's an example that I've used for many, many years, but it's a very good illustration of the concept. I have a business rule here on the left-hand side that I've outlined in purple. It results from the association between person and employee from one of the old DOD systems that I worked on a long time ago. And the business rule was one employee can be associated with one person. While that's a nice business rule, it did not in fact meet the Department of Defense's needs at that point in time. Those needs included a requirement reflecting that 30% of the DOD workforce was in fact working a second part-time job for the Defense Department. And consequently, we needed to have a different rule in place so that manual moonlighting would not be associated with that. Instead, what we wanted to have was zero, one or more employees can be associated with one person. 
I've made a slight change to the data modeling notation down there highlighted in red, simply showing that zero, one or more employees can be associated with one person. I'm gonna do exactly the same thing here for the other business rule that I've got on here. The first one, again, is moonlighting. Can we allow moonlighting within this? Second one says one employee can be associated with one position. Well, again, that's the way it is in some organizations that it's problematic. But if I've got a job situation where I'd like to have somebody work essentially eight till 12 and somebody else work one to five, I may want to have what we call supported job sharing in this case. So again, the data changes, zero, one or more employees can be associated with one position as opposed to one employee with one position. These are minor things but they're very good ways of illustrating how more flexibility in your data can permit your organization to achieve more business results from the process. So I'm gonna put the more flexible data structure on the left-hand side of the screen. The left, sorry, the right-hand side of the screen has the less flexible data structure that's there. So again, the moonlighting and job sharing on the left-hand side are more flexible because you can see the red and the multiples there whereas the manual moonlighting, the rigid data model manual job sharing is on the right-hand side there. And let's just, if I'm gonna write code for this, I'm gonna have to have two more structural loops that are required in the less flexible data structure than I am in the more flexible data structure. So it's a very simple illustration of how the dramatically these programs, the application software programs that we developed have to be coded only after the data structures are stabilized. We can stabilize them on the more flexible and adaptable pieces and we will always be able to meet additional requirements. 
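As a rough illustration of the more flexible structure described above, here is a minimal sketch in which zero, one or more employee records can be associated with one person (moonlighting) and with one position (job sharing). The class names, field names, and identifiers are hypothetical and chosen only for this example; they are not taken from the DOD model on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Employee:
    employee_id: str
    person_id: str     # the person who holds this employee record
    position_id: str   # the position this employee record fills


@dataclass
class Workforce:
    employees: List[Employee] = field(default_factory=list)

    def employees_for_person(self, person_id: str) -> List[Employee]:
        """Zero, one or more employee records per person (moonlighting)."""
        return [e for e in self.employees if e.person_id == person_id]

    def employees_for_position(self, position_id: str) -> List[Employee]:
        """Zero, one or more employee records per position (job sharing)."""
        return [e for e in self.employees if e.position_id == position_id]


workforce = Workforce([
    Employee("E1", "P1", "POS-A"),  # P1's main job
    Employee("E2", "P1", "POS-B"),  # P1 moonlighting in a second job
    Employee("E3", "P2", "POS-B"),  # P2 sharing POS-B with P1
])
```

Because the associations are stored as plain records rather than fixed one-to-one links, the same two lookups cover the ordinary case, moonlighting, and job sharing without special-case code.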
Data structures must be specified prior to software development or acquisition. It is now considered to be best practice to ask the organizations that are purporting to sell you software packages to send you a copy of the data model beforehand, so you can move that data model into your evaluation process as well. The third piece of this data doctrine is that shared data needs to precede completed software. Most people, as I said before, do not understand this. If you look in the upper right-hand corner of this diagram, you'll see a tiny little gray area, which is how most people perceive what we're doing in data. But if we start to use that data as it's properly done in a design project, we then extend our existing data and make it more visible. You notice the gray is becoming a little bit darker. And if you repeat that process and make it obvious all the way throughout, everybody will eventually understand how that works. Shared data structures cannot exist without programmatic development and evaluation of these data structures. If they don't have that, they will not be able to do it, and you'll have to reinvent them for every piece that you do out there. Finally, data structures need to precede reusable code. Again, say I've got an application domain where the database that is gray is controlling programs A, B, and C. The first question that comes up in organizations is, who's keeping track of the green and the orange databases? If they have separate DBAs, that simply may not work out very well for the organization. But let's take it a little bit further. If we make decisions about changing a program, I may have nine changes that I need to make at max, whereas if I'm gonna change the data, my upper theoretical complexity on this is N times N minus one over two, or 36 programmatic changes, because in the worst case scenario the data has to connect with all the other pieces of the data. 
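The worst-case figure quoted above (36 for nine items) matches the standard count of distinct pairs among n items, n(n-1)/2. The short sketch below just checks that arithmetic; the function name is made up for this example.

```python
def max_pairwise_connections(n: int) -> int:
    """Number of distinct pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Nine items give the 36 worst-case connections mentioned in the talk.
    print(max_pairwise_connections(9))
```

The count grows quadratically, so doubling the number of connected pieces roughly quadruples the worst-case number of changes.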
So once again, our simple thinking about data-centric thinking is that data programs need to precede software projects. Sorry, I'm gonna go back on that, hang on. Data programs need to precede software projects. Stable data structures need to precede stable code. There we go. Shared data needs to precede the completed software, and data reuse needs to precede the reusable software. We're not saying that the things on the right aren't useful, but we're saying we're going to value the things on the left more. And that's really what we mean by data-centric thinking. If you'd like to carry on a dialogue about this, we'd love to have you in the process. There is a website we set up called thedatadactron.com. Just register there and we'll get you involved in the dialogue around this. So we're at the top of the hour, and our seven deadly sins then are: not understanding data-centric thinking, not being able to get qualified data leadership, not implementing a programmatic means of sharing data, not aligning the data program with the IT projects, failing to manage expectations, not sequencing data strategy, and failing to address the culture and management issues. If we fix these seven things, then hopefully you'll have your organization doing a little bit of growth around data. Right now most organizations perceive data as the sort of that sign between business and IT. And if you do some of the things we've been talking about here today, you may get to this level here, which is to say that data really is something much bigger than both business and IT recognize. And in fact, if we go the whole way, we realize that IT and business are really just swimming in a much, much larger sea of data, and that will allow us to properly put the time and effort that we need into data. So we're now at the top of the hour, and I'm gonna open it up for some questions and get Shannon back on the line here. 
Remind you guys of some upcoming events including Enterprise Data World which is early this year in March so we're gonna be doing it a little bit earlier this year but I hope to see you guys out at some of these things and I'll get back to Shannon for some questions. Peter thank you so much for this great presentation as always just a reminder I will send a follow-up email to all registrants by end of day Thursday for this webinar with links to the slides, the recording and anything else requested throughout. A lot of great questions coming into the Q&A at the bottom right hand corner there already and so let me dive right in here. Peter why does the foundation not include a vision that sets a target or ideal state for the organization's data management capabilities? So what a great question. This is a partial description of the material that's in the data strategy book and that is exactly what would be driving this. In fact I'm not going to go to a different slide here. Shannon read that question a second time because I think that is a really good way of wording it. I just want to make sure people understand that I agree supportive overall here with it. Yeah why does the foundation not include a vision that sets a target or ideal state for the organization's data management capabilities? So I firmly believe that the foundation should in fact include that with a caveat. First of all the foundational pieces, the foundational practices that we talk about are really the five areas that I had at the base of the pyramid. So let me jump back and forth between that slide and this one here. The foundational piece is that I do agree with the person that says that we have that that we sorry put it on here. There we go, let's get it up and running. Get out of that. And there. And our foundational practices are data management strategy. 
So right there is actually that strategy piece that the question was asking about, but it gets glossed over an awful lot of the time. And so that is really how to take advantage of what we're talking about there. And I said I had a caveat that I put in there. The key for this is that I've found that most organizations are absolutely incapable of executing on a sort of global data strategy. So I've revised strategy, and I worked with Todd Harbour, who's my co-author on the data strategy book, to take something that many of you may have heard about. It's from a book called The Goal, and the theory is called the Theory of Constraints, which simply says that in any system that you're working with there is something that is blocking you the most: find it, fix it, and move on to the next. So that really sets up a cycle in which we would have a data strategy focus that may only last 30 days or 90 days, sometimes a year, depending on what's going on, and that focus should be the focus of data governance efforts. And when you fix that one, you go back and do another one. So again, what business goal is it that we're trying to achieve in this particular cycle, and then how well is that strategy working? And we use metadata to manage that effectiveness. So thank you for pointing that out. It was a very good piece, a very good observation there. Indeed. So Peter, when you say buy tools last, do you mean build up your tool set as you go? So if we start with data governance, we might want a tool to help with that before buying anything else? Yeah, I really mean last. So apologies to Stan and Collibra and all the rest of the vendors that are out there with data governance tools. Yes, all of these tools are helpful. But what happens is when you buy the tool, people start thinking the tool is the solution. The tool is always a part of the solution. 
So again, not to pick on any specific tool or vendor or anything else, they have their place and most of them are extremely helpful in order to do this. What I'm talking about here is that you've got to get good with the capabilities around whatever aspects of the data management body of knowledge that I'm showing here on the screen are going to be important to you. So one of those data cycles that we talked about in the last question might be, for example, that we want to get better at doing what we call reference and master data management. Okay, that's the pie slice that's here at about five o'clock on the screen for you. Well, I've never seen anybody succeed in their reference or master data management efforts simply by buying a tool. What I see happen is that most successful master data programs occur with a focus on data governance and quality. I say that at least three parts of this diagram comprise a strategy cycle. So the first cycle might be let's get better with data quality, let's do it through use of data governance, and let's put that data into something we're going to call a reference or master data management technology stack. Now, the other part of this that's so important is that most organizational maturity in these areas, and again, I've literally measured thousands of these. So if you want to read the papers at night, send me an email and I'll send you these papers. They work better than Ambien without any of the side effects. The idea is that rather than going out and spending lots of money with one of these very good vendors who will sell you a master data management technology stack, instead you can build your own. Every organization has a SQL Server implementation going somewhere, and using SQL Server, you can use its capabilities. And I don't mean the Microsoft MDM solution. I'm talking about strictly SQL Server. You can use SQL Server to build yourself a technology stack. 
I've got half a dozen customers doing it right now. And after using this technology stack for two to three years, they are able to go in and have a conversation with the rest of the world about what technologies they should buy. But right now it's kind of like talking to somebody who does not have any kids about what it's like to raise kids. They just don't have the experience in order to do this. So that's what I mean by buying the tools last. Only when you have a good idea of what you expect the tools to do in your environment should you begin to have conversations with these vendors about it, because they are very good products and they're very good vendors, but there's a big gap between where the organization is and where the organization wants to be when it's using these technologies. I hope that was clear. Indeed. So any advice on quantifying ROI in this space? How do you help people understand that data governance and management is worth the cost of developing and buying tools? Gosh, Shannon, can you look up and see what month we're gonna do monetizing on this? Because that is exactly the focus of that. Yes, there's not been a lot of work done in that area, but I did some work on that and have come up with a short little book on 17 specific cases of how to put dollar values on some of these things. I alluded to a couple of things here. Again, if I've got a query running several billion times a day, having that query run even just a little bit faster each time is gonna save you guys some time and money. So starting to quantify these things is a really good practice. We're not good at it, but that doesn't mean that it's an impossible task. It just means that we're not practiced at it. And we should start off by measuring some of these things here. Again, a query that runs 30 billion times a day. Even if I only make it 10% faster, it's still 10% of a lot of things.
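As a rough illustration of that arithmetic, here is a minimal sketch in Python. Only the 30 billion runs a day and the 10% speedup come from the talk. The function name `daily_savings`, the 2-millisecond baseline per query, and the $0.50 cost per compute-hour are made-up assumptions to show the shape of the calculation, not measured figures.

```python
def daily_savings(runs_per_day, seconds_per_run, speedup_fraction, cost_per_compute_hour):
    """Return (compute_hours_saved, dollars_saved) per day for a faster query.

    speedup_fraction is the share of each run's time that is eliminated,
    e.g. 0.10 for "10% faster".
    """
    seconds_saved = runs_per_day * seconds_per_run * speedup_fraction
    hours_saved = seconds_saved / 3600
    return hours_saved, hours_saved * cost_per_compute_hour


hours, dollars = daily_savings(
    runs_per_day=30_000_000_000,  # from the talk
    seconds_per_run=0.002,        # assumption: 2 ms per query
    speedup_fraction=0.10,        # from the talk
    cost_per_compute_hour=0.50,   # assumption: hypothetical dollar rate
)
print(f"{hours:,.0f} compute-hours and ${dollars:,.2f} saved per day")
```

Under those assumed inputs the saving works out to roughly 1,667 compute-hours a day. The point is less the specific number than that every input is something an organization can measure and plug in.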
So you can put some dollar costs on this in terms of human time and costs and things like that and start to get at some of these answers. Again, I don't know what month we're doing the monetizing seminar, but I'm sure it's coming up there somewhere. To what extent, Peter, can the data governance group help in exorcising these data sins? To what extent should they? Fantastic question. In my mind, the data governance group is instrumental in doing this. Part of it starts with going back to your charter. Now, of the groups that I've been helping out, many have a piece in their charter that says, by the way, we will determine where and when we're ready to implement IT projects. You know, again, it sounds like I'm at war with IT. I'm not at war with IT. But if we have governance preceding this, it has a much more strategic perspective on what's happening in the organization than strictly the IT perspective. Think about it for a minute. IT cannot put a value on data. They just know whether you can connect to data. That is not the same thing. So they don't know what the important aspects are here. Data governance has an absolutely key role in this, in that data governance can say to organizations: you can buy, you can invest, you can move forward with certain things when we have done the right type of preparation. Again, I am not gonna go run a marathon like somebody like Karen Lopez, who runs a marathon a week, right? No problem for her. She's in practice and she's good at it. I go try and run one of those and I'll end up on my face very, very quickly, because I know I'm not in shape to do this. And organizationally, you need to develop those same kinds of skills. It's very critical that data governance look at the health of the organization and focus specifically on business goals and metadata in order to come up with projects that are meaningful.
Most of the time, when data governance efforts fall short, we find that they are falling short because people are overthinking the problem. For example, our good friend, both of us know David Plotkin very well. He's written a wonderful book on governance out there. I think it's Data Stewardship, and he outlines 14 different kinds of stewards. Well, let's just take that as an example. I don't recommend that organizations start out by trying to allocate 14 different types of data stewards around their organization. Instead, start out with just one, and the one steward takes care of data. That's it, simple message, let's get that working for a couple of years. Then maybe after we get that going, we can look at bifurcating the stewards and say that some stewards are more supportive of a business role and other stewards might be more supportive of an IT role. So you have technical data stewards and business data stewards, but don't start there. And for goodness' sake, don't start with all 14 of them. It never works for anybody. So Peter, can you share the name of the book again for monetizing the data improvement efforts? Monetizing Data Management, out there on Amazon.com. Perfect, any other questions? And Shannon, also at the university bookstore, right? Oh, no, no, it's all on Amazon now. Oh, it's all on Amazon. Okay, we used to have a university bookstore, so. Yeah, let Amazon do the things it's good at. Absolutely. Alrighty, any other questions out there? Everyone's quiet, getting ready for the holidays. They're shopping online right now. Yeah, nobody's shopping or playing on Facebook while we're doing all this, right? But yeah, we will see you guys in January for the other half of this, which is data strategy best practices. And then we hope to see everybody at Enterprise Data World, coming up real soon. Yeah, in Boston. You've got it, looking forward to it. All right, everybody. Happy holidays. Thank you. Thanks, Shannon. Thanks, Peter.