Hello and welcome. My name is Anita Kress and I am the Webinar Production Assistant for DataVersity. We would like to thank you for joining today's DataVersity Webinar, Exorcising the Seven Deadly Data Sins, the latest installment in a monthly series called DataEd Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the Webinar. If you would like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the upper right corner for that feature. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DataEd. To answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording, and will likewise send a link to the recording of this session as well as any additional information requested throughout the Webinar. Now, let me introduce you to our speaker for today. Peter Aiken, PhD, is a modern-day business guru, highly sought after as a keynote speaker and widely respected as one of the top 10 data management authorities worldwide. During the course of a career spanning over 30 years, Peter has revolutionized the data management discipline, an achievement to which more than 50 organizations in 20 countries and myriad industries, from healthcare to defense, telecommunications to manufacturing, can attest. Outside of winning over crowds all across the globe, Peter is founding director of Data Blueprint. He has written nine books and dozens of articles. He is constantly traveling. Peter, where are you today? Thanks, Anita. It's great to be with you.
And I'm on my way to a very exciting event at the White House, where they are paying lots and lots of attention to data these days. So I told Shannon, who's known me for many years, that I'm actually wearing a tie today. So we'll dive right in. For our final event of the year, we've got kind of a special one. This is, if you will, an event where we're going to announce something at the end that we hope you can all participate in, but we're calling it exorcising the seven deadly data sins. And the occasion is that myself and a colleague are now going to bump our total from nine books to 10 books. We hope to have this out in the next quarter. It's called Your Data Strategy, making it actionable, concise, and understandable by business and IT. And the reason for that is because we are still faced with large numbers of IT project failures. You can see by the graph here, and this is information from the Standish Group; I've given you the reference in the upper right-hand corner. But still, large numbers of IT projects fail. And consequently, we have challenges: organizations that are still trying to get better at this process. And while we are getting better, the real question is, you know, what makes your organization think it's a lot better than most of the others? In other words, when you start your IT project, you have a one-in-three chance of it succeeding on time, within budget, with full functionality. If my dentist were that bad, I would find a new dentist. The reason for this is because data is perceived very differently by different parts of the organization. Most organizations are project-based from an IT perspective, and most business things are program-based. Data is perceived as this sort of thing that sits between the two, but we'd like to change your thinking about that to a little bit broader focus. Again, IT is largely project-based, but we want to extend that program aspect to include the data programs.
And really, what we found is that there's an impedance mismatch between data as people try to implement it as projects and business programs that need to be implemented that way. So let's dive into the material. The first question that somebody asked us is, well, if you've got a data strategy, how does that relate to data governance? And the answer is pretty well articulated, I think, by this diagram. Data strategy is what the data assets do to support the strategy, and data governance is how well the data strategy is working. Data governance and data strategy are components of organizational strategy, and the data asset support for the organizational strategy is the specific focus of the data strategy. In other words, how am I going to use the data assets of my organization to help the organization achieve its strategy, whether it is a nonprofit, a non-government organization, or a for-profit organization? The governance component then tells how data is delivered by IT and how those IT projects should be used to help support strategic implementation on an operational basis. We need to have some operational feedback in those loops, and of course, this is not the only aspect of strategy. However, in recent years, it has become an increasingly important component of the strategy. Now let's look a little more directly at how data strategy is implemented in organizations correctly. We are hitting a milestone this year at Data Blueprint: we're going to turn 18. I don't know if that's a good connotation or a bad connotation, but 18 years in a business is kind of good. And the 30 or so of us here are going to have a fairly big celebration around that. And we've also had 150 customers that we've helped over the past 18 years. And every one of them has had this challenge. So we finally broke it apart and said, what really needs to happen in data strategy is a two-phase approach. The first phase is a prerequisite phase.
There are things that are necessary but insufficient to help your organization out. And in order to do that properly, these things must happen before we get to the iterative phase, the standard piece, what most people consider a data strategy. And what we see is most people implement data strategy without these prerequisites. That is a problem. There are three prerequisites. The first one is to prepare your organization for dramatic change and make it determined to work well. The second is recruiting a knowledgeable, qualified enterprise data executive and other staff. Some people call this the CDO, based on my book. I'm actually not in favor of CDO as a term, but we can get into that a little bit later on. And the third one is what we're going to concentrate on today, eliminating the seven deadly data sins. Let's take a look and see what those sins are. The first one is not understanding data-centric thinking. Now, you have to understand this, but your organization has to understand it as well. I've already mentioned qualified leadership, and that is an issue; we'll talk specifically about how to address that aspect of it. The third sin is not implementing a robust programmatic means of developing shared data. And fourth, not aligning the data program with IT projects. Fifth, failing to adequately manage expectations around this entire process. Sixth, not correctly sequencing the implementation of your data strategy. And seventh, failing to address the cultural and change management challenges that are associated with all of these. Let's dive in and take a look at the first one, not understanding data-centric thinking. The most important part of this is to take a systems perspective. We can talk about a system as a set of detailed methods, procedures, and routines that carry out an activity, perform a duty, or solve a problem. In other words, it's a group of things that you've organized for a purpose.
A little bit more dictionary-ish, academic-type definition is: an organized, purposeful structure, regarded as a whole and consisting of interrelated and interdependent elements that influence each other directly and indirectly and maintain their activities and existence in order to achieve the goal of the system. When most people think about systems, they think about five components. The first one is people. The second one is processes. The third one, hardware. The fourth one, software. And the fifth one is data. Now, many people look at this diagram and say, okay, well, all of these things have equal importance. And that was probably true as long as you couldn't make the following statement. But now, since you can make this statement at any point in time and you will always be correct, it says that the data component is vastly exceeding the complexity of the other four components that are out there. The statement is simply: there will never be any less data than right now. And you can say that today, tomorrow, anytime in the future, and you will always be 100% correct. I do want to acknowledge that came from our good friend and colleague, Micheline Casey, who used it many, many times when she was making her speeches. So, shameless rip-off there to Micheline, but thanks for allowing us to use it. Now, when we talk about data-centric thinking, what we're really talking about is the idea that data is a very different piece of how all this works. Let's move now to the next deadly sin, which is to say that we don't have qualified leadership in many cases. If you think about it, many people on this call have gone through some sort of a program in information technology, computer science, information systems, or computer engineering, and likely you took one course about data. And what they taught you in that one course was how to build a new database. If there is a skill that we need less of on the planet Earth, it is how to build new databases. After all, think about it.
When the only tool you know how to use is a hammer, every problem looks like a nail. And more importantly, our management has gone through this as well. And they get the impression that data is a technical skill that is only needed when developing new databases. We have good evidence of this. I've published some scientific papers where we have measured the distance between the top leadership of an IT organization and the data leadership. And there's a lot of evidence that shows specifically that the data people 20 years ago reported directly to the top leadership of IT, and that now nearly three-quarters are actually reporting two or even three levels down from that top leadership. The impression is clearly that IT professionals believe data is a function you only need when you're developing new databases, and that it's not strategic in nature. What this means is that if you're facing a company, and I've worked with companies that have faced each of these challenges: if we're migrating databases, we are not creating new databases, and we don't need organizational data management knowledge, skills, and abilities. If we are implementing a new software package, we are not creating a new database, and therefore we don't need those KSAs, knowledge, skills, and abilities. And finally, if we're installing an ERP, similarly, we do not need to have data people involved. This is a problem, because management believes that the only function data people are there for is to build new databases, and they don't build new databases, because they are migrating old ones, implementing software that comes with its own attached database, or doing an ERP, which of course has its own database, but they're not building a new one as a result. It is a problem. In fact, the only way we have found to address this is to make changes at the top of the organization.
All organizations have a top job in operations, a top finance job, a top marketing job, and the goal of the top finance job, for example, is to manage the fiscal assets of the organization. We need a top data job as well. The top data job could be called the top data job as far as I'm concerned. It could be called the enterprise data executive. It could be called the chief data officer. I don't mind, but it's important to have one person in charge. The question is, whose throat do I choke when something doesn't work well? Management always likes to have the answer to that question. And consequently, if you have an enterprise data executive, that's the place that you start. If we ask the top IT job to do the data part as well, they have a lot of other things on their plate already. This enterprise data executive, top data job, CDO, whatever it is we're going to call it, needs to interact with the top IT job on a continuing basis, and I've already shown that it's through a data governance organization. The key for this top data job is that they need to be dedicated 100% to data, just the same way the top finance job focuses on finance activities and marketing focuses on marketing activities. They need to be unconstrained by an IT project mindset; I will come back to that in a bit. And finally, data is a business asset, so it needs to report to the business. The main reason for this is because IT doesn't pay the price or understand the cost of data-related errors. They just sort of get swept up into something-went-wrong types of explanations. The biggest example of all this, of course, is the recent US 2016 election. All of the models that we used to predict the outcome of that election were very good and correct. However, the data that went into those models was quite problematic. If you need a more detailed explanation, Nate Silver has written a wonderful column about what went wrong with that whole thing.
By the way, he said he wasn't wrong. He said people just didn't understand statistics, and that's a fair assessment as well. The real key for this top job is that there has to be somebody who knows enough to develop a robust programmatic means, who does develop the leadership that we need to have around this. And I've already mentioned that the CDO provides significant input to the technology job. Gartner, back in January of the year before last, predicted that 25% of organizations would have CDOs by 2015. IDC said that 60% of CIOs will be supplanted by chief digital officers. Experian found figures around 90% suggesting the CDO will be there. And Gartner actually updated their prediction to say that 90% of large global organizations will have one. The question comes up, where are we going to find people to do this? Now, it's really interesting. I've been at this with my colleague on the book, Todd Harbour. Both of us have been on lots and lots of different job interviews in our lives, and never once in 30 years of doing this have we had a hiring panel that was in fact qualified to determine whether we could solve the problems that we were being asked to solve. The hiring panels don't know that they don't know. They think if you have, oh, let's just say Python on your resume, that you'll be able to solve the organizational data problems and employ data assets more effectively in support of the organization. It is not true. And as long as we are faced with that situation, it will be a challenge. They don't know that they don't know. One of the things we're suggesting to the U.S. Federal Government is that they share hiring panel qualifications across different agencies so that they will get a better talent pool of people. Again, if they don't know that they don't know, they can't make good decisions. Another important characteristic, and you've seen that these prerequisites are absolutely non-trivial: there are no unicorns.
You can try to find the one person who can do all of this, but our experience really shows that the first person at this is going to do some things, change some things, eliminate these seven deadly data sins, as we're calling them, and at the same time, they're probably not going to have much political capital or energy left at that point. And so we really talk in terms of two phases, with the enterprise data executive, particularly the first one, taking one for the team. So it's very, very tough to find this qualified data leadership. And of course, I want to shout out to my colleagues in DAMA and also at the CMMI, which are both organizations that we work with on a regular basis so that we can help to develop and increase the professionalism all the way around in this area. So that's our lack of data leadership. Now let's look at not implementing a robust, programmatic means of developing shared data. The first thing is to understand there are very key differences between projects and programs. Programs are ongoing, they are tied to a fiscal calendar, and they are governance intensive, which means data governance fits very well into your programs and has been very tough to fit into your projects. The reason this is so critical is because when you look at how we have been teaching people in colleges and universities for the past 30 years, we have taught them something called the waterfall design method. First you do requirements, then you do design, then you do implementation, then you do verification, and then you do maintenance. I have a story that, if you ever want to hear it, catch me at a conference and I will be glad to tell it to you offline, because it takes too long to tell here, although we'll probably put it down in some form in a book sooner or later. This method, of course, is really a work of fiction as well. The first thing that you might notice is that it shows no iteration. So think about it just from a reasonableness perspective.
At the beginning of the project, when you know the least about it, you're supposed to tell everybody how long it's going to take, how much it's going to cost, and what functionality it's going to have. This is, of course, exactly the reason that we end up with the very poor IT failure rates that I showed earlier on. Now, we teach them this, and it doesn't teach them iteration. Iteration is the only way that you improve the products based on feedback at the next level. And so what we really should see here are multiple arrows going up and down, back and forth, et cetera. But let's look at an even more fatal flaw with this project implementation, and that is the idea that we can develop and implement software and data within it. It can only work when there is no sharing of data outside of the project. Let me be very clear on this. This is how you implement a project. It has a beginning and an end. If you are going to share data across projects, something outside of this project and the project you are going to share data with must exist. That is either another project or, more appropriately, it is a programmatic activity. Developing shared data structures requires programmatic development and evaluation, not project-oriented development. If you are trying to do this as a project, the only result can be siloed data. Now, I joke a lot and say, but that's okay: if you don't want to listen to me, you'll become a Data Blueprint customer sooner or later. Sorry for the commercial there, but it's very, very much true. You cannot implement a data program using a project implementation method. It's impossible. The project owners are going to protect their budget. They're not going to pay for it. The process of implementing shared data structures requires programmatic development and evaluation, a program, and the program has to be overhead, because it is shared across a number of projects. That is data sin number three.
Let's move on to number four. Because of the project nature of all of this, we need to make sure that we align the IT projects and the data projects in a way that makes sense. Let's, again, take a look at how it has occurred in the past. Excuse me. In support of strategy, organizations say, I have a strategy and I need to implement some IT projects. It seems very natural and reasonable. However, data and information are considered within the scope of the IT projects. That, of course, leads to the problem that I was just describing a minute ago. It means that the data conforms to the applications and not to the organization-wide requirements. Whatever process architecture you have is narrowly formed around the application; we call it an application-driven process architecture. And finally, very little data is reused, given that set of circumstances, because nobody in a project mentality has the incentive, the reason, or the means to develop, in this case, reusable shared data structures. So you'll notice all I've done there is change the order: I skipped strategy, and I swapped data and information and IT projects. So what we'd like to do instead is say, in support of strategy, the organization should develop shared data-based goals and objectives. Now, the closer your organization is to a digital organization, the easier this is going to be to sell. But this works for all organizations. If we need to understand our customers, then we need to be able to go off and develop data assets that support our customers. I can't tell you how many times, but it's been dozens of times, that people have asked us to come in and help them build a data warehouse that keeps information about their customers. And the neat thing about that exercise is that, yes, you can have information about their customers, but people really don't have an interest in their customers. They have an interest in what products their customers buy.
So the business question is very different from the subject area that people like to build around. And as an IT project, we have seen many, many organizations. We've helped get them done. We've helped kill them. We've done everything under the sun in terms of working on these things. Because as an IT project, they say, I'm going to build a customer data warehouse, and they have lots of good information about customers, but they have no information in there about what the customers do. The question is, what are the business needs? And the business need is: I need to know what products my customers buy. These data goals and data objectives should be the things that drive the IT projects, with an eye, of course, to organization-wide usage all the way around. Strategy first, then data and information, and subordinate to that are the IT projects. The advantage is that the data and information assets are then developed from a programmatic, organization-wide perspective. The systems that support this complement the organizational process flows that are already there. And finally, we can maximize information and data reuse. I have an article that I'm working on right now, and I'm not sure exactly where I'm going to publish this one, but it's called Data Is Not the New Oil. Thinking about data as oil is one of the worst things that you can do, because of course we put oil in our cars. By the way, it's called gasoline when it gets to that point, but let's not quibble over it. And we use gasoline up. Data cannot be used up. In fact, data's best use is not when it's used, but when it's reused. Now, the reason we want to focus on data and information reuse is very simple. We've talked about software reuse for years, and outside of the open source software movement, which is very well organized and very successful, there is very little software being reused, and that is an enormous challenge to us as we look through this.
So again, our goal here is to say the data program and the IT projects must be aligned in a manner that complements both. The IT projects have to implement specific shared pieces of information, and that should be the overall goal of our systems development activities, as opposed to focusing on developing new piles of data for each specific IT project, which is more the common practice now, and as I said, the results are obvious for everybody to see. Our fifth deadly data sin, then, is failing to adequately manage expectations, and I can tell you many stories, but I'll just do one right at the moment. If you have five data managers, and we'll pretend that they're all paid well at $100,000 annually, do they have an obligation to show the business at the end of the year that they have saved the organization at least $500,000? My answer is yes, because if they don't, the next year they need to be able to show the organization that they've saved it at least a million dollars. If those same people are employed for three years, then their overhead has become a million and a half dollars: $500,000 a year for three years. If we don't show people that we are contributing directly to organizational success, a smart CEO will invest elsewhere. Let me tell you a little story around this. The organization that I was working for in this case said, I've got five years; my CIO has given me five years to come up with payback. Again, just using these numbers, five data managers at $100,000 annually, that's two and a half million dollars that they were going to have to show they had saved the organization in five years. Now, I'm not suggesting that five data managers can't save the organization a lot more, but what I'm saying is you have to show them. You have to shove it in their face. You have to make sure that they understand very clearly, because smart people making good decisions look around and make decisions in sort of a similar fashion.
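The expectation-management arithmetic above can be sketched as a quick calculation. The headcount, salary, and horizon figures are just the illustrative numbers from the story, not real benchmarks, and the function name is mine, not anything from the talk:

```python
# Illustrative sketch of the expectation-management arithmetic from the talk.
# All figures are the hypothetical ones used above, not real benchmarks.

def cumulative_overhead(managers: int, salary: float, years: int) -> float:
    """Total payroll cost of the data team over the period."""
    return managers * salary * years

# Five data managers at $100,000 each.
annual_cost = cumulative_overhead(5, 100_000, 1)   # $500,000 in year one
three_year = cumulative_overhead(5, 100_000, 3)    # $1.5 million after three years
five_year = cumulative_overhead(5, 100_000, 5)     # $2.5 million after five years

# The argument: demonstrated savings must at least keep pace with this
# running total, or a smart CEO will invest elsewhere.
print(annual_cost, three_year, five_year)
```

The point of writing it down this way is that the bar rises every year the team is employed, which is why the speaker insists on showing the savings explicitly rather than assuming management will infer them.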
They look at what they believe the utility of the thing they're being asked to evaluate is. So in the old days, when we had only paper-based reports (remember the green bar paper that came off the line printers and things like that), we had a report that we weren't sure was being used. The easiest way to figure out whether it was being used or not was to stop producing the report. And if nobody complained after a day, a week, a month, a year, then we must have done the right thing by eliminating that particular report. Similarly, when management looks around at the organization and sees five data managers who are costing the organization half a million dollars a year, and they can't tell what the tangible output from those five very good, productive, professional employees is, and they cut them and things don't get any worse, it must have been the right decision. There's one other additional piece with this too, and I pointed this out to my sponsor on this particular project. They said, well, it's okay, my CIO's given me five years. What is the average tenure of a CIO? And of course, the answer to that is somewhere between 18 and 36 months, depending on which numbers you use. So what is the likelihood that that same CIO will still be there to provide that cover at that particular organization? Again, managing expectations is absolutely key. I have an entire book dedicated to that piece. We won't get into that now; that will come sometime next year when we look at the monetizing piece. Let's now look at an implementation framework here. Again, managing expectations. What many organizations do as they're trying to work with their business strategy and their data strategy is they say, what are my business needs? I'll understand them, and then I will develop a solution. That seems to be a very nice progression, right? Problem, solution, results is the way we want to do it, but it's wrong. And I blow that up.
I burn it with that first graphic, if I could go back to that one and put it in there again. The reason that's wrong is because you have to take into account the current state of the organization. How does the organization in fact work? The organization is only able to deal with certain things based on the maturity of the organization itself. We have to understand those existing capabilities, and only when we have a match between the business needs and the existing capabilities of the organization can we properly develop strategic data imperatives that we can then target for specific implementation. So again, managing expectations here. I'll give you a very, very brief example of this. We had an organization one time that had spent many, many dollars with one of the big consulting companies. There were literally three planeloads of consultants brought in to manage the implementation of the MDM, master data management, solution that this organization had, because they had determined a business need: they had very poor measurements and very poor information around some of the master data items that they needed to have. The problem was, once they implemented the software and the three planeloads of consultants stopped showing up at the organization, the organization didn't know how to manage a sophisticated platform like master data management. And again, I'm not saying anything bad about the platform. The platform worked very well technically. But if people don't know how to use it, you start hearing things like, well, I didn't know where to put the data, so I stuck it in the MDM. And right there, that's a warning sign. And what it means is that the organization doesn't understand the needs that it's trying to solve with this particular technology. And what we've seen is just another master data management, excuse me, just another technology solution.
One of the other interesting pieces that we've seen: one of the states that I work with has implemented something called master data management, but they called it enterprise data management. And it's just the wrong label for it. So people are very confused as to what it is, because most of us think that enterprise data management means something completely different. So they're afraid to stick anything in there, because they're assuming it's going to go everywhere in the entire state, since it's called enterprise. Again, these are things that can be handled by making sure that you match the solution appropriately with the organization's capabilities. If you need one final example, the question is: would you hand the keys to a brand-new Tesla to a 16-year-old who had just passed the driver's exam? And the answer to that in most cases is absolutely not. You cannot expect good results from handing the keys to a very fast sports car to a person who has had very little driving experience. Let's take it a step further now from the strategy. And again, we're talking about meeting expectations here. So the strategic implementation is: we're going to start here, then we're going to move to a next level higher, and then we're going to move to another level higher, according to our roadmap. The roadmap then has to balance business value with new capabilities. If you are producing only business value, you are producing good results, but the organization is not actually learning how it needs to work. This is the example of the master data management solution and the three planeloads of consultants that I described earlier: the organization itself was not learning the new capabilities that it needed to learn in order to implement this. Similarly, if we go back to the previous slide's example and say we're going to learn new capabilities for five years before we deliver business value, they'll think what you're doing is a science experiment, and correctly, they will cut you off.
You have to have a balance between what you're doing from a business value perspective and what capabilities you're implementing as well. If you don't, you will not manage the expectations of the organization. Now let's get into sequencing, which is another very interesting aspect of all of this. When you go to sequence your data strategy, it's very, very important to make sure that you understand how it can be done and how it can't be done. The idea behind strategy takes us all the way back to Michael Porter. Those of you that studied strategy in school know that Porter was famous for distilling strategy down into two specific dimensions: you either make it better by improving your operations, or you make it better by innovating. There are no other strategies. Everything else is a derivation of one of these two pieces. And when we talk about sequencing strategy, it's very important to understand what that means. First of all, 90% of organizations have no formal data strategy at all. That's been that way for years and years. Most people think, well, why do I need a strategy around it? And the answer is very simple. Data is our sole non-depletable, non-degrading, durable strategic asset. If we're going to use data in its proper fashion, and we realize that our best data use doesn't come when we use it, like gasoline and oil, but when we reuse it, such that we are combining it with other things and doing other things with it, it changes our thinking on the entire process. Most organizations are at version one: they do not have a formalized data strategy. Walmart almost always comes to mind when people think about improving operations. Walmart's operational excellence is acknowledged, referenced, and copied worldwide. Walmart does a great job improving effectiveness and efficiency.
The entire process by which Walmart rolls out a distribution center and then builds stores around that distribution center, to complement the existing pipelines and delivery mechanisms that they have in their organization, is world-renowned and should absolutely be applauded for the brilliance of its efficiency. Walmart does a great job with efficiency and effectiveness. However, you don't want to ask the people at Walmart who are really focused on efficiency and effectiveness to suddenly turn around and be innovative; it's a different mindset. And you would also not take the people at Apple who come up with the innovations they have there and tell them to be efficient and effective. It's the wrong use. It is absolutely critical, from a data strategy perspective, to say we are not going to do everything at once. We are instead going to pick either innovation (Apple) or effectiveness (Walmart) and concentrate on getting good there. There is a benefit, however, if you sequence it according to the way I've shown in this chart: if you improve your operations by increasing your organizational effectiveness and efficiency in V2, you can save some money and use that to fund your innovation projects. It's very hard to do it the other way. It's not impossible; you can come up with a new, innovative way and then go hire some efficiency and effectiveness experts. That's what consultants used to be called: efficiency experts. So what you've got here is the need for organizations to make a choice. And when we talk about doing data strategy well, the last thing you want is for the organization to say, we're going to do it all better because we're company X and we're better than everybody else. Remember, you have to be significantly above average to even get more than half of your IT projects to come in on time. So do not try to do both at once. Hopefully that makes sense.
We'll come back to that when we hit the Q&A section; I usually get a couple of questions around that as well. As I mentioned before, only one in 10 organizations actually has a board-approved data strategy. So the next sin is failing to address cultural and change management challenges. And I want to give you a quick little Dilbert here, just to keep it a little bit light, because as usual, Scott Adams has brilliantly hit the nail on the head. "We're hiring a director of change management to help employees embrace strategic changes," says the pointy-haired boss to Dilbert, Wally, Alice, and Asok the intern, his engineering team. And Dilbert says, "Or we could come up with strategies that make sense, so employees would embrace the change." The boss says that sounds harder. Of course it isn't, actually, but still, we need to pay attention to this. Change is difficult for organizations. We use the following complex change model, by Mary Lippitt, to figure out what's actually going on; in other words, we use it as a diagnostic tool. If we see confusion, it usually means that there are skills, incentives, resources, and an action plan, but no vision. If we see anxiety, there's usually vision, incentives, resources, and an action plan, but the skills are missing. If we see vision, skills, resources, and an action plan, but no incentives, there's usually only gradual change. If we see vision, skills, incentives, and an action plan, but no resources, the organization experiences frustration. And if we see vision, skills, incentives, and resources, but false starts, there's usually no action plan. You need all of them to line up in order to make change. This is an important aspect of data as well, for those of you that have seen my pyramid pieces: the foundation is only as strong as the weakest link.
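As an aside, that diagnostic reading of the model can be sketched in a few lines of code. The element and symptom names follow the talk; the mapping function itself is an illustrative sketch, not part of any published tool.

```python
# A minimal sketch of the complex-change diagnostic described above.
# The element and symptom names follow the talk; the lookup logic is
# an illustrative assumption, not a published model or API.

ELEMENTS = ["vision", "skills", "incentives", "resources", "action plan"]

SYMPTOM_WHEN_MISSING = {
    "vision": "confusion",
    "skills": "anxiety",
    "incentives": "gradual change",
    "resources": "frustration",
    "action plan": "false starts",
}

def diagnose(present):
    """Name the symptom(s) implied by whichever elements are missing."""
    missing = [e for e in ELEMENTS if e not in present]
    if not missing:
        return "change"  # all five line up, so change can happen
    return ", ".join(SYMPTOM_WHEN_MISSING[e] for e in missing)

# Skills, incentives, resources, and an action plan, but no vision:
print(diagnose({"skills", "incentives", "resources", "action plan"}))  # confusion
print(diagnose(set(ELEMENTS)))  # change
```

Running the symptom backward to the missing element is exactly how the model is used as a diagnostic in the talk.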
And in this particular diagnostic tool, if any of those five pieces are missing, you will not have change; it will be very, very difficult. And of course, culture is the biggest impediment to shifting organizational thinking about data, because we haven't, in fact, been able to get organizations to think about this correctly. We haven't gotten them to think data-centrically. We haven't gotten them to take hold of the right type of leadership. We haven't gotten them to build robust programmatic means of developing shared data. We haven't managed to educate people about the need to align data programs with IT projects. We haven't managed expectations well. We haven't been able to take the data strategy and break it down into something that is more useful and doable for the organization. And finally, we have failed to address the cultural and change management aspects. So what are we going to do about it? How are we going to exorcise these seven deadly sins? All of us gathered today are witnessing an actual unveiling. I know we don't usually do such dramatic things on here. We took a look at the agile manifesto for some inspiration. Agile has been a very important development, and the most obvious component of agile implementation has been software engineering, where it has done a good job. So we've created here, and this is a chapter in the new book, something we'll talk about today, and then we'll do some more on the new book in the next webinar in January. We finally said we need to encapsulate all of this the way the agile people did, because the agile people did a wonderful job. So let's take a look at what we mean by the data doctrine. I'm actually using the language of the agile manifesto, very similar: we are uncovering better ways of developing IT systems by doing it and helping others do it.
Through this work, we have come to value data programs preceding software development, stable data structures preceding stable code, shared data preceding completed software, and data reuse preceding reusable code. And I'm going to blow them up a little bit just to make it even more obvious. In other words, there is value in the items on the right, but we value the items on the left more. This is critical. Just as agile software development methods have helped organizations develop better quality software more quickly, we can now start to do the same thing in the data environment, and these benefits will accrue not just to software projects but to systems overall. So let's take a look at how this works in a little more detail. Data programs, as I've already said, need to precede software development. And again, I spelled the word "programme" here using the British spelling, to make sure that everybody understands the distinction: software programs are a completely different animal from data programmes. Remember, I'm talking about the difference between a project and a program. Trying to fit data programs into a software project makes about as much sense as what these fellows are trying to do here on the slide. Let's take a look at it in a nonhumorous way and see how that works. Software projects create capabilities for the organization to deliver data to a group of users. These new capabilities are something the organization didn't have before; after all, why else would you do the project? You might do it better with a new version, but either way you still have to make sure people can actually get what they need. What happens instead, though, is that data programs have a different cadence, a different rhythm. Data programs are not fundamentally creation-oriented activities.
They are fundamentally evolution-oriented activities. That is, your data evolves over time and at a very different cadence, rhythm, timbre, however you want to think of it. What this means is that data evolution needs to be separated from, made external to, and precede systems development activities. If you do not, you run into the problem I described before. Data and software development must be separated and sequenced if you are going to achieve the kinds of results that you'd like in your modern organization. Data programs precede software projects. If you have a software project that is well-defined and you do not have a data program to support that software project, then your software project is at risk. The software itself may in fact work, but if you're filling the software with bad quality data, or inadequate data in any way, shape, or form, the software cannot succeed. Let's move on to the next one: stable data structures must precede stable code. If you think about it, if I'm going to change data structures, I can never stabilize the code, which of course operates on those data structures. Let's take a look at an example that I use quite a lot. This is an example from the Defense Department, where we had a business rule that said zero, one, or more employees can be associated with a person. That's a very, very bad way of saying it. It's a true representation, but it's still a bad way of explaining to executives the need that you might have in your organization to support up to 30% of the population moonlighting. As I mentioned before, it's a DoD example. In the Defense Department we had, at the time, up to 30% of employees who were moonlighting, working part-time for the Navy as well as full-time for the Army. The systems that accounted for those people had to account for the fact that a person could be zero employees, one employee, or multiple employees.
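That cardinality can be made concrete with a small sketch. The names and records here are invented for illustration (this is not DoD code): the key is that Employee is its own record linking a Person to a Position, so nothing stops a person from holding zero, one, or several employments.

```python
# A sketch of the flexible structure: Employee is a separate record
# linking a Person to a Position, so a person may hold any number of
# employments. All names and records below are invented examples.
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    name: str

@dataclass(frozen=True)
class Position:
    title: str

@dataclass(frozen=True)
class Employee:
    person: Person
    position: Position
    schedule: str

# One person moonlighting in two employments at once:
pat = Person("Pat")
employees = [
    Employee(pat, Position("Army analyst"), "full-time"),
    Employee(pat, Position("Navy analyst"), "part-time"),
]
assert sum(1 for e in employees if e.person == pat) == 2

# In the restrictive model, the position would instead be a single field
# on the employee, and the employee a single field on the person -- one
# and only one each -- so the moonlighting case above cannot be recorded.
```

The same link-record shape also supports the companion rule about positions that comes next.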
And with 30% of the workforce falling into that category, it was absolutely crucial that they support this particular business rule. I'm going to give you another business rule here as well: zero, one, or more employees can be associated with a position. Again, in the Defense Department we could have a position filled by zero employees, by one employee, or by multiple employees; somebody could work eight till noon and somebody else could work one to five. If the systems that we are getting ready to implement do not support these rules, then both of those business rules must be handled manually. And I'll give you this example here; it's the same data modeled with just some very, very slight changes. In this case, one employee can be associated with one person, and one employee can be associated with one and only one position. So if I had two people filling a position, something extra had to occur, something external to the existing set of systems, and something that was likely to be manual and error-prone. Similarly, for a person with multiple employments, I was not able to automatically account for their tax information at the end of the year, and up to 30% of the DoD workforce had to be handled manually. These data structures must be specified correctly and stabilized in order to precede the stable code. If they are not, you will not be able to use the systems well. Stable data structures, then, must precede stable code. If they do not, we end up with multiple duplicative data structures that are not shared across the code, and this is a problem. Shared data must precede completed software. What happens, of course, is that we start off with not much shared data, represented by a tiny little gray triangle in the upper left-hand corner.
Those of you that are only partially paying attention, or are trying to drive while you're doing this (gosh, I hope nobody's doing that), may have trouble seeing it, and that's actually representative of the organizations that also have trouble seeing the value. In an individual IT project, we may go through some requirements and request one or more shared data structures from the organizational repository of data that we want to reuse over and over again. The results of that question, for example, what information do we have on people in our organization, drop down into that project. At the end of that project, however, notice that I am extending, and making more concrete and more complete, the organized shared data as a result of doing that project. This IT project is complete; our shared data, however, is not complete. So we move to another IT project (lather, rinse, and repeat), and we now end up with more organized shared data. Hopefully, in time, the shared data precedes the completed software: as the number of requests for the shared data increases, the utility of the results increases, data's contribution to all of these projects increases, and, most importantly, is recognized. Remember, we have to manage expectations well in order to do this. Data reuse, then, must precede reusable code. Think about it: a database is set up to run with a family of programs. So program A, program B, and program C could absolutely be in a project and have the database for application domain one working with them. But unless you control that at the programmatic level, there is no ability to coordinate the activities of the gray, the green, and the orange databases within this particular project. Finally, let's get to an example of how all this works.
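Before that example, the lather-rinse-and-repeat pattern just described can be sketched as a toy repository. The class and method names are invented for illustration, not a real API: each project requests structures it can reuse and contributes its additions back at project end.

```python
# A toy sketch of organized shared data growing project by project.
# Each IT project requests structures it can reuse and contributes its
# additions back when it completes. Names here are invented examples.

class SharedDataRepository:
    def __init__(self):
        self.structures = {}  # structure name -> list of attributes
        self.requests = 0     # utility grows as requests grow

    def request(self, name):
        """A project asks what the organization already shares."""
        self.requests += 1
        return self.structures.get(name)

    def contribute(self, name, attributes):
        """At project end, extend the organized shared data."""
        self.structures.setdefault(name, attributes)

repo = SharedDataRepository()

# Project 1: nothing to reuse yet; it contributes a person structure.
assert repo.request("person") is None
repo.contribute("person", ["name", "hire_date"])

# Project 2: reuses person, adds a position structure.
assert repo.request("person") == ["name", "hire_date"]
repo.contribute("position", ["title", "grade"])

# The shared data outlives both completed projects.
assert repo.requests == 2 and sorted(repo.structures) == ["person", "position"]
```

The point of the sketch is that each project completes and ends, while the shared data persists and keeps growing.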
This is a customer that we've had for a while, a several-billion-dollar chemical company that develops fuel additives to enhance machine performance: the fuel burns cleaner, engines run smoother, and machines last longer. They run tens of thousands of tests annually, costing up to a quarter of a million dollars per test. Now, that's an important aspect of this. This group of engineers represented a $10-million-a-year cost: 100 PhDs in chemical engineering, each earning $100,000. So it's a $10 million investment in that overall process, and they wanted to know how they could make those engineers more productive. What we did, first of all, was understand what their existing environment looked like, which was this. You can see here it's just a very, very nice set of flow charts. And they said, actually, you can stop right there; we don't need any more information. We never understood what the workflow was, so you've done a great job. And we said, well, no, that's not what we did. We built this chart to show you some inefficiencies in the existing environment, because you're not thinking data-centrically. First of all, you have very smart PhDs in chemical engineering who know nothing about data management; they're not expected to know anything about it. They were taking digital data from Computing System A and retyping the same digital data into Computing System B. Anybody on this webinar could have solved that problem for them; yes, there are a number of different ways of getting those systems to work together. Similarly, they also had very bad manual file manipulation and duplication problems. USBs, flash drives, spreadsheets, all of these things were moved around. There was an awful lot of manipulation of these various files, and this manipulation involved cutting and pasting and doing all sorts of things individually that were known only to specific work groups.
They were never implemented as organization-wide practices. There were synonyms that needed to be reconciled. Think back to the Mars landing program, where you had a group of contractors trying to land a spacecraft, but they had neglected to agree on whether they were flying the spacecraft according to metric or English measurements, and so there was a $300 million disaster that you can go look up on the Internet and find out all about. There were macros that were specific to organizations and to specific spreadsheets. People would actually walk down the hall and say, wait a minute, I need your macro in this spreadsheet, and they'd copy all the data from spreadsheet A into spreadsheet B because there was a macro they needed. Not the greatest thing at all. And finally, how much did these folks know about data management? The answer was not a whole lot, because they were using FoxBase as their program. The overall solution to this was data-centric thinking, which allowed them to come back and come up with a data architecture that reduced expenses, improved their competitive edge and customer service, and saved time and materials. Most importantly, in this case, the organization told us that they were gaining $25 million a year in productivity thanks to this integration project. Now, I can assure you it didn't cost them $25 million to achieve that gain, but if you think your first one's going to work that way, bring it to us and we'll tell it as a story; it's unlikely you're going to be that good. But this is what you're trying to do when you're exorcising the seven deadly sins, which are: not understanding data-centric thinking, lacking qualified data leadership, not implementing a robust programmatic means of sharing the data, not aligning your data programs with your IT projects, not managing your expectations correctly, and not aligning your data strategy with your implementation.
Now, we're getting right around to the top of the hour, and I said I had an announcement for you. This website has just gone live today. It is called The Data Doctrine, and just like the Agile Manifesto, we'd love it if you would come along and join us by signing at the signature line at the very bottom. But more important than just joining us on the journey, we hope this is the start of a dialogue where we, as a data community, can start to push back on the organizations that seem to think that lots of effort should go into other aspects of data. If you are not valuing data programs over software projects, if you are not valuing stable data structures that precede stable code, if you are not valuing shared data over completed software, or not valuing reusable data over reusable code, you're not thinking data-centrically. And we'd like this to be a first step of an ongoing dialogue for you all. So I'm going to try it right now; I've got my screen sharing here. There we go. As I said, this is where you trust your staff, right? So here we go: the data doctrine. Oops, I put in .NET. Sorry. Let's try it again. It comes right up. And you'll see I've got a couple of my staff who've already signed this as well. We would love for you to sign this and help join the community. One thing you will see when you sign up: we're asking for just a little information as part of the sign-up. And we'd love to stay in touch, if you'd like, with more information on the data doctrine and data-centric thinking. And with that, we are back at the top of the hour, and I will turn it back over to Anita for our series of questions. So thank you, and I look forward to hearing what you have to say about this. Thank you, Peter. If you have any questions, please put them in the Q&A section in the bottom right corner. While we're waiting for those questions, I'll answer the most common one we get.
We will be sending a follow-up email within two business days with links to the slides, links to the recording of the session, and anything else requested throughout the webinar. I'm not seeing a lot of questions today. My goodness. Sometimes you throw something new out there and people don't know how to respond to it. Again, the goal of this is always to start the dialogue. We're not going to claim we have the absolute core of data-centric thinking all figured out. So what we're really doing is putting it out there and asking you to critique it and help us improve it. So we could play the Jeopardy music, right? Yeah, everyone's so quiet. This may be new, you know, and that's okay; we're perfectly okay with that. I'm sure I'll get a bunch of responses eventually, when people are in the shower tomorrow morning thinking about the data doctrine, but this is what we'd like to try: see if we can get you on board with this. Sounds good. We've never actually sprung anything new on them like this before; most of the time it's about education, and they want to know about data modeling and that sort of thing. All right, well, if you don't have any questions, do think about it and get back in touch with us. It's really easy. Oh, we do have a question: is there a process that can be followed? For data-centric thinking? That's a good question. We're not at the point of wanting to prescribe methods. If you have any seasoned years in this business, you'll remember that most CASE tools, computer-aided software engineering tools, enforced a method, and the method was not really one that most people liked to follow. So no, we are not looking at this from that perspective. What we're simply saying is that if you've got your software project done but you don't have a data program, you're going to have some trouble. If you've got stable code and you don't have stable data structures, you're going to have a problem.
If you've got software that's done but you don't have shared data, you are going to have a problem. And if you've got reusable code but you don't have reusable data, you're going to have a problem. Good question, though. Can you elaborate on the first deadly data sin, maybe provide some more examples? Oh, yes, let's see. What we tend to see, and it's very interesting, is that in the data community we often talk about people who get it and people who don't get it. It's kind of shorthand for saying people really understand the tenets of data. Now, those of you that know me know that one of my mentors is Clive Finkelstein, and Clive was very, very good at explaining to executives why they should invest differently in their IT. He didn't call it data-centric thinking; the word for it in those days was information engineering. But I think everything we're doing here honors his entire approach to information engineering, which was the idea that you could engineer your information the same way that you engineer your software. From a perspective of data-centrism, what we're really saying is that organizations need to be smarter about using their sole non-depletable, non-degrading, durable strategic asset. If you don't use your data assets in that fashion, and treat them with care and feeding and professionalism, you will not be able to succeed in today's data-driven business environment. Now, for a specific example: what I showed here was "what is the system," and the point of showing it was simply that not all parts of the system are equal. And when not all parts of the system are equal, you have to pay more attention to some of them than the rest, and that is a challenge for the organization. Now, examples of data-centric thinking?
Well, again, if you can say there will never be any less data than there is right now, it follows that we should approach this more professionally, knowing that the data volumes are going to be significantly larger than the software that we're dealing with. So there's a fair bit of complexity in all of this, and coming up with specific individual examples is going to be kind of hard. I'm not sure that's a good example for everybody, but we can give it a shot, and maybe some of you can come up with something like that as well in terms of examples. Again, that's the dialogue that we're actually hoping to start with this. Okay, sounds good. We have another question: considering the application can administer the data via the administrators, do you need an MDM or data governance program if you centralize the data in a central location? So really what we're talking about there is functionality versus technology. What you need is an organizational approach to managing assets. And in the context of managing your assets, there are a number of different ways of doing this well. An MDM may be a good approach, but a Tesla is a great car, and it's not the one I would hand to a 16-year-old who doesn't know how to drive very well. So it's really a matter of saying: where is the organization on its journey to IT maturity, to data maturity, and then matching a solution that makes perfectly good sense. I'll elaborate just a touch more on that. By the way, Master Data Management, which is a very good technology and it works, has about the same failure rate as most of the IT stuff does, and that's not very good. The reason for that is that it's very good software, but organizations aren't experienced at using it and implementing it. So you can have it implemented, and it works.
It's great stuff. But if the people in the organization don't understand the purpose of it, then you have a very big problem, a very big disconnect, all the way around. Okay. Well, it sounds like data-centric thinking and data ownership are of interest to more than just one person, so that sounds like it's worth another webinar. We can absolutely dive into that. Okay. And there are some more questions. Is there a metric of measurement for data program maturity? For maturity? Absolutely. Great question. The folks at Carnegie Mellon's CMMI Institute have developed something that we use, and it's actually on the cover of the book here. Oh, you can't really see the graphic, but it is on the cover of the book; it's one of the iconic images that we use. And the neat part, too, is that the capability maturity model was, by the way, something that I funded out of the Defense Department early in my career, to help everybody come up with these measures. If you just Google CMM, CMMI, and DMM, you will see it on the web. I think we actually have a webinar scheduled on that for next year; Melanie Mecca and I are going to do that together. But let's just go to the specifics. It's a one-to-five-point scale, and we can measure objectively where your organization is on the journey, so that a relatively immature organization can be discouraged from investing in relatively sophisticated technologies. And again, just like a teenager learning how to drive, it's probably good to give them an old beater pickup truck, or where I started out, a Ford Pinto or something like that, to learn how to drive, and not the big fancy sports car. Better results are expected given that particular match. So yes, there is a measurement framework, and I actually have articles that have been published on that. You can Google them.
Search for "data management maturity" and you'll see them out there, talking about the state of the industry. If you've got specific questions about that, let me know and I'll certainly point you to the right place. Okay, sounds good. What is the best way to make use of re-engineering the data model of source systems to gain insight into how to achieve higher levels of shared data? Wow, I'm going to guess that's one of my students, because there are very few people that ever ask me a question like that. Super. Anita, let's make sure that we understand the question for everybody else. You're in a situation where you are replacing or enhancing an existing set of data. That data has a data structure, and that data structure is part of the data architecture of the organization. If you are going to change or enhance that data, it makes sense to understand it well enough to know that some things about the existing data structure are good and some things about it are maybe not so good. And it makes even more sense, once you understand the good things, to preserve them in your next data design and to eliminate the bad things. Failure to do so will keep you replacing the exact same software with the exact same problems over and over and over again. Let me give an example here, one that I actually talked about in the webinar. The right-hand side of this diagram shows a data model that was deemed insufficient for use in the Defense Department, because it meant that we would have to manually process 30% of all of the taxes for people every year. So we needed the data model on the left in order to automatically account for the 30% of the employees in the Defense Department who needed to have their multiple jobs accounted for by the system. Now, the question was specifically asking about software going forward.
It is now considered to be a best practice, when you are purchasing new software, to ask the vendor to provide you with a logical data model of the system. If the vendor can't produce the logical data model for that software, then you may have a problem with the vendor. We found a very high correlation between the inability to produce a stable data model and very unstable software. So our principle of stable data structures preceding stable code has actually been proven out many, many times. The other part of it is that when you sit down with the models and look at this from a requirements perspective, at least for this organization, the U.S. Department of Defense, the model on the left provided better requirements than the model on the right, and so that software should be scored higher when deciding whether to use it. The alternative is that we might modify the vendor's model to work like the model on the left, but that may not be the best thing to do, given how the vendors are and how accessible the software is. Something easily customizable, like PeopleSoft or SAP, no problem. But you'd be amazed at the number of times we see PeopleSoft or SAP implemented without even these changes, because the organization didn't know to ask the questions. Again, it's a case of not knowing what they don't know. What a great question; I wish I knew who asked that one. Well, you'll know later. Where do you start data integration when you have silos of data sitting everywhere? For example, is it doing an inventory of what you currently have? How do you manage when you have identified different systems not reusing the data? Would you have to redesign and recode those systems? Super question. So, first of all, there is the possibility that you might, in fact, commit a deadly sin there by not understanding data-centric thinking.
One of the other tenets of data-centric thinking, although we haven't stated it here, and it does come up in the writings, is that 80% of your organizational data is ROT. By the way, ROT is an acronym for data that is redundant, obsolete, or trivial. Given that, it's very clear that you should not try to do everything with all of your data. What you need to find out is what is going to deliver business value and how you can unlock that business value for your organization. So it's not just a matter of inventorying your data and then trying to improve and integrate all of it, but rather finding out which data is redundant; and with that redundant data, what you need to do is eliminate it. By the way, my wife, just to be perfectly clear and make sure she gets full credit for this, points out that it's actually not ROT; but if I tell everybody that their data is RIOT, they think I'm laughing at them, so we'll keep it at ROT. Okay, we're wondering if there's any similarity to what you're talking about: is there similarity or difference from the single DBMS for the company that was out in the 1970s? That's a great question. That sounds like Dave Eddy up in Boston. Hi, Dave. The idea was, and Dave and I were both around in those days, that we thought we could take one database and make that one database hold all of the organization's data. That proved to be a very nice pie-in-the-sky idea, but it was absolutely unattainable. The reality is that organizations are very complex, and IT is very complex, and it is completely unrealistic to expect this. The only company that I've ever seen have a single ERP on a global scale was Nokia, which in the late 1990s had a single-instance ERP, a beautifully engineered ERP that did all kinds of things. And the only reason they got that was because they worked for about 10 years to reach that particular state. But that ERP did not run all of their systems; it really ran most of their global finance pieces.
So by narrowing the scope, they were able to achieve greater application of the data. The key is to understand what data you need to manage locally and what data you need to manage globally, and then to implement, through the re-engineering process that we described earlier, a way of making your data more effective and efficient. Remember, that's only the bottom half of that innovation chart; it doesn't cover anything on the innovation side. That's the operational effectiveness and efficiency piece. So we're actually hitting a couple of deadly sins with each of those. Great. Do you have any experiences you can share about helping government make these sorts of shifts in thinking, contrasted with the private sector, of course? Are you there? It looks like Peter lost his sound. Let's just give him a moment to get back on the line. Not sure he knows. Okay, let's see if I can get him to notice. Okay, we'll wait one more moment, and if Peter doesn't come back on, then we'll wrap it up, and we can give him the additional questions to be sent in the follow-up email. Let's just give him one more moment. Okay, we can see he's dialing back in, so if you want to hang on. Sorry for the problem with Peter's audio. If you want to wait a few moments, we'll see if he can dial back in and answer a few more questions. Now we see Peter; he's trying. Well, it looks like he's having some challenges getting on, so I'm thinking we should probably finish up the event. Again, to answer the most common question we get: we will be sending a follow-up email within two business days with links to the slides and to the recording of the session, and also anything else requested through the webinar, including answers to the questions you've asked that Peter didn't get a chance to get to. So thank you, Peter, for this great presentation and Q&A, and thank you all for attending and being so interactive. I hope you have a great day.