 Hello and welcome. My name is Monica Marcinko. I'm with the Digital Office at DATAVERSITY, and I'm stepping in for Shannon Kemp, our Chief Digital Manager. We would like to thank you for joining this DATAVERSITY webinar, Exorcising the Seven Deadly Data Sins. It is the latest installment in our monthly series, DataEd Online, with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DataEd. If you would like to chat with us or with each other, we certainly encourage you to do so. Just click the chat icon in the bottom middle of your screen. To continue the conversation and networking after the webinar, just go to community.dataversity.net. To answer the most commonly asked questions: as always, we will send a follow-up email to all registrants within two business days containing links to the slides, and yes, we are recording, and likewise, we will send a link to the recording. Now let me introduce you to our speaker for today, Dr. Peter Aiken. Peter Aiken is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also the founding director of Data Blueprint. He has written dozens of articles and 11 books. The most recent is Your Data Strategy. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise. 
Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, let me turn everything over to Peter to get today's webinar started. Hello and welcome. Thank you, Monica. I get to play with everybody on this wonderful day in December here. Unfortunately, not at DGVision. I was there yesterday in D.C., where DATAVERSITY is holding a wonderful event, and if you're missing out on that, my understanding is that Mary Boy gave a wonderful talk. She's the Chief Data Officer for the FBI. It's great to see even the FBI coming into play and understanding that data education and data engineering and all sorts of things data are important. So what we're going to talk about today are some prerequisites that you may not have been able to figure out, drawing on the data strategy background that Monica mentioned there as well. A lesson that I've learned over the years, and probably some of you have learned as well, is that if you have bad data plus anything awesome, it's still an ultra-poor result. In fact, people in technology today are thoroughly excited about the really wonderful potential advances in artificial intelligence, but the biggest problem that we have at the moment is that there is an absolute lack of data to train the algorithms, and that presents a vital challenge. We call it garbage in, garbage out. There's garbage on one side of it, and you've got an analytical model, or a warehouse, or some sort of machine learning algorithm, business intelligence, blockchain, AI as I mentioned, data governance, analytics; this just goes on and on and on. All of these things, of course, depend on data. And if you have garbage data going into the process, you're not going to get good results. So only when we change the more fundamental pieces, again, it ends with various processes. 
If you have one or ten, it's also going to present challenges over in data analysis. Peter, your audio is very distorted, and we're having a very difficult time hearing you. Yeah, I think that was Charlie's one thing that just grew. Unusual that it would be back and forth, but hold tight. We're perhaps ready. That might be a little bit better, but let's keep going and see if that works better. All right, but let me know if it's not working. No. No. It cut out and cut back in again. I was trying one more thing here. We apologize for the audio difficulties, everyone. We'll see if we can get this fixed in a minute. Very quiet. Not quite so garbled, but quiet. So we may have lost Peter, and I apologize for that. Stephanie Byrd has a very good question out there: can you think of examples in your experience where you've experienced GIGO? Christy has a very interesting response. When customers enter their own data, like in a frequent flyer program, it can be interesting. I can see that. Thank you, Alexander. Getting structured data is so much better. Yes, I can see how that would be. When you have an organization that has no QA/QC protocols, you have to judge GIGO. That's interesting. We seem to have lost Peter's audio completely. So Belash has a question: I would be interested to know of any organization on this earth free from GIGO. All right, can you guys hear me now? So much better, Peter. All right, we're reverting back to the old system. Sorry, everybody, it took a minute to switch things around. But we are good to go. Now let's get back into it and jump in. I enjoyed you reading the questions. So we do read the chat. We do pay attention. All right, here we go. So the first of the seven deadly sins is that 95% of all data problems are cultural, as in process and people problems. 
One of the first things that people notice when they get a CDO or an increased emphasis on data is that a tension starts to occur, because the CIO is typically a person who, well, they have the title chief information officer, but they've been doing lots and lots of other things in addition to trying to deal with data. And data simply needs more attention. In particular, it needs a singular focus. We'll come back to that in just a bit. But just to give you a sense of how that plays out, when my book for the chief data officer was translated into Chinese, the title was translated as chief data officer combat, which I thought was kind of interesting. I did not encourage them to use that, but that's how it actually translated. So it is a challenge to make sure that we've got enough people on it. And there is a class of professionals, called change management and leadership professionals, that can help us with this. If we ignore the talents of these wonderful individuals, then we put our data programs at risk, regardless of what the goal of the data program is. And this is true, of course, across private sector as well as public sector organizations, when you're trying to figure this out and make change happen. We're seeing your presenter's view with two slides up. I'll switch it. There you go. Thank you. We do rehearse these things, guys. Everything was all set up when we started. Great catch, Monica. All right. So the key to this is that making change happen in an organization requires a number of things to line up. In fact, according to Mary Lippitt, whose research we've worked with for a number of years, there are five things that really need to occur here. If I see an organization that's got confusion, but they have the necessary skills, incentives, resources, and an action plan, then I know they're missing the vision. 
If I see that they've got the vision, the incentives, the resources, and the action plan, but they're missing the skills, they're anxious about that process. If we don't have incentives, we will end up with gradual change. If we don't have resources, we'll have frustration. If we don't have a correct plan, then we will have false starts. So only when all five of these things line up, just like keys and a lock, although I guess we're going to have to explain that to the young people because locks don't work that way anymore, does change happen. It's not four out of five. It's five out of five. And as I said before, culture is the biggest impediment to a shift in organizational thinking about data. We're not going to dwell on this particular aspect of it, but there is a free case study that you all can download. You will get this link embedded in the set of documentation that Monica will send out at the end of this. And if you're interested, you can just go and download it and read a little bit more about it. So that's our first deadly sin. The second one is not paying attention to sequences. Now, I know that sounds sort of vague, but let's take a look first of all at the DMBOK, which is our data management body of knowledge. We named it after the PMBOK because the Project Management Institute's body of knowledge is well known within the technology industry, and we do get a lot of cross-pollination between the two. In this particular instance here, this is our second version of this, so we've revised it a little bit, but we actually took some of the detail out of the initial presentation. I kind of like the first version better. But the idea here is that these are foundational disciplines in most instances, but even our DMBOK doesn't talk about the sequence in which these things should occur. They really do leave that up to you all as the experts. However, I think this is an area that we could provide some additional guidance around. 
Let me just give you an example. I live in Montpelier, Virginia, which is about 100 miles south of Washington, D.C., and I'm what's called a horse husband. You may say, why do I care? Well, I'm telling you a story about the barn that I built for us. I say built. I didn't build it. I borrowed money from the bank and had some very competent people do it, because I am not competent enough to build a foundation, much less a barn. Now, why do I have these pictures of a barn and foundation and why am I showing them to you? Well, first of all, this barn foundation was a required part of the loan that I got from the bank. They said, we'll give you exactly this much money, and before any further construction can proceed, you will stop and have this foundation inspected by the county inspector. The county inspector came out, checked my foundation and said, yes, your foundation passes inspection. You have sufficient foundational material in place. The foundation is solid. And why does the bank do this? Well, the bank does this in particular because they know that, at least with horse people, if I build a good barn on a poor foundation and I have problems with my animals, I'm going to spend the money on the animals before I pay back the loan. So this is just good business for the banks. However, there is no IT equivalent of this. We need to make sure that our data foundations are in fact solid before we try to build almost anything in IT. And these concepts are foreign to most of IT. It's in fact so strange that I like to go back and talk specifically about Maslow's hierarchy of needs. Many of you remember this from high school. If we have physiological needs that are not met, if we are short of food, clothing and shelter, then we will never be safe. And if we are not safe, then we will never be part of something larger than ourselves. And if we're never part of something larger than ourselves, it's kind of hard to develop our sense of self. 
And finally, of course, we'd like to be where Maslow and most of our bosses would like us to be, at the place where we are really creatively problem-solving, doing what's called self-actualizing. Another word for this is flow, which has come into the literature recently. They don't seem to have ever heard of Abraham Maslow, but it is all grounded in this particular pyramid. Now, this pyramid relates to data because most of what gets talked about in the press and in technology is what we call silver bullets. If I were going to redo this slide, and I haven't bothered to, I would add blockchain to this particular piece in there. But it's not necessary. These are simply technologies. And we know that for anything to work in any system requires people, process and technology. But these technologies really represent just the tip of the iceberg. And the iceberg in general is much, much larger than most people understand. The foundational practices that we have here are absolutely critical to making sure that this works. So these are our five data management practices that we have to put in place. They represent capabilities of the organization, which are a little bit harder to acquire. They really do require practice in order for us to actually do something with this. So the sequencing here is to make sure that you have capabilities, because only when you have capabilities can you properly evaluate technological adjuncts to this process. And these technologies can then be very, very useful once you have the capability to incorporate them into your project. However, in most instances, people buy the technology first. And our advice is to buy the technology last. People are always saying to us at Data Blueprint, well, I hear you, Peter, but I still need to have it done by Friday. Can you do it faster? And I say yes, I can do it faster, but if I do it faster, it will take longer. It will cost more. 
It will deliver less and it will present greater risk to the organization. Another aspect of sequencing has to do with where we're going to focus our data efforts, regardless of what they are. Now, most organizations don't have a very formalized process, so they sit in quadrant one. But there are only two ways of getting things better with data. One of them is to improve existing operations. The second is to innovate. In other words, do something new better or do something new. I'm sorry, I said that absolutely wrong. Do something better or do something new. Doing something new better is even harder to do. And we don't recommend that you start with nothing, but let's go and just look at this quadrant here of improving operations. A really wonderful company, Walmart, who I've worked with over the years, does great things there, and they are fantastic in terms of what they do around logistics. They know how to improve their operations and they are very, very good at it. On the other hand, another company I've worked with over the years, Apple, has got some ideas around innovation, and there's no question that over the past 15 years they have introduced some very innovative products. Now, I just want you to imagine for a minute. Even if you've never seen an Apple commercial, most everybody knows a fellow named Jonathan Ive, who was up until this year the chief design architect for the company. He's the erudite British guy that comes on and talks about how wonderful the iPhone is and the engineering that went into it and the fact that they can cut it out of a single sheet of aluminium, as he says on there. And I want you to imagine telling Jony Ive that he's got to be cheap, as in working in quadrant two down there. His head would explode, just the same way as if you took the Walmart people, who do great work at squeezing the last dime out of all of their operations, and told them to be innovative. It doesn't work. 
So sequencing here says, first of all, let's not try to do everything at once. In fact, most organizations don't even have a plan for this. Instead, let's use the improve-operations work to achieve some very tangible dollar savings and use those dollar savings to fund the innovation pieces. I know that sounds elementary, but it is absolutely crucial, because we see so many companies that simply try to do too much at once and don't get very good at any of it. Another aspect of sequencing here has to do with trying to get your data better for your organization. And there's really a three-step process here. First, you need to improve your organization's data. Most of us on the call are not happy with the state of our data. In fact, the session I led yesterday at the DATAVERSITY event was called Rekindling Your Data Governance. And I said, is anybody in the room here because they think their data programs are doing great? And of course, they said, no, that's why we're here, we want to learn some more. Well, your data points to where valuable things are. Your data has intrinsic value of itself and it has wonderful combinatorial value as well. But unfortunately, most people are not taught how to use data. So we also need to improve the ways that people use your data, because you use data to manage, to measure, and to motivate change all the way around. And only when you have better data and your people's skills are improved can you actually go to the next step, which is improving the way your data and your people support the organization. Trying to achieve that elusive competitive advantage, and a sustained competitive advantage, is a really good place to look. So let me tell you a quick little story about a company that you've heard of called Rolls-Royce. They used to have a model where they sold really good engines. And those engines lasted. People generally didn't worry about their engines having trouble. 
In fact, in the United States we've had exactly one passenger death, which occurred unfortunately this year when an engine blew up. But it's a rare instance that this happens. We ended up in Philadelphia with one fatality, the first fatality that we've had in a U.S. airline incident in 10 years. And that is an impressive, impressive safety record. The issue here, however, was that Rolls-Royce couldn't have a conversation with the airlines, because Rolls-Royce sold things to them and was treated as a vendor. And Rolls-Royce wanted to have a different set of conversations. So they developed a new business model: instead of selling things to people as a product company, they became a services company. They now sell hours of powered thrust to the airlines. And they have a catchy name for it. It's called Power by the Hour. And now with this, they can start to have some additional conversations. And in this case, we'll tell about a conversation with the racing world. This is really just racing chatter, so don't worry if you can't hear it very well. The guy's actually changing the tires on an Indianapolis 500 race car by hitting it with a hammer, which is kind of an interesting thing. So they changed the tires. I cut the time a little bit here. It would have been really boring to watch the whole thing. But here are our measures. Two tires changed, 67 seconds. Now let's look at how they do it today. Blink or you'll miss it. The racing car drivers, as well as the airlines, realized they shared a common problem. There is no payment for downtime. If that jet engine isn't sitting on a plane making money for whatever airline has purchased or leased that engine, it's not making money for Rolls-Royce. Because their interests are aligned and they have now transformed into a services company, they can now address the problem of no payment for downtime. 
With data, they've now figured out a new process that works somewhat like what you saw in the racing example. It's called wing-to-wing, and it moves things faster, which means that when we get stuck at an airport somewhere, at least we know that Rolls-Royce is trying to help them change engines on the plane, if that's what needs to happen, so that we can get on our way with all the busy holiday travel that probably everybody has. Now here's the real question for everybody. When was this invented? And the answer is 1962. That's astounding. In 2012, they had been practicing this for 50 years. That's the kind of investment that we need to have in our organizations. And that's why you can't simply start at the top with wonderful analytics. Instead, there are some basics that you need to go through. Our third deadly sin, then, is failing to meet expectations. Excuse me, manage expectations. Now let's take a look at the way most organizations implement their data programs. They have a business need, and that business need then drives some tentative solutions that we'd like to have. And I think this is wrong and I don't like it, because it leaves out a really important characteristic. And that is the current maturity of the existing organization. Only when I have a match between the existing capabilities and the business needs should I then look at a strategic data imperative. And from there, we can develop a roadmap which will allow us to start to implement some business benefits on the side. However, we also have to keep in mind that we can't just deliver business value, because, as I said earlier, we have to start practicing this stuff. If you deliver too much business value, you will deemphasize the capabilities that you need to have in your organization in order to do data better. 
On the other hand, if you do all capabilities, then management thinks what you're doing is a science project. Only by balancing between the two of these can you actually deliver a sustained effort at delivering business value. Let me give you a very specific example. I worked with a group at one point that had five data managers, and we'll just pretend they each got paid $100,000 a year for math purposes. Should they feel obligated to demonstrate $500,000 in benefits annually? My answer is I think that they should, because what tends to happen here is that people say, well, I'll spend $500,000 for the first year, but the second year I'll come up with a million dollars in savings. Remember, if you go to the third year, it's a million and a half dollars in savings. And it's not hard for a five-person data group to demonstrate that they are saving the company hundreds of thousands of dollars a year. Because you know what management thinks about this. They ask the question all the time: are you done yet? And if you're not done, then there's going to be an issue. One more quick example in terms of expectations is that we've done a lot with the various states, where we've started to take state data, and also some of the federal data that we're using, and bring it into the classroom so that we can have students work with these projects. Students have an expectation that when they get out of school, all of the data that you're going to give them to work on is going to be perfect. It really throws them when they find out that 80% of their time is spent fixing the data, improving it, migrating it, moving it around, doing something to make sure that it works. Munging is the word that we use to describe all of these bits. So expectation-wise we've got a lot of work to do, and that's where we really are compared with where we'd like to be in terms of data. 
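The arithmetic in the five-data-manager example above can be sketched in a few lines of Python. This is only an illustration under the speaker's stated assumption of five people at a round $100,000 each per year; the constant and function names are hypothetical and not from the talk.

```python
# Assumed round numbers from the spoken example, not real salary data.
TEAM_SIZE = 5
COST_PER_PERSON = 100_000  # dollars per year


def cumulative_cost(years: int) -> int:
    """Total spend on the data team after `years` years.

    This is also the minimum cumulative savings the team would need
    to demonstrate by that point just to break even.
    """
    return TEAM_SIZE * COST_PER_PERSON * years


if __name__ == "__main__":
    for year in range(1, 4):
        print(f"Year {year}: cumulative cost ${cumulative_cost(year):,}")
```

Under these assumptions, deferring benefits to a later year means the amount to be justified keeps growing by $500,000 per year, which is the point of the "are you done yet?" question.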
Our fourth deadly sin is a very fundamental one, and it has to do with understanding the difference between IT and data. We work right now in a largely IT project-centric or application-centric development world, and there's a good reason for that. Our immature understanding of this led us to the following model. Organizations have a strategy, they need some IT, they apply IT projects to it, and then they'll have some good results. Data is sometimes considered up front, but mostly it's an afterthought, and what that ensures is that the approach will be application-centric. You'll be doing things that SAP wants you to do as opposed to the things that you need to do. I'm not picking on SAP, it is fine, fine software, but you do need to have a good integration between the two of those. It also means that whatever process architecture you describe and support will be narrowly formed around the applications, with very, very little data reuse possible. In fact, the statistic that we are absolutely certain of is that 80% of your data is redundant, obsolete, or trivial. That's 80% of your data, and the only argument I ever get from people is that they say maybe in our case it's 85%. So what should this look like? It's actually the complete opposite. Strategy comes first, but the next thing that you should be specifying as an organization is what data and information needs you have in order to achieve your strategic objectives, and then and only then should you supply IT with resources so that they can start to implement. Again, it's exactly the opposite, which is exactly why it's so hard, because we're telling people that literally it's all been incorrect, and it has been. So moving to this approach means that our data assets can be developed from an organization-wide perspective, and that our systems will support the data needs and complement the existing organizational process flows, giving us the opportunity for maximum information and data reuse. 
When we look at these concepts in terms of the relationship between data strategy and data governance, the reason I was running that session yesterday at the DATAVERSITY event is that many data governance efforts are not doing well. People are sitting around in committees and they're having lots of discussions, but nothing is actually happening, and that's important. The element that's missing in most cases is a data strategy. The data strategy supports the idea that governance is there for a purpose. Now I'm going to explicitly exclude those of you that are in a regulatory environment. That's mandatory and you need to do it. By the way, if you're in a regulatory environment, do you think it's going to get less regulated or more regulated in the future? Of course the answer is more regulated, so it might be a good idea, if you're in a regulatory environment, to get good at complying with regulations. I know that sounds like a really strange thing to say, but I've got a couple of organizations that are working on it and becoming quite adept at it. That way, when the world does change, you can be done with it at much lower cost. Let's get back out of the regulatory world, though. So we've got a data strategy that says what the data assets need to do to support strategy, and that's what governance is. Governance is about focusing those assets to make sure that they are aligned with the data strategy, and the feedback from the governance group is how well that is working. Now none of this, of course, occurs in a vacuum. The only reason for having any data strategy, and for doing anything with data at all, is to support the organizational mission. This organizational mission is the overriding piece that gives us context for all of these things. In Peter's world, the data governance groups are actually deciding whether or not IT projects are ready to go. 
I can tell you that if you do that, you will start saving money in IT, because 20 to 40 percent of your IT dollars are spent as a result of this bad sequencing and the unrealistic expectations around it. I'll put in a couple of feedback loops here, and again, I wouldn't show this chart to anybody outside of this particular call. Let's simplify the chart just a little bit, like this. Here we go. We've got the data strategy, and the point here is that the data strategy should be expressed in very specific, tangible business goals. You may have heard the idea of SMART goals, specific, measurable, et cetera, et cetera. And the language of data governance is metadata. So these expectations are absolutely critical in order to manage this, and the IT programs have to be better aligned, with everybody understanding this. Data is largely a program that evolves over time. I have worked, and I'm old, I figured that out already, and I've gone in and out of companies 10 or 15 times over the past 30 years, but I can tell you that virtually all of them are dealing with exactly the same data that they were when we started our journey together many, many years ago. My favorite example to point this out is the first project that I had when I was a young whippersnapper. I went to work for the federal government, for the Department of Defense. I was in something called the Defense Information Systems Agency, and the upshot of a lot of our work was that we produced something called the DoD Enterprise Data Model. It actually became the foundation for the Federal Enterprise Data Model that is still in use today, and the DoD Data Model is in use today as well. I was out at MacDill Air Force Base in St. Louis, and as I said MacDill, that's not right, TRANSCOM, anyway, St. Louis. MacDill is the general, gosh, he would be insulted if he heard that. Anyway, they are still using parts of that model 25 years after we created it in the first place, because data persists. 
Data stays with things. IT projects are going to change, as they should, on a regular basis. If we were still trying to attack problems with a mainframe computer that can be solved today with a mobile device, we wouldn't be doing the things in society that we are attempting to do today. Okay, that's what we're talking about. IT projects and data programs are fundamentally different. IT is a project-oriented discipline; data management is a program. One more thing on this too. I don't have a slide for it, but what I am trying to encourage people to think is that they will need their data program as long as the organization needs its HR program. After all, nobody says, hey, I think we've done enough HR this week, or this year, or this decade. We don't need HR anymore, we're going to have no more problems, we don't need lawyers, we don't need any sort of guidance, we don't need any sort of professionals to help us manage our workforce. That gets me to a nice story, by the way. HR is going to be there, and your data program needs to be there to the same extent. You will no longer need a data program when you no longer need an HR program. Again, nobody looks at HR as a project and says, are we done with HR, can we dispense with it? And yet that is the correct mentality for what's happening with data. Now that got me to a story, and I do like to tell this one. One of the things that was a lot of fun was that I went into a very big organization that had hundreds of thousands of employees and literally tens of thousands of HR managers to manage those hundreds of thousands of employees. I said, okay, so you've got an infrastructure here that supports the human resources that you have in your organization, and that is obviously crucial to making the organization function, to feeding the beast. You've got tens of thousands of HR professionals managing your human resources, and you've got three guys over here in the corner managing your data. It's 
probably not the right number. And they really got that. They really did get that. So our third piece, and I've said this before, is not implementing these programmatic means of implementing the change. So not only can you not call data a project, but you need to make sure that it has programmatic support. We call data a durable asset, and that's an asset type that has a usable life of more than one year, that you expect to invest in and get more out of. A car and a washing machine would also be considered durable assets, although those may be different for your various business types. It's reasonable to look at project deliverables in 90-day increments, or if we're in an agile world, it's absolutely reasonable to look at two-week sprints. They deliver higher quality software faster, and that's an amazing thing. But data evolution is measured in years, and we need to look at it as a process that evolves over time. Generally, data is significantly more stable, and these ready-made data architectural components are a prerequisite to agile development. If we don't do this, then we will end up with more data silos. Now, the good news is that that is good for everybody on the call, because we will never solve the problem and you'll always have a job for life. But it's not the best way to run the railroad. So it is important to consider data programs as a very, very significant piece of what needs to happen, and one which is underfunded in most organizations. Take another aspect of this too. I've got a little bit of subliminal advertising here. When we teach students how to do a project implementation, we teach them that requirements, design, implementation, verification and maintenance are the various pieces that we do in order to make an IT project successful. The requirements are about what, the design is about how, then we build the how, and we make sure it's correct after it's running. Now remember that there's a line above maintenance, and that line extends all the way across the 
diagram, because we spend only 20% of our IT dollars on building new things and 80% on maintaining existing things, and that's a huge problem. We tell students that you can develop and implement your software and your data using this particular life cycle. Once again, I'm sorry, but that is false, and the reason it's false is that the approach I've just described can only work when no sharing of data occurs. Now, that's actually very appropriate for student projects, so that's why we teach it that way, but we don't tell them that it's different in the real world. So data management and software development must be separated and sequenced as well. Let's take a look at how that would work. Say project one, which I just described, has to exchange data with project two. Well, project one has its budget and project two has its budget, and project one's budget manager is not going to give project two any money to share data, and project two is not going to give any money to project one to share data. So we need to have a third project created in order to make sure that we can share data between the two of them. Of course, if you understand anything about mathematics, you'll understand the combinatorial complexity of something like this, and hopefully come to the conclusion that shared data structures absolutely require programmatic development and evaluation. Without that, you will never be able to succeed the way you'd like to. I've mentioned this already: the difference between project and program is pretty straightforward, but your data program must last at least as long as your HR program. Failure to do that causes all kinds of complexity. Number two: data leadership. Now, data as a subject is very complex, detailed, taught inconsistently, and, quite frankly, outside of us on the call here, very poorly understood. I'm going to go a little bit further than this: we have been teaching students wrong for at least 30 years on this. So not only do we have to change the way we
teach these subjects, but we have to go back and undo a bunch of things that we put in their heads originally. Just take a very simple example. A knowledge worker, in my definition, is somebody who works with data; I don't think you can be a knowledge worker without working with data. So what do we teach them about data? Absolutely nothing. Yet 100% of them deal with it daily. If you have any questions about this in your organization, I invite you to perform this very simple test. Walk into a room and ask everybody how many of them know Excel. Most of them will raise their hands. Then ask them, "Keep your hand up if you learned, at the same time as you learned Excel, that there is a capability in Microsoft Office to repeat things in a spreadsheet perfectly, 100% of the time." I'm talking, of course, about the macro facility. You will see most of those hands go down; nine-tenths of the hands will fall, because we don't teach people that there is a capability that will automatically do things in our spreadsheets and that would be very helpful in improving the way we manage our data. Your knowledge workers know nothing about data, which means they're learning it on their own and figuring out their own best practices, and what a colossal waste of time and energy. A little bit of training on data management will, for many organizations, go a tremendous way towards improving productivity. But it's even worse for IT people, because in IT we've tended to teach one course on data, and the subject of that course is how to build a new database. Now, if there's a skill we do not need any more of on planet Earth, it is how to build new databases. What's more, it has actually created the mess that we have here now. There are two parts to this. First of all, if you're a student who goes through courses and pays for college and university (and I'm a university professor, right), and you pay us at the university to teach you something about technology, and
we teach you, and the only thing we teach you about data is how to build a new database, then of course you come to think that's the skill you should have and that's what data is. But it's even more important to understand the impact that this has had on our management, because our management has gone through exactly the same programs. When they also get the impression that data is a technical skill that is only needed when developing new databases, those individuals who have data skills get pushed further and further down the organization. If you're at all interested in this, I have a wonderful paper where a colleague, Mark Gillenson, and I put together a 25-year retrospective on some of these things, and we can demonstrate that the data people have literally lost their place at the table, because everybody thinks, "Data is a technical skill, and I only need it when I'm developing new databases. So if I'm implementing a software package, I'm not implementing a new database, and I don't need to consult the data people. I don't need a new database with that." Even though we know they do, it's not apparent to them, and so they don't consult the data people. Or, "Even if I'm merging two databases, I'm not creating a new one, and I don't need data people." And yes, I have seen organizations do all of these types of projects that way. Also, of course, if you think about it, if we've taught you that the only tool you know how to use is a hammer, then every problem tends to look like a nail. So we're going back and hitting Maslow yet one more time. We're talking about data leadership because, I'm telling you, there are not many people out there who have the necessary knowledge and skills to do what we're attempting to do with our organizations. Yes, there are some very talented people, but they are not widely known. I did a search on LinkedIn just a couple of weeks ago to see what's happening there, and even the term chief data officer is underrepresented. There are more chief digital
officers than there are chief data officers out there, so I'm not sure that is exactly the term we should have. And I want you to imagine the leadership group that you have in your organization. Are they in fact qualified to make a decision about whom to hire, given that you want to have some data leadership in your organization? Where are we going to find these individuals? It is a very, very big challenge. Now, that individual, whatever that individual is called, should sit right there in the top data job. That's my preferred term for it. If you don't like top data job, you might want to try enterprise data executive; then you don't have to be a chief. The problem with calling them a chief is that as soon as you introduce the possibility of another chief to a bunch of existing chiefs, the first question they ask is, "Do we really need another chief?" And then you're not having the conversation you'd like to have. Regardless of what that individual is called, they should be responsible for running the data governance organization and interfacing with the IT people. Their first job as a data leader is to develop the first version of an organizational data strategy. I say the first version because it's very critical to number your versions: if you put out data strategy number one and execute it, then people will expect data strategy number two. But I find most of the time people say, "Here's our data strategy." So what happens when you achieve your strategic objective? "Oh, I guess the strategy is no good anymore." No, it's not no good; it's in fact very good. We just need to move on to the next one. So keep the expectation in there that this is going to be an evolving process. Secondly, reduce the data ROT in your organization. I've already mentioned that: redundant, obsolete, or trivial data. And I challenge anybody on this call to tell me they've got less than 80% of their data being ROT. Finally, as a data leader, it is very critical that you do what we call monetizing your
organization's data. And I don't mean the way Ford is trying to do this. Ford has got a plan that in a couple of years you will get into your Ford vehicle and press the starter button, and when you press the starter button you will also be signing an end-user license agreement that says Ford owns the route that you took, the temperature of the cabin, and the conversations that were in the cabin. These are all very, very interesting things, and we will see how society decides to respond to this. What I mean instead by monetizing is that you've got to make the data thing that's happening relevant to some organizational thing that's happening in the company, so that management will pay attention to it. Right now they think data things just happen, and they don't understand that those data things are connected to organizational things. You've got to establish that connection. We are not good at it as a society or as a profession, and it's something we've got to get better at. So I hope you understand what I mean by the monetizing part there. Now, I want you to imagine a chief data officer coming into an organization and having to make all the changes that I've given you so far. Yeah, the first one might not make it. So we tend to say that maybe there are different skills for an organization that is transforming its data leadership, and that a change agent would be a good person to have in there, but for a limited term, and then we can move on to a different one. All right, well, we're down to 10 minutes left, and it's just time to understand data-centric thinking.
I've already mentioned this as well: 20 to 40 percent of the entire budget of your IT department is spent migrating data, converting data, and improving data, and much of that data is redundant, obsolete, or trivial. There are some obvious places for savings there, and I hope that you're able to find them, because it's the easiest way to keep doing what it is you're attempting to do. But many organizations will say, "We're going to become more data-centric in our thinking. We're going to adopt Peter's data doctrine." We'll see if that catches on or not. The idea here is, if you say you're being data-centric, what behaviors are you changing? If you don't change behaviors around this, you have not actually made any transformations that are useful. So I have a set of four changes that I'd like you to consider making that really do represent data-centric thinking in terms of IT development. This is what I call the data doctrine, and if the language here looks surprisingly familiar, it's because I ripped it off shamelessly, with full attribution, from the Agile Manifesto. Like the Agile Manifesto, the data doctrine says, "We are uncovering better ways of developing IT systems by doing it and by helping others do it. Through this work we have come to value," and then they list four things. Now, I list four different things, because the Agile Manifesto is geared towards developing higher-quality software faster, and it does so. I'm hoping we can achieve the same results here by saying that there are four things we should pay attention to. And notice I've spelled "programmes" the British way, just to make sure we understand there's a difference between a data program and a software program. We want the data programs to precede the software development, because if the data programs precede the software development, we won't end up with the oopsies that happen, and unfortunately they happen quite a bit of the time. Now, over time, what you will start to see is that the number of requests
increases, and the utility increases, and all this sort of thing increases. IT is pretty good at delivering IT products. They understand what it is they are attempting to do, and they do a pretty good job of it, because they have been practicing it for a while. We haven't been practicing data. And remember, data evolves over time. So with this evolution, we need to create the data, make it external, and have it precede the system development activities. Data and software management must be separated and sequenced if we are going to achieve what we are attempting to do. Second, we need to make sure that we have stable data structures before we have stable code and, most importantly, before we buy any software. Again, I'll give you an example on this. There is a situation here where I've got a business rule that says a person can be associated with one employee. That would be a very typical thing, but in this instance we might want to permit moonlighting. I worked on a system for the Defense Department where 70% of the workforce had a second job with the Defense Department, because we tend to underpay our warfighters. Similarly, here I can say a position can be associated with one individual. Well, then there is no job sharing; if we are going to do any job sharing, it is outside of the system. I am going to change the data model here just ever so slightly and make these not one-to-one but one-to-many. You can see that I have changed the wording in the business rules accordingly, making this more flexible. Now, the reason for doing this is the more flexible data structure, the one on the left there, the one that says you can have zero, one, or more employees for a person. So I can have a person who can be multiple employees, and that will work really well if I have a lot of moonlighting going on. Or I can have one employee who holds the position before lunch and a second person who has it after lunch, or maybe
somebody on Monday and Tuesday, somebody else on Wednesday and Thursday, and a different person on Friday. The less flexible structure will be able to do that only with great difficulty, by overriding the existing pieces. Now, the reason you need that data structure first is that the more flexible data structure requires two fewer structural loops. I will say it the other way around: the less flexible data structure on the right requires two structural loops that are not required to process the data in the structure on the left. These data structures must be specified before development or acquisition, or we have no hope of doing this; everything else will become an afterthought. Our third element of the data doctrine, of data-centric thinking, is that when we're trying to create new software, we're going to put software, excuse me, data elements, up in a repository somewhere. It's the little gray triangle up there in the upper left-hand corner of your screen; if you squint you can see it, but most people have no idea what we're talking about. We take an IT project and we do some requirements, and we actually ask for the reusable data components that are in there. The part where most organizations fail is that, at the end of the IT project, they fail to load and extend the amount of shared data that they have in their existing library of data. This is why data governance efforts have you focus specifically on a glossary, etc., very much at the start; it works very well. Over time these requests increase, just as they do on the other slide, data contribution increases and is recognized, and these shared data structures cannot exist without programmatic development and evaluation. It doesn't work otherwise; they operate at a different cadence, a different rhythm. Last one for the data doctrine: data reuse has to precede reusable code. We have seen so many silly, silly expenditures on reusing code, and finally what everybody figured out is that outside of the open software
movement, no code is getting reused. Now, that's an astounding statement, but it's true, and we can back it up with lots and lots of evidence, because the open software movement provides the only sane environment for doing exactly that. Let's take a look at how that works from a data perspective. If I want to reuse data before I reuse code, and I've got application gray, whatever it is, and it deals with programs A, B, and C, that works just fine. But who is keeping track and making sure that the green database, the one D, E, and F are working with, is not incorporating the same or similar data elements, much less the orange one? The average organization out there has customer data stored in 13 places around the organization. Now, who's making decisions about the range and scope of common data usage? If I change a program in this model, there are a maximum of nine changes that I need to make. But if I change the data, the worst case is that I may end up with 36 changes, because the combinatorial complexity is much, much greater. Again, things have to occur in this order: data programs have to precede software development, data structures have to precede stable code, shared data has to precede completed software, and data reuse has to precede reusable code. This is what we're talking about with data-centric thinking, and I think the reason this is useful is that it sets out specific, objective behaviors. If you have more that you'd like to contribute to this argument, we'd love to have you engage in it; it's at thedatadoctrine.com. So we're right about at the top of the hour, and again, just to review here: people do not understand data because they do not understand what it is to think data-centrically; they don't have the qualified leadership that they need in data; they don't implement a robust programmatic means of developing shared data; they do not align their data program with their IT projects; they fail to manage expectations; they don't properly sequence the implementation
of their data programs, and they fail to address the culture and change management issues that are associated with this. Again, in most organizations, what tends to happen is that they think data is this sort of blip that sits in between IT and the business, and somebody's got to take care of it. But truly, for the past 30 years, it has fallen into an enormous chasm between business and IT. IT's general rule on this in today's environment is, "If they can connect to the server, my job is done," and the business looks around and says, "Wait a minute, I thought IT was taking care of the data. After all, what does that person whose title is chief information officer do?" It's a tough situation. Of course, after watching something like this, you might realize that data is actually a little bit bigger than all that, and in fact the real state of things is that data is swamping us in terms of what's coming up. So we're right up at the top of the hour. Quickly, I'll mention that we've got a whole lot of things set up for next year. Our first webinar will be data strategy in January, we'll do data architecture and data modeling in February, and in March we're going to do unlocking business value. Then it will be time to go to Enterprise Data World in San Diego, where we'll have a big, in-depth discussion in an awful lot of these areas. I'm so looking forward to seeing everybody out there. And I think it's time now to turn it back over to Monica and see what questions you guys have.
Thank you, Peter. So just a reminder: if you have a question, please include it in the Q&A section at the bottom right of your screen, and I'll be happy to ask Peter. Our first question is one I think you touched on a little bit, so you may want to expand: should we have product-led development for effective implementation of data strategy, or continue with application-based project development?
So that's an either-or question.
Absent more context, it's hard to give a good answer, so let's talk about the reasons for doing it one way or the other. If you're focusing on a product in particular, and you are a product company, then it makes sense that you would be developing specific things, applications and data sets, that would help that product. However, if you are a product company and you are making more than one product, it also makes sense to institutionalize those processes and say that we have an ongoing, continuous source of data. I'll give you a very specific example on this. I worked for about four years with Nokia, as you said in the introduction, Monica. Nokia was very frustrated, because they are very good at developing innovative products, in this case handsets. It used to be that Nokia mobile phones were the best mobile phones out there, and they are still very good, by the way; they just lost their market share. So while Nokia was good at coming up with innovations around the handsets, they were not very good at managing the information there. I'm not taking anything away from Nokia; it's a fine organization, I thoroughly enjoyed my time with them, and I still have ongoing relationships with them. But what they had to learn was that developing a new product is itself a process, and that process needed to have data. In the Nokia product development process there were 17 different handoffs of the various item descriptions that they had for each of the handsets, and it seemed to them that they were spending a lot of time doing unnecessary translation between the steps. So we worked with their chief architects and came up with an awful lot of very, very good ideas. In this case it was the dawn of XML; we were just discovering its capabilities as a data management technology, and they gave us some very, very good results. So I can't speak directly to the questioner's specifics on this, although, again, with all of these
things, you guys are welcome to give us a call, and we'll dive in, in the community room and other places, to work on this if that answer isn't helpful. But I hope it was helpful.
Great. So we did have a request that you maybe mention a few approaches or best ways to capture the business rules from the data model.
That's a great question. It's not necessarily part of the content here, but that's what we love about the Dataversity crowd: they like to push us. So let's take just a data model here; that's one that I showed during the presentation. When people say "capture the business rules from the data model," I'm pretty sure whoever asked the question would be astounded to learn that there used to be a CASE tool that would actually give you every business rule that exists in the data model, in English, in a Word document. I'm looking on my shelf here, because I happen to be home today, and I think I could find the product. I won't keep dead air on here, but anyway, I think I've got a copy of the old CASE tool here. It is absolutely possible not just to extract these rules but to extract them comprehensively and correctly, in a way that can be verified. And these business rules give us one of the best insights into what's happening in our organizations, because business rules at a low level tell us the business things that are happening, but when we start to drive things out to third, fourth, and fifth normal form, those business rules become even more important in terms of the types and policy in the organization. So I wish I could tell you it's as easy as putting it in a CASE tool and pushing a button, and if you have that CASE tool, it is in fact that easy. But I'm pretty sure most of you don't have that CASE tool, because it went out of business a while ago. Again, I just don't see it up here. Oh, EasyCASE, it's right in front of me. Anyway, so we're forced to go back, and let me just rant for a minute here. We don't
even teach students who are going through good colleges and universities, such as our program, that something called a CASE tool exists. That is criminal, and I am just absolutely beside myself that we would describe this to them and tell them they have to do it all manually instead of with CASE tools. Thank goodness Embarcadero and erwin are now trying to push back into the classroom. Believe it or not, they have to get somebody to write a textbook. If anybody wants to write a textbook incorporating CASE tools, there is a huge market for it out there. So anyway, the question was about business rules. It is absolutely possible to do it in a computer form, which means it's absolutely possible to do it manually, although hopefully you don't want to spend all your time extracting business rules from data models. It is definitely possible to do in a CASE tool environment. If you can't do it in a CASE tool environment, then what you need to do is say: here are the things, what do the things do, and how do those things relate to each other? You can see we've done that here: an employee can be associated with a person, which means that a person can be no employees, a person can be one employee, or a person can be multiple employees, and that's an interesting concept. Now, that's not really a seven deadly data sins thing, but it's certainly a great question, and I appreciate it.
Great. So the next question is: is it mandatory that data will always follow requirements, or can a requirements model be built out of a data model?
Absolutely, a requirements model can be built out of a data model. I hope nobody ever tries to tell you that's not the way to do it. The real key is, if you think about what we're doing in our development practices, we're starting off with a finite set of requirements, and those requirements will spawn different design objects. Design, by the way, is another skill that is not even taught in colleges and universities anymore; literally the skill of
designing is just missing, and again, I go apoplectic on this. But the design process then tells us how those pieces are implemented. Each of those is going to be larger than the previous stage. And if I take design and move it into implementation (slide up here, hang on, let me go back just so you know which slide I'm referring to), all of these things, right, go back and forth through the development life cycle, but each one is much larger in scope. So getting the requirements right has a tremendous effect on the ease of the design, getting the design right has a tremendous impact on the ease of the implementation, getting the implementation correct has a phenomenal impact on the ease of verification, and doing all of those things right means that your product will be easier to maintain in the long run. So absolutely, you can start at either place, but the key, of course, is to get those requirements correct. If they are incorrect, you will produce good designs from incorrect requirements, and if you produce designs that are correct for the requirements but incorrect for the product or the process that you're trying to support, then you will have built the right solution to the wrong problem, and unfortunately that happens way too many times. Great question; thank you for that.
So this is the waterfall model for development with data strategies. How do you see the agile environment affecting data strategies?
The agile environment is the best way we have come up with to develop higher-quality software faster. That is an incontrovertible fact. I'm not even sure I said that word correctly, but it is absolutely, 100% the best way to develop improved-quality software. But if you are trying to develop your data requirements at the same time you are trying to develop your software requirements, you are screwed, excuse my language. It doesn't work. If you are in the middle of an agile sprint and you discover that a data requirement is wrong, you need to pull the ripcord and stop
development on that sprint immediately, because the only possible response in an agile environment to discovering that a data requirement is incorrect is to develop more small piles of data. Now, it will keep you employed, but it will not produce the products and services that society needs. In order to do this, there is absolutely a prerequisite that most organizations don't have. Just as I was talking about the foundation of my barn, where a good foundation is an important prerequisite to having a good barn, it is important when you start an agile sprint to make sure that you are using known, engineered data items. If you are not using them, you should not be doing an agile sprint, because you have a poor data foundation, and while your code may be great, if the data doesn't work with your code, your product is going to take longer, cost more, deliver less, and present greater risk to the organization than if you follow the correct advice.
Okay, great. In today's world of machine learning, how do you emphasize the importance of the data model when the machine learns and delivers the goal just from expectations, without making any data model in between?
Fantastic question, and it does lead to a really interesting dichotomy that we are facing as a society today. I briefly alluded to it earlier. The most productive area in machine learning is exactly what the questioner described: looking at existing data and getting an algorithm that is smart enough to do the type of learning it needs to do to meet the requirement. And the best source of data for that machine learning algorithm is the metadata in your organization, or, to go back and tie the two questions together, the business rules that you've extracted from your data model. All of these things are relevant inputs to it. It's so bad in society today. There's a wonderful book by a woman named Amy Webb, who has done a great job; the book is called The Big Nine. We had Amy at a conference last year, and she did a great talk, and she's got a lot of them out on the web;
she's a phenomenal force. Amy's postulate, and I think it's a valid one, is that there are only nine companies in the entire world doing a really good job with artificial intelligence and machine learning. Six of them are outside of China and three of them are inside of China. The three that are inside of China are receiving about a 30 billion dollar subsidy annually. The ones outside of it, we're beating up: we're picking on Facebook, we're picking on Google, we're doing all sorts of things that just make no sense whatsoever. We need a comprehensive technology strategy in this country if we are going to remain competitive on the stage, or all of a sudden we're going to be speaking Mandarin. So the question talks about how we can use machine learning and artificial intelligence in this. Again, the problem with machine learning is that we have spent so much energy, research, and time focusing on the algorithms that we have run out of data to train these new systems on. The only good source of data, as I already said, is the metadata that we have in our systems, because it represents factual systems that are in fact operational. It's a really good measure. Just to give you an example of how bad the situation is: there is only one database that we use to train visual learning, in other words, to teach a machine how to see, if you will. And to give just one example from it, there's only one representation in this entire data set of the concept of a bride: it's a white woman with a white veil. I don't know about y'all, but I think we have run out of opportunities here, in the sense that a white woman with a white veil only represents a portion of society, and we need to be more inclusive. And they wonder about algorithms and AI, and why the city of San Francisco banned facial analysis. It is literally because they ran Congress, the United States Congress, through the algorithm and had a significant number (I forget what the actual number was) of the senators and
representatives come up as criminals, and of course nobody wants to be tagged incorrectly in that area. Anyway, great question, really interesting; I hope I answered it.
That was a very interesting question. So how can we use the full power of data without facing data privacy problems?
Gosh, if I knew the answer, we'd be having different conversations. There is a balance. Well, actually, first of all, the next book that I'm writing is going to be on data literacy. It's a society-wide problem. We cannot simply expect a priesthood of data people or data scientists to continue to do all of the work that needs to be done here. We have to make society a lot more literate, and I alluded to that on a couple of the slides. As it stands, it's just not possible to expect everybody to continue to move forward. Imagine even the changes we've seen in the last 10 years: somebody who's been on a desert island for 10 years would not even recognize the things we're doing with our phones today, because it is so completely different. We have to understand that data belongs to us, and that data shared with another person should only be shared as part of a fiduciary relationship. Now, we actually understand fiduciary relationships. You can tell your doctor something, and you don't expect your doctor to go out on Facebook and say, "Hey, Peter just told me he's got a wart on his nose," right? You don't expect that of your financial advisor, and you certainly don't expect your lawyer to be telling tales on you. And yet for some reason we allow the data to go out there. It's gone far enough, and at the same time I think it's gone a little too far, because it's a very black-and-white type of process. I think as we increase society's awareness, we will get people to understand that Facebook makes $120 off of each person that's associated with Facebook, and that's not necessarily something we should be saying is a good thing, and that Google makes thousands and thousands of dollars off of each person that's searching
on Google. And I'm a Google user; I'm not saying we shouldn't use it. But what I am saying is that our data does belong to us. If you want to read further on this (I could talk about it for a half hour, but then Monica would fall asleep and hit the wrong button and we'd all get cut off of the conversation, right?), there's a wonderful book out there by Shoshana Zuboff called The Age of Surveillance Capitalism that puts this into a framework where it's really useful to discuss. I'm hoping that more people will read her book, because it is a very good book and she makes some very, very good arguments in it. Okay. We have a request to repeat the book you were talking about earlier, The Big Nine? Yes, The Big Nine by Amy Webb. Just search "Amy Webb The Big Nine" at Amazon; it'll come right up. Okay, and can you repeat the one that you just... Sure, it's called The Age of Surveillance Capitalism, by Shoshana Zuboff. Shoshana Zuboff, perfect. And I'll include the slides on both of those in the deck so that everybody can look at them. Great. "If Peter's comments about Congress and face recognition are taken out of context and reported..." Oh, it could be responsible for most of the news for at least a news cycle. Well, yes, that's true, and actually it's already been done, so you can look it up. And we won't go to the fact that, you know, maybe those guys are criminals already anyway, right? Yeah, we'll try not to make too many... "Oh, he said something political. Bad, bad, bad, bad." No objection. Yes. So, are there any more questions anybody would like to ask Peter? Going, going, going... Well, it seems like that's all we have for today. So thank you, everybody, for joining us, and Peter, for this great presentation and some really good Q&A questions. I really do apologize about the technical difficulties at the beginning there. That's entirely... I live, as you guys can tell from my picture of my house, out in the sticks, and I don't have a very good internet connection out here, so we'll make sure that doesn't happen. Look forward
to seeing everybody in the beginning of the next year, when we start another cycle of these things, and I also hope to see you in person in San Diego at Enterprise Data World. Yes, let's hope everybody signs up and comes. So, just to remind everyone, we'll be posting the recorded webinar and slides to dataversity.net within two business days, and Shannon will send out a follow-up email to let you know the links and other requested information. So thank you again for attending today's webinar, and I hope everybody has a great day. And Merry Christmas, everybody, or happy holidays. Thank you so much, Monica. Thank you, everybody. Talk to you next year. And I'll see you next year.